XGBoost relies on BLAS (Basic Linear Algebra Subprograms) routines for some of its computations, and these routines can run across multiple threads. By default, the number of BLAS threads is chosen automatically. This thread count can have a significant impact on training and prediction performance, particularly when the default does not match the number of logical or physical CPU cores in your system, or when the system is running other workloads while XGBoost is making predictions.
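If you are unsure what thread counts are in effect on your machine, one way to inspect the thread pools in the current process is the third-party threadpoolctl package (an assumption here: it is not bundled with XGBoost and must be installed separately). A minimal sketch:

# A minimal sketch using the third-party 'threadpoolctl' package
# (pip install threadpoolctl) to report the OpenMP/BLAS thread pools
# loaded in the current process.
import xgboost  # loads XGBoost's OpenMP runtime
from threadpoolctl import threadpool_info

for pool in threadpool_info():
    # Each entry names the API (e.g. 'openmp' or 'blas') and the number
    # of threads that library is currently configured to use.
    print(pool['user_api'], pool['num_threads'])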
This example demonstrates how to change the number of BLAS threads using the 'OMP_NUM_THREADS'
environment variable and compares prediction performance on a large test set.
The environment variable must be set before the libraries that read it are imported, which is why it is set in the first lines of the program.
import os
# Set the number of OpenMP threads
os.environ['OMP_NUM_THREADS'] = '1'
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
import time
# Generate a large synthetic dataset
X, y = make_classification(n_samples=100000, n_features=100, random_state=42)
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Configure XGBoost model
model = XGBClassifier(n_estimators=100, random_state=42)
# Train the model, make predictions, and measure the total time
start_time = time.perf_counter()
model.fit(X_train, y_train)
_ = model.predict(X_test)
end_time = time.perf_counter()
duration = end_time - start_time
# Report the combined train and prediction time
print(f"Time with {os.environ['OMP_NUM_THREADS']} threads: {duration:.2f} seconds")
Running this example with different values for 'OMP_NUM_THREADS'
will produce results similar to the following:
Time with 1 threads: 5.74 seconds
Time with 2 threads: 3.23 seconds
Time with 3 threads: 2.52 seconds
Time with 4 threads: 2.13 seconds
...
Time with 8 threads: 2.13 seconds
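One convenient way to collect these timings is to re-run the script once per setting. Below is a minimal sketch, assuming the example above is saved as 'xgb_threads.py' (a hypothetical filename):

import os
import subprocess
import sys

# Re-run the example once per thread count; each child process inherits
# the current environment with OMP_NUM_THREADS overridden.
for n_threads in ('1', '2', '4', '8'):
    env = dict(os.environ, OMP_NUM_THREADS=n_threads)
    subprocess.run([sys.executable, 'xgb_threads.py'], env=env, check=True)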
In this example:
- The 'OMP_NUM_THREADS' environment variable is set, specifying the number of threads XGBoost should use.
- We generate a large synthetic classification dataset using scikit-learn's make_classification function.
- The data is split into training and testing sets using train_test_split.
- An XGBClassifier model is instantiated and trained on the training data.
- Predictions are made on the test set.
- Finally, the combined train and prediction time is printed, along with the number of threads used.
This example demonstrates how the number of BLAS threads used by XGBoost can affect parallel training and prediction performance.
By tuning the 'OMP_NUM_THREADS' environment variable, you may be able to find a sweet spot that maximizes speed for your particular hardware and workload.
As a starting point, consider setting 'OMP_NUM_THREADS' to the number of logical or physical CPU cores in your system.
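For example, here is a minimal sketch that derives the setting from the logical core count reported by the operating system:

import os

# A minimal sketch: use the logical CPU core count reported by the OS.
# os.cpu_count() may return None, so fall back to 1. For physical cores,
# the third-party 'psutil' package offers psutil.cpu_count(logical=False).
os.environ['OMP_NUM_THREADS'] = str(os.cpu_count() or 1)

# Import XGBoost only after the variable is set, then train as above.
from xgboost import XGBClassifier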