XGBoost utilizes Open Multi-Processing (OpenMP) for parallelization during training and prediction, which can significantly impact performance.
By default, XGBoost will use all available threads, but you can control this behavior by setting the OMP_NUM_THREADS
environment variable.
Adjusting the number of threads can be particularly useful when you want to optimize XGBoost’s performance on a specific system or when you need to balance resource usage with other processes.
This example demonstrates how to set the number of OpenMP threads used by XGBoost and measures the effect on combined training and prediction time.
The environment variable must be set before XGBoost is imported, because the OpenMP runtime reads it when the library is loaded. The simplest way to guarantee this is to set the variable in the first lines of the program.
import os
# Set the number of OpenMP threads
os.environ['OMP_NUM_THREADS'] = '1'
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
import time
# Generate a large synthetic dataset
X, y = make_classification(n_samples=100000, n_features=100, random_state=42)
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Configure XGBoost model
model = XGBClassifier(n_estimators=100, random_state=42)
# Train the model, make predictions, and time both steps together
start_time = time.perf_counter()
model.fit(X_train, y_train)
_ = model.predict(X_test)
end_time = time.perf_counter()
duration = end_time - start_time
# Print the combined training and prediction time
print(f"Time with {os.environ['OMP_NUM_THREADS']} thread(s): {duration:.2f} seconds")
Running this example with different values for OMP_NUM_THREADS will produce results similar to the following:
Time with 1 thread(s): 5.74 seconds
Time with 2 thread(s): 3.23 seconds
Time with 3 thread(s): 2.52 seconds
Time with 4 thread(s): 2.13 seconds
...
Time with 8 thread(s): 2.13 seconds
In this example:
- The OMP_NUM_THREADS environment variable is set before XGBoost is imported, specifying how many threads XGBoost should use.
- We generate a large synthetic classification dataset using scikit-learn's make_classification function.
- The data is split into training and testing sets using train_test_split.
- An XGBClassifier model is instantiated and trained on the training data.
- Predictions are made on the test set.
- Finally, the combined training and prediction time is printed, along with the number of threads used.
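Note that the environment variable is not the only control: XGBoost's scikit-learn wrapper also accepts an n_jobs parameter (nthread in the native API) that limits the threads used by a single model rather than the whole process. As a minimal sketch:
from xgboost import XGBClassifier
# Limit this particular model to 2 threads via n_jobs,
# leaving the process-wide OpenMP default untouched
model = XGBClassifier(n_estimators=100, n_jobs=2, random_state=42)
This per-model approach is convenient when several models with different thread budgets must coexist in the same process.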
By experimenting with different values for OMP_NUM_THREADS, you can find the optimal number of threads for your specific system and workload.
Keep in mind that the ideal number of threads may vary depending on factors such as the number of cores, the presence of other running processes, and the size and complexity of your dataset.
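Because OMP_NUM_THREADS must be fixed before XGBoost is imported, sweeping thread counts from a single script is easier with the n_jobs parameter. The sketch below reuses the synthetic dataset from the example above and times training plus prediction for several thread counts; run it in a fresh process without OMP_NUM_THREADS set, since the environment variable can cap what n_jobs requests.
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
# Same synthetic dataset as the example above
X, y = make_classification(n_samples=100000, n_features=100, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Time training and prediction across a range of thread counts
for n_jobs in [1, 2, 4, 8]:
    model = XGBClassifier(n_estimators=100, n_jobs=n_jobs, random_state=42)
    start_time = time.perf_counter()
    model.fit(X_train, y_train)
    _ = model.predict(X_test)
    duration = time.perf_counter() - start_time
    print(f"n_jobs={n_jobs}: {duration:.2f} seconds")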