When using XGBoost for inference, performance can be significantly improved by setting the OMP_NUM_THREADS environment variable.
This variable controls the number of threads OpenMP uses during XGBoost's inference phase. Setting it to the number of logical or physical CPU cores available on your system can speed up the inference process.
The variable must be set before xgboost is imported, because the OpenMP runtime reads it when it initializes. The simplest way to achieve this is to set it in the first lines of the program.
This example demonstrates how to benchmark XGBoost inference for a given OMP_NUM_THREADS value.
import os
os.environ["OMP_NUM_THREADS"] = "1"
import time
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
# Generate a synthetic dataset
X, y = make_classification(n_samples=1000000, n_features=20, random_state=42)
# Train an XGBClassifier model
model = XGBClassifier(n_estimators=100, random_state=42)
model.fit(X, y)
# Benchmark time taken for inference
start_time = time.perf_counter()
_ = model.predict(X)
end_time = time.perf_counter()
duration = end_time - start_time
# Report time taken
print(f'OMP_NUM_THREADS={os.environ["OMP_NUM_THREADS"]}: {duration:.6f}')
Running the above example with different values of OMP_NUM_THREADS may produce results as follows:
OMP_NUM_THREADS=1: 1.791561
OMP_NUM_THREADS=2: 0.917487
OMP_NUM_THREADS=4: 0.517983
OMP_NUM_THREADS=8: 0.394388
In this example, we:
- Generate a synthetic dataset using sklearn.datasets.make_classification.
- Train an XGBClassifier model on the dataset.
- Make predictions for the entire training dataset.
- Print the execution time, showing the number of threads and the time taken.
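To compare several thread counts, each run needs a fresh Python process, since OMP_NUM_THREADS must be in the environment before xgboost is first imported. A small driver sketch is shown below; the helper name run_for_thread_counts and the script name benchmark.py are hypothetical, assuming the benchmark above is saved under that name.

```python
import os
import subprocess
import sys

def run_for_thread_counts(script_path, thread_counts):
    # Launch a fresh interpreter per setting: OMP_NUM_THREADS must be set
    # before xgboost is first imported, so reusing one process would not
    # pick up new values.
    for n in thread_counts:
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        subprocess.run([sys.executable, script_path], env=env, check=True)

# e.g. run_for_thread_counts("benchmark.py", [1, 2, 4, 8])
```

Each child process prints its own timing line, reproducing a sweep like the results above in a single command.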
Generally, setting OMP_NUM_THREADS to the number of logical CPU cores in your system (the default) gives the best results for inference with a single XGBoost model.
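If you want to set the variable to the logical core count programmatically, the standard library provides it. A minimal sketch: os.cpu_count() reports logical cores and may return None, and counting physical cores would require a third-party package such as psutil.

```python
import os

# os.cpu_count() returns the number of logical CPU cores, or None if it
# cannot be determined; fall back to 1 in that case.
n_cores = os.cpu_count() or 1

# This must run before xgboost is imported for OpenMP to pick it up.
os.environ["OMP_NUM_THREADS"] = str(n_cores)
```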
The actual results will depend on your system's specifications and the dataset used. Experiment with different OMP_NUM_THREADS settings to find the optimal configuration for your specific use case.
By setting OMP_NUM_THREADS to the number of physical or logical cores, you can potentially achieve faster inference times than with a restricted thread count. This can be particularly beneficial when dealing with large datasets or when real-time inference performance is critical.