When using XGBoost for inference, performance can be significantly improved by setting the OMP_NUM_THREADS environment variable.
This variable controls the number of threads OpenMP uses during XGBoost's inference phase. Setting it to the number of logical or physical CPU cores available on your system can speed up the inference process.
The variable must be set before xgboost is imported, because the OpenMP runtime reads it when it initializes. The simplest way to achieve this is to set it in the first lines of the program.
This example demonstrates how to benchmark XGBoost inference for a given OMP_NUM_THREADS value.
import os
os.environ["OMP_NUM_THREADS"] = "1"
import time
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
# Generate a synthetic dataset
X, y = make_classification(n_samples=1000000, n_features=20, random_state=42)
# Train an XGBClassifier model
model = XGBClassifier(n_estimators=100, random_state=42)
model.fit(X, y)
# Benchmark time taken for inference
start_time = time.perf_counter()
_ = model.predict(X)
end_time = time.perf_counter()
duration = end_time - start_time
# Report time taken
print(f'OMP_NUM_THREADS={os.environ["OMP_NUM_THREADS"]}: {duration:.6f}')
Running the above example with different values of OMP_NUM_THREADS may produce results as follows:
OMP_NUM_THREADS=1: 1.791561
OMP_NUM_THREADS=2: 0.917487
OMP_NUM_THREADS=4: 0.517983
OMP_NUM_THREADS=8: 0.394388
In this example, we:
- Generate a synthetic dataset using sklearn.datasets.make_classification.
- Train an XGBClassifier model on the dataset.
- Make predictions for the entire training dataset.
- Print the execution time, showing the number of threads and the time taken.
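To compare several thread counts, each run needs a fresh Python process, since OMP_NUM_THREADS must be in the environment before xgboost is first imported. A small driver sketch is shown below; the helper name run_for_thread_counts and the script name benchmark.py are hypothetical, assuming the benchmark above is saved under that name.

```python
import os
import subprocess
import sys

def run_for_thread_counts(script_path, thread_counts):
    # Launch a fresh interpreter per setting: OMP_NUM_THREADS must be set
    # before xgboost is first imported, so reusing one process would not
    # pick up new values.
    for n in thread_counts:
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        subprocess.run([sys.executable, script_path], env=env, check=True)

# e.g. run_for_thread_counts("benchmark.py", [1, 2, 4, 8])
```

Each child process prints its own timing line, reproducing a sweep like the results above in a single command.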
Generally, setting OMP_NUM_THREADS to the number of logical CPU cores in your system (the default) gives the best results for inference with a single XGBoost model.
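If you want to set the variable to the logical core count programmatically, the standard library provides it. A minimal sketch: os.cpu_count() reports logical cores and may return None, and counting physical cores would require a third-party package such as psutil.

```python
import os

# os.cpu_count() returns the number of logical CPU cores, or None if it
# cannot be determined; fall back to 1 in that case.
n_cores = os.cpu_count() or 1

# This must run before xgboost is imported for OpenMP to pick it up.
os.environ["OMP_NUM_THREADS"] = str(n_cores)
```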
The actual results will depend on your system's specifications and the dataset used. Experiment with different OMP_NUM_THREADS settings to find the optimal configuration for your specific use case.
By setting OMP_NUM_THREADS to the number of physical or logical cores, you can potentially achieve faster inference times than with a restricted thread count. This can be particularly beneficial when dealing with large datasets or when real-time inference performance is critical.