XGBoost Releases GIL During Inference (prediction)

XGBoost releases Python’s Global Interpreter Lock (GIL) not only during training but also during inference (prediction).

This means you can get significant speedups when making predictions with pre-trained XGBoost models using Python’s ThreadPoolExecutor, compared to predicting with each model sequentially.

This example demonstrates the performance gain of predicting with multiple XGBoost models in parallel threads versus predicting with them one by one.

import os
os.environ['OMP_NUM_THREADS'] = '1'  # One OpenMP thread per call, so the Python threads do not oversubscribe the CPU
import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier
from concurrent.futures import ThreadPoolExecutor
import time

# Generate a synthetic dataset for a classification problem
X, y = make_classification(n_samples=1000000, n_features=20, random_state=42)

# Train multiple XGBoost models with different hyperparameters
models = [
    XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1, n_jobs=1).fit(X, y),
    XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.05, n_jobs=1).fit(X, y),
    XGBClassifier(n_estimators=150, max_depth=5, learning_rate=0.08, n_jobs=1).fit(X, y),
    XGBClassifier(n_estimators=180, max_depth=3, learning_rate=0.12, n_jobs=1).fit(X, y),
]

# Define a function to make predictions with a single model
def predict_with_model(model):
    return model.predict(X)

# Make predictions sequentially
def predict_sequential():
    for model in models:
        predict_with_model(model)

# Make predictions in parallel using ThreadPoolExecutor
def predict_parallel():
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Consume the iterator so any exception raised in a worker is re-raised here
        list(executor.map(predict_with_model, models))

# Time the sequential predictions
start_sequential = time.perf_counter()
predict_sequential()
end_sequential = time.perf_counter()
print(f"Sequential prediction time: {end_sequential - start_sequential:.2f} seconds")

# Time the parallel predictions
start_parallel = time.perf_counter()
predict_parallel()
end_parallel = time.perf_counter()
print(f"Parallel prediction time: {end_parallel - start_parallel:.2f} seconds")

# Print the speedup achieved with parallel prediction
speedup = (end_sequential - start_sequential) / (end_parallel - start_parallel)
print(f"Parallel prediction is {speedup:.2f} times faster than sequential prediction")

Running this code, you might see output similar to:

Sequential prediction time: 6.17 seconds
Parallel prediction time: 2.17 seconds
Parallel prediction is 2.85 times faster than sequential prediction

Here’s what’s happening in the code:

We set OMP_NUM_THREADS to 1 before importing XGBoost so that each prediction call uses a single OpenMP thread and the Python threads do not contend for CPU cores.
We generate a synthetic dataset using make_classification from scikit-learn.
We train multiple XGBoost models with different hyperparameters.
We define predict_with_model, a function that takes a model and makes predictions on the dataset.
We define predict_sequential to make predictions with each model sequentially.
We define predict_parallel to make predictions with the models concurrently using ThreadPoolExecutor.
We time the sequential and parallel predictions and print the execution times.
We print the speedup achieved with parallel prediction.
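
The timing step above measures each variant once, so a single run can be skewed by whatever else the machine is doing. A minimal sketch of one way to reduce that noise is to repeat each measurement and keep the shortest time. The helper name best_of and the stand-in workload are assumptions made for illustration; they are not part of the original example.

```python
import time

def best_of(fn, repeats=3):
    """Return the shortest wall-clock time, in seconds, over `repeats` calls to fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload so the snippet runs on its own. In the example above,
# you would pass predict_sequential and predict_parallel instead.
def workload():
    sum(i * i for i in range(100_000))

print(f"Best of 3: {best_of(workload):.4f} seconds")
```

With the original script, the speedup could then be computed as best_of(predict_sequential) / best_of(predict_parallel).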

The key observation is that parallel prediction is significantly faster than sequential prediction. This speedup is possible because XGBoost releases the GIL when executing the computationally intensive prediction code in its native backend, allowing the Python threads to run concurrently.
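
For contrast, here is a minimal sketch of the same threading pattern applied to CPU-bound work written in pure Python, which holds the GIL while it runs. The count_primes function and the workload sizes are hypothetical choices for illustration. On a standard CPython build with the GIL enabled, the threaded run is typically no faster than the sequential one; results may differ on free-threaded builds.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_primes(limit):
    """CPU-bound pure-Python work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

# Four identical tasks, mirroring the four models in the example above
limits = [20_000] * 4

# Run the tasks one after another
start = time.perf_counter()
sequential_results = [count_primes(limit) for limit in limits]
sequential_time = time.perf_counter() - start

# Run the same tasks in four threads
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    threaded_results = list(executor.map(count_primes, limits))
threaded_time = time.perf_counter() - start

print(f"Sequential: {sequential_time:.2f} seconds")
print(f"Threaded:   {threaded_time:.2f} seconds")
```

Comparing these two timings with the XGBoost timings shows the difference between code that holds the GIL and native code that releases it.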

The exact speedup depends on your hardware, in particular the number of CPU cores, and on the size and complexity of your models. Because the four models here differ in size, the parallel run can finish no sooner than the slowest single model, which is one reason the measured speedup falls short of 4x. Even so, the example shows that XGBoost can exploit multi-threading for prediction when called through the Python API, which can yield substantial gains when working with many models or large datasets.
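
The benchmark above discards the predictions, since it only measures time. In practice you usually want the results back. The following is a minimal sketch of a helper that collects them, assuming each model exposes a predict method as XGBClassifier does. The names predict_all and StubModel are hypothetical; StubModel only stands in for a trained model so the snippet runs without XGBoost installed.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def predict_all(models, X, max_workers=None):
    """Call model.predict(X) for every model in a thread pool.

    Results are returned in the same order as `models`.
    """
    if max_workers is None:
        # One thread per model, capped at the number of CPU cores reported
        max_workers = max(1, min(len(models), os.cpu_count() or 1))
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, not completion order
        return list(executor.map(lambda model: model.predict(X), models))

# Hypothetical stand-in for a trained model (illustration only)
class StubModel:
    def __init__(self, offset):
        self.offset = offset

    def predict(self, X):
        return [x + self.offset for x in X]

stub_models = [StubModel(0), StubModel(10), StubModel(100)]
predictions = predict_all(stub_models, [1, 2, 3])
print(predictions)
```

With the models and X from the example above, the call would be predict_all(models, X). Capping the worker count at the number of cores is one reasonable default, not a rule; the best value depends on your system.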

See Also