XGBoost is a powerful and efficient library for gradient boosting, but did you know that it also smartly manages Python’s Global Interpreter Lock (GIL) during training?
The GIL is a mechanism in Python that prevents multiple native threads from executing Python bytecodes at once, which can limit parallelism.
However, XGBoost is designed to release the GIL when performing computationally intensive tasks in its native code, allowing it to efficiently utilize multiple threads for training even when called from Python.
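To see why this matters, here is a small illustrative sketch, separate from the XGBoost example below: on a standard CPython build, a CPU-bound pure-Python function gets essentially no speedup from threads, because only one thread can run Python bytecode at a time. The function name count and the repetition counts are just for illustration.

import time
from concurrent.futures import ThreadPoolExecutor

def count(n=10_000_000):
    # Pure-Python loop: holds the GIL for the whole computation
    total = 0
    for i in range(n):
        total += i
    return total

start = time.perf_counter()
for _ in range(4):
    count()
print(f"Sequential: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    list(executor.map(lambda _: count(), range(4)))
# Roughly the same elapsed time as the sequential loop: the GIL serializes the work
print(f"Threaded:   {time.perf_counter() - start:.2f}s")

XGBoost's native training routines sidestep this limit by releasing the GIL, which is what the main example below exploits.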
This example demonstrates how training XGBoost models in parallel using Python’s ThreadPoolExecutor
can be significantly faster than training them sequentially, thanks to XGBoost’s ability to release the GIL.
import os
os.environ['OMP_NUM_THREADS'] = '1'  # Limit each model to a single OpenMP thread to avoid oversubscribing the CPU
import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier
from concurrent.futures import ThreadPoolExecutor
import time
# Generate a synthetic dataset for a classification problem
X, y = make_classification(n_samples=100000, n_features=20, random_state=42)
# Define a function to train a single XGBoost model
def train_model(params):
    model = XGBClassifier(**params)
    model.fit(X, y)
    return model  # return the fitted model so callers can keep it if needed
# Create a list of different hyperparameter configurations
param_sets = [
    {'n_estimators': 100, 'max_depth': 3, 'learning_rate': 0.1, 'n_jobs': 1},
    {'n_estimators': 200, 'max_depth': 4, 'learning_rate': 0.05, 'n_jobs': 1},
    {'n_estimators': 150, 'max_depth': 5, 'learning_rate': 0.08, 'n_jobs': 1},
    {'n_estimators': 180, 'max_depth': 3, 'learning_rate': 0.12, 'n_jobs': 1},
]
# Train models sequentially
def train_sequential():
    for params in param_sets:
        train_model(params)
# Train models in parallel using ThreadPoolExecutor
def train_parallel():
    with ThreadPoolExecutor(max_workers=4) as executor:
        # list() consumes the iterator so any exception raised in a worker is surfaced
        list(executor.map(train_model, param_sets))
# Time the sequential training
start_sequential = time.perf_counter()
train_sequential()
end_sequential = time.perf_counter()
print(f"Sequential training time: {end_sequential - start_sequential:.2f} seconds")
# Time the parallel training
start_parallel = time.perf_counter()
train_parallel()
end_parallel = time.perf_counter()
print(f"Parallel training time: {end_parallel - start_parallel:.2f} seconds")
# Print the speedup achieved with parallel training
speedup = (end_sequential - start_sequential) / (end_parallel - start_parallel)
print(f"Parallel training is {speedup:.2f} times faster than sequential training")
Running this code, you might see output like:
Sequential training time: 5.27 seconds
Parallel training time: 2.07 seconds
Parallel training is 2.54 times faster than sequential training
The exact speedup factor will depend on your system’s specifics.
Here’s what’s happening in the code:
- We set OMP_NUM_THREADS to 1 so that each model trains on a single OpenMP thread and the concurrently trained models don't oversubscribe the CPU.
- We generate a synthetic dataset for a classification problem using make_classification from scikit-learn.
- We define train_model, a function that takes hyperparameters and trains an XGBoost model.
- We create param_sets, a list of hyperparameter configurations to use for training.
- We define train_sequential to train the models one after the other.
- We define train_parallel to train the models concurrently using ThreadPoolExecutor (a variant that also collects the trained models is sketched after this list).
- We time the sequential and parallel training and print the execution times.
- We print the speedup achieved with parallel training.
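If you also want to keep the fitted models rather than only time the runs, a small variant of train_parallel can collect them. This is a sketch using executor.submit and Future results; the names train_parallel_collect and trained_models are illustrative.

def train_parallel_collect():
    # Submit one training job per configuration and gather the fitted models
    # (train_model above returns the model it fits)
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(train_model, params) for params in param_sets]
        return [future.result() for future in futures]

trained_models = train_parallel_collect()
print(f"Trained {len(trained_models)} models")

Calling future.result() also re-raises any exception from the worker thread, which makes failures during training easy to spot.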
The key takeaway is that parallel training is significantly faster than sequential training, even though we’re using Python threads. This is possible because XGBoost releases the GIL when doing the heavy lifting in its native code, allowing true parallelism.
Keep in mind that the specific speedup will vary depending on factors like your CPU, the size of your dataset, and the hyperparameters used. Nonetheless, this example showcases XGBoost’s ability to efficiently leverage multi-threading even in a Python context.
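For example, if your machine has more cores than hyperparameter configurations, you can split the cores between the thread pool and the individual models instead of pinning everything to one thread. Here is a rough sketch, assuming XGBoost's n_jobs parameter takes precedence over the OMP_NUM_THREADS default (which is its documented role as the per-model thread count); the split that works best is workload-dependent.

import os

n_configs = len(param_sets)
n_cores = os.cpu_count() or 1
threads_per_model = max(1, n_cores // n_configs)

# Give each configuration its own share of the cores
tuned_param_sets = [dict(p, n_jobs=threads_per_model) for p in param_sets]

with ThreadPoolExecutor(max_workers=n_configs) as executor:
    list(executor.map(train_model, tuned_param_sets))

As always, measure both variants on your own hardware before settling on a configuration.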