Training multiple XGBoost models on different datasets or with different hyperparameters can be time-consuming when done sequentially.
However, by leveraging Python’s ThreadPoolExecutor, you can train multiple models in parallel, potentially reducing the overall training time significantly.

To achieve good parallel speedup, it’s crucial to restrict BLAS/OpenMP to a single thread (for example via the OMP_NUM_THREADS environment variable) and to set n_jobs=1 (or a small number) for each model, so the models do not contend for the same CPU cores.

This example demonstrates how to train multiple XGBoost models in parallel using ThreadPoolExecutor and compares the execution time against sequential training.
import os
# Limit OpenMP/BLAS to a single thread to avoid oversubscription;
# set before importing numpy/xgboost so it takes effect
os.environ['OMP_NUM_THREADS'] = '1'
import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier
from concurrent.futures import ThreadPoolExecutor
import time

# List of hyperparameter configurations
def get_params(n_jobs):
    return [
        {'n_estimators': 100, 'max_depth': 3, 'learning_rate': 0.1, 'n_jobs': n_jobs},
        {'n_estimators': 200, 'max_depth': 4, 'learning_rate': 0.05, 'n_jobs': n_jobs},
        {'n_estimators': 150, 'max_depth': 5, 'learning_rate': 0.08, 'n_jobs': n_jobs},
        {'n_estimators': 180, 'max_depth': 3, 'learning_rate': 0.12, 'n_jobs': n_jobs},
    ]

# Train a single XGBoost model and return it
def train_model(params):
    global X, y
    model = XGBClassifier(**params)
    model.fit(X, y)
    return model

# Sequential model training
def train_sequential(param_sets):
    for params in param_sets:
        train_model(params)

# Parallel model training using a thread pool
def train_parallel(param_sets):
    with ThreadPoolExecutor(max_workers=4) as p:
        _ = [p.submit(train_model, ps) for ps in param_sets]

# Generate synthetic classification dataset
X, y = make_classification(n_samples=1000000, n_features=20, random_state=42)

# Time the sequential training
start_sequential = time.perf_counter()
train_sequential(get_params(4))
end_sequential = time.perf_counter()
print(f"Sequential training time: {end_sequential - start_sequential:.2f} seconds")

# Time the parallel training
start_parallel = time.perf_counter()
train_parallel(get_params(2))
end_parallel = time.perf_counter()
print(f"Parallel training time: {end_parallel - start_parallel:.2f} seconds")

# Calculate speedup
speedup = (end_sequential - start_sequential) / (end_parallel - start_parallel)
print(f"Parallel training is {speedup:.2f} times faster than sequential training")
You may see output that looks like the following:
Sequential training time: 17.81 seconds
Parallel training time: 13.56 seconds
Parallel training is 1.31 times faster than sequential training
The specific speedup factor will depend on the system where the code is run.
Here’s what’s happening:
- We configure BLAS to be single-threaded via the OMP_NUM_THREADS environment variable.
- We generate a synthetic dataset using sklearn.datasets.make_classification.
- We define a function train_model that takes hyperparameters as input and returns a trained XGBClassifier model.
- We create a list of different hyperparameter configurations to train models with a configurable number of threads (n_jobs).
- We define two functions: train_sequential for sequential model training and train_parallel for parallel model training using ThreadPoolExecutor.
- We time the execution of sequential model training with n_jobs=4 for each model, and of parallel model training with n_jobs=2 for each model but 4 models trained at a time.
- We print the execution times and the speedup achieved with parallel training.
The train_parallel function uses ThreadPoolExecutor to distribute the model training workload across multiple threads. The submit function issues a task to the thread pool, and the max_workers parameter specifies the number of worker threads to use.
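Because submit returns a Future object, a small variation of train_parallel can also hand back the fitted models once they finish. The sketch below is not part of the timed example above; it assumes the train_model and get_params functions from the listing, and the helper name train_parallel_collect is made up here for illustration:

from concurrent.futures import ThreadPoolExecutor, as_completed

def train_parallel_collect(param_sets, max_workers=4):
    # Submit one training task per hyperparameter configuration
    with ThreadPoolExecutor(max_workers=max_workers) as p:
        futures = [p.submit(train_model, ps) for ps in param_sets]
        # as_completed yields each future as soon as its model has finished training;
        # calling result() also re-raises any exception from the worker thread
        return [future.result() for future in as_completed(futures)]

# Example usage: train the four configurations in parallel and keep the fitted models
models = train_parallel_collect(get_params(2))

Collecting results this way also surfaces any training errors, which the original train_parallel silently discards along with the futures.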
Vary the number of worker threads used by ThreadPoolExecutor and the number of threads set via n_jobs for the sequential and parallel functions to optimize performance on your system.
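As a starting point for that tuning, the minimal sketch below times a few combinations of pool size and per-model thread count; it reuses train_model, get_params, ThreadPoolExecutor, and time from the listing above, and the specific (max_workers, n_jobs) pairs are illustrative rather than recommendations:

# Time a few (max_workers, n_jobs) combinations to see what suits this machine
for max_workers, n_jobs in [(2, 4), (4, 2), (8, 1)]:
    start = time.perf_counter()
    # The with-block waits for all submitted training tasks before exiting
    with ThreadPoolExecutor(max_workers=max_workers) as p:
        _ = [p.submit(train_model, ps) for ps in get_params(n_jobs)]
    duration = time.perf_counter() - start
    print(f"max_workers={max_workers}, n_jobs={n_jobs}: {duration:.2f} seconds")

A reasonable rule of thumb is to keep max_workers multiplied by n_jobs close to the number of physical CPU cores, then adjust based on the measured times.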