Training multiple XGBoost models sequentially, each with its own set of hyperparameters, can be time-consuming. However, by using a process pool (concurrent.futures.ProcessPoolExecutor, which builds on Python's multiprocessing module), you can train these models in parallel and potentially reduce the overall execution time.
For best performance, each process should generate (or load) its own training data, which avoids the overhead of inter-process communication (IPC) incurred when passing large datasets between processes, as the short sketch below illustrates. Additionally, setting n_jobs=1 (or another small value) for each model helps avoid thread contention between the processes.
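The cost of shipping a large dataset to each worker is easy to underestimate. The following rough sketch (not part of the main example; the array size and worker count are arbitrary) contrasts submitting an already-built NumPy array, which must be pickled and copied to the worker, with having the worker build the same array itself.

# A rough sketch of the IPC cost of passing a large array to a worker process
# versus creating the data inside the worker; timings are illustrative only.
from concurrent.futures import ProcessPoolExecutor
import numpy as np
import time

def consume_array(X):
    # X was pickled in the parent and copied into this process before this runs
    return float(X.sum())

def build_and_consume(args):
    # the array is created locally, so nothing large crosses the process boundary
    shape, seed = args
    X = np.random.default_rng(seed).normal(size=shape)
    return float(X.sum())

if __name__ == '__main__':
    shape = (1_000_000, 20)
    X = np.random.default_rng(0).normal(size=shape)
    with ProcessPoolExecutor(1) as pool:
        start = time.perf_counter()
        pool.submit(consume_array, X).result()
        print(f"pass array over IPC:  {time.perf_counter() - start:.2f} seconds")

        start = time.perf_counter()
        pool.submit(build_and_consume, (shape, 0)).result()
        print(f"create in the worker: {time.perf_counter() - start:.2f} seconds")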
The full example below demonstrates how to train multiple XGBoost models in parallel using a process pool and compares the execution time against sequential training, where each model uses n_jobs=4.
import os
# Avoid contention between processes
os.environ['OMP_NUM_THREADS'] = '1'

import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier
from concurrent.futures import ProcessPoolExecutor
import time

# Generate synthetic classification dataset
def generate_data():
    X, y = make_classification(n_samples=1000000, n_features=20, random_state=42)
    return X, y

# List of hyperparameter configurations
def get_params(n_jobs):
    return [
        {'n_estimators': 100, 'max_depth': 3, 'learning_rate': 0.1, 'n_jobs': n_jobs},
        {'n_estimators': 200, 'max_depth': 4, 'learning_rate': 0.05, 'n_jobs': n_jobs},
        {'n_estimators': 150, 'max_depth': 5, 'learning_rate': 0.08, 'n_jobs': n_jobs},
        {'n_estimators': 180, 'max_depth': 3, 'learning_rate': 0.12, 'n_jobs': n_jobs},
    ]

# Train single XGBoost model
def train_model(params):
    X, y = generate_data()
    model = XGBClassifier(**params)
    model.fit(X, y)

# Sequential model training
def train_sequential(param_sets):
    for params in param_sets:
        train_model(params)

# Parallel model training using a process pool
def train_parallel(param_sets):
    with ProcessPoolExecutor(4) as p:
        _ = [p.submit(train_model, ps) for ps in param_sets]

if __name__ == '__main__':
    # Time the sequential training
    start_sequential = time.perf_counter()
    train_sequential(get_params(4))
    end_sequential = time.perf_counter()
    print(f"Sequential training time: {end_sequential - start_sequential:.2f} seconds")

    # Time the parallel training
    start_parallel = time.perf_counter()
    train_parallel(get_params(2))
    end_parallel = time.perf_counter()
    print(f"Parallel training time: {end_parallel - start_parallel:.2f} seconds")

    # Calculate speedup
    speedup = (end_sequential - start_sequential) / (end_parallel - start_parallel)
    print(f"Parallel training is {speedup:.2f} times faster than sequential training")
You may see output that looks like the following:
Sequential training time: 25.66 seconds
Parallel training time: 16.93 seconds
Parallel training is 1.52 times faster than sequential training
The specific speedup factor will depend on the system where the code is run.
The code does the following:
- Sets the 'OMP_NUM_THREADS' environment variable to 1 to avoid contention between processes.
- Defines a generate_data function that creates a synthetic classification dataset using sklearn.datasets.make_classification.
- Defines a list of hyperparameter configurations for the models.
- Defines a train_model function that generates data and trains a single XGBoost model with the given hyperparameters.
- Defines a train_sequential function for sequential model training.
- Defines a train_parallel function for parallel model training using a ProcessPoolExecutor with 4 worker processes (a variation that surfaces worker exceptions is sketched after this list).
- Times the sequential training with n_jobs=4 for each model.
- Times the parallel training with 4 models trained concurrently, each with n_jobs=2.
- Prints the execution times and calculates the speedup achieved through parallelization.
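Note that train_parallel discards the futures it submits; the with block still waits for every task to finish before returning, but an exception raised inside a worker will pass unnoticed. The following small variation (not part of the original listing, and reusing train_model from the example above) keeps the futures and calls result() on each so that worker errors re-raise in the parent process.

# A variation on train_parallel that surfaces exceptions raised in worker processes.
from concurrent.futures import ProcessPoolExecutor

def train_parallel_checked(param_sets):
    # train_model is the function defined in the example above
    with ProcessPoolExecutor(4) as p:
        futures = [p.submit(train_model, ps) for ps in param_sets]
        for future in futures:
            future.result()  # blocks until the task finishes; re-raises worker exceptions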
Vary the number of worker processes used by ProcessPoolExecutor and the number of threads set via n_jobs for the sequential and parallel functions to optimize performance on your system; one simple way to sweep a few combinations is sketched below.
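As a starting point for that tuning, the following sketch (an illustrative addition that reuses train_model and get_params from the example above; the (workers, n_jobs) pairs are arbitrary) benchmarks a few combinations and prints the elapsed time for each.

# An illustrative sweep over worker-count / n_jobs combinations; the pairs tried
# here are arbitrary examples, not recommendations.
import time
from concurrent.futures import ProcessPoolExecutor

def benchmark(n_workers, param_sets):
    start = time.perf_counter()
    with ProcessPoolExecutor(n_workers) as pool:
        futures = [pool.submit(train_model, ps) for ps in param_sets]
        for future in futures:
            future.result()  # wait for completion and surface any worker errors
    return time.perf_counter() - start

if __name__ == '__main__':
    for n_workers, n_jobs in [(2, 2), (4, 1), (4, 2)]:
        elapsed = benchmark(n_workers, get_params(n_jobs))
        print(f"workers={n_workers}, n_jobs={n_jobs}: {elapsed:.2f} seconds")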