Training multiple XGBoost models on different datasets or with various hyperparameters can be time-consuming when done sequentially.
However, by leveraging the power of Joblib, a Python library for easy parallelization, you can speed up the training process by running multiple models in parallel across the available CPU cores.
This example demonstrates how to use Joblib to train multiple XGBoost models concurrently on a synthetic binary classification dataset. We’ll compare the execution time of parallel training against sequential training to showcase the potential speedup.
from joblib import Parallel, delayed
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
import numpy as np
import time
# Generate synthetic classification dataset
X, y = make_classification(n_samples=1000000, n_classes=2, n_features=20, random_state=42)
# List of hyperparameter configurations
def get_params(n_jobs):
    return [
        {'n_estimators': 100, 'max_depth': 3, 'learning_rate': 0.1, 'n_jobs': n_jobs},
        {'n_estimators': 200, 'max_depth': 4, 'learning_rate': 0.05, 'n_jobs': n_jobs},
        {'n_estimators': 150, 'max_depth': 5, 'learning_rate': 0.08, 'n_jobs': n_jobs},
        {'n_estimators': 180, 'max_depth': 3, 'learning_rate': 0.12, 'n_jobs': n_jobs},
    ]
# Train single XGBoost model
def train_model(params):
    model = XGBClassifier(**params)
    model.fit(X, y)
# Sequential model training
def train_sequential(param_sets):
    for params in param_sets:
        train_model(params)
# Parallel model training using joblib
def train_parallel(param_sets, workers):
    with Parallel(n_jobs=workers, backend='threading') as parallel:
        parallel(delayed(train_model)(params) for params in param_sets)
# Time the sequential training (each model trains with n_jobs=4 XGBoost threads)
start_sequential = time.perf_counter()
train_sequential(get_params(4))
end_sequential = time.perf_counter()
print(f"Sequential training time: {end_sequential - start_sequential:.2f} seconds")
# Time the parallel training (4 workers, each training a single-threaded model)
start_parallel = time.perf_counter()
train_parallel(get_params(1), workers=4)
end_parallel = time.perf_counter()
print(f"Parallel training time: {end_parallel - start_parallel:.2f} seconds")
# Calculate speedup
speedup = (end_sequential - start_sequential) / (end_parallel - start_parallel)
print(f"Parallel training is {speedup:.2f} times faster than sequential training")
Running this code, you might see output similar to:
Sequential training time: 19.94 seconds
Parallel training time: 18.94 seconds
Parallel training is 1.05 times faster than sequential training
The exact speedup will depend on your system’s hardware and the specific dataset and models used. The gain here is modest because the sequential baseline already trains each model with four XGBoost threads, while the parallel run trains four single-threaded models at once, so both configurations keep roughly the same number of cores busy; with a single-threaded sequential baseline or more available cores, the difference can be much larger.
Here’s a breakdown of the code:
- We generate a synthetic binary classification dataset using sklearn’s `make_classification` function.
- We define a list of hyperparameter configurations to train our models with.
- The `train_model` function takes a set of hyperparameters, initializes an `XGBClassifier` with those parameters, and trains the model on our dataset.
- We define two functions: `train_sequential` for sequential model training and `train_parallel` for parallel training using Joblib.
- In `train_parallel`, we use `Parallel` and `delayed` to distribute the model training across the specified number of workers (`n_jobs`). The `backend='threading'` option tells Joblib to use multi-threading for parallelization; this works well here because most of XGBoost’s training runs in native code that releases Python’s GIL, so the threads can train concurrently and share `X` and `y` without copying. A variant that also collects the fitted models is sketched after this list.
- We time the execution of both sequential and parallel training and print the results.
- Finally, we calculate the speedup achieved by parallel training over sequential training.
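Note that `train_model` above discards the fitted model, which is fine for a timing benchmark. If you also need the models afterwards, the worker function can simply return them and `Parallel` will collect the results in order. Here is a minimal sketch (the `train_and_return` name is just illustrative), reusing `X`, `y`, and `get_params` from the example above:

```python
from joblib import Parallel, delayed
from xgboost import XGBClassifier

# Hypothetical variant of train_model that returns the fitted model
def train_and_return(params):
    model = XGBClassifier(**params)
    model.fit(X, y)
    return model

# Parallel returns the results in the same order as the parameter sets
models = Parallel(n_jobs=4, backend='threading')(
    delayed(train_and_return)(params) for params in get_params(1)
)
print(f"Trained {len(models)} models")
```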
Experiment with the number of jobs (`n_jobs`) passed to `Parallel` to find the optimal setting for your system. Set `n_jobs` appropriately for your hardware to prevent resource contention between the parallel workers.
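A simple starting point is to size the worker pool from the machine’s core count, for example with the standard-library call `os.cpu_count()`. This is only a heuristic sketch, reusing `train_model` and `get_params` from the example above:

```python
import os
from joblib import Parallel, delayed

param_sets = get_params(1)
# One worker per parameter set, capped at the number of available cores
workers = min(len(param_sets), os.cpu_count() or 1)

Parallel(n_jobs=workers, backend='threading')(
    delayed(train_model)(params) for params in param_sets
)
```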
By leveraging Joblib, you can easily parallelize your XGBoost model training and potentially achieve significant speedups, especially when working with larger datasets or more complex models.