
XGBoost Train Multiple Models in Parallel (multiprocessing)

Training multiple XGBoost models sequentially, each with its own set of hyperparameters, can be a time-consuming process. However, by leveraging Python’s multiprocessing module and a process pool, you can train these models in parallel, potentially reducing the overall execution time.

For optimal performance, it’s important that each process generates its own training data, avoiding the overhead of inter-process communication (IPC) that comes with passing large datasets between processes. Additionally, setting n_jobs=1 (or another small number) for each model limits contention between the processes for CPU cores.
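
As a quick illustration of the IPC point, consider the following minimal sketch (the names train_with_ipc and train_without_ipc are illustrative, not part of the example below). Arguments passed to a process pool are pickled and copied into the worker process, so passing only a small seed and generating the data inside the worker avoids moving large arrays between processes.

from concurrent.futures import ProcessPoolExecutor
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Anti-pattern: the large arrays are pickled and copied into the worker
def train_with_ipc(X, y):
    XGBClassifier(n_jobs=1).fit(X, y)

# Preferred: only a small seed crosses the process boundary; the worker
# generates (or loads) its own copy of the data
def train_without_ipc(seed):
    X, y = make_classification(n_samples=100000, n_features=20, random_state=seed)
    XGBClassifier(n_jobs=1).fit(X, y)

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as executor:
        # list() forces evaluation and re-raises any worker exception
        list(executor.map(train_without_ipc, [0, 1]))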

This example demonstrates how to train multiple XGBoost models in parallel using a process pool and compares the execution time against sequential training where each model uses n_jobs=4.

import os
# Avoid contention between processes
os.environ['OMP_NUM_THREADS'] = '1'
import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier
from concurrent.futures import ProcessPoolExecutor
import time

# Generate synthetic classification dataset
def generate_data():
    X, y = make_classification(n_samples=1000000, n_features=20, random_state=42)
    return X, y

# List of hyperparameter configurations
def get_params(n_jobs):
    return [
        {'n_estimators': 100, 'max_depth': 3, 'learning_rate': 0.1, 'n_jobs': n_jobs},
        {'n_estimators': 200, 'max_depth': 4, 'learning_rate': 0.05, 'n_jobs': n_jobs},
        {'n_estimators': 150, 'max_depth': 5, 'learning_rate': 0.08, 'n_jobs': n_jobs},
        {'n_estimators': 180, 'max_depth': 3, 'learning_rate': 0.12, 'n_jobs': n_jobs},
    ]

# Train single XGBoost model
def train_model(params):
    X, y = generate_data()
    model = XGBClassifier(**params)
    model.fit(X, y)

# Sequential model training
def train_sequential(param_sets):
    for params in param_sets:
        train_model(params)

# Parallel model training using multiprocessing
def train_parallel(param_sets):
    with ProcessPoolExecutor(4) as p:
        futures = [p.submit(train_model, ps) for ps in param_sets]
        # Wait for completion and surface any worker exceptions
        for future in futures:
            future.result()

if __name__ == '__main__':
    # Time the sequential training
    start_sequential = time.perf_counter()
    train_sequential(get_params(4))
    end_sequential = time.perf_counter()
    print(f"Sequential training time: {end_sequential - start_sequential:.2f} seconds")

    # Time the parallel training
    start_parallel = time.perf_counter()
    train_parallel(get_params(2))
    end_parallel = time.perf_counter()
    print(f"Parallel training time: {end_parallel - start_parallel:.2f} seconds")

    # Calculate speedup
    speedup = (end_sequential - start_sequential) / (end_parallel - start_parallel)
    print(f"Parallel training is {speedup:.2f} times faster than sequential training")

You may see output that looks like the following:

Sequential training time: 25.66 seconds
Parallel training time: 16.93 seconds
Parallel training is 1.52 times faster than sequential training

The specific speedup factor will depend on the system where the code is run.

The code does the following:

  1. Sets the 'OMP_NUM_THREADS' environment variable to 1 (before the other imports) to avoid contention between processes.
  2. Defines a generate_data function that creates a synthetic classification dataset using sklearn.datasets.make_classification.
  3. Defines a get_params function that returns the list of hyperparameter configurations, applying the given n_jobs value to each model.
  4. Defines a train_model function that generates data and trains a single XGBoost model with the given hyperparameters.
  5. Defines a train_sequential function for sequential model training.
  6. Defines a train_parallel function that trains the models in parallel using a ProcessPoolExecutor with 4 worker processes, waiting on each future so that worker exceptions are surfaced.
  7. Times the sequential training with n_jobs=4 for each model.
  8. Times the parallel training with the 4 models training concurrently, each with n_jobs=2.
  9. Prints the execution times and calculates the speedup achieved through parallelization.
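
The example uses concurrent.futures.ProcessPoolExecutor, which builds its process pool on top of the multiprocessing module. If you prefer to use multiprocessing directly, a roughly equivalent version of train_parallel might look like the following sketch (train_parallel_pool is an illustrative name; it assumes the train_model function defined above):

from multiprocessing import Pool

# Parallel model training using multiprocessing.Pool directly
def train_parallel_pool(param_sets):
    with Pool(processes=4) as pool:
        # map() blocks until all tasks finish and re-raises any worker error
        pool.map(train_model, param_sets)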

Vary the number of worker processes used by the ProcessPoolExecutor and the number of threads set via n_jobs for the sequential and parallel functions to optimize performance on your system.
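
For example, a small sweep over both knobs might look like the following sketch (the combinations shown are arbitrary; it assumes the train_model and get_params functions from the example above):

import time
from concurrent.futures import ProcessPoolExecutor

def sweep_configurations():
    for workers in (2, 4):       # size of the process pool
        for n_jobs in (1, 2):    # threads used by each model
            start = time.perf_counter()
            with ProcessPoolExecutor(workers) as executor:
                futures = [executor.submit(train_model, ps) for ps in get_params(n_jobs)]
                for future in futures:
                    future.result()  # wait and surface any worker error
            print(f"workers={workers}, n_jobs={n_jobs}: {time.perf_counter() - start:.2f} seconds")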


