XGBRegressor Faster Than GradientBoostingRegressor

When it comes to regression tasks, XGBoost is renowned for its exceptional performance and speed.

But just how much faster is XGBRegressor compared to scikit-learn’s GradientBoostingRegressor?

To find out, let’s compare the training times of these two regressors on a synthetic dataset.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from xgboost import XGBRegressor
import time

# Generate a synthetic regression dataset
X, y = make_regression(n_samples=10000, n_features=100, noise=0.1, random_state=42)

# Initialize the regressors with comparable hyperparameters
xgb_reg = XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gb_reg = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)

# Fit XGBRegressor and measure the training time
start_time = time.perf_counter()
xgb_reg.fit(X, y)
xgb_time = time.perf_counter() - start_time
print(f"XGBoostRegressor training time: {xgb_time:.2f} seconds")

# Fit GradientBoostingRegressor and measure the training time
start_time = time.perf_counter()
gb_reg.fit(X, y)
gb_time = time.perf_counter() - start_time
print(f"GradientBoostingRegressor training time: {gb_time:.2f} seconds")

We begin by generating a synthetic regression dataset using scikit-learn’s make_regression function with 10,000 samples and 100 features. The noise parameter is set to 0.1, which is the standard deviation of the Gaussian noise added to the target values.

Next, we initialize XGBRegressor and GradientBoostingRegressor with the same values for the hyperparameters they share:

n_estimators=100: The number of boosting rounds or weak learners.
learning_rate=0.1: The learning rate or step size for each boosting round.
max_depth=3: The maximum depth of each decision tree.
random_state=42: For reproducibility.

We then fit each regressor on the dataset and measure the training time with time.perf_counter(). The start time is recorded just before fitting the model, and the elapsed time is computed as soon as the fit completes.
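The timing pattern described above can be wrapped in a small helper so that the same measurement logic applies to both models. The following is a minimal sketch: the name time_call is a hypothetical helper, not part of scikit-learn or XGBoost, and the stand-in workload only illustrates how it is called.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func(*args, **kwargs) once and return (elapsed_seconds, result).

    Hypothetical helper for illustration; it mirrors the
    perf_counter() pattern used in the benchmark script.
    """
    start_time = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start_time
    return elapsed, result


# Stand-in workload so the sketch runs without any ML libraries installed.
elapsed, total = time_call(sum, range(1_000_000))
print(f"sum() took {elapsed:.4f} seconds")
```

With the regressors from the script, the call would look like xgb_time, _ = time_call(xgb_reg.fit, X, y).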

Finally, we print the training times for both regressors.

Running the benchmark script may yield output similar to:

XGBoostRegressor training time: 0.29 seconds
GradientBoostingRegressor training time: 31.41 seconds
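
A ratio is often easier to interpret than two raw timings. As a small sketch, the snippet below computes the speedup from the sample output above; the two numbers are copied from that output and will differ on your machine.

```python
# Sample timings copied from the output above (seconds); replace them with
# the xgb_time and gb_time values measured on your own machine.
xgb_time = 0.29
gb_time = 31.41

# Speedup = how many times longer GradientBoostingRegressor took to train.
speedup = gb_time / xgb_time
print(f"XGBRegressor trained about {speedup:.0f}x faster in this run")
```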

Below is an updated comparison that repeats each experiment 10 times and plots the distributions.

import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from xgboost import XGBRegressor

# Generate a synthetic regression dataset
X, y = make_regression(n_samples=10000, n_features=100, noise=0.1, random_state=42)

# Initialize the regressors with comparable hyperparameters
xgb_reg = XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gb_reg = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)

# Lists to store training times
xgb_times = []
gb_times = []

# Run the benchmark 10 times
for i in range(10):
    # Measure training time for XGBRegressor
    start_time = time.perf_counter()
    xgb_reg.fit(X, y)
    xgb_duration = time.perf_counter() - start_time
    xgb_times.append(xgb_duration)

    # Measure training time for GradientBoostingRegressor
    start_time = time.perf_counter()
    gb_reg.fit(X, y)
    gb_duration = time.perf_counter() - start_time
    gb_times.append(gb_duration)

    # Report progress
    print(f'> {i} xgb: {xgb_duration:.3f}, gb: {gb_duration:.3f}')

# Calculate mean and standard deviation of training times
xgb_mean = np.mean(xgb_times)
xgb_std = np.std(xgb_times)
gb_mean = np.mean(gb_times)
gb_std = np.std(gb_times)

# Print mean and standard deviation of training times
print(f"XGBRegressor mean training time: {xgb_mean:.2f} seconds (std: {xgb_std:.2f})")
print(f"GradientBoostingRegressor mean training time: {gb_mean:.2f} seconds (std: {gb_std:.2f})")

# Plot the distributions as side-by-side boxplots using matplotlib
plt.figure(figsize=(10, 6))
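# Set the tick labels separately; the labels= argument of boxplot is deprecated in recent matplotlib releases
plt.boxplot([xgb_times, gb_times])
plt.xticks([1, 2], ['XGBoost', 'GradientBoosting'])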
plt.ylabel('Training Time (seconds)')
plt.title('Training Time Comparison')
plt.show()

The results may look something like the following:

> 0 xgb: 0.230, gb: 29.681
> 1 xgb: 0.235, gb: 30.014
> 2 xgb: 0.299, gb: 30.599
> 3 xgb: 0.237, gb: 29.754
> 4 xgb: 0.227, gb: 29.639
> 5 xgb: 0.278, gb: 29.329
> 6 xgb: 0.217, gb: 29.743
> 7 xgb: 0.228, gb: 29.593
> 8 xgb: 0.252, gb: 31.222
> 9 xgb: 0.267, gb: 30.684
XGBRegressor mean training time: 0.25 seconds (std: 0.03)
GradientBoostingRegressor mean training time: 30.03 seconds (std: 0.57)
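
With only ten runs, a single slow iteration can skew the mean, so the median is a useful cross-check. The sketch below recomputes the summary from the ten sample timings listed above using Python's standard statistics module. The numbers are copied from that output, and pstdev is used because np.std computes the population standard deviation by default.

```python
import statistics

# Sample timings copied from the ten runs listed above (seconds).
xgb_times = [0.230, 0.235, 0.299, 0.237, 0.227, 0.278, 0.217, 0.228, 0.252, 0.267]
gb_times = [29.681, 30.014, 30.599, 29.754, 29.639, 29.329, 29.743, 29.593, 31.222, 30.684]

for name, times in [("XGBRegressor", xgb_times), ("GradientBoostingRegressor", gb_times)]:
    mean = statistics.mean(times)
    median = statistics.median(times)
    std = statistics.pstdev(times)  # population std, matching np.std's default
    print(f"{name}: mean={mean:.3f}s median={median:.3f}s std={std:.3f}s")

# Ratio of typical training times, based on medians rather than means.
median_speedup = statistics.median(gb_times) / statistics.median(xgb_times)
print(f"Median speedup: {median_speedup:.0f}x")
```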

The exact times may differ based on your hardware and system setup, but you should notice that XGBRegressor trains much faster than GradientBoostingRegressor. In the sample run above, the mean training times are 0.25 seconds versus 30.03 seconds, a difference of roughly 120x.

This speed advantage is one of the key reasons why XGBoost is favored by data scientists and machine learning practitioners for handling large-scale datasets and complex regression tasks. Part of the gap comes from parallelism: XGBoost trains with multiple threads by default, whereas GradientBoostingRegressor builds its trees on a single thread. By leveraging XGBoost’s efficiency, you can experiment with more features, fine-tune hyperparameters, and develop high-performing models in a fraction of the time that scikit-learn’s GradientBoostingRegressor would need.
