XGBoost is known for its speed and performance, but just how much faster is it than other gradient boosting implementations? Let's find out by comparing the training time of XGBoost's XGBClassifier against scikit-learn's GradientBoostingClassifier on a synthetic dataset.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier
# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=10000, n_classes=2, random_state=42)
# Initialize the classifiers with comparable hyperparameters
xgb_clf = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gb_clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
# Fit XGBClassifier and measure the training time
start_time = time.perf_counter()
xgb_clf.fit(X, y)
xgb_time = time.perf_counter() - start_time
print(f"XGBClassifier training time: {xgb_time:.2f} seconds")
# Fit GradientBoostingClassifier and measure the training time
start_time = time.perf_counter()
gb_clf.fit(X, y)
gb_time = time.perf_counter() - start_time
print(f"GradientBoostingClassifier training time: {gb_time:.2f} seconds")
In this example, we first generate a synthetic binary classification dataset using scikit-learn's make_classification function with 10,000 samples. Next, we initialize XGBClassifier and GradientBoostingClassifier with comparable hyperparameters:
- n_estimators=100: the number of boosting rounds (weak learners).
- learning_rate=0.1: the step size applied at each boosting round.
- max_depth=3: the maximum depth of each decision tree.
- random_state=42: for reproducibility.
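Both estimators follow the scikit-learn interface, so a quick way to confirm they really do share these settings is the standard get_params() method. The check below is a small sanity-check sketch, not part of the benchmark:
# Sanity check: confirm the shared hyperparameters match on both models
for key in ("n_estimators", "learning_rate", "max_depth", "random_state"):
    print(key, xgb_clf.get_params()[key], gb_clf.get_params()[key])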
We then fit each classifier on the dataset and measure the training time with time.perf_counter(). The start time is recorded before fitting the model, and the elapsed time is calculated after the fit completes. Finally, we print the training times for both classifiers.
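If you find yourself timing several estimators this way, the pattern is easy to wrap in a small helper. Below is a minimal sketch; the time_fit name is our own, not part of either library:
import time

def time_fit(estimator, X, y):
    """Fit an estimator and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    estimator.fit(X, y)
    return time.perf_counter() - start

# Usage: xgb_time = time_fit(xgb_clf, X, y)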
On running this code, you might see output similar to:
XGBClassifier training time: 0.06 seconds
GradientBoostingClassifier training time: 6.31 seconds
Below is an updated comparison that repeats each experiment ten times and plots the distributions of training times.
import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier
# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=10000, n_classes=2, random_state=42)
# Initialize the classifiers with comparable hyperparameters
xgb_clf = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gb_clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
# Lists to store training times
xgb_times = []
gb_times = []
# Run the benchmark 10 times
for i in range(10):
    # Measure training time for XGBClassifier
    start_time = time.perf_counter()
    xgb_clf.fit(X, y)
    xgb_duration = time.perf_counter() - start_time
    xgb_times.append(xgb_duration)
    # Measure training time for GradientBoostingClassifier
    start_time = time.perf_counter()
    gb_clf.fit(X, y)
    gb_duration = time.perf_counter() - start_time
    gb_times.append(gb_duration)
    # Report progress
    print(f'> {i} xgb: {xgb_duration:.3f}, gb: {gb_duration:.3f}')
# Calculate mean and standard deviation of training times
xgb_mean = np.mean(xgb_times)
xgb_std = np.std(xgb_times)
gb_mean = np.mean(gb_times)
gb_std = np.std(gb_times)
# Print mean and standard deviation of training times
print(f"XGBoostClassifier mean training time: {xgb_mean:.2f} seconds (std: {xgb_std:.2f})")
print(f"GradientBoostingClassifier mean training time: {gb_mean:.2f} seconds (std: {gb_std:.2f})")
# Plot the distributions as side-by-side boxplots using matplotlib
plt.figure(figsize=(10, 6))
plt.boxplot([xgb_times, gb_times], labels=['XGBoost', 'GradientBoosting'])
plt.ylabel('Training Time (seconds)')
plt.title('Training Time Comparison')
plt.show()
The results may look something like the following:
> 0 xgb: 0.083, gb: 6.010
> 1 xgb: 0.061, gb: 6.038
> 2 xgb: 0.060, gb: 5.992
> 3 xgb: 0.058, gb: 6.092
> 4 xgb: 0.071, gb: 6.141
> 5 xgb: 0.061, gb: 6.135
> 6 xgb: 0.061, gb: 6.059
> 7 xgb: 0.062, gb: 6.053
> 8 xgb: 0.059, gb: 6.028
> 9 xgb: 0.057, gb: 6.054
XGBClassifier mean training time: 0.06 seconds (std: 0.01)
GradientBoostingClassifier mean training time: 6.06 seconds (std: 0.05)
The actual times will vary with your hardware and system configuration, but you should observe that XGBClassifier trains significantly faster than GradientBoostingClassifier, often by a factor of 10x to 100x or more.
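To put a number on that gap, you can compute the speedup directly from the means collected in the benchmark above (this assumes xgb_mean and gb_mean are still in scope):
# Speedup factor: how many times faster XGBoost trained on average
speedup = gb_mean / xgb_mean
print(f"XGBoost trained roughly {speedup:.0f}x faster on average")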
This speed advantage, combined with XGBoost's strong predictive performance and flexibility, makes it a popular choice among data scientists and machine learning practitioners, from small tabular datasets to large-scale problems. By leveraging XGBoost's efficiency, you can iterate faster, experiment with more features and hyperparameters, and ultimately build better models in less time.