When it comes to gradient boosting for regression tasks, XGBoost is known for its exceptional speed.

But how does it stack up against scikit-learn’s `HistGradientBoostingRegressor`?

Let’s compare their training times on a synthetic regression dataset and find out.

```
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from xgboost import XGBRegressor
import time
# Generate a synthetic regression dataset
X, y = make_regression(n_samples=1000000, n_features=20, noise=0.1, random_state=42)
# Initialize the regressors with comparable hyperparameters
xgb_reg = XGBRegressor(n_estimators=100, tree_method='hist', learning_rate=0.1, max_depth=3, random_state=42)
hgb_reg = HistGradientBoostingRegressor(max_iter=100, learning_rate=0.1, max_depth=3, random_state=42)
# Fit XGBRegressor and measure the training time
start_time = time.perf_counter()
xgb_reg.fit(X, y)
xgb_time = time.perf_counter() - start_time
print(f"XGBoostRegressor training time: {xgb_time:.2f} seconds")
# Fit HistGradientBoostingRegressor and measure the training time
start_time = time.perf_counter()
hgb_reg.fit(X, y)
hgb_time = time.perf_counter() - start_time
print(f"HistGradientBoostingRegressor training time: {hgb_time:.2f} seconds")
```

We begin by generating a synthetic regression dataset with 1,000,000 samples and 20 features using scikit-learn’s `make_regression`. We add a small amount of Gaussian noise to the targets (`noise=0.1`, the standard deviation) so that the relationship is not perfectly clean.

Next, we initialize `XGBRegressor` and `HistGradientBoostingRegressor` with similar hyperparameters:

- `n_estimators=100` (or `max_iter=100`): The number of boosting rounds.
- `learning_rate=0.1`: The learning rate for each boosting round.
- `max_depth=3`: The maximum depth of each decision tree.
- `random_state=42`: For reproducibility.

`XGBRegressor` is also given `tree_method='hist'` so that both models use histogram-based split finding.
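The parameter mapping above can be kept in one place so the two configurations cannot drift apart. The sketch below is only an illustration: `SHARED`, `xgb_params`, and `hgb_params` are hypothetical names, not part of either library. It also passes `early_stopping=False` to `HistGradientBoostingRegressor`, which the benchmark in this article does not do. scikit-learn’s default of `'auto'` enables early stopping when the dataset has more than 10,000 samples, so this is an optional assumption for pinning both models to exactly 100 rounds.

```python
# Hypothetical helper names; only the keyword names come from the two libraries.
SHARED = {"rounds": 100, "learning_rate": 0.1, "max_depth": 3, "random_state": 42}


def xgb_params(shared):
    """Build keyword arguments intended for xgboost.XGBRegressor."""
    return {
        "n_estimators": shared["rounds"],  # XGBoost's name for boosting rounds
        "tree_method": "hist",             # histogram-based split finding
        "learning_rate": shared["learning_rate"],
        "max_depth": shared["max_depth"],
        "random_state": shared["random_state"],
    }


def hgb_params(shared, early_stopping=False):
    """Build keyword arguments intended for HistGradientBoostingRegressor."""
    return {
        "max_iter": shared["rounds"],      # scikit-learn's name for boosting rounds
        "learning_rate": shared["learning_rate"],
        "max_depth": shared["max_depth"],
        "random_state": shared["random_state"],
        # Assumption, not in the original benchmark: disable early stopping
        # so that all `max_iter` rounds are always trained.
        "early_stopping": early_stopping,
    }


print(xgb_params(SHARED))
print(hgb_params(SHARED))
```

With both libraries installed, the dictionaries can be unpacked directly, for example `XGBRegressor(**xgb_params(SHARED))` and `HistGradientBoostingRegressor(**hgb_params(SHARED))`.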

We fit each regressor on the dataset and measure the training time using `time.perf_counter()` from the `time` module. The `start_time` is recorded before fitting, and the elapsed time is calculated once fitting completes.

Finally, we print the training times for both regressors.

Here’s an example of the output you might see:

```
XGBoostRegressor training time: 2.64 seconds
HistGradientBoostingRegressor training time: 2.94 seconds
```
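The start/stop timing pattern used above can be wrapped in a small helper so the same measurement code serves any model. This is a minimal sketch rather than part of either library: `time_fit` and `dummy_fit` are hypothetical names, and the dummy workload stands in for a real `fit` call so the snippet runs without scikit-learn or XGBoost installed.

```python
import time


def time_fit(fit, repeats=1):
    """Call fit() `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fit()
        durations.append(time.perf_counter() - start)
    return durations


def dummy_fit():
    """Stand-in workload; replace with a real model's fit call."""
    sum(i * i for i in range(100_000))


times = time_fit(dummy_fit, repeats=3)
print([f"{t:.4f}" for t in times])
```

With the regressors defined earlier, a call such as `time_fit(lambda: xgb_reg.fit(X, y), repeats=10)` would return ten durations for one model.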

Below is an updated comparison that repeats each experiment 10 times and plots the distribution of training times.

```
import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from xgboost import XGBRegressor

# Generate a synthetic regression dataset (100 features here, versus 20 above)
X, y = make_regression(n_samples=1000000, n_features=100, noise=0.1, random_state=42)

# Initialize the regressors with comparable hyperparameters
xgb_reg = XGBRegressor(n_estimators=100, tree_method='hist', learning_rate=0.1, max_depth=3, random_state=42)
hgb_reg = HistGradientBoostingRegressor(max_iter=100, learning_rate=0.1, max_depth=3, random_state=42)

# Lists to store training times
xgb_times = []
gb_times = []

# Run the benchmark 10 times
for i in range(10):
    # Measure training time for XGBRegressor
    start_time = time.perf_counter()
    xgb_reg.fit(X, y)
    xgb_duration = time.perf_counter() - start_time
    xgb_times.append(xgb_duration)

    # Measure training time for HistGradientBoostingRegressor
    start_time = time.perf_counter()
    hgb_reg.fit(X, y)
    gb_duration = time.perf_counter() - start_time
    gb_times.append(gb_duration)

    # Report progress
    print(f'> {i} xgb: {xgb_duration:.3f}, gb: {gb_duration:.3f}')

# Calculate mean and standard deviation of training times
xgb_mean = np.mean(xgb_times)
xgb_std = np.std(xgb_times)
gb_mean = np.mean(gb_times)
gb_std = np.std(gb_times)

# Print mean and standard deviation of training times
print(f"XGBRegressor mean training time: {xgb_mean:.2f} seconds (std: {xgb_std:.2f})")
print(f"HistGradientBoostingRegressor mean training time: {gb_mean:.2f} seconds (std: {gb_std:.2f})")

# Plot the distributions as side-by-side boxplots using matplotlib
plt.figure(figsize=(10, 6))
plt.boxplot([xgb_times, gb_times])
# Boxes are drawn at x positions 1 and 2 by default; label them explicitly
plt.xticks([1, 2], ['XGBoost', 'HistGradientBoosting'])
plt.ylabel('Training Time (seconds)')
plt.title('Training Time Comparison')
plt.show()
```

The results may look something like the following:

```
> 0 xgb: 2.442, gb: 2.883
> 1 xgb: 2.698, gb: 2.859
> 2 xgb: 2.698, gb: 2.885
> 3 xgb: 2.656, gb: 2.900
> 4 xgb: 2.812, gb: 2.875
> 5 xgb: 2.916, gb: 3.872
> 6 xgb: 2.818, gb: 3.022
> 7 xgb: 2.777, gb: 2.941
> 8 xgb: 3.342, gb: 2.889
> 9 xgb: 2.778, gb: 2.943
XGBRegressor mean training time: 2.79 seconds (std: 0.22)
HistGradientBoostingRegressor mean training time: 3.01 seconds (std: 0.29)
```

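The exact times will depend on your hardware and setup. In this run, `XGBRegressor` trained faster than `HistGradientBoostingRegressor` on average (2.79 vs. 3.01 seconds, roughly 7% less time) and was faster in 9 of the 10 repetitions. The gap is modest, though: it is about the same size as the run-to-run standard deviation, so it is best read as a small edge rather than a decisive win.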
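To put a number on the gap, the per-run times printed above can be summarised directly. The sketch below copies those ten pairs of timings by hand and uses only the standard library. The variable names are illustrative, and the figures describe this one run on one machine, not a general result.

```python
from statistics import mean, median

# Per-run training times (seconds) copied from the output above
xgb = [2.442, 2.698, 2.698, 2.656, 2.812, 2.916, 2.818, 2.777, 3.342, 2.778]
hgb = [2.883, 2.859, 2.885, 2.900, 2.875, 3.872, 3.022, 2.941, 2.889, 2.943]

# How many times longer HistGradientBoostingRegressor took on average
ratio = mean(hgb) / mean(xgb)

# Number of repetitions in which XGBoost finished first
xgb_wins = sum(x < h for x, h in zip(xgb, hgb))

print(f"mean:   xgb {mean(xgb):.2f}s, hgb {mean(hgb):.2f}s")
print(f"median: xgb {median(xgb):.2f}s, hgb {median(hgb):.2f}s")
print(f"hgb/xgb mean ratio: {ratio:.2f}")
print(f"XGBoost faster in {xgb_wins} of {len(xgb)} runs")
```

For these numbers the mean ratio comes out at about 1.08, and XGBoost is the faster model in 9 of the 10 runs. The median is included because single slow runs (such as run 5 for `HistGradientBoostingRegressor` and run 8 for XGBoost) pull the mean upward.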

Training speed is one of the main reasons why XGBoost is so popular among data scientists and machine learning practitioners. Faster training lets you iterate more quickly, experiment with more features and hyperparameters, and ultimately build better models in less time. So if you’re looking to speed up your gradient boosting for regression tasks, XGBoost is worth considering, and rerunning a benchmark like this one on your own data and hardware will show how large the difference is in your case.