
XGBRFRegressor Faster Than RandomForestRegressor

When it comes to regression tasks, both XGBoost and scikit-learn offer random forest implementations.

However, XGBoost’s XGBRFRegressor typically trains substantially faster than scikit-learn’s RandomForestRegressor on the same data.

Let’s put this to the test by comparing their training times on a synthetic dataset.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRFRegressor
import time

# Generate a synthetic regression dataset
X, y = make_regression(n_samples=10000, n_features=100, noise=0.1, random_state=42)

# Initialize the regressors with comparable hyperparameters
xgbrf_reg = XGBRFRegressor(n_estimators=100, max_depth=10, random_state=42)
rf_reg = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)

# Fit XGBRFRegressor and measure the training time
start_time = time.perf_counter()
xgbrf_reg.fit(X, y)
xgbrf_time = time.perf_counter() - start_time
print(f"XGBRFRegressor training time: {xgbrf_time:.2f} seconds")

# Fit RandomForestRegressor and measure the training time
start_time = time.perf_counter()
rf_reg.fit(X, y)
rf_time = time.perf_counter() - start_time
print(f"RandomForestRegressor training time: {rf_time:.2f} seconds")

We begin by generating a synthetic regression dataset using scikit-learn’s make_regression function. We create a dataset with 10,000 samples and 100 features, and add some noise to make it more realistic.

Next, we initialize XGBRFRegressor and RandomForestRegressor with the same core hyperparameters: 100 trees, a maximum depth of 10, and a fixed random seed.
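One caveat worth noting: with the hyperparameters above, the two libraries do not parallelize equally out of the box. RandomForestRegressor defaults to single-threaded training (n_jobs=None), while XGBoost uses all available CPU cores by default. For a fairer like-for-like comparison, you can enable all cores in scikit-learn as well. A minimal sketch (using a smaller dataset here so it runs quickly):

```python
import time

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Smaller dataset than the main benchmark, to keep this sketch fast
X, y = make_regression(n_samples=2000, n_features=50, noise=0.1, random_state=42)

# n_jobs=-1 tells scikit-learn to train trees on all available CPU cores
rf_all_cores = RandomForestRegressor(
    n_estimators=100, max_depth=10, n_jobs=-1, random_state=42
)

start = time.perf_counter()
rf_all_cores.fit(X, y)
elapsed = time.perf_counter() - start
print(f"RandomForestRegressor (n_jobs=-1) training time: {elapsed:.2f} seconds")
```

Even with all cores enabled, XGBRFRegressor often retains an edge thanks to its optimized histogram-based tree construction, but the gap narrows.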

We then fit each regressor on the dataset and measure the training time with time.perf_counter, which is well suited to timing short code sections. The start_time is recorded before fitting the model, and the elapsed time is calculated after the fit completes.

Finally, we print the training times for both regressors.

On executing this code, you might see output resembling:

XGBRFRegressor training time: 14.45 seconds
RandomForestRegressor training time: 56.66 seconds

The exact times may differ based on your hardware and system setup, but you should notice that XGBRFRegressor trains considerably faster than RandomForestRegressor.

This speed advantage allows you to train models faster, iterate more quickly, and experiment with a wider range of hyperparameters and features. By leveraging XGBoost’s performance, you can streamline your machine learning workflow and develop high-quality models in less time, making it a top choice for data scientists and machine learning practitioners tackling regression problems.


