Random forest is an ensemble learning method that constructs multiple decision trees and combines their predictions to improve regression performance.
XGBoost’s XGBRFRegressor class implements the random forest algorithm for regression tasks, leveraging the power and efficiency of the XGBoost library.
This example demonstrates how to fit a random forest regressor using XGBRFRegressor on a synthetic regression dataset. We’ll generate the dataset, split it into train and test sets, define the model parameters, train the regressor, and evaluate its performance.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
import xgboost as xgb
# Generate a synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split data into train and test sets
train_size = int(0.8 * len(X))
X_train, y_train = X[:train_size], y[:train_size]
X_test, y_test = X[train_size:], y[train_size:]
# Define XGBRFRegressor parameters
params = {
    'n_estimators': 100,
    'subsample': 0.8,
    'colsample_bynode': 0.8,
    'max_depth': 3,
    'random_state': 42
}
# Instantiate XGBRFRegressor with the defined parameters
model = xgb.XGBRFRegressor(**params)
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate model performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")
In this example, we start by generating a synthetic regression dataset using sklearn.datasets.make_regression(). We then split the data into training and test sets.
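If you prefer a shuffled split to the simple ordered slice used above, scikit-learn’s train_test_split does the same job in one call; a minimal sketch with the same 80/20 proportions and seed:

from sklearn.model_selection import train_test_split

# Shuffled 80/20 split; equivalent in spirit to the slicing above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)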
Next, we define the XGBRFRegressor parameters in a dictionary. The 'n_estimators' parameter sets the number of trees in the forest, while 'subsample' and 'colsample_bynode' introduce randomness by sampling training rows for each tree and features at each tree node, respectively. The 'max_depth' parameter limits the depth of each tree.
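To get a feel for how these knobs affect accuracy, you can sweep one while holding the others fixed. A quick sketch that reuses the train/test split from the example above (the candidate values are arbitrary, chosen only for illustration):

# Sweep the forest size and report test R-squared for each setting
for n in [10, 50, 100, 200]:
    rf = xgb.XGBRFRegressor(
        n_estimators=n,
        subsample=0.8,
        colsample_bynode=0.8,
        max_depth=3,
        random_state=42,
    )
    rf.fit(X_train, y_train)
    print(f"n_estimators={n}: R^2 = {rf.score(X_test, y_test):.3f}")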
We create an instance of the XGBRFRegressor with the defined parameters and train the model using the fit() method on the training data. After training, we make predictions on the test set using the predict() method.
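If you want to reuse the trained forest later without refitting, XGBoost’s scikit-learn wrapper provides save_model() and load_model(); a brief sketch (the file name here is arbitrary):

# Persist the trained forest to disk and reload it
model.save_model('xgbrf_model.json')

loaded = xgb.XGBRFRegressor()
loaded.load_model('xgbrf_model.json')

# The reloaded model reproduces the original predictions
assert np.allclose(loaded.predict(X_test), y_pred)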
Finally, we evaluate the model’s performance using the mean squared error and R-squared metrics from sklearn.metrics. These metrics quantify how well the model predicts the continuous target variable.
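A single train/test split can give a noisy estimate; for a steadier one you can cross-validate the same estimator, since XGBRFRegressor is scikit-learn compatible. A short sketch reusing the params dictionary from above:

from sklearn.model_selection import cross_val_score

# 5-fold cross-validated R-squared on the full dataset
scores = cross_val_score(xgb.XGBRFRegressor(**params), X, y, cv=5, scoring='r2')
print(f"Cross-validated R-squared: {scores.mean():.2f} (+/- {scores.std():.2f})")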
By following this example, you can quickly fit an XGBoost random forest regressor using the XGBRFRegressor class, while controlling the model’s hyperparameters and evaluating its performance on a regression task.