The xgboost.XGBRFRegressor class extends XGBoost’s capabilities by implementing a random forest regressor, offering an alternative approach to regression tasks while maintaining the performance and efficiency of the XGBoost framework. This example demonstrates how to use XGBRFRegressor to train a model on the California Housing dataset, covering the essential steps: loading data, splitting into train/test sets, defining model parameters, training the model, and evaluating its performance.
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import xgboost as xgb
# Load the California Housing dataset
data = fetch_california_housing()
X, y = data.data, data.target
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define XGBRFRegressor model parameters
params = {
    'objective': 'reg:squarederror',
    'max_depth': 3,
    'learning_rate': 1.0,     # keep at 1.0: random forests use no shrinkage
    'subsample': 0.8,         # row sampling for each tree
    'colsample_bynode': 0.8,  # feature sampling at each split
    'n_estimators': 100,      # number of trees in the forest
    'random_state': 42
}
# Instantiate XGBRFRegressor with the parameters
model = xgb.XGBRFRegressor(**params)
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate model performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")
First, we load the California Housing dataset using sklearn.datasets.fetch_california_housing() and split the data into training and test sets with sklearn.model_selection.train_test_split().
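As a quick sanity check, you can inspect the loaded data before modeling; the dataset contains 20,640 samples and 8 numeric features:

# Optional sanity check on the loaded data
print(X.shape)             # (20640, 8)
print(data.feature_names)  # ['MedInc', 'HouseAge', 'AveRooms', ...]
print(X_train.shape, X_test.shape)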
Next, we define the XGBRFRegressor model parameters in a dictionary. The 'objective' parameter is set to 'reg:squarederror' for regression tasks. 'n_estimators' sets the number of trees in the forest, while 'max_depth', 'subsample', and 'colsample_bynode' control each tree’s complexity and the per-tree randomness that makes the ensemble a forest. Note that 'learning_rate' is kept at its default of 1.0: XGBRFRegressor grows all of its trees in a single boosting round, so the shrinkage used by gradient boosting does not apply, and lowering it would incorrectly scale the predictions.
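If you want to verify the forest-specific defaults on your installed version (they can differ between xgboost releases), you can inspect them directly:

# Inspect XGBRFRegressor defaults; exact values may vary by xgboost version
defaults = xgb.XGBRFRegressor().get_params()
for key in ('learning_rate', 'subsample', 'colsample_bynode', 'n_estimators'):
    print(key, '=', defaults[key])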
We create an instance of the XGBRFRegressor with the defined parameters and train the model using the fit() method on the training data. After training, we make predictions on the test set using the predict() method.
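Because XGBRFRegressor follows the scikit-learn estimator interface, it also plugs directly into utilities such as cross_val_score. A minimal sketch, reusing X and y from above (the hyperparameters here are illustrative):

from sklearn.model_selection import cross_val_score

# 5-fold cross-validation with the built-in 'r2' scorer
cv_model = xgb.XGBRFRegressor(n_estimators=100, max_depth=3, random_state=42)
scores = cross_val_score(cv_model, X, y, cv=5, scoring='r2')
print(f"Cross-validated R-squared: {scores.mean():.2f} (+/- {scores.std():.2f})")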
Finally, we evaluate the model’s performance using metrics from sklearn.metrics. We calculate the mean squared error (MSE) and the coefficient of determination (R-squared) and print them to summarize how well the model fits the test data.
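Because MSE is expressed in squared units, it can also help to report the root mean squared error (RMSE), which is in the same units as the target (median house value in hundreds of thousands of dollars):

import numpy as np

# RMSE is in the target's own units, which makes it easier to interpret
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error: {rmse:.2f}")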
By following this example, you can efficiently train an XGBoost random forest model for regression tasks using the xgboost.XGBRFRegressor class, maintaining control over the model’s hyperparameters and easily assessing its performance.