
How to Use XGBoost XGBRegressor

The xgboost.XGBRegressor class offers a streamlined approach to training powerful XGBoost models for regression tasks, seamlessly integrating with the scikit-learn library.

This example showcases how to use XGBRegressor to train a model on the California Housing dataset, demonstrating the key steps involved: loading data, splitting into train/test sets, defining model parameters, training the model, and evaluating its performance.

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import xgboost as xgb

# Load the California Housing dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define XGBRegressor model parameters
params = {
    'objective': 'reg:squarederror',
    'max_depth': 3,
    'learning_rate': 0.1,
    'n_estimators': 100,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'random_state': 42
}

# Instantiate XGBRegressor with the parameters
model = xgb.XGBRegressor(**params)

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate model performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")

First, we load the California Housing dataset using sklearn.datasets.fetch_california_housing() and split the data into training and test sets with sklearn.model_selection.train_test_split().
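If you prefer labeled columns, fetch_california_housing() also accepts as_frame=True, which returns the features as a pandas DataFrame so feature names are preserved. A minimal sketch (the values in the comments reflect this dataset):

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# Load the dataset as a pandas DataFrame/Series to keep feature names
data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target

print(X.shape, y.shape)     # (20640, 8) (20640,)
print(list(X.columns)[:3])  # ['MedInc', 'HouseAge', 'AveRooms']

# Same 80/20 split as in the main example
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)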

Next, we define the XGBRegressor model parameters in a dictionary. The 'objective' parameter is set to 'reg:squarederror' for regression tasks. Other parameters, such as 'max_depth', 'learning_rate', and 'n_estimators', control the model’s complexity and training process.
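Because XGBRegressor follows the scikit-learn estimator API, the same hyperparameters can also be passed directly as keyword arguments and inspected or updated afterwards with get_params() and set_params(). A brief sketch:

import xgboost as xgb

# Equivalent construction with keyword arguments
model = xgb.XGBRegressor(
    objective='reg:squarederror',
    max_depth=3,           # maximum depth of each tree
    learning_rate=0.1,     # shrinkage applied to each tree's contribution
    n_estimators=100,      # number of boosting rounds (trees)
    subsample=0.8,         # fraction of rows sampled per tree
    colsample_bytree=0.8,  # fraction of features sampled per tree
    random_state=42
)

print(model.get_params()['max_depth'])  # 3
model.set_params(n_estimators=200)      # adjust without rebuilding the model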

We create an instance of the XGBRegressor with the defined parameters and train the model using the fit() method on the training data. After training, we make predictions on the test set using the predict() method.
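A common refinement is to monitor a held-out validation set during fit() and stop adding trees once the validation error stops improving. The sketch below assumes a recent XGBoost release, where early_stopping_rounds is a constructor argument (older releases accepted it in fit() instead), and carves a validation split out of the training data so the test set stays untouched:

# Hold out part of the training data for early stopping
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

model = xgb.XGBRegressor(
    objective='reg:squarederror',
    n_estimators=1000,         # upper bound; early stopping picks the actual count
    learning_rate=0.1,
    early_stopping_rounds=10,  # stop if no improvement for 10 rounds
    random_state=42
)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)

print(model.best_iteration)   # boosting round with the lowest validation error
y_pred = model.predict(X_test)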

Finally, we evaluate the model’s performance using metrics from sklearn.metrics. We calculate the mean squared error (MSE) and the coefficient of determination (R-squared) and print them to summarize how well the model generalizes to unseen data.
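Since MSE is expressed in squared units of the target (here, median house value in units of $100,000), it is often reported alongside RMSE and MAE, which are on the original scale. A small addition building on the variables above:

from sklearn.metrics import mean_absolute_error

rmse = mse ** 0.5                          # root mean squared error
mae = mean_absolute_error(y_test, y_pred)  # mean absolute error

print(f"Root Mean Squared Error: {rmse:.2f}")
print(f"Mean Absolute Error: {mae:.2f}")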

By following this example, you can efficiently train an XGBoost model for regression tasks using the xgboost.XGBRegressor class, maintaining control over the model’s hyperparameters and easily assessing its performance.
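That scikit-learn compatibility also means XGBRegressor plugs directly into utilities such as cross_val_score. A quick illustration, reusing the params dictionary defined above:

from sklearn.model_selection import cross_val_score

# 5-fold cross-validation; scikit-learn maximizes scores, so MSE is negated
scores = cross_val_score(xgb.XGBRegressor(**params), X, y, cv=5,
                         scoring='neg_mean_squared_error')
print(f"Mean CV MSE: {-scores.mean():.2f}")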

See Also