The best_score attribute in XGBoost provides a convenient way to monitor the model's performance on a validation set during training when using early stopping. By accessing best_score, you can determine the best performance achieved by the model and prevent overfitting.
Early stopping is a regularization technique that halts the training process when the model’s performance on a validation set stops improving for a specified number of consecutive iterations. This helps to avoid overfitting and improves the model’s generalization ability.
Here's an example that demonstrates how to access and utilize the best_score attribute in XGBoost:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
# Load the California Housing dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target
# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
# Set up an XGBoost model with early stopping
model = XGBRegressor(n_estimators=1000, learning_rate=0.1, subsample=0.8, colsample_bytree=0.8,
early_stopping_rounds=10, eval_metric='rmse', random_state=42)
# Train the model with early stopping
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=True)
# Access the best_score attribute
best_score = model.best_score
best_iteration = model.best_iteration
print(f"Best RMSE: {best_score:.4f}")
print(f"Best iteration: {best_iteration}")
In this example, we load the California Housing dataset and split it into training and validation sets. We then set up an XGBRegressor model with early stopping by specifying the early_stopping_rounds parameter. The eval_set parameter is used to provide the validation set for monitoring the model's performance during training.
By setting verbose=True in the call to fit(), we can observe the model's progress and see the validation RMSE at each iteration.
After training, we access the best_score attribute, which represents the best RMSE achieved on the validation set. We also retrieve the best_iteration attribute, which indicates the iteration at which the best score was obtained.
The best_score attribute is particularly useful for understanding the model's performance and determining if early stopping has effectively prevented overfitting. A lower best_score value indicates better performance on the validation set.
It's important to choose an appropriate evaluation metric for best_score based on the problem type. In this example, we use 'rmse' (Root Mean Squared Error) since it is a regression problem. For classification problems, metrics like 'auc' (Area Under the ROC Curve) or 'logloss' (Log Loss) can be used.
Adjusting the early_stopping_rounds parameter allows you to control the trade-off between preventing overfitting and allowing sufficient training. A smaller value will stop training earlier, while a larger value will allow the model to continue training for more iterations before stopping.
By leveraging the best_score attribute in XGBoost, you can monitor the model's performance, prevent overfitting, and select the best model based on its performance on the validation set.