The best_score attribute in XGBoost provides a convenient way to monitor the model's performance on a validation set during training when using early stopping. By accessing best_score, you can determine the best performance achieved by the model and prevent overfitting.
Early stopping is a regularization technique that halts the training process when the model’s performance on a validation set stops improving for a specified number of consecutive iterations. This helps to avoid overfitting and improves the model’s generalization ability.
Here's an example that demonstrates how to access and utilize the best_score attribute in XGBoost:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
# Load the California Housing dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target
# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
# Set up an XGBoost model with early stopping
model = XGBRegressor(n_estimators=1000, learning_rate=0.1, subsample=0.8, colsample_bytree=0.8,
early_stopping_rounds=10, eval_metric='rmse', random_state=42)
# Train the model with early stopping
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=True)
# Access the best_score attribute
best_score = model.best_score
best_iteration = model.best_iteration
print(f"Best RMSE: {best_score:.4f}")
print(f"Best iteration: {best_iteration}")
In this example, we load the California Housing dataset and split it into training and validation sets. We then set up an XGBRegressor model with early stopping by specifying the early_stopping_rounds parameter. The eval_set parameter is used to provide the validation set for monitoring the model's performance during training.
By setting verbose=True in the call to fit(), we can observe the model's progress and see the validation RMSE at each iteration.
After training, we access the best_score attribute, which represents the best RMSE achieved on the validation set. We also retrieve the best_iteration attribute, which indicates the iteration at which the best score was obtained.
The best_score attribute is particularly useful for understanding the model's performance and determining if early stopping has effectively prevented overfitting. A lower best_score value indicates better performance on the validation set.
It's important to choose an appropriate evaluation metric for best_score based on the problem type. In this example, we use 'rmse' (Root Mean Squared Error) since it is a regression problem. For classification problems, metrics like 'auc' (Area Under the ROC Curve) or 'logloss' (Log Loss) can be used.
Adjusting the early_stopping_rounds parameter allows you to control the trade-off between preventing overfitting and allowing sufficient training. A smaller value will stop training earlier, while a larger value will allow the model to continue training for more iterations before stopping.
By leveraging the best_score attribute in XGBoost, you can monitor the model's performance, prevent overfitting, and select the best model based on its performance on the validation set.