Early stopping is a regularization technique that helps prevent overfitting in XGBoost models by halting the training process when the model’s performance on a validation set stops improving.
By setting the min_delta parameter in the xgboost.callback.EarlyStopping callback, you can control the minimum improvement required to continue training, effectively tuning the sensitivity of the early stopping mechanism.
This example demonstrates how to configure the early stopping tolerance with xgboost.callback.EarlyStopping(min_delta=0.01), passed to the scikit-learn estimator XGBClassifier through its callbacks parameter:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import xgboost as xgb
# Load the breast cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
# Set up early stopping callback with min_delta=0.01
early_stop = xgb.callback.EarlyStopping(rounds=10, min_delta=0.01)
# Define XGBoost parameters with early stopping callback
params = {
'objective': 'binary:logistic',
'eval_metric': 'error',
'learning_rate': 0.1,
'max_depth': 3,
'subsample': 0.8,
'colsample_bytree': 0.8,
'callbacks': [early_stop]
}
# Train the model with early stopping
model = xgb.XGBClassifier(**params)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
# Print the best iteration and best score
print(f"Best iteration: {model.best_iteration}")
print(f"Best score: {model.best_score}")
The min_delta parameter sets the minimum improvement required to continue training. A smaller min_delta allows training to continue for more rounds, while a larger value is stricter and stops training earlier. In this example, we set min_delta=0.01, meaning that training stops if the validation error does not improve by at least 0.01 for 10 consecutive rounds (as specified by the rounds parameter).
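For intuition, the two callbacks below differ only in their tolerance. The specific values are illustrative rather than recommendations, and the snippet reuses the xgb import from the example above:
# Lenient: any improvement of at least 0.001 resets the patience
# counter, so training tends to run for more rounds
lenient_stop = xgb.callback.EarlyStopping(rounds=10, min_delta=0.001)
# Strict: improvements smaller than 0.05 are ignored,
# so training tends to stop sooner
strict_stop = xgb.callback.EarlyStopping(rounds=10, min_delta=0.05)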
Choosing an appropriate min_delta value depends on the problem and the evaluation metric. It’s a good idea to start with a default value like 0.01 and adjust based on the model’s performance on the validation set. If the model is stopping too early, decrease min_delta to allow for more fine-tuning; conversely, if the model is overfitting, increasing min_delta can help stop training earlier.
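One practical way to choose the value is a small sweep: train the same model with a few candidate tolerances and keep whichever achieves the best validation score. This is a rough sketch that reuses X_train, X_val, y_train, and y_val from the example above; the candidate list is an arbitrary assumption:
# Sweep a few candidate tolerances (illustrative values)
best_delta, best_err = None, float('inf')
for delta in [0.001, 0.005, 0.01, 0.05]:
    stop = xgb.callback.EarlyStopping(rounds=10, min_delta=delta)
    clf = xgb.XGBClassifier(eval_metric='error', callbacks=[stop])
    clf.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
    # 'error' is minimized, so a lower best_score is better
    if clf.best_score < best_err:
        best_delta, best_err = delta, clf.best_score
print(f"Best min_delta: {best_delta} (validation error: {best_err:.4f})")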
In addition to min_delta, it’s essential to monitor other early stopping parameters such as early_stopping_rounds (or the rounds parameter of the EarlyStopping callback, which works in both the scikit-learn and native APIs) and eval_metric. The early_stopping_rounds parameter determines the number of consecutive rounds without improvement before stopping, while eval_metric specifies the evaluation metric to use for early stopping.
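For reference, here is roughly how the same configuration looks with the native xgb.train API. This sketch reuses the train/validation split from the first example, and the dataset name 'validation' is an arbitrary label:
# Wrap the numpy arrays in DMatrix objects for the native API
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)
# The callback's rounds parameter plays the role of early_stopping_rounds
early_stop = xgb.callback.EarlyStopping(rounds=10, min_delta=0.01,
                                        data_name='validation',
                                        metric_name='error')
booster = xgb.train(
    {'objective': 'binary:logistic', 'eval_metric': 'error',
     'learning_rate': 0.1, 'max_depth': 3},
    dtrain,
    num_boost_round=500,
    evals=[(dval, 'validation')],
    callbacks=[early_stop],
)
print(f"Best iteration: {booster.best_iteration}")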
By carefully tuning the early stopping tolerance and related parameters, you can effectively regularize your XGBoost models and find the optimal balance between model complexity and generalization performance.