Early stopping is a regularization technique that helps prevent overfitting in XGBoost models by halting the training process when the model’s performance on a validation set stops improving.
By setting the min_delta parameter in the xgboost.callback.EarlyStopping callback, you can control the minimum improvement required to continue training, effectively tuning the sensitivity of the early stopping mechanism.
This example demonstrates how to configure the early stopping tolerance with xgboost.callback.EarlyStopping(min_delta=0.01), passed to the scikit-learn estimator XGBClassifier through its callbacks parameter:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import xgboost as xgb
# Load the breast cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
# Set up early stopping callback with min_delta=0.01
early_stop = xgb.callback.EarlyStopping(rounds=10, min_delta=0.01)
# Define XGBoost parameters with early stopping callback
params = {
'objective': 'binary:logistic',
'eval_metric': 'error',
'learning_rate': 0.1,
'max_depth': 3,
'subsample': 0.8,
'colsample_bytree': 0.8,
'callbacks': [early_stop]
}
# Train the model with early stopping
model = xgb.XGBClassifier(**params)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
# Print the best iteration and best score
print(f"Best iteration: {model.best_iteration}")
print(f"Best score: {model.best_score}")
The min_delta parameter sets the minimum improvement required to continue training. A smaller min_delta allows training to continue for more rounds, while a larger value is stricter and stops training earlier. In this example, we set min_delta=0.01, meaning that training stops if the validation error does not improve by at least 0.01 for 10 consecutive rounds (as specified by the rounds parameter).
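For intuition, the two callbacks below differ only in their tolerance. The specific values are illustrative rather than recommendations, and the snippet reuses the xgb import from the example above:
# Lenient: any improvement of at least 0.001 resets the patience
# counter, so training tends to run for more rounds
lenient_stop = xgb.callback.EarlyStopping(rounds=10, min_delta=0.001)
# Strict: improvements smaller than 0.05 are ignored,
# so training tends to stop sooner
strict_stop = xgb.callback.EarlyStopping(rounds=10, min_delta=0.05)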
Choosing an appropriate min_delta value depends on the problem and the evaluation metric. It’s a good idea to start with a default value like 0.01 and adjust based on the model’s performance on the validation set. If the model is stopping too early, decrease min_delta to allow for more fine-tuning; conversely, if the model is overfitting, increasing min_delta can help stop training earlier.
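One practical way to choose the value is a small sweep: train the same model with a few candidate tolerances and keep whichever achieves the best validation score. This is a rough sketch that reuses X_train, X_val, y_train, and y_val from the example above; the candidate list is an arbitrary assumption:
# Sweep a few candidate tolerances (illustrative values)
best_delta, best_err = None, float('inf')
for delta in [0.001, 0.005, 0.01, 0.05]:
    stop = xgb.callback.EarlyStopping(rounds=10, min_delta=delta)
    clf = xgb.XGBClassifier(eval_metric='error', callbacks=[stop])
    clf.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
    # 'error' is minimized, so a lower best_score is better
    if clf.best_score < best_err:
        best_delta, best_err = delta, clf.best_score
print(f"Best min_delta: {best_delta} (validation error: {best_err:.4f})")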
In addition to min_delta, it’s essential to monitor other early stopping parameters such as early_stopping_rounds (or the rounds parameter of the EarlyStopping callback, which works in both the scikit-learn and native APIs) and eval_metric. The early_stopping_rounds parameter determines the number of consecutive rounds without improvement before stopping, while eval_metric specifies the evaluation metric to use for early stopping.
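For reference, here is roughly how the same configuration looks with the native xgb.train API. This sketch reuses the train/validation split from the first example, and the dataset name 'validation' is an arbitrary label:
# Wrap the numpy arrays in DMatrix objects for the native API
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)
# The callback's rounds parameter plays the role of early_stopping_rounds
early_stop = xgb.callback.EarlyStopping(rounds=10, min_delta=0.01,
                                        data_name='validation',
                                        metric_name='error')
booster = xgb.train(
    {'objective': 'binary:logistic', 'eval_metric': 'error',
     'learning_rate': 0.1, 'max_depth': 3},
    dtrain,
    num_boost_round=500,
    evals=[(dval, 'validation')],
    callbacks=[early_stop],
)
print(f"Best iteration: {booster.best_iteration}")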
By carefully tuning the early stopping tolerance and related parameters, you can effectively regularize your XGBoost models and find the optimal balance between model complexity and generalization performance.