Combining early stopping regularization with cross-validation in XGBoost is a powerful technique to prevent overfitting and improve model generalization.
By using early stopping within each fold of cross-validation, you can automatically tune the optimal number of boosting rounds for your model.
This requires that we manually traverse each fold and split off a validation set that is used to determine the number of rounds. We are effectively evaluating the expected performance of the model under the condition of early stopping. The final model would then be fit in the same manner, using a hold-out validation dataset (a sketch of this step appears after the walkthrough below).
This example demonstrates how to implement this approach using XGBoost and scikit-learn.
# XGBoosting
# XGBoost Early Stopping With Cross-Validation
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, train_test_split

# Create a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=10, random_state=42)

# Configure cross-validation and early stopping
n_splits = 5
early_stopping_rounds = 10
kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)

# Perform cross-validation with early stopping
test_scores = []
best_rounds = []
for train_index, test_index in kf.split(X, y):
    X_train_fold, X_test_fold = X[train_index], X[test_index]
    y_train_fold, y_test_fold = y[train_index], y[test_index]
    # Split the train fold into train and validation sets
    X_train_fold, X_val, y_train_fold, y_val = train_test_split(X_train_fold, y_train_fold, test_size=0.2, random_state=42)
    # Prepare the model
    model = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1, objective='binary:logistic', early_stopping_rounds=early_stopping_rounds, random_state=42)
    # Fit the model on the train fold and use the validation set for early stopping
    model.fit(X_train_fold, y_train_fold, eval_set=[(X_val, y_val)], verbose=False)
    # Record the best number of boosting rounds found for this fold
    best_rounds.append(model.best_iteration + 1)
    # Predict on the test fold
    y_pred_test = model.predict(X_test_fold)
    test_score = accuracy_score(y_test_fold, y_pred_test)
    print(f'>{test_score}')
    test_scores.append(test_score)

print(f"CV Average Accuracy: {np.mean(test_scores)}")
print(f"Average Best Rounds: {np.mean(best_rounds)}")
We begin by creating a synthetic binary classification dataset using scikit-learn's make_classification function. We then configure the cross-validation and early stopping parameters, specifying the number of splits (n_splits) and the number of rounds to wait for improvement (early_stopping_rounds). We use StratifiedKFold to ensure that the class distribution is preserved in each fold.
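If you want to confirm that stratification is working as expected, a quick check of the class balance in each fold can be run. This optional snippet assumes the X, y, and kf objects defined in the example above.

# Optional check: the positive class rate should be similar in every test fold
for fold, (train_index, test_index) in enumerate(kf.split(X, y)):
    positive_rate = y[test_index].mean()
    print(f'Fold {fold}: positive class rate = {positive_rate:.3f}')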
Inside the cross-validation loop, we split the data into train and test folds based on the indices provided by StratifiedKFold. We further split the train fold into a training set and a validation set using train_test_split. This validation set will be used for early stopping.
It is important that the test fold is not used for early stopping; doing so would leak information into training and invalidate the accuracy estimate on the test set.
We create an instance of the XGBClassifier with the desired hyperparameters, including the number of estimators, learning rate, and objective function. We set early_stopping_rounds to the predefined value.
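Note that passing early_stopping_rounds to the XGBClassifier constructor requires a reasonably recent XGBoost release (roughly 1.6 or later). On older versions the same behavior is obtained by passing the argument to fit() instead; the sketch below shows that variant and is otherwise identical to the code above.

# Variant for older XGBoost versions: early stopping configured at fit() time
model = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1, objective='binary:logistic', random_state=42)
model.fit(X_train_fold, y_train_fold, eval_set=[(X_val, y_val)], early_stopping_rounds=early_stopping_rounds, verbose=False)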
We then fit the model on the training fold using model.fit(), specifying the validation set (X_val, y_val) for early stopping via the eval_set parameter. The model monitors its performance on the validation set and stops training if no improvement is observed for early_stopping_rounds consecutive rounds. We record the best number of boosting rounds found for the fold in the best_rounds list.
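After a fit with early stopping, the fitted estimator exposes which round performed best on the validation set. The snippet below, which assumes a model fitted as above, inspects these attributes; the exact attributes available can vary slightly between XGBoost versions.

# Inspect the outcome of early stopping for the current fold
print(f'Best iteration: {model.best_iteration}')     # zero-based index of the best round
print(f'Best validation score: {model.best_score}')
# Full per-round validation metrics recorded during training (logloss is the default for binary:logistic)
history = model.evals_result()
print(len(history['validation_0']['logloss']))       # number of rounds actually trained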
After training, we predict on the test fold using model.predict() and calculate the accuracy score using accuracy_score from scikit-learn. We print the accuracy score for each fold and append it to the test_scores list.
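Recent XGBoost versions use the best iteration found by early stopping when predicting. If you want to make this explicit, or are on a version where you are unsure of the default, you can pass iteration_range to predict(), as sketched below.

# Explicitly restrict prediction to the rounds up to and including the best iteration
y_pred_test = model.predict(X_test_fold, iteration_range=(0, model.best_iteration + 1))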
Finally, we compute and print the average accuracy score across all folds using np.mean(test_scores), along with the average best number of boosting rounds found by early stopping.
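Once the cross-validated estimate looks acceptable, the final model can be fit in the same manner on all available data, again holding out a small validation set purely to drive early stopping. A minimal sketch, reusing the dataset and settings from the example above:

# Fit the final model with early stopping on a hold-out validation set
X_train_final, X_val_final, y_train_final, y_val_final = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
final_model = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1, objective='binary:logistic', early_stopping_rounds=early_stopping_rounds, random_state=42)
final_model.fit(X_train_final, y_train_final, eval_set=[(X_val_final, y_val_final)], verbose=False)
print(f'Final model stopped at iteration: {final_model.best_iteration}')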
By integrating early stopping with stratified k-fold cross-validation, you can automatically tune the regularization strength of your XGBoost model for each fold. This approach helps prevent overfitting, improves generalization, and provides a more robust estimate of the model’s performance. It is particularly useful when working with imbalanced datasets or when you want to ensure that your model’s performance is consistent across different subsets of the data.