Early stopping is a simple yet effective regularization technique that prevents overfitting in XGBoost models by stopping the training process when the model’s performance on a validation set stops improving.
By avoiding unnecessary boosting rounds, early stopping helps find the optimal number of iterations automatically, reducing training time and improving generalization.
Implementing early stopping in XGBoost is straightforward. Simply set the early_stopping_rounds parameter to the number of rounds to wait for improvement and provide a validation set using the eval_set parameter during training. XGBoost will monitor the model’s performance on the validation set at each boosting round and stop training if no improvement is observed for the specified number of consecutive rounds.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
# Create a synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
# Split the dataset into training, validation, and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
# Define the XGBoost regressor with early stopping;
# n_estimators is an upper bound, and training halts early if the
# validation score does not improve for 10 consecutive rounds
xgb_model = XGBRegressor(objective='reg:squarederror', n_estimators=1000, early_stopping_rounds=10)
# Train the model with early stopping
xgb_model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=True)
# Predict on the test set
y_pred = xgb_model.predict(X_test)
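Once training finishes, the fitted model records the round that scored best on the validation set. In recent XGBoost releases (1.6 and later, where early_stopping_rounds is a constructor argument as above), the scikit-learn wrapper exposes this through the best_iteration and best_score attributes, and predict() defaults to the best iteration. A quick check might look like this:
from sklearn.metrics import mean_squared_error
# Round with the best validation score, and the score itself
print(f"Best iteration: {xgb_model.best_iteration}")
print(f"Best validation score: {xgb_model.best_score}")
# Evaluate generalization on the held-out test set
rmse = mean_squared_error(y_test, y_pred) ** 0.5
print(f"Test RMSE: {rmse:.4f}")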
The appropriate value for early_stopping_rounds depends on the dataset size and complexity. It is typically set between 10 and 100 rounds. Higher values give the model more patience before stopping, but if set too high, training may continue adding rounds that overfit the training data. When in doubt, start with a value of 10 and adjust based on the model’s performance.
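If you want something more systematic than a rule of thumb, one option (an illustrative sketch reusing the splits from the example above, not a tuning recipe) is to refit with a few candidate patience values and compare the best validation scores:
# Refit with several patience values and compare (illustrative sketch)
for patience in [10, 25, 50]:
    model = XGBRegressor(objective='reg:squarederror', n_estimators=1000,
                         early_stopping_rounds=patience)
    model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
    print(f"patience={patience}: stopped at iteration {model.best_iteration}, "
          f"best validation score {model.best_score:.4f}")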
When training with early stopping enabled, keep an eye on the training log. XGBoost will display the performance metrics for the validation set at each boosting round. Look for the round with the best validation performance. Training will stop if no improvement is observed for early_stopping_rounds consecutive rounds.
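Rather than reading the console output by eye, you can also retrieve the recorded metrics programmatically with evals_result(). The sketch below assumes the default rmse metric that XGBoost uses for the reg:squarederror objective:
# Per-round validation metrics recorded during training
history = xgb_model.evals_result()
val_rmse = history['validation_0']['rmse']  # assumes the default rmse metric
best_round = min(range(len(val_rmse)), key=val_rmse.__getitem__)
print(f"Lowest validation RMSE {val_rmse[best_round]:.4f} at round {best_round}")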
Early stopping provides a simple and effective way to prevent overfitting in XGBoost models. By automatically finding the optimal number of boosting rounds, it saves training time and helps improve the model’s generalization performance. Combine early stopping with other regularization techniques, such as L1 and L2 penalties, to further enhance your XGBoost models.
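For instance, XGBoost exposes L1 and L2 penalties on leaf weights through its reg_alpha and reg_lambda parameters, which combine naturally with early stopping. The values below are illustrative starting points, not tuned settings:
# Early stopping combined with L1 (reg_alpha) and L2 (reg_lambda) penalties
regularized_model = XGBRegressor(objective='reg:squarederror', n_estimators=1000,
                                 early_stopping_rounds=10,
                                 reg_alpha=0.1, reg_lambda=1.0)  # illustrative values
regularized_model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)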