
Configure XGBoost "early_stopping_rounds" Parameter

The early_stopping_rounds parameter in XGBoost allows for early termination of the training process if the model’s performance on a validation set does not improve for a specified number of rounds. This parameter helps prevent overfitting and saves computational resources by stopping the training when the model’s performance plateaus.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=10, random_state=42)

# Split the dataset into training, validation, and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

# Initialize the XGBoost classifier with early_stopping_rounds
model = XGBClassifier(n_estimators=1000, early_stopping_rounds=10, eval_metric='logloss')

# Fit the model with early stopping
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=True)

# Report the best score on the best iteration
print(f'Best score {model.best_score}, Best iteration {model.best_iteration}')

# Make predictions using the model at the best iteration
predictions = model.predict(X_test)

The best-performing model, as determined on the validation set, is retained. Its score is available via the best_score property and the corresponding boosting round via the best_iteration property.

Making predictions with the fitted model automatically uses the trees learned up to the best_iteration.
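
The same behavior can also be requested explicitly through the iteration_range argument of predict. The sketch below assumes the model fitted in the example above and a reasonably recent xgboost release (1.4 or later, where iteration_range was introduced):

# Explicitly restrict prediction to the trees up to and including best_iteration.
# This mirrors the default behavior after fitting with early stopping.
explicit_predictions = model.predict(X_test, iteration_range=(0, model.best_iteration + 1))

# Both prediction paths should agree
assert (predictions == explicit_predictions).all()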

Understanding the "early_stopping_rounds" Parameter

The early_stopping_rounds parameter specifies the number of rounds (iterations) to continue training after the last improvement in the model’s performance on the validation set. Early stopping helps prevent overfitting by terminating training when the model’s performance on unseen data (validation set) stops improving. This technique also saves computational resources by avoiding unnecessary iterations.
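
One way to see this rule in action is to inspect the validation metric recorded during training via evals_result(). The sketch below assumes the model fitted in the example above, where the single evaluation set is stored under the default name 'validation_0':

# Retrieve the per-round validation logloss recorded during fit()
history = model.evals_result()['validation_0']['logloss']

# When early stopping triggers, the final round sits early_stopping_rounds
# past the best round (unless the n_estimators cap was reached first)
print(f'Total rounds run: {len(history)}')
print(f'Best round: {model.best_iteration}')
print(f'Rounds after last improvement: {len(history) - 1 - model.best_iteration}')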

Configuring "early_stopping_rounds"

When configuring early_stopping_rounds, there is a trade-off between allowing the model to continue learning and preventing overfitting. A smaller value may stop training too early, before the model has fully learned from the data. On the other hand, a larger value may allow the model to overfit by continuing to train after it has reached its optimal performance on the validation set.
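
A quick way to see this trade-off is to fit the same model with a small and a large patience value and compare where each run stops. The values 5 and 50 below are illustrative, not recommendations, and the code reuses the data split from the example above:

# Compare an impatient and a patient early stopping configuration
for rounds in (5, 50):
    m = XGBClassifier(n_estimators=1000, early_stopping_rounds=rounds, eval_metric='logloss')
    m.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
    print(f'early_stopping_rounds={rounds}: best_iteration={m.best_iteration}, best_score={m.best_score:.5f}')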

The optimal value for early_stopping_rounds depends on the size and complexity of the dataset and model. As a general guideline, monitor the model’s performance on the validation set during training to determine an appropriate value. If the performance plateaus for a considerable number of rounds, it may indicate that the model has reached its optimal point and further training could lead to overfitting.
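
Plotting the recorded validation metric makes such a plateau easy to spot. This sketch assumes matplotlib is installed and reuses the model fitted in the example above:

import matplotlib.pyplot as plt

# Visualize the validation logloss across boosting rounds
history = model.evals_result()['validation_0']['logloss']
plt.plot(history, label='validation logloss')
plt.axvline(model.best_iteration, linestyle='--', color='gray', label='best_iteration')
plt.xlabel('Boosting round')
plt.ylabel('Log loss')
plt.legend()
plt.show()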
