XGBoost Early Stopping Get Best Round (Iteration)

When using early stopping in XGBoost, training halts once the validation metric has stopped improving for a set number of rounds, and the model records the round that achieved the best validation score.

This iteration number can be retrieved and used to train a new model with the optimal number of boosting rounds, avoiding the need to repeat the early stopping process.

Retrieving the best iteration from an early-stopped XGBoost model is straightforward.

After training a model with early stopping, access the best_iteration attribute to get the iteration number with the best validation score.

Because best_iteration is a zero-based index, set the n_estimators parameter of the final model to best_iteration + 1 so it builds the same number of trees.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Create a synthetic dataset for regression
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split the data into train, validation, and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

# Train an XGBoost model with early stopping
xgb_model = XGBRegressor(n_estimators=1000, early_stopping_rounds=10)
xgb_model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

# Retrieve the best iteration number (a zero-based index)
best_iteration = xgb_model.best_iteration

# Train a new model with the optimal number of boosting rounds
# (+1 converts the zero-based index into a tree count)
final_model = XGBRegressor(n_estimators=best_iteration + 1)
final_model.fit(X_train, y_train)

# Evaluate the final model on the test set
score = final_model.score(X_test, y_test)
print(f"Best iteration: {best_iteration}, Test R^2 score: {score:.4f}")

By retrieving the best iteration from the early-stopped model, you can efficiently train a final model with the optimal number of boosting rounds. This approach saves time and resources compared to running early stopping again or manually tuning the n_estimators parameter.

When training the final model, use all of the available training data by combining the training and validation sets that were split off for early stopping. This ensures the final model benefits from every labeled example, as sketched below.
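
A minimal sketch of that combination step, reusing the arrays from the example above (the X_full and y_full names are just illustrative):

import numpy as np

# Combine the training and validation splits used during early stopping
X_full = np.concatenate([X_train, X_val])
y_full = np.concatenate([y_train, y_val])

# Refit on all available data, keeping the tree count found by early stopping
final_model = XGBRegressor(n_estimators=best_iteration + 1)
final_model.fit(X_full, y_full)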

Retrieving the best iteration from an early-stopped XGBoost model is a simple yet effective way to streamline the model training process. By leveraging the information gained during early stopping, you can train a final model with the optimal number of boosting rounds, leading to improved performance and reduced training time.


