
XGBoost Early Stopping Report Verbose Output

Setting the verbose parameter in XGBoost's fit() method allows you to monitor the model's performance on the validation set during training when using early stopping.

The parameter is enabled by default, but it can be disabled with verbose=False to hide the output. It is a good idea to explicitly enable verbose output when using early stopping.

This provides valuable insights into the model’s learning progress and helps identify the optimal number of boosting rounds.

Verbose reporting is especially useful when tuning the early_stopping_rounds parameter to strike the right balance between model complexity and generalization performance.

Here’s a concise example demonstrating how to enable verbose reporting with early stopping in XGBoost:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Create a synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, n_features=20, random_state=42)

# Split the data into train, validation, and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

# Define an XGBoost classifier with early stopping
xgb_clf = XGBClassifier(n_estimators=100, early_stopping_rounds=10)

# Train the model with early stopping and verbose reporting
xgb_clf.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=True)

# Print the optimal number of boosting rounds
print(f"Optimal number of boosting rounds: {xgb_clf.best_iteration}")

When you run this code, XGBoost will display the performance metrics for the validation set at each boosting round.

The output will look similar to this:

[0]	validation_0-logloss:0.53078
[1]	validation_0-logloss:0.43842
[2]	validation_0-logloss:0.38369
[3]	validation_0-logloss:0.33073
[4]	validation_0-logloss:0.30570
[5]	validation_0-logloss:0.29084
[6]	validation_0-logloss:0.28200
[7]	validation_0-logloss:0.26735
[8]	validation_0-logloss:0.25545
[9]	validation_0-logloss:0.25256
[10]	validation_0-logloss:0.24748
[11]	validation_0-logloss:0.25274
[12]	validation_0-logloss:0.24572
[13]	validation_0-logloss:0.24988
[14]	validation_0-logloss:0.24946
[15]	validation_0-logloss:0.24711
[16]	validation_0-logloss:0.24230
[17]	validation_0-logloss:0.24797
[18]	validation_0-logloss:0.24864
[19]	validation_0-logloss:0.24689
[20]	validation_0-logloss:0.24670
[21]	validation_0-logloss:0.25180
[22]	validation_0-logloss:0.24848
[23]	validation_0-logloss:0.25263
[24]	validation_0-logloss:0.24901
[25]	validation_0-logloss:0.25342
[26]	validation_0-logloss:0.25279
Optimal number of boosting rounds: 16

In this example, the validation error is reported at each boosting round. The model’s performance improves until round 16, after which no further improvement is observed for 10 consecutive rounds (as specified by early_stopping_rounds). Therefore, training stops at round 26, and the optimal number of boosting rounds is determined to be 16.
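
The per-round metrics that verbose reporting prints are also stored on the fitted model, so you can inspect them programmatically (for example, to plot the validation curve). Here is a minimal sketch, assuming the scikit-learn wrapper shown above, which exposes evals_result(), best_iteration, and best_score on recent XGBoost versions:

# Retrieve the validation metrics recorded during training
results = xgb_clf.evals_result()
val_logloss = results["validation_0"]["logloss"]

# best_iteration and best_score describe the best round on the validation set
print(f"Rounds evaluated: {len(val_logloss)}")
print(f"Best round: {xgb_clf.best_iteration}, logloss: {xgb_clf.best_score:.5f}")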

By enabling verbose reporting, you can closely monitor the model’s performance during training and gain insights into how the validation error changes with each boosting round. This information is invaluable when tuning the early_stopping_rounds parameter and determining the optimal model complexity for your specific problem.
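
If per-round output becomes too noisy for long training runs, the scikit-learn wrapper also accepts an integer for verbose, printing the evaluation metric only every that many rounds. A minimal sketch, reusing the data and estimator configuration from the example above:

# Report validation metrics every 5 boosting rounds instead of every round
xgb_clf = XGBClassifier(n_estimators=100, early_stopping_rounds=10)
xgb_clf.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=5)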


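Building on this, you can compare different early_stopping_rounds settings directly by fitting the model with each value and checking where training stops. A minimal sketch, reusing the train/validation split from the example above (verbose output is disabled here so only the summary line for each setting is printed):

# Compare how different patience values affect the chosen number of rounds
for rounds in [5, 10, 20]:
    clf = XGBClassifier(n_estimators=100, early_stopping_rounds=rounds)
    clf.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
    print(f"early_stopping_rounds={rounds}: best_iteration={clf.best_iteration}, "
          f"best logloss={clf.best_score:.5f}")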

See Also