XGBoost "best_iteration" Property

The best_iteration attribute in XGBoost records the boosting round at which the model scored best on the validation set during early stopping, a technique that limits overfitting by halting training once the model’s performance on a validation set stops improving.

This example demonstrates how to access and utilize the best_iteration attribute to make predictions using the optimal number of boosting rounds determined by early stopping.

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Load the California Housing dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an XGBoost model with early stopping
model = XGBRegressor(
    n_estimators=1000,
    learning_rate=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    early_stopping_rounds=10,
    random_state=42,
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=True)

# Access the best_iteration attribute
best_iteration = model.best_iteration

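# Make predictions using the best iteration.
# best_iteration is a zero-based index and iteration_range is half-open,
# so the end of the range must be best_iteration + 1 to include the best round.
y_pred = model.predict(X_val, iteration_range=(0, best_iteration + 1))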

print(f"Best iteration: {best_iteration}")

Early stopping is a regularization technique that helps prevent overfitting by monitoring the model’s performance on a validation set during training. The early_stopping_rounds parameter specifies the number of consecutive iterations without improvement in the validation metric before training is stopped.
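
The stopping rule described above can be sketched in plain Python. This is a simplified illustration, not XGBoost's actual implementation: the function name find_best_iteration, the patience argument (standing in for early_stopping_rounds), and the metric values are all made up for the example, and the metric is assumed to be lower-is-better.

```python
def find_best_iteration(scores, patience):
    """Return (best_index, stop_index) for a lower-is-better metric history.

    Training stops once `patience` consecutive rounds fail to improve
    on the best score seen so far.
    """
    best_index = 0
    best_score = float("inf")
    for i, score in enumerate(scores):
        if score < best_score:
            # Strict improvement: remember this round as the best one.
            best_score = score
            best_index = i
        elif i - best_index >= patience:
            # `patience` rounds have passed without improvement: stop here.
            return best_index, i
    # The history ended before the patience ran out.
    return best_index, len(scores) - 1


# Hypothetical validation errors, one per boosting round.
history = [0.90, 0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.54]
best_index, stop_index = find_best_iteration(history, patience=3)
print(f"best round index: {best_index}, stopped at index: {stop_index}")
```

With these numbers the best round is index 3 and training stops at index 6, so the later value 0.54 is never reached. This is the trade-off of a small patience value: training is cheaper, but a late improvement can be missed.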

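The best_iteration attribute stores the zero-based index of the boosting round at which the model achieved its best score on the validation set, so the corresponding number of boosting rounds is best_iteration + 1. When several evaluation sets or metrics are supplied, XGBoost uses the last one for early stopping.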

To effectively utilize the best_iteration attribute, consider the following tips:

Use best_iteration when saving the trained model. The fitted model keeps the trees added after the best round, so truncate it to the first best_iteration + 1 rounds before saving if the saved model should contain only the best-performing state.
When making predictions with the trained model, pass iteration_range=(0, best_iteration + 1) to the predict() method to use only the rounds up to and including the best one. The end of the range is exclusive, so omitting the + 1 drops the best round itself.
Check the best_iteration value after training to assess the model’s convergence. A value that is small relative to n_estimators might indicate that the model is converging quickly or that the learning rate is too high, while a value close to n_estimators suggests that early stopping never triggered and more rounds may help.

By using the best_iteration attribute, you can make sure that predictions and saved models rely on the number of boosting rounds selected by early stopping, which helps limit overfitting and improve generalization.
