Early stopping is a useful technique to prevent overfitting in XGBoost. However, setting early_stopping_rounds in the fit() method is deprecated and will result in a warning:

UserWarning: `early_stopping_rounds` in `fit` method is deprecated for better compatibility with scikit-learn, use `early_stopping_rounds` in constructor or `set_params` instead.

Instead, you should set it as a parameter when initializing the model.
If early_stopping_rounds is set both as a model configuration parameter and in fit(), an error is raised:

ValueError: 2 different `early_stopping_rounds` are provided. Use the one in constructor or `set_params` instead.
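As a quick sketch of the two supported options (the names here are placeholders, not tied to the full example below):

from xgboost import XGBRegressor

# Option 1: pass early_stopping_rounds to the constructor
model = XGBRegressor(n_estimators=1000, early_stopping_rounds=10)

# Option 2: set it after construction via scikit-learn's set_params()
model = XGBRegressor(n_estimators=1000)
model.set_params(early_stopping_rounds=10)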
Here’s how to properly configure early_stopping_rounds in XGBoost:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error
# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split data into train, validation, and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)
# Initialize XGBRegressor with early_stopping_rounds
model = XGBRegressor(n_estimators=1000, learning_rate=0.01, early_stopping_rounds=10, random_state=42)
# Train model with early stopping
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
# Make predictions and evaluate performance
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
# Print the optimal number of rounds
print(f"Optimal number of rounds: {model.best_iteration}")
In this example:

We generate a synthetic regression dataset using make_regression.
We split the data into train, validation, and test sets. The validation set will be used for early stopping.
We initialize an XGBRegressor with early_stopping_rounds=10 in the model parameters. This means training will stop if the validation score doesn’t improve for 10 consecutive rounds.
We train the model using fit(), passing the validation set as eval_set. This allows XGBoost to monitor the validation score during training; the sketch after this list shows how to inspect that history afterwards.
We make predictions on the test set and evaluate the model’s mean squared error.
Finally, we print the optimal number of rounds (trees) found by early stopping, accessible through model.best_iteration.
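The wrapper records the validation history that early stopping monitored during fit(). A minimal sketch reusing the model trained above; "validation_0" and "rmse" are the default keys for a single eval_set with squared-error regression:

# Retrieve the per-round validation metric recorded during fit()
history = model.evals_result()
rmse_per_round = history["validation_0"]["rmse"]
print(f"Rounds evaluated: {len(rmse_per_round)}")
print(f"RMSE at best round: {rmse_per_round[model.best_iteration]:.4f}")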
By setting early_stopping_rounds in the model parameters and providing a validation set, we can leverage early stopping to find the optimal number of rounds and prevent overfitting, all while avoiding the deprecated fit() parameter.
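Recent versions of the scikit-learn wrapper already predict with only the trees up to the best round once early stopping has fired, but you can make this explicit with iteration_range. A minimal sketch, reusing model and X_test from the example above:

# iteration_range is half-open, so add 1 to include the best round
y_pred_best = model.predict(X_test, iteration_range=(0, model.best_iteration + 1))
print(f"Best validation RMSE: {model.best_score:.4f}")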