When using early stopping with XGBoost, the final model should be fit on all available data to maximize performance before being used for predictions on new records.
There are a few approaches we can use to fit a final XGBoost model for inference on out-of-sample data when using early stopping.
Use Hold-Out Data as Validation Set
In this simple approach, a portion of the dataset is held out as a validation set so that early stopping can correctly determine when to stop training.
The example below shows how to train a final XGBoost classifier with early stopping, holding out part of the data as a validation set, and then make a prediction on a single out-of-sample record.
# XGBoosting.com
# XGBoost Fit Final Model With Early Stopping and Predict on Out-Of-Sample Data
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
# Generate a synthetic dataset for binary classification
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=2, random_state=42)
# Split data into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
# Create an XGBClassifier with early stopping
model = XGBClassifier(n_estimators=1000, learning_rate=0.1, random_state=42,
                      early_stopping_rounds=10, eval_metric='error')
# Fit the model on the training data, using the validation set for early stopping
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
# Define a new input record
new_record = [[1.2, 3.4, -0.5, 2.1, 0.3, -1.8, 0.6, 1.4, -0.7, 2.2]]
# Make a prediction on the new record
prediction = model.predict(new_record)
print(f"Predicted class for the new record: {prediction[0]}")
Here’s a step-by-step breakdown:
1. We generate a synthetic binary classification dataset using scikit-learn's make_classification.
2. The data is split into training and validation sets using train_test_split. The validation set will be used for early stopping.
3. An XGBClassifier is instantiated with a large n_estimators and early_stopping_rounds set. The eval_metric is set to 'error' for binary classification.
4. The model is fit on X_train and y_train, with eval_set specifying the validation data to be used for early stopping.
5. A new_record is defined as a 2D list, representing a single out-of-sample data point.
6. model.predict is called on new_record to obtain the predicted class label (predicting class probabilities instead is sketched just after this list).
7. The predicted class label for the new record is printed.
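If class probabilities are needed rather than a hard label, the fitted classifier also exposes scikit-learn's predict_proba interface. A minimal sketch, assuming the model and new_record defined above:

# Predict class probabilities for the new record (assumes model and new_record from above)
proba = model.predict_proba(new_record)
print(f"Probability of class 0: {proba[0][0]:.3f}")
print(f"Probability of class 1: {proba[0][1]:.3f}")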
The trade-off of this approach is that the final model never trains on the hold-out rows, but early stopping has a genuine validation set to decide when to stop training, which helps avoid overfitting. The resulting model is well suited for making predictions on previously unseen records.
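To see how much early stopping cut training short, recent versions of the XGBoost scikit-learn wrapper expose best_iteration and best_score on the fitted estimator. A minimal sketch, assuming the model fitted above and a recent xgboost release:

# Inspect when early stopping halted training (assumes a recent xgboost version)
print(f"Best iteration (zero-indexed): {model.best_iteration}")
print(f"Validation error at the best iteration: {model.best_score}")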
Use Early Stopping to Choose n_estimators For Final Model
An alternative approach to fitting a final XGBoost model with early stopping is to first fit the model with early stopping to determine the optimal number of rounds, then refit the model on the full dataset using this number of rounds.
This ensures the model sees all available data during training while still leveraging the benefits of early stopping.
Here’s an example demonstrating this approach:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
# Generate a synthetic dataset for binary classification
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=2, random_state=42)
# Split data into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
# Create an XGBClassifier with early stopping
model = XGBClassifier(n_estimators=1000, learning_rate=0.1, random_state=42,
                      early_stopping_rounds=10, eval_metric='error')
# Fit the model with early stopping
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
# Get the optimal number of rounds (best_iteration is zero-indexed, so add 1)
best_rounds = model.best_iteration + 1
# Refit the model on the full dataset with the optimal number of rounds
final_model = XGBClassifier(n_estimators=best_rounds, learning_rate=0.1, random_state=42)
final_model.fit(X, y)
# Define a new input record
new_record = [[1.2, 3.4, -0.5, 2.1, 0.3, -1.8, 0.6, 1.4, -0.7, 2.2]]
# Make a prediction on the new record using the refitted model
prediction = final_model.predict(new_record)
print(f"Predicted class for the new record: {prediction[0]}")
Here’s how this approach works:
1. As before, we generate a synthetic binary classification dataset and split it into training and validation sets.
2. We create an XGBClassifier with early stopping parameters and fit it on the training data, using the validation set for early stopping.
3. After fitting, we retrieve the best iteration found during early stopping from model.best_iteration. Because best_iteration is zero-indexed, adding 1 gives the optimal number of boosting rounds for the final model.
4. We create a new XGBClassifier called final_model, setting n_estimators to best_rounds. This model will be trained on the full dataset.
5. final_model is fit on the entire dataset X and y, using the optimal number of rounds determined by early stopping.
6. As before, we define a new_record and use final_model.predict to make a prediction on this unseen data point.
7. The predicted class label for the new record is printed.
By refitting the XGBoost model on the full dataset using the best number of rounds found via early stopping, we ensure the final model benefits from seeing all available data during training while still maintaining the regularization effect of early stopping to prevent overfitting.
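As a quick sanity check, you can confirm that the refitted model actually contains the number of trees chosen via early stopping. A minimal sketch, assuming final_model and best_rounds from the example above and a reasonably recent xgboost release (Booster.num_boosted_rounds is not available in very old versions):

# Confirm the refitted model has the expected number of boosting rounds
booster = final_model.get_booster()
print(f"Rounds chosen via early stopping: {best_rounds}")
print(f"Trees in the refitted model: {booster.num_boosted_rounds()}")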