Callbacks in XGBoost allow for customizing the training process, such as implementing early stopping or logging evaluation metrics.
As of XGBoost 1.6, specifying callbacks in the fit() method is deprecated and results in a warning:

UserWarning: `callbacks` in `fit` method is deprecated for better compatibility with scikit-learn, use `callbacks` in constructor or `set_params` instead.
Specifying the callbacks parameter both as a model parameter and as a parameter to fit() results in an error:

ValueError: 2 different `callbacks` are provided. Use the one in constructor or `set_params` instead.
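For reference, here is a minimal sketch of the patterns that trigger these two messages, assuming a train/test split like the one created below and an XGBoost release whose fit() still accepts a callbacks argument:

es = callback.EarlyStopping(rounds=10)

# Triggers the UserWarning: callbacks passed to fit() instead of the constructor
model = XGBClassifier()
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], callbacks=[es])

# Triggers the ValueError: callbacks passed in both places
model = XGBClassifier(callbacks=[es])
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], callbacks=[es])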
Instead, callbacks should be specified as model parameters.
Here’s how you can configure the early stopping and evaluation log printing callbacks as model parameters:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from xgboost import callback

# Generate synthetic multi-class classification dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=3, n_redundant=1, random_state=42)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize XGBClassifier with the metric and callbacks as model parameters
# (eval_metric in fit() is deprecated in the same way as callbacks)
model = XGBClassifier(
    objective='multi:softmax',
    num_class=3,
    eval_metric='merror',
    callbacks=[
        callback.EarlyStopping(rounds=10, min_delta=1e-3),
        callback.EvaluationMonitor(period=1, show_stdv=True)
    ]
)

# Fit model with an evaluation set for the callbacks to monitor
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

# Print the evaluation log
print(model.evals_result())
The EarlyStopping callback stops training if the specified metric (in this case, ‘merror’, the multiclass error rate) fails to improve by at least min_delta for rounds consecutive boosting iterations.
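As a quick check, assuming training was in fact stopped early, the fitted model records where that happened:

print(f"Best iteration: {model.best_iteration}")  # round with the best merror
print(f"Best score: {model.best_score}")          # the merror value at that round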
The EvaluationMonitor callback logs the evaluation metrics during training. Setting period=1 logs the metrics after each boosting round, and show_stdv=True includes the standard deviation of the metrics when results come from cross-validation (it has no visible effect on a single train/test split like the one used here).
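If per-round output is too noisy for longer runs, raising period is a simple fix; here is a sketch of a quieter configuration of the same model:

# Log only every 10th boosting round instead of every round
quiet_model = XGBClassifier(
    objective='multi:softmax',
    num_class=3,
    eval_metric='merror',
    callbacks=[callback.EvaluationMonitor(period=10)]
)
quiet_model.fit(X_train, y_train, eval_set=[(X_test, y_test)])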
By configuring the callbacks and eval_metric as model parameters and passing eval_set to fit(), we can monitor the model’s performance on the test set during training and stop training early if the performance plateaus.
The evals_result()
method returns a dictionary of evaluation results, which can be used to analyze the model’s performance across the training rounds.
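For example, you can pull the per-round ‘merror’ values out of that dictionary and locate the best round (the key 'validation_0' refers to the first, and here only, entry in eval_set):

results = model.evals_result()
merror = results['validation_0']['merror']  # one value per boosting round
best_round = merror.index(min(merror))
print(f"Lowest merror {merror[best_round]:.4f} at round {best_round}")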
By using callbacks, you can gain more control over the training process and make informed decisions about when to stop training, ultimately leading to more efficient model development.