Incrementally training an XGBoost model round by round allows you to monitor its performance over time and potentially identify an optimal number of training iterations.
This example demonstrates how to train an XGBoost classifier incrementally, one round at a time, while reporting the training and testing accuracy after each round.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Generate synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=8, n_redundant=2,
                           n_classes=2, weights=[0.6, 0.4], random_state=42)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize XGBoost classifier that adds one tree per fit call
model = XGBClassifier(n_estimators=1, random_state=42)
model.fit(X_train, y_train)

# Train model incrementally for multiple rounds
num_rounds = 10
for i in range(num_rounds):
    # Continue training from the current booster state
    model.fit(X_train, y_train, xgb_model=model.get_booster())

    # Make predictions on train and test data
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)

    # Calculate and print accuracy scores
    train_accuracy = accuracy_score(y_train, y_train_pred)
    test_accuracy = accuracy_score(y_test, y_test_pred)
    print(f"Round {i+1}/{num_rounds} - Train Accuracy: {train_accuracy:.4f}, Test Accuracy: {test_accuracy:.4f}")
```
Let’s break down the key steps:
1. Generate a synthetic binary classification dataset using `make_classification` from scikit-learn.
2. Split the data into training and test sets.
3. Initialize an XGBoost classifier with `n_estimators=1`, meaning each `fit` call will perform one boosting iteration.
4. Train the model incrementally for `num_rounds` iterations:
   - For each round, call `model.fit` with the `xgb_model` parameter set to the current booster, allowing the model to continue training from its previous state (the sketch after this list shows the same continuation with the native `xgb.train` API).
   - Make predictions on the training and test data using the model at its current state.
   - Calculate the training and testing accuracy scores using `accuracy_score` from scikit-learn.
   - Print the round number and the corresponding accuracy scores.
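If you prefer XGBoost's native interface, the same continuation pattern is available there as well: `xgb.train` accepts an existing booster through its `xgb_model` argument. Here is a minimal sketch, assuming the same `X_train` and `y_train` from the example above:

```python
import xgboost as xgb

# Wrap the training data in a DMatrix for the native API
dtrain = xgb.DMatrix(X_train, label=y_train)
params = {"objective": "binary:logistic", "seed": 42}

# First round: train a booster with a single tree
booster = xgb.train(params, dtrain, num_boost_round=1)

# Later rounds: pass the existing booster via xgb_model to keep training it
for _ in range(9):
    booster = xgb.train(params, dtrain, num_boost_round=1, xgb_model=booster)
```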
The output will display the training and testing accuracy for each round, allowing you to observe how the model’s performance evolves with each additional training iteration.
Note that the hyperparameters used for the XGBoost model and the number of training rounds (`num_rounds`) can be adjusted to suit your dataset and requirements.
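For example, raising `n_estimators` makes each `fit` call append that many boosting iterations instead of one. A minimal sketch of this variation, reusing the data from above (the `max_depth` and `learning_rate` values are illustrative, not tuned):

```python
# Illustrative variation: append 5 trees per round instead of 1
model = XGBClassifier(n_estimators=5, max_depth=3, learning_rate=0.1, random_state=42)
model.fit(X_train, y_train)

for i in range(num_rounds):
    # Each call now adds 5 more trees to the existing booster
    model.fit(X_train, y_train, xgb_model=model.get_booster())
```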
By incrementally training the model and evaluating its performance at each round, you can gain insights into how the model learns over time and potentially identify an optimal stopping point to prevent overfitting.
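One simple way to pick that stopping point is a patience rule layered on top of the incremental loop: stop once the held-out accuracy has not improved for several consecutive rounds. The sketch below is one possible implementation (the `patience` value is an illustrative choice, and in practice you would monitor a separate validation set rather than the test set):

```python
best_accuracy = 0.0
rounds_without_improvement = 0
patience = 3  # illustrative: stop after 3 rounds with no accuracy gain

for i in range(num_rounds):
    # Continue training by one round, then re-evaluate
    model.fit(X_train, y_train, xgb_model=model.get_booster())
    test_accuracy = accuracy_score(y_test, model.predict(X_test))

    if test_accuracy > best_accuracy:
        best_accuracy = test_accuracy
        rounds_without_improvement = 0
    else:
        rounds_without_improvement += 1
        if rounds_without_improvement >= patience:
            print(f"Stopping at round {i+1}: no improvement for {patience} rounds")
            break
```

XGBoost also ships built-in early stopping (`early_stopping_rounds` combined with an `eval_set`), which achieves the same goal within a single `fit` call.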