
Configure XGBoost "iteration_range" Parameter for predict()

The iteration_range parameter of predict(), available on both the native Booster API and the scikit-learn estimators, allows you to make predictions using a specific range of boosting rounds from a trained XGBoost model.

This is useful for analyzing model performance at different stages of training, as you can see how the predictions evolve as more boosting rounds are used.

The iteration_range parameter was formerly called ntree_limit, which is now deprecated.
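If you are migrating older code, ntree_limit=n corresponds to iteration_range=(0, n), where the end of the range is exclusive. Here is a minimal sketch of the equivalence using the native API (the small synthetic Booster is purely illustrative):

import xgboost as xgb
from sklearn.datasets import make_regression

# Illustrative setup: train a small Booster with the native API
X, y = make_regression(n_samples=200, n_features=5, random_state=0)
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=50)

# Deprecated style: ntree_limit=30 limited prediction to the first 30 trees
# preds = bst.predict(dtrain, ntree_limit=30)

# Current equivalent: a half-open range over boosting rounds [0, 30)
preds = bst.predict(dtrain, iteration_range=(0, 30))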

Here’s an example that demonstrates how to use iteration_range to make predictions using different ranges of boosting rounds:

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
import matplotlib.pyplot as plt

# Generate a synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an XGBoost model with a large number of boosting rounds
model = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
model.fit(X_train, y_train)

# Make predictions using different iteration ranges
iteration_ranges = [(0, 10), (10, 20), (20, 30)]
predictions = []

for start, end in iteration_ranges:
    preds = model.predict(X_test, iteration_range=(start, end))
    predictions.append(preds)

# Visualize the predictions vs actual values for each iteration range
fig, axes = plt.subplots(1, 3, figsize=(15, 5), sharey=True)
fig.suptitle("Predictions vs Actual Values for Different Iteration Ranges")

for i, (start, end) in enumerate(iteration_ranges):
    axes[i].scatter(y_test, predictions[i])
    axes[i].plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
    axes[i].set_xlabel("Actual Values")
    axes[i].set_ylabel("Predicted Values")
    axes[i].set_title(f"Rounds {start} to {end}")

plt.tight_layout()
plt.show()

In this example, we generate a synthetic regression dataset using scikit-learn’s make_regression function and split it into train and test sets.

We train an XGBoost regressor with 100 boosting rounds. Then, we define a list of iteration ranges we want to use for making predictions. In this case, we use the half-open ranges (0, 10), (10, 20), and (20, 30): the start round is included and the end round is excluded.

We iterate over these ranges and pass the iteration_range parameter to model.predict() to make predictions using only the specified slice of boosting rounds.

Finally, we visualize the predictions versus the actual values for each iteration range using scatter plots. This allows us to see how the model’s predictions evolve as more boosting rounds are used.

Under the hood, the prediction for a given range is the model’s base score plus the sum of the outputs of only the trees built during those rounds. For ranges that do not start at 0, such as (10, 20), the result therefore represents incremental corrections rather than a complete prediction, which is why those panels show much less spread than the first one.
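This composition is easy to sanity-check. The quick sketch below reuses model and X_test from the example above, and assumes the behavior just described (each slice’s output includes the base score once):

import numpy as np

# With no early stopping, the default predict() uses all 100 rounds,
# so it matches an explicit (0, 100) range.
assert np.allclose(model.predict(X_test),
                   model.predict(X_test, iteration_range=(0, 100)))

# (0, 20) + (20, 40) counts the base score twice, while (0, 40) counts
# it once, so the per-sample difference should be constant and equal to
# the base score. The loose tolerance absorbs float32 rounding.
a = model.predict(X_test, iteration_range=(0, 20)).astype(np.float64)
b = model.predict(X_test, iteration_range=(20, 40))
c = model.predict(X_test, iteration_range=(0, 40))
offset = a + b - c
assert np.allclose(offset, offset[0], atol=1e-3)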

Using iteration_range with cumulative ranges such as (0, k) can be handy for understanding how your model’s performance changes with the number of boosting rounds, helping you find a good balance between model complexity and generalization.
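For example, a common pattern is to score cumulative ranges on the test set and watch where the error curve flattens. A small sketch, reusing model, X_test, and y_test from the example above:

import numpy as np
from sklearn.metrics import mean_squared_error

# Test RMSE using only the first k boosting rounds, for growing k
for k in range(10, 101, 10):
    preds = model.predict(X_test, iteration_range=(0, k))
    rmse = np.sqrt(mean_squared_error(y_test, preds))
    print(f"Rounds [0, {k}): RMSE = {rmse:.4f}")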


