XGBoost Confidence Interval using Bootstrap and Percentiles

A single performance score for an XGBoost model says nothing about how much that score would change with different data. A confidence interval around the metric quantifies that uncertainty.

The bootstrap method, a nonparametric resampling technique, can be used to estimate confidence intervals for various performance metrics without making distributional assumptions.

This example demonstrates how to use the bootstrap to estimate a 95% confidence interval for the accuracy of an XGBoost model trained on a synthetic binary classification dataset.

# XGBoosting.com
# Evaluate XGBoost Model Confidence Interval using Bootstrap Method
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
import numpy as np

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=2, random_state=42)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

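# Define a function to compute bootstrap replicates of the accuracy metric.
# Note: accuracy is scored on the module-level X_test and y_test defined above.
def bootstrap_accuracy(model, X, y, n_bootstraps=1000):
    rng = np.random.default_rng(42)  # seeded so the resampling is reproducible
    accuracies = []
    for _ in range(n_bootstraps):
        # Draw row indices with replacement to form one bootstrap sample
        idx = rng.integers(0, len(X), size=len(X))
        X_boot, y_boot = X[idx], y[idx]
        # Refit on the bootstrap sample and score on the held-out test set
        model.fit(X_boot, y_boot)
        accuracies.append(model.score(X_test, y_test))
    return np.array(accuracies)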

# Instantiate an XGBClassifier with default hyperparameters
model = XGBClassifier(random_state=42)

# Compute the bootstrap confidence interval for accuracy
accuracies = bootstrap_accuracy(model, X_train, y_train)
ci_low, ci_high = np.percentile(accuracies, [2.5, 97.5])

print(f"Mean Accuracy: {accuracies.mean():.3f}")
print(f"95% CI: [{ci_low:.3f}, {ci_high:.3f}]")

The code first generates a synthetic binary classification dataset of 1,000 samples and 10 features using scikit-learn’s make_classification function. The data is then split 80/20 into train and test sets.

Next, we define a bootstrap_accuracy function that takes a model, the training data, and the number of bootstrap replicates (1,000 by default) as input. For each replicate, the function resamples the training data with replacement, refits the model on that bootstrap sample, and computes the accuracy on the test set. Note that the test set is not passed as an argument; the function reads X_test and y_test from the enclosing scope. The function returns an array of bootstrap accuracy replicates.
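Refitting the model for every replicate can be slow. A related but different variant, sketched here under stated assumptions, fits the model once and resamples the test-set predictions instead. It captures uncertainty from the finite test set rather than from the training sample, so it is not a drop-in replacement for the approach above. The function name bootstrap_test_accuracy and the simulated labels are hypothetical and used only for illustration; in practice y_pred would come from something like model.predict(X_test).

```python
import numpy as np

def bootstrap_test_accuracy(y_true, y_pred, n_bootstraps=1000, seed=42):
    """Resample (label, prediction) pairs with replacement; return accuracy replicates."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    accuracies = np.empty(n_bootstraps)
    for i in range(n_bootstraps):
        # One bootstrap sample of test-set row indices
        idx = rng.integers(0, n, size=n)
        accuracies[i] = np.mean(y_true[idx] == y_pred[idx])
    return accuracies

# Simulated stand-ins (assumption): 200 test labels, with roughly 10% of
# predictions flipped to mimic a classifier that is right about 90% of the time.
demo_rng = np.random.default_rng(0)
y_true_demo = demo_rng.integers(0, 2, size=200)
flip = demo_rng.random(200) < 0.10
y_pred_demo = np.where(flip, 1 - y_true_demo, y_true_demo)

replicates = bootstrap_test_accuracy(y_true_demo, y_pred_demo)
ci_low_demo, ci_high_demo = np.percentile(replicates, [2.5, 97.5])
print(f"95% CI: [{ci_low_demo:.3f}, {ci_high_demo:.3f}]")
```

Because no model is refit, this variant costs only array indexing per replicate, which makes a larger number of replicates affordable.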

An XGBClassifier is instantiated with default hyperparameters, and the bootstrap_accuracy function is called with the model and training data to compute the bootstrap replicates of accuracy.

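Finally, the 2.5th and 97.5th percentiles of the bootstrap accuracy replicates are computed to obtain the 95% confidence interval bounds. The mean accuracy and confidence interval are printed. Because the test set is held fixed while the training data is resampled, the interval reflects how much accuracy varies with the training sample.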
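The percentile step generalizes to other confidence levels: for a level c, the bounds are the 100(1 - c)/2 and 100(1 + c)/2 percentiles of the replicates. A minimal sketch follows; percentile_interval is a hypothetical helper name, and the evenly spaced demo array merely stands in for real accuracy replicates.

```python
import numpy as np

def percentile_interval(replicates, confidence=0.95):
    """Return the (lower, upper) percentile bounds for the given confidence level."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    lower, upper = np.percentile(
        replicates, [100 * alpha / 2, 100 * (1 - alpha / 2)]
    )
    return float(lower), float(upper)

# Stand-in replicates (assumption): 101 evenly spaced values from 0 to 1
demo = np.linspace(0.0, 1.0, 101)
low95, high95 = percentile_interval(demo, confidence=0.95)
low90, high90 = percentile_interval(demo, confidence=0.90)
print(f"95% interval: [{low95:.3f}, {high95:.3f}]")
print(f"90% interval: [{low90:.3f}, {high90:.3f}]")
```

With the accuracies array from the example, percentile_interval(accuracies) would give the same bounds as the np.percentile(accuracies, [2.5, 97.5]) call.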

See Also