XGBoost Confidence Interval using k-Fold Cross-Validation

K-fold cross-validation is a widely used technique for assessing the performance of machine learning models, and it gives a more robust estimate than a single train-test split.

By summarizing the spread of the per-fold scores as an interval, we can attach a rough measure of uncertainty to each performance metric, which helps when comparing models or hyperparameter settings.

This example demonstrates how to estimate confidence intervals for several performance metrics (accuracy and micro-averaged precision, recall, and F1-score) of an XGBoost model trained on a synthetic multi-class classification dataset using k-fold cross-validation.

# XGBoosting.com
# Estimate XGBoost Model Performance Confidence Intervals using K-Fold Cross-Validation
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Generate a synthetic multi-class classification dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10, n_informative=6, n_redundant=2, random_state=42)

# Define a function to compute cross-validated performance metrics and an interval over the fold scores
def cv_metrics_ci(model, X, y, cv=5, scoring=('accuracy', 'precision_micro', 'recall_micro', 'f1_micro')):
    metrics = {}
    for metric in scoring:
        # One score per fold; the model is refit on every fold for every metric
        scores = cross_val_score(model, X, y, cv=cv, scoring=metric)
        # 2.5th and 97.5th percentiles of the fold scores: a rough 95% range, close to min/max when cv is small
        ci_low, ci_high = np.percentile(scores, [2.5, 97.5])
        metrics[metric] = {'mean': scores.mean(), 'ci_low': ci_low, 'ci_high': ci_high}
    return metrics

# Instantiate an XGBClassifier with default hyperparameters
model = XGBClassifier(random_state=42)

# Compute cross-validated performance metrics and confidence intervals
metrics = cv_metrics_ci(model, X, y, cv=5)

# Print the results
for metric, scores in metrics.items():
    print(f"{metric.capitalize()}:")
    print(f"  Mean: {scores['mean']:.3f}")
    print(f"  95% CI: [{scores['ci_low']:.3f}, {scores['ci_high']:.3f}]")

The code begins by generating a synthetic multi-class classification dataset using scikit-learn’s make_classification function. The dataset consists of 1000 samples, 3 classes, and 10 features, of which 6 are informative and 2 are redundant.
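As a quick sanity check on the generated data, the sketch below prints the array shapes and the number of samples per class. It reuses the make_classification arguments from the example; the variable name class_counts is introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same arguments as in the example above
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10,
                           n_informative=6, n_redundant=2, random_state=42)

print("X shape:", X.shape)  # (n_samples, n_features)
print("y shape:", y.shape)

# Number of samples assigned to each of the three classes
class_counts = np.bincount(y)
print("Samples per class:", class_counts)
```

Roughly balanced class counts are what make accuracy a reasonable headline metric for this dataset.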

Next, we define a cv_metrics_ci function that takes a model, input data (X and y), the number of cross-validation folds (default=5), and a list of scoring metrics as input. The function computes cross-validated scores for each specified metric using scikit-learn’s cross_val_score function, which returns one score per fold, and takes the 2.5th and 97.5th percentiles of those scores as an approximate 95% interval. With only five fold scores these percentiles sit very close to the minimum and maximum score, so the interval is best read as a rough indication of fold-to-fold variability rather than a formal confidence interval. The function returns a dictionary containing the mean and interval bounds for each metric.
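One common alternative to percentiles when there are only a handful of fold scores is an interval based on Student's t-distribution. The sketch below is not the method used in the example above; the function name t_confidence_interval and the fold scores are hypothetical, and the calculation assumes the fold scores are roughly independent and normally distributed, which is only approximately true for cross-validation because the training sets overlap.

```python
import numpy as np
from scipy import stats

def t_confidence_interval(scores, confidence=0.95):
    """Return (mean, low, high) for a t-based interval around the mean score."""
    scores = np.asarray(scores, dtype=float)
    k = scores.size
    mean = scores.mean()
    # Standard error of the mean, using the sample standard deviation (ddof=1)
    sem = scores.std(ddof=1) / np.sqrt(k)
    # Two-sided critical value with k - 1 degrees of freedom
    t_crit = stats.t.ppf((1 + confidence) / 2, df=k - 1)
    return mean, mean - t_crit * sem, mean + t_crit * sem

# Hypothetical accuracy scores from 5 folds (illustrative values, not real results)
fold_scores = [0.86, 0.88, 0.84, 0.90, 0.87]
mean, low, high = t_confidence_interval(fold_scores)
print(f"Mean: {mean:.3f}, 95% CI: [{low:.3f}, {high:.3f}]")
```

Unlike the percentile range, this interval describes uncertainty in the mean score and can extend beyond the observed minimum and maximum.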

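An XGBClassifier is instantiated with default hyperparameters (apart from a fixed random_state), and the cv_metrics_ci function is called with the model, the input data, and cv=5, relying on its default scoring list (accuracy, precision, recall, and F1-score). Note that the precision, recall, and F1 scorers are micro-averaged, and for a single-label multi-class problem micro-averaged precision, recall, and F1 are all equal to accuracy, so the four metrics report the same values here. Passing macro-averaged scorers such as 'precision_macro', 'recall_macro', and 'f1_macro' gives distinct per-metric results.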
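To see how the choice of averaging affects these scores, the following sketch compares accuracy with micro- and macro-averaged metrics on a small set of hypothetical labels and predictions (the y_true and y_pred values are made up for illustration).

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical true labels and predictions for a 3-class, single-label problem
y_true = [0, 0, 1, 1, 2, 2, 2, 0, 1, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2, 0, 2, 2]

acc = accuracy_score(y_true, y_pred)

# Micro averaging pools all decisions before computing the metric
micro = {
    'precision': precision_score(y_true, y_pred, average='micro'),
    'recall': recall_score(y_true, y_pred, average='micro'),
    'f1': f1_score(y_true, y_pred, average='micro'),
}

# Macro averaging computes the metric per class, then takes the unweighted mean
macro = {
    'precision': precision_score(y_true, y_pred, average='macro'),
    'recall': recall_score(y_true, y_pred, average='macro'),
    'f1': f1_score(y_true, y_pred, average='macro'),
}

print(f"accuracy: {acc:.3f}")
for name in micro:
    print(f"{name}: micro={micro[name]:.3f}, macro={macro[name]:.3f}")
```

The micro columns all match the accuracy line, while the macro values differ from one another because each class contributes equally regardless of its size.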

Finally, the mean and approximate 95% interval for each performance metric are printed, summarizing both the typical performance of the XGBoost model on the synthetic multi-class classification dataset and how much that performance varies from fold to fold.
