K-fold cross-validation is a widely used technique for assessing the performance of machine learning models. Because each sample is used for validation exactly once across the k folds, it provides a more robust estimate than a single train-test split.

By combining k-fold cross-validation with confidence interval estimation, we can quantify the uncertainty in model performance metrics, for example to judge whether a difference between two models is larger than the variation we would expect across folds.

This example demonstrates how to estimate confidence intervals for various performance metrics (accuracy, precision, recall, and F1-score) of an XGBoost model trained on a synthetic multi-class classification dataset using k-fold cross-validation.

```
# XGBoosting.com
# Estimate XGBoost Model Performance Confidence Intervals using K-Fold Cross-Validation
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBClassifier

# Generate a synthetic multi-class classification dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10, n_informative=6, n_redundant=2, random_state=42)

# Define a function to compute cross-validated performance metrics and confidence intervals
# Macro averaging is used because micro-averaged precision, recall, and F1 equal accuracy for multi-class data
def cv_metrics_ci(model, X, y, cv=5, scoring=('accuracy', 'precision_macro', 'recall_macro', 'f1_macro')):
    # Shuffled k-fold splitter with a fixed seed, so every metric is scored on the same folds
    kfold = KFold(n_splits=cv, shuffle=True, random_state=42)
    metrics = {}
    for metric in scoring:
        scores = cross_val_score(model, X, y, cv=kfold, scoring=metric)
        # Interval bounds: 2.5th and 97.5th percentiles of the per-fold scores
        ci_low, ci_high = np.percentile(scores, [2.5, 97.5])
        metrics[metric] = {'mean': scores.mean(), 'ci_low': ci_low, 'ci_high': ci_high}
    return metrics

# Instantiate an XGBClassifier with default hyperparameters
model = XGBClassifier(random_state=42)

# Compute cross-validated performance metrics and confidence intervals
metrics = cv_metrics_ci(model, X, y, cv=5)

# Print the results
for metric, scores in metrics.items():
    print(f"{metric.capitalize()}:")
    print(f"  Mean: {scores['mean']:.3f}")
    print(f"  95% CI: [{scores['ci_low']:.3f}, {scores['ci_high']:.3f}]")
```

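The code begins by generating a synthetic multi-class classification dataset using scikit-learn’s `make_classification` function. The dataset consists of 1000 samples, 3 classes, and 10 features: 6 informative, 2 redundant (random linear combinations of the informative features), and the remaining 2 pure noise.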
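A quick sanity check of the generated data can confirm these properties. The sketch below simply repeats the same `make_classification` call and inspects the result; the per-class counts are printed rather than asserted, since `make_classification` randomly flips a small fraction of labels by default.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same call as in the example above
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10,
                           n_informative=6, n_redundant=2, random_state=42)

print("X shape:", X.shape)                    # (1000, 10)
print("classes:", np.unique(y))               # [0 1 2]
print("samples per class:", np.bincount(y))   # roughly balanced across the 3 classes
```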

Next, we define a `cv_metrics_ci` function that takes a model, input data (`X` and `y`), the number of cross-validation folds (default 5), and a collection of scoring metric names. For each metric, it computes per-fold scores with scikit-learn’s `cross_val_score` function and uses the 2.5th and 97.5th percentiles of those scores as the bounds of an approximate 95% interval. With only five folds, these percentiles fall just inside the lowest and highest fold scores, so the interval describes how performance varies across folds rather than acting as a formal confidence interval for the mean. The function returns a dictionary containing the mean and interval bounds for each metric.
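To make the difference concrete, here is a small sketch comparing the percentile interval with a Student's t interval for the mean score. The fold scores are hypothetical values chosen for illustration, and `percentile_interval` and `t_interval` are helper names introduced here, not part of any library. The t interval assumes roughly independent, normally distributed fold scores; k-fold scores share training data, so treat it as an approximation that tends to be too narrow.

```python
import numpy as np
from scipy import stats

def percentile_interval(scores, level=0.95):
    """Empirical interval from percentiles of the per-fold scores (as in cv_metrics_ci)."""
    alpha = (1 - level) / 2
    return tuple(np.percentile(scores, [100 * alpha, 100 * (1 - alpha)]))

def t_interval(scores, level=0.95):
    """Approximate CI for the mean score using Student's t with k - 1 degrees of freedom."""
    scores = np.asarray(scores, dtype=float)
    k = len(scores)
    mean = scores.mean()
    sem = scores.std(ddof=1) / np.sqrt(k)            # standard error of the mean
    t_crit = stats.t.ppf((1 + level) / 2, df=k - 1)  # two-sided critical value
    return mean - t_crit * sem, mean + t_crit * sem

# Hypothetical accuracies from 5 folds (illustrative values, not real results)
fold_scores = [0.90, 0.88, 0.92, 0.91, 0.89]

print("percentile: [%.3f, %.3f]" % percentile_interval(fold_scores))
print("t-interval: [%.3f, %.3f]" % t_interval(fold_scores))
```

This prints `[0.881, 0.919]` for the percentile interval, just inside the minimum and maximum fold scores (0.88 and 0.92), and `[0.880, 0.920]` for the t interval around the mean of 0.90.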

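An `XGBClassifier` is instantiated with default hyperparameters (plus `random_state=42` for reproducibility), and `cv_metrics_ci` is called with the model and input data to compute the cross-validated mean and interval for accuracy, precision, recall, and F1-score. For multi-class targets, precision, recall, and F1-score must be averaged across classes: macro averaging (the `_macro` scorers) gives each class equal weight, while micro averaging (the `_micro` scorers) pools all predictions, which in a single-label multi-class problem makes micro precision, recall, and F1-score identical to accuracy.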
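The effect of the averaging strategy is easy to check on a handful of labels. The sketch below uses hypothetical true and predicted labels for a 3-class problem and compares micro and macro averages against accuracy.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical true and predicted labels for a 3-class problem (illustrative only)
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2, 2, 0]

acc = accuracy_score(y_true, y_pred)

# Micro averaging pools all classes: in single-label multi-class data every false
# positive for one class is a false negative for another, so all three equal accuracy
micro = {
    "precision": precision_score(y_true, y_pred, average="micro"),
    "recall": recall_score(y_true, y_pred, average="micro"),
    "f1": f1_score(y_true, y_pred, average="micro"),
}

# Macro averaging computes each metric per class, then takes the unweighted mean
macro = {
    "precision": precision_score(y_true, y_pred, average="macro"),
    "recall": recall_score(y_true, y_pred, average="macro"),
    "f1": f1_score(y_true, y_pred, average="macro"),
}

print(f"accuracy: {acc:.3f}")
for name in micro:
    print(f"{name}: micro={micro[name]:.3f}, macro={macro[name]:.3f}")
```

Here accuracy is 0.700, every micro average is also 0.700, and every macro average is 0.667, because the smaller class 1 is predicted less reliably and macro averaging weights it equally.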

Finally, the mean and 95% interval for each performance metric are printed, summarizing both the XGBoost model’s typical performance on the synthetic multi-class dataset and how much that performance varies across folds.