
XGBoost Evaluate Model using Nested k-Fold Cross-Validation

Nested k-fold cross-validation is a powerful technique for getting an unbiased estimate of your XGBoost model’s performance while simultaneously tuning its hyperparameters.

It runs an inner hyperparameter-tuning loop within each fold of the outer cross-validation loop, so the data used to choose hyperparameters never contributes to the final score. This gives a more honest evaluation than tuning and scoring a model on the same k-fold splits.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
import numpy as np

# Generate a synthetic classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# Define the hyperparameter grid for XGBoost
param_grid = {
    'max_depth': [3, 5, 7],
    'learning_rate': [0.1, 0.01, 0.001],
    'n_estimators': [50, 100, 200]
}

# Create the outer and inner cross-validation objects
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Perform nested cross-validation
outer_scores = []

for train_idx, test_idx in outer_cv.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Perform hyperparameter tuning with inner cross-validation
    model = XGBClassifier(random_state=42)
    grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=inner_cv, scoring='accuracy')
    grid_search.fit(X_train, y_train)

    # GridSearchCV refits the best estimator on the full outer training
    # fold by default (refit=True), so no additional fit is needed
    best_model = grid_search.best_estimator_

    # Evaluate the model on the outer validation fold
    y_pred = best_model.predict(X_test)
    score = accuracy_score(y_test, y_pred)
    outer_scores.append(score)

# Report the mean and standard deviation of the scores across the outer folds
print(f"Nested cross-validation scores: {outer_scores}")
print(f"Mean score: {np.mean(outer_scores):.3f} +/- {np.std(outer_scores):.3f}")

Here’s what’s happening:

  1. We generate a synthetic binary classification dataset using scikit-learn’s make_classification function.
  2. We define a hyperparameter grid for XGBoost, specifying different values for max_depth, learning_rate, and n_estimators.
  3. We create a StratifiedKFold object for the outer cross-validation loop and another one for the inner loop.
  4. We iterate over the outer cross-validation splits. For each split:
    • We perform hyperparameter tuning using GridSearchCV with the inner cross-validation object.
    • We take grid_search.best_estimator_, which GridSearchCV has already refit with the best hyperparameters on the full outer training fold (refit=True by default).
    • We evaluate the model on the outer validation fold and store the accuracy score.
  5. We report the individual scores from each outer fold, as well as their mean and standard deviation.
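
The explicit loop above makes each step of the procedure visible. If you mainly need the scores, scikit-learn can express the same nested scheme more compactly by passing the GridSearchCV object itself to cross_validate (or cross_val_score), since a grid search behaves like any other estimator. A minimal sketch, reusing the param_grid, inner_cv, and outer_cv objects defined above:

from sklearn.model_selection import cross_validate

# Fitting a GridSearchCV runs the inner search, so cross-validating it
# with the outer splitter reproduces the full nested scheme
clf = GridSearchCV(estimator=XGBClassifier(random_state=42),
                   param_grid=param_grid, cv=inner_cv, scoring='accuracy')
results = cross_validate(clf, X, y, cv=outer_cv, scoring='accuracy',
                         return_estimator=True)

print(f"Mean score: {results['test_score'].mean():.3f} +/- {results['test_score'].std():.3f}")

# The winning hyperparameters may differ from one outer fold to the next
for i, est in enumerate(results['estimator']):
    print(f"Fold {i}: {est.best_params_}")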

By using nested cross-validation, we obtain an unbiased estimate of our model’s performance while also finding the best hyperparameters for each outer fold. This helps ensure that the performance estimate is not overly optimistic due to information leakage between the hyperparameter tuning and model evaluation steps.
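
To see the optimism that this leakage causes, you can compare the nested estimate against a non-nested one, where GridSearchCV’s best_score_ is reported on the same splits that were used to pick the hyperparameters. A minimal sketch, reusing the objects from the example above (the size of the gap depends on the dataset and can be small on easy problems):

# Non-nested: best_score_ is computed on the same CV splits that chose
# the hyperparameters, so it tends to be optimistically biased
grid = GridSearchCV(estimator=XGBClassifier(random_state=42),
                    param_grid=param_grid, cv=inner_cv, scoring='accuracy')
grid.fit(X, y)

print(f"Non-nested score: {grid.best_score_:.3f}")
print(f"Nested score:     {np.mean(outer_scores):.3f}")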


