Leave-One-Out Cross-Validation (LOOCV) is an exhaustive cross-validation strategy that's particularly useful when working with small datasets. In LOOCV, the model is trained on n-1 samples and tested on the single left-out sample; this process is repeated for every data point, yielding a robust estimate of the model's performance. Scikit-learn's `LeaveOneOut` class makes it straightforward to implement LOOCV.
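To make the splitting behavior concrete before the full example, here's a minimal sketch (using illustrative toy data) that iterates over the splits `LeaveOneOut` generates:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Toy data: 4 samples, so LOOCV yields exactly 4 splits
X_toy = np.arange(8).reshape(4, 2)

for train_idx, test_idx in LeaveOneOut().split(X_toy):
    # Each split trains on n-1 samples and holds out exactly one
    print("train:", train_idx, "test:", test_idx)
```

With that in hand, the full example below runs LOOCV on a real dataset with XGBoost: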
```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, LeaveOneOut
from xgboost import XGBClassifier

# Load a small dataset (digits with 2 classes)
X, y = load_digits(n_class=2, return_X_y=True)

# Create an XGBClassifier
model = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)

# Create a LeaveOneOut object
cv = LeaveOneOut()

# Perform LOOCV
cv_scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')

# Print the cross-validation scores and their mean
print("Cross-validation scores:", cv_scores)
print(f"Mean cross-validation score: {np.mean(cv_scores):.2f}")
```
Here’s what’s happening:
- We load a small dataset (digits with 2 classes) and create an `XGBClassifier` with specified hyperparameters.
- We create a `LeaveOneOut` object, which will be used as the cross-validation strategy.
- We use `cross_val_score()` to perform LOOCV, specifying the model, input features (`X`), target variable (`y`), the `LeaveOneOut` object (`cv`), and the scoring metric (accuracy). An equivalent explicit loop is sketched below.
- We print the individual cross-validation scores and their mean.
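For intuition, the `cross_val_score()` call can be unrolled into an explicit loop. The sketch below is a rough equivalent (the real implementation additionally clones the estimator for each fold and can parallelize the fits):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import LeaveOneOut
from xgboost import XGBClassifier

X, y = load_digits(n_class=2, return_X_y=True)
model = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)

scores = []
for train_idx, test_idx in LeaveOneOut().split(X):
    # Refit from scratch on all-but-one sample
    model.fit(X[train_idx], y[train_idx])
    # Each fold's score is 1 if the single held-out sample is
    # classified correctly, 0 otherwise
    scores.append(model.score(X[test_idx], y[test_idx]))

print(f"Mean LOOCV accuracy: {np.mean(scores):.2f}")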
LOOCV is computationally expensive, as it requires training and evaluating the model n times (where n is the number of data points). However, for small datasets, where k-fold cross-validation might not leave enough samples in each fold for a reliable estimate, LOOCV can be a valuable tool for assessing your model's performance.
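You can confirm the number of fits up front with `get_n_splits()`, which for `LeaveOneOut` simply returns the number of samples:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import LeaveOneOut

X, y = load_digits(n_class=2, return_X_y=True)

# One split (and therefore one model fit) per sample
print(LeaveOneOut().get_n_splits(X))  # same as len(X)
```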