Leave-One-Out Cross-Validation (LOOCV) is an exhaustive cross-validation strategy that's particularly useful when working with small datasets. In LOOCV, the model is trained on n-1 samples and tested on the single left-out sample; this process is repeated for every data point, yielding a robust estimate of the model's performance. Scikit-learn's `LeaveOneOut` class makes it straightforward to implement LOOCV.
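To make the splitting behavior concrete before the full example, here's a minimal sketch (using illustrative toy data) that iterates over the splits `LeaveOneOut` generates:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Toy data: 4 samples, so LOOCV yields exactly 4 splits
X_toy = np.arange(8).reshape(4, 2)

for train_idx, test_idx in LeaveOneOut().split(X_toy):
    # Each split trains on n-1 samples and holds out exactly one
    print("train:", train_idx, "test:", test_idx)
```

With that in hand, the full example below runs LOOCV on a real dataset with XGBoost: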
```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, LeaveOneOut
from xgboost import XGBClassifier

# Load a small dataset (digits with 2 classes)
X, y = load_digits(n_class=2, return_X_y=True)

# Create an XGBClassifier
model = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)

# Create a LeaveOneOut object
cv = LeaveOneOut()

# Perform LOOCV
cv_scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')

# Print the cross-validation scores and their mean
print("Cross-validation scores:", cv_scores)
print(f"Mean cross-validation score: {np.mean(cv_scores):.2f}")
```
Here’s what’s happening:
- We load a small dataset (digits with 2 classes) and create an `XGBClassifier` with specified hyperparameters.
- We create a `LeaveOneOut` object, which will be used as the cross-validation strategy.
- We use `cross_val_score()` to perform LOOCV, specifying the model, input features (`X`), target variable (`y`), the `LeaveOneOut` object (`cv`), and the scoring metric (accuracy). An equivalent explicit loop is sketched below.
- We print the individual cross-validation scores and their mean.
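For intuition, the `cross_val_score()` call can be unrolled into an explicit loop. The sketch below is a rough equivalent (the real implementation additionally clones the estimator for each fold and can parallelize the fits):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import LeaveOneOut
from xgboost import XGBClassifier

X, y = load_digits(n_class=2, return_X_y=True)
model = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)

scores = []
for train_idx, test_idx in LeaveOneOut().split(X):
    # Refit from scratch on all-but-one sample
    model.fit(X[train_idx], y[train_idx])
    # Each fold's score is 1 if the single held-out sample is
    # classified correctly, 0 otherwise
    scores.append(model.score(X[test_idx], y[test_idx]))

print(f"Mean LOOCV accuracy: {np.mean(scores):.2f}")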
LOOCV is computationally expensive, as it requires training and evaluating the model n times (where n is the number of data points). However, for small datasets, where k-fold cross-validation might not leave enough samples in each fold for a reliable estimate, LOOCV can be a valuable tool for assessing your model's performance.
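You can confirm the number of fits up front with `get_n_splits()`, which for `LeaveOneOut` simply returns the number of samples:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import LeaveOneOut

X, y = load_digits(n_class=2, return_X_y=True)

# One split (and therefore one model fit) per sample
print(LeaveOneOut().get_n_splits(X))  # same as len(X)
```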