Plot Calibration Curve with XGBoost

While the CalibratedClassifierCV class in scikit-learn provides a convenient way to calibrate the probabilities output by an XGBoost model, it’s also important to visually assess the effectiveness of the calibration.

Plotting the calibration curve is a valuable tool for this purpose, and scikit-learn’s calibration_curve function makes it easy to generate these plots.

A calibration curve is a graphical tool used to assess the calibration of a probabilistic classifier, such as an XGBoost model. It plots the observed fraction of positive instances against the mean predicted probability within each bin of predicted probabilities.

In a well-calibrated model, the predicted probabilities align with the observed frequencies, producing a calibration curve that follows the diagonal line. Deviations from the diagonal indicate miscalibration: curves above the diagonal suggest underconfidence, while curves below the diagonal indicate overconfidence.

The calibration curve provides a visual way to evaluate the reliability of a model’s probability estimates and can guide efforts to improve calibration through techniques like Platt scaling or isotonic regression.
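
To make the binning concrete, here is a minimal sketch of the computation behind a calibration curve, assuming uniform bins over [0, 1] (scikit-learn’s default strategy). The helper manual_calibration_curve is hypothetical, written only for illustration:

import numpy as np

# Hypothetical helper sketching the binning behind a calibration curve
def manual_calibration_curve(y_true, y_prob, n_bins=10):
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(y_prob, edges[1:-1])  # bin index for each prediction
    prob_true, prob_pred = [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():  # empty bins are skipped, as in calibration_curve
            prob_true.append(y_true[mask].mean())  # observed fraction of positives
            prob_pred.append(y_prob[mask].mean())  # mean predicted probability
    return np.array(prob_true), np.array(prob_pred)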

Here’s how you can create a calibration plot for your XGBoost model:

from xgboost import XGBClassifier
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import brier_score_loss
import matplotlib.pyplot as plt

# Generate synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train XGBoost classifier (kept as the uncalibrated baseline for comparison)
model = XGBClassifier(random_state=42)
model.fit(X_train, y_train)

# Calibrate probabilities using CalibratedClassifierCV; with cv=5 the
# estimator is cloned and refit on each fold, so the fit above serves
# only as the uncalibrated reference
calibrated_clf = CalibratedClassifierCV(estimator=model, method='sigmoid', cv=5)
calibrated_clf.fit(X_train, y_train)

# Predict calibrated probabilities
calibrated_probs = calibrated_clf.predict_proba(X_test)

# Compute calibration curve and Brier score
prob_true, prob_pred = calibration_curve(y_test, calibrated_probs[:, 1], n_bins=10)
brier_score = brier_score_loss(y_test, calibrated_probs[:, 1])

# Plot perfectly calibrated curve
plt.plot([0, 1], [0, 1], linestyle='--', label='Perfectly calibrated')

# Plot calibration curve
plt.plot(prob_pred, prob_true, marker='.', label='XGBoost')

plt.xlabel('Predicted probability')
plt.ylabel('Fraction of positives')
plt.legend(loc='lower right')
plt.title(f'Calibration plot (Brier score = {brier_score:.3f})')
plt.show()
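
Because model was fit separately above, one optional extension (a sketch, not part of the original example) is to overlay the uncalibrated model’s curve so the effect of calibration is visible:

# Overlay the uncalibrated model's curve for comparison
raw_probs = model.predict_proba(X_test)[:, 1]
raw_true, raw_pred = calibration_curve(y_test, raw_probs, n_bins=10)

plt.plot([0, 1], [0, 1], linestyle='--', label='Perfectly calibrated')
plt.plot(raw_pred, raw_true, marker='.', label='XGBoost (uncalibrated)')
plt.plot(prob_pred, prob_true, marker='.', label='XGBoost (calibrated)')
plt.xlabel('Predicted probability')
plt.ylabel('Fraction of positives')
plt.legend(loc='lower right')
plt.show()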

Let’s break the plotting portion down step-by-step:

  1. First, we use the calibration_curve function to compute the curve’s coordinates. It takes the true labels (y_test) and the predicted probabilities for the positive class (calibrated_probs[:, 1]), and returns the fraction of positives (prob_true) and the mean predicted probability (prob_pred) in each bin. The n_bins parameter determines how many bins the probability range is discretized into.

  2. We also calculate the Brier score using brier_score_loss. The Brier score is the mean squared difference between the predicted probabilities and the actual 0/1 outcomes, so lower values indicate better calibration (see the short worked example after this list).

  3. Next, we plot the diagonal line representing perfect calibration. If a model is perfectly calibrated, its calibration curve should align with this diagonal.

  4. We then plot the actual calibration curve of our XGBoost model using the true and predicted probabilities computed by calibration_curve.

  5. Finally, we add labels for the axes, a legend, and a title that includes the Brier score.
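
As a quick check on step 2, the Brier score is simply the mean squared difference between predicted probabilities and 0/1 outcomes. The five values below are made up purely for illustration:

import numpy as np

# Made-up five-sample illustration of the Brier score
y_true_demo = np.array([1, 0, 1, 1, 0])
y_prob_demo = np.array([0.9, 0.2, 0.6, 0.8, 0.3])
print(np.mean((y_prob_demo - y_true_demo) ** 2))  # 0.068, matches brier_score_loss(y_true_demo, y_prob_demo)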

The resulting plot will show how well the calibrated probabilities from your XGBoost model align with the observed frequencies of positive outcomes. If the curve lies close to the diagonal line, it indicates good calibration. Deviations above the diagonal suggest underconfidence (the model’s predictions are less extreme than the observed frequencies), while deviations below the diagonal indicate overconfidence (the model’s predictions are more extreme than the observed frequencies).

While the Brier score provides a quantitative summary of calibration performance, visually inspecting the calibration curve can offer valuable insights into where and how the model’s calibration may be falling short. This information can guide further efforts to improve the model’s probability estimates.
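
For example, if the sigmoid (Platt scaling) curve still bows away from the diagonal, one reasonable next step, sketched here with the variable names from the example above, is to try isotonic regression, which is non-parametric and can correct more irregular miscalibration at the risk of overfitting on small datasets:

# Isotonic calibration as an alternative; compare its Brier score to sigmoid's
iso_clf = CalibratedClassifierCV(estimator=model, method='isotonic', cv=5)
iso_clf.fit(X_train, y_train)
iso_probs = iso_clf.predict_proba(X_test)[:, 1]
print(brier_score_loss(y_test, iso_probs))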


