Explain XGBoost Predictions with ELI5 Library

Understanding the factors driving predictions is crucial for trusting and debugging machine learning models.

ELI5 is a Python library that can explain the predictions of various ML models, including those built with XGBoost.

This example demonstrates how to use ELI5 to interpret an XGBoost classifier trained on a synthetic dataset, providing both global feature importances and explanations for individual predictions.

First, install ELI5 using your preferred Python package manager, such as pip:

pip install eli5

Then, use ELI5 to interpret your XGBoost model:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
import eli5

# Generate a synthetic multiclass classification dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10, n_informative=5, n_redundant=5, random_state=42)

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an XGBoost classifier
model = XGBClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Show global feature importances
print(eli5.format_as_text(eli5.explain_weights(model)))

# Explain the prediction for a single instance
instance_idx = 42
instance = X_test[instance_idx]
print(eli5.format_as_text(eli5.explain_prediction(model, instance)))

In this example:

We generate a synthetic multiclass classification dataset using scikit-learn’s make_classification function with 1000 samples, 3 classes, and 10 features (5 informative, 5 redundant).
We split the data into train and test sets and train an XGBoost classifier on the training data.
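We use ELI5’s explain_weights function to display the model’s global feature importances, which show which features matter most to the model’s predictions overall. For XGBoost models, ELI5 measures importance by gain by default: the average reduction in training loss achieved by the splits that use each feature.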
We select a single instance from the test set and use ELI5’s explain_prediction function to explain the model’s prediction for that instance. ELI5 follows the instance’s decision path through each tree and attributes the resulting score changes to the features used in the splits, breaking the prediction down into per-feature contributions for this particular data point.

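The output of explain_weights lists the features ranked by importance in descending order. The output of explain_prediction shows, for each class, the predicted probability and the raw score, followed by each feature’s value for the selected instance and its contribution to that score, plus a <BIAS> term. These contributions are expressed in the model’s raw (pre-softmax) score space, not directly in probabilities: they add up to the raw score, and the softmax over the class scores then produces the probabilities.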

By examining the global feature importances, you can gain insights into which features the model relies on most heavily in general. By explaining individual predictions, you can understand why the model made a specific prediction for a particular instance.

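ELI5 provides a straightforward way to interpret XGBoost models, which helps build trust in the model’s predictions and surface potential issues or biases. Keep in mind, however, that collinearity among features can distort feature importances. That applies directly here: the five redundant features are linear combinations of the five informative ones, so the model can split on either, and the importance may be divided among correlated features or concentrated on one of them somewhat arbitrarily. Individual explanations likewise assign each feature a single additive contribution, so they may not reflect interactions between features.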
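The sketch below checks how collinear the synthetic data is. It uses only NumPy and scikit-learn. make_classification builds each redundant feature as a linear combination of the informative ones, so the ten centered columns should span only five dimensions. Each feature should also be almost perfectly predictable from the other nine. The helper name r_squared_from_others is hypothetical, introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same synthetic data as the example: 5 informative + 5 redundant features
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10,
                           n_informative=5, n_redundant=5, random_state=42)

# Redundant columns are linear combinations of the informative ones,
# so the centered data matrix has rank 5 rather than 10
X_centered = X - X.mean(axis=0)
rank = np.linalg.matrix_rank(X_centered)
print(f"matrix rank: {rank} of {X.shape[1]} columns")


def r_squared_from_others(X, j):
    """Hypothetical helper: R^2 of a least-squares fit of column j on all other columns."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    design = np.column_stack([np.ones(len(X)), others])  # add an intercept column
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    return 1.0 - residual.var() / target.var()


r2 = np.array([r_squared_from_others(X, j) for j in range(X.shape[1])])
print(np.round(r2, 4))
```

An R² near 1 for every column means any feature can be reconstructed from the others. The trees can therefore substitute one feature for another, which is exactly the situation in which importance rankings become unstable. Real data is usually only partially collinear, but the same caution applies.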

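Note: ELI5 imports if_delegate_has_method from scikit-learn, a helper that was removed in scikit-learn 1.3. With recent scikit-learn versions, this example therefore fails when importing eli5, with the following error: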

ImportError: cannot import name 'if_delegate_has_method' from 'sklearn.utils.metaestimators'

To work around this error, install an older version of scikit-learn (such as 1.2.2) until the eli5 library is updated:

pip install scikit-learn==1.2.2
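If a script should degrade gracefully instead of crashing, one option is to test for the missing helper before relying on ELI5. The sketch below is a minimal illustration, assuming only that scikit-learn is installed. The function names sklearn_has_legacy_helper and try_import_eli5 are hypothetical. The function returns None, and prints a hint, when eli5 cannot be imported for any reason, including when it simply is not installed.

```python
import sklearn
import sklearn.utils.metaestimators as metaestimators


def sklearn_has_legacy_helper() -> bool:
    """Hypothetical helper: True if scikit-learn still ships the function ELI5 imports."""
    return hasattr(metaestimators, "if_delegate_has_method")


def try_import_eli5():
    """Hypothetical helper: return the eli5 module, or None (with a hint) if it cannot be imported."""
    try:
        import eli5
    except ImportError as exc:
        hint = ""
        if not sklearn_has_legacy_helper():
            hint = (f" (scikit-learn {sklearn.__version__} no longer provides "
                    "if_delegate_has_method; consider scikit-learn==1.2.2)")
        print(f"ELI5 unavailable: {exc}{hint}")
        return None
    return eli5


eli5_module = try_import_eli5()
```

When eli5_module is None, the XGBoost-native importance and contribution methods remain available as a fallback.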

See Also