Visualizing feature importances is a key step in understanding how your XGBClassifier model makes predictions. The plot_importance() function provides a convenient way to plot feature importances directly from a trained model.
In this example, we'll demonstrate how to use plot_importance() to visualize feature importances while including the actual feature names from the dataset on the plot, providing a clear and informative view of the model's decision-making process.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier, plot_importance
import matplotlib.pyplot as plt
# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=42)
feature_names = [f'feature_{i}' for i in range(X.shape[1])]
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit XGBClassifier
model = XGBClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Set the feature names
model.get_booster().feature_names = feature_names
# Plot feature importances
fig, ax = plt.subplots(figsize=(10, 6))
plot_importance(model.get_booster(), importance_type='weight', ax=ax)
plt.show()
The plot may look as follows:
First, we generate a synthetic binary classification dataset using scikit-learn's make_classification function. We set n_samples to 1000 and n_features to 10, with 5 informative and 5 redundant features. We also create a list of feature names, feature_names, to use when plotting.
Next, we split the data into training and testing sets using train_test_split, allocating 20% of the data for testing.
We then create an instance of XGBClassifier with 100 estimators and fit it on the training data.
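Though not needed for the plot itself, a quick check on the held-out test set (a minimal sketch using scikit-learn's accuracy_score; not part of the original example) confirms the fitted model is sensible:

from sklearn.metrics import accuracy_score
# Sanity-check the fitted model on the 20% hold-out split
y_pred = model.predict(X_test)
print(f'Test accuracy: {accuracy_score(y_test, y_pred):.3f}')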
Next, we set the feature_names property on the underlying Booster, then call the standalone plot_importance() function on that booster so the bars are labeled with our feature names rather than the default f0, f1, … placeholders.
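As an aside, if you train on a pandas DataFrame instead of a NumPy array, XGBoost records the column names itself and no Booster assignment is needed. A minimal sketch, assuming pandas is installed (model_df is a hypothetical name for this variant):

import pandas as pd
# Fitting on a DataFrame lets XGBoost pick up column names automatically
X_train_df = pd.DataFrame(X_train, columns=feature_names)
model_df = XGBClassifier(n_estimators=100, random_state=42)
model_df.fit(X_train_df, y_train)
plot_importance(model_df)  # bars are labeled without touching the Booster
plt.show()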
The resulting plot displays the feature importances as a horizontal bar chart, with the synthetic feature names on the y-axis and the importance scores (F score) on the x-axis, providing a clear visual representation of each feature's relative importance in the XGBClassifier model's decision-making process. The plot_importance() function handles the plotting details automatically, making it a convenient choice for quickly visualizing feature importances.
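Note that 'weight' (the number of times a feature is used to split) is only one of the metrics plot_importance() supports; 'gain' and 'cover' are also available via the importance_type parameter. A quick sketch comparing two of them side by side:

# Compare split-count ('weight') and gain-based importances in one figure
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
plot_importance(model.get_booster(), importance_type='weight', ax=axes[0], title='Importance: weight')
plot_importance(model.get_booster(), importance_type='gain', ax=axes[1], title='Importance: gain')
plt.tight_layout()
plt.show()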