Saving a plot of XGBoost feature importance scores to a file makes it easy to share with teammates and to include in reports or presentations.
This example demonstrates how to train an XGBoost classifier, calculate feature importance scores, create a plot of the scores using matplotlib, and save the plot to a file in a specified format.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
import matplotlib.pyplot as plt

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=3,
                           n_classes=2, random_state=42)

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train an XGBoost classifier
model = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
model.fit(X_train, y_train)

# Calculate feature importance scores (one score per input column)
importances = model.feature_importances_

# Create a bar plot of the feature importance scores
fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(range(len(importances)), importances)
ax.set_xticks(range(len(importances)))  # show a tick for every feature index
ax.set_xlabel("Feature")
ax.set_ylabel("Importance Score")
ax.set_title("XGBoost Feature Importance")

# Save the plot to a file; the format is inferred from the file extension
plt.tight_layout()
plt.savefig("xgboost_feature_importance.png", dpi=300)
plt.close(fig)  # release the figure once it has been written
```
Here's how the code works:

1. Generate a synthetic dataset using scikit-learn's `make_classification` function, specifying the number of samples, features, informative features, redundant features, and classes.
2. Split the dataset into training and test sets using `train_test_split`.
3. Initialize an XGBoost classifier with the desired hyperparameters and train it on the training data.
4. Calculate the feature importance scores using the `feature_importances_` attribute of the trained model.
5. Create a plot of the feature importance scores using matplotlib: set the figure size, draw a bar plot, and add axis labels and a title.
6. Save the plot to a file using `plt.savefig()`. Specify the filename, including the desired format extension (e.g., ".png"), and set the `dpi` parameter for high-resolution output. The `dpi` setting controls the resolution of raster formats such as PNG; vector formats such as PDF and SVG do not depend on it.
By following these steps, you can create and save a plot of XGBoost feature importance scores, making it easy to share the insights gained from your model.