XGBoost provides several ways to calculate feature importance, including the “weight” method, which is based on the number of times a feature is used to split the data across all trees.
This example demonstrates how to configure XGBoost to use the “weight” method and retrieve the feature importance scores through XGBoost’s scikit-learn API (the XGBClassifier class).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
# Load the Breast Cancer Wisconsin dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create an XGBClassifier with the importance_type parameter set to "weight"
model = XGBClassifier(n_estimators=100, learning_rate=0.1, importance_type="weight", random_state=42)
# Train the model
model.fit(X_train, y_train)
# Retrieve the "weight" feature importance scores
importance_scores = model.feature_importances_
# Print the feature importance scores along with feature names
for feature, score in zip(data.feature_names, importance_scores):
    print(f"{feature}: {score}")
In this example, we load the Breast Cancer Wisconsin dataset and split it into train and test sets. We then create an instance of XGBoost’s scikit-learn-compatible XGBClassifier with the importance_type parameter set to "weight". This configures XGBoost to calculate feature importance based on the number of times a feature is used to split the data across all trees.
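The "weight" metric is only one of the importance types XGBoost accepts; "gain", "cover", "total_gain", and "total_cover" are also supported. As a minimal sketch, reusing the same train/test split as above (the gain_model name is just illustrative), an otherwise identical model can report gain-based importance instead:

from xgboost import XGBClassifier

# Same configuration as above, but the importance scores will reflect the average
# gain (loss reduction) contributed by each feature's splits rather than split counts
gain_model = XGBClassifier(
    n_estimators=100, learning_rate=0.1, importance_type="gain", random_state=42
)
gain_model.fit(X_train, y_train)
print(gain_model.feature_importances_)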
After training the model on the training data, we retrieve the “weight” feature importance scores using the feature_importances_ attribute of the trained model. This attribute returns an array of importance scores, where each score corresponds to a feature in the dataset.
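If you want the raw split counts rather than the values exposed by feature_importances_, the underlying Booster object provides get_score. The sketch below, continuing from the fitted model above, retrieves the unprocessed "weight" scores; note that XGBoost keys them by generated names (f0, f1, ...) when the training data is a plain NumPy array.

# Access the underlying Booster and request the raw "weight" scores,
# i.e. the number of times each feature appears in a split
booster = model.get_booster()
raw_counts = booster.get_score(importance_type="weight")
# Dictionary keyed by feature name (f0, f1, ...); only features that were
# actually used in at least one split appear in the result
print(raw_counts)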
Finally, we print the feature importance scores along with their corresponding feature names using the feature_names attribute of the loaded dataset.
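The loop above prints features in dataset order, which makes the most influential ones hard to spot. A small variation, assuming NumPy is available, sorts them by score before printing:

import numpy as np

# Sort features from most to least frequently used in splits
order = np.argsort(importance_scores)[::-1]
for idx in order:
    print(f"{data.feature_names[idx]}: {importance_scores[idx]:.4f}")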
By setting the importance_type parameter to "weight" when creating an XGBoost model through the scikit-learn API, you can easily configure the model to calculate feature importance based on the number of times a feature is used to split the data. The feature_importances_ attribute lets you retrieve these scores after training, providing insight into the relative importance of each feature in the model’s decision-making process.
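For a quick visual check, XGBoost also ships a plot_importance helper that works directly on the fitted model. A minimal sketch, assuming matplotlib is installed and continuing from the model trained above, might look like this:

import matplotlib.pyplot as plt
from xgboost import plot_importance

# Plot the "weight" importance scores as a horizontal bar chart,
# keeping only the ten highest-ranked features
plot_importance(model, importance_type="weight", max_num_features=10)
plt.tight_layout()
plt.show()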