
XGBoost "total_cover" Feature Importance

XGBoost offers several methods for calculating feature importance, one of which is the “total_cover” method. The cover of a single split is the number of training samples affected by that split; a feature’s total cover is the sum of the cover of every split made on that feature across all trees in the model.

This example demonstrates how to configure XGBoost to use the “total_cover” method and retrieve the feature importance scores using XGBoost’s scikit-learn API.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Load the Breast Cancer Wisconsin dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an XGBClassifier with the importance_type parameter set to "total_cover"
model = XGBClassifier(n_estimators=100, learning_rate=0.1, importance_type="total_cover", random_state=42)

# Train the model
model.fit(X_train, y_train)

# Retrieve the "total_cover" feature importance scores
importance_scores = model.feature_importances_

# Print the feature importance scores along with feature names
for feature, score in zip(data.feature_names, importance_scores):
    print(f"{feature}: {score}")

In this example, we load the Breast Cancer Wisconsin dataset and split it into train and test sets.

We then create an instance of XGBoost’s scikit-learn-compatible XGBClassifier with the importance_type parameter set to "total_cover". This configures the model to calculate feature importance based on the total coverage of each feature across all trees.

After training the model on the training data, we retrieve the “total_cover” feature importance scores using the feature_importances_ attribute of the trained model. This attribute returns an array with one score per feature in the dataset; the scores are normalized so that they sum to 1.0, making them easy to compare as relative proportions.

Finally, we print the feature importance scores along with their corresponding feature names using the feature_names attribute of the loaded dataset.

By setting the importance_type parameter to "total_cover" when creating an XGBoost model with scikit-learn, you can easily configure the model to calculate feature importance based on the total coverage of each feature. The feature_importances_ attribute allows you to retrieve these scores after training, providing insights into the relative importance of each feature in the model’s decision-making process.
