XGBoost "cover" Feature Importance


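XGBoost offers several ways to calculate feature importance. The “cover” method scores each feature by the average coverage of the splits that use it, where the coverage of a split is the sum of the second-order gradients (Hessians) of the training instances that reach that split. In practice this is a measure of how many observations the feature's splits affect.

This example shows how to configure XGBoost to use the “cover” method and retrieve the feature importance scores through XGBoost’s scikit-learn compatible API.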

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Load the Breast Cancer Wisconsin dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an XGBClassifier with the importance_type parameter set to "cover"
model = XGBClassifier(n_estimators=100, learning_rate=0.1, importance_type="cover", random_state=42)

# Train the model
model.fit(X_train, y_train)

# Retrieve the "cover" feature importance scores
importance_scores = model.feature_importances_

# Print the feature importance scores along with feature names
for feature, score in zip(data.feature_names, importance_scores):
    print(f"{feature}: {score}")

The “cover” method differs from the “weight” method in how it calculates feature importance.

While the “weight” method counts how many times a feature is used to split the data across all trees, the “cover” method averages the coverage of those splits, where the coverage of a split is the sum of the second-order gradients (Hessians) of the training instances that reach it. A feature that is used in only a few splits, but in splits near the root where many instances are affected, can therefore rank low by “weight” and high by “cover”.

To configure XGBoost to use the “cover” method, set the importance_type parameter to "cover" when creating an instance of XGBClassifier. After training the model, the “cover” feature importance scores can be retrieved using the feature_importances_ attribute, just like in the “weight” method example.

Because importance_type is an ordinary constructor parameter, switching between feature importance methods in the scikit-learn API only requires changing its value. This lets you compare the results from different methods and build a more complete picture of the relative importance of the features in your model.

See Also