
XGBoost Feature Importance with get_fscore()

The get_fscore() method, available on XGBoost's trained Booster object, lets you retrieve feature importance scores from a model.

This example demonstrates how to use get_fscore() on a real dataset to obtain and interpret feature importance.

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
import xgboost as xgb

# Load the Wine dataset
data = load_wine()
X, y = data.data, data.target

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create DMatrix objects
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Set XGBoost parameters
params = {
    'objective': 'multi:softmax',
    'num_class': 3,
    'max_depth': 3,
    'learning_rate': 0.1,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
}

# Train the XGBoost model
model = xgb.train(params, dtrain, num_boost_round=100, evals=[(dtest, 'test')])

# Get feature importance scores
importance_scores = model.get_fscore()
print("Feature Importance Scores:")
for feature, score in importance_scores.items():
    print(f"{feature}: {score}")

# Sort feature importance scores in descending order
sorted_scores = sorted(importance_scores.items(), key=lambda x: x[1], reverse=True)
print("\nFeature Importance Scores (Sorted):")
for feature, score in sorted_scores:
    print(f"{feature}: {score}")

The get_fscore() method returns a dictionary mapping feature names to importance scores. Because the DMatrix here was built from a plain NumPy array without feature names, XGBoost generates names of the form 'f0', 'f1', ..., where the number is the column index.
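If you prefer human-readable names, you can map the generated keys back to the dataset's column names. A minimal sketch, assuming the training code above has already run:

# Map the auto-generated 'fN' keys back to the Wine dataset's column names
named_scores = {
    data.feature_names[int(key[1:])]: score  # 'f3' -> column index 3
    for key, score in importance_scores.items()
}
print("\nFeature Importance Scores (Named):")
for feature, score in named_scores.items():
    print(f"{feature}: {score}")

Alternatively, passing feature_names=data.feature_names when constructing the DMatrix makes get_fscore() return those names directly.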

These scores count the number of times a feature is used to split the data across all trees in the model, the importance type XGBoost calls 'weight'.
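The XGBoost documentation describes get_fscore() as equivalent to calling get_score() with the 'weight' importance type, which a quick check illustrates:

# get_fscore() is documented as equivalent to get_score(importance_type='weight')
assert model.get_fscore() == model.get_score(importance_type='weight')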

The relative magnitude of the scores indicates the relative importance of each feature. Features with higher scores have a greater influence on the model’s predictions.
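Because these raw counts grow with the number of trees and their depth, one way to make magnitudes easier to compare is to normalize them into fractions of all splits. A small sketch:

# Normalize the raw split counts so the scores sum to 1.0
total = sum(importance_scores.values())
normalized = {feature: score / total for feature, score in importance_scores.items()}
for feature, share in sorted(normalized.items(), key=lambda x: x[1], reverse=True):
    print(f"{feature}: {share:.3f}")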

After obtaining the scores, we sort them in descending order to make it easier to identify the most important features.

Keep in mind that XGBoost provides other importance types, such as 'gain' and 'cover', which offer different perspectives on feature importance. These are exposed through the related get_score() method and its importance_type parameter; get_fscore() itself always reports 'weight'.
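For example, the following sketch queries the same trained model for each importance type through get_score(); the definitions in the comments follow the XGBoost documentation:

# Compare importance types on the same trained model.
# 'weight': number of times a feature is used to split the data
# 'gain':   average gain (loss reduction) of the splits that use the feature
# 'cover':  average number of samples affected by splits on the feature
for imp_type in ('weight', 'gain', 'cover'):
    scores = model.get_score(importance_type=imp_type)
    print(f"\nImportance type: {imp_type}")
    for feature, score in sorted(scores.items(), key=lambda x: x[1], reverse=True):
        print(f"{feature}: {score:.2f}")

Note that 'gain' often ranks features differently from 'weight': a feature used in only a few highly informative splits can score high on 'gain' while scoring low on 'weight'.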

By utilizing the get_fscore() method, you can easily access feature importance information and use it to gain insights into your XGBoost models.



See Also