
XGBoost Configure "ndcg" Eval Metric

Normalized Discounted Cumulative Gain (NDCG) is a widely used metric for evaluating the performance of learning-to-rank models.

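It assesses the quality of a ranked list by considering both the relevance score of each item and its position in the list. Each item's gain is discounted by the logarithm of its position, the discounted gains are summed, and the total is normalized by the score of the ideal ordering, so NDCG falls between 0 and 1.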
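To make the definition concrete, here is a minimal sketch of NDCG for a single query in plain NumPy. It assumes the common formulation with exponential gain 2^rel - 1 and a log2(position + 1) discount for 1-based positions, which we believe matches XGBoost's default, and it scores a query with no relevant items as 1, mirroring XGBoost's documented default. The helper names `dcg_at_k` and `ndcg_at_k` are hypothetical and are not part of XGBoost.

```python
import numpy as np


def dcg_at_k(relevance, k=None):
    """Discounted cumulative gain using exponential gain (2**rel - 1)."""
    rel = np.asarray(relevance, dtype=float)
    if k is not None:
        rel = rel[:k]  # keep only the top-k positions
    gains = 2.0 ** rel - 1.0
    # 0-based position i is discounted by log2(i + 2), so the top item has discount 1
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(gains / discounts))


def ndcg_at_k(relevance, k=None):
    """DCG of the given order divided by DCG of the ideal (sorted) order."""
    ideal = sorted(relevance, reverse=True)
    idcg = dcg_at_k(ideal, k)
    if idcg == 0.0:
        # Assumption: a query with no relevant items scores 1.0
        return 1.0
    return dcg_at_k(relevance, k) / idcg


# Relevance labels of the items, listed in the order the model ranked them
print(round(ndcg_at_k([3, 2, 3, 0, 1]), 4))
```

For the order [3, 2, 3, 0, 1] the sketch gives about 0.957: the list is close to ideal, but one relevance-3 item sits below a relevance-2 item and a relevance-0 item sits above a relevance-1 item.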

This example demonstrates how to specify NDCG as the evaluation metric when training an XGBoost ranking model using the native API.

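Order-insensitive metrics such as accuracy ignore where items land in the list, whereas NDCG rewards placing the most relevant items first, which makes it a natural choice for tuning and comparing ranking models.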

import xgboost as xgb
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate a synthetic dataset for ranking
X, y = make_classification(n_samples=1000, n_classes=5, n_informative=5, n_clusters_per_class=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Group data points to form queries in a ranking context
# Group sizes must sum to the number of rows in each set: 800 training rows (80% of 1000) and 200 test rows
train_group_sizes = [200, 200, 200, 200]
test_group_sizes = [100, 100]

# Convert data into DMatrix and attach the group information required for ranking
dtrain = xgb.DMatrix(X_train, label=y_train)
dtrain.set_group(train_group_sizes)
dtest = xgb.DMatrix(X_test, label=y_test)
dtest.set_group(test_group_sizes)

# Define parameters for the model using the 'rank:pairwise' objective and 'ndcg' evaluation metric
params = {
    'objective': 'rank:pairwise',
    'eval_metric': 'ndcg',
    'learning_rate': 0.1,
    'gamma': 0.1,
    'min_child_weight': 0.1,
    'max_depth': 6
}

# Train the model
bst = xgb.train(params, dtrain, num_boost_round=100, evals=[(dtest, 'test')])

With 'ndcg' set as the eval_metric, XGBoost computes NDCG on every dataset passed in evals after each boosting round and reports it (here as test-ndcg).

This lets you monitor the model's performance during training and choose the number of boosting rounds at which the test NDCG peaks.

The NDCG scores show how well the model ranks items by relevance. NDCG ranges from 0 to 1, where 1 means the items appear in the ideal order; higher scores indicate better ranking quality, with more relevant items appearing at the top of the ranked list.
