Learning to rank is a machine learning technique used to rank a list of items based on their relevance to a given query.
It’s commonly used in applications such as search engines, recommender systems, and online advertising.
The “map” (Mean Average Precision) metric is a popular choice for evaluating ranking models with binary relevance labels. For each query it averages the precision measured at the position of every relevant item, then takes the mean over all queries, so placing relevant items higher in the list raises the score.
The example below uses the “map” metric with XGBoost’s native API to train and evaluate a ranking model on a synthetic dataset.
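To make the metric concrete, here is a minimal pure-Python sketch of Mean Average Precision over binary relevance labels. The function names and the two sample queries are hypothetical and serve only as illustration. Returning 0.0 for a query with no relevant items is an assumption of this sketch, and XGBoost's own implementation may treat that edge case differently.

```python
def average_precision(relevance):
    """Average precision for one ranked list of binary labels (1 = relevant).

    `relevance` holds the true labels in the order the model ranked the items.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    # Assumption of this sketch: a query with no relevant items scores 0.0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


# Two hypothetical queries with the same items but different orderings.
good_order = [1, 1, 0, 0]  # relevant items ranked first
poor_order = [0, 0, 1, 1]  # relevant items ranked last

print(average_precision(good_order))
print(average_precision(poor_order))
print(mean_average_precision([good_order, poor_order]))
```

Both lists contain the same two relevant items, yet the first ordering scores higher because the relevant items appear earlier. This sensitivity to position is what makes the metric suitable for ranking.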
import xgboost as xgb
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
# Generate a synthetic dataset for ranking
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, n_clusters_per_class=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Group data points to form queries in a ranking context
group_sizes = [200, 200, 200, 200] # Ensuring the sum of groups equals the number of training data points (80% of 1000)
# Convert data into DMatrix, specifying group information for ranking
dtrain = xgb.DMatrix(X_train, label=y_train)
dtrain.set_group(group_sizes)
dtest = xgb.DMatrix(X_test, label=y_test)
dtest.set_group([X_test.shape[0]])  # Evaluate the test rows as a single query
# Define parameters for the model using the 'rank:map' objective
params = {
    'objective': 'rank:map',
    'eval_metric': 'map',
    'learning_rate': 0.1,
    'gamma': 0.1,
    'min_child_weight': 0.1,
    'max_depth': 6
}
# Train the model
bst = xgb.train(params, dtrain, num_boost_round=100, evals=[(dtest, 'test')])
In this example, we first generate a synthetic dataset using scikit-learn’s make_classification function. We then split the data into training and test sets.
To simulate a ranking problem, we group the training data points into queries of equal size using the group_sizes list. Each entry gives the number of consecutive rows that belong to one query, so the sum of the group sizes must equal the number of training data points.
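In real data the group sizes are usually derived from a per-row query ID column instead of being written by hand. The sketch below shows one way to do that and to check the sum constraint before passing the sizes to set_group. The helper names and the sample query IDs are hypothetical, and the approach assumes the rows are already sorted so that each query's rows are contiguous.

```python
from itertools import groupby


def group_sizes_from_qids(qids):
    """Lengths of consecutive runs of equal query IDs.

    Assumes rows are sorted by query. If a query ID reappears later in the
    list, it is counted as a separate group.
    """
    return [len(list(run)) for _, run in groupby(qids)]


def check_group_sizes(group_sizes, n_rows):
    """Raise if the group sizes do not cover every row exactly once."""
    total = sum(group_sizes)
    if total != n_rows:
        raise ValueError(f"group sizes sum to {total}, expected {n_rows}")


# Hypothetical query IDs for six rows: three queries of sizes 3, 2 and 1.
qids = [7, 7, 7, 12, 12, 31]
sizes = group_sizes_from_qids(qids)
check_group_sizes(sizes, len(qids))
print(sizes)  # [3, 2, 1]
```

Running the check up front gives a readable error message when the sizes and the row count disagree.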
We convert the training data into an xgb.DMatrix object and attach the group information using the set_group method. This is crucial for the ranking objective, which needs to know which rows belong to the same query.
Next, we define the XGBoost parameters, setting the objective to 'rank:map' to optimize for the “map” metric and the eval_metric parameter to 'map' to report that metric on the test set after each boosting iteration. We then train the model using xgb.train.
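Once the model is trained, its predictions are per-row scores that only have meaning relative to other rows in the same query. The sketch below shows how such scores could be turned into a ranking for each group. The function name and the sample scores are hypothetical; in practice the scores would come from a call such as bst.predict(dtest).

```python
def rank_within_groups(scores, group_sizes):
    """For each group, return its row indices ordered from highest to lowest score."""
    rankings = []
    start = 0
    for size in group_sizes:
        rows = range(start, start + size)
        rankings.append(sorted(rows, key=lambda i: scores[i], reverse=True))
        start += size
    return rankings


# Hypothetical scores for five rows split into two queries of sizes 3 and 2.
scores = [0.2, 1.5, -0.3, 0.9, 0.1]
group_sizes = [3, 2]
print(rank_within_groups(scores, group_sizes))  # [[1, 0, 2], [3, 4]]
```

Scores are never compared across groups here, which mirrors how a ranking metric evaluates each query independently.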