The “pre” evaluation metric, which stands for precision at top N, measures what fraction of the N highest-ranked items are actually relevant, making it a useful tool for evaluating the quality of top-ranked items in ranking problems.
This example demonstrates how to use the “pre” metric with XGBoost’s native API to train and evaluate a ranking model.
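Before walking through the full example, here is a minimal sketch of what the metric computes: the fraction of the N highest-scored items that are actually relevant. The labels and scores below are made-up values used purely for illustration and are not part of the example that follows.

import numpy as np

# Hypothetical relevance labels (1 = relevant, 0 = not relevant) and model scores for one query
labels = np.array([1, 0, 1, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2])

n = 3
top_n = np.argsort(scores)[::-1][:n]       # indices of the N highest-scored items
precision_at_n = labels[top_n].sum() / n   # fraction of those items that are relevant
print(precision_at_n)                      # 2 of the top 3 are relevant -> 0.666...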
import xgboost as xgb
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
# Generate a synthetic binary-labeled dataset; the 0/1 class labels serve as relevance labels for ranking
X, y = make_classification(n_samples=1000, n_classes=2, n_informative=5, n_clusters_per_class=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Convert the data into DMatrix format (no group/qid information is set, so all rows are treated as one query group)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
# Define parameters for the model using the 'rank:pairwise' objective
params = {
    'objective': 'rank:pairwise',
    'eval_metric': 'pre',
    'learning_rate': 0.1,
    'gamma': 0.1,
    'min_child_weight': 0.1,
    'max_depth': 6
}
# Train the model and evaluate on the test set
bst = xgb.train(params, dtrain, num_boost_round=100, evals=[(dtest, 'test')])
In this example, we first generate a synthetic dataset suitable for a ranking problem using scikit-learn’s make_classification function. We then split the data into training and test sets.
Next, we convert the data into XGBoost’s DMatrix format, which is required for training and evaluating models using the native API.
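Note that this example does not attach any group or query information to the DMatrix, so XGBoost treats all rows as a single query. If your data contains multiple queries, you can pass per-query group sizes with DMatrix.set_group. The sketch below assumes a hypothetical layout in which the first 50 training rows belong to one query and the next 30 to another:

# Hypothetical multi-query setup: rows are ordered by query, with 50 rows
# for the first query and 30 for the second (sizes must sum to the row count).
dtrain_grouped = xgb.DMatrix(X_train[:80], label=y_train[:80])
dtrain_grouped.set_group([50, 30])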
We define the parameters for our XGBoost model, setting the objective to 'rank:pairwise' for ranking problems. Importantly, we set the 'eval_metric' parameter to 'pre', which tells XGBoost to use the precision at top N metric for evaluation during training.
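The cutoff N can usually be given directly in the metric name, for example 'pre@10' to score only the 10 highest-ranked items per query. The dictionary below is a variant of the parameters above with an explicit cutoff; treat the exact spelling as an assumption to verify against your XGBoost version.

# Same parameters as above, but with an explicit top-10 cutoff for the metric
params_at_10 = {
    'objective': 'rank:pairwise',
    'eval_metric': 'pre@10',
    'learning_rate': 0.1,
    'gamma': 0.1,
    'min_child_weight': 0.1,
    'max_depth': 6
}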
We then train the model for 100 rounds, evaluating on the test set at each iteration.
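If you want the per-round metric values as data rather than just console output, xgb.train can record them through its evals_result argument; a short sketch:

# Capture the 'pre' values computed on the test set at every boosting round
evals_result = {}
bst = xgb.train(params, dtrain, num_boost_round=100,
                evals=[(dtest, 'test')], evals_result=evals_result)
print(evals_result['test']['pre'][-1])  # metric value after the final round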
By using the “pre” evaluation metric, we can effectively train and evaluate our XGBoost ranking model, focusing on the quality of the top-ranked items.
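As a follow-up, the trained booster can be used to score new items and produce a ranking. The snippet below (not part of the original example) predicts scores for the test set and recomputes precision over the top 10 items by hand, treating the whole test set as one query:

# Predict relevance scores and check precision over the 10 highest-scored items
scores = bst.predict(dtest)
n = 10
top_n = np.argsort(scores)[::-1][:n]   # indices of the N highest-scored test rows
print("precision@%d: %.3f" % (n, y_test[top_n].sum() / n))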