When you train a binary classifier with XGBoost’s binary:logistic objective, the predict() method returns probabilities, not class labels.
To get class labels, you need to apply a threshold to these probabilities.
Here’s a quick example of how to convert probability predictions to class labels using XGBoost’s native Python API.
# XGBoosting.com
# Convert XGBoost probability predictions to class labels
import numpy as np
from sklearn.datasets import make_classification
import xgboost as xgb
# Generate a small synthetic dataset for binary classification
X, y = make_classification(n_samples=10, n_classes=2, random_state=42)
# Convert data to DMatrix format
dtrain = xgb.DMatrix(X, label=y)
# Set XGBoost parameters
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'error',
    'seed': 42
}
# Train the model
model = xgb.train(params, dtrain, num_boost_round=10)
# Get probability predictions
prob_preds = model.predict(dtrain)
# Apply a threshold to get class labels
threshold = 0.5
class_preds = (prob_preds > threshold).astype(int)
# Print the first 5 probability predictions and their class labels
print("Probabilities:", prob_preds[:5])
print("Class Labels: ", class_preds[:5])
The key steps:
- Convert your data to XGBoost’s DMatrix format.
- Set the objective parameter to 'binary:logistic' for binary classification.
- Train the model using xgb.train().
- Get probability predictions using model.predict().
- Apply a threshold (here, 0.5) to the probabilities.
- Convert the boolean result to integer type to get the class labels.
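The native predict() method does not apply a threshold itself. The value 0.5 is the conventional cutoff, and it is also what the 'error' evaluation metric and the scikit-learn wrapper’s XGBClassifier.predict() use. Adjust this value if you need to trade your model’s sensitivity against its specificity.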
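To see how moving the threshold shifts sensitivity and specificity, here is a minimal sketch using NumPy only. The labels and probabilities are made-up illustrative values standing in for y and model.predict(dtrain) from the example above, and sensitivity_specificity is a hypothetical helper, not part of XGBoost.

```python
import numpy as np

# Illustrative values only: made-up labels and predicted probabilities,
# standing in for y and model.predict(dtrain) from the example above.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
prob_preds = np.array([0.10, 0.35, 0.55, 0.20, 0.45, 0.65, 0.80, 0.90])

def sensitivity_specificity(y_true, probs, threshold):
    """Return (sensitivity, specificity) for labels obtained at `threshold`."""
    # Same conversion as in the main example: threshold, then cast to int
    preds = (probs > threshold).astype(int)
    tp = np.sum((preds == 1) & (y_true == 1))  # true positives
    fn = np.sum((preds == 0) & (y_true == 1))  # false negatives
    tn = np.sum((preds == 0) & (y_true == 0))  # true negatives
    fp = np.sum((preds == 1) & (y_true == 0))  # false positives
    return tp / (tp + fn), tn / (tn + fp)

# Compare a lower, the conventional, and a higher threshold
for threshold in (0.3, 0.5, 0.7):
    sens, spec = sensitivity_specificity(y_true, prob_preds, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy data, lowering the threshold catches more positives at the cost of more false alarms, and raising it does the reverse. In practice you would pick the threshold on a held-out validation set rather than on the training data.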