The "binary:logitraw"
objective in XGBoost is used for binary classification tasks when you need direct access to the model’s raw, untransformed scores.
These scores can be interpreted as log-odds ratios, which are the logarithm of the ratio between the probability of the positive class and the probability of the negative class.
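As a quick standalone illustration of that relationship (a minimal sketch using only NumPy; the example values are arbitrary):

```python
import numpy as np

# A raw score of 0 corresponds to even odds (probability 0.5);
# positive scores favor the positive class, negative scores the negative class.
for raw_score in [-2.0, 0.0, 1.386]:
    prob = 1 / (1 + np.exp(-raw_score))    # sigmoid: raw score -> probability
    log_odds = np.log(prob / (1 - prob))   # inverse: probability -> raw score
    print(f"raw={raw_score:+.3f}  prob={prob:.3f}  log-odds={log_odds:+.3f}")
```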
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
import numpy as np

# Generate a synthetic dataset for binary classification
X, y = make_classification(n_samples=1000, n_classes=2, n_features=10, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize an XGBClassifier with the "binary:logitraw" objective
model = XGBClassifier(objective="binary:logitraw", n_estimators=100, learning_rate=0.1)

# Fit the model on the training data
model.fit(X_train, y_train)

# Request the raw (untransformed) margin scores on the test set;
# without output_margin=True, predict() returns class labels instead
y_pred_raw = model.predict(X_test, output_margin=True)

# Apply the sigmoid function to the raw scores to get probabilities
y_pred_prob = 1 / (1 + np.exp(-y_pred_raw))

# Convert probabilities to class labels
y_pred = (y_pred_prob > 0.5).astype(int)

# Calculate the accuracy of the predictions
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
```
It’s important to note that the "binary:logitraw" objective outputs raw scores, not probabilities. To obtain probabilities, apply the sigmoid function, sigmoid(x) = 1 / (1 + exp(-x)), which maps the raw scores to the range (0, 1); the result can be interpreted as the probability of the positive class.
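If SciPy is available, scipy.special.expit provides a numerically stable sigmoid that avoids overflow for scores with large magnitude. A minimal sketch, reusing y_pred_raw from the example above:

```python
from scipy.special import expit  # numerically stable sigmoid

# expit(x) computes 1 / (1 + exp(-x)) without overflow for large |x|
y_pred_prob = expit(y_pred_raw)

# Thresholding probabilities at 0.5 is equivalent to thresholding raw scores at 0
y_pred = (y_pred_prob > 0.5).astype(int)
```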
When using the "binary:logitraw"
objective, consider the following tips:
- Note that tree-based models like XGBoost are largely insensitive to feature scaling, since splits depend only on the ordering of values; scaling mainly matters when comparing against scale-sensitive models.
- Use appropriate evaluation metrics for binary classification, such as accuracy, precision, recall, F1-score, and ROC AUC, to assess the model’s performance (see the sketch after this list).
- Tune hyperparameters like max_depth, learning_rate, and n_estimators to optimize performance.
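A sketch of the evaluation tip above, reusing model, X_test, and y_test from the example: ROC AUC depends only on how the scores rank the examples, so the raw (pre-sigmoid) scores can be passed to it directly, while threshold-based metrics need hard labels.

```python
from sklearn.metrics import roc_auc_score, precision_score, recall_score, f1_score

# ROC AUC is rank-based, so raw scores work without converting to probabilities
raw_scores = model.predict(X_test, output_margin=True)
print(f"ROC AUC:   {roc_auc_score(y_test, raw_scores):.4f}")

# Threshold-based metrics need hard labels; a raw score of 0 is the decision
# boundary, since it maps to probability 0.5 under the sigmoid
y_pred = (raw_scores > 0).astype(int)
print(f"Precision: {precision_score(y_test, y_pred):.4f}")
print(f"Recall:    {recall_score(y_test, y_pred):.4f}")
print(f"F1-score:  {f1_score(y_test, y_pred):.4f}")
```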