Configure XGBoost Objective "binary:logistic" vs "binary:logitraw"

This example explores the differences between the XGBoost objectives "binary:logistic" and "binary:logitraw".

We’ll explain when to use each objective and how it affects model output and performance, then walk through a complete Python example that illustrates the distinction on a synthetic dataset.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_classes=2, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model with binary:logistic: outputs are probabilities, which predict() thresholds at 0.5
model_logistic = XGBClassifier(objective='binary:logistic', n_estimators=100, learning_rate=0.1)
model_logistic.fit(X_train, y_train)
y_pred_logistic = model_logistic.predict(X_test)
acc_logistic = accuracy_score(y_test, y_pred_logistic)

# Model with binary:logitraw: outputs are raw scores (log-odds), not probabilities
model_logitraw = XGBClassifier(objective='binary:logitraw', n_estimators=100, learning_rate=0.1)
model_logitraw.fit(X_train, y_train)
# Request the raw scores explicitly and threshold them at 0, the log-odds
# equivalent of probability 0.5. Depending on the XGBoost version, a plain
# predict() call may apply a 0.5 cutoff to these raw scores instead.
raw_scores = model_logitraw.predict(X_test, output_margin=True)
y_pred_logitraw = (raw_scores > 0).astype(int)
acc_logitraw = accuracy_score(y_test, y_pred_logitraw)

print(f"Accuracy with 'binary:logistic': {acc_logistic:.4f}")
print(f"Accuracy with 'binary:logitraw': {acc_logitraw:.4f}")

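Analysis of outcomes: The two accuracy figures are typically similar, because both objectives train against the same logistic loss. What differs is the scale of the values each model returns: "binary:logistic" applies the logistic (sigmoid) transformation and returns probabilities between 0 and 1, while "binary:logitraw" returns the untransformed scores.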

The logistic objective returns probability estimates of class membership, which suits applications that need the likelihood of an outcome. In contrast, the logitraw objective returns scores before the logistic transformation, on the log-odds scale, where a score of 0 corresponds to a probability of 0.5. Raw scores can be useful for custom threshold tuning or as input to other probabilistic methods.

Best Practices and Tips: When deciding between these two objectives, consider your end goal. Use "binary:logistic" when you need outputs that can be read directly as probabilities.

Opt for "binary:logitraw" when you need raw scores for further calibration or specific threshold adjustments.

Regardless of the objective, always ensure that your data is well-preprocessed and consider hyperparameter tuning, focusing on learning_rate and max_depth, to optimize the performance of your model.
