The "binary:logitraw"
objective in XGBoost is used for binary classification tasks when you need direct access to the model’s raw, untransformed scores.
These scores can be interpreted as log-odds ratios, which are the logarithm of the ratio between the probability of the positive class and the probability of the negative class.
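As a quick standalone illustration of that relationship (a minimal sketch using only NumPy; the example values are arbitrary):

```python
import numpy as np

# A raw score of 0 corresponds to even odds (probability 0.5);
# positive scores favor the positive class, negative scores the negative class.
for raw_score in [-2.0, 0.0, 1.386]:
    prob = 1 / (1 + np.exp(-raw_score))    # sigmoid: raw score -> probability
    log_odds = np.log(prob / (1 - prob))   # inverse: probability -> raw score
    print(f"raw={raw_score:+.3f}  prob={prob:.3f}  log-odds={log_odds:+.3f}")
```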
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
import numpy as np

# Generate a synthetic dataset for binary classification
X, y = make_classification(n_samples=1000, n_classes=2, n_features=10, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize an XGBClassifier with the "binary:logitraw" objective
model = XGBClassifier(objective="binary:logitraw", n_estimators=100, learning_rate=0.1)

# Fit the model on the training data
model.fit(X_train, y_train)

# Request the raw (untransformed) margin scores on the test set;
# without output_margin=True, predict() returns class labels instead
y_pred_raw = model.predict(X_test, output_margin=True)

# Apply the sigmoid function to the raw scores to get probabilities
y_pred_prob = 1 / (1 + np.exp(-y_pred_raw))

# Convert probabilities to class labels
y_pred = (y_pred_prob > 0.5).astype(int)

# Calculate the accuracy of the predictions
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
```
It’s important to note that the "binary:logitraw" objective outputs raw scores, not probabilities. To obtain probabilities, apply the sigmoid function, sigmoid(x) = 1 / (1 + exp(-x)), which maps the raw scores to the range (0, 1); the result can be interpreted as the probability of the positive class.
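If SciPy is available, scipy.special.expit provides a numerically stable sigmoid that avoids overflow for scores with large magnitude. A minimal sketch, reusing y_pred_raw from the example above:

```python
from scipy.special import expit  # numerically stable sigmoid

# expit(x) computes 1 / (1 + exp(-x)) without overflow for large |x|
y_pred_prob = expit(y_pred_raw)

# Thresholding probabilities at 0.5 is equivalent to thresholding raw scores at 0
y_pred = (y_pred_prob > 0.5).astype(int)
```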
When using the "binary:logitraw"
objective, consider the following tips:
- Note that tree-based models like XGBoost are largely insensitive to feature scaling, since splits depend only on the ordering of values; scaling mainly matters when comparing against scale-sensitive models.
- Use appropriate evaluation metrics for binary classification, such as accuracy, precision, recall, F1-score, and ROC AUC, to assess the model’s performance (see the sketch after this list).
- Tune hyperparameters like max_depth, learning_rate, and n_estimators to optimize performance.
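A sketch of the evaluation tip above, reusing model, X_test, and y_test from the example: ROC AUC depends only on how the scores rank the examples, so the raw (pre-sigmoid) scores can be passed to it directly, while threshold-based metrics need hard labels.

```python
from sklearn.metrics import roc_auc_score, precision_score, recall_score, f1_score

# ROC AUC is rank-based, so raw scores work without converting to probabilities
raw_scores = model.predict(X_test, output_margin=True)
print(f"ROC AUC:   {roc_auc_score(y_test, raw_scores):.4f}")

# Threshold-based metrics need hard labels; a raw score of 0 is the decision
# boundary, since it maps to probability 0.5 under the sigmoid
y_pred = (raw_scores > 0).astype(int)
print(f"Precision: {precision_score(y_test, y_pred):.4f}")
print(f"Recall:    {recall_score(y_test, y_pred):.4f}")
print(f"F1-score:  {f1_score(y_test, y_pred):.4f}")
```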