The "multi:softprob" objective in XGBoost is used for multi-class classification problems, where the target is a categorical variable with more than two classes.
This objective outputs a vector of class probabilities for each input sample, which is obtained by applying the softmax function to the raw predicted scores.
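Here is a minimal sketch of that transformation. The NumPy snippet applies a row-wise softmax to a small matrix of raw scores. The values in raw_scores are made up for illustration and are not XGBoost output. Each row becomes a probability vector that sums to 1, and the class with the largest raw score keeps the largest probability.

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax: turns raw per-class scores into probabilities."""
    # Subtracting the row maximum improves numerical stability without changing the result.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Hypothetical raw scores (margins) for two samples and three classes.
raw_scores = np.array([
    [2.0, 0.5, -1.0],
    [0.1, 0.2, 0.3],
])

probs = softmax(raw_scores)
print(probs.round(3))
print(probs.sum(axis=1))  # each row sums to 1
```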
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Generate a synthetic dataset for multi-class classification
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize an XGBClassifier with the "multi:softprob" objective.
# The scikit-learn wrapper infers the number of classes from y_train, so num_class is not needed.
model = XGBClassifier(objective="multi:softprob", n_estimators=100, learning_rate=0.1)

# Fit the model on the training data
model.fit(X_train, y_train)

# predict() returns the most probable class label; predict_proba() returns one probability per class
y_pred = model.predict(X_test)
y_pred_proba = model.predict_proba(X_test)

# Print predicted class probabilities for a few test samples
print("Predicted class probabilities:")
print(y_pred_proba[:5])

# Calculate the accuracy of the predictions
accuracy = accuracy_score(y_test, y_pred)
print(f"\nAccuracy: {accuracy:.4f}")
```
Use the "multi:softprob" objective when you need a probability estimate for each class in a multi-class problem. This is what distinguishes it from the "multi:softmax" objective. That objective trains with the same softmax loss but outputs only the predicted class label (the class with the highest score) rather than the full probability vector.
When using the "multi:softprob" objective, consider the following tips:
- Ensure that the target variable is appropriately encoded as integers starting from 0.
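- If the dataset is imbalanced, note that the "scale_pos_weight" parameter applies only to binary classification. For multi-class problems, pass per-sample weights to "fit" through the "sample_weight" argument to balance the importance of each class.
- Rather than always taking the most probable class, you can apply custom decision thresholds to the predicted probabilities to optimize for particular metrics.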
- Evaluate the model’s performance using log loss in addition to accuracy. Log loss scores the predicted probabilities themselves and penalizes confident wrong predictions, whereas accuracy only checks the final labels.
By using the "multi:softprob" objective, you obtain class probability estimates for each input sample. These estimates are useful when you need to rank the classes by likelihood or want to set custom decision thresholds suited to the problem at hand.
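The snippet below illustrates ranking and custom thresholds on a small, hypothetical probability matrix of the kind predict_proba returns. The 0.35 threshold for class 2 is an arbitrary example rule, not a recommended value. In practice you would tune it on validation data for the metric you care about.

```python
import numpy as np

# Hypothetical predict_proba output for three samples and three classes.
proba = np.array([
    [0.50, 0.30, 0.20],
    [0.10, 0.55, 0.35],
    [0.40, 0.25, 0.35],
])

# Rank classes from most to least likely for each sample.
ranking = np.argsort(-proba, axis=1)
print(ranking)

# Hypothetical custom rule: choose class 2 whenever its probability is at least 0.35,
# otherwise fall back to the most likely class.
threshold = 0.35
decisions = np.where(proba[:, 2] >= threshold, 2, proba.argmax(axis=1))
print(decisions)
```

Here the rule changes the second and third decisions from classes 1 and 0 to class 2. That is the kind of trade-off you might accept to catch more class-2 cases.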