Training a separate XGBoost model for each label in a multi-label classification task gives you the flexibility to tune each model's parameters individually for its label.
Predictions are then generated by combining the outputs of all the models.
# XGBoosting.com
# XGBoost for Multi-Label Classification Manually
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
# Generate a synthetic multi-label dataset
X, y = make_multilabel_classification(n_samples=1000, n_classes=5, n_labels=2,
                                      allow_unlabeled=True, random_state=42)
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Function to train an XGBoost model for a single label
def train_model(X, y):
    model = XGBClassifier(n_estimators=10, objective='binary:logistic', random_state=42)
    model.fit(X, y)
    return model
# Train a separate model for each label
models = []
for i in range(y_train.shape[1]):
    model = train_model(X_train, y_train[:, i])
    models.append(model)
# Generate predictions by applying each label's model
predictions = []
for model in models:
    pred = model.predict(X_test)
    predictions.append(pred)
# Combine predictions from all models
final_predictions = np.array(predictions).T
print(final_predictions[:5])
By training a separate XGBoost model for each label, you can customize the parameters of each model to better suit the characteristics of that particular label. The train_model function encapsulates the process of initializing and training an XGBClassifier for a single label.
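For instance, you could pass each label its own parameter dictionary. The sketch below reuses X_train and y_train from the example above; the train_model_tuned helper and the label_params values are hypothetical placeholders, not settings tuned for this dataset.
# Sketch: per-label parameter customization (hypothetical values)
def train_model_tuned(X, y, params):
    model = XGBClassifier(objective='binary:logistic', random_state=42, **params)
    model.fit(X, y)
    return model
# Hypothetical per-label settings, e.g. more trees for a harder label
label_params = [
    {'n_estimators': 10},
    {'n_estimators': 20, 'max_depth': 3},
    {'n_estimators': 10},
    {'n_estimators': 50, 'learning_rate': 0.1},
    {'n_estimators': 10},
]
tuned_models = [train_model_tuned(X_train, y_train[:, i], label_params[i])
                for i in range(y_train.shape[1])]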
The trained models are stored in a list, which is then iterated over to generate predictions for each label. Finally, the per-label predictions are stacked and transposed so that each row of the final array corresponds to a sample and each column to a label.
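To see how well each per-label model performs, you can score each column of the combined array separately. This sketch reuses y_test and final_predictions from the example above.
# Sketch: evaluate each label's predictions independently
from sklearn.metrics import accuracy_score
for i in range(y_test.shape[1]):
    acc = accuracy_score(y_test[:, i], final_predictions[:, i])
    print(f"Label {i} accuracy: {acc:.3f}")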
This approach offers more control over the individual models but may be more computationally expensive than using XGBoost’s built-in multi-label classification capabilities.
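For comparison, here is a minimal sketch of the built-in route. It assumes a recent XGBoost release (1.6 or later), where XGBClassifier can be fit directly on a 2D binary label matrix when using the hist tree method.
# Sketch: native multi-label fit on the full 2D label matrix (XGBoost >= 1.6)
single_model = XGBClassifier(n_estimators=10, objective='binary:logistic',
                             tree_method='hist', random_state=42)
single_model.fit(X_train, y_train)
native_predictions = single_model.predict(X_test)
print(native_predictions[:5])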