
XGBoost for Multi-Label Classification Manually

Training a separate XGBoost model for each label in a multi-label classification task provides the flexibility to tune hyperparameters individually for each label.

Predictions are then generated by combining the outputs from each model.

# XGBoosting.com
# XGBoost for Multi-Label Classification Manually
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Generate a synthetic multi-label dataset
X, y = make_multilabel_classification(n_samples=1000, n_classes=5, n_labels=2,
                                      allow_unlabeled=True, random_state=42)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Function to train an XGBoost model for a single label
def train_model(X, y):
    model = XGBClassifier(n_estimators=10, objective='binary:logistic', random_state=42)
    model.fit(X, y)
    return model

# Train a separate model for each label
models = []
for i in range(y_train.shape[1]):
    model = train_model(X_train, y_train[:, i])
    models.append(model)

# Generate predictions by applying each label's model
predictions = []
for model in models:
    pred = model.predict(X_test)
    predictions.append(pred)

# Combine predictions from all models
final_predictions = np.array(predictions).T

print(final_predictions[:5])

By training a separate XGBoost model for each label (a strategy often called binary relevance), you can customize the parameters of each model to better suit the characteristics of that particular label. The train_model function encapsulates the process of initializing and training an XGBClassifier on a single binary target.
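For example, if one label has far fewer positive examples than the others, its model can be given different settings. The sketch below reuses X_train and y_train from above; the per-label parameter choices, including the scale_pos_weight value, are purely illustrative and would normally come from tuning each label's model separately.

# Hypothetical per-label settings; in practice these come from per-label tuning
per_label_params = [
    {'n_estimators': 10},                         # label 0
    {'n_estimators': 50, 'max_depth': 3},         # label 1
    {'n_estimators': 10, 'scale_pos_weight': 4},  # label 2 (assumed rare positives)
    {'n_estimators': 20},                         # label 3
    {'n_estimators': 10},                         # label 4
]

tuned_models = []
for i, params in enumerate(per_label_params):
    model = XGBClassifier(objective='binary:logistic', random_state=42, **params)
    model.fit(X_train, y_train[:, i])
    tuned_models.append(model)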

The trained models are stored in a list, which is looped over to generate a prediction vector for each label. Finally, the per-label predictions are stacked and transposed so that each row of final_predictions holds the complete label vector for one test sample.
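To quantify how well the combined predictions match the true label matrix, standard multi-label metrics from scikit-learn can be applied directly to final_predictions; a minimal sketch, continuing from the code above:

from sklearn.metrics import accuracy_score, hamming_loss

# Hamming loss: fraction of individual label assignments that are wrong
print('Hamming loss:', hamming_loss(y_test, final_predictions))
# Subset accuracy: fraction of samples whose entire label vector is exactly right
print('Subset accuracy:', accuracy_score(y_test, final_predictions))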

This approach offers more control over the individual models but may be more computationally expensive than using XGBoost’s built-in multi-label classification capabilities.
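For reference, newer XGBoost releases (1.6 and later) can fit a single estimator on the full 2D label matrix when tree_method='hist' is used, which is the built-in alternative mentioned above; a minimal sketch, assuming such a version is installed:

# Built-in multi-label support: pass the 2D label matrix directly.
# Requires XGBoost 1.6+ and tree_method='hist'.
clf = XGBClassifier(n_estimators=10, tree_method='hist', random_state=42)
clf.fit(X_train, y_train)
print(clf.predict(X_test)[:5])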


