
XGBoost for Multi-Label Classification With MultiOutputClassifier

XGBoost can be combined with scikit-learn's MultiOutputClassifier to train a separate binary XGBoost model for each label in a multi-label classification task.

This approach allows you to leverage XGBoost’s performance while handling each label independently.

# XGBoosting.com
# Fit an XGBoost Model for Multi-Label Classification using MultiOutputClassifier
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from xgboost import XGBClassifier

# Generate a synthetic multi-label dataset
X, y = make_multilabel_classification(n_samples=1000, n_classes=5, n_labels=2, allow_unlabeled=True, random_state=42)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize XGBClassifier
base_classifier = XGBClassifier(
    n_estimators=10,
    objective='binary:logistic',
    random_state=42
)

# Wrap the base classifier in a MultiOutputClassifier
multi_label_classifier = MultiOutputClassifier(base_classifier)

# Fit the MultiOutputClassifier
multi_label_classifier.fit(X_train, y_train)

# Generate predictions
predictions = multi_label_classifier.predict(X_test)

print(predictions[:5])
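
To gauge how well the wrapped model performs, you can score the held-out predictions with standard multi-label metrics. A minimal sketch, reusing y_test and predictions from the example above (on multi-label indicator arrays, accuracy_score computes subset accuracy):

# Evaluate the multi-label predictions on the test set
from sklearn.metrics import accuracy_score, hamming_loss

# Subset accuracy: fraction of samples where the full label set is correct
print(f"Subset accuracy: {accuracy_score(y_test, predictions):.3f}")

# Hamming loss: fraction of individual label assignments that are wrong
print(f"Hamming loss: {hamming_loss(y_test, predictions):.3f}")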

By wrapping an XGBClassifier in scikit-learn’s MultiOutputClassifier, you can train a separate XGBoost model for each label in your multi-label classification task. This approach is straightforward to implement and can be effective when the labels are not strongly correlated.
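
Because MultiOutputClassifier stores one fitted XGBClassifier per label in its estimators_ attribute, you can inspect or evaluate each label's model on its own. A short sketch, assuming the fitted multi_label_classifier and the train/test split from the example above:

# Access the per-label XGBoost models fitted by MultiOutputClassifier
from sklearn.metrics import accuracy_score

for i, estimator in enumerate(multi_label_classifier.estimators_):
    # Each estimator is a standalone XGBClassifier for label column i
    label_accuracy = accuracy_score(y_test[:, i], estimator.predict(X_test))
    print(f"Label {i}: accuracy={label_accuracy:.3f}")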

Note that this method may not capture dependencies between labels as effectively as XGBoost’s built-in multi-output tree strategy. However, it can still be a valuable approach, particularly when you need more fine-grained control over the individual models for each label.
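
If you do want label dependencies modeled jointly, a hedged sketch of the built-in strategy follows, assuming XGBoost 2.0 or later (the multi_output_tree strategy requires tree_method='hist') and the same X_train/y_train as above:

# Native multi-label training with XGBoost's multi-output tree strategy
# (assumes XGBoost >= 2.0; requires the 'hist' tree method)
from xgboost import XGBClassifier

native_classifier = XGBClassifier(
    n_estimators=10,
    tree_method='hist',
    multi_strategy='multi_output_tree',
    random_state=42
)
native_classifier.fit(X_train, y_train)
print(native_classifier.predict(X_test)[:5])

Here a single model's trees split on all labels at once, which can help when labels co-occur, at the cost of the per-label control that MultiOutputClassifier provides.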


