
Configure XGBoost "multi_strategy" Parameter

The multi_strategy parameter in XGBoost enables efficient handling of multi-output regression and multi-label classification tasks.

By setting tree_method='hist' and multi_strategy='multi_output_tree', you can configure your XGBoost model for these kinds of problems.

This configuration leverages histogram-based split finding and constructs a single tree for all outputs or labels.
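For context, XGBoost's default setting is multi_strategy='one_output_per_tree', which grows a separate tree for every output at each boosting round. A minimal sketch of the two configurations side by side (both values are accepted by XGBoost 2.0 and later, and multi_output_tree additionally requires the hist tree method):

from xgboost import XGBRegressor

# Default strategy: one tree per output at each boosting round
model_separate = XGBRegressor(tree_method='hist', multi_strategy='one_output_per_tree')

# One multi-output tree per round, shared across all outputs
model_shared = XGBRegressor(tree_method='hist', multi_strategy='multi_output_tree')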

Multi-Label Classification

from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, f1_score

# Generate a multi-label classification dataset
X, y = make_multilabel_classification(n_samples=1000, n_classes=5, n_labels=2, allow_unlabeled=True, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize an XGBoost classifier with multi_strategy='multi_output_tree'
model = XGBClassifier(n_estimators=100, tree_method='hist', multi_strategy='multi_output_tree')

# Train the model on the multi-label dataset
model.fit(X_train, y_train)

# Make predictions using the trained model
y_pred = model.predict(X_test)

# Evaluate the model's performance using accuracy and F1 score
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='micro')

print(f"Accuracy: {accuracy:.4f}")
print(f"F1 Score (micro): {f1:.4f}")

In this example, we generate a synthetic multi-label classification dataset using make_multilabel_classification from scikit-learn. We then split the dataset into training and test sets.
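It can help to confirm the shape of the target before training: make_multilabel_classification returns y as a binary indicator matrix with one column per label, so after the 80/20 split y_train has shape (800, 5).

print(y_train.shape)  # (800, 5): one 0/1 indicator column per label
print(y_train[:3])    # each row is the label set for one sample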

Next, we initialize an XGBoost classifier with tree_method='hist' and multi_strategy='multi_output_tree'. This configuration enables the model to efficiently handle the multi-label task by constructing a single tree for all labels.

We train the model on the multi-label dataset using model.fit() and make predictions on the test set using model.predict().
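If you need per-label probabilities rather than hard 0/1 predictions, for example to tune a decision threshold, predict_proba is also available on the classifier. A short sketch, assuming that for multi-label input it returns one positive-class probability per label with shape (n_samples, n_labels); the 0.4 threshold below is a hypothetical value you would tune on held-out data:

# Per-label positive-class probabilities (assumed shape: (n_samples, n_labels))
y_proba = model.predict_proba(X_test)

# Replace the default 0.5 cut-off with a custom threshold
threshold = 0.4  # hypothetical value; tune on a validation set
y_pred_thresholded = (y_proba >= threshold).astype(int)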

Finally, we evaluate the model's performance using two common metrics for multi-label classification: accuracy and F1 score. Note that for multi-label targets, scikit-learn's accuracy_score computes subset accuracy: a sample only counts as correct if its entire label set is predicted exactly. The F1 score is the harmonic mean of precision and recall; here we use the 'micro' average, which computes the metric globally by counting the total true positives, false negatives, and false positives across all labels.
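Other scikit-learn metrics can round out the picture: Hamming loss reports the fraction of individual label assignments that are wrong (lower is better), and a macro-averaged F1 weights every label equally regardless of frequency. Continuing the example:

from sklearn.metrics import hamming_loss

# Fraction of individual label assignments that are incorrect (lower is better)
hamming = hamming_loss(y_test, y_pred)

# Macro average: compute F1 per label, then take the unweighted mean
f1_macro = f1_score(y_test, y_pred, average='macro')

print(f"Hamming Loss: {hamming:.4f}")
print(f"F1 Score (macro): {f1_macro:.4f}")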


Multi-Output Regression

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# Generate a multi-output regression dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize an XGBoost regressor with multi_strategy='multi_output_tree'
model = XGBRegressor(n_estimators=100, tree_method='hist', multi_strategy='multi_output_tree')

# Train the model on the multi-output dataset
model.fit(X_train, y_train)

# Make predictions using the trained model
y_pred = model.predict(X_test)

# Evaluate the model's performance using mean squared error
mse = mean_squared_error(y_test, y_pred)

print(f"Mean Squared Error: {mse:.4f}")

In this multi-output regression example, we generate a synthetic dataset using make_regression from scikit-learn, specifying n_targets=3 to create a multi-output problem. We then split the dataset into training and test sets.

We initialize an XGBoost regressor with tree_method='hist' and multi_strategy='multi_output_tree'. This configuration allows the model to handle the multi-output regression task by constructing a single tree for all outputs.
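For comparison, the default multi_strategy='one_output_per_tree' grows a separate tree per target at each round; training a baseline this way on the same split makes the effect of the setting visible. A minimal sketch, reusing the variables and imports from the example above:

# Baseline with the default strategy: one tree per target per round
model_separate = XGBRegressor(n_estimators=100, tree_method='hist', multi_strategy='one_output_per_tree')
model_separate.fit(X_train, y_train)

mse_separate = mean_squared_error(y_test, model_separate.predict(X_test))
print(f"MSE (one_output_per_tree): {mse_separate:.4f}")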

We train the model on the multi-output dataset using model.fit() and make predictions on the test set using model.predict().

To evaluate the model’s performance, we use the mean squared error (MSE) metric, which measures the average squared difference between the predicted and actual values across all outputs.
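Because a single averaged MSE can mask uneven performance across targets, mean_squared_error also accepts multioutput='raw_values' to report one error per output. Continuing the example:

# One MSE per target instead of a single averaged value
mse_per_output = mean_squared_error(y_test, y_pred, multioutput='raw_values')

for i, err in enumerate(mse_per_output):
    print(f"MSE for output {i}: {err:.4f}")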


By setting multi_strategy='multi_output_tree' in combination with tree_method='hist', XGBoost can efficiently handle multi-output regression and multi-label classification tasks, optimizing performance and resource usage.
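Note that multi_strategy was introduced in XGBoost 2.0, where the multi_output_tree strategy is documented as experimental; a quick version check avoids surprises on older installations:

import xgboost

# multi_strategy='multi_output_tree' requires XGBoost 2.0 or later
print(xgboost.__version__)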


