XGBoost offers native support for multiple output regression (multi-output regression) through the `tree_method="hist"` and `multi_strategy="multi_output_tree"` parameters, available in XGBoost 2.0 and later. With these parameters, you can efficiently train an XGBoost model to predict multiple continuous target variables simultaneously, without relying on external wrappers like scikit-learn's `MultiOutputRegressor`.
This example demonstrates how to train an XGBoost model for multiple output regression using the native support provided by XGBoost. We’ll generate a synthetic dataset, prepare the data, initialize the model with the appropriate parameters, train it, and evaluate its performance.
# XGBoosting.com
# Train an XGBoost Model for Multiple Output Regression with Native Support
from xgboost import XGBRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Generate a synthetic multi-output regression dataset
X, y = make_regression(n_samples=1000,
                       n_features=10,
                       n_targets=3,
                       noise=0.1,
                       random_state=42,
                       n_informative=5)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize an XGBRegressor model with native multi-output support
model = XGBRegressor(n_estimators=100,
                     learning_rate=0.1,
                     tree_method="hist",
                     multi_strategy="multi_output_tree",
                     random_state=42)
# Fit the XGBRegressor on the training data
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model's performance using mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
Here's how it works:
- Generate a synthetic multi-output regression dataset with 10 input features and 3 output targets.
- Split the data into training and testing sets using `train_test_split`.
- Initialize an `XGBRegressor` model with `tree_method="hist"` and `multi_strategy="multi_output_tree"` for native multi-output support.
- Fit the `XGBRegressor` on the training data using `fit()`.
- Make predictions on the test set using `predict()`.
- Evaluate the model's performance using Mean Squared Error (MSE); a per-target breakdown is sketched after this list.
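The single MSE value above averages the error across all three targets. To see how the model does on each output separately, scikit-learn's `mean_squared_error` accepts `multioutput="raw_values"`. A minimal sketch, assuming `y_test` and `y_pred` from the example above are in scope:

```python
from sklearn.metrics import mean_squared_error

# multioutput="raw_values" returns one MSE per output column
# instead of averaging across all targets
per_target_mse = mean_squared_error(y_test, y_pred, multioutput="raw_values")
for i, err in enumerate(per_target_mse):
    print(f"Target {i}: MSE = {err:.4f}")
```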
By setting `tree_method="hist"` and `multi_strategy="multi_output_tree"`, XGBoost handles the multiple output regression task internally, without the need for external wrappers. This approach can provide improved performance and simplify the code required for training and prediction.
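For comparison, here is roughly what the wrapper-based alternative looks like. This is a sketch reusing `X_train`, `y_train`, and `X_test` from the example above; `MultiOutputRegressor` fits one independent XGBoost model per target, rather than a single model with multi-output trees:

```python
from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor

# Wrapper approach: one separate XGBoost model is fit per target column
wrapped = MultiOutputRegressor(
    XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
)
wrapped.fit(X_train, y_train)
y_pred_wrapped = wrapped.predict(X_test)
```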
This example serves as a starting point for training XGBoost models for multi-output regression using native support. Depending on your specific dataset and requirements, you may need to preprocess the data, tune hyperparameters, or use different evaluation metrics to achieve optimal results.
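For instance, hyperparameter tuning works the same way as for single-output regression, because the native multi-output model follows the scikit-learn estimator interface. A minimal sketch using `GridSearchCV`, assuming `X_train` and `y_train` from the example above are in scope; the grid values are illustrative, not tuned recommendations:

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

# Illustrative grid; these values are placeholders, not recommendations
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 6],
    "learning_rate": [0.05, 0.1],
}

search = GridSearchCV(
    XGBRegressor(tree_method="hist",
                 multi_strategy="multi_output_tree",
                 random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",  # MSE averaged across all targets
    cv=3,
)
search.fit(X_train, y_train)
print(search.best_params_)
```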