XGBoost for Multivariate Time Series Forecasting

This example demonstrates how to train an XGBoost model for multivariate time series forecasting, where lagged values of multiple input time series are used to predict the next value of a single target series.

We’ll cover data preparation, model initialization, training, and making predictions using a synthetic dataset.

# XGBoosting.com
# Train an XGBoost Model for Multivariate Time Series Forecasting
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# Generate a synthetic multivariate time series dataset
np.random.seed(42)  # Seed NumPy so the synthetic noise is reproducible
t = np.arange(200)  # Shared time index for all three series
series1 = np.sin(0.1 * t) + np.random.randn(200) * 0.1
series2 = np.cos(0.2 * t) + np.random.randn(200) * 0.1
series3 = np.sin(0.3 * t) + np.cos(0.1 * t) + np.random.randn(200) * 0.1

# Prepare data for supervised learning
df = pd.DataFrame({'series1': series1, 'series2': series2, 'series3': series3})
for i in range(1, 4):
    df[f'series1_lag{i}'] = df['series1'].shift(i)
    df[f'series2_lag{i}'] = df['series2'].shift(i)
    df[f'series3_lag{i}'] = df['series3'].shift(i)
df = df.dropna()

X = df.drop(columns=['series1', 'series2', 'series3']).values
y = df['series1'].values  # Predict series1

# Chronological split of data into train and test sets
split_index = int(len(X) * 0.8)
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

# Initialize an XGBRegressor model
model = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)

# Fit the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
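
To put the reported MSE in context, it can help to compare it with a naive persistence forecast, which predicts each value with the one observed just before it. The following is a minimal sketch rather than part of the original example: the stand-in series, the names `y_naive` and `naive_mse`, and the choice of baseline are all assumptions made for illustration.

```python
import numpy as np

# Stand-in series shaped like series1 from the example (assumption).
rng = np.random.default_rng(42)
series = np.sin(0.1 * np.arange(200)) + rng.normal(scale=0.1, size=200)

# Persistence forecast: predict each value with the one observed just before it.
y_true = series[1:]
y_naive = series[:-1]

# Score only the final 20% of points, mirroring a chronological test split.
split_index = int(len(y_true) * 0.8)
naive_mse = np.mean((y_true[split_index:] - y_naive[split_index:]) ** 2)
print(f"Persistence baseline MSE: {naive_mse:.4f}")
```

If the model's MSE is not clearly below this baseline on the same test window, the lagged features are probably adding little beyond the last observed value.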

This example extends the univariate time series forecasting example to handle multivariate data.

Here’s a step-by-step breakdown:

1. Generate synthetic multivariate time series data using sine and cosine waves with added noise.
2. Prepare the data for supervised learning by creating a DataFrame with the time series and generating lagged features for each input series (here, we use lags of 1, 2, and 3).
3. Split the data chronologically into train and test sets to maintain the temporal order.
4. Initialize an XGBRegressor model with chosen hyperparameters.
5. Fit the model on the training data using fit().
6. Make predictions on the test set using predict().
7. Evaluate the model’s performance using Mean Squared Error (MSE).
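
The lag-feature step is the part most likely to change between projects, so it can be convenient to wrap it in a small function. The sketch below is one possible way to do that. `make_lag_features` is a hypothetical helper, not a pandas or XGBoost function, and the toy frame with columns `a` and `b` exists only to make the row alignment easy to check by eye.

```python
import pandas as pd


def make_lag_features(df: pd.DataFrame, n_lags: int) -> pd.DataFrame:
    """Return lagged copies of every column in df (hypothetical helper)."""
    lagged = {
        f"{col}_lag{lag}": df[col].shift(lag)
        for lag in range(1, n_lags + 1)
        for col in df.columns
    }
    # The first n_lags rows have no complete history, so drop them.
    return pd.DataFrame(lagged, index=df.index).dropna()


# Tiny deterministic example so the alignment is easy to inspect.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [10.0, 20.0, 30.0, 40.0, 50.0],
})
features = make_lag_features(toy, n_lags=2)

# Target: the current value of "a" on the rows that survived dropna().
target = toy.loc[features.index, "a"]

print(features)
print(target.tolist())
```

Each row of `features` holds only values from earlier time steps, so the model never sees the value it is asked to predict.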

By changing the data preparation (for example, the target series, the number of lags, or the set of input series) and the model’s hyperparameters, you can adapt this example to other multivariate time series forecasting tasks. Hyperparameter tuning can further improve the model’s performance for specific use cases.
