
XGBoost for Time Series: Predict Out-Of-Sample

Forecasting future values is a critical aspect of time series analysis.

XGBoost, a powerful gradient boosting framework, can be effectively used to make out-of-sample forecasts.

In this example, we’ll walk through the process of preparing data, training an XGBoost model, and making predictions on unseen future time steps.

import numpy as np
from xgboost import XGBRegressor

# Generate a synthetic dataset shaped like a time series (rows in time order)
def generate_time_series_data(n_steps, n_features):
    X = np.random.rand(n_steps, n_features)
    # Target is a linear combination of the first two features plus Gaussian noise
    y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + 0.1 * np.random.randn(n_steps)
    return X, y

# Set the number of time steps and features
n_steps = 1000
n_features = 3

# Generate the dataset
X, y = generate_time_series_data(n_steps, n_features)

# Chronological train/test split: no shuffling, so the test set lies in the "future"
train_size = int(0.8 * n_steps)
X_train, y_train = X[:train_size], y[:train_size]
X_test, y_test = X[train_size:], y[train_size:]

# Create and train the XGBoost model
model = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
model.fit(X_train, y_train)

# Make out-of-sample predictions on unseen future time steps
out_of_sample_size = 10
X_out_of_sample = X_test[:out_of_sample_size]  # first rows of the held-out "future"
y_pred_out_of_sample = model.predict(X_out_of_sample)

# Evaluate the model's performance (example metric: MAE)
mae = np.mean(np.abs(y_pred_out_of_sample - y_test[:out_of_sample_size]))
print(f"Mean Absolute Error (MAE) on out-of-sample data: {mae:.4f}")

In this example, we generate a synthetic time series dataset in which the target is a linear combination of the first two features plus Gaussian noise. The dataset consists of 1000 time steps, each with 3 features.
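
Note that the synthetic features here are exogenous and random; a real forecasting task would usually derive features from the series itself. Below is a minimal sketch of building lagged features with NumPy (the helper make_lag_features and the window of 3 lags are illustrative assumptions, not part of the example above):

import numpy as np

def make_lag_features(series, n_lags=3):
    # Row i holds series[i : i + n_lags]; the target is the value that follows
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

# Example: a short random walk
series = np.cumsum(np.random.randn(100))
X_lagged, y_lagged = make_lag_features(series)
print(X_lagged.shape, y_lagged.shape)  # (97, 3) (97,)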

We split the dataset into training and testing sets, using 80% of the data for training and the remaining 20% for testing. The testing set represents the out-of-sample data that the model has not seen during training.
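
Because the split is chronological rather than shuffled, the model never trains on observations from the future. If you also need cross-validation for model selection, scikit-learn's TimeSeriesSplit preserves this ordering; a sketch, assuming scikit-learn is installed:

from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, val_idx in tscv.split(X):
    # Each validation fold lies strictly after its training fold in time
    X_tr, X_val = X[train_idx], X[val_idx]
    y_tr, y_val = y[train_idx], y[val_idx]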

Next, we create an XGBRegressor model and train it on the training data using 100 estimators and a learning rate of 0.1.

To make out-of-sample predictions, we take the first 10 rows of the held-out test features as X_out_of_sample, representing future time steps the model has never seen during training. We then use the trained model to predict the corresponding target values (y_pred_out_of_sample). Because these rows come from the test set, the true targets are also available for evaluation.
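
To forecast beyond the data you actually have, a common pattern is recursive prediction: feed each one-step forecast back in as the newest lag. A sketch, assuming a model trained on lagged features like those built earlier (series, n_lags, and horizon are illustrative):

n_lags = 3
horizon = 10
history = list(series[-n_lags:])  # most recent observed values
forecasts = []
for _ in range(horizon):
    x = np.array(history[-n_lags:]).reshape(1, -1)
    yhat = float(model.predict(x)[0])  # predict one step ahead
    forecasts.append(yhat)
    history.append(yhat)  # the prediction becomes the newest lag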

Finally, we evaluate the model’s performance on the out-of-sample data using an appropriate metric. In this case, we calculate the Mean Absolute Error (MAE) between the predicted values and the actual out-of-sample values from the testing set.
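
If you prefer library implementations of the metrics, scikit-learn offers equivalents; a sketch, assuming scikit-learn is installed:

from sklearn.metrics import mean_absolute_error, mean_squared_error

mae = mean_absolute_error(y_test[:out_of_sample_size], y_pred_out_of_sample)
rmse = np.sqrt(mean_squared_error(y_test[:out_of_sample_size], y_pred_out_of_sample))
print(f"MAE: {mae:.4f}, RMSE: {rmse:.4f}")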

By following this approach, you can apply XGBoost to out-of-sample forecasting on time series data. Remember to preprocess your data, engineer features that capture temporal structure (such as the lagged values shown earlier), and tune the model's hyperparameters, as in the sketch below.
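
As one concrete tuning aid, XGBoost supports early stopping against a validation set, so you can set n_estimators high and let the best iteration be chosen automatically. A sketch; passing early_stopping_rounds to the constructor assumes XGBoost 1.6 or later:

model = XGBRegressor(
    n_estimators=500,
    learning_rate=0.1,
    early_stopping_rounds=10,  # stop if validation MAE fails to improve
    eval_metric="mae",
    random_state=42,
)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
print(f"Best iteration: {model.best_iteration}")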


