XGBoost Evaluate Model for Time Series using Walk-Forward Validation

This example demonstrates how to evaluate an XGBoost model for time series forecasting using walk-forward validation, a technique that estimates performance on unseen data by repeatedly training on past observations and testing on the observations that immediately follow. We’ll use a synthetic time series dataset to illustrate the process.

Walk-forward validation is crucial for time series forecasting because it mimics the real-world scenario where models are trained on historical data and used to make predictions on future, unseen data.

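By using this validation method, we get a more realistic estimate of the model’s performance than a random train/test split provides, because a random split lets the model train on observations that come after the ones it is tested on.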
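To make the splitting concrete, here is a minimal sketch of one common variant, an expanding training window, applied to a series of 8 observations. The helper name `walk_forward_splits` and its parameters `n_train` and `n_test` are hypothetical names chosen for illustration; they are not part of XGBoost or scikit-learn.

```python
def walk_forward_splits(n_samples, n_train, n_test=1):
    """Yield (train_indices, test_indices) pairs for an expanding training window.

    Hypothetical helper for illustration: each training set ends right
    before its test set begins, so no future data leaks into training.
    """
    start = n_train
    while start + n_test <= n_samples:
        train_idx = list(range(0, start))               # everything seen so far
        test_idx = list(range(start, start + n_test))   # the next n_test steps
        yield train_idx, test_idx
        start += n_test                                 # move the forecast origin forward


splits = list(walk_forward_splits(n_samples=8, n_train=5, n_test=1))
for train_idx, test_idx in splits:
    print(f"train: {train_idx}  test: {test_idx}")
```

Every test index is later than every training index in the same split, which is the property that separates walk-forward validation from a shuffled split.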

# XGBoosting.com
# Evaluate XGBoost for Time Series Forecasting Using Walk-Forward Validation
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# Generate a synthetic time series dataset (seeded so the run is reproducible)
rng = np.random.default_rng(42)
series = np.sin(0.1 * np.arange(200)) + rng.standard_normal(200) * 0.1

# Prepare data for supervised learning
df = pd.DataFrame(series, columns=['value'])
for i in range(1, 4):
    df[f'lag_{i}'] = df['value'].shift(i)
df = df.dropna()

X = df.drop(columns=['value']).values
y = df['value'].values

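# Define the number of lags, the initial training size, and the test size for each iteration
n_lags = 3     # number of lag features created above
n_train = 100  # observations in the first training window
n_test = 1

# Initialize lists to store predictions and actual values
predictions = []
actual = []

# Perform walk-forward validation with an expanding training window
for i in range(n_train, len(X) - n_test + 1, n_test):
    # Train on everything before position i, test on the next n_test observations
    X_train, X_test = X[:i], X[i:i+n_test]
    y_train, y_test = y[:i], y[i:i+n_test]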

    model = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    predictions.extend(y_pred)
    actual.extend(y_test)

# Calculate the Mean Squared Error
mse = mean_squared_error(actual, predictions)
print(f"Mean Squared Error: {mse:.4f}")

In this example, we:

1. Generate a synthetic time series dataset using a sine wave with added noise.
2. Prepare the data for supervised learning by creating lagged features.
3. Define the number of lags (n_lags) and the test size (n_test) for each iteration of the walk-forward validation.
4. Initialize lists to store the predictions and actual values.
5. Perform walk-forward validation:
   - Split the data into train and test sets for each iteration
   - Train the XGBoost model on the training set
   - Make one-step predictions on the test set
   - Store the predictions and actual values
6. Calculate the Mean Squared Error (MSE) on the collected predictions and actual values.
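The lagged-feature step is easier to follow on a tiny series. The sketch below applies the same `shift`-based construction to five made-up values; the names `toy`, `X_toy`, and `y_toy` are illustrative and do not appear in the example above.

```python
import pandas as pd

# Toy series chosen so the shifted values are easy to follow
toy = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

# Same construction as the example: lag_k holds the value from k steps earlier
for k in range(1, 4):
    toy[f"lag_{k}"] = toy["value"].shift(k)

# The first three rows lack a complete history (NaN lags), so they are dropped
toy = toy.dropna()
print(toy)

X_toy = toy.drop(columns=["value"]).values  # features: the three previous values
y_toy = toy["value"].values                 # target: the current value
```

Only two rows survive `dropna()`, because the first three observations do not have three earlier values to draw on. Each remaining row pairs a target with the three values that preceded it, which is what lets a regressor such as XGBoost treat forecasting as ordinary supervised learning.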

Walk-forward validation shows how well the XGBoost model generalizes to data that comes after its training period, which is the situation it faces in real forecasting tasks. This example can be adapted to real-world time series datasets and extended with additional evaluation metrics or visualizations of the results.
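As one possible extension, the collected predictions could be summarized with RMSE and MAE alongside MSE. This is a minimal sketch using only NumPy; the function name `forecast_metrics` is hypothetical, and the `demo_actual` and `demo_predictions` values are placeholders for illustration, not results from the model above.

```python
import numpy as np


def forecast_metrics(actual, predictions):
    """Return MSE, RMSE, and MAE for two equal-length sequences (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    errors = actual - predictions
    mse = float(np.mean(errors ** 2))
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),             # same units as the series
        "mae": float(np.mean(np.abs(errors))),   # less sensitive to large misses
    }


# Placeholder values for illustration only, not output of the XGBoost model
demo_actual = [1.0, 2.0, 3.0, 4.0]
demo_predictions = [1.0, 2.0, 2.0, 6.0]
metrics = forecast_metrics(demo_actual, demo_predictions)
print(metrics)
```

In the walk-forward example, the `actual` and `predictions` lists could be passed to this function in place of the placeholder values.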
