This example demonstrates how to evaluate an XGBoost model for time series forecasting using walk-forward validation. This technique estimates performance on unseen data by repeatedly training the model on past observations and testing it on the observations that immediately follow. We’ll use a synthetic time series dataset to illustrate the process.
Walk-forward validation is crucial for time series forecasting because it mimics the real-world scenario in which a model is trained on historical data and then used to predict future, unseen values.
Unlike a random train/test split, it never lets the model train on observations that come after the ones it is tested on, so it gives a more realistic estimate of the model’s forecasting performance.
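The splitting scheme itself does not depend on XGBoost. The following minimal sketch generates the train/test index pairs for walk-forward validation. The function name `walk_forward_splits` and its parameters are hypothetical, chosen for illustration. By default the training window expands to include every earlier observation. Passing an integer `window` keeps only the most recent samples (a sliding window) instead.

```python
def walk_forward_splits(n_samples, n_train, n_test=1, window=None):
    """Yield (train_indices, test_indices) pairs for walk-forward validation.

    Hypothetical helper for illustration. window=None gives an expanding
    training window; an integer keeps only the last `window` samples.
    """
    # Move the split point forward by n_test steps each iteration
    for split in range(n_train, n_samples - n_test + 1, n_test):
        start = 0 if window is None else max(0, split - window)
        # Training indices always precede the test indices
        yield list(range(start, split)), list(range(split, split + n_test))


# Example: 8 samples, first model trained on 5, one-step-ahead test sets
for train_idx, test_idx in walk_forward_splits(8, n_train=5):
    print(train_idx, test_idx)
# [0, 1, 2, 3, 4] [5]
# [0, 1, 2, 3, 4, 5] [6]
# [0, 1, 2, 3, 4, 5, 6] [7]
```

An expanding window uses all available history for each model. A sliding window trades some data for adapting faster when the series changes over time.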
```python
# XGBoosting.com
# Evaluate XGBoost for Time Series Forecasting Using Walk-Forward Validation
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# Generate a synthetic time series dataset (sine wave plus Gaussian noise)
np.random.seed(42)
series = np.sin(0.1 * np.arange(200)) + np.random.randn(200) * 0.1

# Define the number of lags and the test size for each iteration
n_lags = 3
n_test = 1

# Prepare data for supervised learning: each row holds the previous n_lags values
df = pd.DataFrame(series, columns=['value'])
for i in range(1, n_lags + 1):
    df[f'lag_{i}'] = df['value'].shift(i)
df = df.dropna()
X = df.drop(columns=['value']).values
y = df['value'].values

# Number of samples used to train the first model (an illustrative choice)
n_train = 100

# Initialize lists to store predictions and actual values
predictions = []
actual = []

# Perform walk-forward validation: train on all samples before the split point,
# predict the next n_test samples, then move the split point forward
for i in range(n_train, len(X) - n_test + 1, n_test):
    X_train, X_test = X[:i], X[i:i + n_test]
    y_train, y_test = y[:i], y[i:i + n_test]
    model = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    predictions.extend(y_pred)
    actual.extend(y_test)

# Calculate the Mean Squared Error
mse = mean_squared_error(actual, predictions)
print(f"Mean Squared Error: {mse:.4f}")
```
In this example, we:
- Generate a synthetic time series dataset using a sine wave with added noise.
- Prepare the data for supervised learning by creating lagged features.
- Define the number of lags (`n_lags`) and the test size (`n_test`) for each iteration of the walk-forward validation.
- Initialize lists to store the predictions and actual values.
- Perform walk-forward validation:
  - Split the data into train and test sets for each iteration
  - Train the XGBoost model on the training set
  - Make one-step predictions on the test set
  - Store the predictions and actual values
- Calculate the Mean Squared Error (MSE) on the collected predictions and actual values.
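The lag-feature step is easiest to see on a tiny series. The following sketch applies the same `shift`-based logic to six hand-picked values, which are illustrative and not taken from the example. For the first remaining row, the target is `13.0` and the features are the three preceding values, `[12.0, 11.0, 10.0]`.

```python
import pandas as pd

n_lags = 3
series = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # illustrative values

df = pd.DataFrame({'value': series})
for i in range(1, n_lags + 1):
    # lag_i holds the value observed i steps earlier
    df[f'lag_{i}'] = df['value'].shift(i)
# The first n_lags rows have missing lags, so they are dropped
df = df.dropna()

X = df.drop(columns=['value']).values  # columns: lag_1, lag_2, lag_3
y = df['value'].values                 # the value to forecast
print(X)
print(y)
```

Each row of `X` contains only past values, so no feature leaks information from the step being predicted.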
By using walk-forward validation, we can assess how well the XGBoost model generalizes to unseen future data and obtain a more realistic estimate of its forecasting performance. This example can be adapted to real-world time series datasets by replacing the synthetic series. It can also be extended with additional evaluation metrics or with plots of predicted versus actual values.
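One way to add evaluation metrics is a small helper that reports MSE, root mean squared error (RMSE), and mean absolute error (MAE) together. The following sketch is computed directly with NumPy. The helper name `forecast_metrics` is hypothetical. Its MAE matches what scikit-learn's `mean_absolute_error` returns, and it can be applied to the `actual` and `predictions` lists collected in the example.

```python
import numpy as np


def forecast_metrics(actual, predicted):
    """Return MSE, RMSE, and MAE for paired forecasts (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - actual
    mse = float(np.mean(errors ** 2))
    return {
        'mse': mse,
        'rmse': float(np.sqrt(mse)),  # same units as the series
        'mae': float(np.mean(np.abs(errors))),
    }


metrics = forecast_metrics([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
print(metrics)
```

RMSE and MAE are in the same units as the series, which often makes them easier to interpret than MSE. MAE is also less sensitive to occasional large errors.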