Adding rolling mean features to time series data can help capture short-term trends and improve the performance of forecasting models like XGBoost. This example demonstrates how to calculate rolling means and use them as additional features in an XGBoost model for time series forecasting, using a synthetic dataset to illustrate the process step-by-step.
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor
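# Generate a synthetic dataset and treat row order as time
# (make_regression draws independent samples, so this only illustrates the mechanics)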
X, y = make_regression(n_samples=1000, n_features=1, noise=0.1, random_state=42)
X = X.flatten()
df = pd.DataFrame({'X': X, 'y': y})
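# Calculate rolling means with different window sizes.
# shift(1) restricts each window to past values, so the current target
# is not leaked into its own features.
df['rolling_mean_3'] = df['y'].shift(1).rolling(window=3).mean()
df['rolling_mean_7'] = df['y'].shift(1).rolling(window=7).mean()
df['rolling_mean_14'] = df['y'].shift(1).rolling(window=14).mean()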
# Prepare the data for supervised learning
df = df.dropna()
X = df[['X', 'rolling_mean_3', 'rolling_mean_7', 'rolling_mean_14']].values
y = df['y'].values
# Split the data into train and test sets chronologically
split_index = int(len(X) * 0.8)
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]
# Initialize an XGBRegressor model
model = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
# Fit the model on the training data
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
Here’s what the code does step-by-step:
- Generate a synthetic time series dataset using scikit-learn’s `make_regression` function.
- Calculate rolling means with different window sizes (3, 7, and 14) using pandas’ `rolling()` and `mean()` functions.
- Add the rolling means as new features to the original dataset.
- Prepare the data for supervised learning by dropping rows with missing values and creating input features (X) and target variable (y).
- Split the data chronologically into train and test sets.
- Initialize an `XGBRegressor` model, fit it on the training data, and make predictions on the test set.
- Evaluate the model’s performance using Mean Squared Error (MSE).
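When computing rolling means of the target, it is worth checking whether each window ends at the current row or at the previous one. The sketch below uses a small hypothetical series (the values and the variable names `including_current` and `past_only` are illustrative assumptions, not part of the example above) to compare the two.

```python
import pandas as pd

# Small hypothetical series; the values are illustrative only
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Window ends at the current row, so it includes the value being predicted
including_current = s.rolling(window=3).mean()

# Window ends at the previous row, so it uses past values only
past_only = s.shift(1).rolling(window=3).mean()

comparison = pd.DataFrame({
    "y": s,
    "including_current": including_current,
    "past_only": past_only,
})
print(comparison)
```

In the `past_only` column, the value at each row is the mean of the three preceding observations, which is what would be available at prediction time in a real forecasting setting.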
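Rolling means smooth out noise and summarize the recent behavior of a series, which can make them useful features for forecasting models. Including several window sizes lets the model see trends at more than one time scale, which may improve its predictions. Each rolling mean should be computed only from values observed before the time step being predicted. A window that includes the current target leaks the answer into the features and makes the test error look better than it would be in practice.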
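As a rough sketch of working with several window sizes at once, the helper below adds one rolling mean column per window. The function name `add_rolling_means`, its `shift` parameter, and the trend-plus-seasonality demo series are assumptions made for illustration; they do not come from any library.

```python
import numpy as np
import pandas as pd


def add_rolling_means(df, column, windows, shift=1):
    """Hypothetical helper: add one past-only rolling mean column per window."""
    out = df.copy()
    # Shift first so that every window ends before the current row
    past = out[column].shift(shift)
    for w in windows:
        out[f"rolling_mean_{w}"] = past.rolling(window=w).mean()
    return out


# Illustrative series with a trend, a 14-step cycle, and noise (an assumption)
rng = np.random.default_rng(42)
t = np.arange(200)
demo = pd.DataFrame({
    "y": 0.05 * t + np.sin(2 * np.pi * t / 14) + rng.normal(scale=0.2, size=t.size)
})

# Rows without a full 14-step history contain NaN and are dropped
features = add_rolling_means(demo, "y", windows=[3, 7, 14]).dropna()
print(features.head())
```

Shorter windows follow recent movement closely, while longer ones change more slowly. The largest window determines how many initial rows are lost to missing values.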
This example provides a starting point for incorporating rolling mean features into your XGBoost time series forecasting pipeline. You can extend it by experimenting with different window sizes, adding other types of features, or using more advanced data preparation techniques based on your specific problem and dataset characteristics.
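One possible direction for such extensions, sketched under stated assumptions: lag features and a rolling standard deviation as additional inputs, and scikit-learn's `TimeSeriesSplit` for walk-forward evaluation in place of a single chronological split. The demo series, the column names, and the choice of four splits are illustrative assumptions. No model is fitted here; a regressor such as the `XGBRegressor` used earlier could be trained and scored inside the loop.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Illustrative noisy seasonal series (an assumption, not the dataset above)
rng = np.random.default_rng(0)
t = np.arange(300)
df_ext = pd.DataFrame({
    "y": np.sin(2 * np.pi * t / 30) + rng.normal(scale=0.1, size=t.size)
})

# Past-only features: two lags and a rolling standard deviation
past = df_ext["y"].shift(1)
df_ext["lag_1"] = past
df_ext["lag_7"] = df_ext["y"].shift(7)
df_ext["rolling_std_7"] = past.rolling(window=7).std()
df_ext = df_ext.dropna().reset_index(drop=True)

X_ext = df_ext[["lag_1", "lag_7", "rolling_std_7"]].values
y_ext = df_ext["y"].values

# Each fold trains on an initial segment and tests on the rows that follow it
tscv = TimeSeriesSplit(n_splits=4)
folds = []
for fold, (train_idx, test_idx) in enumerate(tscv.split(X_ext)):
    folds.append((train_idx, test_idx))
    print(f"Fold {fold}: train rows 0-{train_idx[-1]}, "
          f"test rows {test_idx[0]}-{test_idx[-1]}")
```

Because every training segment ends before its test segment begins, averaging an error metric across the folds gives a view of how the model behaves as more history becomes available.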