
XGBoost Seasonal Difference Transform Time Series Data

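Seasonal patterns and trends in time series data can pose challenges for forecasting models like XGBoost. Tree-based models cannot extrapolate beyond the range of target values seen during training, so they generally perform better when the series is stationary.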

Seasonal differencing subtracts from each observation the value one seasonal period earlier. This can remove a repeating seasonal pattern and flatten a trend, making the series closer to stationary and more suitable for modeling.
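To make the effect concrete, here is a minimal sketch on a noise-free series. The variable names and the 200-point length are illustrative assumptions. Differencing at a lag equal to the period cancels the sine wave and reduces the linear trend to a constant equal to the slope times the period.

```python
import numpy as np
import pandas as pd

period = 50   # seasonal period (assumed known)
slope = 0.05  # slope of the linear trend
time = np.arange(200)

# Noise-free series: sine wave with the given period plus a linear trend
series = np.sin(2 * np.pi * time / period) + slope * time

# Seasonal differencing: y[t] - y[t - period]
diff = pd.Series(series).diff(period).dropna()

# The sine terms cancel and the trend contributes slope * period everywhere
print(f"min={diff.min():.4f}, max={diff.max():.4f}, slope*period={slope * period:.4f}")
```

With noise added, the differenced series would fluctuate around that constant instead of matching it exactly.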

The complete example below applies seasonal differencing to a synthetic time series with a seasonal pattern and a trend, prepares the differenced data for supervised learning with lagged features, and trains an XGBoost model to forecast future values.

# XGBoosting.com
# Apply Seasonal Differencing to Make a Time Series Stationary for XGBoost Forecasting
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# Generate a 1D synthetic time series dataset with seasonal patterns and trend
np.random.seed(42)  # Fix the seed so the synthetic data is reproducible
n_samples = 1000
time = np.arange(n_samples)
seasonal_pattern = np.sin(2 * np.pi * time / 50)  # Seasonal component with a period of 50
trend = time * 0.05  # Linear trend
noise = np.random.normal(loc=0, scale=0.5, size=n_samples)  # Random noise
series = seasonal_pattern + trend + noise

# Apply seasonal differencing: removes the seasonality and reduces the
# linear trend to a constant offset (0.05 * 50 = 2.5)
seasonal_period = 50  # Adjust this as needed based on the known seasonality of your dataset
diff_series = pd.Series(series).diff(seasonal_period).dropna()

# Prepare data for supervised learning
df = diff_series.to_frame(name='diff_value')
for i in range(1, seasonal_period+1):
    df[f'diff_value_lag{i}'] = df['diff_value'].shift(i)
df = df.dropna()

X = df.drop('diff_value', axis=1).values
y = df['diff_value'].values

# Chronological split of data into train and test sets
split_index = int(len(X) * 0.8)  # 80% of data for training
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

# Initialize an XGBRegressor model
model = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)

# Fit the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model's performance (MSE is on the differenced scale, not the original scale)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")

Here’s what the code does step-by-step:

  1. Generate a synthetic time series dataset by adding together a sine wave (period 50), a linear trend, and Gaussian noise.
  2. Apply seasonal differencing with a period of 50 using diff(). This removes the seasonality and reduces the linear trend to a constant offset, making the series stationary.
  3. Prepare the differenced data for supervised learning by creating a DataFrame with the differenced series and lagged features (lags 1 to 50).
  4. Split the data chronologically into train and test sets.
  5. Initialize an XGBRegressor model, fit it on the training data, and make predictions on the test set.
  6. Evaluate the model’s performance using Mean Squared Error (MSE), computed on the differenced values rather than on the original scale.
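Because the model predicts differenced values, a forecast on the original scale has to add back the observation from one seasonal period earlier. The sketch below illustrates the idea with a hypothetical helper, invert_seasonal_difference, which is not part of the example code. It assumes one-step-ahead evaluation, where the value one season earlier is already observed, and it uses the true differences as stand-in "predictions" so the round trip can be checked.

```python
import numpy as np
import pandas as pd

def invert_seasonal_difference(diff_values, original, times, period):
    """Map differenced values at the given time indices back to the original scale.

    Hypothetical helper: uses y[t] = diff[t] + y[t - period], which assumes
    the observation one season earlier is known when forecasting time t.
    """
    times = np.asarray(times)
    return np.asarray(diff_values) + np.asarray(original)[times - period]

# Small deterministic series: sine with period 50 plus a linear trend
period = 50
time = np.arange(300)
series = np.sin(2 * np.pi * time / period) + 0.05 * time

# diff() keeps the integer time index, so index labels are time steps
diff_series = pd.Series(series).diff(period).dropna()

# Stand-in for model output: the true differences for the last 100 steps
test_times = diff_series.index[-100:].to_numpy()
diff_pred = diff_series.loc[test_times].to_numpy()

restored = invert_seasonal_difference(diff_pred, series, test_times, period)
print(np.allclose(restored, series[test_times]))
```

In the example above, the test rows correspond to df.index[split_index:], so the same idea would apply to y_pred, provided df still carries the integer time index produced by diff().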

Seasonal differencing removes a fixed-period seasonal pattern and reduces a linear trend to a constant offset, which makes the series far closer to stationary for forecasting models like XGBoost. The seasonal period used for differencing should be chosen based on the known or observed seasonality in the data.
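When the seasonal period is not known in advance, one way to observe it is to look for the dominant frequency in the data. The following is a rough sketch, not a definitive method: it assumes a single dominant seasonal cycle and a roughly linear trend, which it removes with a straight-line fit before taking a Fourier transform. The seed and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000
time = np.arange(n_samples)

# Same structure as the example: sine (period 50) + linear trend + noise
series = np.sin(2 * np.pi * time / 50) + 0.05 * time + rng.normal(0, 0.5, n_samples)

# Remove the linear trend so it does not dominate the low frequencies
slope, intercept = np.polyfit(time, series, 1)
detrended = series - (slope * time + intercept)

# Magnitude spectrum and the matching frequencies (cycles per time step)
spectrum = np.abs(np.fft.rfft(detrended))
freqs = np.fft.rfftfreq(n_samples, d=1.0)

# Skip the zero-frequency bin, then convert the peak frequency to a period
peak = np.argmax(spectrum[1:]) + 1
estimated_period = 1.0 / freqs[peak]
print(f"Estimated seasonal period: {estimated_period:.1f}")
```

On real data with several overlapping cycles or a nonlinear trend, the spectrum may show multiple peaks, so the estimate is best treated as a starting point and checked against domain knowledge.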

This example provides a foundation for using seasonal differencing with XGBoost for time series forecasting. You can extend it to handle more complex scenarios by experimenting with different seasonal periods, incorporating additional features, or using more advanced model architectures.
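To experiment with different seasonal periods, it can help to wrap the differencing and lag-feature steps in a function. The sketch below uses a hypothetical helper, make_supervised, that is not part of the example; it follows the same steps (seasonal difference, lagged features, drop incomplete rows) and returns arrays that could be passed to XGBRegressor.

```python
import numpy as np
import pandas as pd

def make_supervised(series, period, n_lags):
    """Seasonally difference a series and build lagged features.

    Hypothetical helper: returns (X, y) where y is the differenced value
    and X holds its previous n_lags differenced values.
    """
    diff = pd.Series(series).diff(period)
    lagged = {f"diff_value_lag{i}": diff.shift(i) for i in range(1, n_lags + 1)}
    frame = pd.concat([diff.rename("diff_value"), pd.DataFrame(lagged)], axis=1)
    frame = frame.dropna()  # drop rows without a full set of lags
    X = frame.drop(columns="diff_value").to_numpy()
    y = frame["diff_value"].to_numpy()
    return X, y

# Synthetic data with the same structure as the example
rng = np.random.default_rng(42)
time = np.arange(1000)
series = np.sin(2 * np.pi * time / 50) + 0.05 * time + rng.normal(0, 0.5, 1000)

# Try several candidate periods; each loses period + n_lags leading rows
for period in (25, 50, 100):
    X, y = make_supervised(series, period=period, n_lags=period)
    print(period, X.shape, y.shape)
```

Each candidate period can then be scored with the same chronological split and metric, keeping in mind that errors on differently differenced targets are only comparable after mapping forecasts back to the original scale.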


