XGBoost Power Transform Time Series Data

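Time series data often exhibits nonstationary behavior, where statistical properties such as the mean or variance change over time. XGBoost does not formally assume stationarity, but as a tree-based model it cannot extrapolate beyond the range of target values seen during training, so forecasts built on lagged features tend to degrade when a series has a strong trend or changing variance.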

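One way to tackle nonstationarity is to apply a power transform, such as the Box-Cox transform, which can help stabilize the variance and make the series more stationary. Box-Cox requires strictly positive input, and it is a monotonic transform, so it reduces changes in variance but does not by itself remove a trend.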
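To make the transform concrete, here is a minimal sketch of the Box-Cox formula written by hand and compared against scipy.stats.boxcox with a fixed lambda. The helper name boxcox_manual and the sample values are illustrative assumptions, not part of the original example.

```python
import numpy as np
from scipy.stats import boxcox

def boxcox_manual(x, lam):
    """Box-Cox transform of strictly positive x for a fixed lambda (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The lambda = 0 case is defined as the natural log
        return np.log(x)
    return (x ** lam - 1.0) / lam

# Illustrative positive data (made up for this sketch)
values = np.array([1.0, 2.0, 5.0, 10.0, 50.0])

for lam in (0.0, 0.5, 1.0):
    manual = boxcox_manual(values, lam)
    reference = boxcox(values, lmbda=lam)  # with lmbda given, only the array is returned
    print(f"lambda={lam}: matches SciPy = {np.allclose(manual, reference)}")
```

With lambda = 1 the transform only shifts the data by one, lambda = 0 is the log transform, and values in between compress large observations progressively more.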

This example demonstrates how to use the Box-Cox power transform to make a nonstationary univariate time series stationary, prepare the transformed data for supervised learning with lagged features, and train an XGBoost model to forecast future values.

# XGBoosting.com
# Apply Box-Cox Power Transform to Make a Time Series Stationary for XGBoost Forecasting
import numpy as np
import pandas as pd
from scipy.stats import boxcox
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# Generate a nonstationary synthetic time series dataset
np.random.seed(42)
series = np.sin(0.1 * np.arange(500)) + 0.1 * np.arange(500) + np.random.randn(500) * 0.1

# Apply Box-Cox power transform to stabilize the variance
# Box-Cox requires strictly positive input; adding 1 keeps this series above zero
transformed_series, lmbda = boxcox(series + 1)  # keep lmbda so predictions can be inverted later

# Prepare data for supervised learning
df = pd.DataFrame(transformed_series, columns=['value'])
lag_orders = [1, 2, 3]  # Experiment with different lag orders
for lag in lag_orders:
    df[f'lag_{lag}'] = df['value'].shift(lag)
df = df.dropna()

X = df.drop('value', axis=1).values
y = df['value'].values

# Chronological split of data into train and test sets
split_index = int(len(X) * 0.8)  # 80% of data for training
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

# Initialize an XGBRegressor model
model = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)

# Fit the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model's performance (MSE is on the Box-Cox transformed scale)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (transformed scale): {mse:.4f}")

Here’s what the code does step-by-step:

Generate a nonstationary synthetic time series with NumPy by combining a sine wave, a linear trend, and Gaussian noise.
Apply the Box-Cox power transform using scipy.stats.boxcox to make the series more stationary.
Prepare the transformed data for supervised learning by creating a DataFrame with the transformed series and lagged features.
Split the data chronologically into train and test sets.
Initialize an XGBRegressor model, fit it on the training data, and make predictions on the test set.
Evaluate the model’s performance using Mean Squared Error (MSE).
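Two details of the steps above are worth a closer look. The example estimates lambda from the full series, which lets the test period influence the transform, and the MSE it prints is on the Box-Cox scale rather than in the original units. The sketch below shows one possible way to address both: estimate lambda on the training portion only, reuse it for the test portion, and map values back with scipy.special.inv_boxcox. To keep the block runnable without a trained model, the transformed test values stand in for model predictions; that stand-in and the variable names are assumptions for illustration.

```python
import numpy as np
from scipy.stats import boxcox
from scipy.special import inv_boxcox

# Same synthetic series as in the example
np.random.seed(42)
series = np.sin(0.1 * np.arange(500)) + 0.1 * np.arange(500) + np.random.randn(500) * 0.1

shift = 1.0  # offset that keeps the input strictly positive, as in the example
split_index = int(len(series) * 0.8)
train, test = series[:split_index], series[split_index:]

# Estimate lambda from the training portion only
train_bc, lam = boxcox(train + shift)

# Reuse the training lambda for the test portion (returns only the array)
test_bc = boxcox(test + shift, lmbda=lam)

# Stand-in for model predictions on the transformed scale (assumption for this sketch)
y_pred_bc = test_bc

# Undo the Box-Cox transform, then undo the shift
y_pred_original = inv_boxcox(y_pred_bc, lam) - shift

print(f"lambda estimated on train: {lam:.4f}")
print(f"round trip recovers test values: {np.allclose(y_pred_original, test)}")
```

In a real run, y_pred_bc would be the output of model.predict, and metrics computed on y_pred_original are directly comparable with the original series.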

The Box-Cox power transform is a versatile technique for making nonstationary time series more stationary by stabilizing the variance. However, it’s essential to note that the appropriate transformation parameter (lambda) may vary for different series, and the transformed series might not always be perfectly stationary. Always visualize your data and consider the problem context when applying power transforms.
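As a rough complement to plotting, the following sketch prints the fitted lambda and compares the mean and standard deviation of the first and second halves of the transformed series. A large gap between the halves suggests the series is still nonstationary. The helper half_stats is a hypothetical name, and this split-half comparison is only an informal check, not a formal stationarity test.

```python
import numpy as np
from scipy.stats import boxcox

# Same synthetic series as in the example
np.random.seed(42)
series = np.sin(0.1 * np.arange(500)) + 0.1 * np.arange(500) + np.random.randn(500) * 0.1

transformed, lam = boxcox(series + 1)

def half_stats(x):
    """Return (mean, std) for the first and second half of x (hypothetical helper)."""
    first, second = np.array_split(np.asarray(x, dtype=float), 2)
    return (first.mean(), first.std()), (second.mean(), second.std())

(m1, s1), (m2, s2) = half_stats(transformed)

print(f"fitted lambda: {lam:.4f}")
print(f"first half:  mean={m1:.3f}, std={s1:.3f}")
print(f"second half: mean={m2:.3f}, std={s2:.3f}")
```

Because Box-Cox is monotonic, the upward trend in this synthetic series survives the transform, so the second-half mean remains higher than the first-half mean.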

This example serves as a foundation for using power transforms with XGBoost for nonstationary time series forecasting. You can extend it to handle more complex scenarios by experimenting with different transform parameters, incorporating additional features, or using more advanced model architectures.
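As one possible way to incorporate additional features, the sketch below adds rolling statistics next to the lag columns. The window size of 5 and the column names roll_mean_5 and roll_std_5 are arbitrary choices for illustration. The rolling windows are computed on the series shifted by one step so that each row only uses values from earlier time steps.

```python
import numpy as np
import pandas as pd
from scipy.stats import boxcox

# Same synthetic series and transform as in the example
np.random.seed(42)
series = np.sin(0.1 * np.arange(500)) + 0.1 * np.arange(500) + np.random.randn(500) * 0.1
transformed, _ = boxcox(series + 1)

df = pd.DataFrame({"value": transformed})

# Lag features, as in the example
for lag in [1, 2, 3]:
    df[f"lag_{lag}"] = df["value"].shift(lag)

# Extra features (illustrative): rolling statistics over the previous 5 observations
past = df["value"].shift(1)  # exclude the current value to avoid leaking the target
df["roll_mean_5"] = past.rolling(window=5).mean()
df["roll_std_5"] = past.rolling(window=5).std()

df = df.dropna()
X = df.drop("value", axis=1).values
y = df["value"].values

print(X.shape, y.shape)
```

The resulting X and y can be split chronologically and passed to XGBRegressor in the same way as the lag-only features.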
