
XGBoost Detrend Transform Time Series Data

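Many real-world time series are nonstationary: their mean, variance, or both change over time.

This can be caused by trends, seasonality, or structural breaks. Many classical forecasting models assume stationarity. XGBoost has no such formal requirement, but its tree-based predictions cannot extrapolate beyond the range of targets seen during training, so a persistent trend hurts it as well. One way to handle a trend is detrending: estimate the underlying trend and subtract it from the series. When the nonstationarity comes from a deterministic trend, the remaining series is approximately stationary.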
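A trend causes a concrete problem for tree-based models such as XGBoost: they cannot extrapolate. The sketch below uses a single scikit-learn DecisionTreeRegressor as a stand-in for one of XGBoost's trees (an assumption made to keep the example small; a boosted ensemble is a sum of such piecewise-constant functions and shares the limitation). Trained on the first 150 points of a straight-line series, it cannot forecast any value above the largest training target.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# A purely trending series: y_t = 2 * t (deterministic linear trend, illustrative only)
t = np.arange(200).reshape(-1, 1)
y = 2.0 * t.ravel()

# Train on the first 150 points, then forecast the next 50 (beyond the training range)
tree = DecisionTreeRegressor(random_state=0).fit(t[:150], y[:150])
future_pred = tree.predict(t[150:])

# Each leaf value is an average of training targets, so forecasts are capped at max(y_train)
print("max training target:", y[:150].max())
print("forecast range:", future_pred.min(), "to", future_pred.max())
```

Subtracting the trend first leaves the model a series whose values stay within the range it was trained on.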

This example demonstrates how to use detrending to make a nonstationary univariate time series stationary, prepare the detrended data for supervised learning with lagged features, and train an XGBoost model to forecast future values.

# XGBoosting.com
# Apply Detrending to Make a Time Series Stationary for XGBoost Forecasting
from sklearn.datasets import make_regression
from scipy.signal import detrend
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# Generate a nonstationary synthetic time series dataset
_, y = make_regression(n_samples=1000, n_features=1, noise=5, random_state=42)
time_index = np.arange(len(y))
y = y + 0.5 * time_index  # Add a linear trend over time to make the mean nonstationary

# Apply detrending to remove the linear trend
# (detrend() fits one least-squares line to the whole series, test period included)
detrended_y = detrend(y)

# Prepare data for supervised learning with lagged values of the detrended series
df = pd.DataFrame({'y': detrended_y})
for lag in (1, 2, 3):
    df[f'y_lag{lag}'] = df['y'].shift(lag)
df = df.dropna()

X_features = df[['y_lag1', 'y_lag2', 'y_lag3']].values
y_target = df['y'].values

# Chronological split of data into train and test sets
split_index = int(len(X_features) * 0.8)  # 80% of data for training
X_train, X_test = X_features[:split_index], X_features[split_index:]
y_train, y_test = y_target[:split_index], y_target[split_index:]

# Initialize an XGBRegressor model
model = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)

# Fit the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
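For reference, detrend() with its default type='linear' fits a single least-squares straight line to the whole array and subtracts it. The sketch below checks that equivalence against numpy.polyfit on a made-up trend-plus-noise series; the series and its coefficients are illustrative assumptions, not part of the original example.

```python
import numpy as np
from scipy.signal import detrend

rng = np.random.default_rng(0)
t = np.arange(500)
y = 3.0 + 0.5 * t + rng.normal(scale=5.0, size=t.size)  # illustrative linear trend plus noise

# scipy: subtract the least-squares straight line fitted over the whole array
scipy_resid = detrend(y)  # type='linear' is the default

# The same computation by hand: fit y ~ slope * t + intercept, then subtract the fit
slope, intercept = np.polyfit(t, y, deg=1)
manual_resid = y - (slope * t + intercept)

print(np.allclose(scipy_resid, manual_resid))
print(f"estimated slope: {slope:.3f}")
```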

Here’s what the code does step-by-step:

  1. Generate synthetic noisy values with scikit-learn’s make_regression function and add a linear trend over the time index to make the series nonstationary.
  2. Remove the linear trend with scipy.signal.detrend(), which fits a least-squares line to the whole series and subtracts it, leaving an approximately stationary series.
  3. Prepare the detrended data for supervised learning by creating a DataFrame with the detrended series and lagged features (lags of 1, 2, and 3).
  4. Split the data chronologically into train and test sets.
  5. Initialize an XGBRegressor model, fit it on the training data, and make predictions on the test set.
  6. Evaluate the model’s performance using Mean Squared Error (MSE), computed on the detrended scale.
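Because detrend() sees the whole series, the test period influences the line used to detrend the training data, and the resulting forecasts are on the detrended scale. A minimal sketch of a leakage-free variant, assuming a purely linear trend: fit the line on the training portion only, extrapolate it over the test period, and add it back to the detrended forecasts. The helper names fit_linear_trend and trend_at are hypothetical, and detrended_pred is a placeholder for model.predict(X_test). In the full example, dropna() removes the first three rows, so the test rows correspond to original time indices starting at split_index + 3.

```python
import numpy as np

def fit_linear_trend(y_train):
    """Hypothetical helper: fit a straight-line trend on the training portion only."""
    t_train = np.arange(len(y_train))
    slope, intercept = np.polyfit(t_train, y_train, deg=1)
    return slope, intercept

def trend_at(t, slope, intercept):
    """Hypothetical helper: evaluate the fitted line at any (including future) time index."""
    return slope * np.asarray(t) + intercept

rng = np.random.default_rng(42)
n = 1000
t = np.arange(n)
y = 0.5 * t + rng.normal(scale=5.0, size=n)  # illustrative trending series

split_index = int(n * 0.8)
slope, intercept = fit_linear_trend(y[:split_index])

# Detrend both portions with the line learned from training data only
trend = trend_at(t, slope, intercept)
detrended = y - trend
train_part, test_part = detrended[:split_index], detrended[split_index:]

# Placeholder for model.predict(X_test): predict zero deviation from the trend
detrended_pred = np.zeros_like(test_part)

# Add the extrapolated trend back to express forecasts on the original scale
forecast = detrended_pred + trend[split_index:]
rmse = np.sqrt(np.mean((y[split_index:] - forecast) ** 2))
print(f"RMSE on original scale: {rmse:.2f}")
```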

Detrending is a useful technique for making a trend-stationary series stationary, which many forecasting models require and which helps XGBoost avoid having to extrapolate. Plot your data before and after detrending to confirm that the trend has actually been removed. scipy.signal.detrend() with its default type='linear' removes only a straight-line trend, so on its own it is not suitable for series with curved trends, stochastic trends (such as a random walk with drift), or seasonality. In such cases, other techniques like differencing or seasonal decomposition may be more appropriate.
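As an illustration of the differencing alternative mentioned above, the sketch below uses a random walk with drift, a series with a stochastic trend: subtracting a fitted line still leaves a random walk, while the first differences are stationary. Forecasts of differences are mapped back to levels by adding them to the last observed value. The drift of 0.3 and the use of the mean difference as a stand-in forecast are illustrative assumptions; in practice a model such as XGBRegressor trained on lagged differences would supply next_diff_forecast.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
# Random walk with drift (illustrative): differencing, not detrending, makes it stationary
y = np.cumsum(0.3 + rng.normal(size=n))

# First difference: d_t = y_t - y_{t-1}
d = np.diff(y)

# Invert a one-step forecast: add the forecast difference to the last observed level
last_level = y[-1]
next_diff_forecast = d.mean()  # stand-in for a model's forecast of the next difference
next_level_forecast = last_level + next_diff_forecast

# Reconstructing the full series from its first value and the differences is exact
reconstructed = np.concatenate(([y[0]], y[0] + np.cumsum(d)))
print(np.allclose(reconstructed, y))
print(f"next level forecast: {next_level_forecast:.2f}")
```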

This example provides a starting point for using detrending with XGBoost for nonstationary time series forecasting. You can extend it to more complex scenarios by experimenting with different trend models (for example, polynomial or piecewise-linear trends), incorporating additional features, or using more advanced models.
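Two alternative trend models can be sketched with numpy and scipy alone: a quadratic trend fitted with numpy.polyfit, and a piecewise-linear trend using the bp (breakpoints) argument of scipy.signal.detrend, which fits a separate line to each segment. The synthetic series, the slope change at index 200, and the noise level are illustrative assumptions; comparing residual standard deviations is only a rough check of how much trend each model leaves behind.

```python
import numpy as np
from scipy.signal import detrend

rng = np.random.default_rng(1)
t = np.arange(400)
noise = rng.normal(scale=2.0, size=t.size)

# Quadratic trend: a straight line leaves curvature behind, a degree-2 fit removes it
y_quad = 0.002 * t**2 + noise
linear_resid = detrend(y_quad)  # default type='linear'
coeffs = np.polyfit(t, y_quad, deg=2)
quad_resid = y_quad - np.polyval(coeffs, t)
print(f"residual std, linear detrend:    {linear_resid.std():.2f}")
print(f"residual std, quadratic detrend: {quad_resid.std():.2f}")

# Structural break: the slope changes at t = 200; bp fits a separate line per segment
y_break = np.where(t < 200, 0.5 * t, 100 - 0.3 * (t - 200)) + noise
piecewise_resid = detrend(y_break, bp=[200])
print(f"residual std, piecewise detrend: {piecewise_resid.std():.2f}")
```

Whatever trend model you choose, it has to be extrapolated over the forecast horizon and added back to the model's predictions, so simpler trends are usually safer to extrapolate.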


