XGBoost can be applied effectively to time series forecasting, including univariate (1D) series, by reframing the forecasting problem as supervised learning on lagged values.
This example demonstrates how to train an XGBoost model to forecast future values of a 1-dimensional time series using a synthetic dataset.
We’ll cover data preparation, model initialization, training, and making predictions.
# XGBoosting.com
# Train an XGBoost Model for Univariate Time Series Forecasting
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error
# Generate a synthetic 1D time series dataset (sine wave plus noise)
np.random.seed(42)  # fix the noise for reproducible results
series = np.sin(0.1 * np.arange(200)) + np.random.randn(200) * 0.1
# Prepare data for supervised learning
df = pd.DataFrame(series, columns=['value'])
df['value_lag1'] = df['value'].shift(1)
df = df.dropna()
X = df[['value_lag1']].values
y = df['value'].values
# Chronological split of data into train and test sets
split_index = int(len(X) * 0.8) # 80% of data for training
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]
# Initialize an XGBRegressor model
model = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
# Fit the model on the training data
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
Here’s a step-by-step breakdown:
- Generate a synthetic 1D time series using a sine wave with added noise.
- Prepare the data for supervised learning by creating lagged features (here, a single lag of 1; a sketch using several lags follows this list).
- Split the data into train and test sets chronologically (rather than with a random train_test_split) so the model never trains on the future or is tested on the past.
- Initialize an XGBRegressor model with the chosen hyperparameters.
- Fit the model on the training data with fit().
- Make predictions on the test set with predict().
- Evaluate the model's performance with a metric such as Mean Squared Error (MSE).
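As one possible extension of the lag-feature step, the short sketch below builds several lagged columns instead of just one. The lag count N_LAGS and the column names are illustrative assumptions, not part of the original example; the rest of the workflow (chronological split, fitting, prediction) stays unchanged.

# Sketch: building multiple lag features (N_LAGS is an assumed, illustrative value)
N_LAGS = 3
df = pd.DataFrame(series, columns=['value'])
for lag in range(1, N_LAGS + 1):
    df[f'value_lag{lag}'] = df['value'].shift(lag)  # shift the series back by 'lag' steps
df = df.dropna()  # drop the first N_LAGS rows, which have missing lag values
X = df[[f'value_lag{lag}' for lag in range(1, N_LAGS + 1)]].values
y = df['value'].values

More lags let the model see further back in time, at the cost of a few fewer training rows.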
This example provides a foundation for using XGBoost in time series forecasting tasks. You can extend it to handle more complex scenarios, such as multi-step forecasting or multivariate time series, by modifying the data preparation and model architecture accordingly.
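For instance, one simple way to produce a multi-step forecast with the single-lag model above is the recursive (iterated) strategy: predict one step ahead, then feed that prediction back in as the next lag input. The sketch below assumes a forecast horizon of 10 steps (HORIZON is an illustrative choice) and the lag-1 feature setup from the example.

# Sketch: recursive multi-step forecasting with the fitted single-lag model
HORIZON = 10  # assumed number of future steps to forecast
last_value = series[-1]  # most recent observed value serves as the first lag-1 input
forecast = []
for _ in range(HORIZON):
    next_value = model.predict(np.array([[last_value]]))[0]  # one-step-ahead prediction
    forecast.append(float(next_value))
    last_value = next_value  # feed the prediction back in as the next lag
print(forecast)

Note that prediction errors compound over the horizon with this strategy; an alternative is direct multi-step forecasting, where a separate model is trained for each step of the horizon.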