
XGBoost for Univariate Time Series Forecasting

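XGBoost can be used for time series forecasting once the series is reframed as a supervised learning problem, and univariate (1D) series are the simplest case: past values become the input features and the next value becomes the target.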

This example demonstrates how to train an XGBoost model to forecast future values of a 1-dimensional time series using a synthetic dataset.

We’ll cover data preparation, model initialization, training, making predictions, and evaluation.

# XGBoosting.com
# Train an XGBoost Model for Univariate Time Series Forecasting
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# Generate a synthetic 1D time series dataset (seeded so the noise is reproducible)
np.random.seed(42)
series = np.sin(0.1 * np.arange(200)) + np.random.randn(200) * 0.1

# Prepare data for supervised learning
df = pd.DataFrame(series, columns=['value'])
df['value_lag1'] = df['value'].shift(1)
df = df.dropna()

X = df[['value_lag1']].values
y = df['value'].values

# Chronological split of data into train and test sets
split_index = int(len(X) * 0.8)  # 80% of data for training
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

# Initialize an XGBRegressor model
model = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)

# Fit the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")

Here’s a step-by-step breakdown:

  1. Generate a synthetic 1D time series using a sine wave with added noise.
  2. Prepare the data for supervised learning by creating lagged features (here, we use a lag of 1).
  3. Split the data into train and test sets chronologically by slicing at an index (the first 80% for training, the last 20% for testing) rather than shuffling, so the model never trains on the future or tests on the past.
  4. Initialize an XGBRegressor model with chosen hyperparameters.
  5. Fit the model on the training data using fit().
  6. Make predictions on the test set using predict().
  7. Evaluate the model’s performance using a metric like Mean Squared Error (MSE).
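
Step 2 uses a single lag, but the same framing extends to several lags. The following is a minimal sketch of that idea, not part of the original example: `make_lag_matrix` is a hypothetical helper name, and the choice of `n_lags` is an assumption you would tune for your own data.

```python
import numpy as np

def make_lag_matrix(series, n_lags):
    """Build (X, y) so that each row of X holds the n_lags values preceding y.

    Hypothetical helper for illustration; it is not an XGBoost function.
    """
    series = np.asarray(series, dtype=float)
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    n_rows = len(series) - n_lags
    # Column j holds the value at lag (n_lags - j), so the last column is lag 1
    X = np.column_stack([series[j:j + n_rows] for j in range(n_lags)])
    y = series[n_lags:]
    return X, y

# Tiny example: two lags on a five-point series
X, y = make_lag_matrix([1, 2, 3, 4, 5], n_lags=2)
print(X)
print(y)
```

The resulting `X` and `y` can be split chronologically and passed to `fit()` in the same way as the single-lag arrays in the listing above.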

This example provides a foundation for using XGBoost in time series forecasting tasks. You can extend it to handle more complex scenarios, such as multi-step forecasting or multivariate time series, by changing how the features and targets are constructed (for example, adding more lags or columns from other series) and adjusting the model's hyperparameters accordingly.
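
One common way to get multi-step forecasts from a one-step model is the recursive strategy: predict one step, append the prediction to the window of recent values, and repeat. The sketch below illustrates that loop under stated assumptions. `recursive_forecast` is a hypothetical helper, and `halving_predict` is a stand-in for a fitted model's `predict` method, used only so the snippet runs without training anything.

```python
import numpy as np

def recursive_forecast(predict_fn, history, n_lags, steps):
    """Forecast `steps` values ahead by feeding each prediction back as input.

    predict_fn: callable taking a 2D array of shape (1, n_lags) and returning
                a 1D array with one prediction (e.g. a fitted model's predict).
    history:    observed values, oldest first; must contain at least n_lags points.
    """
    window = [float(v) for v in history[-n_lags:]]
    forecasts = []
    for _ in range(steps):
        # Most recent n_lags values, ordered oldest to newest (last column = lag 1)
        x = np.asarray(window[-n_lags:], dtype=float).reshape(1, -1)
        next_value = float(predict_fn(x)[0])
        forecasts.append(next_value)
        window.append(next_value)  # the prediction becomes an input for the next step
    return np.array(forecasts)

# Stand-in predictor (assumption for illustration): halves the most recent value
def halving_predict(X):
    return 0.5 * X[:, -1]

forecast = recursive_forecast(halving_predict, history=[8.0], n_lags=1, steps=3)
print(forecast)
```

With the lag-1 model fitted in the listing above, the equivalent call would be `recursive_forecast(model.predict, y_train, n_lags=1, steps=10)`. Because each step consumes earlier predictions, errors tend to accumulate as the horizon grows, so it is worth checking accuracy per horizon rather than only on one-step-ahead predictions.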


