
XGBoost for Time Series Classification

This example demonstrates how to use XGBoost for time series classification with numeric inputs and a categorical target variable.

We’ll use a synthetic dataset generated with scikit-learn’s make_classification function so we can focus on the model implementation without getting bogged down in data preprocessing or domain-specific details. The generated samples have no inherent temporal order, so we treat the rows as consecutive observations and derive lagged features from them to mimic a time series setup.

We’ll cover data preparation, model training, and evaluation using classification metrics such as accuracy, precision, recall, and F1-score.

# XGBoosting.com
# XGBoost for Time Series Classification with Synthetic Data
import numpy as np
import pandas as pd
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_classes=3, random_state=42)

# Prepare data for time series classification: for each of the 20 original
# features, add lagged copies of its values from 1, 2, and 3 time steps earlier
df = pd.DataFrame(X)
df['target'] = y
for i in range(1, 4):
    for j in range(20):
        df[f'feature_{j}_lag{i}'] = df[j].shift(i)
# Drop the first 3 rows, which lack a full set of lagged values
df = df.dropna()

# Keep only the lagged features as model inputs; drop the original
# (contemporaneous) feature columns and the target
X = df.drop(columns=[i for i in range(20)] + ['target']).values
y = df['target'].values

# Chronological split: train on the first 80% of rows, test on the last 20%
# (no shuffling, so the test set always follows the training set in time)
split_index = int(len(X) * 0.8)
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

# Initialize an XGBClassifier model
model = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)

# Fit the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-score: {f1:.4f}")

# Visualize the confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
plt.title('Confusion Matrix')
plt.colorbar()
tick_marks = np.arange(len(np.unique(y)))
plt.xticks(tick_marks, np.unique(y), rotation=45)
plt.yticks(tick_marks, np.unique(y))
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.tight_layout()
plt.show()

This example generates a synthetic dataset using make_classification with 1000 samples, 20 features (10 of them informative), and 3 classes. The data is then prepared for time series classification by adding, for each original feature, lagged copies of its values from the previous three time steps.
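
To make the lagging step concrete, here is a tiny standalone sketch (separate from the example above) of what shift() does to a single column: each lag column holds the value from that many rows earlier, and the leading rows become NaN, which is why the example calls dropna().

import pandas as pd

# One feature observed over five time steps
s = pd.DataFrame({'feature_0': [10, 20, 30, 40, 50]})
for i in range(1, 4):
    s[f'feature_0_lag{i}'] = s['feature_0'].shift(i)
print(s)
#    feature_0  feature_0_lag1  feature_0_lag2  feature_0_lag3
# 0         10             NaN             NaN             NaN
# 1         20            10.0             NaN             NaN
# 2         30            20.0            10.0             NaN
# 3         40            30.0            20.0            10.0
# 4         50            40.0            30.0            20.0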

The data is split chronologically into train and test sets, and an XGBClassifier model is initialized with 100 trees and a learning rate of 0.1. The model is trained on the training data using fit(), and predictions are made on the test set using predict().
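
A single chronological split is the simplest approach; if you want cross-validation that still respects time order, scikit-learn’s TimeSeriesSplit is one option. Below is a minimal sketch reusing the X and y arrays built above; the choice of 5 folds is arbitrary and not part of the example.

from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
fold_scores = []
for train_idx, test_idx in tscv.split(X):
    # Each fold trains on an expanding window and tests on the rows that follow it
    fold_model = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
    fold_model.fit(X[train_idx], y[train_idx])
    fold_scores.append(accuracy_score(y[test_idx], fold_model.predict(X[test_idx])))
print(f"Mean accuracy across folds: {np.mean(fold_scores):.4f}")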

The model’s performance is evaluated using common classification metrics: accuracy, precision, recall, and F1-score, with the latter three weighted by class support since this is a multiclass problem. Finally, a confusion matrix is visualized to provide a more detailed breakdown of the model’s performance across the three classes.
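
If you need a per-class rather than weighted breakdown of precision, recall, and F1-score, scikit-learn’s classification_report prints all three for each class in a single call. This is a small optional addition, reusing y_test and y_pred from above.

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred, digits=4))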

This example provides a starting point for using XGBoost for time series classification tasks. By modifying the data generation, feature engineering, and hyperparameters, you can adapt this example to suit your specific use case.
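
When adapting the example to a real dataset, the nested lag-construction loop can be factored into a reusable helper. The sketch below is illustrative; the make_lags name and its 3-lag default are assumptions, not part of any library.

import pandas as pd

def make_lags(features: pd.DataFrame, n_lags: int = 3) -> pd.DataFrame:
    """Return lagged copies (1..n_lags steps back) of every column in features."""
    lagged = {}
    for col in features.columns:
        for i in range(1, n_lags + 1):
            lagged[f'{col}_lag{i}'] = features[col].shift(i)
    # Drop the leading rows that lack a full set of lags, as in the example above
    return pd.DataFrame(lagged).dropna()

Calling make_lags on any numeric feature DataFrame with the matching target slice then yields inputs in the same shape as the lagged feature matrix built in the example.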


