XGBoost can be used to fit survival analysis models, such as the Cox proportional hazards model, which predicts the risk of an event occurring over time.
This example demonstrates how to train an XGBoost Cox model using the scikit-learn API and a synthetic dataset generated with NumPy.
```python
# XGBoosting.com
# Fit an XGBoost Cox Model for Survival Analysis using scikit-learn API
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Generate a synthetic dataset with survival times and censoring indicators
n_samples = 1000
n_features = 10
X = np.random.rand(n_samples, n_features)
true_coef = np.random.rand(n_features)
survival_time = np.exp(np.dot(X, true_coef))
censoring = np.random.binomial(1, 0.9, n_samples)

# Encode censoring for the 'survival:cox' objective:
# positive labels are observed event times, negative labels are right-censored times
y = np.where(censoring == 1, survival_time, -survival_time)

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the XGBRegressor model with the Cox objective
model = XGBRegressor(objective='survival:cox',
                     eval_metric='cox-nloglik',
                     tree_method='hist')

# Fit the model to the training data
model.fit(X_train, y_train)

# Make predictions (hazard ratios) on the test set
predictions = model.predict(X_test)
print("Sample predictions:", predictions[:5])
```
With just a few steps, you can have a trained XGBoost Cox model:

- Generate a synthetic dataset with survival times and censoring indicators using NumPy.
- Encode censoring for the `'survival:cox'` objective by negating the survival times of right-censored observations, then split the data with `train_test_split`.
- Initialize an `XGBRegressor` model with the `'survival:cox'` objective.
- Fit the model to the training data using the `fit()` method, providing the feature matrix `X_train` and the signed survival times `y_train` (a variant that monitors the validation metric is sketched after this list).
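If you want to track the `cox-nloglik` metric on held-out data during training, the scikit-learn `fit()` method accepts an `eval_set`. A minimal sketch, reusing the split and signed labels from the example above (`monitored_model` is just a local name for this variant):

```python
# Monitor the Cox negative log-likelihood on the held-out set while training
monitored_model = XGBRegressor(objective='survival:cox',
                               eval_metric='cox-nloglik',
                               tree_method='hist')
monitored_model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)

# Per-round validation metric values (last five boosting rounds)
print(monitored_model.evals_result()['validation_0']['cox-nloglik'][-5:])
```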
After training, the model can be used to predict relative risk (hazard ratios) for new data, allowing you to rank subjects by risk and analyze the factors that influence event occurrence over time.
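One way to sanity-check those risk predictions is a concordance index (C-index) on the held-out data: the fraction of comparable pairs in which the subject with the earlier observed event also received the higher predicted risk. The `concordance_index` helper below is a hypothetical illustration written for this example, not part of XGBoost, and it assumes the `X_test`/`y_test` split and trained `model` from the code above:

```python
import numpy as np

def concordance_index(time, event, risk):
    """Naive O(n^2) concordance index for right-censored data (illustrative only)."""
    concordant, comparable = 0.0, 0.0
    n = len(time)
    for i in range(n):
        if event[i] != 1:
            continue  # only observed events can anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Recover times and event indicators from the signed labels used for training
time_test = np.abs(y_test)
event_test = (y_test > 0).astype(int)

# Higher predicted hazard ratio should correspond to earlier events
print("C-index:", concordance_index(time_test, event_test, model.predict(X_test)))
```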