XGBoost Configure "poisson-nloglik" Eval Metric

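When working with XGBoost on Poisson regression tasks, where the target variable is a count assumed to follow a Poisson distribution, the Poisson negative log-likelihood (Poisson-NegLogLik) is a natural evaluation metric: it measures how probable the observed counts are under the Poisson means the model predicts, and lower values are better.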
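Concretely, for n rows with observed counts y_i and predicted Poisson means \hat{\mu}_i, the metric is the average per-row negative log-likelihood (a weighted average when sample weights are supplied):

$$
\text{poisson-nloglik} = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{\mu}_i - y_i \log \hat{\mu}_i + \log(y_i!)\right]
$$

The \log(y_i!) term depends only on the data, so it shifts every value by the same constant without changing which boosting round scores lowest.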

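By setting eval_metric='poisson-nloglik', you have XGBoost report this metric after every boosting round on each dataset passed in eval_set, so you can watch how well the predicted means fit the observed counts as training progresses. The metric is normally paired with objective='count:poisson', which trains the model to output the Poisson mean directly.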
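To sanity-check the numbers XGBoost reports, here is a minimal NumPy/SciPy sketch of the same formula. The helper poisson_nloglik is hypothetical, and clipping predictions at 1e-16 before taking the log is our assumption about how XGBoost guards against log(0).

```python
import numpy as np
from scipy.special import gammaln


def poisson_nloglik(y_true, mu_pred, sample_weight=None, eps=1e-16):
    """Weighted mean Poisson negative log-likelihood (hypothetical helper).

    Predictions are clipped at eps to avoid log(0); this clipping is an
    assumption meant to mimic XGBoost's guard, not a copy of its code.
    """
    y = np.asarray(y_true, dtype=float)
    mu = np.maximum(np.asarray(mu_pred, dtype=float), eps)
    # Per-row loss: mu - y*log(mu) + log(y!), using log(y!) = gammaln(y + 1)
    loss = mu - y * np.log(mu) + gammaln(y + 1.0)
    w = np.ones_like(y) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    return float(np.sum(w * loss) / np.sum(w))


# Usage: three observed counts scored against three predicted means
print(poisson_nloglik([0, 2, 5], [0.5, 2.0, 4.0]))
```

Computing the log-factorial with gammaln keeps the sketch numerically stable for large counts, where evaluating y! directly would overflow.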

Here’s an example of how to use Poisson-NegLogLik as the evaluation metric with XGBoost and scikit-learn:

from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
import numpy as np
import matplotlib.pyplot as plt

# Generate a synthetic count dataset: y ~ Poisson(mean = exp(sum of the 5 features))
np.random.seed(42)
n_samples = 1000
X = np.random.rand(n_samples, 5)
y = np.random.poisson(np.exp(X.sum(axis=1)))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an XGBRegressor with a Poisson objective (so it predicts Poisson means)
# and Poisson-NegLogLik as the evaluation metric
model = XGBRegressor(n_estimators=100, objective='count:poisson', eval_metric='poisson-nloglik', random_state=42)

# Train the model
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

# Retrieve the Poisson-NegLogLik values from the training process
results = model.evals_result()
epochs = len(results['validation_0']['poisson-nloglik'])
x_axis = range(0, epochs)

# Plot the Poisson-NegLogLik values
plt.figure()
plt.plot(x_axis, results['validation_0']['poisson-nloglik'], label='Test')
plt.legend()
plt.xlabel('Number of Boosting Rounds')
plt.ylabel('Poisson-NegLogLik')
plt.title('XGBoost Poisson-NegLogLik Performance')
plt.show()

In this example, we generate a synthetic count dataset: five features drawn uniformly from [0, 1), and a target sampled from a Poisson distribution whose mean is the exponential of the sum of those features. We then hold out 20% of the rows as a test set.
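Because the true means of this synthetic data are known, you can bracket the curve you should expect: the NLL of the true means is, in expectation, about the best any model can reach, while predicting one constant mean for every row is a naive baseline. The sketch below recreates the dataset with the same seed; the helper mean_poisson_nll is hypothetical.

```python
import numpy as np
from scipy.special import gammaln


def mean_poisson_nll(y, mu):
    # Average of mu - y*log(mu) + log(y!) over rows; mu is strictly positive here
    return float(np.mean(mu - y * np.log(mu) + gammaln(y + 1.0)))


# Same generating process and seed as the example above
np.random.seed(42)
X = np.random.rand(1000, 5)
true_mu = np.exp(X.sum(axis=1))
y = np.random.poisson(true_mu)

nll_true = mean_poisson_nll(y, true_mu)  # score of the data-generating means
nll_const = mean_poisson_nll(y, np.full_like(true_mu, y.mean()))  # one mean for all rows
print(f"true means: {nll_true:.3f}  constant mean: {nll_const:.3f}")
```

A trained model's validation curve should start near the constant-mean baseline and fall toward the true-means value.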

We create an instance of XGBRegressor and set eval_metric='poisson-nloglik' to specify Poisson-NegLogLik as the evaluation metric.

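During training, we pass the test set as eval_set to monitor performance on data the model is not fitted to. After training, evals_result() returns a nested dictionary keyed first by evaluation set ('validation_0' for the first entry in eval_set) and then by metric name, holding one Poisson-NegLogLik value per boosting round.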
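Given that nested structure, finding the round with the lowest value is a one-line argmin. The helper best_round below is hypothetical, and the dictionary in the usage line is a hand-made stand-in shaped like evals_result() output, not real training results.

```python
import numpy as np


def best_round(evals_result, eval_name="validation_0", metric="poisson-nloglik"):
    """Return (round_index, value) of the lowest metric value (hypothetical helper).

    evals_result is expected to look like model.evals_result():
    {eval_set_name: {metric_name: [value_round_0, value_round_1, ...]}}
    """
    history = np.asarray(evals_result[eval_name][metric], dtype=float)
    idx = int(np.argmin(history))  # 0-based boosting round
    return idx, float(history[idx])


# Usage with an illustrative dict (values made up to show the shape)
example = {"validation_0": {"poisson-nloglik": [5.1, 3.2, 2.4, 2.5, 2.6]}}
print(best_round(example))  # (2, 2.4)
```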

Finally, we plot the Poisson-NegLogLik values against the number of boosting rounds. Lower is better: a curve that drops and then levels off means later rounds add little, while a curve that turns back upward suggests the model has started to overfit the training data.
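Since the log(y!) term is fixed by the data, absolute Poisson-NegLogLik values are hard to read on their own. One way to put them on a scale where 0 means a perfect fit (a reading aid of our own, not something XGBoost reports) is the mean Poisson deviance: twice the gap between the model's NLL and that of a "saturated" fit predicting each count exactly. scikit-learn exposes it as mean_poisson_deviance; the sketch checks the identity on random stand-in data.

```python
import numpy as np
from scipy.special import gammaln, xlogy
from sklearn.metrics import mean_poisson_deviance


def mean_poisson_nll(y, mu):
    # Mean of mu - y*log(mu) + log(y!); xlogy treats 0*log(0) as 0
    return float(np.mean(mu - xlogy(y, mu) + gammaln(y + 1.0)))


rng = np.random.default_rng(0)
y = rng.poisson(3.0, size=200).astype(float)
mu = rng.uniform(1.0, 6.0, size=200)  # stand-in predictions, strictly positive

nll_model = mean_poisson_nll(y, mu)
nll_saturated = mean_poisson_nll(y, y)  # irreducible floor: predict each y exactly
deviance_from_nll = 2.0 * (nll_model - nll_saturated)

print(deviance_from_nll, mean_poisson_deviance(y, mu))
```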

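Because it scores predicted means with the same likelihood the count:poisson objective optimizes, Poisson-NegLogLik is a natural way to monitor Poisson regression models. If you use it to choose the best number of boosting rounds (the one with the lowest value), make that choice on a validation set kept apart from the final test set; otherwise the reported test score will be optimistically biased.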
