Configure XGBoost "count:poisson" Objective

The "count:poisson" objective in XGBoost is used for modeling count data, where the target variable represents the number of occurrences of an event and is assumed to follow a Poisson distribution.

This objective is particularly useful for predicting non-negative integer quantities such as web traffic, sales counts, or the number of defects in a manufacturing process.

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_poisson_deviance
from xgboost import XGBRegressor
from scipy.stats import poisson
import numpy as np

# Generate a synthetic dataset simulating Poisson-distributed count data
np.random.seed(42)
n_samples = 1000
n_features = 5
X = np.random.rand(n_samples, n_features)
true_coefficients = np.random.rand(n_features)
mean_counts = np.exp(X @ true_coefficients)
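# Shift by 1 so every target is strictly positive: the manual deviance calculation
# further down takes log(y_test / y_pred), which is undefined for zero counts.
# Note that the shifted target is no longer exactly Poisson-distributed.
y = 1 + poisson.rvs(mean_counts)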


# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize an XGBRegressor with the "count:poisson" objective
model = XGBRegressor(objective="count:poisson", n_estimators=100, learning_rate=0.1)

# Fit the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

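# Calculate the total (summed) Poisson deviance of the predictions by hand.
# This form assumes y_test > 0 and y_pred > 0, since it takes log(y_test / y_pred).
poisson_deviance = 2 * np.sum(y_test * np.log(y_test / y_pred) - (y_test - y_pred))
print(f"Poisson Deviance: {poisson_deviance:.4f}")

# scikit-learn reports the same quantity averaged over the test samples
mpd = mean_poisson_deviance(y_test, y_pred)
print(f"Mean Poisson deviance: {mpd:.4f}")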

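The "count:poisson" objective fits the model by minimizing the Poisson negative log-likelihood, and the model outputs the mean of the Poisson distribution. Up to a term that does not depend on the predictions, this is equivalent to minimizing the Poisson deviance between the predicted and actual values, which is a more suitable loss function for count data than squared error.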

The Poisson deviance measures the difference between the observed and predicted counts, taking into account the Poisson distribution’s assumption that the variance is equal to the mean.
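
As a small worked illustration, the deviance can be computed directly with NumPy. The sketch below uses the standard total Poisson deviance, 2 * sum(y * log(y / mu) - (y - mu)), with the convention that the y * log(y / mu) term is zero when y is zero. The helper name poisson_deviance_total and the toy arrays are hypothetical and exist only for this example.

```python
import numpy as np


def poisson_deviance_total(y_true, y_pred):
    """Total Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true < 0):
        raise ValueError("observed counts must be non-negative")
    if np.any(y_pred <= 0):
        raise ValueError("predicted means must be strictly positive")

    # By convention y * log(y / mu) is 0 when y == 0; substituting a ratio of 1
    # for those entries avoids evaluating log(0).
    ratio = np.where(y_true > 0, y_true / y_pred, 1.0)
    log_term = y_true * np.log(ratio)
    return 2.0 * np.sum(log_term - (y_true - y_pred))


# Toy observed counts and two sets of predicted means (illustrative values)
y_obs = np.array([0, 2, 5])
mu_close = np.array([0.5, 2.0, 5.0])  # matches every non-zero count exactly
mu_flat = np.array([3.0, 3.0, 3.0])   # same prediction for every row

dev_close = poisson_deviance_total(y_obs, mu_close)
dev_flat = poisson_deviance_total(y_obs, mu_flat)
print(f"close predictions: {dev_close:.4f}")
print(f"flat predictions:  {dev_flat:.4f}")
```

Predictions closer to the observed counts give a smaller deviance, and the deviance is zero only when every prediction equals its observed count. Dividing the total by the number of samples gives the mean Poisson deviance.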

When using the "count:poisson" objective, consider the following tips:

Ensure that the target variable represents counts and consists of non-negative integers.
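If the counts are very large, consider scaling them down to improve numerical stability; XGBoost also sets max_delta_step to 0.7 by default for this objective to safeguard the optimization.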
Use appropriate evaluation metrics for count data, such as Poisson deviance or mean absolute error, rather than mean squared error.
XGBoost’s "count:poisson" objective can handle some degree of overdispersion in the counts, but if the data is highly overdispersed (variance much larger than the mean), consider using a negative binomial model instead.
Tune hyperparameters like max_depth, learning_rate, and n_estimators to optimize performance for your specific dataset.
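
The first and fourth tips can be turned into quick checks before training. The sketch below validates that a target contains only non-negative integers and computes a crude dispersion ratio (sample variance divided by sample mean). The helper names check_count_target and dispersion_ratio are hypothetical, and the simulated samples are assumptions for illustration. The ratio is computed on the raw target without conditioning on features, so a value above 1 can also reflect variation in the mean across rows and should be read as a hint, not a formal test.

```python
import numpy as np


def check_count_target(y):
    """Raise ValueError unless y contains only non-negative integer values."""
    y = np.asarray(y)
    if np.any(y < 0):
        raise ValueError("target contains negative values")
    if not np.all(np.mod(y, 1) == 0):
        raise ValueError("target contains non-integer values")


def dispersion_ratio(y):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    y = np.asarray(y, dtype=float)
    return y.var(ddof=1) / y.mean()


rng = np.random.default_rng(42)

# Poisson sample with mean 4: variance equals the mean in theory
y_poisson = rng.poisson(4.0, size=10_000)

# Negative binomial sample with the same mean (4) but theoretical variance 12
y_negbin = rng.negative_binomial(n=2, p=1 / 3, size=10_000)

for name, sample in (("poisson", y_poisson), ("negative binomial", y_negbin)):
    check_count_target(sample)
    print(f"{name}: dispersion ratio = {dispersion_ratio(sample):.2f}")
```

A ratio well above 1 that persists within groups of similar rows is the situation where the negative binomial alternative mentioned above is worth considering.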
