
Configure XGBoost "reg:squaredlogerror" Objective

The "reg:squaredlogerror" objective in XGBoost is used for regression tasks when the target variable is continuous and strictly positive.

It minimizes the squared logarithmic error between the predicted and actual values, making it suitable for cases where the target values span a wide range or follow an exponential distribution.

import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_log_error

# Generate a synthetic dataset with strictly positive target values
np.random.seed(42)
n_samples = 1000
n_features = 10
X = np.random.rand(n_samples, n_features)
y = np.exp(np.random.rand(n_samples))  # targets lie in (1, e), so always positive

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize an XGBRegressor with the "reg:squaredlogerror" objective
model = XGBRegressor(objective="reg:squaredlogerror", n_estimators=100, learning_rate=0.1)

# Fit the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate the mean squared logarithmic error of the predictions
msle = mean_squared_log_error(y_test, y_pred)
print(f"Mean Squared Logarithmic Error: {msle:.4f}")

The "reg:squaredlogerror" objective minimizes the squared difference between the logarithm of the predicted values and the logarithm of the actual values. This is equivalent to minimizing the mean squared logarithmic error (MSLE) loss function.

This objective is suitable when the target variable is continuous, strictly positive, and the goal is to minimize the relative error between predictions and actual values. It is particularly useful when the target values span several orders of magnitude or follow an exponential distribution.
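
Because the loss is computed in log space, errors are penalized in proportion to their relative rather than absolute size. A quick illustration (the match is approximate, since the +1 inside the log matters less as values grow):

import numpy as np

# A 20% overprediction costs about the same whether the target is 100 or 10,000
for actual, pred in [(100.0, 120.0), (10_000.0, 12_000.0)]:
    loss = 0.5 * (np.log1p(pred) - np.log1p(actual)) ** 2
    print(f"actual={actual:>8}, pred={pred:>8}, loss={loss:.5f}")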

When using the "reg:squaredlogerror" objective, consider the following tips:

  1. Ensure that the target variable is continuous and strictly positive; a quick check is included in the sketch after this list. (Strictly speaking, XGBoost only requires all labels to be greater than -1, since the loss is computed on log(y + 1).)
  2. If the target variable spans several orders of magnitude, an equivalent alternative is to transform it with np.log1p, train with the default "reg:squarederror" objective, and map predictions back with np.expm1.
  3. Feature scaling is generally unnecessary with XGBoost's tree boosters, which are invariant to monotonic transformations of the input features; it only becomes relevant with the linear booster.
  4. Use evaluation metrics appropriate for regression with positive values, such as RMSLE or MAPE, to assess the model's performance; an RMSLE computation is sketched below.
  5. Tune hyperparameters like learning_rate, max_depth, and n_estimators to optimize performance; a tuning sketch follows the list.
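
As a quick sketch of tips 1 and 4, reusing the y_train, y_test, X_test, and model objects from the example above:

import numpy as np
from sklearn.metrics import mean_squared_log_error

# Tip 1: verify the targets are strictly positive before training
assert (y_train > 0).all() and (y_test > 0).all(), "targets must be strictly positive"

# Tip 4: RMSLE is the square root of sklearn's MSLE
rmsle = np.sqrt(mean_squared_log_error(y_test, model.predict(X_test)))
print(f"RMSLE: {rmsle:.4f}")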
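
And a minimal tuning sketch for tip 5, using scikit-learn's GridSearchCV with the training split from above; the grid values are illustrative, not recommendations:

from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

param_grid = {
    "learning_rate": [0.05, 0.1, 0.3],
    "max_depth": [3, 5, 7],
    "n_estimators": [50, 100, 200],
}
search = GridSearchCV(
    XGBRegressor(objective="reg:squaredlogerror"),
    param_grid,
    scoring="neg_mean_squared_log_error",
    cv=3,
)
search.fit(X_train, y_train)
print(f"Best parameters: {search.best_params_}")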


See Also