XGBoost Configure "mae" Eval Metric

When working with regression tasks, Mean Absolute Error (MAE) is another popular evaluation metric alongside RMSE. MAE is the average of the absolute differences between the predicted and actual values. Every error contributes in direct proportion to its size, so MAE is less sensitive to a few large errors than RMSE, which squares each error before averaging.
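Written out, MAE = (1/n) Σ |y_i − ŷ_i|. The minimal NumPy sketch below contrasts it with RMSE on a hypothetical toy dataset with three small errors and one large one. The helper names `mae` and `rmse` are illustrative and are not part of XGBoost. MAE grows linearly with the large error, while RMSE is pulled up more strongly because that error is squared.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: the average of |y_true - y_pred|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root Mean Squared Error: the square root of the average squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical toy data: three errors of size 1 and one error of size 9
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [11.0, 19.0, 31.0, 49.0]

print(f"MAE:  {mae(y_true, y_pred):.3f}")   # (1 + 1 + 1 + 9) / 4 = 3.000
print(f"RMSE: {rmse(y_true, y_pred):.3f}")  # sqrt((1 + 1 + 1 + 81) / 4) = sqrt(21), about 4.583
```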

By setting eval_metric='mae', you can monitor your XGBoost model’s performance using MAE during training. Combined with early_stopping_rounds, the same metric drives early stopping, which helps prevent overfitting. This example demonstrates how to use MAE as the evaluation metric with XGBoost and scikit-learn:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
import matplotlib.pyplot as plt

# Generate a synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an XGBRegressor with MAE as the evaluation metric
# (passing early_stopping_rounds to the constructor requires XGBoost >= 1.6)
model = XGBRegressor(n_estimators=100, eval_metric='mae', early_stopping_rounds=10, random_state=42)

# Train the model with early stopping; MAE on eval_set is computed after every round
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

# Report the round with the lowest validation MAE
print(f"Best iteration: {model.best_iteration}, best MAE: {model.best_score:.4f}")

# Retrieve the MAE values from the training process
results = model.evals_result()
epochs = len(results['validation_0']['mae'])
x_axis = range(0, epochs)

# Plot the MAE values
plt.figure()
plt.plot(x_axis, results['validation_0']['mae'], label='Test')
plt.legend()
plt.xlabel('Number of Boosting Rounds')
plt.ylabel('MAE')
plt.title('XGBoost MAE Performance')
plt.show()
```

In this example, we generate a synthetic regression dataset using scikit-learn’s make_regression function and split the data into training and testing sets.

We create an instance of XGBRegressor and set eval_metric='mae' to specify MAE as the evaluation metric. We also set early_stopping_rounds=10, so training stops if the validation MAE does not improve for 10 consecutive rounds.

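During training, we pass the test set as the eval_set so that XGBoost reports MAE on data the model was not trained on. Because early stopping uses this same set to choose the number of rounds, its score is no longer a fully unbiased estimate of performance on new data. In practice, a separate validation split is preferable. After training, we retrieve the recorded MAE values with the evals_result() method, which returns a dictionary keyed by dataset name ('validation_0') and then by metric name ('mae').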

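Finally, we plot the MAE values against the number of boosting rounds to visualize how the validation error evolves during training. A curve that flattens out or starts rising shows that additional rounds no longer help. Diagnosing overfitting or underfitting, however, requires comparing this curve with the training-set curve. The round with the lowest validation MAE is the one early stopping selects, and it is available as model.best_iteration.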

By using MAE as the evaluation metric, we can monitor the model’s regression performance, prevent overfitting through early stopping, and select the model with the lowest validation MAE. MAE is expressed in the same units as the target variable, so it reads directly as the average size of a prediction error. This makes it a useful metric for understanding the model’s predictive accuracy.
