When working with regression tasks where the target variable is strictly positive, the Mean Absolute Percentage Error (MAPE) can be a more intuitive evaluation metric than the Root Mean Squared Error (RMSE). MAPE expresses the average absolute prediction error relative to the actual values, making it easier to interpret and communicate to non-technical stakeholders.
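As a minimal sketch of what the metric measures, the snippet below computes MAPE by hand with NumPy. The helper name `mape` and the sample values are illustrative assumptions, not part of XGBoost’s API. The result is a fraction, so multiply by 100 to read it as a percentage.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.1 means 10%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Each error is divided by the actual value,
    # which is why the targets must be non-zero.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

# Illustrative values only
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
score = mape(actual, predicted)
print(f"MAPE: {score:.4f} ({score * 100:.2f}%)")
```

The first two predictions are each off by 10% and the third is exact, so the average relative error is one third of 20%.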
By setting eval_metric='mape' in XGBoost, you can monitor your model’s performance in relative terms during training and enable early stopping to limit overfitting. Note that XGBoost reports MAPE as a fraction rather than a percentage, so a value of 0.05 corresponds to 5%. Here’s an example of how to use MAPE as the evaluation metric with XGBoost and scikit-learn:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
import matplotlib.pyplot as plt
import numpy as np
# Generate a synthetic regression dataset with strictly positive target values
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
y = np.abs(y) + 1 # Ensure target values are strictly positive
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create an XGBRegressor with MAPE as the evaluation metric
# (passing eval_metric and early_stopping_rounds to the constructor requires XGBoost 1.6 or newer)
model = XGBRegressor(n_estimators=100, eval_metric='mape', early_stopping_rounds=10, random_state=42)
# Train the model with early stopping
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])
# Retrieve the MAPE values from the training process
results = model.evals_result()
epochs = len(results['validation_0']['mape'])
x_axis = range(0, epochs)
# Plot the MAPE values
plt.figure()
plt.plot(x_axis, results['validation_0']['mape'], label='Test')
plt.legend()
plt.xlabel('Number of Boosting Rounds')
plt.ylabel('MAPE')
plt.title('XGBoost MAPE Performance')
plt.show()
In this example, we generate a synthetic regression dataset using scikit-learn’s make_regression function and ensure that the target values are strictly positive by taking the absolute value and adding 1. We then split the data into training and testing sets.
We create an instance of XGBRegressor and set eval_metric='mape' to specify MAPE as the evaluation metric. We also set early_stopping_rounds=10 to enable early stopping if the MAPE doesn’t improve for 10 consecutive rounds.
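To make the stopping rule concrete, here is a simplified pure-Python sketch of the patience logic described above. It is an illustration only, not XGBoost’s internal implementation; the function name `stopping_round` and the score values are hypothetical.

```python
def stopping_round(scores, patience):
    """Return (stop_index, best_index) for a lower-is-better metric.

    Training stops at the first round that is `patience` rounds past
    the best score seen so far; otherwise it runs to the last round.
    """
    best_index = 0
    for i, score in enumerate(scores):
        if score < scores[best_index]:
            best_index = i  # new best validation score
        elif i - best_index >= patience:
            return i, best_index  # no improvement for `patience` rounds
    return len(scores) - 1, best_index

# Hypothetical validation MAPE per boosting round
val_mape = [0.30, 0.22, 0.18, 0.17, 0.171, 0.172, 0.175, 0.18]
stop_at, best_at = stopping_round(val_mape, patience=3)
print(f"stopped at round {stop_at}, best round {best_at}")
```

With a patience of 3, the sketch stops three rounds after the lowest score and reports that earlier round as the best one.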
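During training, we pass the testing set as the eval_set to monitor the model’s performance on data it was not trained on. In practice, a separate validation set is preferable for early stopping, so that the test set stays untouched for the final evaluation. After training, we retrieve the MAPE values using the evals_result() method.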
Finally, we plot the MAPE values against the number of boosting rounds to visualize the model’s performance during training. This plot helps us assess whether the model is overfitting or underfitting and determine a suitable number of boosting rounds.
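The same information can be read off numerically. The sketch below assumes a hypothetical list of per-round MAPE values standing in for results['validation_0']['mape'] and finds the round with the lowest value.

```python
import numpy as np

# Hypothetical validation MAPE values, one per boosting round
# (a stand-in for results['validation_0']['mape'])
mape_curve = [0.42, 0.31, 0.25, 0.21, 0.20, 0.205, 0.21]

best_round = int(np.argmin(mape_curve))  # zero-based index of the lowest MAPE
best_mape = mape_curve[best_round]
n_trees = best_round + 1                 # rounds needed to reach that score

print(f"Lowest MAPE {best_mape:.3f} at round {best_round} ({n_trees} trees)")
```

When early stopping is enabled, the fitted XGBRegressor also exposes the best round through its best_iteration attribute, so this manual lookup is mainly useful for checking the plot.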
By using MAPE as the evaluation metric, we can effectively monitor the model’s regression performance in percentage terms, prevent overfitting through early stopping, and select the best model based on the lowest MAPE value. This metric is particularly useful when the target variable is strictly positive and when percentage errors are more meaningful than absolute errors.
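To illustrate why the scale of the target matters, the sketch below applies the same absolute error to large and to near-zero targets. All values are made up for illustration.

```python
import numpy as np

# The same absolute error on every sample (illustrative values only)
abs_error = 0.5
large_targets = np.array([100.0, 200.0, 300.0])
small_targets = np.array([0.1, 0.2, 0.3])

# MAPE divides each error by the actual value
mape_large = float(np.mean(np.abs(abs_error / large_targets)))
mape_small = float(np.mean(np.abs(abs_error / small_targets)))

print(f"MAPE with large targets: {mape_large:.4f}")
print(f"MAPE with small targets: {mape_small:.4f}")
```

The identical absolute error yields a MAPE that is a thousand times larger on the near-zero targets, which is why the metric is best suited to targets that stay well away from zero.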