The `eval_metric` parameter in XGBoost specifies the evaluation metric used to monitor the model's performance during training. Setting an `eval_metric` requires passing an `eval_set` argument to `fit()` in the scikit-learn API, or an `evals` argument to `train()` in the native API. The `eval_set` defines the data on which the model is evaluated and the `eval_metric` is calculated. The results of the `eval_metric` on the `eval_set` can then be retrieved via the `evals_result()` method in the scikit-learn API. Properly setting the `eval_metric` is crucial for effective model evaluation and optimization, especially when using early stopping.
Example of `eval_metric` in scikit-learn

This example demonstrates how to configure the `eval_metric` parameter for various problem types and showcases its impact on model training with the scikit-learn API.
```python
from sklearn.datasets import fetch_california_housing, load_breast_cancer, load_iris
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor, XGBClassifier

# Regression example
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)
reg_model = XGBRegressor(n_estimators=100, eval_metric='rmse', early_stopping_rounds=10, random_state=42)
reg_model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

# Binary classification example
breast_cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(breast_cancer.data, breast_cancer.target, test_size=0.2, random_state=42)
bin_model = XGBClassifier(n_estimators=100, eval_metric='logloss', early_stopping_rounds=10, random_state=42)
bin_model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

# Multi-class classification example
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
multi_model = XGBClassifier(n_estimators=100, eval_metric='merror', early_stopping_rounds=10, random_state=42)
multi_model.fit(X_train, y_train, eval_set=[(X_test, y_test)])
```
Example of `eval_metric` in Native API

This example demonstrates how to configure the `eval_metric` parameter for various problem types and showcases its impact on model training with the XGBoost native API.
```python
import xgboost as xgb
from sklearn.datasets import fetch_california_housing, load_breast_cancer, load_iris
from sklearn.model_selection import train_test_split

# Regression example
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
params = {
    'objective': 'reg:squarederror',
    'eval_metric': 'rmse',
    'seed': 42
}
reg_model = xgb.train(params, dtrain, num_boost_round=100, evals=[(dtest, 'eval')], early_stopping_rounds=10)

# Binary classification example
breast_cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(breast_cancer.data, breast_cancer.target, test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'seed': 42
}
bin_model = xgb.train(params, dtrain, num_boost_round=100, evals=[(dtest, 'eval')], early_stopping_rounds=10)

# Multi-class classification example
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
params = {
    'objective': 'multi:softmax',
    'num_class': 3,
    'eval_metric': 'merror',
    'seed': 42
}
multi_model = xgb.train(params, dtrain, num_boost_round=100, evals=[(dtest, 'eval')], early_stopping_rounds=10)
```
Choosing an `eval_metric`
Choosing an appropriate evaluation metric based on the problem type and business objectives is essential.
- For regression problems, common choices include “rmse” (root mean squared error) and “mae” (mean absolute error).
- For binary classification, “error” (binary classification error), “logloss” (binary log loss), and “auc” (area under the receiver operating characteristic curve) are frequently used.
- For multi-class classification settings, “merror” (multi-class classification error) and “mlogloss” (multi-class log loss) are popular options.
The `eval_metric` interacts with early stopping to determine the `best_iteration`, which represents the optimal number of boosting rounds. Early stopping monitors the model's performance on a validation set using the specified `eval_metric` and halts training if no improvement is observed for a specified number of rounds. By setting the `eval_metric` appropriately, you ensure that the model's training progress is evaluated based on a metric that aligns with your optimization objective.
When selecting an `eval_metric`, consider the following guidelines:
- For regression problems, use “rmse” or “mae” depending on whether you want to penalize large errors more heavily or treat all errors equally.
- For binary classification, use “error” for a simple accuracy metric, “logloss” for a probabilistic measure, or “auc” if ranking performance is important.
- For multi-class classification, use “merror” for a straightforward accuracy metric or “mlogloss” for a probabilistic measure.
It is crucial to align the `eval_metric` with the model's intended use case and optimization objective. Choosing it deliberately ensures that the model is evaluated and optimized against a metric that reflects your goals, leading to more effective training and better generalization.