The `eval_metric` parameter in XGBoost specifies the evaluation metric used to monitor the model's performance during training. Setting an `eval_metric` requires passing an `eval_set` argument to `fit()` in the scikit-learn API, or an `evals` argument to `train()` in the native API. The `eval_set` defines the data on which the model is evaluated and the `eval_metric` is calculated. The results of the `eval_metric` on the `eval_set` can then be retrieved via the `evals_result()` method in the scikit-learn API. Properly setting the `eval_metric` is crucial for effective model evaluation and optimization, especially when using early stopping.
Example of `eval_metric` in scikit-learn

This example demonstrates how to configure the `eval_metric` parameter for various problem types and showcases its impact on model training with the scikit-learn API.
```python
from sklearn.datasets import fetch_california_housing, load_breast_cancer, load_iris
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor, XGBClassifier

# Regression example
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)
reg_model = XGBRegressor(n_estimators=100, eval_metric='rmse', early_stopping_rounds=10, random_state=42)
reg_model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

# Binary classification example
breast_cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(breast_cancer.data, breast_cancer.target, test_size=0.2, random_state=42)
bin_model = XGBClassifier(n_estimators=100, eval_metric='logloss', early_stopping_rounds=10, random_state=42)
bin_model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

# Multi-class classification example
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
multi_model = XGBClassifier(n_estimators=100, eval_metric='merror', early_stopping_rounds=10, random_state=42)
multi_model.fit(X_train, y_train, eval_set=[(X_test, y_test)])
```
Example of `eval_metric` in Native API

This example demonstrates how to configure the `eval_metric` parameter for various problem types and showcases its impact on model training with the XGBoost native API.
```python
import xgboost as xgb
from sklearn.datasets import fetch_california_housing, load_breast_cancer, load_iris
from sklearn.model_selection import train_test_split

# Regression example
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
params = {
    'objective': 'reg:squarederror',
    'eval_metric': 'rmse',
    'seed': 42
}
reg_model = xgb.train(params, dtrain, num_boost_round=100, evals=[(dtest, 'eval')], early_stopping_rounds=10)

# Binary classification example
breast_cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(breast_cancer.data, breast_cancer.target, test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'seed': 42
}
bin_model = xgb.train(params, dtrain, num_boost_round=100, evals=[(dtest, 'eval')], early_stopping_rounds=10)

# Multi-class classification example
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
params = {
    'objective': 'multi:softmax',
    'num_class': 3,
    'eval_metric': 'merror',
    'seed': 42
}
multi_model = xgb.train(params, dtrain, num_boost_round=100, evals=[(dtest, 'eval')], early_stopping_rounds=10)
```
Choosing an `eval_metric`
Choosing an appropriate evaluation metric based on the problem type and business objectives is essential.
- For regression problems, common choices include “rmse” (root mean squared error) and “mae” (mean absolute error).
- For binary classification, “error” (binary classification error), “logloss” (binary log loss), and “auc” (area under the receiver operating characteristic curve) are frequently used.
- For multi-class classification settings, “merror” (multi-class classification error) and “mlogloss” (multi-class log loss) are popular options.
The `eval_metric` interacts with early stopping to determine the `best_iteration`, which represents the optimal number of boosting rounds. Early stopping monitors the model's performance on a validation set using the specified `eval_metric` and halts training if no improvement is observed for a specified number of rounds. By setting the `eval_metric` appropriately, you ensure that the model's training progress is evaluated based on a metric that aligns with your optimization objective.
When selecting an `eval_metric`, consider the following guidelines:
- For regression problems, use “rmse” or “mae” depending on whether you want to penalize large errors more heavily or treat all errors equally.
- For binary classification, use “error” for a simple accuracy metric, “logloss” for a probabilistic measure, or “auc” if ranking performance is important.
- For multi-class classification, use “merror” for a straightforward accuracy metric or “mlogloss” for a probabilistic measure.
It is crucial to align the `eval_metric` with the model's intended use case and optimization objective. Choosing it deliberately ensures that the model is evaluated and optimized against a metric that reflects your goals, leading to more effective training and better generalization.