Both the lambda and reg_lambda parameters in XGBoost control the L2 regularization term, which helps prevent overfitting by constraining the model's complexity.
The lambda parameter is preferred in the native XGBoost API, while reg_lambda is used in the scikit-learn API, conforming to the scikit-learn naming convention.
The lambda parameter cannot be passed directly as a keyword argument in the scikit-learn API: lambda is a reserved keyword in Python (it introduces an anonymous function), so writing lambda=1 in a function call raises a SyntaxError. Instead, lambda can be supplied by unpacking a dict of model parameters.
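As a quick illustration, here is a minimal sketch of the difference; the failing line is shown commented out so the snippet runs:
from xgboost import XGBClassifier
# XGBClassifier(lambda=1) would raise a SyntaxError, since lambda is a reserved keyword
model = XGBClassifier(**{'lambda': 1})  # dict unpacking sidesteps the keyword restriction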
The full example below demonstrates how to use both parameters with the scikit-learn API and confirms that they have the same effect on the model's predictions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Set up parameters for XGBoost
params_lambda = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'lambda': 1
}
params_reg_lambda = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'reg_lambda': 1
}
# Create two XGBoost classifiers, one using "lambda" and the other using "reg_lambda"
model_lambda = XGBClassifier(**params_lambda)
model_reg_lambda = XGBClassifier(**params_reg_lambda)
# Train both models on the training set
model_lambda.fit(X_train, y_train)
model_reg_lambda.fit(X_train, y_train)
# Make predictions on the test set
predictions_lambda = model_lambda.predict(X_test)
predictions_reg_lambda = model_reg_lambda.predict(X_test)
# Compare the results
assert (predictions_lambda == predictions_reg_lambda).all()
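To take the comparison one step further, a small follow-up sketch using scikit-learn's accuracy_score confirms that the two models also score identically on the test set:
from sklearn.metrics import accuracy_score
# Both models should report exactly the same accuracy
print(f"Accuracy (lambda):     {accuracy_score(y_test, predictions_lambda):.4f}")
print(f"Accuracy (reg_lambda): {accuracy_score(y_test, predictions_reg_lambda):.4f}")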
The example below demonstrates the same functionality using the native XGBoost API with DMatrix:
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Convert data to DMatrix
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
# Set up parameters for XGBoost
params_lambda = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'lambda': 1
}
params_reg_lambda = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'reg_lambda': 1
}
# Train the models
model_lambda = xgb.train(params_lambda, dtrain, num_boost_round=10)
model_reg_lambda = xgb.train(params_reg_lambda, dtrain, num_boost_round=10)
# Predict probabilities on the test set and round at 0.5 to obtain class labels
predictions_lambda = model_lambda.predict(dtest).round()
predictions_reg_lambda = model_reg_lambda.predict(dtest).round()
# Compare the results
assert (predictions_lambda == predictions_reg_lambda).all()
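As an additional sanity check, a short sketch using Booster.save_config(), which dumps a booster's internal parameter configuration as JSON, should show that the two trained boosters carry the same configuration, since lambda and reg_lambda resolve to the same internal parameter:
# The internal configurations should match, since lambda and reg_lambda are aliases
print(model_lambda.save_config() == model_reg_lambda.save_config())  # expected: True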
The lambda and reg_lambda parameters serve the same purpose in XGBoost, controlling the L2 regularization term. A smaller value (e.g., 0.1) will allow the model to be more complex and potentially overfit, while a larger value (e.g., 10) will constrain the model's complexity and help prevent overfitting.
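To see this effect in practice, here is a small sketch (reusing the synthetic train/test split from the examples above) that compares train and test log loss for a weakly and a strongly regularized model:
from sklearn.metrics import log_loss
from xgboost import XGBClassifier
for reg in (0.1, 10):
    model = XGBClassifier(objective='binary:logistic', reg_lambda=reg, random_state=42)
    model.fit(X_train, y_train)
    train_ll = log_loss(y_train, model.predict_proba(X_train)[:, 1])
    test_ll = log_loss(y_test, model.predict_proba(X_test)[:, 1])
    # A heavier L2 penalty typically raises train loss but can shrink the train/test gap
    print(f"reg_lambda={reg}: train logloss={train_ll:.4f}, test logloss={test_ll:.4f}")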
The main difference between the two is the API in which they are used. The lambda parameter is used in the native XGBoost API, while reg_lambda is used in the scikit-learn API, conforming to the scikit-learn convention.
When working with XGBoost, it is recommended to use lambda with the native XGBoost API and reg_lambda with the scikit-learn API. The choice between lambda and reg_lambda ultimately depends on the API being used and personal preference.