The gamma parameter in XGBoost sets the minimum loss reduction required to make a further partition on a leaf node of the tree.
By adjusting gamma, you can influence the model’s complexity and its ability to generalize.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=10, random_state=42)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the XGBoost classifier with a higher-than-default gamma value
model = XGBClassifier(gamma=0.5, eval_metric='logloss')
# Fit the model
model.fit(X_train, y_train)
# Make predictions
predictions = model.predict(X_test)
An alias for the gamma parameter is min_split_loss.
For example:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=10, random_state=42)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the XGBoost classifier using the min_split_loss alias for gamma
model = XGBClassifier(min_split_loss=0.5, eval_metric='logloss')
# Fit the model
model.fit(X_train, y_train)
# Make predictions
predictions = model.predict(X_test)
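To convince yourself that the two spellings are interchangeable, you can train one model with each and compare the fitted boosters. This is a minimal sketch reusing X_train and y_train from above; assuming training is deterministic for this setup, the two tree dumps should be identical.
# Train one model with each spelling on the same data and compare the fitted trees
model_gamma = XGBClassifier(gamma=0.5, eval_metric='logloss', random_state=42)
model_alias = XGBClassifier(min_split_loss=0.5, eval_metric='logloss', random_state=42)
model_gamma.fit(X_train, y_train)
model_alias.fit(X_train, y_train)
# Both names map to the same underlying setting, so the tree dumps should match
print(model_gamma.get_booster().get_dump() == model_alias.get_booster().get_dump())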
Understanding the “gamma” Parameter
The gamma parameter is a regularization term that governs the minimum loss reduction needed for a split to occur.
In other words, it specifies the minimum improvement in the model’s objective function that a new partition must bring to justify its creation. gamma is a non-negative value, and higher values make the model more conservative.
The default value of gamma in XGBoost is 0.
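In the split-finding rule from the XGBoost paper, gamma enters as a flat penalty subtracted from the gain of every candidate split:

\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma

where G_L, H_L and G_R, H_R are the sums of first- and second-order gradients of the loss over the left and right children, and \lambda is the L2 regularization weight. A candidate split is kept only when the bracketed loss reduction exceeds \gamma, which is why larger gamma values prune more aggressively.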
Choosing the Right “gamma” Value
The value of gamma affects the model’s complexity and its propensity to overfit:
- Higher gamma values make the model more conservative by requiring a larger minimum loss reduction to create a new split. This can lead to a simpler model with fewer splits (a sketch after this list counts them), which may be less prone to overfitting but might underfit the data.
- Lower gamma values make the model more liberal by allowing splits with smaller loss reductions. This can result in a more complex model with a higher number of splits, capable of capturing more intricate patterns but potentially overfitting the data.
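To see this effect directly, you can count how many splits survive at different gamma values. This minimal sketch reuses X_train and y_train from the earlier examples; trees_to_dataframe() lists every node of the fitted booster, and rows whose Feature column is not 'Leaf' are splits.
for gamma_value in [0, 1, 10]:
    model = XGBClassifier(gamma=gamma_value, eval_metric='logloss', random_state=42)
    model.fit(X_train, y_train)
    trees = model.get_booster().trees_to_dataframe()
    # Rows that are not leaves correspond to splits; higher gamma should leave fewer
    n_splits = (trees['Feature'] != 'Leaf').sum()
    print(f"gamma={gamma_value}: {n_splits} splits")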
When setting gamma, consider the trade-off between model complexity and performance:
- A higher gamma may be appropriate if you suspect your model is overfitting, as it will constrain the model's complexity.
- A lower gamma may be beneficial if you believe your model is underfitting the data and you want to allow for more granular splits (the sketch after this list compares train and test scores at several values).
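To decide which direction to move, compare training and test performance over a few candidate values. A minimal sketch using accuracy_score from scikit-learn (the candidate values here are illustrative):
from sklearn.metrics import accuracy_score
for gamma_value in [0, 0.5, 5]:
    model = XGBClassifier(gamma=gamma_value, eval_metric='logloss', random_state=42)
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    # A wide train/test gap suggests overfitting; low scores on both suggest underfitting
    print(f"gamma={gamma_value}: train={train_acc:.3f}, test={test_acc:.3f}")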
Practical Tips
- Start with the default gamma value and adjust it based on the model's performance on a validation set.
- Use cross-validation to find the optimal gamma value that strikes a balance between model complexity and generalization (see the sketch after this list).
- Keep in mind that gamma interacts with other regularization parameters, such as min_child_weight and max_depth. Tuning these parameters together can help you find the right balance.
- Monitor your model's performance on a separate validation set to detect signs of overfitting (high training performance, low validation performance) or underfitting (low performance on both sets).
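For the cross-validation tip, GridSearchCV from scikit-learn is one straightforward option. The grid below is an illustrative sketch, not a recommended search space:
from sklearn.model_selection import GridSearchCV
param_grid = {'gamma': [0, 0.1, 0.5, 1, 5]}
search = GridSearchCV(
    XGBClassifier(eval_metric='logloss', random_state=42),
    param_grid,
    scoring='accuracy',
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)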