Configure XGBoost "min_split_loss" Parameter


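The min_split_loss parameter in XGBoost is an alias for the gamma parameter, which sets the minimum loss reduction required to make a further partition on a leaf node of the tree.

By adjusting min_split_loss, you control how readily trees add new splits, which in turn affects the model’s complexity and its ability to generalize. The example below sets min_split_loss when training a classifier on a synthetic binary classification dataset.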

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=10, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the XGBoost classifier with min_split_loss above its default of 0
model = XGBClassifier(min_split_loss=0.5, eval_metric='logloss')

# Fit the model
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

As discussed in the tip on configuring the gamma parameter, min_split_loss is a regularization term that governs the minimum loss reduction needed for a split to occur. It specifies the minimum improvement in the model’s objective function that a new partition must bring to justify its creation. min_split_loss is a non-negative value, and higher values make the model more conservative.
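
To make the idea of "minimum loss reduction" concrete, here is a minimal sketch of the split gain from the XGBoost paper's second-order objective, written in plain Python. The function names are hypothetical and are not part of the XGBoost API. The gradient and Hessian sums are made-up numbers, and treating a tie with the threshold as an accepted split is an assumption.

```python
def leaf_score(grad_sum, hess_sum, reg_lambda=1.0):
    """Structure score of a leaf: G^2 / (H + lambda)."""
    return grad_sum ** 2 / (hess_sum + reg_lambda)


def split_loss_reduction(g_left, h_left, g_right, h_right, reg_lambda=1.0):
    """Loss reduction from splitting one leaf into a left and right child."""
    parent = leaf_score(g_left + g_right, h_left + h_right, reg_lambda)
    children = leaf_score(g_left, h_left, reg_lambda) + leaf_score(g_right, h_right, reg_lambda)
    return 0.5 * (children - parent)


def keeps_split(reduction, min_split_loss):
    """A split survives only if its loss reduction reaches the threshold."""
    return reduction >= min_split_loss


# Illustrative gradient/Hessian sums for the two candidate children
reduction = split_loss_reduction(g_left=-4.0, h_left=2.0, g_right=3.0, h_right=2.0)

print("Loss reduction:", round(reduction, 4))
print("Kept with min_split_loss=0.5:", keeps_split(reduction, 0.5))
print("Kept with min_split_loss=5.0:", keeps_split(reduction, 5.0))
```

The same candidate split is accepted under a small threshold and rejected under a larger one, which is how higher min_split_loss values make the model more conservative.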

To recap, the key points when configuring the min_split_loss parameter are:

Valid range: Non-negative values
Default value: 0
Impact on model complexity:
- Higher values lead to a simpler model with fewer splits, which may be less prone to overfitting but might underfit the data
- Lower values result in a more complex model with more splits, capable of capturing intricate patterns but potentially overfitting
Interaction with other regularization parameters:
- min_split_loss works in conjunction with min_child_weight and max_depth to control model complexity
- Tuning these parameters together can help find the right balance between overfitting and underfitting

For practical guidance on choosing the right min_split_loss value, refer to the tip on configuring the gamma parameter.

See Also