
Configure XGBoost "max_leaves" Parameter

The max_leaves parameter in XGBoost controls the maximum number of leaf nodes allowed for each tree in the model, influencing the tree’s depth and complexity.

By adjusting max_leaves, you can control how complex each tree is. This helps you tune the model’s performance and guard against overfitting or underfitting.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Generate synthetic data
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

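# Initialize the XGBoost regressor with a max_leaves value.
# max_leaves is not used by the 'exact' tree method, so request 'hist' explicitly.
model = XGBRegressor(tree_method='hist', max_leaves=31, eval_metric='rmse')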

# Fit the model
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

Understanding the “max_leaves” Parameter

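The max_leaves parameter sets the maximum number of leaf nodes allowed in each tree of the XGBoost model. Leaf nodes are the endpoints of a tree, where predictions are made. By capping the number of leaves, max_leaves limits the size and complexity of each tree. Higher values allow more complex trees that can capture finer patterns but are more prone to overfitting. Lower values produce simpler trees that may underfit.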

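The default value for max_leaves is 0, which means there is no limit on the number of leaves.

The parameter is ignored when tree_method is set to 'exact'. It takes effect with the 'hist' and 'approx' tree methods.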

Choosing the Right “max_leaves” Value

When setting max_leaves, weigh model complexity against the risk of overfitting.

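Start with a moderate value and adjust it based on the model’s performance on a validation set. Alternatively, use cross-validation to find the max_leaves value that gives the best held-out score without overfitting. Keep in mind that the best value depends on the size and complexity of the dataset.

max_depth also bounds tree size. With the default max_depth of 6, a tree can have at most 2^6 = 64 leaves, so larger max_leaves values change nothing unless max_depth is raised (or set to 0 with grow_policy='lossguide').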

Practical Tips


