The grow_policy parameter in XGBoost determines how trees are grown during training. By setting this parameter, you can influence the structure of the resulting trees and potentially improve the model’s performance.
The grow_policy parameter requires that the tree_method parameter be set to 'hist' or 'approx', i.e. not 'exact'.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=10, random_state=42)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Configure the XGBoost model with a specific grow_policy
model = XGBClassifier(grow_policy='lossguide', tree_method='approx', eval_metric='logloss')
# Fit the model
model.fit(X_train, y_train)
# Make predictions
predictions = model.predict(X_test)
Understanding the “grow_policy” Parameter
The grow_policy parameter accepts two possible values: “depthwise” and “lossguide”.
“depthwise” (default): This policy grows the trees depth-wise, meaning it prioritizes achieving the maximum depth specified by the max_depth parameter. The tree growth process splits nodes until the maximum depth is reached or no further splits are possible due to other constraints (e.g., min_child_weight). This growth policy tends to create deeper, more complex trees.
“lossguide”: This policy grows the trees based on the loss reduction. It chooses the splits that lead to the greatest reduction in the loss function. The tree growth process continues until the loss reduction is below a certain threshold or other stopping criteria are met. This growth policy often results in shallower trees compared to “depthwise”.
Choosing the Right “grow_policy” Value
When deciding between “depthwise” and “lossguide” growth policies, consider the following:
If interpretability is a priority and you want to create deeper, more complex trees, use the “depthwise” policy. This can be useful when you need to understand the decision-making process of the model.
If performance is the main goal and you want to create shallower trees that focus on the most important splits, use the “lossguide” policy. This can lead to faster training times and potentially better generalization.
Keep in mind that the optimal grow_policy value may depend on the specific problem and dataset you are working with.
Practical Tips
Start with the default “grow_policy” value (“depthwise”) and compare its performance to the alternative (“lossguide”) using cross-validation or a separate validation set.
Consider the interaction between grow_policy and other tree-related parameters. For example, when using “depthwise”, you may need to adjust max_depth and min_child_weight to control the tree complexity. With “lossguide”, these parameters may have less impact.
Monitor the model’s performance and examine the resulting tree structures to understand how the chosen grow_policy affects the model’s behavior and interpretability.