
Configure XGBoost "colsample_bylevel" Parameter

The colsample_bylevel parameter in XGBoost controls the fraction of features (columns) sampled each time a new level (depth) is reached in a tree. Lowering it below the default of 1 adds randomness to feature selection, which can reduce overfitting and improve the model’s ability to generalize.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=10, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the XGBoost classifier with a colsample_bylevel value
model = XGBClassifier(colsample_bylevel=0.8, eval_metric='logloss')

# Fit the model
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

Understanding the “colsample_bylevel” Parameter

The colsample_bylevel parameter determines the fraction of features (columns) that are randomly sampled at each level (depth) of the tree during training. A new sample is drawn once for every new depth level reached, and it is taken from the set of columns already chosen for the current tree.

It is a regularization technique: by limiting the features available at each level, it encourages the model to rely on different subsets of features at different depths, which can help prevent overfitting.

colsample_bylevel accepts values in the range (0, 1], with 1 meaning that all features are available at every level. The default value of colsample_bylevel in XGBoost is 1.
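Because XGBoost's column sampling parameters (colsample_bytree, colsample_bylevel and colsample_bynode) are applied cumulatively, it can help to work out roughly how many features remain as split candidates. The sketch below is plain arithmetic; features_per_split is a hypothetical helper written for illustration, not part of the XGBoost API, and the library's exact rounding may differ.

```python
def features_per_split(n_features, colsample_bytree=1.0,
                       colsample_bylevel=1.0, colsample_bynode=1.0):
    """Estimate how many features are candidates at a single split.

    Hypothetical helper for illustration only. It assumes the three
    ratios are applied one after another (tree, then level, then node)
    and that at least one feature always remains.
    """
    remaining = n_features
    for ratio in (colsample_bytree, colsample_bylevel, colsample_bynode):
        if not 0.0 < ratio <= 1.0:
            raise ValueError("each colsample ratio must be in (0, 1]")
        # Truncate to a whole number of features, keeping at least one.
        remaining = max(1, int(remaining * ratio))
    return remaining


# 64 features with all three ratios at 0.5: 64 -> 32 -> 16 -> 8
print(features_per_split(64, 0.5, 0.5, 0.5))

# 20 features with only colsample_bylevel=0.8: about 16 per level
print(features_per_split(20, colsample_bylevel=0.8))
```

With the 20-feature dataset used earlier and colsample_bylevel=0.8, roughly 16 features are available at each level.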

Choosing the Right “colsample_bylevel” Value

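The value of colsample_bylevel affects the model’s performance and its propensity to overfit. Lower values expose fewer features at each level, which strengthens regularization but can lead to underfitting if set too low. Values close to 1 give each level access to most or all features, which weakens the regularizing effect.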

Practical Tips


