
Configure XGBoost "subsample" Parameter

The subsample parameter in XGBoost controls the fraction of observations used for each tree.

By adjusting subsample, you can influence the model’s performance and its ability to generalize.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=10, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the XGBoost classifier with a subsample value
model = XGBClassifier(subsample=0.8, eval_metric='logloss')

# Fit the model
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

Understanding the “subsample” Parameter

The subsample parameter determines the fraction of observations randomly sampled (without replacement) for each tree during the model’s training process. It is a regularization technique that can help prevent overfitting by introducing randomness into the training data. subsample accepts values in the range (0, 1], where 1 means every tree is trained on all observations. The default value of subsample in XGBoost is 1.

Choosing the Right “subsample” Value

The value of subsample affects the model’s performance and its propensity to overfit:

- Lower values (e.g., 0.5 to 0.8) introduce more randomness, which can reduce overfitting and improve generalization, but a value that is too low may underfit because each tree sees too little data.
- A value of 1 trains every tree on the full dataset, which maximizes the information available to each tree but offers no sampling-based regularization and can overfit, especially on small or noisy datasets.

When setting subsample, consider the trade-off between model performance and overfitting: lower values trade some training accuracy for robustness, while higher values do the opposite.

Practical Tips

- Start with the default of 1 and lower subsample only if the model shows signs of overfitting.
- Values between 0.5 and 0.8 are common starting points when regularization is needed.
- Tune subsample together with related sampling parameters such as colsample_bytree, and use cross-validation to compare candidate values.
