
Configure XGBoost "sampling_method" Parameter

The sampling_method parameter in XGBoost plays a critical role in how training data is sampled when building trees.

Configuring it properly can improve training speed and, in some cases, model accuracy, making it an important lever when tuning your XGBoost models.

from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=10, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the XGBoost classifier with uniform sampling and a 50% row subsample
model = XGBClassifier(sampling_method='uniform', subsample=0.5, eval_metric='logloss')

# Fit the model
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

The default sampling_method is 'uniform'. It can only be changed to 'gradient_based' if tree_method is set to 'hist' and device is set to 'cuda'.

Otherwise you will get an error like: “Only uniform sampling is supported, gradient-based sampling is only support by GPU Hist”.

from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=10, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the XGBoost classifier with gradient-based sampling (requires tree_method='hist' and device='cuda')
model = XGBClassifier(tree_method='hist', device='cuda', sampling_method='gradient_based', subsample=0.5, eval_metric='logloss')

# Fit the model
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

Understanding the “sampling_method” Parameter

The sampling_method parameter in XGBoost determines how instances are sampled during the tree construction phase. This can significantly influence the behavior of the algorithm during training:

'uniform': each training instance has an equal probability of being selected. This is the default, works with every tree_method and device, and typically needs subsample of at least 0.5 to give good results.

'gradient_based': the selection probability of each instance is proportional to the regularized absolute value of its gradients, so rows with larger gradients are sampled more often. This allows subsample to be set as low as 0.1 without loss of accuracy, but it is only supported when tree_method is 'hist' and device is 'cuda'.
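The same choice is available through the native training API, where sampling_method is passed in the params dictionary. The sketch below shows both configurations side by side; the gradient_based configuration is only valid on a CUDA device, so only the uniform one is actually trained here.

import xgboost as xgb
from sklearn.datasets import make_classification

# Generate synthetic data and wrap it in a DMatrix for the native API
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
dtrain = xgb.DMatrix(X, label=y)

# Uniform sampling: every row has the same chance of being drawn for each tree
params_uniform = {'objective': 'binary:logistic', 'sampling_method': 'uniform', 'subsample': 0.5}

# Gradient-based sampling: rows with larger regularized gradients are drawn more often;
# only valid together with tree_method='hist' and device='cuda'
params_gradient_based = {'objective': 'binary:logistic', 'tree_method': 'hist', 'device': 'cuda',
                         'sampling_method': 'gradient_based', 'subsample': 0.1}

# Train with the uniform configuration (works on CPU or GPU)
booster = xgb.train(params_uniform, dtrain, num_boost_round=50)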

Choosing the Right “sampling_method” Value

Selecting the appropriate sampling method depends on your dataset, your hardware, and the specific challenges they present:

Choose 'uniform' (the default) when training on CPU, or whenever you plan to keep subsample at 0.5 or higher; it is the safe, general-purpose option.

Choose 'gradient_based' when training large datasets on GPU with tree_method='hist'; because it preferentially samples rows with large gradients, you can lower subsample substantially (down to around 0.1) to speed up training without hurting accuracy, as shown in the sketch below.
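In practice this often comes down to switching configurations based on the hardware you train on. Here is a minimal sketch, assuming a has_cuda flag that you set yourself for your environment (XGBoost does not expose such a check in this form).

from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Assumption: set this flag for your own environment
has_cuda = False

if has_cuda:
    # GPU histogram training: gradient-based sampling lets subsample drop as low as 0.1
    model = XGBClassifier(tree_method='hist', device='cuda',
                          sampling_method='gradient_based', subsample=0.1,
                          eval_metric='logloss')
else:
    # CPU (or any device): stick with uniform sampling and keep subsample >= 0.5
    model = XGBClassifier(sampling_method='uniform', subsample=0.5,
                          eval_metric='logloss')

# Fit and report accuracy on the held-out test set
model.fit(X_train, y_train)
print(model.score(X_test, y_test))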

Practical Tips



See Also