Configure XGBoost Dart "sample_type" Parameter

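The sample_type parameter in XGBoost’s Dart Booster controls how the trees to drop are selected in each boosting round. Dart (Dropouts meet Multiple Additive Regression Trees) temporarily drops a random subset of the existing trees before fitting each new tree, and sample_type sets the sampling scheme used to pick that subset.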

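There are two options for this parameter:

'uniform' (the default): every existing tree is equally likely to be dropped.
'weighted': trees are dropped with probability proportional to their weight.

The choice of sample_type can affect the model’s accuracy and how well it generalizes, although the size of the effect depends on the dataset and on the other Dart settings such as rate_drop and skip_drop.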
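To make the difference between the two schemes concrete, here is a simplified NumPy sketch of how a per-tree drop probability could be derived under each setting. It is an illustration only, not XGBoost’s source code: the function names drop_probabilities and select_dropped_trees are hypothetical, and the weighted formula (rate_drop scaled by each tree’s share of the total weight, times the number of trees) is an assumption chosen so that the average drop probability stays equal to rate_drop.

```python
import numpy as np

def drop_probabilities(tree_weights, rate_drop, sample_type="uniform"):
    """Per-tree drop probability (illustrative, hypothetical helper)."""
    w = np.asarray(tree_weights, dtype=float)
    if sample_type == "uniform":
        # Every tree has the same chance of being dropped.
        return np.full(w.shape, rate_drop)
    if sample_type == "weighted":
        # Assumed scheme: probability proportional to the tree's weight,
        # scaled so the mean probability equals rate_drop.
        return np.clip(rate_drop * w.size * w / w.sum(), 0.0, 1.0)
    raise ValueError(f"unknown sample_type: {sample_type!r}")

def select_dropped_trees(tree_weights, rate_drop, sample_type="uniform", seed=0):
    """Sample the indices of trees to drop for one boosting round."""
    rng = np.random.default_rng(seed)
    p = drop_probabilities(tree_weights, rate_drop, sample_type)
    return np.flatnonzero(rng.random(p.size) < p)

weights = [0.5, 0.5, 1.0, 2.0]  # made-up tree weights for illustration
p_uniform = drop_probabilities(weights, 0.1, "uniform")
p_weighted = drop_probabilities(weights, 0.1, "weighted")
print("uniform :", p_uniform)
print("weighted:", p_weighted)
print("dropped :", select_dropped_trees(weights, 0.1, "weighted", seed=0))
```

Under this sketch, a tree with four times the weight of another is four times as likely to be dropped when sample_type='weighted', while 'uniform' treats all trees alike.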

Let’s demonstrate this using a synthetic multiclass classification dataset:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# Generate a synthetic multiclass classification dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=3, n_redundant=1,
                           n_features=5, random_state=42)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize two XGBClassifier models with different sample_type settings
model_uniform = XGBClassifier(booster='dart', sample_type='uniform', rate_drop=0.1,
                              skip_drop=0.5, random_state=42)
model_weighted = XGBClassifier(booster='dart', sample_type='weighted', rate_drop=0.1,
                               skip_drop=0.5, random_state=42)

# Train the models
model_uniform.fit(X_train, y_train)
model_weighted.fit(X_train, y_train)

# Make predictions on test set
pred_uniform = model_uniform.predict(X_test)
pred_weighted = model_weighted.predict(X_test)

# Calculate accuracy scores
acc_uniform = accuracy_score(y_test, pred_uniform)
acc_weighted = accuracy_score(y_test, pred_weighted)

print(f"Accuracy (sample_type='uniform'): {acc_uniform:.4f}")
print(f"Accuracy (sample_type='weighted'): {acc_weighted:.4f}")

In this example, we generate a synthetic multiclass classification dataset using scikit-learn’s make_classification() function. We then split the data into training and testing sets.

Next, we initialize two XGBClassifier models with the Dart Booster, setting sample_type='uniform' for one model and sample_type='weighted' for the other. We keep the other hyperparameters (rate_drop=0.1, skip_drop=0.5, random_state=42) the same for both models, so sample_type is the only difference between them.

We train both models on the same training data using the fit() method, make predictions on the test set using predict(), and calculate the accuracy scores using scikit-learn’s accuracy_score() function.

Finally, we print the accuracy scores for both models to compare their performance.

By running this example and comparing the accuracy scores, you can see how the choice of sample_type affects the model’s performance on this specific dataset. Keep in mind that a small difference on a single train/test split may be within noise, so treat one run as a hint rather than a verdict. Experiment with different datasets, splits, and hyperparameter settings to gain a better understanding of when to use 'uniform' or 'weighted' sampling in your XGBoost Dart Booster models.
