Halving Random Search for XGBoost Hyperparameters

Halving random search is an efficient alternative to grid search for finding good XGBoost hyperparameters.

It progressively discards less promising hyperparameter configurations, focusing computational resources on more promising candidates.

This tip demonstrates how to perform halving random search using the HalvingRandomSearchCV class from scikit-learn.

At the time of writing, halving random search is an experimental feature in scikit-learn. It requires an additional import, which must run before HalvingRandomSearchCV is imported:

from sklearn.experimental import enable_halving_search_cv

Example of halving random search:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
# The enabling import must come before importing HalvingRandomSearchCV
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingRandomSearchCV
from xgboost import XGBClassifier
from scipy.stats import randint, uniform

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

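# Define parameter distributions
# randint(low, high) samples integers in [low, high); uniform(loc, scale) samples floats in [loc, loc + scale]
param_dist = {
    'max_depth': randint(3, 10),            # integers 3 to 9
    'min_child_weight': randint(1, 6),      # integers 1 to 5
    'subsample': uniform(0.5, 0.5),         # floats in [0.5, 1.0]
    'colsample_bytree': uniform(0.5, 0.5),  # floats in [0.5, 1.0]
    'learning_rate': uniform(0.01, 0.3)     # floats in [0.01, 0.31]
}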

# Create XGBoost classifier (n_estimators is overridden by the search, which uses it as the resource)
xgb = XGBClassifier(n_estimators=100, objective='binary:logistic', random_state=42)

# Perform halving random search
halving_search = HalvingRandomSearchCV(estimator=xgb, param_distributions=param_dist,
                                       factor=3, resource='n_estimators',
                                       max_resources=100, random_state=42)
halving_search.fit(X_train, y_train)

# Print best parameters
print(f"Best parameters: {halving_search.best_params_}")
print(f"Best score: {halving_search.best_score_}")

In this example:

We load the breast cancer dataset from scikit-learn and split it into train and test sets. The test set is held out and is not used during the search.
We define the parameter distributions param_dist using the randint and uniform functions from scipy.stats. These distributions will be sampled to generate hyperparameter configurations. Here, we include max_depth, min_child_weight, subsample, colsample_bytree, and learning_rate, but you can adjust these based on your needs.
We create an instance of the XGBoost classifier XGBClassifier with some basic parameters.
We create a HalvingRandomSearchCV object halving_search, passing in the XGBoost classifier, parameter distributions, and halving search specific parameters:
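- factor: The reduction factor for discarding less promising candidates (default: 3). With factor=3, roughly the best third of the candidates survive each iteration, and the resource given to each survivor is multiplied by 3.
- resource: The resource to allocate at each iteration. Here, we use 'n_estimators', which means the number of boosting rounds will be increased at each iteration. The resource parameter must not also appear in param_dist.
- max_resources: The maximum amount of resource that any candidate may use in a single iteration. In this case, it’s the maximum number of estimators.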
We fit halving_search to the training data. This will progressively allocate more resources (estimators) to promising configurations while discarding less promising ones.
Finally, we print the best parameters and the corresponding best score, which is the mean cross-validated score of the best candidate.
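To make the roles of factor and max_resources more concrete, the schedule that the search follows can be approximated by hand. The helper below is a simplified sketch, not scikit-learn's code. It assumes the defaults used in the example: the smallest resource is 1 estimator, and the number of initial candidates is max_resources divided by that smallest resource. The function names are hypothetical. On a fitted search object, the n_candidates_ and n_resources_ attributes report the schedule that was actually used.

```python
def count_iterations(n_candidates, factor):
    """Return 1 + floor(log_factor(n_candidates)) using integer arithmetic."""
    count, power = 1, factor
    while power <= n_candidates:
        count += 1
        power *= factor
    return count


def halving_schedule(max_resources, factor, min_resources=1):
    """Approximate (n_candidates, n_resources) pairs for each halving iteration.

    Assumption: the first iteration starts with max_resources // min_resources
    candidates, each trained with min_resources units of the resource.
    """
    n_candidates = max_resources // min_resources
    n_iterations = count_iterations(n_candidates, factor)

    schedule = []
    for iteration in range(n_iterations):
        n_resources = min(min_resources * factor**iteration, max_resources)
        schedule.append((n_candidates, n_resources))
        # Keep roughly the best 1/factor of the candidates (rounded up)
        n_candidates = -(-n_candidates // factor)
    return schedule


schedule = halving_schedule(max_resources=100, factor=3)
for n_candidates, n_resources in schedule:
    print(f"{n_candidates:>3} candidates x {n_resources:>2} estimators")
# Prints:
# 100 candidates x  1 estimators
#  34 candidates x  3 estimators
#  12 candidates x  9 estimators
#   4 candidates x 27 estimators
#   2 candidates x 81 estimators
```

Under these assumptions, most candidates are only ever trained with a handful of boosting rounds, which is where the savings come from. It also shows a trade-off: configurations that need many rounds to perform well, such as those with a small learning_rate, may be eliminated early.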

Compared to grid search, halving random search can often find good hyperparameters with less total computation, because most candidates are evaluated with only a small resource budget before being discarded. It’s particularly useful when you have a large hyperparameter space to explore and limited computational resources. However, the effectiveness of halving search depends on the specific problem and the hyperparameter distributions chosen. It’s a good idea to experiment with different settings, such as factor and max_resources, to find what works best for your task.

See Also