Bayesian optimization is a powerful approach for tuning the hyperparameters of machine learning models like XGBoost.
The hyperopt library is a popular choice for performing Bayesian optimization in Python, offering a flexible and efficient implementation of the Tree-structured Parzen Estimator (TPE) algorithm.
TPE builds a probability model of the objective function, which maps hyperparameters to a performance metric. It uses this model to select the next set of hyperparameters to evaluate, aiming to balance exploration (trying new hyperparameters) and exploitation (focusing on promising regions). After each evaluation, TPE refines the model based on the results, iteratively improving its estimates of the best hyperparameters.
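To make the selection step concrete, here is a toy, one-dimensional sketch of the TPE idea. This is not hyperopt’s implementation: the quantile split (gamma), the Gaussian kernel density estimates, the candidate count, and the function name tpe_suggest are all simplifying assumptions for illustration.
import numpy as np
from scipy.stats import gaussian_kde

def tpe_suggest(observed_x, observed_loss, gamma=0.25, n_candidates=100):
    # Toy TPE step for a single continuous hyperparameter (illustration only)
    x = np.asarray(observed_x)
    loss = np.asarray(observed_loss)
    # Split past trials into "good" (lowest gamma fraction of losses) and "bad"
    threshold = np.quantile(loss, gamma)
    good, bad = x[loss <= threshold], x[loss > threshold]
    # Fit a density to each group: l(x) over good points, g(x) over bad points
    l, g = gaussian_kde(good), gaussian_kde(bad)
    # Sample candidates from the "good" density and keep the one maximizing
    # l(x)/g(x): high l(x) favors promising regions (exploitation), low g(x)
    # favors regions unlike past failures (exploration)
    candidates = l.resample(n_candidates).ravel()
    return candidates[np.argmax(l(candidates) / g(candidates))]

# Example: suggest the next learning rate after six hypothetical trials
xs = [0.30, 0.05, 0.20, 0.01, 0.15, 0.08]
losses = [0.40, 0.15, 0.30, 0.25, 0.28, 0.18]
print(tpe_suggest(xs, losses))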
Integrating hyperopt with XGBoost is straightforward. Here’s an example of how to use hyperopt to optimize XGBoost hyperparameters for a classification task.
First, install hyperopt using pip:
pip install hyperopt
Then, use hyperopt to define the search space and optimize the hyperparameters:
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier
from hyperopt import hp, tpe, fmin, STATUS_OK, Trials, space_eval
# Generate synthetic classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, n_features=10, random_state=42)
# Define the objective function to minimize
def objective(params):
    # hp.quniform returns floats, so cast max_depth to the integer XGBoost expects
    params['max_depth'] = int(params['max_depth'])
    model = XGBClassifier(**params)
    # Mean 5-fold cross-validation accuracy, computed in parallel
    score = cross_val_score(model, X, y, cv=5, scoring='accuracy', n_jobs=-1).mean()
    # fmin minimizes, so return the negated accuracy as the loss
    return {'loss': -score, 'status': STATUS_OK}
# Define the search space
space = {
    'max_depth': hp.quniform('max_depth', 3, 10, 1),           # quantized uniform over 3-10
    'learning_rate': hp.loguniform('learning_rate', -5, -1),   # e^-5 (~0.007) to e^-1 (~0.37)
    'subsample': hp.uniform('subsample', 0.5, 1),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 1),
    'n_estimators': hp.choice('n_estimators', [50, 100, 150, 200]),
}
# Perform Bayesian optimization
trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=trials)
# Print the best hyperparameters (space_eval resolves hp.choice indices to actual values)
print(f"Best hyperparameters: {space_eval(space, best)}")
best_score = -trials.best_trial['result']['loss']
print(f"Best accuracy: {best_score:.4f}")
In this example:
- We generate a synthetic binary classification dataset using scikit-learn’s make_classification function.
- We define an objective function that takes hyperparameters, creates an XGBClassifier, and returns the negated mean cross-validation accuracy. We negate the score because fmin minimizes the objective function.
- We define the search space using hyperopt’s hp module, specifying the distribution for each hyperparameter.
- We create a Trials object to store the results of each evaluation.
- We call fmin to perform the optimization, specifying the objective function, search space, optimization algorithm (TPE), and the maximum number of evaluations.
- After optimization, we print the best hyperparameters and the corresponding best accuracy; a sketch of refitting a final model with those values follows below.
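Once fmin finishes, you would typically refit a model on the full dataset with the tuned values. This refit step is not part of the original example; the sketch below continues from the variables defined above, using space_eval to convert the hp.choice index back into an actual n_estimators value:
# Hypothetical follow-up: refit a final model with the tuned hyperparameters
final_params = space_eval(space, best)                       # resolves the hp.choice index
final_params['max_depth'] = int(final_params['max_depth'])   # quniform returns floats
final_model = XGBClassifier(**final_params).fit(X, y)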
By leveraging Bayesian optimization with hyperopt, we can efficiently search for high-performing XGBoost hyperparameters, potentially finding better configurations than traditional methods like grid search. This approach is particularly beneficial when dealing with large search spaces and costly objective functions, as it can find good hyperparameters with fewer evaluations.