Hyperopt is an efficient Python library for hyperparameter optimization that uses a Bayesian optimization approach, most notably the Tree-structured Parzen Estimator (TPE).
It can be used to tune the hyperparameters of various machine learning algorithms, including XGBoost.
The hyperopt-sklearn library extends hyperopt to work seamlessly with scikit-learn estimators, making it easy to integrate into existing machine learning workflows.
This example demonstrates how to use hyperopt to optimize the hyperparameters of an XGBoost classifier. We’ll cover installing the required libraries, configuring the search, running the optimization process, and retrieving the best model and parameters found.
First, make sure you have the required hyperopt and hyperopt-sklearn libraries installed:
pip install hyperopt
pip install git+https://github.com/hyperopt/hyperopt-sklearn
Now, let’s look at an example of optimizing XGBoost parameters using hyperopt via the scikit-learn interface:
from hyperopt import tpe
from hpsklearn import HyperoptEstimator, xgboost_classification
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# protect the entry point
if __name__ == '__main__':
    # Load a synthetic classification dataset
    X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Create the hyperopt estimator
    estimator = HyperoptEstimator(classifier=xgboost_classification('my_clf'),
                                  preprocessing=[],
                                  algo=tpe.suggest,
                                  max_evals=50,
                                  trial_timeout=300,
                                  seed=42)

    # Perform the search
    estimator.fit(X_train, y_train)

    # Report the test-set accuracy of the best model
    print(estimator.score(X_test, y_test))

    # Report the configuration of the best model
    print(estimator.best_model())
In this example:

- We load a synthetic binary classification dataset using scikit-learn’s make_classification function and split it into train and test sets.
- We create a HyperoptEstimator, specifying the XGBoost classifier, the optimization algorithm (tpe.suggest), and other search settings. This searches the default ranges for the XGBoost hyperparameters that are most worth optimizing.
- We fit the estimator on the training data, which runs the hyperparameter optimization process.
- We score the best model found on the test set and report its configuration via best_model(), retrieving the best parameters found by the search (see the sketch after this list).
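If you want to reuse the tuned model outside the estimator, you can pull it out of best_model(). The following is a minimal sketch that assumes the dict structure returned by recent hyperopt-sklearn versions, where the fitted classifier lives under the 'learner' key:

# Hedged sketch: assumes best_model() returns a dict with a 'learner' key,
# as in recent hyperopt-sklearn versions.
best = estimator.best_model()
clf = best['learner']            # the fitted XGBClassifier
print(clf.get_params())          # the tuned hyperparameter values
print(clf.predict(X_test[:5]))   # reuse the model for new predictions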
Hyperopt provides an efficient way to optimize XGBoost hyperparameters, leveraging Bayesian optimization to intelligently search the parameter space. The hyperopt-sklearn library makes it straightforward to integrate hyperopt with scikit-learn estimators, allowing you to easily tune your XGBoost models for improved performance.
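hyperopt-sklearn defines the search space and objective function for you. If you prefer full control over both, you can drive the optimization with hyperopt’s fmin directly on XGBoost’s scikit-learn API. The sketch below is illustrative: the three parameters, their ranges, and the 3-fold cross-validation are assumptions chosen for brevity, not hyperopt-sklearn’s defaults.

import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# Illustrative search space (ranges are assumptions, not library defaults)
space = {
    'max_depth': hp.quniform('max_depth', 1, 11, 1),
    'learning_rate': hp.loguniform('learning_rate', np.log(0.0001), np.log(0.5)),
    'subsample': hp.uniform('subsample', 0.5, 1.0),
}

def objective(params):
    model = XGBClassifier(
        max_depth=int(params['max_depth']),
        learning_rate=params['learning_rate'],
        subsample=params['subsample'],
        n_estimators=100,
    )
    # hyperopt minimizes the loss, so negate the mean CV accuracy
    score = cross_val_score(model, X, y, cv=3, scoring='accuracy').mean()
    return {'loss': -score, 'status': STATUS_OK}

# rstate seeds the search; hyperopt >= 0.2.7 expects a NumPy Generator
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=25,
            rstate=np.random.default_rng(42))
print(best)

Note that the objective returns a loss, which is why the cross-validated accuracy is negated: hyperopt always minimizes.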
At the time of writing, the hyperparameters and the default ranges searched by hyperopt-sklearn are as follows (a sketch of how ranges like these map onto hyperopt primitives follows the list):
- max_depth: uniform from 1 to 11
- learning_rate: log uniform from 0.0001 to 0.5
- n_estimators: 100 to 6000 in steps of 200
- gamma: log uniform from 0.0001 to 5
- min_child_weight: log uniform from 1 to 100
- subsample: uniform from 0.5 to 1
- colsample_bytree: uniform from 0.5 to 1
- colsample_bylevel: uniform from 0.5 to 1
- reg_alpha: log uniform from 0.0001 to 1
- reg_lambda: log uniform from 1 to 4
- random_state: uniform from 0 to 5
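For reference, here is a hedged sketch of how ranges like these can be expressed with hyperopt’s hp primitives. It is an illustrative reconstruction, not the exact expressions hyperopt-sklearn uses internally:

import numpy as np
from hyperopt import hp
from hyperopt.pyll import scope

# Illustrative reconstruction of the ranges above using hyperopt primitives.
# hp.loguniform takes log-space bounds, hence the np.log calls.
space = {
    'max_depth': scope.int(hp.uniform('max_depth', 1, 11)),
    'learning_rate': hp.loguniform('learning_rate', np.log(0.0001), np.log(0.5)),
    'n_estimators': scope.int(hp.quniform('n_estimators', 100, 6000, 200)),
    'gamma': hp.loguniform('gamma', np.log(0.0001), np.log(5)),
    'min_child_weight': hp.loguniform('min_child_weight', np.log(1), np.log(100)),
    'subsample': hp.uniform('subsample', 0.5, 1),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 1),
    'colsample_bylevel': hp.uniform('colsample_bylevel', 0.5, 1),
    'reg_alpha': hp.loguniform('reg_alpha', np.log(0.0001), np.log(1)),
    'reg_lambda': hp.loguniform('reg_lambda', np.log(1), np.log(4)),
}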