
XGBoost Hyperparameter Optimization with Optuna

Optuna is a powerful hyperparameter optimization library that can significantly improve the performance of XGBoost models.

It provides a flexible and efficient way to search for optimal hyperparameters, supporting sampling algorithms such as the default Tree-structured Parzen Estimator (TPE) and pruning techniques that stop unpromising trials early.
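
For example, a study can be configured with an explicit sampler and pruner. Here is a minimal sketch using two of Optuna's built-ins, TPESampler and MedianPruner:

import optuna

# TPE is Optuna's default sampler; seeding it makes the search reproducible.
# MedianPruner stops trials whose intermediate results fall below the
# median of completed trials at the same step.
study = optuna.create_study(
    direction='maximize',
    sampler=optuna.samplers.TPESampler(seed=42),
    pruner=optuna.pruners.MedianPruner(n_warmup_steps=5),
)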

Optuna seamlessly integrates with XGBoost and offers a simple, intuitive API for defining the search space and objective function.

Here’s an example of how to use Optuna to optimize XGBoost hyperparameters:

import optuna
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Generate synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, n_features=20, random_state=42)

# Hold out a test set, then carve a validation set out of the remaining training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

def objective(trial):
    # Define hyperparameters to tune
    params = {
        'max_depth': trial.suggest_int('max_depth', 2, 10),
        'learning_rate': trial.suggest_float('learning_rate', 1e-3, 1.0, log=True),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1.0),
    }

    # Train XGBoost model with the sampled hyperparameters
    model = XGBClassifier(**params, random_state=42)
    model.fit(X_train, y_train)

    # Evaluate accuracy on the held-out validation set
    # (the test set stays untouched until the very end)
    val_score = model.score(X_val, y_val)
    return val_score

# Create Optuna study
study = optuna.create_study(direction='maximize')

# Optimize hyperparameters
study.optimize(objective, n_trials=100)

# Train final model with best hyperparameters
best_params = study.best_params
best_model = XGBClassifier(**best_params, random_state=42)
best_model.fit(X_train, y_train)

# Evaluate best model on test set
test_score = best_model.score(X_test, y_test)

print(f"Best test score: {test_score:.4f}")
print(f"Best parameters: {best_params}")

In this example:

  1. We generate a synthetic binary classification dataset using scikit-learn’s make_classification function and split it into train, validation, and test sets.

  2. We define an objective function that takes an Optuna trial object as input. The function defines the hyperparameters to tune and their search spaces using the trial.suggest_* methods. It then trains an XGBoost model with the sampled hyperparameters and returns the accuracy on the held-out validation set, leaving the test set untouched during the search.

  3. We create an Optuna study object and optimize the objective function for 100 trials.

  4. After the optimization is complete, we retrieve the best hyperparameters using study.best_params and train a final XGBoost model with these hyperparameters.

  5. We evaluate the performance of the best model on the test set and print the test score and best hyperparameters.
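
Once optimization finishes, the study object exposes more than just best_params. A brief sketch of a few standard Study accessors (best_value, trials, and trials_dataframe, which requires pandas):

print(f"Best validation score: {study.best_value:.4f}")
print(f"Number of finished trials: {len(study.trials)}")

# Export the full optimization history as a pandas DataFrame;
# hyperparameter columns are prefixed with 'params_'
df = study.trials_dataframe()
print(df[['number', 'value', 'params_max_depth', 'params_learning_rate']].head())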

Optuna automatically tracks the best trial and provides the flexibility to define custom search spaces, samplers, and pruning criteria. By adaptively focusing on promising regions of the hyperparameter space, Optuna can often find better hyperparameters than manual tuning or grid search, leading to improved XGBoost model performance.
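
Pruning deserves a concrete illustration. The sketch below pairs Optuna's XGBoostPruningCallback with XGBoost's native xgb.train API, reusing the X_train/X_val split from the example above; the pruned_objective name is just for illustration, and note that in recent Optuna releases the callback is distributed in the separate optuna-integration package:

import xgboost as xgb
from optuna.integration import XGBoostPruningCallback

def pruned_objective(trial):
    params = {
        'objective': 'binary:logistic',
        'eval_metric': 'logloss',
        'max_depth': trial.suggest_int('max_depth', 2, 10),
        'learning_rate': trial.suggest_float('learning_rate', 1e-3, 1.0, log=True),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0),
    }
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)

    # Report validation logloss after every boosting round; Optuna prunes
    # the trial if it falls behind earlier trials at the same round.
    # The 'validation-logloss' key must match the name given in evals.
    pruning_callback = XGBoostPruningCallback(trial, 'validation-logloss')
    bst = xgb.train(params, dtrain, num_boost_round=100,
                    evals=[(dval, 'validation')], callbacks=[pruning_callback])

    # Convert predicted probabilities to class labels and return accuracy
    preds = bst.predict(dval)
    return ((preds > 0.5) == y_val).mean()

pruned_study = optuna.create_study(direction='maximize',
                                   pruner=optuna.pruners.MedianPruner(n_warmup_steps=5))
pruned_study.optimize(pruned_objective, n_trials=100)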



See Also