XGBoost Save Best Model From GridSearchCV

When working with XGBoost, it’s often necessary to tune the model’s hyperparameters to achieve optimal performance.

Scikit-learn’s GridSearchCV allows you to define a grid of hyperparameters, perform an exhaustive search to find the best combination, and access the best model.

This example demonstrates how to save and load the best model from a GridSearchCV run.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define parameter grid
param_grid = {
    'max_depth': [3, 5, 7],
    'learning_rate': [0.1, 0.01, 0.05],
    'n_estimators': [50, 100, 200]
}

# Create XGBClassifier
model = XGBClassifier(random_state=42)

# Perform grid search
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, n_jobs=-1)
grid_search.fit(X_train, y_train)

# Print best score and parameters
print(f"Best score: {grid_search.best_score_:.3f}")
print(f"Best parameters: {grid_search.best_params_}")

# Access best model
best_model = grid_search.best_estimator_

# Save best model
best_model.save_model('best_model.ubj')

# Load saved model
loaded_model = XGBClassifier()
loaded_model.load_model('best_model.ubj')

# Use loaded model for predictions
predictions = loaded_model.predict(X_test)

# Print accuracy score
accuracy = loaded_model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.3f}")

In this example, we first generate a synthetic binary classification dataset using scikit-learn’s make_classification function and split it into train and test sets.

We then define a parameter grid containing different values for max_depth, learning_rate, and n_estimators. These are passed to GridSearchCV along with the XGBClassifier instance, specifying 3-fold cross-validation.
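To get a feel for how much work the search performs, the grid size can be counted before fitting. The sketch below uses only the standard library and the same grid as the example above; the names cv_folds, candidates, and the n_* counters are illustrative and not part of scikit-learn.

```python
from itertools import product

# Same grid as in the example above
param_grid = {
    'max_depth': [3, 5, 7],
    'learning_rate': [0.1, 0.01, 0.05],
    'n_estimators': [50, 100, 200],
}
cv_folds = 3  # matches cv=3 in the GridSearchCV call

# One candidate per combination of one value from each hyperparameter list
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

n_candidates = len(candidates)         # 3 * 3 * 3 combinations
n_cv_fits = n_candidates * cv_folds    # each candidate is fit once per fold
n_total_fits = n_cv_fits + 1           # plus one final refit of the best candidate

print(f"Candidates: {n_candidates}")
print(f"Cross-validation fits: {n_cv_fits}")
print(f"Total fits including refit: {n_total_fits}")
```

With three values for each of three hyperparameters, the search evaluates 27 candidates, which means 81 cross-validation fits plus one final refit. Each additional value in the grid multiplies this cost, so it is worth keeping the grid small.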

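After fitting the GridSearchCV object with the training data, we print the best score (the mean cross-validated accuracy of the winning combination, since accuracy is the default scorer for classifiers) and the corresponding hyperparameters. Because refit=True by default, GridSearchCV retrains a model with those hyperparameters on the full training set and exposes it as the best_estimator_ attribute. We save that model to a file named ‘best_model.ubj’ using the save_model method.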

To demonstrate loading the saved model, we create a new XGBClassifier instance and load the saved model using the load_model method. We then use this loaded model to make predictions on the test set and print the accuracy score.

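By following this approach, you can save the best model found by a GridSearchCV run and reuse it later without repeating the search. Keep in mind that the result is the best combination among the values in the grid, not necessarily the best possible configuration.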
