When working with XGBoost, hyperparameter tuning is crucial for achieving optimal model performance. While GridSearchCV exhaustively evaluates every combination in a predefined parameter grid, RandomizedSearchCV offers a more efficient alternative by evaluating only a random sample of the parameter space. This example demonstrates how to save and load the best model obtained from a RandomizedSearchCV run, so you can reuse the optimal model for future predictions.
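To see why this matters, here is a rough back-of-the-envelope comparison (the grid sizes are hypothetical discretizations chosen for illustration, not taken from the example below):

# Hypothetical grid: 5 depths x 3 learning rates x 5 estimator counts x 3 subsample values
grid_candidates = 5 * 3 * 5 * 3  # 225 combinations an exhaustive grid must evaluate
random_candidates = 10           # n_iter=10, as used in the example below
print(f"Grid search fits with 3-fold CV: {grid_candidates * 3}")        # 675 model fits
print(f"Randomized search fits with 3-fold CV: {random_candidates * 3}")  # 30 model fits

The full example below puts this into practice: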
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from xgboost import XGBClassifier
from scipy.stats import randint
# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_clusters_per_class=1, n_classes=3, random_state=42)
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define parameter distribution
param_dist = {
    'max_depth': randint(3, 8),
    'learning_rate': [0.1, 0.01, 0.05],
    'n_estimators': randint(50, 300),
    'subsample': [0.6, 0.8, 1.0]
}
# Create XGBClassifier
model = XGBClassifier(random_state=42)
# Perform randomized search
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist,
                                   n_iter=10, cv=3, n_jobs=-1, random_state=42)
random_search.fit(X_train, y_train)
# Print best score and parameters
print(f"Best score: {random_search.best_score_:.3f}")
print(f"Best parameters: {random_search.best_params_}")
# Access best model
best_model = random_search.best_estimator_
# Save best model
best_model.save_model('best_model.ubj')
# Load saved model
loaded_model = XGBClassifier()
loaded_model.load_model('best_model.ubj')
# Use loaded model for predictions
predictions = loaded_model.predict(X_test)
# Print accuracy score
accuracy = loaded_model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.3f}")
In this example, we generate a synthetic multiclass classification dataset using scikit-learn’s make_classification
function and split it into train and test sets.
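If you want to confirm the shape and class balance of the generated data, a quick optional check with numpy (not part of the original example) looks like this:

import numpy as np

print(X_train.shape, X_test.shape)  # (800, 20) and (200, 20): make_classification defaults to 20 features
print(np.bincount(y))               # roughly equal counts across the 3 classes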
We define a parameter distribution param_dist that mixes integer distributions (randint for max_depth and n_estimators) with lists of discrete candidate values (learning_rate and subsample). These are passed to RandomizedSearchCV along with the XGBClassifier instance, specifying 10 iterations and 3-fold cross-validation.
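If you prefer to sample learning_rate and subsample from genuinely continuous ranges rather than fixed lists, scipy.stats.uniform works the same way. The ranges below are illustrative, not tuned recommendations:

from scipy.stats import randint, uniform

# Variant: continuous distributions instead of fixed candidate lists
param_dist_continuous = {
    'max_depth': randint(3, 8),
    'learning_rate': uniform(0.01, 0.29),  # samples uniformly from [0.01, 0.30)
    'n_estimators': randint(50, 300),
    'subsample': uniform(0.6, 0.4)         # samples uniformly from [0.6, 1.0)
}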
After fitting the RandomizedSearchCV object on the training data, we print the best score and the corresponding hyperparameters. We access the best model through the best_estimator_ attribute and save it to a file named ‘best_model.ubj’ (XGBoost’s binary UBJSON format) using the save_model method.
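Beyond best_score_ and best_params_, the fitted search object also exposes cv_results_, which records every sampled candidate. One quick way to inspect it, assuming pandas is installed:

import pandas as pd

# Tabulate all sampled candidates, best first
results = pd.DataFrame(random_search.cv_results_)
cols = ['param_max_depth', 'param_learning_rate', 'param_n_estimators',
        'param_subsample', 'mean_test_score', 'std_test_score']
print(results[cols].sort_values('mean_test_score', ascending=False).head())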
To demonstrate loading, we create a fresh XGBClassifier instance and restore the saved model with the load_model method. We then use the loaded model to make predictions on the test set and print the accuracy score.
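As an optional sanity check, you can confirm that the round-tripped model reproduces the original model’s predictions exactly:

import numpy as np

# The loaded model should produce identical predictions to the original
assert np.array_equal(best_model.predict(X_test), loaded_model.predict(X_test))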
By following this approach, you can efficiently tune your XGBoost model using RandomizedSearchCV
, save the best model, and load it later for making predictions, ensuring optimal performance in your machine learning projects.