When working with XGBoost, hyperparameter tuning is crucial for achieving optimal model performance. While GridSearchCV exhaustively evaluates every combination in a predefined parameter grid, RandomizedSearchCV offers a more efficient alternative by evaluating only a random sample of the parameter space. This example demonstrates how to save and load the best model obtained from a RandomizedSearchCV run, so you can reuse the optimal model for future predictions.
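To see why this matters, here is a rough back-of-the-envelope comparison (the grid sizes are hypothetical discretizations chosen for illustration, not taken from the example below):

# Hypothetical grid: 5 depths x 3 learning rates x 5 estimator counts x 3 subsample values
grid_candidates = 5 * 3 * 5 * 3  # 225 combinations an exhaustive grid must evaluate
random_candidates = 10           # n_iter=10, as used in the example below
print(f"Grid search fits with 3-fold CV: {grid_candidates * 3}")        # 675 model fits
print(f"Randomized search fits with 3-fold CV: {random_candidates * 3}")  # 30 model fits

The full example below puts this into practice: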
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from xgboost import XGBClassifier
from scipy.stats import randint
# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_clusters_per_class=1, n_classes=3, random_state=42)
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define parameter distribution
param_dist = {
    'max_depth': randint(3, 8),
    'learning_rate': [0.1, 0.01, 0.05],
    'n_estimators': randint(50, 300),
    'subsample': [0.6, 0.8, 1.0]
}
# Create XGBClassifier
model = XGBClassifier(random_state=42)
# Perform randomized search
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist,
                                   n_iter=10, cv=3, n_jobs=-1, random_state=42)
random_search.fit(X_train, y_train)
# Print best score and parameters
print(f"Best score: {random_search.best_score_:.3f}")
print(f"Best parameters: {random_search.best_params_}")
# Access best model
best_model = random_search.best_estimator_
# Save best model
best_model.save_model('best_model.ubj')
# Load saved model
loaded_model = XGBClassifier()
loaded_model.load_model('best_model.ubj')
# Use loaded model for predictions
predictions = loaded_model.predict(X_test)
# Print accuracy score
accuracy = loaded_model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.3f}")
In this example, we generate a synthetic multiclass classification dataset using scikit-learn’s make_classification
function and split it into train and test sets.
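If you want to confirm the shape and class balance of the generated data, a quick optional check with numpy (not part of the original example) looks like this:

import numpy as np

print(X_train.shape, X_test.shape)  # (800, 20) and (200, 20): make_classification defaults to 20 features
print(np.bincount(y))               # roughly equal counts across the 3 classes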
We define a parameter distribution param_dist that mixes integer distributions (randint for max_depth and n_estimators) with lists of discrete candidate values (learning_rate and subsample). These are passed to RandomizedSearchCV along with the XGBClassifier instance, specifying 10 iterations and 3-fold cross-validation.
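If you prefer to sample learning_rate and subsample from genuinely continuous ranges rather than fixed lists, scipy.stats.uniform works the same way. The ranges below are illustrative, not tuned recommendations:

from scipy.stats import randint, uniform

# Variant: continuous distributions instead of fixed candidate lists
param_dist_continuous = {
    'max_depth': randint(3, 8),
    'learning_rate': uniform(0.01, 0.29),  # samples uniformly from [0.01, 0.30)
    'n_estimators': randint(50, 300),
    'subsample': uniform(0.6, 0.4)         # samples uniformly from [0.6, 1.0)
}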
After fitting the RandomizedSearchCV object on the training data, we print the best score and the corresponding hyperparameters. We access the best model through the best_estimator_ attribute and save it to a file named ‘best_model.ubj’ (XGBoost’s binary UBJSON format) using the save_model method.
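Beyond best_score_ and best_params_, the fitted search object also exposes cv_results_, which records every sampled candidate. One quick way to inspect it, assuming pandas is installed:

import pandas as pd

# Tabulate all sampled candidates, best first
results = pd.DataFrame(random_search.cv_results_)
cols = ['param_max_depth', 'param_learning_rate', 'param_n_estimators',
        'param_subsample', 'mean_test_score', 'std_test_score']
print(results[cols].sort_values('mean_test_score', ascending=False).head())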
To demonstrate loading, we create a fresh XGBClassifier instance and restore the saved model with the load_model method. We then use the loaded model to make predictions on the test set and print the accuracy score.
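As an optional sanity check, you can confirm that the round-tripped model reproduces the original model’s predictions exactly:

import numpy as np

# The loaded model should produce identical predictions to the original
assert np.array_equal(best_model.predict(X_test), loaded_model.predict(X_test))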
By following this approach, you can efficiently tune your XGBoost model using RandomizedSearchCV
, save the best model, and load it later for making predictions, ensuring optimal performance in your machine learning projects.