Saving your trained XGBoost models in a compressed format can significantly reduce storage space, and it can even speed up loading when disk I/O is the bottleneck. The joblib library (the same persistence tool scikit-learn recommends for its own models) makes this easy: joblib.dump() can write a zlib-compressed pickle file in a single call.
from sklearn.datasets import make_classification
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
import joblib
# Generate a random classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
# Train an XGBoost model
model = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
model.fit(X, y)
# Save the model with zlib compression (level 3)
joblib.dump(model, 'xgb_model.pkl.z', compress=('zlib', 3))
# Load the compressed model
loaded_model = joblib.load('xgb_model.pkl.z')
# Use the loaded model to make predictions
predictions = loaded_model.predict(X)
# Print the accuracy of the loaded model's predictions
print(f"Accuracy: {accuracy_score(y, predictions):.2f}")
Here’s a step-by-step breakdown of the main example:
- We train an XGBoost classifier on a randomly generated dataset.
- We save the trained model in a compressed format using joblib.dump(). The compress parameter is set to ('zlib', 3), which specifies the compression algorithm (zlib) and the compression level (3); other supported settings are sketched after this list.
- We load the compressed model using joblib.load().
- We use the loaded model to make predictions on the original dataset.
- Finally, we print the accuracy of the loaded model’s predictions to verify that it matches the performance of the original model.
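If zlib isn’t the best fit for your workflow, joblib supports several other compressors through the same compress parameter. The calls below are a minimal sketch; the filenames are illustrative.
import joblib
# Alternative compression settings supported by joblib (illustrative filenames)
joblib.dump(model, 'xgb_model.pkl.gz', compress=('gzip', 3))    # gzip compression
joblib.dump(model, 'xgb_model.pkl.lzma', compress=('lzma', 3))  # LZMA: slower, usually smaller files
joblib.dump(model, 'xgb_model_zlib3.pkl.z', compress=3)         # a plain integer level uses zlib by default
In every case, joblib.load() detects the compression from the file contents, so the loading code stays the same.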
By saving your XGBoost models in a compressed format, you can efficiently store and distribute them while maintaining their performance and functionality.
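If you want to tune the trade-off between file size and write time before distributing a model, a quick comparison of compression levels can help. This sketch reuses the model object from above; the levels and filenames are illustrative.
import os
import time
import joblib
# Compare zlib compression levels: file size vs. time to dump
for level in (1, 3, 9):
    path = f"xgb_model_level{level}.pkl.z"
    start = time.perf_counter()
    joblib.dump(model, path, compress=('zlib', level))
    dump_seconds = time.perf_counter() - start
    size_kb = os.path.getsize(path) / 1024
    print(f"zlib level {level}: {size_kb:.1f} KB, dumped in {dump_seconds:.3f} s")
Higher levels usually produce smaller files but take longer to write; level 3, used in the example above, is a common middle ground.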