Saving your trained XGBoost models in a compressed format can significantly reduce storage space, and it can even speed up loading when disk I/O is the bottleneck. The joblib library (the same persistence tool scikit-learn recommends for its own models) makes this easy: joblib.dump() can write a zlib-compressed pickle file in a single call.
from sklearn.datasets import make_classification
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
import joblib
# Generate a random classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
# Train an XGBoost model
model = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
model.fit(X, y)
# Save the model with zlib compression (level 3)
joblib.dump(model, 'xgb_model.pkl.z', compress=('zlib', 3))
# Load the compressed model
loaded_model = joblib.load('xgb_model.pkl.z')
# Use the loaded model to make predictions
predictions = loaded_model.predict(X)
# Print the accuracy of the loaded model's predictions
print(f"Accuracy: {accuracy_score(y, predictions):.2f}")
Here’s a step-by-step breakdown of the main example:
- We train an XGBoost classifier on a randomly generated dataset.
- We save the trained model in a compressed format using joblib.dump(). The compress parameter is set to ('zlib', 3), which specifies the compression algorithm (zlib) and the compression level (3); other supported settings are sketched after this list.
- We load the compressed model using joblib.load().
- We use the loaded model to make predictions on the original dataset.
- Finally, we print the accuracy of the loaded model’s predictions to verify that it matches the performance of the original model.
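If zlib isn’t the best fit for your workflow, joblib supports several other compressors through the same compress parameter. The calls below are a minimal sketch; the filenames are illustrative.
import joblib
# Alternative compression settings supported by joblib (illustrative filenames)
joblib.dump(model, 'xgb_model.pkl.gz', compress=('gzip', 3))    # gzip compression
joblib.dump(model, 'xgb_model.pkl.lzma', compress=('lzma', 3))  # LZMA: slower, usually smaller files
joblib.dump(model, 'xgb_model_zlib3.pkl.z', compress=3)         # a plain integer level uses zlib by default
In every case, joblib.load() detects the compression from the file contents, so the loading code stays the same.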
By saving your XGBoost models in a compressed format, you can efficiently store and distribute them while maintaining their performance and functionality.
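If you want to tune the trade-off between file size and write time before distributing a model, a quick comparison of compression levels can help. This sketch reuses the model object from above; the levels and filenames are illustrative.
import os
import time
import joblib
# Compare zlib compression levels: file size vs. time to dump
for level in (1, 3, 9):
    path = f"xgb_model_level{level}.pkl.z"
    start = time.perf_counter()
    joblib.dump(model, path, compress=('zlib', level))
    dump_seconds = time.perf_counter() - start
    size_kb = os.path.getsize(path) / 1024
    print(f"zlib level {level}: {size_kb:.1f} KB, dumped in {dump_seconds:.3f} s")
Higher levels usually produce smaller files but take longer to write; level 3, used in the example above, is a common middle ground.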