
XGBoost save_model() vs dump_model()

XGBoost provides two methods for writing a trained model to disk: dump_model() and save_model().

While they may seem similar, they serve different purposes. dump_model() writes a human-readable dump of the trees (plain text by default, with JSON available through its dump_format parameter) for visualization or interpretation, while save_model() persists the complete model so it can be reloaded later for prediction or inference.

Crucially, models saved with dump_model() cannot be loaded back into XGBoost for further training or prediction, whereas those saved with save_model() can.

Here’s an example demonstrating the difference:

from sklearn.datasets import make_classification
import xgboost

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

# Train an XGBoost classifier
model = xgboost.XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
model.fit(X, y)

# Save the model for visualization using dump_model()
model.get_booster().dump_model("model_dump.txt")

# Attempt to load the model saved with dump_model()
try:
    model_from_dump = xgboost.XGBClassifier()
    model_from_dump.load_model("model_dump.txt")
except xgboost.core.XGBoostError as e:
    print(f"Error loading model: {e}. Models saved with dump_model() cannot be loaded.")

# Save the model for later use using save_model()
model.save_model("model_saved.json")

# Load the model saved with save_model()
model_from_save = xgboost.XGBClassifier()
model_from_save.load_model("model_saved.json")

# Make a prediction with the loaded model
new_data = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
prediction = model_from_save.predict(new_data)
print(f"Prediction from loaded model: {prediction}")

In this example, we first generate a synthetic binary classification dataset using scikit-learn’s make_classification function. We then train an XGBoost classifier on this data.

Next, we use dump_model() to save the model to a file named “model_dump.txt”. This file will contain a text representation of the model suitable for visualization or manual inspection, but not for loading back into XGBoost.

We demonstrate this by attempting to load the model saved with dump_model() using load_model(). This raises an XGBoostError, because a dump holds only a readable description of the trees and is not a model file that load_model() can parse.

Finally, we use save_model() to save the model to a file named “model_saved.json”. This file uses a format that allows the model to be loaded back into XGBoost for further use. We load this model using load_model() and make a prediction on new data to verify that the model has been successfully loaded and can be used for inference.

While dump_model() is useful for understanding or visualizing your trained model, save_model() should be used when you need to save your model for later prediction or further training.


