XGBoost allows you to save a trained model to disk and load it later to resume training.
This is useful for iterative model development, as you can train a model incrementally, saving progress along the way.
Here’s how you can resume training an XGBoost model using the xgb_model parameter in the fit() method.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor
# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize XGBRegressor and train for 50 rounds
model = XGBRegressor(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
# Save model to disk
model.save_model("model.json")
# Load pre-trained model into a fresh estimator
# (when resuming, n_estimators controls how many *additional* rounds fit() adds)
model_loaded = XGBRegressor(n_estimators=50, random_state=42)
model_loaded.load_model("model.json")
# Resume training for 50 more rounds (100 trees in total)
model_loaded.fit(X_train, y_train, xgb_model=model_loaded.get_booster())
# Make predictions and evaluate performance
y_pred = model_loaded.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
In this example:

- We generate a synthetic regression dataset using make_regression().
- We initialize an XGBRegressor and train it for 50 rounds.
- We save the trained model to disk using save_model().
- We create a new XGBRegressor instance and load the saved model using load_model().
- We resume training the loaded model for an additional 50 rounds by passing the loaded model to the xgb_model parameter in fit(). We use get_booster() to get the underlying Booster object.
- Finally, we make predictions with the fully trained model and evaluate its performance using mean squared error.
By using the xgb_model parameter, we can seamlessly continue training a model from where we left off, allowing for flexible and efficient model development workflows.