PMML (Predictive Model Markup Language) is an XML-based format for representing machine learning models.
Saving your XGBoost models in PMML format enables interoperability with other tools and platforms that support PMML.
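The sklearn2pmml library runs a Java-based converter during export, so a Java runtime must be available on the system. With that in place, we can install sklearn2pmml using our preferred package manager, such as pip: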
pip install sklearn2pmml
Next, we can save our XGBoost model in PMML format.
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn2pmml import sklearn2pmml, PMMLPipeline
from sklearn2pmml.decoration import ContinuousDomain
# Generate a synthetic dataset
X, y = make_classification(n_samples=100, n_features=20, random_state=42)
# Create a PMMLPipeline with a single XGBoost model
pipeline = PMMLPipeline([
    ("domain", ContinuousDomain()),
    ("classifier", xgb.XGBClassifier(eval_metric='logloss'))
])
# Fit the pipeline
pipeline.fit(X, y)
# Export the pipeline to a PMML file
sklearn2pmml(pipeline, "xgboost_model.pmml", with_repr=True)
Here’s what we’re doing:
- We generate a synthetic classification dataset using make_classification with 100 samples and 20 features.
- We create a PMMLPipeline that includes a ContinuousDomain transformer and an XGBClassifier. ContinuousDomain records the value range of each continuous input feature seen during fitting, and the eval_metric='logloss' parameter specifies the evaluation metric for the model.
- We fit the pipeline on the generated dataset using pipeline.fit(X, y).
- We use sklearn2pmml to export the fitted pipeline to a PMML file named xgboost_model.pmml. The with_repr=True argument embeds a human-readable representation of the pipeline (its Python repr string) in the PMML file.
By using a PMMLPipeline, we can include additional preprocessing steps, such as feature scaling or selection, alongside the XGBoost model. The entire pipeline is then exported to PMML format, ensuring that the preprocessing steps are also captured in the PMML representation.
Once saved in PMML format, the model can be used in other environments or tools that support PMML. However, the specifics of loading and using PMML models may vary depending on the tool or platform.
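Since PMML is plain XML, one tool-independent way to check an export is to read it with a standard XML parser. The sketch below uses only the Python standard library and lists the fields declared in the document's DataDictionary. The embedded sample is a hand-written, truncated stand-in (its field names are made up, and it contains no model element), so treat it as an illustration of the file layout rather than real sklearn2pmml output.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for an exported file; a real export is much larger
# and also contains the model itself.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header/>
  <DataDictionary>
    <DataField name="y" optype="categorical" dataType="integer"/>
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def describe_pmml(root):
    """Return the PMML version and (name, optype, dataType) for each field."""
    fields = [
        (el.get("name"), el.get("optype"), el.get("dataType"))
        for el in root.iter()
        if local_name(el.tag) == "DataField"
    ]
    return root.get("version"), fields


version, fields = describe_pmml(ET.fromstring(SAMPLE_PMML))
print("PMML version:", version)
for field in fields:
    print(field)
```

To inspect a real export instead, the root element can be obtained with ET.parse("xgboost_model.pmml").getroot() and passed to describe_pmml in the same way.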