XGBoost provides various ways to report debug information during model training, after fitting the model, and during inference.
Here are some techniques you can use:
1. Verbose Logging During Training
XGBoost lets you control the verbosity of its output during training, which can help with debugging.
You can set the verbosity parameter in the training parameters.
Levels are:
0 (silent): no output.
1 (warning): warnings only.
2 (info): information and warnings.
3 (debug): full debug output.
import xgboost as xgb
params = {'verbosity': 3} # Adjust verbosity level for more detailed output
dtrain = xgb.DMatrix(X_train, label=y_train)
model = xgb.train(params, dtrain, num_boost_round=100)
2. Monitoring Callbacks
XGBoost provides a callback function for monitoring the progress of algorithms during training.
This can be very useful for debugging and understanding the model’s learning.
In recent XGBoost releases, callbacks subclass xgb.callback.TrainingCallback (the older function-style callbacks that received an env argument are deprecated):
class DebugCallback(xgb.callback.TrainingCallback):
    """Custom callback to print debug information each round."""
    def after_iteration(self, model, epoch, evals_log):
        print(f"Round: {epoch}, Eval log: {evals_log}")
        return False  # returning True would stop training early
model = xgb.train(params, dtrain, num_boost_round=100, callbacks=[DebugCallback()])
3. Logging Configuration
You can also configure Python logging to surface more detail. Note that messages from XGBoost's C++ core are printed directly and governed by the verbosity parameter, so this mainly affects Python-side messages.
import logging
logger = logging.getLogger('xgboost')
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
4. Feature Importance
After training, viewing feature importance can help in debugging and understanding which features the model is considering most significant.
import matplotlib.pyplot as plt
xgb.plot_importance(model)
plt.show()
5. Model Dump
Dumping the model can provide a textual representation of the trees or the linear model.
This is useful for debugging purposes to see how features are being split in trees.
model.dump_model('dump.raw.txt')
6. Predictions with iteration_range
When making predictions, you can use the iteration_range parameter, a half-open (start, end) tuple of boosting rounds, to restrict prediction to a subset of the trees. This is useful for debugging predictions at different stages of training.
preds = model.predict(dtest, iteration_range=(0, 50))  # use only the first 50 boosting rounds
Using these methods, you can extract a lot of debugging and diagnostic information from your XGBoost models during training and inference, which can be crucial for optimizing performance and understanding the model’s behavior.