XGBoost provides various ways to report debug information during model training, after fitting the model, and during inference.
Here are some techniques you can use:
1. Verbose Logging During Training
XGBoost lets you control the verbosity of its output during training, which can help with debugging.
You can set the verbosity parameter in the training parameters.
Levels are:
0 (silent): no output.
1 (warning): warnings only.
2 (info): information and warnings.
3 (debug): full debug output.
import xgboost as xgb
params = {'verbosity': 3} # Adjust verbosity level for more detailed output
dtrain = xgb.DMatrix(X_train, label=y_train)
model = xgb.train(params, dtrain, num_boost_round=100)
2. Monitoring Callbacks
XGBoost provides a callback function for monitoring the progress of algorithms during training.
This can be very useful for debugging and understanding the model’s learning.
In recent XGBoost releases, callbacks subclass xgb.callback.TrainingCallback (the older function-style callbacks that received an env argument are deprecated):
class DebugCallback(xgb.callback.TrainingCallback):
    """Custom callback to print debug information each round."""
    def after_iteration(self, model, epoch, evals_log):
        print(f"Round: {epoch}, Eval log: {evals_log}")
        return False  # returning True would stop training early
model = xgb.train(params, dtrain, num_boost_round=100, callbacks=[DebugCallback()])
3. Logging Configuration
You can also configure Python logging to surface more detail. Note that messages from XGBoost's C++ core are printed directly and governed by the verbosity parameter, so this mainly affects Python-side messages.
import logging
logger = logging.getLogger('xgboost')
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
4. Feature Importance
After training, viewing feature importance can help in debugging and understanding which features the model is considering most significant.
import matplotlib.pyplot as plt
xgb.plot_importance(model)
plt.show()
5. Model Dump
Dumping the model can provide a textual representation of the trees or the linear model.
This is useful for debugging purposes to see how features are being split in trees.
model.dump_model('dump.raw.txt')
6. Predictions with iteration_range
When making predictions, you can use the iteration_range parameter, a half-open (start, end) tuple of boosting rounds, to restrict prediction to a subset of the trees. This is useful for debugging predictions at different stages of training.
preds = model.predict(dtest, iteration_range=(0, 50))  # use only the first 50 boosting rounds
Using these methods, you can extract a lot of debugging and diagnostic information from your XGBoost models during training and inference, which can be crucial for optimizing performance and understanding the model’s behavior.