When working with XGBoost, you may need to inspect the data stored in a DMatrix object for debugging or to integrate with other tools.
Here’s how you can access and print the data in a DMatrix.
import numpy as np
from xgboost import DMatrix
# Generate synthetic data
n_samples, n_features = 10, 5
X = np.random.rand(n_samples, n_features)
y = np.random.randint(2, size=n_samples)
# Create DMatrix
dmatrix = DMatrix(data=X, label=y)
# Access and print data
print("Feature Matrix:")
print(dmatrix.get_data().toarray()[:5, :]) # Print first 5 rows
print("\nLabels:")
print(dmatrix.get_label()[:5]) # Print first 5 labels
In this example:
We generate a small synthetic dataset using NumPy with 10 samples and 5 features. The features are random floats between 0 and 1, and the labels are randomly assigned 0 or 1.
We create a DMatrix object dmatrix from the synthetic data X and labels y.
To access the data in the DMatrix, we use:
dmatrix.get_data(): Returns the feature matrix as a scipy.sparse matrix.
.toarray(): Returns a dense NumPy array.
dmatrix.get_label(): Returns the labels as a NumPy array.
We print the first 5 rows of the feature matrix and the first 5 labels to confirm the data is stored correctly in the DMatrix.
The output will look something like:
Feature Matrix:
[[0.76826936 0.3751373 0.13454452 0.95997924 0.1613448 ]
[0.20457081 0.39761412 0.23949952 0.65726465 0.12632865]
[0.01898127 0.8946074 0.9941333 0.25311878 0.9032138 ]
[0.5629613 0.6073675 0.17431487 0.07749726 0.5096905 ]
[0.5430099 0.1685548 0.89152557 0.08665203 0.33809692]]
Labels:
[1. 1. 0. 1. 0.]
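This example demonstrates how to quickly inspect the data in a DMatrix. Keep in mind that calling toarray() on the full matrix builds a dense copy of every row in memory, so for large datasets you should slice the sparse matrix down to a few rows before converting and print only that subset.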