When working with XGBoost, you may need to inspect the data stored in a DMatrix
object for debugging or to integrate with other tools.
Here’s how you can access and print the data in a DMatrix.
import numpy as np
from xgboost import DMatrix
# Generate synthetic data
n_samples, n_features = 10, 5
X = np.random.rand(n_samples, n_features)
y = np.random.randint(2, size=n_samples)
# Create DMatrix
dmatrix = DMatrix(data=X, label=y)
# Access and print data
print("Feature Matrix:")
print(dmatrix.get_data().toarray()[:5, :]) # Print first 5 rows
print("\nLabels:")
print(dmatrix.get_label()[:5]) # Print first 5 labels
In this example:
We generate a small synthetic dataset using NumPy with 10 samples and 5 features. The features are random floats between 0 and 1, and the labels are randomly assigned 0 or 1.
We create a DMatrix object dmatrix from the synthetic data X and labels y.
To access the data in the DMatrix, we use:
dmatrix.get_data(): Returns the feature matrix as a scipy.sparse matrix.
.toarray(): Returns a dense NumPy array.
dmatrix.get_label(): Returns the labels as a NumPy array.
We print the first 5 rows of the feature matrix and the first 5 labels to confirm the data is stored correctly in the DMatrix.
The output will look something like:
Feature Matrix:
[[0.76826936 0.3751373 0.13454452 0.95997924 0.1613448 ]
[0.20457081 0.39761412 0.23949952 0.65726465 0.12632865]
[0.01898127 0.8946074 0.9941333 0.25311878 0.9032138 ]
[0.5629613 0.6073675 0.17431487 0.07749726 0.5096905 ]
[0.5430099 0.1685548 0.89152557 0.08665203 0.33809692]]
Labels:
[1. 1. 0. 1. 0.]
This example demonstrates how to quickly inspect the data in a DMatrix. Keep in mind that for large datasets, you may want to print only a subset of the data to avoid flooding your output.