When your data is stored in Python lists rather than numpy arrays or pandas DataFrames, you’ll need to convert it before training an XGBoost model.
XGBoost’s DMatrix class provides an efficient way to convert list data into the format required by the train() function.

from xgboost import DMatrix, train

# Assuming X and y are lists
X = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
y = [0, 1, 1]

# Create DMatrix from X and y
data_dmatrix = DMatrix(data=X, label=y)

# Set XGBoost parameters
params = {
    'objective': 'binary:logistic',
    'learning_rate': 0.1,
    'random_state': 42
}

# Train the model
model = train(params, data_dmatrix)

Here’s what’s happening:
1. We assume that our feature matrix X and target vector y are stored as Python lists. Make sure that X and y have compatible dimensions before proceeding: X needs one row per entry in y, and every row of X needs the same number of values.
2. We use XGBoost’s DMatrix class to convert our list data into a format optimized for XGBoost. DMatrix is an XGBoost-specific data structure that is designed for both memory efficiency and training speed.
3. We set the XGBoost parameters using a dictionary, params. Here, we specify the objective function (binary logistic for binary classification), learning rate, and random seed. These parameters can be tuned for your specific use case.
4. We train the model by passing the params dictionary and data_dmatrix to the train function. This function is part of XGBoost’s native API and handles the actual model training process. Because no num_boost_round argument is given, train runs its default of 10 boosting rounds.
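
The dimension check mentioned in the first step can be sketched in plain Python. The helper name check_list_shapes is hypothetical (it is not part of XGBoost); it only illustrates one way to catch mismatched or ragged lists before they reach DMatrix.

```python
def check_list_shapes(X, y):
    """Return (n_rows, n_cols) if X and y are compatible, else raise ValueError.

    Hypothetical helper for illustration; not part of the XGBoost API.
    """
    if not X:
        raise ValueError("X is empty")
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} rows but y has {len(y)} labels")
    n_cols = len(X[0])
    for i, row in enumerate(X):
        # Every row must have the same number of feature values
        if len(row) != n_cols:
            raise ValueError(f"row {i} has {len(row)} values, expected {n_cols}")
    return len(X), n_cols


X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y = [0, 1, 1]

n_rows, n_cols = check_list_shapes(X, y)
print(n_rows, n_cols)  # 3 3
```

Calling a check like this just before DMatrix(data=X, label=y) tends to produce a clearer error message than a failure raised deep inside the conversion.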
By leveraging DMatrix, you can efficiently train XGBoost models even when your data is initially stored in Python lists. This approach can be especially handy when you’re dealing with data that isn’t already in a numpy or pandas format.