When your data is stored in Python lists rather than NumPy arrays or pandas DataFrames, you’ll need to convert it before training an XGBoost model.
XGBoost’s DMatrix class provides an efficient way to convert list data into the format required by the train() function.
from xgboost import DMatrix, train
# Assuming X and y are lists
X = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
y = [0, 1, 1]
# Create DMatrix from X and y
data_dmatrix = DMatrix(data=X, label=y)
# Set XGBoost parameters
params = {
    'objective': 'binary:logistic',
    'learning_rate': 0.1,
    'random_state': 42,
}
# Train the model (num_boost_round is left at its default of 10)
model = train(params, data_dmatrix)
Here’s what’s happening:
1. We assume that our feature matrix X and target vector y are stored as Python lists. Remember to ensure that X and y have compatible dimensions before proceeding.
2. We use XGBoost’s DMatrix class to convert our list data into a format optimized for XGBoost. DMatrix is an XGBoost-specific data structure that is designed for both memory efficiency and training speed.
3. We set the XGBoost parameters using a dictionary params. Here, we specify the objective function (binary logistic for binary classification), learning rate, and random seed. These parameters can be tuned for your specific use case.
4. We train the model by passing the params dictionary and data_dmatrix to the train function. This function is part of XGBoost’s native API and handles the actual model training process.
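The dimension check mentioned above can be done in plain Python before the lists are handed to DMatrix. The sketch below is one possible way to do it; check_dimensions is a hypothetical helper written for this illustration and is not part of XGBoost.

```python
def check_dimensions(X, y):
    """Hypothetical helper (not an XGBoost function).

    Raises ValueError if the number of rows in X differs from the number
    of labels in y, or if the rows of X do not all have the same length.
    Returns (n_rows, n_features) when the lists are compatible.
    """
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} rows but y has {len(y)} labels")

    # Collect the distinct row lengths; a well-formed matrix has exactly one.
    row_lengths = {len(row) for row in X}
    if len(row_lengths) > 1:
        raise ValueError(f"rows of X have differing lengths: {sorted(row_lengths)}")

    n_features = row_lengths.pop() if row_lengths else 0
    return len(X), n_features


X = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
y = [0, 1, 1]

n_rows, n_features = check_dimensions(X, y)
print(n_rows, n_features)  # 3 3
```

Running a check like this first tends to give a clearer error message than letting a ragged or mismatched list fail somewhere inside the conversion.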
By leveraging DMatrix, you can train XGBoost models directly from data held in Python lists, which is especially handy when that data isn’t already in NumPy or pandas form.