
Train an XGBoost Model on a Pandas DataFrame

When your data is stored in a Pandas DataFrame, you can directly use it to train an XGBoost model.

Pass a DataFrame X containing the features and a Series y containing the target to the fit() method of your XGBoost model.

from xgboost import XGBClassifier
import pandas as pd

# Toy DataFrame holding both the input features and the target
data = pd.DataFrame({
    'A': [1, 4, 7],
    'B': [2, 5, 8],
    'C': [3, 6, 9],
    'target': [0, 1, 1]
})

# Separate input features and target
X = data[['A', 'B', 'C']]
y = data['target']

# Initialize and train the model
model = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
model.fit(X, y)

Here’s what’s happening:

  1. We assume that our data, including both input features and the target variable, is stored in a single Pandas DataFrame called data.

  2. We separate the input features X and target y from the combined DataFrame using column selection. X is a DataFrame containing only the feature columns, and y is a Series containing the target variable.

  3. We create an instance of the XGBClassifier (or XGBRegressor for regression tasks) and specify our desired hyperparameters.

  4. We pass X and y directly to the fit() method. No manual conversion is needed: XGBoost accepts these Pandas objects as-is, builds its own internal data structure (a DMatrix) from them, and uses the DataFrame's column names as feature names.

XGBoost accepts Pandas DataFrames and Series as input, provided the feature columns have numeric, boolean, or categorical dtypes. Columns of dtype object (for example, raw strings) typically raise an error and must be encoded first, and columns of dtype category require enable_categorical=True. This lets you use Pandas for data manipulation and preprocessing and then train your XGBoost model on the result.

Before training, make sure X and y have the same number of rows and that the rows are in the same order, so that each row of features lines up with its target value.
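One way to check this before calling fit() is sketched below. The helper check_xy is hypothetical, not part of XGBoost or Pandas. It also compares index labels, on the assumption that a mismatch (for example, after filtering X but not y) signals misaligned rows, since rows are paired by position rather than by index label.

```python
import pandas as pd

def check_xy(X: pd.DataFrame, y: pd.Series) -> None:
    """Hypothetical helper: raise ValueError if X and y do not line up."""
    # Same number of rows is the minimum requirement
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} rows but y has {len(y)} values")
    # Differing index labels usually mean one object was filtered or
    # reordered without the other
    if not X.index.equals(y.index):
        raise ValueError("X and y have different index labels")

data = pd.DataFrame({
    "A": [1, 4, 7],
    "B": [2, 5, 8],
    "target": [0, 1, 1],
})
X = data[["A", "B"]]
y = data["target"]

check_xy(X, y)  # passes silently: same length, same index
```

Calling check_xy(X, y.iloc[:2]) would raise a ValueError because the lengths differ.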


