Integer Input Features for XGBoost

XGBoost can handle integer input features directly, with no need for normalization or standardization. Its trees split on thresholds, so only the ordering of a feature's values matters, not their scale.

This makes it particularly easy to work with datasets that contain integer features.

from xgboost import XGBRegressor
import numpy as np

# Synthetic feature matrix X with integer features
X = np.array([[25, 10, 3],
              [50, 20, 1],
              [30, 15, 3],
              [10,  5, 2],
              [45, 18, 1],
              [28, 12, 3]])

# Example target variable
y = np.array([100, 200, 150, 50, 180, 120])

# Initialize and train XGBoost model
model = XGBRegressor(random_state=42)
model.fit(X, y)

# New data for prediction
X_new = np.array([[32, 13, 3],
                  [15,  8, 2]])

# Make predictions
predictions = model.predict(X_new)

print("Predictions:", predictions)

Here’s a step-by-step breakdown:

1. We create a synthetic feature matrix X that contains only integer features. In this example there are three features, all stored as integers.
2. We define a target variable y, also as integers. Depending on the context, this could represent something like sales figures or customer ages.
3. We initialize an XGBRegressor with a random_state for reproducibility. You can add other hyperparameters here as needed.
4. We train the model on X and y. XGBoost handles the integer features directly, without any further preprocessing.
5. To make predictions, we create a new feature matrix X_new with integer features in the same column order as the training data.
6. We call model.predict() on X_new to get the predictions.

The key takeaway is that XGBoost handles integer features natively, which simplifies data preparation. Categorical features are a separate matter: string-valued or other non-numeric categories typically still need to be encoded as numbers, for example as integer codes, before you train the model.
