Configure XGBoost "enable_categorical" Parameter

The enable_categorical parameter in XGBoost allows for native handling of categorical features.

To use it, first set the relevant columns of your pandas DataFrame to the 'category' dtype, then set the enable_categorical parameter to True when initializing the XGBoost model. This removes the need to encode those columns by hand and streamlines your data preparation process.

# XGBoosting.com
# Configure XGBoost "enable_categorical" Parameter
import pandas as pd
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=42)
X = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(20)])

# Convert a subset of columns to categorical
categorical_features = ['feature_5', 'feature_7', 'feature_13']
for feature in categorical_features:
    X[feature] = pd.cut(X[feature], bins=4, labels=False).astype('category')

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

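# Initialize the XGBoost classifier with enable_categorical=True
# ('logloss' is the binary-classification metric; tree_method='hist' supports categorical splits)
model = XGBClassifier(enable_categorical=True, tree_method='hist', eval_metric='logloss')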

# Fit the model
model.fit(X_train, y_train)

# Make predictions on the test set
predictions = model.predict(X_test)

# Display the first few predictions
print(predictions[:10])

In this example, we generate a synthetic dataset using make_classification from scikit-learn. We then convert the dataset into a pandas DataFrame for easy manipulation. To demonstrate the usage of enable_categorical, we turn a subset of columns ('feature_5', 'feature_7', and 'feature_13') into categorical variables by binning them with pd.cut() and casting the result to the 'category' dtype.
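The conversion step can be checked in isolation. The following is a minimal sketch on a made-up single-column DataFrame (the data and the use of a seeded random generator are assumptions for illustration, not part of the example above); it shows that pd.cut() with labels=False yields integer bin codes and that the cast to 'category' is what changes the dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column; the values are random and only for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({"feature_5": rng.normal(size=8)})

# With labels=False, pd.cut returns the integer index of each value's bin (0..3 here).
codes = pd.cut(df["feature_5"], bins=4, labels=False)

# Casting to 'category' marks the column as categorical for XGBoost.
df["feature_5"] = codes.astype("category")

print(df["feature_5"].dtype)                 # category
print(list(df["feature_5"].cat.categories))  # the bin codes that actually occur
```

In real datasets the categorical columns are more often strings; calling .astype('category') on such a column is typically all that is needed.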

Next, we split the dataset into training and testing sets using train_test_split. We initialize an XGBoost classifier (XGBClassifier) with enable_categorical=True.

We fit the model on the training data using model.fit() and make predictions on the test set using model.predict(). Finally, we display the first few predictions to verify that the model has been trained and can generate predictions.

With enable_categorical=True, XGBoost handles the categorical features in the dataset itself. Rather than requiring pre-encoded columns, it splits on category values directly, using one-hot style splits or partition-based splits depending on the number of categories. This simplifies the data preprocessing step and allows XGBoost to learn from datasets containing a mix of numeric and categorical variables.
