When tackling multi-class classification problems with XGBoost, we must properly configure the num_class parameter, especially when using the multi:softmax or multi:softprob objective function.
This parameter specifies the number of classes in your target variable, enabling XGBoost to structure its output accordingly.
All class label integer values must be in [0, num_class), e.g., 0, 1, or 2 for a 3-class problem.
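If your raw labels are strings or non-contiguous integers, remap them into this range first. One common approach is scikit-learn's LabelEncoder; the raw_labels array below is a made-up illustration:
import numpy as np
from sklearn.preprocessing import LabelEncoder
# Hypothetical string labels that violate the [0, num_class) requirement
raw_labels = np.array(['bird', 'cat', 'dog', 'cat', 'bird'])
encoder = LabelEncoder()
# Maps the sorted unique labels to consecutive integers 0, 1, 2
y_encoded = encoder.fit_transform(raw_labels)
print(y_encoded)  # [0 1 2 1 0]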
Here’s an example of setting num_class when using XGBClassifier from the scikit-learn API:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
# Generate a synthetic multi-class dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=3, n_redundant=1, random_state=42)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the XGBoost classifier with "multi:softmax" objective and num_class set to 3
model = XGBClassifier(objective='multi:softmax', num_class=3, eval_metric='mlogloss')
# Fit the model
model.fit(X_train, y_train)
# Make predictions
predictions = model.predict(X_test)
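As a quick sanity check on the fit, the predicted labels can be compared against the held-out test labels, for example with scikit-learn's accuracy_score:
from sklearn.metrics import accuracy_score
# Compare predicted class labels against the true test labels
print(f"Accuracy: {accuracy_score(y_test, predictions):.3f}")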
Note that num_class does not appear to be required or used when specified in the XGBClassifier class constructor. For example, setting a value of 0, or a value greater or less than the actual number of classes, does not raise an error or change the output of the model.
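If you want to confirm this behavior on your installed version, one rough check is to fit two classifiers that differ only in a deliberately wrong num_class value and compare their predictions (the values 100 and 3 here are arbitrary):
import numpy as np
# Two models identical except for a deliberately incorrect num_class value
model_a = XGBClassifier(objective='multi:softmax', num_class=100, eval_metric='mlogloss')
model_b = XGBClassifier(objective='multi:softmax', num_class=3, eval_metric='mlogloss')
model_a.fit(X_train, y_train)
model_b.fit(X_train, y_train)
# If num_class were honored, a value of 100 should break or change the model;
# in practice both models produce identical predictions
print(np.array_equal(model_a.predict(X_test), model_b.predict(X_test)))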
The num_class parameter is required when using the native XGBoost API with either the multi:softmax or multi:softprob objective function. Here’s the same example of setting num_class when using the native XGBoost API:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import xgboost as xgb
# Generate a synthetic multi-class dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=3, n_redundant=1, random_state=42)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create DMatrix objects for the data
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
# Set the parameters including "multi:softmax" objective and num_class
params = {
    'objective': 'multi:softmax',
    'num_class': 3,
    'eval_metric': 'mlogloss'
}
# Train the model
model = xgb.train(params, dtrain)
# Make predictions
predictions = model.predict(dtest)
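For comparison, switching the objective to multi:softprob changes what predict returns: instead of a single label per row, you get one probability per class. A minimal variation on the training call above:
# Same data and num_class, but with "multi:softprob" for per-class probabilities
softprob_params = {
    'objective': 'multi:softprob',
    'num_class': 3,
    'eval_metric': 'mlogloss'
}
softprob_model = xgb.train(softprob_params, dtrain)
# predict() now returns an array of shape (n_samples, num_class),
# where each row sums to 1
probabilities = softprob_model.predict(dtest)
print(probabilities.shape)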
The num_class parameter tells XGBoost how many distinct classes the model must score; internally, XGBoost trains one tree per class in each boosting round and produces one output per class. It’s essential to set this value correctly to match the number of unique classes in your target variable. Keep in mind that num_class is not necessary for binary classification tasks and is only required for multi-class problems.
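For instance, a binary problem uses a binary objective such as binary:logistic and omits num_class entirely; a minimal sketch:
from sklearn.datasets import make_classification
import xgboost as xgb
# Binary labels (0 and 1) require no num_class parameter
X_bin, y_bin = make_classification(n_samples=100, n_classes=2, random_state=42)
dtrain_bin = xgb.DMatrix(X_bin, label=y_bin)
binary_params = {'objective': 'binary:logistic', 'eval_metric': 'logloss'}
binary_model = xgb.train(binary_params, dtrain_bin)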