XGBoost allows you to assign different weights to each training sample, which can be useful when working with imbalanced datasets or when you want certain samples to have more influence on the model.
Here’s how you can train an XGBoost model with sample weights using the scikit-learn API.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
import numpy as np
# Generate imbalanced binary classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1], random_state=42)
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize XGBClassifier
model = XGBClassifier(objective='binary:logistic', random_state=42)
# Create sample_weight array
sample_weight = np.where(y_train == 1, 10, 1)
# Fit the model, passing the per-sample weights via the sample_weight parameter
model.fit(X_train, y_train, sample_weight=sample_weight)
# Make predictions and evaluate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
In this example:
- We generate an imbalanced binary classification dataset using make_classification with a 90/10 class split.
- We initialize an XGBClassifier with objective='binary:logistic' for binary classification.
- We create a sample_weight array that assigns a weight of 10 to the minority class (1) and 1 to the majority class (0). This gives more importance to the minority class during training (a way to derive such weights automatically is sketched after this list).
- We fit the model, passing our weights through the sample_weight parameter.
- Finally, we make predictions on the test set and evaluate the model's accuracy.
By assigning higher weights to the minority class, we can help the model learn to predict it better, even when it’s underrepresented in the training data.
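Because overall accuracy can look high even when the minority class is predicted poorly, it is worth checking per-class metrics as well. The snippet below is a small optional sketch using scikit-learn's classification_report; it assumes the y_test and y_pred variables from the example above.
from sklearn.metrics import classification_report
# Per-class precision, recall, and F1 show how well the minority class (1) is actually predicted
print(classification_report(y_test, y_pred, digits=3))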