XGBoost allows you to assign different weights to each training sample, which can be useful when working with imbalanced datasets or when you want certain samples to have more influence on the model.
Here’s how you can train an XGBoost model with sample weights using the scikit-learn API.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
import numpy as np
# Generate imbalanced binary classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1], random_state=42)
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize XGBClassifier
model = XGBClassifier(objective='binary:logistic', random_state=42)
# Create sample_weight array
sample_weight = np.where(y_train == 1, 10, 1)
# Fit the model, passing the per-sample weights via the sample_weight parameter
model.fit(X_train, y_train, sample_weight=sample_weight)
# Make predictions and evaluate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
In this example:
- We generate an imbalanced binary classification dataset using make_classification with a 90/10 class split.
- We initialize an XGBClassifier with objective='binary:logistic' for binary classification.
- We create a sample_weight array that assigns a weight of 10 to the minority class (1) and 1 to the majority class (0). This gives more importance to the minority class during training (a way to derive such weights automatically is sketched after this list).
- We fit the model, passing our weights through the sample_weight parameter.
- Finally, we make predictions on the test set and evaluate the model's accuracy.
By assigning higher weights to the minority class, we can help the model learn to predict it better, even when it’s underrepresented in the training data.
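Because overall accuracy can look high even when the minority class is predicted poorly, it is worth checking per-class metrics as well. The snippet below is a small optional sketch using scikit-learn's classification_report; it assumes the y_test and y_pred variables from the example above.
from sklearn.metrics import classification_report
# Per-class precision, recall, and F1 show how well the minority class (1) is actually predicted
print(classification_report(y_test, y_pred, digits=3))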