
XGBoost Configure fit() "feature_weights" Parameter

XGBoost lets you assign different selection probabilities to individual features when column subsampling is used (the colsample_bytree or colsample_bylevel parameters), by passing a feature_weights array to fit().

This can be useful when you know certain features are more informative than others and want the model to focus on them.
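
The weights do not have to be 0 or 1: any non-negative values are accepted, and a feature with a larger weight is (roughly) proportionally more likely to be drawn each time columns are sampled. As a minimal sketch of the call pattern with graded weights (the weight values here are arbitrary; the full worked example follows below):

from sklearn.datasets import make_regression
from xgboost import XGBRegressor
import numpy as np

# Small illustrative dataset with 4 features
X, y = make_regression(n_samples=200, n_features=4, random_state=0)

# Graded weights: feature 0 is roughly twice as likely to be sampled as feature 1, and so on
weights = np.array([4.0, 2.0, 1.0, 1.0])

model = XGBRegressor(n_estimators=50, colsample_bytree=0.5, random_state=0)
model.fit(X, y, feature_weights=weights)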

Here’s how you can train an XGBoost model with feature weights using the scikit-learn API.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic regression dataset (shuffle=False keeps the informative features in the first 5 columns)
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, noise=0.1, shuffle=False, random_state=42)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize XGBRegressor
model = XGBRegressor(n_estimators=100, learning_rate=0.1, colsample_bytree=0.8, colsample_bylevel=0.8, random_state=42)

# Create feature_weights array: 1 for the 5 informative features, 0 for the rest
feature_weights = np.zeros(X_train.shape[1])
feature_weights[:5] = 1  # zero-weighted features are never selected during colsampling

# Fit model with feature_weights
model.fit(X_train, y_train, feature_weights=feature_weights)

# Make predictions and evaluate performance
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")

In this example:

  1. We generate a synthetic regression dataset using make_regression with 10 features, 5 of which are informative; shuffle=False keeps those informative features in the first 5 columns.

  2. We initialize an XGBRegressor with colsample_bytree=0.8 and colsample_bylevel=0.8, so 80% of the features are randomly sampled for each tree, and 80% of that subset is sampled again at each tree level.

  3. We create a feature_weights array that assigns a weight of 1 to the first 5 features (the informative ones) and 0 to the rest. Selection probability during colsampling is proportional to a feature's weight, so the zero-weighted features are never considered.

  4. We fit the model, passing our weights via the feature_weights parameter of fit(). (The native learning API accepts the same information through the DMatrix; see the sketch after this list.)

  5. Finally, we make predictions on the test set and evaluate the model’s performance using mean squared error.
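
If you use XGBoost's native learning API instead of the scikit-learn wrapper, the same weights can be attached to the training DMatrix. Below is a minimal sketch, assuming a recent XGBoost version in which DMatrix.set_info accepts feature_weights; the data, weights, and parameters mirror the example above:

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression

# Same data as above: informative features in the first 5 columns
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, noise=0.1, shuffle=False, random_state=42)

# Weight of 1 for the informative features, 0 for the rest
feature_weights = np.zeros(X.shape[1])
feature_weights[:5] = 1

# Attach the weights to the DMatrix and train with column subsampling
dtrain = xgb.DMatrix(X, label=y)
dtrain.set_info(feature_weights=feature_weights)

params = {"learning_rate": 0.1, "colsample_bytree": 0.8, "colsample_bylevel": 0.8, "seed": 42}
booster = xgb.train(params, dtrain, num_boost_round=100)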

By assigning higher weights to informative features, we can guide XGBoost to focus on them during training, potentially improving the model’s performance.
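
Whether the weighting actually helps is data-dependent, so it is worth checking against an unweighted baseline. A quick comparison that reuses the train/test split, settings, and mse value from the example above:

# Baseline: identical settings, but no feature weights
baseline = XGBRegressor(n_estimators=100, learning_rate=0.1, colsample_bytree=0.8, colsample_bylevel=0.8, random_state=42)
baseline.fit(X_train, y_train)

baseline_mse = mean_squared_error(y_test, baseline.predict(X_test))
print(f"Baseline MSE (uniform sampling): {baseline_mse:.4f}")
print(f"Weighted MSE: {mse:.4f}")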


