L1 regularization, also known as Lasso (Least Absolute Shrinkage and Selection Operator), is a technique used to prevent overfitting in XGBoost models.
It adds a penalty term to the objective function, encouraging sparse feature selection.
Configuring L1 regularization in XGBoost involves setting the alpha hyperparameter to a non-zero value. In the scikit-learn API, this parameter is reg_alpha.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
# Create a synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define the XGBoost regressor with L1 regularization (alpha)
xgb_model = XGBRegressor(objective='reg:squarederror', reg_alpha=1.0, n_estimators=100)
# Train the model
xgb_model.fit(X_train, y_train)
# Predict on the test set
y_pred = xgb_model.predict(X_test)
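Continuing the example above, you can quickly check how the regularized model performs on the held-out test set; this is a minimal sketch using scikit-learn’s mean_squared_error.

from sklearn.metrics import mean_squared_error
# Evaluate the regularized model on the held-out test set
mse = mean_squared_error(y_test, y_pred)
print(f"Test MSE with reg_alpha=1.0: {mse:.4f}")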
L1 regularization, or Lasso, is a regularization technique that adds a penalty term to the objective function proportional to the absolute values of the model’s coefficients. The purpose of L1 regularization is to prevent overfitting by encouraging sparsity in the model’s feature space. It effectively performs feature selection by driving the coefficients of less important features to exactly zero.
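To make the penalty concrete: in XGBoost the penalty acts on the leaf weights of the trees rather than on linear coefficients. Roughly following the notation used in the XGBoost documentation (T is the number of leaves in a tree and w_j are its leaf weights), the per-tree regularization term looks like

Ω(f) = γT + ½ λ Σ_j w_j² + α Σ_j |w_j|

where the α Σ_j |w_j| part is the L1 penalty controlled by alpha.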
In XGBoost, the strength of L1 regularization is controlled by the alpha hyperparameter. A higher value of alpha implies a stronger regularization effect, leading to more coefficients being reduced to zero. When configuring L1 regularization, it’s recommended to start with a small value of alpha (e.g., 0.1) and tune it based on the model’s performance on a validation set.
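As a rough illustration of that tuning loop, the sketch below sweeps a few candidate values of reg_alpha on the synthetic dataset from earlier and compares validation error; the candidate values are arbitrary placeholders, not tuned recommendations.

from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Recreate the synthetic dataset and hold out a validation split
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Try increasingly strong L1 penalties and track validation error
for alpha in [0.0, 0.1, 1.0, 10.0]:
    model = XGBRegressor(objective='reg:squarederror', reg_alpha=alpha, n_estimators=100)
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"reg_alpha={alpha}: validation MSE={mse:.4f}")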
It’s important to note that XGBoost also supports L2 regularization (Ridge), controlled by the lambda hyperparameter (reg_lambda in the scikit-learn API). L2 regularization adds a penalty term proportional to the square of the coefficients’ magnitudes, encouraging smaller but non-zero coefficients. In practice, it’s common to use a combination of L1 and L2 regularization to balance between feature selection and coefficient shrinkage.
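A minimal sketch of combining both penalties is shown below; the specific values are arbitrary placeholders rather than tuned recommendations.

from xgboost import XGBRegressor

# Combine L1 (reg_alpha) and L2 (reg_lambda) regularization in one model
model_l1_l2 = XGBRegressor(
    objective='reg:squarederror',
    reg_alpha=0.1,   # L1 penalty: pushes some leaf weights to exactly zero
    reg_lambda=1.0,  # L2 penalty: shrinks the remaining leaf weights
    n_estimators=100,
)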
After applying L1 regularization, you can interpret the model’s feature importances to identify the most relevant features. Features with non-zero importance are the ones the model actually relies on, while features whose contribution is shrunk to zero are effectively ignored by the model.
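One way to inspect this, continuing with the xgb_model trained earlier, is through the feature_importances_ attribute of the scikit-learn wrapper; this is a minimal sketch, and a zero importance simply means the trees never split on that feature.

import numpy as np
# Inspect which features the regularized model actually relies on
importances = xgb_model.feature_importances_
print(f"Features with non-zero importance: {int((importances > 0).sum())} of {len(importances)}")
print("Indices of ignored features:", np.flatnonzero(importances == 0))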
L1 regularization is particularly useful when dealing with high-dimensional datasets or when explicit feature selection is desired. It helps improve model interpretability and reduces the risk of overfitting by focusing on a subset of the most informative features.