While the scikit-learn API provides a convenient way to train XGBoost models, using the xgboost.train()
function directly offers more flexibility and control over the training process.
Here’s how you can train an XGBoost model using `xgboost.train()`:
```python
# XGBoosting.com
# Train an XGBoost Model using xgboost.train()
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import xgboost as xgb

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert data to DMatrix format
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Define parameters
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'eta': 0.1,
    'max_depth': 6,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'seed': 42
}

# Train the model
num_rounds = 100
model = xgb.train(params, dtrain, num_rounds, evals=[(dtest, 'test')], early_stopping_rounds=10)

# Make predictions and evaluate
y_pred = model.predict(dtest)
y_pred = (y_pred > 0.5).astype(int)
accuracy = accuracy_score(y_test, y_pred)
print(f'Test accuracy: {accuracy:.4f}')
```
Key steps:

- Convert your data to the DMatrix format using `xgb.DMatrix()`. This is required for `xgboost.train()`.
- Define your training parameters in a dictionary. This allows fine-grained control over the model’s behavior.
- Train the model with `xgb.train()`, specifying the parameters, training data, number of rounds, and evaluation sets (see the sketch after this list for two further control points).
- Make predictions with the trained model using `model.predict()` and evaluate its performance.
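
As a sketch of that extra control, `xgb.train()` also accepts an `evals_result` dictionary that it fills with the per-round metric history, and the returned `Booster` records `best_iteration` when early stopping fires. Unlike the scikit-learn wrapper, `Booster.predict()` does not apply the best iteration automatically, so you pass it via `iteration_range` (available in xgboost 1.4+). This snippet reuses `params`, `dtrain`, `dtest`, and `num_rounds` from the example above:

```python
# Capture the per-round metric history and predict with the best rounds.
evals_result = {}
model = xgb.train(
    params,
    dtrain,
    num_boost_round=num_rounds,
    evals=[(dtest, 'test')],
    early_stopping_rounds=10,
    evals_result=evals_result,  # filled in place during training
)

# Log loss on the evaluation set for the first five rounds
print(evals_result['test']['logloss'][:5])

# Restrict prediction to the boosting rounds up to the best iteration
y_pred = model.predict(dtest, iteration_range=(0, model.best_iteration + 1))
```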
Using `xgboost.train()` provides more control over the training process and allows you to customize the model’s behavior extensively.
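
One example of that customization is plugging in your own evaluation metric. The sketch below is a minimal illustration, assuming xgboost 1.6+ where the argument is named `custom_metric` (older releases use `feval`); with the built-in `binary:logistic` objective the callback receives predicted probabilities. The `error_rate` function and its 0.5 threshold are illustrative, not part of the XGBoost API:

```python
# A hypothetical error-rate metric for xgb.train()
def error_rate(predt, dtrain):
    # With a built-in objective, custom_metric receives transformed
    # predictions (probabilities for binary:logistic)
    y = dtrain.get_label()
    return 'error', float(np.mean((predt > 0.5) != y))

model = xgb.train(
    params,
    dtrain,
    num_boost_round=num_rounds,
    evals=[(dtest, 'test')],
    custom_metric=error_rate,  # reported alongside the built-in logloss
)
```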