While the scikit-learn API provides a convenient way to train XGBoost models, using the xgboost.train() function directly offers more flexibility and control over the training process.
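For comparison, here is a minimal sketch of the scikit-learn route, using XGBClassifier on the same kind of synthetic dataset used in the main example below:

# A minimal sketch for comparison: the scikit-learn wrapper (XGBClassifier)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The wrapper accepts NumPy arrays directly; no DMatrix conversion is needed
clf = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=6, random_state=42)
clf.fit(X_train, y_train)
print(f'Test accuracy: {accuracy_score(y_test, clf.predict(X_test)):.4f}')

This is convenient, but it hides the lower-level knobs that the native API exposes.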
Here’s how you can train an XGBoost model using xgboost.train():
# XGBoosting.com
# Train an XGBoost Model using xgboost.train()
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import xgboost as xgb
# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Convert data to DMatrix format
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
# Define parameters
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'eta': 0.1,
    'max_depth': 6,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'seed': 42
}
# Train the model
num_rounds = 100
model = xgb.train(params, dtrain, num_rounds, evals=[(dtest, 'test')], early_stopping_rounds=10)
# Predict with the best iteration found by early stopping, then evaluate
y_pred_proba = model.predict(dtest, iteration_range=(0, model.best_iteration + 1))
y_pred = (y_pred_proba > 0.5).astype(int)
accuracy = accuracy_score(y_test, y_pred)
print(f'Test accuracy: {accuracy:.4f}')
Key steps:
- Convert your data to the DMatrix format using xgb.DMatrix(). This is required for xgboost.train().
- Define your training parameters in a dictionary. This allows fine-grained control over the model’s behavior.
- Train the model with xgb.train(), specifying the parameters, training data, number of rounds, and evaluation sets (a sketch of tracking the evaluation history follows this list).
- Make predictions with the trained model using model.predict() and evaluate its performance.
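As referenced above, xgb.train() can also record the metric history for each evaluation set via its evals_result argument, which is handy for plotting learning curves. A minimal sketch, reusing dtrain, dtest, and params from the example above:

# Capture the logloss history for each evaluation set (reuses dtrain, dtest, params)
evals_result = {}
model = xgb.train(params, dtrain, num_boost_round=100,
                  evals=[(dtrain, 'train'), (dtest, 'test')],
                  evals_result=evals_result, verbose_eval=False)
# evals_result now holds per-round metrics, e.g. {'test': {'logloss': [...]}, ...}
print(f"Final test logloss: {evals_result['test']['logloss'][-1]:.4f}")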
Using xgboost.train() provides lower-level control over the training process, from evaluation tracking to custom objective functions, and lets you customize the model's behavior extensively.
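One example of that control is the obj argument of xgb.train(), which accepts a callable returning the gradient and Hessian of a custom loss with respect to the raw predictions. Below is a minimal sketch that re-implements binary logistic loss; note that with a custom objective, predict() returns raw margins, so the sigmoid must be applied manually (this reuses params, dtrain, and dtest from the main example):

# Custom binary logistic objective: return gradient and Hessian per example
def logistic_obj(preds, dtrain):
    labels = dtrain.get_label()
    probs = 1.0 / (1.0 + np.exp(-preds))  # raw margins -> probabilities
    grad = probs - labels                 # first derivative of logloss
    hess = probs * (1.0 - probs)          # second derivative of logloss
    return grad, hess

# Drop the built-in objective so the custom one takes its place
custom_params = {k: v for k, v in params.items() if k != 'objective'}
model = xgb.train(custom_params, dtrain, num_boost_round=100, obj=logistic_obj)
# predict() now returns raw margins, not probabilities
y_pred = (1.0 / (1.0 + np.exp(-model.predict(dtest))) > 0.5).astype(int)
print(f'Test accuracy: {accuracy_score(y_test, y_pred):.4f}')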
