While the scikit-learn API provides a convenient way to train XGBoost models, using the xgboost.train() function directly offers more flexibility and control over the training process.
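For comparison, here is a minimal sketch of the scikit-learn route, using XGBClassifier on the same kind of synthetic dataset used in the main example below:

# A minimal sketch for comparison: the scikit-learn wrapper (XGBClassifier)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The wrapper accepts NumPy arrays directly; no DMatrix conversion is needed
clf = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=6, random_state=42)
clf.fit(X_train, y_train)
print(f'Test accuracy: {accuracy_score(y_test, clf.predict(X_test)):.4f}')

This is convenient, but it hides the lower-level knobs that the native API exposes.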
Here’s how you can train an XGBoost model using xgboost.train():
# XGBoosting.com
# Train an XGBoost Model using xgboost.train()
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import xgboost as xgb
# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Convert data to DMatrix format
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
# Define parameters
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'eta': 0.1,
    'max_depth': 6,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'seed': 42
}
# Train the model
num_rounds = 100
model = xgb.train(params, dtrain, num_rounds, evals=[(dtest, 'test')], early_stopping_rounds=10)
# Predict with the best iteration found by early stopping, then evaluate
y_pred_proba = model.predict(dtest, iteration_range=(0, model.best_iteration + 1))
y_pred = (y_pred_proba > 0.5).astype(int)
accuracy = accuracy_score(y_test, y_pred)
print(f'Test accuracy: {accuracy:.4f}')
Key steps:
- Convert your data to the DMatrix format using xgb.DMatrix(). This is required for xgboost.train().
- Define your training parameters in a dictionary. This allows fine-grained control over the model’s behavior.
- Train the model with xgb.train(), specifying the parameters, training data, number of rounds, and evaluation sets (a sketch of tracking the evaluation history follows this list).
- Make predictions with the trained model using model.predict() and evaluate its performance.
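As referenced above, xgb.train() can also record the metric history for each evaluation set via its evals_result argument, which is handy for plotting learning curves. A minimal sketch, reusing dtrain, dtest, and params from the example above:

# Capture the logloss history for each evaluation set (reuses dtrain, dtest, params)
evals_result = {}
model = xgb.train(params, dtrain, num_boost_round=100,
                  evals=[(dtrain, 'train'), (dtest, 'test')],
                  evals_result=evals_result, verbose_eval=False)
# evals_result now holds per-round metrics, e.g. {'test': {'logloss': [...]}, ...}
print(f"Final test logloss: {evals_result['test']['logloss'][-1]:.4f}")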
Using xgboost.train() provides lower-level control over the training process, from evaluation tracking to custom objective functions, and lets you customize the model's behavior extensively.
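One example of that control is the obj argument of xgb.train(), which accepts a callable returning the gradient and Hessian of a custom loss with respect to the raw predictions. Below is a minimal sketch that re-implements binary logistic loss; note that with a custom objective, predict() returns raw margins, so the sigmoid must be applied manually (this reuses params, dtrain, and dtest from the main example):

# Custom binary logistic objective: return gradient and Hessian per example
def logistic_obj(preds, dtrain):
    labels = dtrain.get_label()
    probs = 1.0 / (1.0 + np.exp(-preds))  # raw margins -> probabilities
    grad = probs - labels                 # first derivative of logloss
    hess = probs * (1.0 - probs)          # second derivative of logloss
    return grad, hess

# Drop the built-in objective so the custom one takes its place
custom_params = {k: v for k, v in params.items() if k != 'objective'}
model = xgb.train(custom_params, dtrain, num_boost_round=100, obj=logistic_obj)
# predict() now returns raw margins, not probabilities
y_pred = (1.0 / (1.0 + np.exp(-model.predict(dtest))) > 0.5).astype(int)
print(f'Test accuracy: {accuracy_score(y_test, y_pred):.4f}')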
