
Data

Helpful examples for preparing data for XGBoost models.

Data preparation for XGBoost generally means cleaning and transforming raw data into a format suitable for training: handling missing values, encoding categorical variables, scaling features where needed, and splitting the data into training and validation sets so the model can learn effectively and generalize well to new data.
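The steps above can be sketched in a few lines. This is a minimal illustration using pandas only, with a hypothetical toy dataset (the column names `age`, `color`, and `target` are invented for the example); the resulting frames are what you would hand to an XGBoost model.

```python
import pandas as pd
import numpy as np

# Toy raw data with a missing value and a categorical column (hypothetical).
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0],
    "color": ["red", "blue", "red", "green"],
    "target": [0, 1, 0, 1],
})

# 1. Handle missing values (XGBoost can also consume NaN natively).
df["age"] = df["age"].fillna(df["age"].median())

# 2. Encode the categorical feature as dummy variables.
df = pd.get_dummies(df, columns=["color"])

# 3. Split into training and validation sets (75/25 here).
train = df.sample(frac=0.75, random_state=42)
valid = df.drop(train.index)

X_train, y_train = train.drop(columns="target"), train["target"]
print(X_train.shape, valid.shape)
```

Note that feature scaling is optional for tree-based models like XGBoost; it is included in the description above because it matters for mixed pipelines.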

Examples
Data Preparation for XGBoost
Detecting and Handling Data Drift with XGBoost
Encode Categorical Features As Dummy Variables for XGBoost
Feature Engineering for XGBoost
Float Input Features for XGBoost
Impute Missing Input Values for XGBoost
Integer Input Features for XGBoost
Label Encode Categorical Input Variables for XGBoost
Label Encode Categorical Target Variable for XGBoost
Missing Input Values With XGBoost
One-Hot Encode Categorical Features for XGBoost
Ordinal Encode Categorical Features for XGBoost
Removing Outliers from Training Data For XGBoost
String Input Features for XGBoost
Text Input Features for XGBoost
Train an XGBoost Model on a CSV File
Train an XGBoost Model on a Dataset Stored in Lists
Train an XGBoost Model on a DMatrix With Native API
Train an XGBoost Model on a NumPy Array
Train an XGBoost Model on a Pandas DataFrame
Train an XGBoost Model on an Excel File
Train XGBoost with DMatrix External Memory
Use XGBoost Feature Importance for Feature Selection
Use XGBoost Feature Importance for Incremental Feature Selection
What is a DMatrix in XGBoost
What is a QuantileDMatrix in XGBoost
Why Use A DMatrix in XGBoost
XGBoost "sample_weight" to Bias Training Toward Recent Examples (Data Drift)
XGBoost Add Lagged Input Variables for Time Series Forecasting
XGBoost Add Rolling Mean To Time Series Data
XGBoost Assumes Data is IID (i.i.d.)
XGBoost Assumes Stationary Time Series Data
XGBoost Convert DMatrix to NumPy Array
XGBoost Convert DMatrix to Pandas DataFrame
XGBoost Convert NumPy Array to DMatrix
XGBoost Convert Pandas DataFrame to DMatrix
XGBoost Convert Python List to DMatrix
XGBoost Detrend Transform Time Series Data
XGBoost Difference Transform Time Series Data
XGBoost Don't Use One-Hot-Encoding
XGBoost Drop Non-Predictive Input Features
XGBoost Feature Engineering Of Dates
XGBoost Feature Selection with RFE
XGBoost for Imbalanced Classification with SMOTE
XGBoost for the Abalone Age Dataset
XGBoost for the Adult Dataset
XGBoost for the Boston Housing Dataset
XGBoost for the California Housing Dataset
XGBoost for the Cleveland Heart Disease Dataset
XGBoost for the Covertype Dataset
XGBoost for the Diabetes Dataset
XGBoost for the Glass Identification Dataset
XGBoost for the Handwritten Digits Dataset
XGBoost for the Higgs Boson Dataset
XGBoost for the Horse Colic Dataset
XGBoost for the Ionosphere Dataset
XGBoost for the Iris Dataset
XGBoost for the KDDCup99 Dataset
XGBoost for the Linnerud Dataset
XGBoost for the Pima Indians Diabetes Dataset
XGBoost for the Sonar Dataset
XGBoost for the Wheat Seeds Dataset
XGBoost for the Wholesale Customers Dataset
XGBoost for the Wine Dataset
XGBoost for the Wisconsin Breast Cancer Dataset
XGBoost Interpolate Missing Values For Time Series Data
XGBoost Load CSV File as DMatrix
XGBoost Min-Max Scaling Numerical Input Features
XGBoost Model Performance Improves With More Data
XGBoost NaN Input Values (missing)
XGBoost Native Categorical Faster Than One Hot and Ordinal Encoding
XGBoost Normalize Numerical Input Features
XGBoost Performs Automatic Feature Selection
XGBoost Power Transform Numerical Input Features
XGBoost Power Transform Time Series Data
XGBoost Print Data in DMatrix
XGBoost Remove Least Important Features
XGBoost Remove Outliers With Elliptic Envelope Method
XGBoost Remove Outliers With IQR Statistical Method
XGBoost Remove Outliers With Isolation Forest
XGBoost Remove Outliers With Local Outlier Factor
XGBoost Remove Outliers With One-Class SVM
XGBoost Remove Outliers With Z-Score Statistical Method
XGBoost Robust to Correlated Input Features (multi-collinearity)
XGBoost Robust to Mislabeled Data (label noise)
XGBoost Robust to More Features Than Examples (P>>N)
XGBoost Robust to Outliers in Data
XGBoost Robust to Redundant Input Features
XGBoost Robust to Small Datasets
XGBoost Seasonal Difference Transform Time Series Data
XGBoost Standardize Numerical Input Features
XGBoost's Native Support for Categorical Features