A framework to evaluate various models for tabular regression and classification tasks. The package integrates 25 machine learning (including deep learning) models for tabular prediction tasks from the following well-established model bases:
- autogluon: "LightGBM", "CatBoost", "XGBoost", "Random Forest", "Extremely Randomized Trees", "K-Nearest Neighbors", "Linear Regression", "Neural Network with MXNet", "Neural Network with PyTorch", "Neural Network with FastAI".
- pytorch_widedeep: "TabMlp", "TabResnet", "TabTransformer", "TabNet", "SAINT", "ContextAttentionMLP", "SelfAttentionMLP", "FTTransformer", "TabPerceiver", "TabFastFormer".
- pytorch_tabular: "Category Embedding", "NODE", "TabNet", "TabTransformer", "AutoInt", "FTTransformer".
You can implement your own models, data processing pipelines, and datasets within this flexible, well-tested
framework and compare them consistently with baseline models. This is even easier when your own model is
based on PyTorch.
Supported features for all model bases:
- Data processing
  - Data splitting (training/validation/testing sets)
  - Data imputation
  - Data filtering
  - Data scaling
  - Data augmentation
  - Feature augmentation
  - Feature selection
  - etc.
- Multi-modal data
- Loading UCI datasets
- Data/result analysis
  - Leaderboard
  - Box plot
  - Pair plot
  - Pearson correlation
  - Partial dependency plot (with bootstrapping)
  - Feature importance (Permutation and SHAP)
  - etc.
- Building models upon other trained models
- pytorch_lightning-based training for pytorch models
- Gaussian-process-based Bayesian hyperparameter optimization
- Cross-validation (including continuing from a cross-validation checkpoint)
- Saving, loading, and migrating models
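As a rough illustration of two of the data processing steps listed above (splitting and scaling), the sketch below uses only the Python standard library. It is not the package's implementation: the function names and the 60/20/20 ratio are assumptions made for this example. The point it shows is that scaling statistics are fitted on the training part only and then reused for the validation and testing parts.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def split_indices(
    n_samples: int,
    ratios: Sequence[float] = (0.6, 0.2, 0.2),  # assumed ratio, for illustration only
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle row indices and cut them into training/validation/testing parts."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(ratios[0] * n_samples)
    n_val = int(ratios[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the testing set
    return train, val, test


def fit_standard_scaler(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, std) of the given values; a zero std is replaced by 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean, (std if std > 0 else 1.0)


def apply_scaler(values: Sequence[float], mean: float, std: float) -> List[float]:
    """Standardize values with statistics that were fitted elsewhere."""
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    feature = [float(i) for i in range(10)]
    train_idx, val_idx, test_idx = split_indices(len(feature))
    # Fit on the training rows only, then reuse the statistics for the test rows.
    mean, std = fit_standard_scaler([feature[i] for i in train_idx])
    print(apply_scaler([feature[i] for i in test_idx], mean, std))
```

Fitting the scaler on the training rows alone keeps information from the validation and testing sets out of the preprocessing step, which is what makes comparisons between models fair.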
The package stands on the shoulders of giants:
- scikit-learn
- PyTorch
- PyTorch Lightning
- etc. (See requirements.txt)
Full documentation is available here. For a quick start:
tabular_ensemble can be installed using PyPI by running the following command:

    pip install tabensemb[torch]

Please use pip install tabensemb instead if you already have torch>=1.12.0 installed. Use pip install tabensemb[test] if you want to run unit tests.
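If you are unsure which of the two commands applies to your environment, the following sketch checks the installed torch version with the standard library and prints a suggestion. The helper names are hypothetical and not part of tabensemb; the version parsing is deliberately simple and treats pre-release or local builds (such as 2.1.0+cu118) like their base release.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group().split("."))
    # Pad short versions such as '1.12' so that tuple comparison behaves as expected.
    return parts + (0,) * (3 - len(parts))


def installed_torch_version() -> Optional[str]:
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


def suggest_install_command(torch_version: Optional[str]) -> str:
    """Pick the install command according to the torch>=1.12.0 rule stated above."""
    if torch_version is not None and parse_version(torch_version) >= (1, 12, 0):
        return "pip install tabensemb"
    return "pip install tabensemb[torch]"


if __name__ == "__main__":
    print(suggest_install_command(installed_torch_version()))
```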
To install from source,

    pip install -e .[torch]

- (Optional) Run unit tests after installing tabensemb[test]:

    cd test
    pytest .

- Place your .csv or .xlsx file in a data subfolder (e.g., data/sample.csv), and generate a configuration file in a configs subfolder (e.g., configs/sample.py) containing the following content:

    cfg = {
        "database": "sample",
        "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
        "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
        "label_name": ["target"],
    }

- Run the experiment with this configuration and data:

    python main.py --base sample --epoch 10

where --base refers to the configuration file, and additional arguments (such as --epoch here) refer to those in config/default.py.
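To try these steps without your own data, a synthetic CSV file matching the sample configuration can be generated with the standard library, as sketched below. The helper names, the category values "a"/"b"/"c", and the way the target is computed are all made up for illustration. The sketch also assumes that the "database" entry names the file inside the data subfolder, as the data/sample.csv example suggests.

```python
import csv
import random
from pathlib import Path
from typing import Dict, List

# Same keys and names as the sample configuration shown above.
cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}


def expected_columns(cfg: Dict) -> List[str]:
    """All column names the configuration refers to."""
    return (
        cfg["continuous_feature_names"]
        + cfg["categorical_feature_names"]
        + cfg["label_name"]
    )


def write_synthetic_csv(cfg: Dict, data_dir: str = "data", n_rows: int = 100, seed: int = 0) -> Path:
    """Write <data_dir>/<database>.csv filled with random values (hypothetical helper)."""
    rng = random.Random(seed)
    path = Path(data_dir) / f"{cfg['database']}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(expected_columns(cfg))
        for _ in range(n_rows):
            cont = [rng.gauss(0.0, 1.0) for _ in cfg["continuous_feature_names"]]
            cat = [rng.choice(["a", "b", "c"]) for _ in cfg["categorical_feature_names"]]
            # Arbitrary regression target: sum of continuous features plus small noise.
            target = sum(cont) + rng.gauss(0.0, 0.1)
            writer.writerow(cont + cat + [target])
    return path


def missing_columns(cfg: Dict, csv_path: Path) -> List[str]:
    """Names used in the configuration that are absent from the CSV header."""
    with Path(csv_path).open(newline="") as f:
        header = next(csv.reader(f))
    return [name for name in expected_columns(cfg) if name not in header]


if __name__ == "__main__":
    csv_path = write_synthetic_csv(cfg)
    print(csv_path, "missing columns:", missing_columns(cfg, csv_path))
```

Running a check like missing_columns on your own file before starting an experiment can catch misspelled feature names early.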
See the documentation pages for details.
If you use this repository, please cite us as:
(To be updated after release on arXiv or publication.)