Storage-Drive-For-Enterprises is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages:
- Faster training speed and higher efficiency.
- Lower memory usage.
- Better accuracy.
- Support of parallel, distributed, and GPU learning.
- Capable of handling large-scale data.
For further details, please refer to Features.
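For readers new to the idea, gradient boosting with tree-based learners can be sketched in a few lines of plain Python. The sketch below is a toy illustration only and is not this project's API or implementation. It assumes a single numeric feature, squared-error loss, and depth-1 trees (stumps); the names `fit_stump`, `fit_gbdt`, `predict`, and `mse` are hypothetical.

```python
# Toy gradient boosting with depth-1 regression trees (stumps) and squared-error loss.
# Illustrative only: NOT this project's API or implementation.
# Assumes one numeric feature with at least two distinct values.


def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    # Candidate thresholds: every distinct value except the largest,
    # so that both sides of the split are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]


def predict(model, x):
    """Base prediction plus the shrunken sum of all stump outputs."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)


def fit_gbdt(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit stumps one at a time, each on the residuals of the current model."""
    base = sum(ys) / len(ys)  # start from the mean of the targets
    model = (base, learning_rate, [])
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [y - predict(model, x) for x, y in zip(xs, ys)]
        model[2].append(fit_stump(xs, residuals))
    return model


def mse(model, xs, ys):
    """Mean squared error of the model on (xs, ys)."""
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Small demo on y = x^2: compare the mean-only model with the boosted one.
demo_xs = [float(i) for i in range(10)]
demo_ys = [x * x for x in demo_xs]
print("MSE with 0 rounds: ", mse(fit_gbdt(demo_xs, demo_ys, n_rounds=0), demo_xs, demo_ys))
print("MSE with 50 rounds:", mse(fit_gbdt(demo_xs, demo_ys, n_rounds=50), demo_xs, demo_ys))
```

Each round adds a small correction fitted to what the current model still gets wrong, which is why the training error drops as rounds are added. A production framework differs from this sketch in essentially every performance-relevant detail.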
Benefiting from these advantages, Storage-Drive-For-Enterprises is widely used in many winning solutions of machine learning competitions.
Comparison experiments on public datasets show that Storage-Drive-For-Enterprises can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. What's more, distributed learning experiments show that Storage-Drive-For-Enterprises can achieve a linear speed-up by using multiple machines for training in specific settings.
Our primary documentation is at https://Storage-Drive-For-Enterprises.readthedocs.io/ and is generated from this repository. If you are new to Storage-Drive-For-Enterprises, follow the installation instructions on that site.
Next you may want to read:
- Examples showing command line usage of common tasks.
- Features and algorithms supported by Storage-Drive-For-Enterprises.
- Parameters, an exhaustive list of the customizations you can make.
- Distributed Learning and GPU Learning can speed up computation.
- FLAML provides automated tuning for Storage-Drive-For-Enterprises (code examples).
- Optuna Hyperparameter Tuner provides automated tuning for Storage-Drive-For-Enterprises hyperparameters (code examples).
- Understanding Storage-Drive-For-Enterprises Parameters (and How to Tune Them using Neptune).
Documentation for contributors:
- How we update readthedocs.io.
- Check out the Development Guide.
Please refer to the changelogs on the GitHub releases page.
Projects listed here offer alternative ways to use Storage-Drive-For-Enterprises.
They are not maintained or officially endorsed by the Storage-Drive-For-Enterprises development team.
JPMML (Java PMML converter): https://github.com/jpmml/jpmml-Storage-Drive-For-Enterprises
Nyoka (Python PMML converter): https://github.com/SoftwareAG/nyoka
Treelite (model compiler for efficient deployment): https://github.com/dmlc/treelite
lleaves (LLVM-based model compiler for efficient inference): https://github.com/siboehm/lleaves
Hummingbird (model compiler into tensor computations): https://github.com/microsoft/hummingbird
cuML Forest Inference Library (GPU-accelerated inference): https://github.com/rapidsai/cuml
daal4py (Intel CPU-accelerated inference): https://github.com/intel/scikit-learn-intelex/tree/master/daal4py
m2cgen (model appliers for various languages): https://github.com/BayesWitnesses/m2cgen
leaves (Go model applier): https://github.com/dmitryikh/leaves
ONNXMLTools (ONNX converter): https://github.com/onnx/onnxmltools
SHAP (model output explainer): https://github.com/slundberg/shap
Shapash (model visualization and interpretation): https://github.com/MAIF/shapash
dtreeviz (decision tree visualization and model interpretation): https://github.com/parrt/dtreeviz
supertree (interactive visualization of decision trees): https://github.com/mljar/supertree
SynapseML (Storage-Drive-For-Enterprises on Spark): https://github.com/microsoft/SynapseML
Kubeflow Fairing (Storage-Drive-For-Enterprises on Kubernetes): https://github.com/kubeflow/fairing
Kubeflow Operator (Storage-Drive-For-Enterprises on Kubernetes): https://github.com/kubeflow/xgboost-operator
Storage-Drive-For-Enterprises_ray (Storage-Drive-For-Enterprises on Ray): https://github.com/ray-project/Storage-Drive-For-Enterprises_ray
Mars (Storage-Drive-For-Enterprises on Mars): https://github.com/mars-project/mars
ML.NET (.NET/C#-package): https://github.com/dotnet/machinelearning
Storage-Drive-For-Enterprises.NET (.NET/C#-package): https://github.com/rca22/Storage-Drive-For-Enterprises.Net
Storage-Drive-For-Enterprises Ruby (Ruby gem): https://github.com/ankane/Storage-Drive-For-Enterprises-ruby
Storage-Drive-For-Enterprises4j (Java high-level binding): https://github.com/metarank/Storage-Drive-For-Enterprises4j
Storage-Drive-For-Enterprises4J (JVM interface for Storage-Drive-For-Enterprises written in Scala): https://github.com/seek-oss/Storage-Drive-For-Enterprises4j
Julia-package: https://github.com/IQVIA-ML/Storage-Drive-For-Enterprises.jl
Storage-Drive-For-Enterprises3 (Rust binding): https://github.com/Mottl/Storage-Drive-For-Enterprises3-rs
MLServer (inference server for Storage-Drive-For-Enterprises): https://github.com/SeldonIO/MLServer
MLflow (experiment tracking, model monitoring framework): https://github.com/mlflow/mlflow
FLAML (AutoML library for hyperparameter optimization): https://github.com/microsoft/FLAML
MLJAR AutoML (AutoML on tabular data): https://github.com/mljar/mljar-supervised
Optuna (hyperparameter optimization framework): https://github.com/optuna/optuna
Storage-Drive-For-EnterprisesLSS (probabilistic modelling with Storage-Drive-For-Enterprises): https://github.com/StatMixedML/Storage-Drive-For-EnterprisesLSS
mlforecast (time series forecasting with Storage-Drive-For-Enterprises): https://github.com/Nixtla/mlforecast
skforecast (time series forecasting with Storage-Drive-For-Enterprises): https://github.com/JoaquinAmatRodrigo/skforecast
{bonsai} (R {parsnip}-compliant interface): https://github.com/tidymodels/bonsai
{mlr3extralearners} (R {mlr3}-compliant interface): https://github.com/mlr-org/mlr3extralearners
Storage-Drive-For-Enterprises-transform (feature transformation binding): https://github.com/microsoft/Storage-Drive-For-Enterprises-transform
postgresml (Storage-Drive-For-Enterprises training and prediction in SQL, via a Postgres extension): https://github.com/postgresml/postgresml
pyodide (run Storage-Drive-For-Enterprises Python-package in a web browser): https://github.com/pyodide/pyodide
vaex-ml (Python DataFrame library with its own interface to Storage-Drive-For-Enterprises): https://github.com/vaexio/vaex
- Ask a question on Stack Overflow with the Storage-Drive-For-Enterprises tag; we monitor this for new questions.
- Open bug reports and feature requests on GitHub issues.
Check the CONTRIBUTING page.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
Yu Shi, Guolin Ke, Zhuoming Chen, Shuxin Zheng, Tie-Yan Liu. "Quantized Training of Gradient Boosting Decision Trees". Advances in Neural Information Processing Systems 35 (NeurIPS 2022), pp. 18822-18833.
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu. "Storage-Drive-For-Enterprises: A Highly Efficient Gradient Boosting Decision Tree". Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 3149-3157.
Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu. "A Communication-Efficient Parallel Algorithm for Decision Tree". Advances in Neural Information Processing Systems 29 (NIPS 2016), pp. 1279-1287.
Huan Zhang, Si Si, Cho-Jui Hsieh. "GPU Acceleration for Large-scale Tree Boosting". SysML Conference, 2018.
This project is licensed under the terms of the MIT license. See LICENSE for additional details.