This is an open-source AutoML project. It currently contains a model finder, a hyperparameter tuner, and a trial manager, all wrapped up in one synchronous end-to-end program. The input is a dataset and your priorities; the output is a trained model with auto-optimized hyperparameters, ready to run inference. Simple as that.
NOTE: This project is still under development. Clone it and play; we appreciate and encourage pull requests and feedback from the community. Help us make this tool awesome.
You might be building an ML pipeline to avoid model performance degradation, or maybe you're just too lazy to download, debug, and tune your own model. Either way, you shouldn't be focusing your efforts on simple things like detection. There's a whole world out there for you to explore. Try your hand at trajectory prediction or action recognition and let ZazuML free you up from the boring stuff.
ZazuML is built from 4 main packages:
- The Model Selector, whose job is to select the optimal model based on the user's priorities.
- The Tuner, which manages and keeps track of trials.
- The Launchpad, which is in charge of launching trials and distributing GPU resources among them.
- And last but not least, the ZaZoo.
The tetrahedron in the image above represents a vector space in which each model occupies a unique position, with its own advantages and shortcomings. ZazuML selects the model whose position has the minimal Euclidean distance to your priorities.
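As a rough illustration of the nearest-model idea, the sketch below picks the model whose position is closest to a priority vector. The model names and coordinates are made up for the example and are not ZazuML's actual model specs, and `closest_model` is a hypothetical helper rather than part of the codebase. The axis order (accuracy, inference speed, memory) follows the `model_priority_space` setting in configs.json.

```python
import math

# Hypothetical model positions on the (accuracy, inference speed, memory) axes.
# These names and scores are illustrative assumptions, not ZazuML's real specs.
MODEL_SPACE = {
    "big_accurate_net": (10, 2, 1),
    "balanced_net": (6, 6, 6),
    "tiny_fast_net": (2, 9, 10),
}


def closest_model(priorities, model_space=MODEL_SPACE):
    """Return the name of the model with the smallest Euclidean distance to priorities."""
    return min(model_space, key=lambda name: math.dist(priorities, model_space[name]))


# An accuracy-only priority lands on the heavy model in this toy space,
# while a speed- and memory-heavy priority lands on the light one.
print(closest_model([10, 0, 0]))
print(closest_model([2, 9, 10]))
```

`math.dist` requires Python 3.8 or newer; on older versions the same distance can be computed with `math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))`.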
The first thing to do is start the Docker container:
docker run --rm -it --init --runtime=nvidia --ipc=host -e NVIDIA_VISIBLE_DEVICES=0 buffalonoam/zazu-image:0.1 bash
Be sure to update the NVIDIA_VISIBLE_DEVICES value to match the GPUs you want to use!
git clone https://github.com/dataloop-ai/ZazuML.git
cd ZazuML
git clone https://github.com/dataloop-ai/zazoo.git
The next thing to do is edit the configs.json file:
{
    "max_trials": 1,
    "max_instances_at_once": 1,
    "model_priority_space": [10, 0, 0],
    "task": "detection",
    "data": {
        "home_path": "/home/noam/data/coco",
        "annotation_type": "coco",
        "dataset_name": "2017"
    }
}
max_trials - defines the maximum total number of trials that will be tested.
max_instances_at_once - defines the number of trials that will run simultaneously, i.e. in parallel with each other. It must be smaller than the number of available GPUs.
model_priority_space - defines the model specs that best suit your priorities.
This is a 3-dimensional vector describing your model preferences in a Euclidean vector space:
- axis 0 - accuracy
- axis 1 - inference speed
- axis 2 - memory
For example, "model_priority_space": [2, 9, 10] indicates a very light but low-accuracy model.
task - the type of task, e.g. detection, classification, or instance segmentation (we currently only support detection).
data - describes the dataset. The example above shows how to run on a COCO-style dataset.
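Before launching a search, it can help to sanity-check the config against the rules described above. The following is a minimal sketch, not part of ZazuML: `check_config` and `REQUIRED_KEYS` are hypothetical names, and the checks cover only what this README states (the five top-level keys, a 3-value priority vector, and detection as the only supported task). It does not check the GPU count.

```python
import json

# Top-level keys shown in the example configs.json.
REQUIRED_KEYS = {"max_trials", "max_instances_at_once", "model_priority_space", "task", "data"}


def check_config(config):
    """Raise ValueError if the config breaks a rule stated in the README, else return it."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if len(config["model_priority_space"]) != 3:
        raise ValueError("model_priority_space needs 3 values: accuracy, inference speed, memory")
    if config["task"] != "detection":
        raise ValueError("only the 'detection' task is currently supported")
    if config["max_trials"] < 1 or config["max_instances_at_once"] < 1:
        raise ValueError("max_trials and max_instances_at_once must be at least 1")
    return config


# The example config from this README, parsed from a string to keep the sketch self-contained.
example = json.loads("""
{
    "max_trials": 1,
    "max_instances_at_once": 1,
    "model_priority_space": [10, 0, 0],
    "task": "detection",
    "data": {"home_path": "/home/noam/data/coco", "annotation_type": "coco", "dataset_name": "2017"}
}
""")
check_config(example)
```

To check the real file instead, load it with `json.load` on an open handle to configs.json and pass the result to `check_config`.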
python zazutuner.py --search
python zazutuner.py --train
python zazutuner.py --predict
python zazutuner.py --search --remote
Some of the code was influenced by keras-tuner