Salento is a statistical bug-detection framework based on the machine learning model used by Bayou. For technical details about Salento, refer to the paper "Bayesian Specification Learning for Finding API Usage Errors" (FSE'17).
Requirements:
- Python3 (tested with 3.5.1)
- TensorFlow (tested with 1.4)
To train a Salento model on a data file, say DATA.json:
- Set up the environment:
export PYTHONPATH=$PYTHONPATH:/path/to/salento/src/main/python
- Ensure that the data is in the right JSON format using the schema file doc/json_schemas/salento_input_schema.json.
- (Optional.) Extract evidences from the data:
python3 src/main/python/scripts/evidence_extractor.py DATA.json DATA-training.json
This creates DATA-training.json after extracting evidences from each package in DATA.json. Run with --help to see further options for filtering the sequences selected for training.
- Go to the model folder and start training with a model configuration:
cd src/main/python/salento/models/low_level_evidences
python3 train.py /path/to/DATA-training.json --config config.json
Run with --help to see a description of the model configuration options. Edit config.json as needed.
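The training steps above can also be chained from a small driver script. The following is a minimal sketch, not part of Salento: the helper names `training_commands` and `run_training` and the `salento_root` parameter are assumptions made for illustration. Only the script paths and the `--config` flag are taken from the steps above.

```python
import os
import subprocess


def training_commands(salento_root, data_file, training_file):
    """Return (env, steps) mirroring the manual training steps.

    Hypothetical helper: the name and arguments are illustrative only.
    Each step is a (working_directory, argv) pair.
    """
    python_dir = os.path.join(salento_root, "src", "main", "python")
    model_dir = os.path.join(
        python_dir, "salento", "models", "low_level_evidences")

    # Environment setup: append the Salento sources to PYTHONPATH.
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = (
        python_dir if not existing else existing + os.pathsep + python_dir)

    # Absolute paths, because each step runs in a different directory.
    data_file = os.path.abspath(data_file)
    training_file = os.path.abspath(training_file)

    steps = [
        # Optional step: extract evidences from the raw data.
        (salento_root,
         ["python3",
          os.path.join(python_dir, "scripts", "evidence_extractor.py"),
          data_file, training_file]),
        # Training step: run train.py from inside the model folder.
        (model_dir,
         ["python3", "train.py", training_file, "--config", "config.json"]),
    ]
    return env, steps


def run_training(salento_root, data_file, training_file):
    """Run each step in order and stop at the first failing command."""
    env, steps = training_commands(salento_root, data_file, training_file)
    for cwd, argv in steps:
        subprocess.run(argv, cwd=cwd, env=env, check=True)
```

Calling `run_training("/path/to/salento", "DATA.json", "DATA-training.json")` would then perform evidence extraction followed by training. The sketch does not validate the data against the schema.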
To test a trained model on some test data:
- Follow the first three training steps above (environment setup, format check, and evidence extraction) to produce a file DATA-testing.json with evidences.
- Go to the aggregators folder and run one of the aggregators on the test data:
cd src/main/python/salento/aggregators
python3 sequence_aggregator.py --data_file /path/to/DATA-testing.json --model_dir /path/to/model/directory
The model directory should contain the trained model's files, such as checkpoint, config.json, etc.
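Before running an aggregator, it can help to check that the model directory looks complete. The sketch below is an illustration under stated assumptions, not a Salento utility: `missing_model_files` and `aggregator_command` are hypothetical helper names, and the check covers only the two file names mentioned above, since the full set of model files is not listed here.

```python
import os

# Only these two names are mentioned in the text. A real model directory
# holds further files that this check does not know about.
EXPECTED_FILES = ("checkpoint", "config.json")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected names that are absent from model_dir."""
    return [name for name in expected
            if not os.path.exists(os.path.join(model_dir, name))]


def aggregator_command(data_file, model_dir):
    """Build the sequence_aggregator.py invocation shown above.

    The command is meant to be run from the aggregators folder, so both
    paths are made absolute.
    """
    return ["python3", "sequence_aggregator.py",
            "--data_file", os.path.abspath(data_file),
            "--model_dir", os.path.abspath(model_dir)]
```

A caller might refuse to launch the aggregator when `missing_model_files(model_dir)` returns a non-empty list, and otherwise pass the result of `aggregator_command(...)` to `subprocess.run` with `cwd` set to the aggregators folder.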