A REST server to quickly create test PMML models.
pmml-zoo allows you to quickly generate simulated data and create test PMML models by sending a JSON payload to a REST server, getting back the trained model.
What it is not
pmml-zoo doesn't aim to create production models; it is intended to create models for smoke, integration and unit tests.
The best way to get started is to use the pmml-zoo container image.
$ docker pull ruivieira/pmml-zoo:0.0.1
$ docker run -i --rm -p 5000:5000 ruivieira/pmml-zoo

Assuming the server is running locally, the full REST API will be available at http://0.0.0.0:5000/apidocs.
As an example, let's create a linear regression model.
We can send the following JSON payload to 0.0.0.0:5000/model/linear-regression:
curl --request POST \
--url http://0.0.0.0:5000/model/linear-regression \
--header 'content-type: application/json' \
--data '
{"data": {
"size": 1000,
"inputs": [
{"name": "feature-1",
"type": "continuous",
"points": [[10.0, 20.0], [20.0, 40.0], [50, 35.0], [100, 16.0]]
},
{"name": "feature-2",
"type": "discrete",
"points": [[0, 3.9], [2, 4.3], [8, 2.9], [9, 7.0]]
},
{"name": "feature-3",
"type": "categorical",
"points": [["low", 2.0], ["medium", 4.0], ["high", 1.0]]
}
],
"outputs": [
{"name": "feature-4",
"type": "continuous",
"points": [[1.0, 2.0], [4.0, 7.3], [7.0, 1.0], [100, 16.0]]
}]
}
}' \
-o model.pmml

Data is simulated by first interpolating the provided points to create an empirical distribution.
This empirical distribution is then sampled size times, and the samples become the variable's data.
Note that all variables are sampled independently, although spurious correlations may occur.
A complete explanation is provided in the documentation.
- size is the size of the dataset.
- points is a list of data points used to construct the interpolation, in the format (value, weight). For instance, a list of [(1.0, 2.0), (2.0, 4.0)] means that the value 2.0 will be more frequent than 1.0.
- name is the feature name, which will be used in the PMML model.
- type can be one of continuous, discrete or categorical.
- inputs and outputs have the same format; inputs describe the model's features and outputs describe its targets.
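
To make the role of points and size more concrete, here is a minimal Python sketch of the idea, not pmml-zoo's actual implementation. It assumes the weights are linearly interpolated over an evenly spaced grid between the smallest and largest value, and that samples are drawn with probability proportional to the interpolated weight. The function names (interpolate_weight, sample_continuous, sample_categorical) and the grid parameter are hypothetical, and the discrete type is not covered.

```python
import random
from bisect import bisect_right


def interpolate_weight(points, x):
    """Linearly interpolate the weight at x from sorted (value, weight) points."""
    xs = [p[0] for p in points]
    i = bisect_right(xs, x)
    if i == 0:
        return points[0][1]
    if i == len(points):
        return points[-1][1]
    (x0, w0), (x1, w1) = points[i - 1], points[i]
    return w0 + (w1 - w0) * (x - x0) / (x1 - x0)


def sample_continuous(points, size, grid=200, rng=random):
    """Draw `size` values from a grid, weighted by the interpolated points."""
    points = sorted(points)
    lo, hi = points[0][0], points[-1][0]
    # Assumption: an evenly spaced grid stands in for the continuous range.
    values = [lo + (hi - lo) * k / (grid - 1) for k in range(grid)]
    weights = [interpolate_weight(points, v) for v in values]
    return rng.choices(values, weights=weights, k=size)


def sample_categorical(points, size, rng=random):
    """Draw `size` labels with probability proportional to each weight."""
    labels = [p[0] for p in points]
    weights = [p[1] for p in points]
    return rng.choices(labels, weights=weights, k=size)


# Each variable is sampled on its own, so the variables are independent.
rng = random.Random(42)
feature_1 = sample_continuous(
    [[10.0, 20.0], [20.0, 40.0], [50, 35.0], [100, 16.0]], size=1000, rng=rng
)
feature_3 = sample_categorical(
    [["low", 2.0], ["medium", 4.0], ["high", 1.0]], size=1000, rng=rng
)
```

With the weights from the example payload, "medium" should turn up roughly four times as often as "high" in feature_3, and every value in feature_1 falls between 10 and 100.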
After the above payload is sent, the server returns the model's PMML as XML, which is saved (in this example) to the model.pmml file.
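
The same request can also be made from a test suite instead of curl. The sketch below uses only the Python standard library and assumes the server is reachable at the URL used in the curl example. The helper names (build_request, is_pmml, fetch_model) are hypothetical and not part of pmml-zoo. The check in is_pmml relies only on the fact that a PMML document has a root element named PMML, usually inside a versioned namespace.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET


def build_request(payload, url="http://0.0.0.0:5000/model/linear-regression"):
    """Build the POST request for a payload dictionary without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_pmml(xml_bytes):
    """Return True if the bytes parse as XML whose root element is PMML."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    # Drop a namespace prefix such as "{http://www.dmg.org/PMML-4_4}".
    return root.tag.rsplit("}", 1)[-1] == "PMML"


def fetch_model(payload, path="model.pmml"):
    """Send the payload, verify the response is PMML, and save it to `path`."""
    with urllib.request.urlopen(build_request(payload)) as response:
        xml_bytes = response.read()
    if not is_pmml(xml_bytes):
        raise ValueError("response is not a PMML document")
    with open(path, "wb") as handle:
        handle.write(xml_bytes)
    return path
```

Calling fetch_model with the dictionary form of the JSON payload above should leave the model in model.pmml, just as the curl command does. Keeping build_request separate makes it possible to inspect the request in a unit test without a running server.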
For now, these are the supported models:
- Linear regression (/model/linearregression)
- Random forest classification (/model/randomforest)
Please use the issues for any suggestions, feedback, PRs or bugs. Thank you!