Jupyter notebooks can be a lot of fun and, at the same time, very useful in the initial stage of exploring and learning, before building and scaling up to a bigger project. Here I am exploring a minimal package for developing predictive ML models for small molecules.
- Data handling: a `Dataset2D` and a `DataStructure` class take care of loading data in different formats, filtering for chemistry, calculating features from a collection of fingerprints and descriptors (using multiprocessing for efficiency), scaling/normalizing, and providing `X` and `y` numpy arrays to be fed to the ML models.
- Training and hyperparameter optimisation: a number of popular ML algorithms are available for regression or classification. A `Classifier` wrapper class takes care of setting up a grid search for hyperparameters, training the models, and providing predictions.
- Testing by cross-validation, with calculation of error estimates
- Visualisation of model performance using different metrics, including error estimates
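
To make the grid search and cross-validation steps above concrete, here is a minimal sketch in plain Python. It does not use the package's API: every function name (`make_toy_data`, `standardise`, `knn_predict`, `cross_validate`, `grid_search`), the toy two-blob dataset, and the choice of a k-nearest-neighbours classifier are assumptions made purely for illustration. It only shows the general idea of scoring each hyperparameter combination by k-fold cross-validation and reporting a mean accuracy with a standard error.

```python
import random
import statistics
from itertools import product


def make_toy_data(n=120, seed=0):
    """Two Gaussian blobs in 2D; a hypothetical stand-in for molecular features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 3.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


def standardise(train, test):
    """Scale both sets using the mean and std of the training rows only."""
    cols = list(zip(*train))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

    return scale(train), scale(test)


def knn_predict(X_train, y_train, x, k, p):
    """Majority vote among the k nearest training points.

    Uses the Minkowski distance of order p without the final root,
    which leaves the ranking of neighbours unchanged.
    """
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)


def kfold_indices(n, n_folds, seed=0):
    """Shuffle the indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cross_validate(X, y, k, p, n_folds=5):
    """Return the per-fold accuracies for one hyperparameter setting."""
    scores = []
    for held_out in kfold_indices(len(X), n_folds):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        X_tr, X_te = standardise([X[i] for i in train_idx],
                                 [X[i] for i in held_out])
        y_tr = [y[i] for i in train_idx]
        correct = sum(knn_predict(X_tr, y_tr, x, k, p) == y[i]
                      for x, i in zip(X_te, held_out))
        scores.append(correct / len(held_out))
    return scores


def grid_search(X, y, grid, n_folds=5):
    """Score every combination in the grid; best mean accuracy first."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = cross_validate(X, y, n_folds=n_folds, **params)
        results.append({
            "params": params,
            "mean": statistics.fmean(scores),
            # standard error of the mean across folds
            "sem": statistics.stdev(scores) / len(scores) ** 0.5,
        })
    return sorted(results, key=lambda r: r["mean"], reverse=True)


X, y = make_toy_data()
results = grid_search(X, y, {"k": [1, 3, 5, 7], "p": [1, 2]})
best = results[0]
print(f"best params: {best['params']}, "
      f"accuracy = {best['mean']:.3f} ± {best['sem']:.3f}")
```

Scaling is fitted inside each fold, on the training rows only, so the held-out rows never influence the preprocessing. The spread of the per-fold scores is what gives the error estimate attached to each hyperparameter setting.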
MIT