Releases: KxSystems/ml
4.1.0
ML Registry Functionality: An on-prem location for storing and versioning ML models, along with a common model retrieval API that allows models, regardless of their underlying requirements, to be retrieved and used on kdb+ data. Centralising team work in a common storage location supports team collaboration and management oversight.
4.0.0
The release of ML toolkit 4.0 comes with several key changes, enhancements and improvements:
- Unified Codebase: Migrated other components of the ML toolkit (NLP & AutoML) into the same repository for improved code sharing and maintainability.
- PyKX Support: NLP, ML and AutoML will now use PyKX if available, otherwise reverting to embedPy.
- Python Dependency Updates: Added support for Python 3.11, and removed several dependency version pins and limits to improve compatibility and performance.
- Enhanced Testing & CI: Improved internal testing and continuous integration systems, ensuring better reliability for future releases. Includes automated Snyk scans for enhanced security.
- Multi-Processing Support Fix: Resolved issues with multi-processing support, providing more robust and efficient parallel processing capabilities.
- Examples Provided: Comprehensive examples and associated sample output reports are now available under examples/. These examples offer practical use cases and demonstrate the new features and improvements.
3.2.0
- Fix to issues relating to unsupported versions of SciPy
- Updates to tests that are no longer supported by the equivalent Python functions
3.1.0
- Update to FRESH functionality to be more efficient in distributed applications
- Fix to df2tab to handle nulls appropriately in date columns
- Fix to tsPlot functionality
Addition of stats library in tgz releases
Addition of stats library to packaged release (#95)
- Addition of stats folder for .tgz releases
- Length update for FRESH functionality
3.0.1
Addition of stats library for docker image deployment
3.0.0
- Refactor coding/commenting style to be up to date with coding standards
- Addition of stats section. This includes functionality such as
  - OLS/WLS fit/predict functionality
  - Transfer of percentile/describe function from utility folder to stats folder
  - Expansion of the `.ml.describe` function to allow users more flexibility by having a user configurable json file
- Change function names to camel case. Any functions that were affected by this change are defined within `functionMapping.json`. These functions are still callable until the next release of the ML Toolkit. If the old versions are called, a warning message will be sent to stdout
- Scaling and transformation preprocessing functions were amended to now contain a `fit/transform/fitTransform` key. Any functions affected by this change are defined within `functionMapping.json`. These functions are still callable until the next release of the ML Toolkit. If the old versions are called, a warning message will be sent to stdout
- All functions containing a `predict/update/transform` key as output must now take `config` as the initial input, which is of type `dictionary` and has a `modelInfo` key
- The contents of FRESH's `hyperparam.txt` file were converted to a json file `hyperparameters.json`
- The utility functions within the clustering library were moved to `clust/utils.q`
- `init.q` can now be loaded before initialization of `ml.q`
- All README files were updated to reflect that the toolkit is no longer in its BETA release stages
- Test script `filelength.t` was added to check that the length of code lines in files does not exceed 80 chars
- Tests are now run in appveyor/travis by calling `testFiles.bat`. This will be updated when any new test folder is added to the toolkit
- All tests were updated to reflect these changes
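The OLS/WLS fit/predict functionality and the `config`/`modelInfo` convention listed above can be illustrated with a minimal Python sketch. It is not the toolkit's q implementation: the function names `fit_wls` and `predict`, and the layout of the returned dictionary, are assumptions made for illustration, and the model is restricted to one predictor with an intercept.

```python
def fit_wls(x, y, weights=None):
    """Fit y ~ intercept + slope * x by weighted least squares.

    With weights=None every point gets weight 1, which reduces to OLS.
    Returns a dictionary with a 'modelInfo' key (hypothetical layout).
    """
    if weights is None:
        weights = [1.0] * len(x)
    w_sum = sum(weights)
    # Weighted means of the predictor and the target.
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    # Weighted covariance and variance give the closed-form slope.
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar)
               for w, xi, yi in zip(weights, x, y))
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return {"modelInfo": {"intercept": intercept, "slope": slope}}


def predict(config, x):
    """Predict using a fitted config passed as the first argument."""
    info = config["modelInfo"]
    return [info["intercept"] + info["slope"] * xi for xi in x]
```

Passing the fitted dictionary as the first argument to `predict` mirrors the "`config` as the initial input" rule described in the list; setting a weight to zero removes a point's influence on the fit.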
2.0.0
What’s New:
Time series functionality:
- Addition of time series models implemented in q: AR, ARMA, ARIMA, SARIMA and ARCH.
- Time series feature engineering techniques (windowed and lagged feature generation).
- Data stationarity testing
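As a rough illustration of what lagged and windowed feature generation means, the following Python sketch builds both kinds of feature from a plain list. The function names `lagged_features` and `windowed_mean` are hypothetical and do not correspond to the toolkit's q functions; missing history is represented with `None`.

```python
def lagged_features(series, lags):
    """For each time t, return [series[t - lag] for lag in lags].

    Entries are None where the series has no value that far back.
    """
    rows = []
    for t in range(len(series)):
        rows.append([series[t - lag] if t - lag >= 0 else None
                     for lag in lags])
    return rows


def windowed_mean(series, window):
    """Trailing moving average; None until a full window is available."""
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)
        else:
            chunk = series[t + 1 - window:t + 1]
            out.append(sum(chunk) / window)
    return out
```

Features like these are typically joined to the original series as extra columns before a model is fitted.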
Graph/pipeline resources:
- Framework for the development of modularised kdb+ workflows and executable pipeline structures
Optimization:
- Implementation of the Broyden-Fletcher-Goldfarb-Shanno algorithm for function minimization
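For readers unfamiliar with the algorithm, here is a minimal Python sketch of BFGS minimization. It is not the toolkit's q implementation: the function name `bfgs`, the backtracking (Armijo) line search, and the tolerances are assumptions chosen to keep the example short.

```python
def bfgs(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimise f from x0 using BFGS with a backtracking line search."""
    n = len(x0)
    x = list(x0)
    # Inverse-Hessian approximation, initialised to the identity matrix.
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(x)
    for _ in range(max_iter):
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # Quasi-Newton search direction p = -H g.
        p = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        # Backtracking: halve the step until the Armijo condition holds.
        step, fx = 1.0, f(x)
        slope = sum(gi * pi for gi, pi in zip(g, p))
        while step > 1e-12:
            trial = [xi + step * pi for xi, pi in zip(x, p)]
            if f(trial) <= fx + 1e-4 * step * slope:
                break
            step *= 0.5
        x_new = [xi + step * pi for xi, pi in zip(x, p)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-16:  # skip the update if the curvature condition fails
            rho = 1.0 / sy
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hi for yi, hi in zip(y, Hy))
            # H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T, expanded.
            for i in range(n):
                for j in range(n):
                    H[i][j] += (-rho * (s[i] * Hy[j] + Hy[i] * s[j])
                                + (rho * rho * yHy + rho) * s[i] * s[j])
        x, g = x_new, g_new
    return x
```

For example, calling `bfgs` on `f(v) = (v[0] - 1)**2 + 10 * (v[1] + 2)**2` with its analytic gradient and a start of `[0.0, 0.0]` should converge towards the minimum at `(1, -2)`.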
Grid Search:
- Random and quasi-random (Sobol sequence) parameter set generation, providing an alternative to exhaustive grid search.
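The idea of replacing an exhaustive grid with generated parameter sets can be sketched in Python as follows. Note the swap: this sketch uses a Halton sequence, a simpler low-discrepancy sequence, in place of the Sobol sequence named in the release note, because Halton can be written in a few lines without external libraries. The names `random_params`, `halton_params` and `van_der_corput` are hypothetical and are not the toolkit's API.

```python
import random


def van_der_corput(index, base):
    """Radical inverse of a positive integer index in the given base."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def random_params(space, n, seed=0):
    """Draw n parameter sets uniformly at random from {name: (lo, hi)}."""
    rng = random.Random(seed)
    return [{name: lo + rng.random() * (hi - lo)
             for name, (lo, hi) in space.items()}
            for _ in range(n)]


def halton_params(space, n):
    """Draw n parameter sets from a Halton sequence (stand-in for Sobol)."""
    primes = [2, 3, 5, 7, 11, 13]
    names = list(space)
    if len(names) > len(primes):
        raise ValueError("sketch supports at most 6 parameters")
    out = []
    for i in range(1, n + 1):
        row = {}
        # Each parameter uses its own prime base so dimensions decorrelate.
        for name, base in zip(names, primes):
            lo, hi = space[name]
            row[name] = lo + van_der_corput(i, base) * (hi - lo)
        out.append(row)
    return out
```

Either generator yields a fixed budget of `n` candidate parameter sets regardless of how many parameters there are, whereas the size of an exhaustive grid grows multiplicatively with each added parameter.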
Clustering:
- Implementation of k-means clustering now uses early stopping
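Early stopping in k-means means ending the iteration as soon as the centroids stop moving, rather than always running a fixed number of passes. A minimal Python sketch follows. The function name `kmeans`, the choice of the first `k` points as initial centroids, and the squared-distance tolerance are assumptions for illustration, not the toolkit's q implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iter=100, tol=1e-6):
    """Return (centroids, iterations_used) with early stopping."""
    # Assumption: initialise with the first k points for determinism.
    centroids = [list(p) for p in points[:k]]
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members)
                     for d in range(dim)])
            else:
                new_centroids.append(centroids[c])  # keep empty cluster
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        # Early stopping: largest squared centroid move is within tol.
        if shift <= tol:
            break
    return centroids, iteration
```

On well-separated data the loop typically exits after a handful of passes, long before `max_iter` is reached.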
Updates:
Clustering:
- Fit / predict / update style function calls, rather than only fit+predict as previously, so that models can be deployed for classification on incoming data.
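The difference between a combined fit+predict call and separate fit / predict / update calls can be sketched in Python with a simple nearest-centroid model. All names here (`fit`, `predict`, `update`, and the model dictionary layout) are hypothetical, and the `labels` argument stands in for cluster assignments that a clustering algorithm would produce during fitting.

```python
def fit(points, labels):
    """Build one centroid per label, keeping counts for later updates."""
    sums, counts = {}, {}
    for p, lab in zip(points, labels):
        acc = sums.setdefault(lab, [0.0] * len(p))
        for d, v in enumerate(p):
            acc[d] += v
        counts[lab] = counts.get(lab, 0) + 1
    centroids = {lab: [v / counts[lab] for v in acc]
                 for lab, acc in sums.items()}
    return {"centroids": centroids, "counts": counts}


def predict(model, point):
    """Classify an incoming point without changing the model."""
    return min(model["centroids"],
               key=lambda lab: sum((a - b) ** 2 for a, b in
                                   zip(point, model["centroids"][lab])))


def update(model, point):
    """Classify an incoming point and fold it into its cluster's mean."""
    lab = predict(model, point)
    n = model["counts"][lab] + 1
    centroid = model["centroids"][lab]
    # Incremental mean: shift the centroid towards the new point by 1/n.
    model["centroids"][lab] = [c + (p - c) / n
                               for c, p in zip(centroid, point)]
    model["counts"][lab] = n
    return lab
```

Keeping `predict` free of side effects lets a fitted model be deployed against incoming data, while `update` is the explicit opt-in for letting new points adjust the clusters.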
Initial release candidate for version 2.0.0 (update)
Additive update, including clustering updates
- Fit / predict / update style function calls, rather than only fit+predict as previously, so that models can be deployed for classification on incoming data.
Initial release candidate for version 2.0.0
What’s New:
Time series functionality:
- Addition of time series models implemented in q: AR, ARMA, ARIMA, SARIMA and ARCH.
- Time series feature engineering techniques (windowed and lagged feature generation).
- Data stationarity testing
Graph/pipeline resources:
- Framework for the development of modularised kdb+ workflows and executable pipeline structures
Optimization:
- Implementation of the Broyden-Fletcher-Goldfarb-Shanno algorithm for function minimization
Grid Search:
- Random and quasi-random (Sobol sequence) parameter set generation, providing an alternative to exhaustive grid search.
Clustering:
- Implementation of k-means clustering now uses early stopping
Updates:
Clustering:
- Fit / predict / update style function calls, rather than only fit+predict as previously, so that models can be deployed for classification on incoming data.