ML Registry Functionality: An on-prem location for storing and versioning ML models, together with a common model retrieval API that allows models to be retrieved and used on kdb+ data regardless of their underlying requirements. Centralising team work in a common storage location improves opportunities for team collaboration and management oversight.

The release of ML toolkit 4.0 comes with several key changes, enhancements and improvements:

Unified Codebase: Migrated other components of the ML toolkit (NLP & AutoML) into the same repository for improved code sharing and maintainability.
PyKX Support: NLP, ML and AutoML now use PyKX if it is available and fall back to embedPy otherwise.
Python Dependency Updates: Added support for Python 3.11 and removed several dependency version pins and limits to ensure compatibility and improve performance.
Enhanced Testing & CI: Improved internal testing and continuous integration systems, ensuring better reliability for future releases. Includes automated Snyk scans for enhanced security.
Multi-Processing Support Fix: Resolved issues with multi-processing support, providing more robust and efficient parallel processing capabilities.
Examples Provided: Comprehensive examples and associated sample output reports are now available under examples/. These examples offer practical use cases and demonstrate the new features and improvements.

Fix to issues relating to unsupported versions of scipy
Updates to tests that are no longer supported by the equivalent Python functions

Update to FRESH functionality to be more efficient in distributed applications
Fix to df2tab to handle nulls appropriately in date columns
Fix to tsPlot functionality

Addition of stats library for docker image deployment

Refactor coding/commenting style to be up to date with coding standards
Addition of stats section. This includes functionality such as
- OLS/WLS fit/predict functionality
- Transfer of percentile/describe function from utility folder to stats folder
- Expansion of the `.ml.describe` function to give users more flexibility through a user-configurable JSON file
Function names were changed to camel case. Any functions affected by this change are defined in functionMapping.json. The old names remain callable until the next release of the ML Toolkit; calling them prints a warning message to stdout.
Scaling and transformation preprocessing functions now contain a fit/transform/fitTransform key. Any functions affected by this change are defined in functionMapping.json. The old versions remain callable until the next release of the ML Toolkit; calling them prints a warning message to stdout.
All functions whose output contains a predict/update/transform key must now take config as their first input; config is a dictionary and has a modelInfo key
The contents of FRESH's hyperparam.txt file were converted to a JSON file, hyperparameters.json
The utility functions within the clustering library were moved to clust/utils.q
init.q can now be loaded before initialization of ml.q
All README files were updated to reflect that the toolkit is no longer in its beta release stage
A test script, filelength.t, was added to check that lines of code in files do not exceed 80 characters
Tests are now run in AppVeyor/Travis by calling testFiles.bat. This file will be updated whenever a new test folder is added to the toolkit
All tests were updated to reflect these changes
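
The deprecation behaviour described in the list above (old names stay callable but print a warning to stdout) can be sketched roughly as follows. This is an illustrative Python sketch, not the toolkit's q implementation: the mapping entry, the names `min_max_scaler` and `minMaxScaler`, and the warning text are all hypothetical.

```python
import functools

# Hypothetical stand-in for the contents of functionMapping.json:
# old (deprecated) name -> new camel-case name. Not a real toolkit entry.
FUNCTION_MAPPING = {"min_max_scaler": "minMaxScaler"}


def minMaxScaler(values):
    """Hypothetical new-style function: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def deprecated_alias(old_name, new_name, new_fn, emit=print):
    """Return a wrapper that emits a warning (stdout by default), then calls new_fn."""
    @functools.wraps(new_fn)
    def wrapper(*args, **kwargs):
        emit(f"Warning: {old_name} is deprecated, use {new_name} instead")
        return new_fn(*args, **kwargs)
    return wrapper


# Build the old-name aliases from the mapping.
NEW_FUNCTIONS = {"minMaxScaler": minMaxScaler}
ALIASES = {
    old: deprecated_alias(old, new, NEW_FUNCTIONS[new])
    for old, new in FUNCTION_MAPPING.items()
}
```

Calling `ALIASES["min_max_scaler"]([0, 5, 10])` prints the warning and then returns the same result as `minMaxScaler`, so existing callers keep working for one release.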

What’s New:

Time series functionality:

Addition of time series models implemented in q
- AR, ARMA, ARIMA, SARIMA and ARCH.
Time series feature engineering techniques (windowed and lagged feature generation).
Data stationarity testing
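
As a rough illustration of what lagged and windowed feature generation means, the sketch below builds both kinds of feature from a plain list. It is written in Python for illustration only; the function names and the choice to mark missing history with `None` are assumptions, not the toolkit's q API.

```python
def lagged_features(series, lags):
    """Map each lag k to the series shifted back k steps (None where no history exists)."""
    return {k: ([None] * k + series)[:len(series)] for k in lags}


def windowed_feature(series, width, fn):
    """Apply fn to each trailing window of the given width (None until the window is full)."""
    out = []
    for i in range(len(series)):
        if i + 1 < width:
            out.append(None)
        else:
            out.append(fn(series[i + 1 - width:i + 1]))
    return out


def mean(xs):
    return sum(xs) / len(xs)


prices = [1.0, 2.0, 3.0, 4.0]
lags = lagged_features(prices, [1, 2])        # lag-1 and lag-2 columns
rolling_mean = windowed_feature(prices, 2, mean)  # 2-point trailing mean
```

Here `lags[1]` is `[None, 1.0, 2.0, 3.0]` and `rolling_mean` is `[None, 1.5, 2.5, 3.5]`; each output stays aligned with the original series so the features can be joined back as columns.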

Graph/pipeline resources:

Framework for the development of modularised kdb+ workflows and executable pipeline structures

Optimization:

Implementation of the Broyden-Fletcher-Goldfarb-Shanno algorithm for function minimization
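
For readers unfamiliar with the algorithm, the following is a minimal pure-Python sketch of BFGS with a backtracking line search. It is not the toolkit's q implementation; the function name `bfgs_minimize`, its parameters, the tolerances, and the test function are assumptions chosen for illustration.

```python
def bfgs_minimize(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimise f from x0 using BFGS updates of an inverse-Hessian estimate."""
    n = len(x0)
    x = list(x0)
    g = grad(x)
    # H approximates the inverse Hessian; start from the identity.
    H = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # Quasi-Newton search direction p = -H g.
        p = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        # Backtracking line search with the Armijo sufficient-decrease test.
        fx = f(x)
        slope = sum(gi * pi for gi, pi in zip(g, p))
        t = 1.0
        while t > 1e-12 and f([xi + t * pi for xi, pi in zip(x, p)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = [xi + t * pi for xi, pi in zip(x, p)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 0.0:  # curvature condition keeps H positive definite
            rho = 1.0 / sy
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hi for yi, hi in zip(y, Hy))
            # Expanded form of (I - rho s y^T) H (I - rho y s^T) + rho s s^T.
            H = [
                [
                    H[i][j]
                    - rho * (s[i] * Hy[j] + Hy[i] * s[j])
                    + (rho * rho * yHy + rho) * s[i] * s[j]
                    for j in range(n)
                ]
                for i in range(n)
            ]
        x, g = x_new, g_new
    return x


def quadratic(v):
    """Illustrative test function with its minimum at (1, -2)."""
    return (v[0] - 1.0) ** 2 + 2.0 * (v[1] + 2.0) ** 2


def quadratic_grad(v):
    return [2.0 * (v[0] - 1.0), 4.0 * (v[1] + 2.0)]


minimum = bfgs_minimize(quadratic, quadratic_grad, [0.0, 0.0])
```

The sketch needs only the function and its gradient; the Hessian is never computed, which is the main appeal of quasi-Newton methods over Newton's method.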

Grid Search:

Random and pseudo-random (Sobol) parameter set generation, providing an alternative to exhaustive grid search.
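
To make the contrast with exhaustive grid search concrete, the sketch below generates parameter sets three ways. It is Python for illustration only, and all function and parameter names are hypothetical. A true Sobol generator is not reproduced here; a Halton sequence (van der Corput sequences in different prime bases) is used as a simpler low-discrepancy stand-in.

```python
import itertools
import random


def grid_search_sets(grid):
    """Exhaustive grid: every combination of the listed values."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in itertools.product(*grid.values())]


def random_search_sets(bounds, n, seed=0):
    """n parameter sets drawn uniformly at random from (low, high) bounds."""
    rng = random.Random(seed)
    return [{k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()} for _ in range(n)]


def van_der_corput(i, base):
    """i-th element of the van der Corput sequence: digits of i mirrored about the radix point."""
    q, denom = 0.0, 1.0
    while i > 0:
        i, rem = divmod(i, base)
        denom *= base
        q += rem / denom
    return q


def halton_search_sets(bounds, n):
    """Low-discrepancy stand-in for Sobol: one prime base per parameter (at most 5 parameters)."""
    primes = [2, 3, 5, 7, 11]
    sets = []
    for i in range(1, n + 1):
        point = {}
        for (name, (lo, hi)), base in zip(bounds.items(), primes):
            point[name] = lo + (hi - lo) * van_der_corput(i, base)
        sets.append(point)
    return sets
```

A grid over two parameters with 2 and 3 values always yields 6 sets, whereas the random and low-discrepancy generators let the caller fix the budget `n` independently of the number of parameters.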

Clustering:

Implementation of k-means clustering now uses early stopping
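
The release note does not state which stopping criterion is used. As one common possibility, the Python sketch below stops iterating as soon as no centroid moves by more than a small tolerance, instead of always running the maximum number of iterations. The function names, the tolerance, and the sample data are assumptions for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, init_centroids, max_iter=100, tol=1e-12):
    """Lloyd's algorithm; returns (centroids, iterations actually run)."""
    centroids = [tuple(c) for c in init_centroids]
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: empty clusters keep their previous centroid.
        updated = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
        shift = max(sq_dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:  # early stop: centroids have stopped moving
            break
    return centroids, iteration


data = [(0, 0), (0, 1), (10, 10), (10, 11)]
final_centroids, iterations = kmeans(data, [(0, 0), (10, 10)])
```

On this small data set the centroids settle at `(0.0, 0.5)` and `(10.0, 10.5)` after the first pass, and the second pass detects no movement, so the loop exits after 2 of the 100 permitted iterations.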

Updates:

Clustering:

Function calls now follow a fit / predict / update style rather than the previous combined fit+predict, allowing models to be deployed for classification of incoming data.

Additive update, including clustering updates

Releases: KxSystems/ml

4.1.0

4.0.0

3.2.0

3.1.0

Addition of stats library in tgz releases

3.0.1

3.0.0

2.0.0

Initial release candidate for version 2.0.0 (update)

Initial release candidate for version 2.0.0