feat(KDP): adding numerical embedding layers #26
Merged
piotrlaczkowski merged 8 commits into UnicoLab:main on Feb 21, 2025
piotrlaczkowski (Collaborator) requested changes on Feb 20, 2025:
Docstrings need to be reformatted, and there is some strange behaviour with selective features (see the image).
piotrlaczkowski approved these changes on Feb 21, 2025
github-actions bot pushed a commit that referenced this pull request on Mar 31, 2025:
## 1.10.0 (2025-03-31)

* ops(KDP): fixing action wrapper version ([cae2785](cae2785))
* ops(KDP): fixing actions and node versions ([28cd93b](28cd93b))
* ops(KDP): fixing semantic release version problem ([c6b612e](c6b612e))
* ops(KDP): improving release workflows ([3f91e19](3f91e19))
* ops(KDP): increating python version allowance ([1d06b76](1d06b76))
* ops(KDP): increating python version allowance ([6070e69](6070e69))
* ops(KDP): relaxing python requirements ([fa903e4](fa903e4))
* docs(KDP): add an example_usages ([e980e12](e980e12))
* docs(KDP): add to the docs to showcase new features ([790f102](790f102))
* docs(KDP): added a diagram ([48d2e57](48d2e57))
* docs(KDP): added docs ([d8da4c5](d8da4c5))
* docs(KDP): added docs ([dd728ef](dd728ef))
* docs(KDP): adding concrete examples of usage of advance features to docs (#21) ([ea419cf](ea419cf)), closes [#21](#21)
* docs(KDP): adding new styling ([55ae7ce](55ae7ce))
* docs(KDP): adding some images and API documentation tools ([a4be43f](a4be43f))
* docs(KDP): adjusting numerical embeddings docs ([45f8872](45f8872))
* docs(KDP): fixing missing links ([076c9b3](076c9b3))
* docs(KDP): improving documentation ([02a137a](02a137a))
* docs(KDP): improving documentation ([91d4f85](91d4f85))
* docs(KDP): improving some remaining docs ([0c932b0](0c932b0))
* docs(KDP): refactoring documentation ([1ccc3f2](1ccc3f2))
* docs(KDP): removing whta we do not have ([93706fb](93706fb))
* docs(KDP): reorganising documentation for better UX ([a67c634](a67c634))
* docs(KDP): reorganising documentation for better UX ([052454e](052454e))
* docs(KDP): revamping entire docs ([d0ef7b7](d0ef7b7))
* docs(KDP): smart processing for custom pipelines ([8e7d0a7](8e7d0a7))
* docs(KDP): updating DistributionEncoder docs ([1916c40](1916c40))
* feat(KDP): add the AdvancedNumericalEmbedding feature ([8fa90e7](8fa90e7))
* feat(KDP): added option to specify the distribution manually ([d3cce76](d3cce76))
* feat(KDP): added selective retention of outputs based on dependencies among layers ([c17acd2](c17acd2))
* feat(KDP): addin DistributionAwareEncored layer and numeric preprocessing ([9bfe276](9bfe276))
* feat(KDP): adding auto config / recommender ([00e75d6](00e75d6))
* feat(KDP): adding categorical features hashing ([078df9d](078df9d))
* feat(KDP): adding DistributionAwareEncored layer for numeric features preprocessing. (#20) ([c988087](c988087)), closes [#20](#20)
* feat(KDP): adding feature selection mechanism to the preprocessor (docs, tests) (#19) ([462bfc1](462bfc1)), closes [#19](#19)
* feat(KDP): adding Gater Residual Variable / Features Selection Network capability ([f82b788](f82b788))
* feat(KDP): adding MoE feature and tests ([fac7806](fac7806))
* feat(KDP): adding numerical embedding layers (#26) ([5c3a974](5c3a974)), closes [#26](#26)
* feat(KDP): adding passthrough feature ([2965916](2965916))
* feat(kdp): adding TabularAttentionLayers and implementation ([f567585](f567585))
* feat(kdp): adding TabularAttentionLayers and implementation (#11) ([cfbd38b](cfbd38b)), closes [#11](#11)
* feat(KDP): Enhance Dynamic Preprocessing Pipeline (#24) ([bd90f11](bd90f11)), closes [#24](#24)
* feat(KDP): global embedding for numeric features option added ([83f6996](83f6996))
* feat(KDP): Integrate Advanced Numerical Embedding (#25) ([185292c](185292c)), closes [#25](#25)
* feat(KDP): smart processing for custom pipelines ([448f63f](448f63f))
* fix(KDP): add new examples for tabular attention cases and more complex Mixed Transformers and Tabul ([16340f2](16340f2))
* fix(KDP): add transdormer() method to ProcessingModel ([0c6c65c](0c6c65c))
* fix(KDP): added a missing code for the example for disttribution aware layer for custom pipelines ([9ca9fad](9ca9fad))
* fix(KDP): added a missing code for the example for disttribution aware layer for custom pipelines ([9b92475](9b92475))
* fix(KDP): added docstrings ([0eb968d](0eb968d))
* fix(KDP): added fixes for the distribution estimator and tests ([da79bb9](da79bb9))
* fix(KDP): Added get_feature_importances() method and fixed the docs. ([664023f](664023f))
* fix(KDP): added prefered_distribution parameter for NumericalFeatures ([84b2eb5](84b2eb5))
* fix(KDP): adding FeatureSelection to Text and Date features (#28) ([e1f453f](e1f453f)), closes [#28](#28)
* fix(kdp): adding pre-commit fixes ([ec98d29](ec98d29))
* fix(KDP): broke transform method into 2 separate methods and end-to-end tests ([11f258d](11f258d))
* fix(KDP): changed the order of the transormers and the tabularAttention applications ([d387826](d387826))
* fix(KDP): DistributionAwareEncoder fix and tests for custom pipelines (#23) ([ad91096](ad91096)), closes [#23](#23)
* fix(KDP): edited some of the tests to reflect the changes in processor.py ([23b36ce](23b36ce))
* fix(KDP): edited the docs ([34476c6](34476c6))
* fix(KDP): Fix linter errors: unused variable and rearranged imports ([2c2a447](2c2a447))
* fix(KDP): Fix remaining unused imports with ruff ([fe3a014](fe3a014))
* fix(KDP): fix_tabukar_att_and_transfor_order_and_add_docs (#17) ([4b1c510](4b1c510)), closes [#17](#17)
* fix(KDP): fixed all the algorithms for distribution detection all tests pass now ([52dad69](52dad69))
* fix(KDP): Fixed dimensions micmatch for the input of the Tabular Attention ([5a66ad1](5a66ad1))
* fix(KDP): fixed issues between graph and eager mode plus others ([a3fe7e1](a3fe7e1))
* fix(KDP): fixes to the tests ([bc0c543](bc0c543))
* fix(KDP): Fixing Distribution-Aware Encoder and adding comprehensive testing (#22) ([97b41c3](97b41c3)), closes [#22](#22)
* fix(kdp): fixing docs requirements and release for docs ([9d8f8b3](9d8f8b3))
* fix(KDP): fixing failiing tests ([966434d](966434d))
* fix(kdp): fixing formatting issues ([352e72b](352e72b))
* fix(KDP): fixing layers functionality ([ffe8d89](ffe8d89))
* fix(kdp): fixing tests fromatting ([955ed08](955ed08))
* fix(KDP): fixing the doc ([fdaa101](fdaa101))
* fix(KDP): improving docs UX ([f51daf8](f51daf8))
* fix(KDP): reformatting with pre-commits ([2f01e67](2f01e67))
* fix(KDP): reformatting with pre-commits ([2523f89](2523f89))
* fix(KDP): removed an unused method ([a170db8](a170db8))
* fix(KDP): Removed some buggy feature ([da24b7b](da24b7b))
* fix(KDP): Removing data.csv ([3ca0daf](3ca0daf))
* fix(KDP): small fixes ([47a6267](47a6267))
* test(KDP): add end to end and unit test for the "TabularAttention" and the "MultiResolutionTabularAt ([1cf5d09](1cf5d09))
* test(kdp): add end to end tests ([a1b3018](a1b3018))
* test(kdp): add end to end tests (#13) ([5737d9b](5737d9b)), closes [#13](#13)
* test(KDP): add more tests ([a4d536c](a4d536c))
* test(KDP): add more tests (#15) ([84bfdc6](84bfdc6)), closes [#15](#15)
* test(KDP): add tests ([5cb4e8e](5cb4e8e))
* test(KDP): add tests ([55cbbb3](55cbbb3))
* test(KDP): add tests and fix (#14) ([5308029](5308029)), closes [#14](#14)
* test(KDP): add tests for various cases and also a ValueError for missing vocab scenario ([82a99a3](82a99a3))
* test(KDP): add unit test for gates res network ([1d980e0](1d980e0))
* test(KDP): added test for advanced features ([d4fc5f3](d4fc5f3))
* test(KDP): added test for advanced features ([c18a59b](c18a59b))
* test(KDP): adding passthrough tests ([f98dc2c](f98dc2c))
* test(KDP): dummy ([4181bb3](4181bb3))
* test(KDP): dummy commit ([2883d79](2883d79))
* test(KDP): dummy commit ([1f1e35b](1f1e35b))
* test(KDP): empty commit for testing ([9b0f386](9b0f386))
* test(KDP): extending testes for preprocessor module ([2e23b3d](2e23b3d))
* refactor(KDP): impreoving auto configuration functionality and UX ([7b76a99](7b76a99))
* refactor(KDP): maintainance on preprocessor to optimize code and refactor ([7146afe](7146afe))
* refactor(KDP): removing tf-proba dependency ([83cb73d](83cb73d))
* refactor(KDP): splitting custom_layers ([f029f77](f029f77))
* refactor(KDP): splitting layers into separate files ([c188267](c188267))
* refactor(KDP): splitting more tests for layers ([84293f0](84293f0))
* feat(validation): add day of the month add assertions and error handling ([fa88c24](fa88c24))
* fix: update distribution aware encoder and tests ([62f0dba](62f0dba))
* fix(validation): added unit tests and fixed some small bugs (#12) ([3042e2a](3042e2a)), closes [#12](#12)
* Merge branch 'main' into feat_adding_grvs ([147bceb](147bceb))
* Merge branch 'main' into feat_dist_aware_embedding_numerical ([27dff94](27dff94))
* Merge branch 'main' into fix_tab_att_and_transfor ([6312d09](6312d09))
* Merge branch 'main' into tabular_attention_tests ([9491590](9491590))
* Merge branch 'piotrlaczkowski:main' into cutom_preprocess_smart ([1d7e48a](1d7e48a))
* Merge branch 'piotrlaczkowski:main' into feat_num_emb ([0bf0cc3](0bf0cc3))
* Merge branch 'piotrlaczkowski:main' into feat_num_emb ([1dd8e83](1dd8e83))
* Merge pull request #1 from piotrlaczkowski/main ([609dd5b](609dd5b)), closes [#1](#1)
* test(validation): added unit tests and fixed a little type mismatch ([d0303cb](d0303cb))
This pull request introduces advanced numerical embedding techniques in the Keras Data Processor (KDP) to better capture complex numerical relationships in data. The key changes include the addition of new embedding layers, updates to documentation, and modifications to the preprocessing model to support these new features.
Documentation Updates:

* `docs/advanced_numerical_embeddings.md`: Added detailed documentation on `AdvancedNumericalEmbedding` and `GlobalAdvancedNumericalEmbedding`, including their purposes, key parameters, and usage examples.
* `docs/complex_example.md`: Updated the example to include the new advanced numerical embedding and global numerical embedding configurations.
* `docs/example_usages.md`: Added a new example demonstrating the usage of numerical embeddings with numerical features.

Codebase Enhancements:

* `kdp/custom_layers.py`: Added the `GlobalAdvancedNumericalEmbedding` class and updated the `AdvancedNumericalEmbedding` class to handle eager execution and feature squeezing.
* `kdp/processor.py`: Integrated global numerical embedding into the preprocessing model, including new parameters and processing steps.
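
For readers unfamiliar with the term, the general idea of a per-feature numerical embedding (each scalar feature is expanded into its own vector) can be sketched in plain NumPy. This is only an illustration and not the KDP implementation: the function name `embed_numeric_features`, the weight shapes, and the ReLU nonlinearity are assumptions chosen for the example, and the actual `AdvancedNumericalEmbedding` layer may work differently.

```python
import numpy as np


def embed_numeric_features(x, w_in, b_in, w_out):
    """Map each scalar feature to its own embedding vector (hypothetical helper).

    x:     (batch, n_features) raw numeric values
    w_in:  (n_features, hidden) per-feature input weights
    b_in:  (n_features, hidden) per-feature biases
    w_out: (n_features, hidden, embed_dim) per-feature output projections
    Returns an array of shape (batch, n_features, embed_dim).
    """
    # Broadcast every scalar against the hidden units of its own feature, then apply ReLU.
    hidden = np.maximum(0.0, x[:, :, None] * w_in[None] + b_in[None])
    # Project each feature's hidden vector with that feature's own matrix.
    return np.einsum("bfh,fhd->bfd", hidden, w_out)


# Toy dimensions, assumed for illustration only.
rng = np.random.default_rng(0)
batch, n_features, hidden_dim, embed_dim = 4, 3, 16, 8
x = rng.normal(size=(batch, n_features))
w_in = rng.normal(size=(n_features, hidden_dim))
b_in = np.zeros((n_features, hidden_dim))
w_out = rng.normal(size=(n_features, hidden_dim, embed_dim))

embeddings = embed_numeric_features(x, w_in, b_in, w_out)
print(embeddings.shape)  # (4, 3, 8)
```

Because every feature has its own weights in this sketch, changing one input column only changes that feature's embedding; a "global" variant would presumably combine the features into a single vector, but how KDP does that is not stated in this pull request.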