fix(KDP): adding FeatureSelection to Text and Date features #28
Merged
github-actions bot pushed a commit that referenced this pull request on Mar 31, 2025
## 1.10.0 (2025-03-31)

* ops(KDP): fixing action wrapper version ([cae2785](cae2785))
* ops(KDP): fixing actions and node versions ([28cd93b](28cd93b))
* ops(KDP): fixing semantic release version problem ([c6b612e](c6b612e))
* ops(KDP): improving release workflows ([3f91e19](3f91e19))
* ops(KDP): increating python version allowance ([1d06b76](1d06b76))
* ops(KDP): increating python version allowance ([6070e69](6070e69))
* ops(KDP): relaxing python requirements ([fa903e4](fa903e4))
* docs(KDP): add an example_usages ([e980e12](e980e12))
* docs(KDP): add to the docs to showcase new features ([790f102](790f102))
* docs(KDP): added a diagram ([48d2e57](48d2e57))
* docs(KDP): added docs ([d8da4c5](d8da4c5))
* docs(KDP): added docs ([dd728ef](dd728ef))
* docs(KDP): adding concrete examples of usage of advance features to docs (#21) ([ea419cf](ea419cf)), closes [#21](#21)
* docs(KDP): adding new styling ([55ae7ce](55ae7ce))
* docs(KDP): adding some images and API documentation tools ([a4be43f](a4be43f))
* docs(KDP): adjusting numerical embeddings docs ([45f8872](45f8872))
* docs(KDP): fixing missing links ([076c9b3](076c9b3))
* docs(KDP): improving documentation ([02a137a](02a137a))
* docs(KDP): improving documentation ([91d4f85](91d4f85))
* docs(KDP): improving some remaining docs ([0c932b0](0c932b0))
* docs(KDP): refactoring documentation ([1ccc3f2](1ccc3f2))
* docs(KDP): removing whta we do not have ([93706fb](93706fb))
* docs(KDP): reorganising documentation for better UX ([a67c634](a67c634))
* docs(KDP): reorganising documentation for better UX ([052454e](052454e))
* docs(KDP): revamping entire docs ([d0ef7b7](d0ef7b7))
* docs(KDP): smart processing for custom pipelines ([8e7d0a7](8e7d0a7))
* docs(KDP): updating DistributionEncoder docs ([1916c40](1916c40))
* feat(KDP): add the AdvancedNumericalEmbedding feature ([8fa90e7](8fa90e7))
* feat(KDP): added option to specify the distribution manually ([d3cce76](d3cce76))
* feat(KDP): added selective retention of outputs based on dependencies among layers ([c17acd2](c17acd2))
* feat(KDP): addin DistributionAwareEncored layer and numeric preprocessing ([9bfe276](9bfe276))
* feat(KDP): adding auto config / recommender ([00e75d6](00e75d6))
* feat(KDP): adding categorical features hashing ([078df9d](078df9d))
* feat(KDP): adding DistributionAwareEncored layer for numeric features preprocessing. (#20) ([c988087](c988087)), closes [#20](#20)
* feat(KDP): adding feature selection mechanism to the preprocessor (docs, tests) (#19) ([462bfc1](462bfc1)), closes [#19](#19)
* feat(KDP): adding Gater Residual Variable / Features Selection Network capability ([f82b788](f82b788))
* feat(KDP): adding MoE feature and tests ([fac7806](fac7806))
* feat(KDP): adding numerical embedding layers (#26) ([5c3a974](5c3a974)), closes [#26](#26)
* feat(KDP): adding passthrough feature ([2965916](2965916))
* feat(kdp): adding TabularAttentionLayers and implementation ([f567585](f567585))
* feat(kdp): adding TabularAttentionLayers and implementation (#11) ([cfbd38b](cfbd38b)), closes [#11](#11)
* feat(KDP): Enhance Dynamic Preprocessing Pipeline (#24) ([bd90f11](bd90f11)), closes [#24](#24)
* feat(KDP): global embedding for numeric features option added ([83f6996](83f6996))
* feat(KDP): Integrate Advanced Numerical Embedding (#25) ([185292c](185292c)), closes [#25](#25)
* feat(KDP): smart processing for custom pipelines ([448f63f](448f63f))
* fix(KDP): add new examples for tabular attention cases and more complex Mixed Transformers and Tabul ([16340f2](16340f2))
* fix(KDP): add transdormer() method to ProcessingModel ([0c6c65c](0c6c65c))
* fix(KDP): added a missing code for the example for disttribution aware layer for custom pipelines ([9ca9fad](9ca9fad))
* fix(KDP): added a missing code for the example for disttribution aware layer for custom pipelines ([9b92475](9b92475))
* fix(KDP): added docstrings ([0eb968d](0eb968d))
* fix(KDP): added fixes for the distribution estimator and tests ([da79bb9](da79bb9))
* fix(KDP): Added get_feature_importances() method and fixed the docs. ([664023f](664023f))
* fix(KDP): added prefered_distribution parameter for NumericalFeatures ([84b2eb5](84b2eb5))
* fix(KDP): adding FeatureSelection to Text and Date features (#28) ([e1f453f](e1f453f)), closes [#28](#28)
* fix(kdp): adding pre-commit fixes ([ec98d29](ec98d29))
* fix(KDP): broke transform method into 2 separate methods and end-to-end tests ([11f258d](11f258d))
* fix(KDP): changed the order of the transormers and the tabularAttention applications ([d387826](d387826))
* fix(KDP): DistributionAwareEncoder fix and tests for custom pipelines (#23) ([ad91096](ad91096)), closes [#23](#23)
* fix(KDP): edited some of the tests to reflect the changes in processor.py ([23b36ce](23b36ce))
* fix(KDP): edited the docs ([34476c6](34476c6))
* fix(KDP): Fix linter errors: unused variable and rearranged imports ([2c2a447](2c2a447))
* fix(KDP): Fix remaining unused imports with ruff ([fe3a014](fe3a014))
* fix(KDP): fix_tabukar_att_and_transfor_order_and_add_docs (#17) ([4b1c510](4b1c510)), closes [#17](#17)
* fix(KDP): fixed all the algorithms for distribution detection all tests pass now ([52dad69](52dad69))
* fix(KDP): Fixed dimensions micmatch for the input of the Tabular Attention ([5a66ad1](5a66ad1))
* fix(KDP): fixed issues between graph and eager mode plus others ([a3fe7e1](a3fe7e1))
* fix(KDP): fixes to the tests ([bc0c543](bc0c543))
* fix(KDP): Fixing Distribution-Aware Encoder and adding comprehensive testing (#22) ([97b41c3](97b41c3)), closes [#22](#22)
* fix(kdp): fixing docs requirements and release for docs ([9d8f8b3](9d8f8b3))
* fix(KDP): fixing failiing tests ([966434d](966434d))
* fix(kdp): fixing formatting issues ([352e72b](352e72b))
* fix(KDP): fixing layers functionality ([ffe8d89](ffe8d89))
* fix(kdp): fixing tests fromatting ([955ed08](955ed08))
* fix(KDP): fixing the doc ([fdaa101](fdaa101))
* fix(KDP): improving docs UX ([f51daf8](f51daf8))
* fix(KDP): reformatting with pre-commits ([2f01e67](2f01e67))
* fix(KDP): reformatting with pre-commits ([2523f89](2523f89))
* fix(KDP): removed an unused method ([a170db8](a170db8))
* fix(KDP): Removed some buggy feature ([da24b7b](da24b7b))
* fix(KDP): Removing data.csv ([3ca0daf](3ca0daf))
* fix(KDP): small fixes ([47a6267](47a6267))
* test(KDP): add end to end and unit test for the "TabularAttention" and the "MultiResolutionTabularAt ([1cf5d09](1cf5d09))
* test(kdp): add end to end tests ([a1b3018](a1b3018))
* test(kdp): add end to end tests (#13) ([5737d9b](5737d9b)), closes [#13](#13)
* test(KDP): add more tests ([a4d536c](a4d536c))
* test(KDP): add more tests (#15) ([84bfdc6](84bfdc6)), closes [#15](#15)
* test(KDP): add tests ([5cb4e8e](5cb4e8e))
* test(KDP): add tests ([55cbbb3](55cbbb3))
* test(KDP): add tests and fix (#14) ([5308029](5308029)), closes [#14](#14)
* test(KDP): add tests for various cases and also a ValueError for missing vocab scenario ([82a99a3](82a99a3))
* test(KDP): add unit test for gates res network ([1d980e0](1d980e0))
* test(KDP): added test for advanced features ([d4fc5f3](d4fc5f3))
* test(KDP): added test for advanced features ([c18a59b](c18a59b))
* test(KDP): adding passthrough tests ([f98dc2c](f98dc2c))
* test(KDP): dummy ([4181bb3](4181bb3))
* test(KDP): dummy commit ([2883d79](2883d79))
* test(KDP): dummy commit ([1f1e35b](1f1e35b))
* test(KDP): empty commit for testing ([9b0f386](9b0f386))
* test(KDP): extending testes for preprocessor module ([2e23b3d](2e23b3d))
* refactor(KDP): impreoving auto configuration functionality and UX ([7b76a99](7b76a99))
* refactor(KDP): maintainance on preprocessor to optimize code and refactor ([7146afe](7146afe))
* refactor(KDP): removing tf-proba dependency ([83cb73d](83cb73d))
* refactor(KDP): splitting custom_layers ([f029f77](f029f77))
* refactor(KDP): splitting layers into separate files ([c188267](c188267))
* refactor(KDP): splitting more tests for layers ([84293f0](84293f0))
* feat(validation): add day of the month add assertions and error handling ([fa88c24](fa88c24))
* fix: update distribution aware encoder and tests ([62f0dba](62f0dba))
* fix(validation): added unit tests and fixed some small bugs (#12) ([3042e2a](3042e2a)), closes [#12](#12)
* Merge branch 'main' into feat_adding_grvs ([147bceb](147bceb))
* Merge branch 'main' into feat_dist_aware_embedding_numerical ([27dff94](27dff94))
* Merge branch 'main' into fix_tab_att_and_transfor ([6312d09](6312d09))
* Merge branch 'main' into tabular_attention_tests ([9491590](9491590))
* Merge branch 'piotrlaczkowski:main' into cutom_preprocess_smart ([1d7e48a](1d7e48a))
* Merge branch 'piotrlaczkowski:main' into feat_num_emb ([0bf0cc3](0bf0cc3))
* Merge branch 'piotrlaczkowski:main' into feat_num_emb ([1dd8e83](1dd8e83))
* Merge pull request #1 from piotrlaczkowski/main ([609dd5b](609dd5b)), closes [#1](#1)
* test(validation): added unit tests and fixed a little type mismatch ([d0303cb](d0303cb))
This pull request to `kdp/processor.py` includes changes to add support for new feature types and improve the feature selection process. The most important changes include adding new feature types to `FeatureSelectionPlacementOptions`, removing unnecessary preprocessing steps for numeric features, and implementing feature selection for text and date features.

Feature Type Additions:

* `kdp/processor.py`: Added `TEXT` and `DATE` to the `FeatureSelectionPlacementOptions` enum to support new feature types.

Preprocessing Improvements:

* `kdp/processor.py`: Removed the casting to `float32` before distribution-aware encoding in the `_add_pipeline_numeric` method to streamline the preprocessing steps.

Feature Selection Enhancements:

* `kdp/processor.py`: Implemented feature selection for text features in the `_add_pipeline_text` method, applying feature selection if enabled for `TEXT` or `ALL_FEATURES` options.
* `kdp/processor.py`: Implemented feature selection for date features in the `_add_pipeline_date` method, applying feature selection if enabled for `DATE` or `ALL_FEATURES` options.

Fixed the issues of #27.
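
For readers who want to see the gating rule in isolation, here is a minimal sketch of the condition described for `_add_pipeline_text` and `_add_pipeline_date`. It is not the code from `kdp/processor.py`: the enum below is a stand-in, its string values and the `NUMERIC` member are assumptions, and `selection_enabled` and `add_pipeline_text` are hypothetical helpers that model a pipeline as a plain list of step names.

```python
from enum import Enum


class FeatureSelectionPlacementOptions(str, Enum):
    """Stand-in enum; member values are assumed, not taken from KDP."""

    NUMERIC = "numeric"  # assumed pre-existing member
    TEXT = "text"  # added by this PR
    DATE = "date"  # added by this PR
    ALL_FEATURES = "all_features"


def selection_enabled(placement, feature_kind):
    """Return True when selection should run for this kind of feature.

    Selection applies when the configured placement names the feature
    kind directly or is ALL_FEATURES.
    """
    return placement in (
        feature_kind,
        FeatureSelectionPlacementOptions.ALL_FEATURES,
    )


def add_pipeline_text(steps, placement):
    """Hypothetical text pipeline builder: append a selection step if enabled."""
    steps = list(steps)  # copy so the caller's list is not mutated
    if selection_enabled(placement, FeatureSelectionPlacementOptions.TEXT):
        steps.append("feature_selection")
    return steps
```

With this rule, a placement of `DATE` leaves a text pipeline unchanged, while `TEXT` or `ALL_FEATURES` appends the selection step. The same check with `DATE` as the feature kind would cover the date pipeline.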