feat: Add skip_feature_view_validation parameter to push() for ODFVs with missing UDF dependencies #5894
base: master
Co-authored-by: franciscojavierarceo <[email protected]>
    return feature_services


@registry_proto_cache_with_tags
@copilot why did you remove this?
Oh, I see you added it in _list_all_feature_views_cached.
I removed the @registry_proto_cache_with_tags decorator because it only accepts 3 parameters (registry_proto, project, tags) and doesn't support the additional skip_udf parameter.
To maintain caching while supporting skip_udf, I:
- Created _list_all_feature_views_cached() with the decorator (used when skip_udf=False)
- Made list_all_feature_views() call the cached version when skip_udf=False, or bypass caching when skip_udf=True
This prevents cache pollution with dummy UDFs while preserving the caching behavior for the default case (skip_udf=False).
Would you prefer a different approach, such as modifying the decorator to support additional parameters?
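The trade-off in this reply can be illustrated in isolation. This is a minimal sketch, not Feast's code: `cache_by_project_and_tags` is a hypothetical stand-in for a decorator like `@registry_proto_cache_with_tags` whose wrapped function takes exactly `(registry_proto, project, tags)`, and `load_views`, `_list_views_cached`, `list_views` and `CALLS` are hypothetical names. Only the default path goes through the cache, so results built with dummy UDFs are never stored.

```python
from functools import wraps
from typing import Callable, Dict, List, Optional, Tuple

CacheKey = Tuple[str, Tuple[Tuple[str, str], ...]]


# Hypothetical stand-in for a 3-argument cache decorator such as
# @registry_proto_cache_with_tags: the wrapped function must take exactly
# (registry_proto, project, tags), so there is no slot for a skip_udf flag.
def cache_by_project_and_tags(func: Callable) -> Callable:
    cache: Dict[CacheKey, List[str]] = {}

    @wraps(func)
    def wrapper(registry_proto: dict, project: str, tags: Optional[Dict[str, str]]):
        key = (project, tuple(sorted((tags or {}).items())))
        if key not in cache:
            cache[key] = func(registry_proto, project, tags)
        return cache[key]

    return wrapper


# Counts loader invocations so the caching behaviour is observable.
CALLS = {"loads": 0}


def load_views(registry_proto: dict, project: str, skip_udf: bool) -> List[str]:
    # Hypothetical loader: "deserializes" each view, tagging it with the UDF kind.
    CALLS["loads"] += 1
    suffix = ":dummy_udf" if skip_udf else ":real_udf"
    return [name + suffix for name in registry_proto.get(project, [])]


@cache_by_project_and_tags
def _list_views_cached(registry_proto, project, tags):
    # Only the default (real UDF) result is ever stored in the cache.
    return load_views(registry_proto, project, skip_udf=False)


def list_views(registry_proto, project, tags=None, skip_udf=False):
    if not skip_udf:
        return _list_views_cached(registry_proto, project, tags)
    # Bypass the cache so dummy-UDF results never pollute it.
    return load_views(registry_proto, project, skip_udf=True)


if __name__ == "__main__":
    registry = {"demo": ["odfv_a"]}
    list_views(registry, "demo")
    list_views(registry, "demo")  # served from the cache; loader not re-run
    print(list_views(registry, "demo", skip_udf=True))  # ['odfv_a:dummy_udf']
    print(CALLS["loads"])  # 2
```

Splitting the cached and uncached paths keeps the decorator's signature untouched; the alternative raised in the question (teaching the decorator about extra parameters) would instead widen the cache key.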
@@ -1769,7 +1783,10 @@ async def push_async(
        _ = await asyncio.gather(
            *[
                self.write_to_online_store_async(
@copilot can you create a new GitHub issue to clean up the write_to_online_store_async method? We should just be using async in the server going forward.
I don't have direct access to create GitHub issues, but here's the recommended issue content:
Title: Refactor async methods in FeatureStore for server-only usage
Description:
The write_to_online_store_async() and push_async() methods should be cleaned up as async functionality should primarily be used in the server context going forward.
Proposed changes:
- Review usage patterns of async methods
- Consider deprecating or restructuring async methods for server-specific usage
- Ensure clear separation between sync (client) and async (server) code paths
Related PR: #[this PR number]
Context: Line 1785 in feature_store.py
    assert result.equals(test_df)


def test_python_transformation_from_proto_with_skip_udf():
@copilot we need to validate that skip_udf is applied only during push(), and only for ODFVs that do not have write_to_online_store=True; skipping the UDF for those ODFVs would hide legitimate breaks.
Fixed in commit 2433df6. Added logic to only skip UDF deserialization for ODFVs with write_to_online_store=False. ODFVs with write_to_online_store=True always load the actual UDF since it will be executed during push operations. Also added test test_skip_validation_only_applies_to_non_writing_odfvs() to validate this behavior.
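The rule in this fix (skip UDF loading only for ODFVs that do not write to the online store) reduces to a per-view decision. A minimal sketch, assuming a simplified `OdfvSpec` record and hypothetical helpers `should_skip_udf` and `views_to_load_without_udf`; none of these are Feast's actual API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OdfvSpec:
    # Hypothetical, simplified view of an on-demand feature view definition.
    name: str
    write_to_online_store: bool


def should_skip_udf(odfv: OdfvSpec, skip_feature_view_validation: bool) -> bool:
    # ODFVs with write_to_online_store=True run their UDF during push(),
    # so their UDF is always deserialized, even when validation is skipped.
    return skip_feature_view_validation and not odfv.write_to_online_store


def views_to_load_without_udf(
    odfvs: List[OdfvSpec], skip_feature_view_validation: bool
) -> List[str]:
    # Names of the views whose UDF deserialization would be skipped.
    return [o.name for o in odfvs if should_skip_udf(o, skip_feature_view_validation)]


if __name__ == "__main__":
    odfvs = [OdfvSpec("read_time_only", False), OdfvSpec("computed_on_write", True)]
    print(views_to_load_without_udf(odfvs, skip_feature_view_validation=True))
    # ['read_time_only']
```

Because the flag is combined with `not write_to_online_store`, a broken UDF on a writing ODFV still fails loudly during push, which is the concern raised in the review comment.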
…tency Co-authored-by: franciscojavierarceo <[email protected]>
What this PR does / why we need it:
store.push() fails with ModuleNotFoundError when On-Demand Feature Views contain UDFs that reference modules unavailable in the current environment (e.g., training code not deployed to serving infrastructure). This blocks data ingestion in production environments.
Changes
Transformation deserialization bypass:
- PandasTransformation.from_proto() and PythonTransformation.from_proto() accept a skip_udf parameter
- dill.loads() is bypassed when it is enabled
Call chain propagation:
- push(), push_async(), write_to_online_store(), and write_to_online_store_async() accept a skip_feature_view_validation parameter
- The flag is passed through list_all_feature_views() → registry → proto_registry_utils
- OnDemandFeatureView.from_proto() propagates skip_udf to transformation parsing
Selective UDF skipping (Critical safeguard):
- skip_feature_view_validation only skips UDF deserialization for ODFVs with write_to_online_store=False
- ODFVs with write_to_online_store=True always load their actual UDFs, since those UDFs are executed during push operations
- list_on_demand_feature_views() checks the write_to_online_store flag before applying skip_udf
Caching strategy:
- list_all_feature_views() and list_on_demand_feature_views() support conditional caching
- skip_udf=False (default): uses the cached versions (_list_all_feature_views_cached, _list_on_demand_feature_views_cached) with the @registry_proto_cache_with_tags decorator for performance
- skip_udf=True: bypasses caching to prevent cache pollution with dummy UDFs
- The @registry_proto_cache_with_tags decorator was moved from the public functions to the new internal cached versions because the decorator's signature only supports 3 parameters (registry_proto, project, tags) and cannot accommodate the additional skip_udf parameter
API Consistency:
- The parameter is named skip_feature_view_validation to match the naming convention from PR #5859 (feat: Add skip_feature_view_validation parameter to FeatureStore.apply() and plan()) for the apply() and plan() methods
Usage
Important: ODFVs with write_to_online_store=True will always have their UDFs deserialized even when skip_feature_view_validation=True, as their transformations are executed during push. Only ODFVs that don't execute transformations during push can safely skip UDF loading.
All parameters default to False for backward compatibility. No API breaking changes.
Which issue(s) this PR fixes:
Addresses ModuleNotFoundError when pushing data with ODFVs containing UDFs that reference environment-specific modules.
Misc
- test_skip_feature_view_validation_only_applies_to_non_writing_odfvs() validates that ODFVs with write_to_online_store=True always load real UDFs
- … apply() method
- … (skip_feature_view_validation) for API consistency across all FeatureStore methods
- … (write_to_online_store_async, push_async) as async functionality should be server-specific going forward
Original prompt
This section details the original issue you should resolve
<issue_title>Read On-Demand Feature View and deserialization while pushing data</issue_title>
<issue_description> ## Description
A ModuleNotFoundError occurs when calling store.push() to ingest data into the Online Store. The error is triggered when Feast attempts to synchronize the registry and encounters an On-Demand Feature View.
Because Feast uses dill to serialize/deserialize User Defined Functions (UDFs), it fails if the execution environment lacks the specific Python module (in this case, training) that was present when the UDF was originally defined and registered.
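The failure mode described here can be reproduced without Feast. A minimal sketch, assuming the standard-library `pickle` as a stand-in for `dill`: a function defined in an in-memory module named `training` is serialized as a reference (module name plus function name) while that module is importable, and loading it raises `ModuleNotFoundError` once the module is gone, as in a serving environment that lacks the training code. `make_module` and `serialize_then_load_without_module` are hypothetical helpers.

```python
import pickle
import sys
import types


def make_module(name: str) -> types.ModuleType:
    # Build an in-memory module that plays the role of the "training" package.
    module = types.ModuleType(name)
    exec("def scale(x):\n    return x * 2\n", module.__dict__)
    return module


def serialize_then_load_without_module(name: str = "training") -> str:
    module = make_module(name)
    sys.modules[name] = module  # importable while the UDF is "registered"
    try:
        # pickle stores the function as a reference: "<name>.scale".
        payload = pickle.dumps(module.scale)
    finally:
        del sys.modules[name]  # simulate an environment without the module
    try:
        pickle.loads(payload)  # tries to import <name> again
    except ModuleNotFoundError as exc:
        return type(exc).__name__
    return "loaded"


if __name__ == "__main__":
    print(serialize_then_load_without_module())
```

In an environment with no installed `training` module this prints `ModuleNotFoundError`, which is the same class of error the issue reports from `store.push()`.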
🔍 Error Traceback
Steps to Reproduce
Root Cause Analysis
This is a serialization dependency issue. When dill (the library Feast uses for pickling) serializes a function, it often stores references to the modules where global variables or dep...