Contributor

Copilot AI commented Jan 23, 2026

What this PR does / why we need it:

store.push() fails with ModuleNotFoundError when On-Demand Feature Views contain UDFs that reference modules unavailable in the current environment (e.g., training code not deployed to serving infrastructure). This blocks data ingestion in production environments.
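
The failure mode can be reproduced without Feast. The sketch below is illustrative only: it uses the standard-library pickle as a stand-in for dill (the traceback in the linked issue shows dill delegating to the stock unpickler's find_class), and the module name and hand-written payload are assumptions made for the demo, not data taken from a real registry.

```python
import pickle

# A by-reference pickle payload: the protocol-0 GLOBAL opcode ("c") stores
# only a module name and an attribute name, not the function body itself.
# The module name is hypothetical and deliberately not installed.
payload = b"ctraining_module_not_installed\ntransform\n."


def try_load(body: bytes) -> str:
    """Attempt to unpickle and report the outcome as a short string."""
    try:
        pickle.loads(body)
    except ModuleNotFoundError as exc:
        # Unpickling has to import the referenced module, which is absent.
        return f"failed: {exc}"
    return "loaded"


result = try_load(payload)
print(result)
```

Because the payload only names the module, the import happens at load time, which is why merely listing feature views can fail in an environment that never runs the UDF.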

Changes

Transformation deserialization bypass:

  • PandasTransformation.from_proto() and PythonTransformation.from_proto() accept a skip_udf parameter
  • When skip_udf is enabled, they return explicit identity functions instead of deserializing with dill.loads()
  • Preserves type signatures for both DataFrame and Dict transformations
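
A minimal sketch of the bypass described above, assuming a simplified message type. UdfProto, identity_dict_udf, and udf_from_proto are hypothetical names for illustration, pickle stands in for dill, and only the Dict-shaped case is shown; this is not the actual Feast implementation.

```python
import pickle
from dataclasses import dataclass
from typing import Any, Callable, Dict

DictUdf = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class UdfProto:
    """Hypothetical stand-in for the serialized UDF message."""

    name: str
    body: bytes


def identity_dict_udf(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Placeholder that keeps the Dict -> Dict shape of a Python-mode UDF."""
    return inputs


def udf_from_proto(proto: UdfProto, skip_udf: bool = False) -> DictUdf:
    if skip_udf:
        # proto.body is never read, so a missing module cannot raise here.
        return identity_dict_udf
    return pickle.loads(proto.body)


# The body references a module that is assumed not to be installed.
proto = UdfProto(name="transform", body=b"cmodule_that_is_not_installed\ntransform\n.")
udf = udf_from_proto(proto, skip_udf=True)
print(udf({"x": [1, 2]}))
```

Returning a named function rather than a lambda keeps the placeholder easy to recognize in logs and in tests.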

Call chain propagation:

  • push(), push_async(), write_to_online_store(), and write_to_online_store_async() accept a skip_feature_view_validation parameter
  • Flows through list_all_feature_views() → registry → proto_registry_utils
  • OnDemandFeatureView.from_proto() propagates skip_udf to transformation parsing

Selective UDF skipping (Critical safeguard):

  • skip_feature_view_validation only skips UDF deserialization for ODFVs with write_to_online_store=False
  • ODFVs with write_to_online_store=True always load their actual UDFs since they will be executed during push operations
  • This prevents hiding legitimate errors where the UDF is actually needed for transformation execution
  • Logic implemented in list_on_demand_feature_views() to check write_to_online_store flag before applying skip_udf
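
The safeguard reduces to one boolean per view. The following sketch assumes a simplified OdfvProto record and a hypothetical helper plan_udf_loading; it shows the decision only, not the real listing code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OdfvProto:
    """Hypothetical, simplified view record."""

    name: str
    write_to_online_store: bool


def plan_udf_loading(protos: List[OdfvProto], skip_udf: bool) -> List[Tuple[str, bool]]:
    """Return (view name, whether UDF deserialization is skipped) per view."""
    # A view that writes to the online store runs its UDF during push,
    # so its UDF must always be loaded, whatever the caller requested.
    return [(p.name, skip_udf and not p.write_to_online_store) for p in protos]


protos = [OdfvProto("read_time_view", False), OdfvProto("write_time_view", True)]
plan = plan_udf_loading(protos, skip_udf=True)
print(plan)
```

With the flag left at its default, no view is skipped, which matches the backward-compatible behavior described in this PR.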

Caching strategy:

  • Modified list_all_feature_views() and list_on_demand_feature_views() to support conditional caching
  • When skip_udf=False (default): Uses cached versions (_list_all_feature_views_cached, _list_on_demand_feature_views_cached) with @registry_proto_cache_with_tags decorator for performance
  • When skip_udf=True: Bypasses caching to prevent cache pollution with dummy UDFs
  • Maintains backward compatibility by preserving caching behavior for default case
  • The @registry_proto_cache_with_tags decorator was moved from the public functions to new internal cached versions because the decorator's signature only supports 3 parameters (registry_proto, project, tags) and cannot accommodate the additional skip_udf parameter
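
The split between a public function and an internal cached one can be sketched as follows. The decorator here is a toy dictionary cache with the same three-argument shape, and all names (cache_three_args, _load_views, list_views) are hypothetical; the real decorator's behavior is not reproduced.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tags = Optional[Tuple[str, ...]]
_CACHE: Dict[tuple, List[str]] = {}


def cache_three_args(func: Callable[[str, str, Tags], List[str]]):
    """Toy cache whose wrapper accepts exactly three positional arguments."""

    def wrapper(registry: str, project: str, tags: Tags) -> List[str]:
        key = (func.__name__, registry, project, tags)
        if key not in _CACHE:
            _CACHE[key] = func(registry, project, tags)
        return _CACHE[key]

    return wrapper


def _load_views(registry: str, project: str, tags: Tags, skip_udf: bool) -> List[str]:
    label = "placeholder-udf" if skip_udf else "real-udf"
    return [f"{project}/view:{label}"]


@cache_three_args
def _list_views_cached(registry: str, project: str, tags: Tags) -> List[str]:
    return _load_views(registry, project, tags, skip_udf=False)


def list_views(registry: str, project: str, tags: Tags = None, skip_udf: bool = False) -> List[str]:
    if skip_udf:
        # Bypass the cache so placeholder UDFs are never stored in it.
        return _load_views(registry, project, tags, skip_udf=True)
    return _list_views_cached(registry, project, tags)


skipped = list_views("registry-v1", "demo", skip_udf=True)
cache_size_after_skip = len(_CACHE)
default = list_views("registry-v1", "demo")
cache_size_after_default = len(_CACHE)
```

An alternative design would add skip_udf to the cache key instead; the trade-off is a wider decorator signature versus two code paths.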

API Consistency:

Usage

# Before: Fails if UDF references unavailable modules
store.push("push_source", df)

# After: Skips UDF deserialization for ODFVs that won't execute transformations
store.push("push_source", df, skip_feature_view_validation=True)

Important: ODFVs with write_to_online_store=True will always have their UDFs deserialized even when skip_feature_view_validation=True, as their transformations are executed during push. Only ODFVs that don't execute transformations during push can safely skip UDF loading.

All parameters default to False for backward compatibility. No API breaking changes.

Which issue(s) this PR fixes:

Addresses ModuleNotFoundError when pushing data with ODFVs containing UDFs that reference environment-specific modules.

Misc

  • CodeQL scan: 0 alerts
  • Added unit tests for transformation skip logic and parameter propagation
  • Added test test_skip_feature_view_validation_only_applies_to_non_writing_odfvs() to validate that write_to_online_store=True ODFVs always load real UDFs
  • Follows pattern suggested by @franciscojavierarceo for consistency with apply() method
  • Parameter naming (skip_feature_view_validation) matches PR #5859, "feat: Add skip_feature_view_validation parameter to FeatureStore.apply() and plan()", for API consistency across all FeatureStore methods
  • Caching implementation reviewed and confirmed to maintain performance for default use case while preventing cache pollution when skip_feature_view_validation is enabled
  • Follow-up issue recommended for cleaning up async methods (write_to_online_store_async, push_async) as async functionality should be server-specific going forward
Original prompt

This section details the original issue you should resolve

<issue_title>Read On-Demand Feature View and deserialization while pushing data</issue_title>
<issue_description> ## Description

A ModuleNotFoundError occurs when calling store.push() to ingest data into the Online Store. The error is triggered when Feast attempts to synchronize the registry and encounters an On-Demand Feature View.

Because Feast uses dill to serialize/deserialize User Defined Functions (UDFs), it fails if the execution environment lacks the specific Python module (in this case, training) that was present when the UDF was originally defined and registered.
🔍 Error Traceback

File "/usr/local/lib/python3.12/site-packages/data_dataflow/core/io.py", line 201, in process
    self.store.push(push_source_name, df, to=PushMode.ONLINE)  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/feast/feature_store.py", line 1698, in push
    self.write_to_online_store(
  File "/usr/local/lib/python3.12/site-packages/feast/feature_store.py", line 1946, in write_to_online_store
    feature_view, df = self._get_feature_view_and_df_for_online_write(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/feast/feature_store.py", line 1904, in _get_feature_view_and_df_for_online_write
    for fv_proto in self.list_all_feature_views(allow_registry_cache)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/feast/feature_store.py", line 294, in list_all_feature_views
    return self._list_all_feature_views(allow_cache, tags=tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/feast/feature_store.py", line 269, in _list_all_feature_views
    for fv in self.registry.list_all_feature_views(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/feast/infra/registry/registry.py", line 647, in list_all_feature_views
    return proto_registry_utils.list_all_feature_views(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/feast/infra/registry/proto_registry_utils.py", line 67, in wrapper
    cache_value = func(registry_proto, project, tags)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/feast/infra/registry/proto_registry_utils.py", line 243, in list_all_feature_views
    + list_on_demand_feature_views(registry_proto, project, tags)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/feast/infra/registry/proto_registry_utils.py", line 67, in wrapper
    cache_value = func(registry_proto, project, tags)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/feast/infra/registry/proto_registry_utils.py", line 285, in list_on_demand_feature_views
    OnDemandFeatureView.from_proto(on_demand_feature_view)
  File "/usr/local/lib/python3.12/site-packages/feast/on_demand_feature_view.py", line 400, in from_proto
    transformation = PandasTransformation.from_proto(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/feast/transformation/pandas_transformation.py", line 150, in from_proto
    udf=dill.loads(user_defined_function_proto.body),
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dill/_dill.py", line 311, in loads
    return load(file, ignore, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dill/_dill.py", line 297, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dill/_dill.py", line 452, in load
    obj = StockUnpickler.load(self)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dill/_dill.py", line 442, in find_class
    return StockUnpickler.find_class(self, module, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'training'

Steps to Reproduce

- Define an OnDemandFeatureView in an environment where the transformation code lives in a local module (here, training).

- The UDF inside this view references a function, class, or constant from the training module.

- Run feast apply to save the definition to the remote registry (e.g., S3, GCS, or SQL).

- Run store.push() from a different environment that does not have the training module installed in its Python path.

Root Cause Analysis

This is a serialization dependency issue. When dill (the library Feast uses for pickling) serializes a function, it often stores references to the modules where global variables or dep...



Copilot AI changed the title from "[WIP] Fix ModuleNotFoundError in On-Demand Feature View" to "feat: Add skip_validation parameter to push() for ODFVs with missing UDF dependencies" on Jan 23, 2026
return feature_services


@registry_proto_cache_with_tags

@copilot why did you remove this?

oh i see you added it in _list_all_feature_views_cached

Contributor Author

I removed the @registry_proto_cache_with_tags decorator because it only accepts 3 parameters (registry_proto, project, tags) and doesn't support the additional skip_udf parameter.

To maintain caching while supporting skip_udf, I:

  1. Created _list_all_feature_views_cached() with the decorator (for skip_udf=False)
  2. Made list_all_feature_views() conditionally call the cached version when skip_udf=False, or bypass caching when skip_udf=True

This prevents cache pollution with dummy UDFs while preserving the caching behavior for the default case (skip_udf=False).

Would you prefer a different approach, such as modifying the decorator to support additional parameters?

@@ -1769,7 +1783,10 @@ async def push_async(
_ = await asyncio.gather(
*[
self.write_to_online_store_async(

@copilot can you create a new GitHub Issue to clean up the write_to_online_store_async method? we should just be using async in the server going forward.

Contributor Author

I don't have direct access to create GitHub issues, but here's the recommended issue content:

Title: Refactor async methods in FeatureStore for server-only usage

Description:
The write_to_online_store_async() and push_async() methods should be cleaned up as async functionality should primarily be used in the server context going forward.

Proposed changes:

  • Review usage patterns of async methods
  • Consider deprecating or restructuring async methods for server-specific usage
  • Ensure clear separation between sync (client) and async (server) code paths

Related PR: #[this PR number]
Context: Line 1785 in feature_store.py

assert result.equals(test_df)


def test_python_transformation_from_proto_with_skip_udf():

@copilot we need to validate that the skip_udf is being applied only during push() for ODFVs that do not have write_to_online_store=True because that would hide legitimate breaks.

Contributor Author

Fixed in commit 2433df6. Added logic to only skip UDF deserialization for ODFVs with write_to_online_store=False. ODFVs with write_to_online_store=True always load the actual UDF since it will be executed during push operations. Also added test test_skip_validation_only_applies_to_non_writing_odfvs() to validate this behavior.

@franciscojavierarceo
Member

@copilot maybe we should word it as skip_feature_view_validation to be consistent with #5859

Contributor Author

Copilot AI commented Jan 24, 2026

@copilot maybe we should word it as skip_feature_view_validation to be consistent with #5859

Renamed in commit 885536e. Changed all occurrences of skip_validation to skip_feature_view_validation to match the naming convention from PR #5859.

Copilot AI changed the title from "feat: Add skip_validation parameter to push() for ODFVs with missing UDF dependencies" to "feat: Add skip_feature_view_validation parameter to push() for ODFVs with missing UDF dependencies" on Jan 24, 2026
Development

Successfully merging this pull request may close these issues.

Read On-Demand Feature View and deserialization while pushing data

2 participants