shunping
Collaborator

It is causing some internal test failures, so we are reverting it for now.

- Fix custom coder not being used in Reshuffle (global window) (apache#33339)
- Fix custom coders not being used in Reshuffle (non global window) (apache#33363)
- Add missing to_type_hint to WindowedValueCoder (apache#33403)
@chamikaramj
Contributor

LGTM. Thanks.

Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment "assign set of reviewers".

@shunping
Collaborator Author

Failed tests are unrelated to the changes.

@chamikaramj chamikaramj merged commit e9424b9 into apache:master Dec 18, 2024
85 of 91 checks passed
@robertwb
Contributor

Just a thought, as this changes coders in some cases, should this be guarded by the update compatibility flag? https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L592

@shunping
Collaborator Author

shunping commented Dec 19, 2024

Since the flag is defined in "StreamingOptions", was it originally designed for the streaming case?

@kennknowles
Member

> Since the flag is defined in "StreamingOptions", was it originally designed for the streaming case?

Yes, it is designed for "streaming update", where you may have an in-progress aggregation in a shuffle when you update the pipeline. In that case the state needs to stay compatible across the update.

@robertwb
Contributor

And "state" here includes the in-flight encoded elements that were written by the pre-update version of the pipeline and will be read by the post-update code.

This is irrelevant for batch pipelines, but may become relevant if a runner supports some kind of resume (from a pause or failure) where the code might be updated.
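To make the point about in-flight encoded elements concrete, here is a minimal, self-contained sketch. The two coders are hypothetical stand-ins written as plain Python functions (not Beam's `Coder` API): an element written to a shuffle by the pre-update coder is later read back by the post-update coder, which cannot decode it.

```python
import json
import struct


def old_encode(element):
    # Hypothetical pre-update coder: JSON-encodes a (key, value) pair.
    return json.dumps(element).encode("utf-8")


def new_encode(element):
    # Hypothetical post-update coder: 4-byte key length, UTF-8 key, 8-byte signed int.
    key, value = element
    key_bytes = key.encode("utf-8")
    return struct.pack(">I", len(key_bytes)) + key_bytes + struct.pack(">q", value)


def new_decode(data):
    # Inverse of new_encode; assumes the bytes were produced by new_encode.
    (key_len,) = struct.unpack_from(">I", data, 0)
    key = data[4:4 + key_len].decode("utf-8")
    (value,) = struct.unpack_from(">q", data, 4 + key_len)
    return key, value


# An element that was sitting in the shuffle when the pipeline was updated.
in_flight = old_encode(("user-1", 42))

try:
    new_decode(in_flight)
except struct.error as exc:
    # Here the mismatch happens to raise; in general a mismatched coder
    # could also silently decode garbage.
    print("cannot decode pre-update element:", exc)
```

Batch pipelines normally never mix the two encodings within one run, which is why the concern is specific to streaming update (or a hypothetical resume with updated code).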

@shunping
Collaborator Author

I see. Thank you both for the clarification!

Regarding the potentially breaking changes that re-applying the reverted PRs could introduce, shall we add a new pipeline option rather than overloading this existing flag?

Something like "use_legacy_reshuffle" would let users switch back to the previous Reshuffle code path, where FastPrimitivesCoder is used internally regardless of the coders or type hints specified by users.

@robertwb
Contributor

I don't think we want to introduce a new flag. The point of the update_compatibility_version is that we don't have to make a new option (which we have to handle and our users have to know about) for every update-incompatible change; all you need to know is which version you used to originally launch your pipeline.
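A minimal sketch of how a single compatibility version can gate a behavior change such as the Reshuffle coder fix. The helper names (`parse_version`, `compat_version_prior_to`, `choose_reshuffle_coder`) and the release number in `RESHUFFLE_CODER_FIX_VERSION` are hypothetical, and coders are represented by name strings purely for illustration; this is not Beam's actual implementation.

```python
from typing import Optional, Tuple

# Hypothetical release in which Reshuffle started honoring custom coders.
# Illustrative value only, not the real Beam release number.
RESHUFFLE_CODER_FIX_VERSION = "2.64.0"


def parse_version(version: str) -> Tuple[int, ...]:
    # Turn "2.60.0" into (2, 60, 0) so versions compare numerically.
    # Only plain dotted integers are handled (no ".dev" or "rc" suffixes).
    return tuple(int(part) for part in version.split("."))


def compat_version_prior_to(update_compatibility_version: Optional[str],
                            breaking_change_version: str) -> bool:
    # None means the flag was not set, so the newest behavior applies.
    if update_compatibility_version is None:
        return False
    return (parse_version(update_compatibility_version)
            < parse_version(breaking_change_version))


def choose_reshuffle_coder(declared_coder: str,
                           update_compatibility_version: Optional[str]) -> str:
    if compat_version_prior_to(update_compatibility_version,
                               RESHUFFLE_CODER_FIX_VERSION):
        # Legacy path: ignore declared coders/type hints.
        return "FastPrimitivesCoder"
    # New path: respect the user's declared coder.
    return declared_coder


print(choose_reshuffle_coder("MyCustomCoder", None))      # new behavior
print(choose_reshuffle_coder("MyCustomCoder", "2.60.0"))  # legacy behavior
```

Each later update-incompatible change only needs its own breaking-change version compared against the same flag, so users keep supplying a single value: the Beam version they originally launched with.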

@shunping
Collaborator Author

shunping commented Dec 21, 2024

> I don't think we want to introduce a new flag. The point of the update_compatibility_version is that we don't have to make a new option (which we have to handle and our users have to know about) for every update-incompatible change; all you need to know is which version you used to originally launch your pipeline.

I am fine with using a flag like that to avoid adding more options; I don't like having too many options to remember either. However, I do find both its name and where it is defined a bit confusing.

In this context we are overloading "update" to cover both streaming and batch. For batch, users may only want to "create" a pipeline from existing code that works as before. From their perspective there is no "update" to the pipeline, only an update of the Beam version. :)

@robertwb
Contributor

For a batch pipeline, setting this flag is a workaround, and they should fix their type hints. (We should make that clear in the docs.)
