-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Encode paneinfo with PaneInfoCoder. #34824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a51cee8
to
62b2c44
Compare
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
Assigning reviewers. If you would like to opt out of this review, comment R: @liferoad for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity, what's wrong with not having a PaneInfoCoder. I see the issue #34826, but does FastPrimitiveCoder still work?
FastPrimitiveCoder ends up falling back to PickleCoder and much bigger encodings. PaneInfoCoder encodes it using 1 byte. |
I don't get why we need to expose beam/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto Line 938 in c9a0e90
I may be missing some important piece of background in #34348, but regarding
can we do the following? return key, window.GlobalWindows.windowed_value(value, timestamp, pane_info) In this case, the coder of this output will be |
That logic specifically avoids the window param for performance reasons #4933. See the comments
|
Hmmmm, my understanding of that comment is different. I think the optimization in #4933 is that for global windows we don't need to put WindowParam in the DoFn. Anyway, just want to call out here. def reify_timestamps(element, timestamp=DoFn.TimestampParam):
... |
@@ -92,6 +93,7 @@ def register_standard_coders(self, fallback_coder): | |||
self._register_coder_internal(bytes, coders.BytesCoder) | |||
self._register_coder_internal(bool, coders.BooleanCoder) | |||
self._register_coder_internal(str, coders.StrUtf8Coder) | |||
self._register_coder_internal(windowed_value.PaneInfo, coders.PaneInfoCoder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix has a large blast radius. It changed registered standard coders which is default behavior globally. Shall we do this or just put a coder implementation in new ReShuffle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting to define PaneInfoCoder here
and then useCoderRegistry.register_coder
?
I don't have strong opinion, is there a disadvantage to doing it here? It seems like an 'internal' coder use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kennknowles What do you think about this? We should use PaneInfoCoder
but it could break the job update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a disadvantage to doing it here
This introduces a breaking change such that windowed_value.PaneInfo has changed default coder. This is not limited to ReShuffle but also user pipeline.
Since test results shows PaneInfoCoder is much more efficient than FastPrimitiveCoder for PaneInfo, I am fine with the change, just need to know this is another breaking change introduced in 2.65.0
If we keep this change I recommend to also mention this in https://github.com/apache/beam/blob/master/CHANGES.md#breaking-changes-1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should update https://github.com/apache/beam/blob/master/CHANGES.md#breaking-changes-1 at least.
CC @robertwb (the author of $4933) regarding the WindowParam comment here. |
* Encode paneinfo with PaneInfoCoder. * Fix tests. * Fix import. * Add typecoder test. * Implement eq and hash. --------- Co-authored-by: Claude <[email protected]>
* Encode paneinfo with PaneInfoCoder. * Fix tests. * Fix import. * Add typecoder test. * Implement eq and hash. --------- Co-authored-by: Claude <[email protected]>
Please add a meaningful description for your change here
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.