Conversation

singhpk234
Contributor

@singhpk234 singhpk234 commented Apr 16, 2025

About the change

Solves: #12792 (comment)
This happens in the IRC (Iceberg REST Catalog) client, which retries on 5xx responses. This is an HTTP-client-level retry: there is no reconciliation, and the same request is resent without rebasing (please ref). It is a well-known issue that a service can return 5xx even though it has applied the commit in its persistence layer (see the similar non-IRC issue with Glue here). The client still sees a 5xx and retries. When the HTTP client resends the request and the persistence layer has already applied the commit, the service returns 409. The client then gets a CommitFailed exception, and the table deletes the metadata of a commit that may actually have succeeded.

The fix proposes throwing CommitStateUnknown in this case, so that the client can reconcile the state and, at the bare minimum, does not treat the commit as failed and clean up the metadata files.
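
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `RetryAwareCommit`, `commit`, `sendRequest`, and `maxAttempts` are hypothetical names, and `CommitFailed` and `CommitStateUnknown` are local stand-ins for Iceberg's exception types so that the snippet compiles on its own. It assumes `maxAttempts` is at least 1 and that only 5xx responses are resent.

```java
import java.util.function.IntSupplier;

// Hypothetical stand-ins for Iceberg's CommitFailedException and
// CommitStateUnknownException, so the sketch compiles on its own.
class CommitFailed extends RuntimeException {
  CommitFailed(String message) {
    super(message);
  }
}

class CommitStateUnknown extends RuntimeException {
  CommitStateUnknown(String message) {
    super(message);
  }
}

final class RetryAwareCommit {
  private RetryAwareCommit() {}

  /**
   * Sends the same commit request up to maxAttempts times, resending only on 5xx,
   * and returns the final HTTP status unless that status is 409.
   */
  static int commit(IntSupplier sendRequest, int maxAttempts) {
    int attempts = 0;
    int status;
    do {
      status = sendRequest.getAsInt();
      attempts++;
    } while (status >= 500 && attempts < maxAttempts);

    boolean retried = attempts > 1;
    if (status == 409) {
      if (retried) {
        // An earlier attempt may have been persisted despite its 5xx, so this
        // conflict may be with our own commit: do not clean up metadata files.
        throw new CommitStateUnknown("409 after a retried request");
      }
      // No retry happened, so the conflict is a genuine commit failure.
      throw new CommitFailed("409 on the first attempt");
    }
    return status;
  }
}
```

With this sketch, a response sequence of 503 followed by 409 surfaces as `CommitStateUnknown`, while a 409 on the very first attempt still surfaces as `CommitFailed`.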

Testing

Added a new unit test.

@github-actions github-actions bot added the core label Apr 16, 2025
@singhpk234 singhpk234 force-pushed the fix/IOException branch 2 times, most recently from 29665a1 to 8665dc1 Compare April 16, 2025 19:08
@singhpk234 singhpk234 marked this pull request as ready for review May 24, 2025 03:54
@singhpk234 singhpk234 force-pushed the fix/IOException branch 2 times, most recently from 96c35f0 to 24dfd82 Compare May 24, 2025 17:21
@singhpk234 singhpk234 changed the title REST: Avoid table corruption on unhandled RestException REST: Avoid table corruption on self conflicts due to internal HTTP retries May 25, 2025
Member

@RussellSpitzer RussellSpitzer left a comment

Looks good. I made an offline comment about "was-retried" or "is-retry" vs. "is-retried", but that's a nit. I also think just "retry" (true/false) is OK.

One minor request: if possible, please add a test that checks that CommitStateUnknown is actually thrown.

@singhpk234
Contributor Author

Thank you for your feedback, Russell. I addressed it in the latest commit.

@RussellSpitzer
Member

@amogh-jahagirdar Do you have any other feedback?

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Overall the fix looks great, @singhpk234. I just had a minor comment on a public builder API that would be good to address before we release anything.

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Just had a nit on the comment, but the change looks good. Thank you, @singhpk234, this is an important fix!

Comment on lines 95 to 101
// If the request was retried, and if its final error was 409, It could probably also
// mean that HTTP retries when happened the IRC service could have actually applied
// the commit in their persistence, while giving back the client still a 5xx.
// If so, since the base has changed, it could conflict with itself.
// In cases like this its best not to mark this as failed instead
// make this is a commit state unknown, so some reconciliation can happen and
// the metadata clean-up is not triggered.
Contributor

@amogh-jahagirdar amogh-jahagirdar May 29, 2025

Nit on comment, maybe something like this?

/**
 * If a retried request finally fails with 409,
 * the IRC service may have persisted the commit
 * despite initial 5xx errors, resulting in a self-conflict on retry
 * due to the base changing.
 * Mark this failure as commit state unknown rather than failed to prevent file cleanup.
 */

@amogh-jahagirdar amogh-jahagirdar changed the title REST: Avoid table corruption on self conflicts due to internal HTTP retries Core: Avoid table corruption on self conflicts due to internal HTTP retries for REST clients May 29, 2025
@amogh-jahagirdar amogh-jahagirdar changed the title Core: Avoid table corruption on self conflicts due to internal HTTP retries for REST clients Core: Avoid table corruption from 409 on self conflicts after 5xx retries by throwing CommitStateUnknown May 29, 2025
@amogh-jahagirdar amogh-jahagirdar merged commit 4927610 into apache:main May 29, 2025
42 checks passed
singhpk234 added a commit to singhpk234/iceberg that referenced this pull request Jun 13, 2025
singhpk234 pushed a commit to singhpk234/iceberg that referenced this pull request Jun 19, 2025
… 5xx retries by throwing CommitStateUnknown (apache#12818)"

This reverts commit 4927610.
singhpk234 added a commit to singhpk234/iceberg that referenced this pull request Jul 3, 2025
singhpk234 added a commit to singhpk234/iceberg that referenced this pull request Jul 3, 2025
singhpk234 added a commit to singhpk234/iceberg that referenced this pull request Jul 3, 2025
RussellSpitzer pushed a commit that referenced this pull request Jul 3, 2025
dcagney pushed a commit to Affirm/iceberg that referenced this pull request Jul 7, 2025
dcagney pushed a commit to Affirm/iceberg that referenced this pull request Jul 7, 2025