refactor s3 storage to use context manager to avoid race condition #10387


Merged
merged 6 commits into master from refactor-s3-storage on Mar 5, 2024

Conversation

@bentsku (Contributor) commented Mar 4, 2024

Motivation

As reported in #10003 (also in apache/arrow-rs#5283), we would sometimes encounter an error when doing a lot of concurrent accesses (reads and writes) to the same S3 object.
The bug is extremely hard to reproduce: the only way I could do it was to run the LocalStack image with the docker flag --cpus=0.5 to simulate a constrained environment, and run the rust test suite of object_store (which is really fast) in a loop until it broke (between 5% and 10% of the time, I would say...).

When it failed, we would get a cryptic message from the ASGI bridge. After reproducing the issue a few times, I could add a bunch of debug statements in rolo and find the culprit:
a GetObject request with Content-Length set to 1, on a key with the prefix RACE-, followed by a PutObject of length 2.
The first call gets the object metadata, which still indicates a Content-Length of 1, but then receives the full new data content, 10, so it fails.

Basically, a very small race condition between the state of the object and its value. I suspect the issue most probably happens between the Get and Put calls, when returning the S3EphemeralObject and passing it to the response handler: its iterator and __iter__ are not called until the end of the chain, which is only when the read lock would be acquired. In between, a Put call could still snatch the write lock and modify its value.
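To make the window concrete, here is a minimal, self-contained sketch of the old behaviour (toy class names, and a plain threading.Lock standing in for LocalStack's read/write lock): the lock is only taken when iteration starts, long after the metadata was captured, so a concurrent Put can change the value in between.

import threading


class LazilyLockedObject:
    """Toy stand-in for the old behaviour: the lock protecting the value is
    only acquired once iteration actually starts."""

    def __init__(self):
        self._lock = threading.Lock()  # stand-in for the real read/write lock
        self.value = b"1"

    def put(self, new_value: bytes):
        # a writer can grab the lock any time before iteration begins
        with self._lock:
            self.value = new_value

    def __iter__(self):
        # the "read lock" is only taken here, at the end of the handler chain
        with self._lock:
            yield self.value


obj = LazilyLockedObject()
content_length = len(obj.value)     # metadata captured by the Get handler: 1
obj.put(b"10")                      # a concurrent Put slips in before iteration
body = b"".join(obj)                # iteration now yields the new 2-byte value
assert len(body) != content_length  # mismatch -> the cryptic ASGI error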

Changes

Now, after this small refactor, as soon as we create these S3StoredObject subclasses, we acquire the lock (in read or write mode). The lock stays acquired for the life of the object, which means we can better control how we group actions together (modifying the metadata of the object can now be done inside the WriteLock).

Also, it looks much nicer now: we can use a context manager around the S3StoredObject by obtaining it with the .open() call. The caller of open is always responsible for closing it, and almost every single usage is now done inside a context manager.
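For illustration, a toy model of the new pattern (not the actual LocalStack classes; a single threading.Lock stands in for the read/write lock): the lock is taken when the stored object is created by open() and released by close(), which the context manager calls for you.

import threading


class StoredObjectSketch:
    """Toy model: the lock is acquired as soon as the stored object is
    created by open(), and only released by close()."""

    def __init__(self, lock: threading.Lock):
        self._lock = lock
        self._lock.acquire()        # held for the lifetime of the object
        self._closed = False
        self._buffer = bytearray()

    def write(self, data: bytes) -> int:
        self._buffer += data
        return len(data)

    def close(self):
        if not self._closed:
            self._closed = True
            self._lock.release()

    def __enter__(self):            # context-manager support
        return self

    def __exit__(self, *exc):
        self.close()                # the caller of open() is the one closing it


class StorageBackendSketch:
    def __init__(self):
        self._lock = threading.Lock()   # the real code uses a read/write lock

    def open(self, mode: str = "r") -> StoredObjectSketch:
        # mode is unused here: a single Lock stands in for both lock kinds
        return StoredObjectSketch(self._lock)


backend = StorageBackendSketch()
with backend.open(mode="w") as stored:
    stored.write(b"hello")  # value and metadata updates can be grouped here,
                            # while the lock is still held
# leaving the block called close() and released the lock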

I've modified the EphemeralS3StoredMultipart logic to no longer store the full EphemeralS3StoredObject (so that its lock is released), and to properly create an object only when we need a part. This looks cleaner.

Added some checks to not write to a non-writable object, to avoid creating race conditions by writing without having acquired a write lock.
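A sketch of what such a guard can look like (hypothetical names, not the actual implementation):

class NoWriteAccessError(RuntimeError):
    """Hypothetical error for writing to an object opened without a write lock."""


class GuardedStoredObject:
    def __init__(self, mode: str = "r"):
        self.mode = mode  # "r": read lock, "w": write lock in the real scheme

    def write(self, data: bytes) -> int:
        if self.mode != "w":
            # writing without holding the write lock would reintroduce the race
            raise NoWriteAccessError("object was not opened in write mode")
        return len(data)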

The only exception is GetObject: in that case, when passing an iterator to the chain, the server is responsible for calling .close() on the iterator (see #8926). This is actually the fix for the issue: we now properly acquire the read lock in the provider call, and keep that lock acquired until the response is sent.
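Roughly, the shape of that fix (a sketch with made-up names; in LocalStack the serving layer is responsible for calling close() on the response iterator once the body has been sent):

import threading
from typing import Iterator


class LockedObjectStream:
    """Iterator handed down the chain: the read lock is acquired up front in
    the provider call and only released when the server calls close()."""

    def __init__(self, chunks: Iterator[bytes], read_lock: threading.Lock):
        self._chunks = chunks
        self._read_lock = read_lock
        self._read_lock.acquire()   # taken in the provider, not in __iter__

    def __iter__(self):
        return self

    def __next__(self) -> bytes:
        return next(self._chunks)

    def close(self):
        self._read_lock.release()   # called once the response has been sent


lock = threading.Lock()
stream = LockedObjectStream(iter([b"he", b"llo"]), lock)
body = b"".join(stream)   # the server iterates the body...
stream.close()            # ...then closes it, releasing the read lock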

Also, special thanks to @tustvold, who has been very understanding and helpful with the reports and the really nice test suite.

Testing

I've been running the (sometimes) failing test for 40 minutes now and have encountered only one failure, so it is still not fully fixed, but I believe the window in which the race condition can happen has been greatly reduced. I can still pinpoint where it happens, though: with the current architecture, we have to get the object metadata before the actual object content, so in between these 2 actions the object can get updated. Let me know if anyone has an idea on how to fix this.

edit: see the comment below; I got a 3% failure rate over 200 runs, so I've updated the PR with a workaround that uses the real object modification time to do a check inside the read lock, where we can fetch a possibly updated object. This seems to have done it: I did 200 runs without failure, so this is fixed by combining the new lock system with this little trick.

for i in {1..200}; do cargo test aws::tests::s3_test --features aws -- --exact; done
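The workaround mentioned in the edit above boils down to a timestamp comparison done while the read lock is held. A sketch of the idea (the attribute names internal_last_modified and last_modified come from the diff discussed below; everything else is illustrative):

import os


def file_modification_ns(path: str) -> int:
    # one way to get the "real file modification nano time" for a file-backed
    # object; an in-memory backend would have to track this differently
    return os.stat(path).st_mtime_ns


def was_modified_since_metadata_read(s3_object, stored_object) -> bool:
    """Inside the read lock: a mismatch means the value was overwritten after
    the metadata was read, so the provider can fetch the updated object."""
    return s3_object.internal_last_modified != stored_object.last_modified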

@bentsku added the aws:s3 (Amazon Simple Storage Service) and semver: patch (Non-breaking changes which can be included in patch releases) labels on Mar 4, 2024
@bentsku requested a review from thrau on March 4, 2024 at 22:58
@bentsku self-assigned this on Mar 4, 2024
@bentsku requested a review from macnev2013 as a code owner on March 4, 2024 at 22:58

github-actions bot commented Mar 4, 2024

S3 Image Test Results (AMD64 / ARM64)

  2 files  ±0    2 suites  ±0   3m 15s ⏱️ +2s
392 tests +2  341 ✅ +2   51 💤 ±0  0 ❌ ±0 
784 runs  +4  682 ✅ +4  102 💤 ±0  0 ❌ ±0 

Results for commit 96dec55. ± Comparison against base commit 76292e1.

♻️ This comment has been updated with latest results.

@coveralls commented Mar 5, 2024

Coverage Status

coverage: 83.401% (-0.008%) from 83.409% when pulling 96dec55 on refactor-s3-storage into 76292e1 on master.


github-actions bot commented Mar 5, 2024

LocalStack Community integration with Pro

    2 files  ±0      2 suites  ±0   1h 27m 52s ⏱️ + 2m 39s
2 690 tests +2  2 435 ✅ +2  255 💤 ±0  0 ❌ ±0 
2 692 runs  +2  2 435 ✅ +2  257 💤 ±0  0 ❌ ±0 

Results for commit 96dec55. ± Comparison against base commit 76292e1.

♻️ This comment has been updated with latest results.

@bentsku (Contributor, Author) commented Mar 5, 2024

So I've done a run of 200 tests (it takes more than an hour with 0.5 CPU 😭), and it failed 6 times, around a 3% failure rate; I'm not really happy about that. I'm going to push another commit with a workaround that uses the real file modification nano time to compare with the object. I'm not fond of it and hope we can find another solution, but this would unblock the situation.

I could also cherry-pick this commit and open another PR for it; maybe that would be cleaner.
96dec55 fixes the race condition: I did 200 runs with no failure.

I'll also run benchmarks to compare the difference between master and this PR, but I would suspect the difference is minimal.

@bentsku changed the title from "refactor s3 storage to use context manager" to "refactor s3 storage to use context manager to avoid race condition" on Mar 5, 2024
@thrau (Member) left a comment

incredible investigation and great set of changes. 🥇 really love the approach to reproduce the issue by creating certain favorable conditions and then just working with percentages 👍

changes LGTM!

Comment on lines +749 to +751
# TODO: remove this with 3.3, this is for persistence reason
if not hasattr(s3_object, "internal_last_modified"):
s3_object.internal_last_modified = s3_stored_object.last_modified

Member
+1 for backwards compatibility

@bentsku merged commit 0224bbb into master on Mar 5, 2024
@bentsku deleted the refactor-s3-storage branch on March 5, 2024 at 22:13