introduce new S3 native provider #8786

Merged
merged 6 commits into from
Aug 8, 2023
bentsku
Contributor

@bentsku bentsku commented Jul 31, 2023

First PR to implement the new S3 native provider not depending on moto. This implements all the basic abstractions, as well as the basic operations around buckets and objects.

There are a lot of comments left, as it's still WIP, and this introduces only basic behaviour. We will need to implement all the bucket operations to be able to get the complete behaviour from basic operations.

List of implemented operations:

  • CreateBucket
  • DeleteBucket
  • ListBuckets
  • HeadBucket
  • GetBucketLocation
  • PutObject
  • GetObject
  • HeadObject
  • DeleteObject
  • DeleteObjects
  • CopyObject
  • ListObjects
  • ListObjectsV2
  • GetObjectAttributes
  • RestoreObject

Abstractions

This PR implements new abstractions around S3. We are splitting between "data", i.e. the buckets, objects and the metadata around them, and the actual file objects containing the S3 objects' real content. This will allow us to properly split how we save these for persistence in the next iteration.

We're introducing a new S3 BaseStorageBackend and TemporaryStorageBackend, which abstract away how those S3 file objects, which we will name "assets", are stored and retrieved. This will in turn allow us to either have a temporary filesystem, or the real filesystem in case of persistence, and give us greater flexibility.
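
To illustrate the split described above, here is a minimal sketch of what such a storage abstraction could look like. Only the class names `BaseStorageBackend` and `TemporaryStorageBackend` come from this PR description; the method names (`write`, `open`, `remove`, `close`), their signatures, and the hashed on-disk layout are assumptions made for illustration and are not taken from the actual diff.

```python
import abc
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import BinaryIO


class BaseStorageBackend(abc.ABC):
    """Hypothetical interface: stores and retrieves object payloads ("assets")."""

    @abc.abstractmethod
    def write(self, bucket: str, key: str, data: bytes) -> None:
        ...

    @abc.abstractmethod
    def open(self, bucket: str, key: str) -> BinaryIO:
        ...

    @abc.abstractmethod
    def remove(self, bucket: str, key: str) -> None:
        ...


class TemporaryStorageBackend(BaseStorageBackend):
    """Assumed implementation that keeps assets in a throwaway directory."""

    def __init__(self) -> None:
        self._root = Path(tempfile.mkdtemp())

    def _path(self, bucket: str, key: str) -> Path:
        # Hash the key so that slashes or odd characters never reach the filesystem.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self._root / bucket / digest

    def write(self, bucket: str, key: str, data: bytes) -> None:
        path = self._path(bucket, key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def open(self, bucket: str, key: str) -> BinaryIO:
        return self._path(bucket, key).open("rb")

    def remove(self, bucket: str, key: str) -> None:
        self._path(bucket, key).unlink(missing_ok=True)

    def close(self) -> None:
        # Drop the whole temporary tree once the backend is no longer needed.
        shutil.rmtree(self._root, ignore_errors=True)
```

With an interface along these lines, a persistent backend would only need to swap the temporary root for a real directory, while the bucket and object metadata stay untouched.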

The last core operations and models outside of object and bucket configuration are the multipart ones, coming in #8787.

There are a lot of bugs in the versioning code path, because it couldn't be tested until I could enable versioning on the bucket. That is now possible, and the bugs are fixed in #8799.

@bentsku bentsku added the aws:s3 (Amazon Simple Storage Service) and semver: minor (Non-breaking changes which can be included in minor releases, but not in patch releases) labels Jul 31, 2023
@bentsku bentsku self-assigned this Jul 31, 2023
@coveralls

coveralls commented Aug 1, 2023

Coverage Status

coverage: 81.675% (-1.0%) from 82.722% when pulling 5e26d7d on s3-native-model-basic-op into 220c458 on master.

@github-actions

github-actions bot commented Aug 1, 2023

LocalStack Community integration with Pro

       2 files         2 suites   1h 33m 59s ⏱️
2 035 tests 1 650 ✔️ 385 💤 0
2 036 runs  1 650 ✔️ 386 💤 0

Results for commit c9bce20.

♻️ This comment has been updated with latest results.

@bentsku bentsku changed the title wip: introduce new S3 native provider introduce new S3 native provider Aug 1, 2023
@bentsku bentsku marked this pull request as ready for review August 1, 2023 14:19
Member

@thrau thrau left a comment

This already looks amazing @bentsku , well done! The separation into the StorageBackend is really good and the extra flexibility we gain will help a lot moving forward with issues around storage, performance, and persistence!

It's a big PR, but it already looks very good! Obviously lots of TODOs and blanks still in there, but that's fine at this stage. I'm also not going to nitpick much in the code. I don't see a lot of things that could really be improved, and I think it would be good to get a first version out ASAP.

Just some high-level thoughts we could discuss:

  • The KeyStore / StorageBackend duality is interesting to me, and I'm noticing that the API needs to keep the two in sync, which could potentially be a source of bugs when not done correctly. I don't know whether there's any value in trying to combine the two concepts. In general, the separation of Metadata and Data management makes a lot of sense to me though.
  • It could make sense to add more higher-level constructs to the storage backend, pushing more of the functionality the API currently handles into the storage provider and basically hiding file objects altogether.
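
One way to picture the concern about keeping the two in sync is a single facade that owns both the metadata index and the payloads, so callers cannot update one without the other. The sketch below is hypothetical: `ObjectStore`, `ObjectMetadata`, and their methods are illustrative names, and the payloads are held in a plain dict rather than on disk. It is not the design that ended up in this PR.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ObjectMetadata:
    """Illustrative metadata record; real S3 metadata carries far more fields."""

    key: str
    size: int


class ObjectStore:
    """Hypothetical facade that updates metadata and payload together."""

    def __init__(self) -> None:
        self._metadata: Dict[Tuple[str, str], ObjectMetadata] = {}
        self._assets: Dict[Tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, data: bytes) -> ObjectMetadata:
        meta = ObjectMetadata(key=key, size=len(data))
        # Store the payload first, then publish the metadata that points at it.
        self._assets[(bucket, key)] = data
        self._metadata[(bucket, key)] = meta
        return meta

    def get(self, bucket: str, key: str) -> Optional[Tuple[ObjectMetadata, bytes]]:
        meta = self._metadata.get((bucket, key))
        if meta is None:
            return None
        return meta, self._assets[(bucket, key)]

    def delete(self, bucket: str, key: str) -> bool:
        # Remove both halves in one place so they cannot drift apart.
        meta = self._metadata.pop((bucket, key), None)
        self._assets.pop((bucket, key), None)
        return meta is not None
```

The trade-off is that the API layer loses direct access to file objects, which is exactly the "hiding file objects" direction raised in the second point.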

@bentsku bentsku force-pushed the s3-native-model-basic-op branch 2 times, most recently from 425c5ab to 5e26d7d Compare August 4, 2023 19:41
Member

@thrau thrau left a comment

This iteration has made the code so much cleaner, thanks for bearing with me @bentsku and considering the changes!

The additional S3ObjectStore layer has made the Provider implementation much easier to understand, and isolation is much better 👌

My comments are mostly around adding more doc strings, and some minor tweaks. Some of the things can be tackled as follow-up PRs.

Really excellent work! 💯 🚀

Member

q: is there any reason they are not generated in the API yet? i suppose historically they weren't added through spec patches?

Contributor Author

Basically the idea was that if the exception needed something specific, like extra fields or a specific default status code, we would generate it. Otherwise, we would follow the usual path of subclassing CommonServiceException, like many other services. I just pulled them out of the previous provider so that I can use them in both providers, but in the future, once we only have one again, we could pull them back inside the provider file. Is that a good idea?
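
As a rough illustration of the two cases described in this reply, the sketch below uses a stand-in `CommonServiceException` defined locally so the example is self-contained. Its constructor signature is an assumption and may differ from LocalStack's real class, and both subclass names are hypothetical rather than exceptions from this PR.

```python
class CommonServiceException(Exception):
    """Stand-in for illustration only; not LocalStack's actual class."""

    def __init__(self, code: str, message: str, status_code: int = 400) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.status_code = status_code


class InvalidExampleRequest(CommonServiceException):
    """Simple case: fixed error code, default status, no extra fields."""

    def __init__(self, message: str) -> None:
        super().__init__("InvalidExampleRequest", message)


class ExampleNotFound(CommonServiceException):
    """Case with an extra field and a non-default status code.

    Under the rule described above, this kind of exception would be a
    candidate for generation from the API spec instead of hand-writing.
    """

    def __init__(self, message: str, resource_name: str) -> None:
        super().__init__("ExampleNotFound", message, status_code=404)
        self.resource_name = resource_name
```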

if isinstance(found_object, S3Object):
    self._storage_backend.remove(bucket, found_object)

# TODO: request charged
Member

?

Contributor Author

is this comment still about batching delete?

Create a temporary directory representing a bucket
:param bucket_name
"""
tmp_dir = mkdtemp()
Member

let's try and restructure the class so we can inject a root_directory path as dependency into the constructor, that is then used for all filesystem operations. this will help massively with persistence.

Contributor Author

You mean in the base class? Because this one is not really supposed to be used with persistence. But for testing, sure, it makes a lot of sense.
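
The constructor-injection idea discussed in this thread could look roughly like the sketch below. The class name `FilesystemStorage` and the method `create_bucket_directory` are hypothetical; only the `root_directory` parameter name and the use of `mkdtemp` come from the conversation. Falling back to a temporary directory when no root is given is an assumption meant to preserve the non-persistent behaviour.

```python
import tempfile
from pathlib import Path
from typing import Optional


class FilesystemStorage:
    """Hypothetical backend whose filesystem root is injected by the caller."""

    def __init__(self, root_directory: Optional[Path] = None) -> None:
        if root_directory is None:
            # No root injected: fall back to a throwaway directory.
            root_directory = Path(tempfile.mkdtemp())
        self.root_directory = Path(root_directory)
        self.root_directory.mkdir(parents=True, exist_ok=True)

    def create_bucket_directory(self, bucket_name: str) -> Path:
        # Every filesystem operation is resolved relative to the injected root.
        bucket_dir = self.root_directory / bucket_name
        bucket_dir.mkdir(exist_ok=True)
        return bucket_dir
```

A test can then pass its own temporary directory and inspect the resulting files, while a persistence-enabled setup could pass a stable data directory instead.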

@bentsku bentsku force-pushed the s3-native-model-basic-op branch from 5e26d7d to 8098b88 Compare August 7, 2023 23:43
@bentsku bentsku force-pushed the s3-native-model-basic-op branch from 8098b88 to c9bce20 Compare August 8, 2023 00:08
@bentsku bentsku merged commit f1ac9eb into master Aug 8, 2023
@bentsku bentsku deleted the s3-native-model-basic-op branch August 8, 2023 02:26
3 participants