re-introduce S3 virtual host rewriter for v3 #9450

Merged 7 commits on Oct 24, 2023

Contributor

@bentsku bentsku commented Oct 23, 2023

Motivation

We removed the VirtualHostRewriter from the parser with the introduction of S3 v2, because it is not optimal and breaks the Request contract: it mutates the request during parsing.

However, we've had several issues with the proxying of requests (which should now be solved), and it's a problem performance-wise. I've run several benchmarks with and without this PR to show the impact of the proxying.

I agree that mutating the request is not the right thing to do, but this is temporary before we address the shortcomings we have. There are different solutions to the Virtual Hosted requests problem.

  • we could have a better proxying mechanism that would not need to restore the payload and could copy the request except its stream, by not using EnvironBuilder and rolling our own mechanism. But if we had that, we could simply copy the request and mutate the copy before parsing, avoiding a network roundtrip.
  • some kind of in-memory passing of the mutated request from the router to the handler chain, but this seems very complicated
  • having 2 parsers for S3: one built before the specs are mutated (see fix internal botocore API usage #7086 (comment)), which would be for path style requests, and one built after the specs are mutated (stripped of the bucket in the path), with some kind of mechanism to get the bucket parameter from the host

Right now, the simplest option seems to be the copy-then-mutate mechanism. In the meantime, I believe re-introducing the VirtualHostRewriter is needed because of the 2x performance drop.
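
As a rough illustration of the copy-then-mutate idea, here is a minimal sketch that turns a virtual-hosted-style request into a path-style copy without touching the original. SimpleRequest, S3_HOST_SUFFIX and to_path_style are hypothetical names made up for this example; they are not LocalStack's or Werkzeug's API, and the real rewriter may work differently.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


# Hypothetical stand-in for an HTTP request (not the Werkzeug/LocalStack Request class).
@dataclass(frozen=True)
class SimpleRequest:
    method: str
    host: str
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


# Assumed host suffix used to recognise virtual-hosted-style requests in this sketch.
S3_HOST_SUFFIX = ".s3.localhost.localstack.cloud"


def to_path_style(request: SimpleRequest) -> SimpleRequest:
    """Return a path-style copy of a virtual-hosted-style request."""
    hostname = request.host.split(":", 1)[0]
    if not hostname.endswith(S3_HOST_SUFFIX):
        return request  # already path style, nothing to rewrite
    bucket = hostname[: -len(S3_HOST_SUFFIX)]
    port = request.host[len(hostname):]  # e.g. ":4566", or "" when no port is given
    # replace() builds a new object, so the original request stays untouched
    return replace(
        request,
        host=S3_HOST_SUFFIX[1:] + port,
        path=f"/{bucket}{request.path}",
    )


original = SimpleRequest("GET", "test.s3.localhost.localstack.cloud:4566", "/key.txt")
rewritten = to_path_style(original)
```

Because the request is frozen and copied, the parser could work on the rewritten copy while the rest of the chain still sees the request as it was received.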

Benchmarks

The 12b and 100kb objects were tested with 1000 requests each, the 10mb object with 100.
This is a small Python benchmark run against LocalStack in host mode.
Note: the GET time for the large object seems unusually fast. I believe the previous benchmarks I had done with warp were maybe reading the stream back.

Throughput (req/s)
            12b PUT   12b GET   100kb PUT   100kb GET   10mb PUT   10mb GET
Proxy        109.30    112.76       93.71      106.79       5.59     101.22
Rewriter     225.89    225.72      185.18      184.16       9.76     191.35

Request duration (ms)
            12b PUT   12b GET   100kb PUT   100kb GET   10mb PUT   10mb GET
Proxy        9.1492    8.8684     10.6714      9.3643   178.9484     9.8794
Rewriter     4.4270    4.4303      5.4002      5.4300   102.4324     5.2259
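
To make the numbers easier to compare, the short script below recomputes the Rewriter/Proxy speedup for each column from the figures above. The variable names are only for illustration. It also checks that each throughput value matches 1000 divided by the mean duration in ms, which is what one would expect if the requests were sent one after another (an inference from the numbers, not something the benchmark setup states).

```python
labels = ["12b PUT", "12b GET", "100kb PUT", "100kb GET", "10mb PUT", "10mb GET"]

# Figures copied from the benchmark results above.
proxy_rps = [109.30, 112.76, 93.71, 106.79, 5.59, 101.22]
rewriter_rps = [225.89, 225.72, 185.18, 184.16, 9.76, 191.35]
proxy_ms = [9.1492, 8.8684, 10.6714, 9.3643, 178.9484, 9.8794]
rewriter_ms = [4.4270, 4.4303, 5.4002, 5.4300, 102.4324, 5.2259]

# Speedup of the rewriter over the proxy, per column.
speedup = {
    label: rewriter / proxy
    for label, rewriter, proxy in zip(labels, rewriter_rps, proxy_rps)
}

# Throughput reconstructed from the mean request duration (1000 ms per second).
derived_rps = [1000.0 / ms for ms in proxy_ms + rewriter_ms]

for label, ratio in speedup.items():
    print(f"{label}: {ratio:.2f}x")
```

The ratios land between roughly 1.7x and 2.1x, which is where the "2x" figure comes from.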

Additional notes

I also believe that mutating the request this way was an issue when the request was then sent to moto: the parsed request would not be the one moto had received. But we don't suffer from this issue anymore, because in v3 we never manipulate the request directly, only the ServiceRequest, which is a big advantage.

Notes on the Virtual Host regex

I believe we can greatly simplify the very greedy virtual host regex, because we've been using the regex from the router since v2 to "capture" virtual hosted requests and haven't had an issue since. Also, the current regex only allows the domains localhost.localstack.cloud and amazonaws.com. I've run a quick test with several entries (but not multiple times, because the entries are cached) to show the difference between the two regexes when matching 19 entries, both virtual host and not:

  • old regex: 0.0522ms / 19 entries
  • new regex: 0.0228ms / 19 entries

Changes

Re-introduce the VirtualHostRewriter in the parser for the v3 provider.

Simplified the virtual host regex and made use of named groups, so we can now extract the region from it, which might come in handy at some point in a handler. cc @viren-nadkarni
Added a test for the new regex.
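
For readers unfamiliar with named groups, here is a small sketch of how a virtual host regex can expose the bucket and region by name. The pattern below is illustrative only and is not the regex merged in this PR; the allowed domains and the parse_virtual_host helper are assumptions made for the example.

```python
import re
from typing import Dict, Optional

# Illustrative pattern only; the regex in the PR may differ.
# Assumes hosts of the form <bucket>.s3.[<region>.]<domain>[:<port>].
VHOST_PATTERN = re.compile(
    r"^(?P<bucket>.+)\.s3\.(?:(?P<region>[a-z0-9-]+)\.)?"
    r"(?P<domain>localhost\.localstack\.cloud|amazonaws\.com)"
    r"(?::(?P<port>\d+))?$"
)


def parse_virtual_host(host: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the named groups for a virtual-hosted-style host, or None."""
    match = VHOST_PATTERN.match(host)
    # groupdict() maps every group name to its value (None if the group did not match)
    return match.groupdict() if match else None


local = parse_virtual_host("test.s3.localhost.localstack.cloud:4566")
regional = parse_virtual_host("test.s3.us-east-1.amazonaws.com")
path_style = parse_virtual_host("s3.localhost.localstack.cloud:4566")
```

With named groups a handler can read `regional["region"]` directly instead of relying on positional group indexes, and a path-style host simply yields no match.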

@bentsku bentsku added aws:s3 Amazon Simple Storage Service semver: minor Non-breaking changes which can be included in minor releases, but not in patch releases labels Oct 23, 2023
@bentsku bentsku self-assigned this Oct 23, 2023
@bentsku bentsku requested a review from thrau October 23, 2023 22:26
github-actions bot commented Oct 23, 2023

LocalStack Community integration with Pro

       2 files         2 suites   1h 6m 36s ⏱️
2 268 tests 1 691 ✔️ 577 💤 0
2 269 runs  1 691 ✔️ 578 💤 0

Results for commit 04dc2b2.

♻️ This comment has been updated with latest results.

Member

@thrau thrau left a comment

General comment about the approach: I think this is a good solution for now. I don't think we need to be too dogmatic about principles like immutability when it comes to S3, we're just making our life difficult, and if it works and doesn't break anything in any way, why not.

@bentsku bentsku force-pushed the s3-virtual-host-v3 branch from 8fd49d0 to 119c716 Compare October 24, 2023 12:02
coveralls commented Oct 24, 2023

Coverage Status

coverage: 82.881% (+0.06%) from 82.824% when pulling 04dc2b2 on s3-virtual-host-v3 into e78814f on master.

@bentsku bentsku marked this pull request as ready for review October 24, 2023 13:28
Member

@alexrashed alexrashed left a comment

Restoring the virtual host rewriting, instead of using the proxy infrastructure, seems fine considering the immense performance impact.
In the future, we might be able to move away from it again (in case we can do an in-memory proxy forwarding without the HTTP overhead), but for now this is a nice solution.

I am just not sure if this wouldn't break the legacy provider (even though it's going to be removed soon), since it's also using the virtual host rewriter, and the regex seems to be simplified quite a lot (to a degree where it catches way less, for example it seems to neglect protocols in host headers)?

Contributor Author

bentsku commented Oct 24, 2023

I've updated the code so that the functionality for the legacy provider should be untouched, which allows us to release this as non-breaking before v3.

I've run a quick test to see if the legacy provider was still working well:

# started LocalStack with `PROVIDER_OVERRIDE_S3=legacy`
$ awslocal s3 mb s3://test
make_bucket: test
$ curl http://test.s3.localhost.localstack.cloud:4566
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>test</Name><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Marker></Marker></ListBucketResult>

I also sneaked in a small fix to the legacy provider: I couldn't run a test because it didn't accept s3.localhost.localstack.cloud:4566 as an endpoint when not using virtual host style. I believe the legacy provider might already be broken in some way because of the werkzeug upgrades, but we didn't get any report about that.

@bentsku bentsku requested a review from alexrashed October 24, 2023 15:00
Member

@alexrashed alexrashed left a comment

Awesome! Thanks for addressing the comments! The changes are now looking good, and the legacy implementation is clearly separated from the new implementation (which will make it super easy to remove with 3.0). 🚀

@bentsku bentsku merged commit c24bf15 into master Oct 24, 2023
@bentsku bentsku deleted the s3-virtual-host-v3 branch October 24, 2023 18:30