[http1] streaming request bodies #2007
This commit introduces the infrastructure needed to stream request bodies for http1: the two readers (chunked and content-length) use h2o_req_t::write_req.cb to emit body chunks.
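The callback pattern described above can be illustrated with a minimal sketch. It assumes a simplified stand-in for h2o_iovec_t. The write_req_cb typedef mirrors the h2o_write_req_cb prototype quoted later in this PR. collector_t, collect_chunk and emit_body_chunk are hypothetical names used only for illustration, not h2o APIs. Each reader hands decoded body bytes to the callback as they arrive instead of buffering the whole entity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for h2o_iovec_t. */
typedef struct {
    const char *base;
    size_t len;
} iovec_t;

/* Same shape as the h2o_write_req_cb typedef discussed in this PR. */
typedef int (*write_req_cb)(void *ctx, iovec_t chunk, int is_end_stream);

/* Hypothetical consumer (e.g. a proxy forwarding the body upstream) that
 * simply collects the streamed chunks into a fixed-size buffer. */
typedef struct {
    char buf[64];
    size_t len;
    int saw_end;
} collector_t;

static int collect_chunk(void *ctx, iovec_t chunk, int is_end_stream)
{
    collector_t *c = ctx;
    if (c->saw_end || c->len + chunk.len > sizeof(c->buf))
        return -1; /* data after end-of-stream, or buffer full */
    memcpy(c->buf + c->len, chunk.base, chunk.len);
    c->len += chunk.len;
    c->saw_end = is_end_stream;
    return 0;
}

/* What either reader (chunked or content-length) would do once it has decoded
 * a piece of the body: pass it on instead of accumulating the whole entity. */
static int emit_body_chunk(write_req_cb cb, void *ctx, const char *bytes, size_t len, int is_end_stream)
{
    assert(bytes != NULL);
    iovec_t chunk = {bytes, len};
    return cb(ctx, chunk, is_end_stream);
}
```

Because the consumer sees is_end_stream on the last chunk, it can detect the end of the body without knowing whether the client used chunked encoding or a content-length header.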
…reaming-request-bodies
kazuho left a comment
Thank you for the PR. I haven't gone through the code yet, but I noticed the following points that you might want to look into.
Regarding the test failure, I am not sure what's happening. IIUC the test that fails now does not involve a request body, and therefore should continue to work fine, assuming that the behavior of the http1 handler has not changed for requests without bodies. Am I missing something here?
Thank you @i110 and @kazuho, I didn't realize the failure was coming from the non-streaming case. Addressed. I've also addressed the issue pointed out by @kazuho above. I believe this can be cleaned up by merging some of the streaming infrastructure used by http1 and http2; I'll take a stab at that.
code under lib/core/request.c
kazuho left a comment
Thank you for the PR.
The high-level design looks fine, and I like how you've refactored the code so that some of the logic is shared between the http1 and http2 stacks. I think we might want to make further tweaks (including @i110's point that the selection between streaming and non-streaming mode does not need to be delayed), but I am fine with doing that in a separate PR if that's preferable for you.
Regarding the changes to the HTTP/1 stack, I think I have found some corner cases (see below). They make me wonder what we should do about the lack of tests covering the error cases and pipelining. Would you be interested in writing them, or would you prefer somebody more familiar with Perl to work on them (thinking of @i110 or myself)?
lib/http1.c
```c
/* all input has arrived */
conn->req.entity = h2o_iovec_init(conn->sock->input->bytes + conn->_reqsize - reader->content_length, reader->content_length);
on_entity_read_complete(conn);
handle_one_body_fragment(conn, conn->sock->input->size, complete);
```
Doesn't this process an excessive amount of input as the request body if the last input consists of the end of the request body and the first few bytes of the next request (i.e. pipelining)?
Please correct me if I'm incorrect, and if we have a test covering such case.
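The bound being asked for can be sketched as follows, under the assumption that the content-length reader tracks how many body bytes are still outstanding. The helper names are hypothetical and not taken from lib/http1.c. The point is that only min(remaining, buffered) bytes may be treated as body; anything after that belongs to the next pipelined request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many of the bytes currently in the socket read
 * buffer belong to the request body being read. remaining_body is the
 * content-length minus what has already been forwarded. */
static size_t body_bytes_in_buffer(size_t remaining_body, size_t buffered)
{
    return buffered < remaining_body ? buffered : remaining_body;
}

/* Splits the buffer into the body fragment (length returned via *body_len)
 * and the remainder, which is the start of the next pipelined request and
 * must be left for the request parser instead of being forwarded as body. */
static const char *split_pipelined(const char *buf, size_t buffered, size_t remaining_body, size_t *body_len)
{
    assert(buf != NULL);
    *body_len = body_bytes_in_buffer(remaining_body, buffered);
    return buf + *body_len;
}
```

For example, if three body bytes are outstanding and the buffer holds "abcGET / HTTP/1.1...", only "abc" is forwarded and the parser resumes at "GET".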
That's right, I've added a test in fb8e95b that shows the previous code was broken. This is fixed now.
lib/http1.c
```c
    return;
}

if (conn->req.proceed_req) {
```
I'd appreciate it if you could add != NULL. We apply comparison operators so that the result is a boolean, unless the input is an integral value that is itself meant to be a boolean (due to the lack of a built-in boolean type in C).
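The convention can be shown with a small sketch. fake_req_t and both functions are hypothetical, trimmed-down stand-ins rather than h2o's h2o_req_t. A pointer is compared explicitly against NULL, while an int that already carries a boolean meaning is used as-is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down request: a pointer member and an integer flag. */
typedef struct {
    void (*proceed_req)(void *req); /* non-NULL while a body is being streamed */
    int http1_is_persistent;        /* integral value that already is a boolean */
} fake_req_t;

static int is_streaming_body(const fake_req_t *req)
{
    assert(req != NULL);
    /* Pointer: compare explicitly so the expression yields a boolean. */
    return req->proceed_req != NULL;
}

static int keeps_connection_open(const fake_req_t *req)
{
    assert(req != NULL);
    /* Boolean-valued int: used directly, no comparison needed. */
    return req->http1_is_persistent;
}
```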
Addressed in c3bcd0d, thank you.
lib/http1.c
```c
if (conn->req.proceed_req) {
    conn->_req_entity_reader = NULL;
    set_timeout(conn, 0, NULL);
    h2o_socket_read_stop(conn->sock);
```
I'm not sure this is the right moment. Don't we need to disarm the timeout and stop reading additional bytes every time we pass something to the handler? Otherwise, the amount of data we buffer cannot be capped.
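The flow control suggested here can be modeled with a hypothetical sketch. flow_t and its functions are invented for illustration. In lib/http1.c the corresponding steps would be the h2o_socket_read_stop and set_timeout calls quoted above, with reading resumed from the proceed_req path. Stopping on every hand-off caps the buffered body data at one fragment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of per-fragment backpressure on an HTTP/1 connection. */
typedef struct {
    int reading;       /* is the socket being read? */
    int timeout_armed; /* is the request-body timeout armed? */
    size_t inflight;   /* bytes passed to the handler, not yet acknowledged */
} flow_t;

/* Every time a body fragment goes to the handler, stop reading and disarm the
 * timeout, so no further input accumulates while the handler is busy. */
static void pass_fragment_to_handler(flow_t *f, size_t len)
{
    assert(f->reading && f->inflight == 0);
    f->reading = 0;       /* stands in for h2o_socket_read_stop() */
    f->timeout_armed = 0; /* stands in for disarming the timeout */
    f->inflight = len;
}

/* Called when the handler reports (via proceed_req) that it consumed the data. */
static void on_proceed_req(flow_t *f, size_t written)
{
    assert(written == f->inflight);
    f->inflight = 0;
    f->reading = 1;       /* resume reading from the socket */
    f->timeout_armed = 1; /* re-arm the timeout while waiting for input */
}
```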
Addressed in e4d894b
e4d894b looks good to me, but would you mind elaborating on why we need to stop reading (or handle the case of conn->req.proceed_req being non-NULL) in this function, after consulting the value of conn->req.http1_is_persistent?
It is my understanding that cleanup_connection is called only when the HTTP/1 stack has successfully processed a request and sent a response. Assuming that is still the case, I think proceed_req and _req_entity_reader must already have been set to NULL, and reading from the socket should already have been stopped. Maybe what we need are assertions.
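The suggested assertions might look roughly like this. conn_state_t and cleanup_connection_sketch are hypothetical stand-ins for the connection fields named in the review, not the actual lib/http1.c code. The invariant is that no request-body streaming state may survive into cleanup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the HTTP/1 connection state named in the review. */
typedef struct {
    void *proceed_req;       /* stands in for conn->req.proceed_req */
    void *req_entity_reader; /* stands in for conn->_req_entity_reader */
    int socket_reading;      /* whether the socket is still being read */
    int http1_is_persistent; /* stands in for conn->req.http1_is_persistent */
} conn_state_t;

/* By the time a response has been sent and the connection is cleaned up, the
 * request body must have been fully consumed. Instead of handling leftover
 * streaming state, assert that there is none. Returns nonzero if the
 * connection may be reused for the next request. */
static int cleanup_connection_sketch(const conn_state_t *c)
{
    assert(c->proceed_req == NULL);
    assert(c->req_entity_reader == NULL);
    assert(!c->socket_reading);
    return c->http1_is_persistent;
}
```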
I've added assertions, thanks for the suggestion.
And the assertions revealed that 603c731 is necessary, because h2o_send_error can complete the request without closing the connection.
…-encoding:chunked. While the behavior is technically correct, dropping the content-length header loses information that might be valuable to other peers down the path. We fix this by making sure that `req.content_length` is correctly assigned for HTTP/1 when available; it was already correctly initialized in the HTTP/2 case.
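Why a correctly assigned content length matters to a proxy can be sketched with a hypothetical helper. It assumes that an unknown request body length is represented by SIZE_MAX, which is believed to match h2o's convention for req.content_length but should be checked against include/h2o.h. When the length is known, the upstream request can keep content-length; only when it is unknown does it need to fall back to chunked encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: choose the framing header for the upstream request.
 * Assumption: SIZE_MAX means "request body length not known". Returns the
 * number of characters snprintf would write, as snprintf does. */
static int format_framing_header(char *buf, size_t bufsz, size_t content_length)
{
    assert(buf != NULL && bufsz > 0);
    if (content_length == SIZE_MAX)
        return snprintf(buf, bufsz, "transfer-encoding: chunked\r\n");
    return snprintf(buf, bufsz, "content-length: %zu\r\n", content_length);
}
```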
connection to the origin is established, effectively disabling streaming
kazuho left a comment
Thank you for the changes. Looks mostly fine to me; there are some nitpicks below. PTAL.
include/h2o.h
```c
typedef void (*h2o_proceed_req_cb)(h2o_req_t *req, size_t written, int is_end_stream);
typedef int (*h2o_write_req_cb)(void *ctx, h2o_iovec_t chunk, int is_end_stream);
typedef void (*h2o_on_body_streaming_selected_cb)(h2o_req_t *, int streaming);
```
I might suggest changing the typename and the prototype argument to something like:
```c
typedef void (*h2o_on_request_streaming_selected_cb)(h2o_req_t *req, int is_streaming);
```
- By changing "body_streaming" to "request_streaming", we avoid the confusion that this is related to response streaming. We can omit "body", because body is the only thing we stream.
- Adding "is_" prefix better aligns the callback type with other functions that also accept boolean arguments (see right above).
Note also that support for the feature is indicated by a property named supports_request_streaming in h2o_handler_t.
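Here is how the suggested prototype reads in use. This is a minimal sketch: the h2o_req_t below is a hypothetical stand-in with a single invented field so that the example compiles on its own. The real struct is defined in include/h2o.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; the real h2o_req_t is much larger. */
typedef struct st_h2o_req_t h2o_req_t;
struct st_h2o_req_t {
    int streaming_selected; /* invented field: records what the callback was told */
};

/* Prototype as suggested above: "request" instead of "body", and an
 * is_-prefixed boolean argument. */
typedef void (*h2o_on_request_streaming_selected_cb)(h2o_req_t *req, int is_streaming);

static void on_request_streaming_selected(h2o_req_t *req, int is_streaming)
{
    assert(req != NULL);
    req->streaming_selected = is_streaming;
}
```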
include/h2o.h
```c
struct {
    h2o_write_req_cb cb;
    void *ctx;
    h2o_on_body_streaming_selected_cb on_body_streaming_selected;
```
The name of the attribute can be write_req.on_streaming_selected instead of write_req.on_body_streaming_selected if we follow the rules stated above.
include/h2o.h
```c
struct {
    size_t bytes_received;
    h2o_buffer_t *body;
} _body;
```
s/_body/_req_body/
lib/http1.c
```c
    entity_read_send_error_502(conn, "Bad Gateway", "Bad Gateway");
    return;
}
h2o_buffer_consume(&conn->sock->input, consume);
```
I think that we can omit the third argument of the function and do h2o_buffer_consume(&conn->sock->input, fragment_size) here.
The only case where fragment_size and consume are different in the current code is when chunked encoding is used. I'd argue that for such a case, the caller can invoke h2o_buffer_consume at first to consume the chunk header, then invoke handle_one_body_fragment to just process the payload of the chunk.
h2o_buffer_consume is a fast function, and processing of a chunked request is not a hot path (because only clients that know that the server supports HTTP/1.1 would use it).
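The proposed split of responsibilities can be sketched as follows. rbuf_t, rbuf_consume and handle_one_body_fragment_sketch are hypothetical stand-ins for h2o_buffer_t, h2o_buffer_consume and the PR's handle_one_body_fragment; the signatures are illustrative, not the ones in lib/http1.c. The caller consumes the chunk header first, so the fragment handler only ever sees payload and can consume exactly fragment_size bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical read buffer standing in for h2o_buffer_t. */
typedef struct {
    const char *bytes;
    size_t size;
} rbuf_t;

/* Stand-in for h2o_buffer_consume(): drop n bytes from the front. */
static void rbuf_consume(rbuf_t *b, size_t n)
{
    assert(n <= b->size);
    b->bytes += n;
    b->size -= n;
}

/* Hypothetical fragment handler: it only sees payload bytes, so no separate
 * "consume" count is needed. The payload is appended to out (sized by the
 * caller) so that the result can be observed. */
static void handle_one_body_fragment_sketch(rbuf_t *in, size_t fragment_size, char *out, size_t *out_len)
{
    assert(fragment_size <= in->size);
    memcpy(out + *out_len, in->bytes, fragment_size);
    *out_len += fragment_size;
    rbuf_consume(in, fragment_size);
}
```

For a chunked body such as "5\r\nhello\r\n...", the chunked reader would first consume the 3-byte header "5\r\n", then pass the 5-byte payload to the fragment handler.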
lib/http1.c
```c
    set_timeout(conn, 0, NULL);
    h2o_socket_read_stop(conn->sock);
    process_request(conn);
}
```
I think we should inline-expand the body of this function into handle_one_body_fragment (that's the only call site), preferably before we call write_req.
lib/http1.c
```c
    return 0;
}

static void on_body_streaming_selected(h2o_req_t *req, int streaming)
```
s/streaming/is_streaming/
lib/http2/connection.c
```c
}

static int write_req_first(void *_req, h2o_iovec_t payload, int is_end_stream)
static void on_body_streaming_selected(h2o_req_t *req, int streaming)
```
s/on_body_streaming_selected/on_request_streaming_selected/
s/int streaming/int is_streaming/
lib/http1.c
```c
uint64_t _req_index;
size_t _prevreqlen;
size_t _reqsize;
size_t _headers_size;
```
I would appreciate it if you could either change the name of the variable to something more appropriate, or add a comment explaining how the value is used.
This is because, with the proposed change, the value does not always represent the size of the header fields (when the request has a body, it becomes zero). It is my understanding that it now represents the amount of data left unconsumed in the socket read buffer.
I've gone with _unconsumed_request_size
- s/_body/_req_body/
- s/on_body_streaming_selected/on_streaming_selected/
- s/h2o_on_body_streaming_selected_cb/h2o_on_request_streaming_selected_cb/
…reaming_selected/
going through `cleanup_connection` and `http1_is_persistent` is false
IIUC, the proposed change is to close the connection from proxy.c whenever it is impossible to connect to the origin. I am not sure that is the correct thing to do. At the moment, we close an H1 connection only when the framing becomes corrupt (i.e. when there would be a risk of a splitting attack unless we close the connection). In the case of a 502, the framing is not necessarily corrupt. Keeping the connection open is beneficial, especially when only some of the requests are routed to an origin while other requests are served by H2O itself.

I think we might want to consider the case you are trying to fix as part of #2010. The proxy handler returning 502 because it cannot connect to an origin while the request from the client is still in flight is a particular form of returning an early response. I am not sure how we should handle the error, but my instinct is that it is not unique to the proxy handler, but rather something general to all handlers.
@deweerdt Regarding my previous comment, I asked @i110 how he deals with the issue in H2, and his answer was that in #2010 the stream is closed by the code at line 657 in 24d2036.
I think this approach might be something we should adopt in H1 as well (or we could switch to a different approach in both the H1 and H2 code). WDYT?
That's a good question. Yeah, let's tackle the issues independently. I think we can land the code here (which uses H2O_SEND_ERROR_HTTP1_CLOSE_CONNECTION) as-is, and fix it later in #2010 or in a follow-up PR to #2010. Using H2O_SEND_ERROR_HTTP1_CLOSE_CONNECTION has efficiency issues, but works perfectly as a short-term solution.
That works. The code is more or less ready anyway, since I've tested an earlier version of this PR together with #2010 to help review #2010.

Thank you for working on this complex PR. Merged to master.