Thanks to visit codestin.com
Credit goes to github.com

Skip to content

HTTP/2 + TLS spends a lot of time in recv (buffering issue) #13416

@laramiel

Description

@laramiel

I did this

As in #13403

In tensorstore we (currently) use a single multi-handle to multiplex a lot of http/2 transfers. Those transfers appear to be slower than expected.

The following benchmark, which downloads 1000 files over http/2 (limited to 32 concurrently), where each file is 64MB, demonstrates an issue with an excess of ::recvfrom calls:

At the lowest level of the stack, nw_in_read calls recv with a sequence of buffers of (typically) <5>, <1403> ... repeatedly.
There is special handling for "small" buffers in cf_socket_recv, however that is defined as < 1024 bytes, which relies on ctx->buffer_recv to be set, however I cannot see how it is ever TRUE.

Relevant stack trace & buffer lengths.

nw_in_read (curl/src/lib/cf-socket.c:863)   len = 1403
cf_socket_recv (curl/src/lib/cf-socket.c:1370)   len = 1403
Curl_cf_def_recv (curl/src/lib/cfilters.c:103)
Curl_conn_cf_recv (curl/src/lib/cfilters.c:322)
ossl_bio_cf_in_read (curl/src/lib/vtls/openssl.c:763)

BIO_read (openssl/boringssl/src/crypto/bio/bio.c:139)   len = 1403
bssl::tls_read_buffer_extend_to(ssl_st*, unsigned long) (openssl/boringssl/src/ssl/ssl_buffer.cc:157)   buf.size = 5, buf.capacity = 1408
bssl::ssl_read_buffer_extend_to(ssl_st*, unsigned long) (openssl/boringssl/src/ssl/ssl_buffer.cc:196)
bssl::ssl_handle_open_record(ssl_st*, bool*, bssl::ssl_open_record_t, unsigned long, unsigned char) (openssl/boringssl/src/ssl/ssl_buffer.cc:222)
ssl_read_impl(ssl_st*) (openssl/boringssl/src/ssl/ssl_lib.cc:1019)    <no buffer size>
::SSL_peek(SSL *, void *, int) (openssl/boringssl/src/ssl/ssl_lib.cc:1053)  bufsize = 15914
::SSL_read(SSL *, void *, int) (openssl/boringssl/src/ssl/ssl_lib.cc:1033) 

ossl_recv (curl/src/lib/vtls/openssl.c:4670)   
ssl_cf_recv (curl/src/lib/vtls/vtls.c:1725)
Curl_cf_def_recv (curl/src/lib/cfilters.c:103)
Curl_conn_cf_recv (curl/src/lib/cfilters.c:322)
nw_in_reader (curl/src/lib/http2.c:362)     buflen = 16384
chunk_slurpn (curl/src/lib/bufq.c:109)
Curl_bufq_sipn (curl/src/lib/bufq.c:594)
bufq_slurpn (curl/src/lib/bufq.c:623)
Curl_bufq_slurp (curl/src/lib/bufq.c:655)
h2_progress_ingress (curl/src/lib/http2.c:1911)
cf_h2_recv (curl/src/lib/http2.c:1969)
Curl_cf_def_recv (curl/src/lib/cfilters.c:103)
Curl_conn_recv (curl/src/lib/cfilters.c:183)
Curl_read (curl/src/lib/sendf.c:813)
Curl_xfer_recv_resp (curl/src/lib/transfer.c:450)
readwrite_data (curl/src/lib/transfer.c:504)

Repro:

git clone https://github.com/google/tensorstore.git

./bazelisk.py build -c opt --copt=-g tensorstore/internal/benchmark:kvstore_benchmark

strace -e recvfrom -f \
./bazel-bin/tensorstore/internal/benchmark/kvstore_benchmark \
   --kvstore_spec='{"driver": "gcs", "bucket":"<MY_GCS_BUCKET>", "path":"default"}' \
   --chunk_size=67108864 \
   --total_bytes=67108864000 \
   --repeat_reads=1 \
   --repeat_writes=0
   

Example strace output:

[pid 3324193] recvfrom(46, "\27\3\3\5{", 5, 0, NULL, NULL) = 5
[pid 3324193] recvfrom(46, "\313\265\v?\343\346=?\272&e\220\342V\322\207\35X6C\203\203V\250\346\346\301\2067W\204\252"..., 1403, 0, NULL, NULL) = 1403
[pid 3324193] recvfrom(46, "\27\3\3\5{", 5, 0, NULL, NULL) = 5
[pid 3324193] recvfrom(46, "O\1\210\301\307x(f\321\337\3469f\323\nAUtS\326\234\364|\244\\\275\275)K.\250H"..., 1403, 0, NULL, NULL) = 1403
[pid 3324193] recvfrom(46, "\27\3\3\5{", 5, 0, NULL, NULL) = 5
[pid 3324193] recvfrom(46, "\224\255\317Uj\274W\310J\344\32\264\243\254\224(v\262\261\214Y\253}`\374R\323\316nj\222u"..., 1403, 0, NULL, NULL) = 1403
[pid 3324193] recvfrom(46, "\27\3\3\5{", 5, 0, NULL, NULL) = 5
[pid 3324193] recvfrom(46, "\"\21\3606\3106Y5Z/\233\236\211\261\305~\v\17\272<\351\225Qx\303\177p\331\2035o,"..., 1403, 0, NULL, NULL) = 1403
[pid 3324193] recvfrom(46, "\27\3\3\5{", 5, 0, NULL, NULL) = 5
[pid 3324193] recvfrom(46, "\326\352\224bU5\337\23\342p`\0342\327\31\313?R\33rb\34M\237)U\204\302\f\215'P"..., 1403, 0, NULL, NULL) = 1403
[pid 3324193] recvfrom(46, "\27\3\3\5\10", 5, 0, NULL, NULL) = 5
[pid 3324193] recvfrom(46, "\331\214\17\352C\23K\232\34\374\275\357\220\227\16\250\360\355e\342\22$j|\35\313>\274\237N\311\210"..., 1288, 0, NULL, NULL) = 1288
[pid 3324193] recvfrom(46, "\27\3\3\5{", 5, 0, NULL, NULL) = 5
[pid 3324193] recvfrom(46, "B\312\345 \314\7\302s\2346u>\262\r\226'\317\256\3515\311\26\2O\220fr\21\264-\345["..., 1403, 0, NULL, NULL) = 1403

Here we clearly see the TLS header then the TLS body reads as separate ::recv calls.

I took the histogram of recvfrom calls; here are all the calls with more than 500 occurences in the download.
Notice that about 1/2 the calls are for 5 bytes (TLS headers), while the others vary, but a large number are for
actual 8k blocks. The average recv call size is about 4090 bytes.

count buffer_size
516 28
548 8203
552 8212
558 29
569 6980
664 8202
753 8201
988 8204
993 8206
1012 26
1140 34
1158 27
1653 8210
2476 8208
2496 1288
2750 8207
12715 1403
8183520 8218
8228072 5

I expected the following

Fewer calls to recvfrom; buffering at the lowest layer for fewer context switches.

curl/libcurl version

curl 8.7.1

operating system

5.10.0-26-cloud-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64 GNU/Linux

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions