Description
I did this
As in #13403
In tensorstore we (currently) use a single multi-handle to multiplex many HTTP/2 transfers. Those transfers appear to be slower than expected.
The benchmark in the repro below downloads 1000 files over HTTP/2 (at most 32 concurrently), each 64 MiB (67108864 bytes), and shows an excessive number of ::recvfrom calls.
At the lowest level of the stack, nw_in_read calls recv with buffer lengths that typically alternate between 5 and 1403 bytes, over and over.
cf_socket_recv has special handling for "small" reads, defined as less than 1024 bytes, but it only takes effect when ctx->buffer_recv is set, and I cannot see how that flag is ever TRUE.
Relevant stack trace with buffer lengths, innermost frame first:
nw_in_read (curl/src/lib/cf-socket.c:863) len = 1403
cf_socket_recv (curl/src/lib/cf-socket.c:1370) len = 1403
Curl_cf_def_recv (curl/src/lib/cfilters.c:103)
Curl_conn_cf_recv (curl/src/lib/cfilters.c:322)
ossl_bio_cf_in_read (curl/src/lib/vtls/openssl.c:763)
BIO_read (openssl/boringssl/src/crypto/bio/bio.c:139) len = 1403
bssl::tls_read_buffer_extend_to(ssl_st*, unsigned long) (openssl/boringssl/src/ssl/ssl_buffer.cc:157) buf.size = 5, buf.capacity = 1408
bssl::ssl_read_buffer_extend_to(ssl_st*, unsigned long) (openssl/boringssl/src/ssl/ssl_buffer.cc:196)
bssl::ssl_handle_open_record(ssl_st*, bool*, bssl::ssl_open_record_t, unsigned long, unsigned char) (openssl/boringssl/src/ssl/ssl_buffer.cc:222)
ssl_read_impl(ssl_st*) (openssl/boringssl/src/ssl/ssl_lib.cc:1019) <no buffer size>
::SSL_peek(SSL *, void *, int) (openssl/boringssl/src/ssl/ssl_lib.cc:1053) bufsize = 15914
::SSL_read(SSL *, void *, int) (openssl/boringssl/src/ssl/ssl_lib.cc:1033)
ossl_recv (curl/src/lib/vtls/openssl.c:4670)
ssl_cf_recv (curl/src/lib/vtls/vtls.c:1725)
Curl_cf_def_recv (curl/src/lib/cfilters.c:103)
Curl_conn_cf_recv (curl/src/lib/cfilters.c:322)
nw_in_reader (curl/src/lib/http2.c:362) buflen = 16384
chunk_slurpn (curl/src/lib/bufq.c:109)
Curl_bufq_sipn (curl/src/lib/bufq.c:594)
bufq_slurpn (curl/src/lib/bufq.c:623)
Curl_bufq_slurp (curl/src/lib/bufq.c:655)
h2_progress_ingress (curl/src/lib/http2.c:1911)
cf_h2_recv (curl/src/lib/http2.c:1969)
Curl_cf_def_recv (curl/src/lib/cfilters.c:103)
Curl_conn_recv (curl/src/lib/cfilters.c:183)
Curl_read (curl/src/lib/sendf.c:813)
Curl_xfer_recv_resp (curl/src/lib/transfer.c:450)
readwrite_data (curl/src/lib/transfer.c:504)
Repro:
git clone https://github.com/google/tensorstore.git
cd tensorstore
./bazelisk.py build -c opt --copt=-g tensorstore/internal/benchmark:kvstore_benchmark
strace -e recvfrom -f \
./bazel-bin/tensorstore/internal/benchmark/kvstore_benchmark \
--kvstore_spec='{"driver": "gcs", "bucket":"<MY_GCS_BUCKET>", "path":"default"}' \
--chunk_size=67108864 \
--total_bytes=67108864000 \
--repeat_reads=1 \
--repeat_writes=0
Example strace output:
[pid 3324193] recvfrom(46, "\27\3\3\5{", 5, 0, NULL, NULL) = 5
[pid 3324193] recvfrom(46, "\313\265\v?\343\346=?\272&e\220\342V\322\207\35X6C\203\203V\250\346\346\301\2067W\204\252"..., 1403, 0, NULL, NULL) = 1403
[pid 3324193] recvfrom(46, "\27\3\3\5{", 5, 0, NULL, NULL) = 5
[pid 3324193] recvfrom(46, "O\1\210\301\307x(f\321\337\3469f\323\nAUtS\326\234\364|\244\\\275\275)K.\250H"..., 1403, 0, NULL, NULL) = 1403
[pid 3324193] recvfrom(46, "\27\3\3\5{", 5, 0, NULL, NULL) = 5
[pid 3324193] recvfrom(46, "\224\255\317Uj\274W\310J\344\32\264\243\254\224(v\262\261\214Y\253}`\374R\323\316nj\222u"..., 1403, 0, NULL, NULL) = 1403
[pid 3324193] recvfrom(46, "\27\3\3\5{", 5, 0, NULL, NULL) = 5
[pid 3324193] recvfrom(46, "\"\21\3606\3106Y5Z/\233\236\211\261\305~\v\17\272<\351\225Qx\303\177p\331\2035o,"..., 1403, 0, NULL, NULL) = 1403
[pid 3324193] recvfrom(46, "\27\3\3\5{", 5, 0, NULL, NULL) = 5
[pid 3324193] recvfrom(46, "\326\352\224bU5\337\23\342p`\0342\327\31\313?R\33rb\34M\237)U\204\302\f\215'P"..., 1403, 0, NULL, NULL) = 1403
[pid 3324193] recvfrom(46, "\27\3\3\5\10", 5, 0, NULL, NULL) = 5
[pid 3324193] recvfrom(46, "\331\214\17\352C\23K\232\34\374\275\357\220\227\16\250\360\355e\342\22$j|\35\313>\274\237N\311\210"..., 1288, 0, NULL, NULL) = 1288
[pid 3324193] recvfrom(46, "\27\3\3\5{", 5, 0, NULL, NULL) = 5
[pid 3324193] recvfrom(46, "B\312\345 \314\7\302s\2346u>\262\r\226'\317\256\3515\311\26\2O\220fr\21\264-\345["..., 1403, 0, NULL, NULL) = 1403
Here the TLS record header and the TLS record body are clearly read by separate ::recv calls.
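To make the header/body split concrete, here is a minimal Python sketch that decodes the two distinct 5-byte reads from the strace excerpt as TLS record headers. The function name `parse_tls_record_header` and the `CONTENT_TYPES` table are illustrative helpers, not part of curl or BoringSSL. strace prints bytes as octal escapes, which Python byte literals accept unchanged.

```python
import struct

# TLS record content types (illustrative subset).
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}


def parse_tls_record_header(header: bytes):
    """Split a 5-byte TLS record header into (content type, version, body length)."""
    if len(header) != 5:
        raise ValueError("a TLS record header is exactly 5 bytes")
    # 1 byte content type, 2 bytes legacy version, 2 bytes big-endian length.
    content_type, major, minor, length = struct.unpack("!BBBH", header)
    return CONTENT_TYPES.get(content_type, "unknown"), (major, minor), length


# The two distinct 5-byte payloads seen in the strace output.
first = parse_tls_record_header(b"\27\3\3\5{")
second = parse_tls_record_header(b"\27\3\3\5\10")

print(first)   # ('application_data', (3, 3), 1403)
print(second)  # ('application_data', (3, 3), 1288)
```

The decoded lengths, 1403 and 1288, match the sizes of the recvfrom calls that immediately follow each 5-byte read, so every record costs two system calls.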
I took a histogram of the recvfrom calls; the table below lists every buffer size that occurred more than 500 times during the download.
About half of the calls are for 5 bytes (TLS record headers). The rest vary, but the vast majority are for blocks of roughly 8 KB. The average recv call size is about 4090 bytes.
| count | buffer_size |
|---|---|
| 516 | 28 |
| 548 | 8203 |
| 552 | 8212 |
| 558 | 29 |
| 569 | 6980 |
| 664 | 8202 |
| 753 | 8201 |
| 988 | 8204 |
| 993 | 8206 |
| 1012 | 26 |
| 1140 | 34 |
| 1158 | 27 |
| 1653 | 8210 |
| 2476 | 8208 |
| 2496 | 1288 |
| 2750 | 8207 |
| 12715 | 1403 |
| 8183520 | 8218 |
| 8228072 | 5 |
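As a rough cross-check of the figures quoted with the histogram, the following Python sketch recomputes the share of 5-byte calls and the count-weighted mean buffer size from the table rows alone. It only sees sizes that occurred more than 500 times, so the result is an approximation of the full-trace average rather than an exact reproduction.

```python
# (count, buffer_size) rows copied from the histogram above.
HISTOGRAM = [
    (516, 28), (548, 8203), (552, 8212), (558, 29), (569, 6980),
    (664, 8202), (753, 8201), (988, 8204), (993, 8206), (1012, 26),
    (1140, 34), (1158, 27), (1653, 8210), (2476, 8208), (2496, 1288),
    (2750, 8207), (12715, 1403), (8183520, 8218), (8228072, 5),
]

total_calls = sum(count for count, _ in HISTOGRAM)
total_bytes = sum(count * size for count, size in HISTOGRAM)

# Count-weighted mean buffer size over the listed rows.
mean_size = total_bytes / total_calls

# Fraction of all listed calls that are 5-byte TLS record header reads.
header_calls = sum(count for count, size in HISTOGRAM if size == 5)
header_share = header_calls / total_calls

print(f"calls: {total_calls}, mean size: {mean_size:.0f} bytes")
print(f"share of 5-byte calls: {header_share:.1%}")
```

The mean comes out at roughly 4100 bytes and the 5-byte share at about one half, which is consistent with the numbers stated above.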
I expected the following
Fewer calls to recvfrom, with reads buffered at the lowest layer so that each transfer needs fewer system calls and context switches.
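The effect of such buffering can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not curl's implementation: `CountingSocket` is a hypothetical in-memory stand-in for the socket, the reader mimics a TLS layer that asks for a 5-byte header and then the record body, and the 16384-byte buffer is borrowed from the buflen shown for nw_in_reader in the stack trace.

```python
import io
import struct

RECORD_BODY_LEN = 1403   # body size seen most often in the strace excerpt
READ_BUFFER_LEN = 16384  # buflen reported for nw_in_reader in the stack trace


class CountingSocket(io.RawIOBase):
    """Hypothetical in-memory socket that counts how often it is read."""

    def __init__(self, payload: bytes):
        super().__init__()
        self._payload = payload
        self._pos = 0
        self.read_calls = 0  # each call stands in for one recvfrom()

    def readable(self):
        return True

    def readinto(self, buf):
        self.read_calls += 1
        n = min(len(buf), len(self._payload) - self._pos)
        buf[:n] = self._payload[self._pos:self._pos + n]
        self._pos += n
        return n


def make_stream(record_count: int) -> bytes:
    """Build back-to-back TLS-like records: 5-byte header plus zeroed body."""
    header = struct.pack("!BBBH", 23, 3, 3, RECORD_BODY_LEN)
    return (header + bytes(RECORD_BODY_LEN)) * record_count


def read_records(stream, record_count: int) -> int:
    """Read header then body for each record, as the TLS layer does."""
    total = 0
    for _ in range(record_count):
        header = stream.read(5)
        (length,) = struct.unpack("!H", header[3:5])
        total += len(stream.read(length))
    return total


RECORDS = 1000
payload = make_stream(RECORDS)

# Unbuffered: every header and every body is its own read on the socket.
unbuffered = CountingSocket(payload)
unbuffered_bytes = read_records(unbuffered, RECORDS)

# Buffered: the same request pattern, served from a 16 KiB read buffer.
raw = CountingSocket(payload)
reader = io.BufferedReader(raw, buffer_size=READ_BUFFER_LEN)
buffered_bytes = read_records(reader, RECORDS)

print("unbuffered reads:", unbuffered.read_calls)
print("buffered reads:  ", raw.read_calls)
```

In this model the unbuffered path issues two reads per record, while the buffered path issues roughly one read per 16 KiB of payload. A real socket can return short reads, so actual savings would depend on how much data is already queued in the kernel.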
curl/libcurl version
curl 8.7.1
operating system
5.10.0-26-cloud-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64 GNU/Linux