Description
NB: these are the same symptoms, and a similar test setup, as we saw in #21189, but this reproduces with newer releases, including RELEASE.2025-07-23T15-54-02Z, in which #21189 is fixed.
Expected Behavior
If a multipart upload is initiated and then successfully aborted, it should not be returned in subsequent listings of the active uploads. Moreover, any data related to the aborted upload should be removed from disk (or at least moved to .trash).
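For concreteness, here is a minimal sketch of that invariant as a client-side check, written with boto3 and Python. The endpoint and credentials follow the single-node setup described under Your Environment below; the object key is illustrative:

# Minimal sketch: after a successful abort, the upload must not appear in any
# subsequent ListMultipartUploads response. Names/credentials are illustrative.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:9000",
    aws_access_key_id="testuser",
    aws_secret_access_key="password",
)

bucket, key = "testbucket", "example-object"

upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)

# The aborted upload should no longer be listed...
listing = s3.list_multipart_uploads(Bucket=bucket, Prefix=key)
assert upload_id not in [u["UploadId"] for u in listing.get("Uploads", [])]
# ...and its data should be removed from disk (or at least moved to .trash).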
Current Behavior
In a case where we initiate a multipart upload and then upload a part while concurrently calling both the AbortMultipartUpload and ListMultipartUploads APIs, we very rarely receive a successful 204 No Content response to the AbortMultipartUpload request but find that the upload has not been cleaned up properly. In particular, it continues to be returned by subsequent calls to the ListMultipartUploads API, and its uploaded data appears to remain on disk forever.
Although they should not be necessary, subsequent attempts to remove the upload by calling AbortMultipartUpload again do not appear to resolve the situation.
Possible Solution
Unknown, sorry.
Steps to Reproduce (for bugs)
As with #21189, the problem seems to relate to aborting an MPU while a part is being uploaded. I cannot reliably reproduce it, but I can share a packet capture showing the issue at the API level, a listing of MinIO's data path at the end of the run, and an strace capture showing how MinIO was interacting with the filesystem at the time of the problem:
run-2025-07-24T09:36:16Z.tar.gz
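To illustrate the shape of the workload, here is a rough sketch using boto3 and Python threads. This is not the harness that produced the attached capture, just an approximation of the contended interleaving; the endpoint and credentials match the server invocation under Your Environment, while the object name, part size, and iteration count are assumptions:

# Rough sketch of the contended workload: initiate an MPU, then race an
# UploadPart against AbortMultipartUpload while also listing uploads, and
# finally check whether the aborted upload is still listed.
import threading
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:9000",
    aws_access_key_id="testuser",
    aws_secret_access_key="password",
)

bucket, key = "testbucket", "test-register-contended"

def attempt(iteration):
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

    def upload_part():
        try:
            s3.upload_part(Bucket=bucket, Key=key, PartNumber=1,
                           UploadId=upload_id, Body=b"x" * 64)
        except Exception:
            pass  # the abort may legitimately win the race and fail this call

    def abort_upload():
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)

    def list_uploads():
        s3.list_multipart_uploads(Bucket=bucket, Prefix=key)

    threads = [threading.Thread(target=f)
               for f in (upload_part, abort_upload, list_uploads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Assuming the abort returned 204 No Content, the upload should no longer
    # be listed; in the failing runs it remains listed indefinitely.
    remaining = s3.list_multipart_uploads(Bucket=bucket, Prefix=key).get("Uploads", [])
    if any(u["UploadId"] == upload_id for u in remaining):
        print(f"iteration {iteration}: aborted upload {upload_id} is still listed")

for i in range(1000):
    attempt(i)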
In the attached capture, the problematic upload has ID YjM1ZThkMmYtZjZmNC00ZjQzLTllYjYtNjRhYzA5MTZmNzU4LjZkM2IxZmJmLTQ1Y2MtNGU3Ny1iOTRlLWZiZjdlMmE3NjI2NHgxNzUzMzQ5NzYzMDMwOTIxMTg5. This upload is initiated here:
2322 3 2025-07-24 09:36:03.029347 127.0.0.1 37938 127.0.0.1 9000 HTTP 1038 3137684410 3137685382 1205827627 POST /testbucket/temp-analysis-rePel9pSRFGvTITOcsB5-A/test-register-contended-hNi6kYzJSIi2-rybzqYvtQ?uploads&x-purpose=RepositoryAnalysis HTTP/1.1
2329 3 2025-07-24 09:36:03.037629 127.0.0.1 9000 127.0.0.1 37938 HTTP/XML 964 1205827627 1205828525 3137685382 HTTP/1.1 200 OK
The UploadPart call comprises these packets:
2332 1 2025-07-24 09:36:03.039032 127.0.0.1 37922 127.0.0.1 9000 HTTP 1245 3202159751 3202160930 1110287112 PUT /testbucket/temp-analysis-rePel9pSRFGvTITOcsB5-A/test-register-contended-hNi6kYzJSIi2-rybzqYvtQ?partNumber=1&uploadId=YjM1ZThkMmYtZjZmNC00ZjQzLTllYjYtNjRhYzA5MTZmNzU4LjZkM2IxZmJmLTQ1Y2MtNGU3Ny1iOTRlLWZiZjdlMmE3NjI2NHgxNzUzMzQ5NzYzMDMwOTIxMTg5&x-purpose=RepositoryAnalysis HTTP/1.1
2337 1 2025-07-24 09:36:03.048395 127.0.0.1 9000 127.0.0.1 37922 HTTP 91 1110287112 1110287137 3202160930 HTTP/1.1 100 Continue
2342 1 2025-07-24 09:36:03.051469 127.0.0.1 37922 127.0.0.1 9000 HTTP 246 3202160930 3202161110 1110287137 Continuation
2365 1 2025-07-24 09:36:03.092290 127.0.0.1 9000 127.0.0.1 37922 TCP 66 1110287137 1110287137 3202161110 9000 → 37922 [ACK] Seq=1110287137 Ack=3202161110 Win=241304 Len=0 TSval=2841057002 TSecr=2841056961
2370 1 2025-07-24 09:36:03.104973 127.0.0.1 9000 127.0.0.1 37922 HTTP 557 1110287137 1110287628 3202161110 HTTP/1.1 200 OK
The concurrent AbortMultipartUpload happens here:
2343 4 2025-07-24 09:36:03.052320 127.0.0.1 37944 127.0.0.1 9000 HTTP 1036 3011403360 3011404330 2703330078 DELETE /testbucket/temp-analysis-rePel9pSRFGvTITOcsB5-A/test-register-contended-hNi6kYzJSIi2-rybzqYvtQ?uploadId=YjM1ZThkMmYtZjZmNC00ZjQzLTllYjYtNjRhYzA5MTZmNzU4LjZkM2IxZmJmLTQ1Y2MtNGU3Ny1iOTRlLWZiZjdlMmE3NjI2NHgxNzUzMzQ5NzYzMDMwOTIxMTg5&x-purpose=RepositoryAnalysis HTTP/1.1
2350 4 2025-07-24 09:36:03.055967 127.0.0.1 9000 127.0.0.1 37944 HTTP 504 2703330078 2703330516 3011404330 HTTP/1.1 204 No Content
There are several more concurrent listing attempts here too, all of which (acceptably) return this upload:
2324 1 2025-07-24 09:36:03.030075 127.0.0.1 37922 127.0.0.1 9000 HTTP 916 3202158901 3202159751 1110284842 GET /testbucket?uploads&prefix=temp-analysis-rePel9pSRFGvTITOcsB5-A%2Ftest-register-contended-hNi6kYzJSIi2-rybzqYvtQ&x-purpose=RepositoryAnalysis HTTP/1.1
2326 1 2025-07-24 09:36:03.034061 127.0.0.1 9000 127.0.0.1 37922 HTTP/XML 2336 1110284842 1110287112 3202159751 HTTP/1.1 200 OK
2331 3 2025-07-24 09:36:03.038564 127.0.0.1 37938 127.0.0.1 9000 HTTP 916 3137685382 3137686232 1205828525 GET /testbucket?uploads&prefix=temp-analysis-rePel9pSRFGvTITOcsB5-A%2Ftest-register-contended-hNi6kYzJSIi2-rybzqYvtQ&x-purpose=RepositoryAnalysis HTTP/1.1
2333 3 2025-07-24 09:36:03.041362 127.0.0.1 9000 127.0.0.1 37938 HTTP/XML 2336 1205828525 1205830795 3137686232 HTTP/1.1 200 OK
2334 3 2025-07-24 09:36:03.043022 127.0.0.1 37938 127.0.0.1 9000 HTTP 916 3137686232 3137687082 1205830795 GET /testbucket?uploads&prefix=temp-analysis-rePel9pSRFGvTITOcsB5-A%2Ftest-register-contended-hNi6kYzJSIi2-rybzqYvtQ&x-purpose=RepositoryAnalysis HTTP/1.1
2340 3 2025-07-24 09:36:03.049518 127.0.0.1 9000 127.0.0.1 37938 HTTP/XML 2336 1205830795 1205833065 3137687082 HTTP/1.1 200 OK
2336 4 2025-07-24 09:36:03.045071 127.0.0.1 37944 127.0.0.1 9000 HTTP 916 3011402510 3011403360 2703327808 GET /testbucket?uploads&prefix=temp-analysis-rePel9pSRFGvTITOcsB5-A%2Ftest-register-contended-hNi6kYzJSIi2-rybzqYvtQ&x-purpose=RepositoryAnalysis HTTP/1.1
2341 4 2025-07-24 09:36:03.050199 127.0.0.1 9000 127.0.0.1 37944 HTTP/XML 2336 2703327808 2703330078 3011403360 HTTP/1.1 200 OK
The first listing after the abort completes is this one, which still (unacceptably) returns the upload in its list:
2364 5 2025-07-24 09:36:03.091066 127.0.0.1 57320 127.0.0.1 9000 HTTP 916 1159969941 1159970791 2960446718 GET /testbucket?uploads&prefix=temp-analysis-rePel9pSRFGvTITOcsB5-A%2Ftest-register-contended-hNi6kYzJSIi2-rybzqYvtQ&x-purpose=RepositoryAnalysis HTTP/1.1
2368 5 2025-07-24 09:36:03.099768 127.0.0.1 9000 127.0.0.1 57320 HTTP/XML 2336 2960446718 2960448988 1159970791 HTTP/1.1 200 OK
We repeat the abort request here, receiving another 204 No Content success response:
3096 1 2025-07-24 09:36:04.178454 127.0.0.1 37922 127.0.0.1 9000 HTTP 1036 3202168270 3202169240 1110295867 DELETE /testbucket/temp-analysis-rePel9pSRFGvTITOcsB5-A/test-register-contended-hNi6kYzJSIi2-rybzqYvtQ?uploadId=YjM1ZThkMmYtZjZmNC00ZjQzLTllYjYtNjRhYzA5MTZmNzU4LjZkM2IxZmJmLTQ1Y2MtNGU3Ny1iOTRlLWZiZjdlMmE3NjI2NHgxNzUzMzQ5NzYzMDMwOTIxMTg5&x-purpose=RepositoryAnalysis HTTP/1.1
3099 1 2025-07-24 09:36:04.182823 127.0.0.1 9000 127.0.0.1 37922 HTTP 504 1110295867 1110296305 3202169240 HTTP/1.1 204 No Content
Even so, the next listing still returns this now-twice-aborted upload:
3102 5 2025-07-24 09:36:04.184039 127.0.0.1 57320 127.0.0.1 9000 HTTP 916 1160116163 1160117013 2960687678 GET /testbucket?uploads&prefix=temp-analysis-rePel9pSRFGvTITOcsB5-A%2Ftest-register-contended-hNi6kYzJSIi2-rybzqYvtQ&x-purpose=RepositoryAnalysis HTTP/1.1
3104 5 2025-07-24 09:36:04.186561 127.0.0.1 9000 127.0.0.1 57320 HTTP/XML 1451 2960687678 2960689063 1160117013 HTTP/1.1 200 OK
At the end of the run, the data remains on disk in MinIO's data path:
-rw-rw-r-- 1 davidturner davidturner 40 Jul 24 09:36 minio-data/.minio.sys/multipart/17cbdbdcc42fbe23382deea7139bc5928b55786100b8fa636d7ab64c000f85fa/6d3b1fbf-45cc-4e77-b94e-fbf7e2a76264x1753349763030921189/f7156f7b-bdd3-43d3-af32-a7e96f985e9c/part.1
-rw-rw-r-- 1 davidturner davidturner 65 Jul 24 09:36 minio-data/.minio.sys/multipart/17cbdbdcc42fbe23382deea7139bc5928b55786100b8fa636d7ab64c000f85fa/6d3b1fbf-45cc-4e77-b94e-fbf7e2a76264x1753349763030921189/f7156f7b-bdd3-43d3-af32-a7e96f985e9c/part.1.meta
(NB: I have edited these last two lines, sorry; the files I originally shared in my OP were from a different occurrence.)
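As a convenience for spotting such leftovers, here is a short sketch that walks the multipart staging area and prints any remaining files. It assumes the single-node on-disk layout shown above, with the data path at minio-data as in the server invocation below:

# List leftover multipart staging files; the "minio-data" path is an
# assumption matching the server invocation shown under Your Environment.
import os

multipart_root = os.path.join("minio-data", ".minio.sys", "multipart")

for dirpath, _dirnames, filenames in os.walk(multipart_root):
    for name in filenames:
        path = os.path.join(dirpath, name)
        print(f"{os.path.getsize(path):>10}  {path}")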
Context
AWS S3 has strongly-consistent (i.e. linearizable) semantics in its multipart upload APIs, and Elasticsearch relies on this consistency for correct operation of its snapshot functionality. Moreover, it is important that partially-uploaded objects are cleaned up on abort to avoid resource exhaustion.
Regression
Unclear.
Your Environment
- Version used (minio --version): RELEASE.2025-07-23T15-54-02Z
- Server setup and configuration: Single-node-single-disk, started by MINIO_ROOT_USER=testuser MINIO_ROOT_PASSWORD=password strace -o strace-minio.log -tttt -f ./minio server ${PWD}/minio-data. Data path is a regular ext4 filesystem.
- Operating System and version (uname -a): Linux david-turner-test-runner-20250626 6.14.0-1011-gcp #11-Ubuntu SMP Thu Jul 10 23:06:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux