Aborted multipart upload remains in ListMultipartUploads response and leaves uploaded parts on disk #21456

@DaveCTurner

Description

NB these are the same symptoms, and a similar test setup, as we saw in #21189 but this reproduces with newer releases including RELEASE.2025-07-23T15-54-02Z in which #21189 is fixed.

Expected Behavior

If a multipart upload is initiated and then successfully aborted, it should not be returned in subsequent listings of the active uploads. Moreover, any data related to the aborted upload should be removed from disk (or at least moved to .trash).

Current Behavior

When we initiate a multipart upload and then upload a part while concurrently calling both the AbortMultipartUpload and ListMultipartUploads APIs, we very occasionally receive a successful 204 No Content response to the AbortMultipartUpload request even though the upload has not been cleaned up properly. In particular, the upload continues to be returned by subsequent calls to the ListMultipartUploads API, and its uploaded data appears to remain on disk forever.

Repeating the abort should not be necessary, but even so it seems that subsequent calls to AbortMultipartUpload do not resolve the situation.
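The racing pattern described above can be sketched as a small harness. This is a minimal sketch and not the test that produced this report: `InMemoryStore` and `race_once` are hypothetical names, and the in-memory store only stands in for a real S3 client so that the example runs on its own. The store implements the expected semantics, so the check always passes here; against a server exhibiting this bug, the final listing would occasionally still contain the upload.

```python
import threading


class InMemoryStore:
    """Hypothetical stand-in for an S3 client with the expected semantics."""

    def __init__(self):
        self._lock = threading.Lock()
        self._uploads = {}
        self._counter = 0

    def create_multipart_upload(self, key):
        with self._lock:
            self._counter += 1
            upload_id = f"upload-{self._counter}"
            self._uploads[upload_id] = {"key": key, "parts": {}}
            return upload_id

    def upload_part(self, upload_id, part_number, body):
        with self._lock:
            if upload_id not in self._uploads:
                raise KeyError("NoSuchUpload")
            self._uploads[upload_id]["parts"][part_number] = body

    def abort_multipart_upload(self, upload_id):
        with self._lock:
            # Removes the upload and its parts; aborting twice is harmless.
            self._uploads.pop(upload_id, None)

    def list_multipart_uploads(self, prefix):
        with self._lock:
            return [uid for uid, upload in self._uploads.items()
                    if upload["key"].startswith(prefix)]


def race_once(client, key):
    """Race UploadPart and a listing against an abort.

    Returns True if the upload is absent from a listing issued after the
    abort has returned and the racing calls have finished.
    """
    upload_id = client.create_multipart_upload(key)

    def upload():
        try:
            client.upload_part(upload_id, 1, b"x" * 40)
        except KeyError:
            pass  # the part lost the race with the abort, which is fine

    racers = [
        threading.Thread(target=upload),
        threading.Thread(target=client.list_multipart_uploads, args=(key,)),
    ]
    for thread in racers:
        thread.start()
    client.abort_multipart_upload(upload_id)
    for thread in racers:
        thread.join()
    return upload_id not in client.list_multipart_uploads(key)


results = [race_once(InMemoryStore(), f"key-{i}") for i in range(50)]
print(all(results))
```

To use the same harness against a real endpoint, `InMemoryStore` would be replaced with a thin wrapper around an S3 client exposing the same four methods.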

Possible Solution

Unknown, sorry.

Steps to Reproduce (for bugs)

Similar to #21189, it seems to relate to attempts to abort an MPU while uploading a part. I cannot reliably reproduce the problem, but I can share a packet capture showing the issue at the API level, a listing of the MinIO data path at the end of the run, and an strace capture showing how MinIO is interacting with the filesystem at the time of the problem:

run-2025-07-24T09:36:16Z.tar.gz

In this run, the problematic upload has ID YjM1ZThkMmYtZjZmNC00ZjQzLTllYjYtNjRhYzA5MTZmNzU4LjZkM2IxZmJmLTQ1Y2MtNGU3Ny1iOTRlLWZiZjdlMmE3NjI2NHgxNzUzMzQ5NzYzMDMwOTIxMTg5. This upload is initiated here:

2322    3   2025-07-24 09:36:03.029347  127.0.0.1   37938   127.0.0.1   9000    HTTP    1038    3137684410  3137685382  1205827627  POST /testbucket/temp-analysis-rePel9pSRFGvTITOcsB5-A/test-register-contended-hNi6kYzJSIi2-rybzqYvtQ?uploads&x-purpose=RepositoryAnalysis HTTP/1.1 
2329    3   2025-07-24 09:36:03.037629  127.0.0.1   9000    127.0.0.1   37938   HTTP/XML    964 1205827627  1205828525  3137685382  HTTP/1.1 200 OK 
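For anyone analysing the capture rows programmatically, a minimal parsing sketch follows. The column meanings are inferred from the rows themselves and are not stated in this report: they look like frame number, TCP stream index, timestamp, source address and port, destination address and port, protocol, frame length, sequence number, next sequence number, acknowledgement number, and info. `CaptureRow` and `parse_row` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CaptureRow:
    frame: int
    stream: int
    time: datetime
    src: str
    dst: str
    protocol: str
    length: int
    seq: int
    next_seq: int
    ack: int
    info: str


def parse_row(line):
    """Split one whitespace-separated capture row into typed fields."""
    # The timestamp occupies two fields (date and time); everything after
    # the 13th separator is the free-text info column.
    f = line.split(None, 13)
    return CaptureRow(
        frame=int(f[0]),
        stream=int(f[1]),
        time=datetime.strptime(f"{f[2]} {f[3]}", "%Y-%m-%d %H:%M:%S.%f"),
        src=f"{f[4]}:{f[5]}",
        dst=f"{f[6]}:{f[7]}",
        protocol=f[8],
        length=int(f[9]),
        seq=int(f[10]),
        next_seq=int(f[11]),
        ack=int(f[12]),
        info=f[13].strip(),
    )


SAMPLE = ("2329    3   2025-07-24 09:36:03.037629  127.0.0.1   9000    "
          "127.0.0.1   37938   HTTP/XML    964 1205827627  1205828525  "
          "3137685382  HTTP/1.1 200 OK ")

row = parse_row(SAMPLE)
print(row.frame, row.stream, row.src, row.dst, row.info)
```

Grouping parsed rows by `stream` pairs each request with its response, since each connection in the capture carries one request at a time.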

The UploadPart call consists of these packets:

2332    1   2025-07-24 09:36:03.039032  127.0.0.1   37922   127.0.0.1   9000    HTTP    1245    3202159751  3202160930  1110287112  PUT /testbucket/temp-analysis-rePel9pSRFGvTITOcsB5-A/test-register-contended-hNi6kYzJSIi2-rybzqYvtQ?partNumber=1&uploadId=YjM1ZThkMmYtZjZmNC00ZjQzLTllYjYtNjRhYzA5MTZmNzU4LjZkM2IxZmJmLTQ1Y2MtNGU3Ny1iOTRlLWZiZjdlMmE3NjI2NHgxNzUzMzQ5NzYzMDMwOTIxMTg5&x-purpose=RepositoryAnalysis HTTP/1.1 
2337    1   2025-07-24 09:36:03.048395  127.0.0.1   9000    127.0.0.1   37922   HTTP    91  1110287112  1110287137  3202160930  HTTP/1.1 100 Continue 
2342    1   2025-07-24 09:36:03.051469  127.0.0.1   37922   127.0.0.1   9000    HTTP    246 3202160930  3202161110  1110287137  Continuation
2365    1   2025-07-24 09:36:03.092290  127.0.0.1   9000    127.0.0.1   37922   TCP 66  1110287137  1110287137  3202161110  9000 → 37922 [ACK] Seq=1110287137 Ack=3202161110 Win=241304 Len=0 TSval=2841057002 TSecr=2841056961
2370    1   2025-07-24 09:36:03.104973  127.0.0.1   9000    127.0.0.1   37922   HTTP    557 1110287137  1110287628  3202161110  HTTP/1.1 200 OK 

The concurrent AbortMultipartUpload happens here:

2343    4   2025-07-24 09:36:03.052320  127.0.0.1   37944   127.0.0.1   9000    HTTP    1036    3011403360  3011404330  2703330078  DELETE /testbucket/temp-analysis-rePel9pSRFGvTITOcsB5-A/test-register-contended-hNi6kYzJSIi2-rybzqYvtQ?uploadId=YjM1ZThkMmYtZjZmNC00ZjQzLTllYjYtNjRhYzA5MTZmNzU4LjZkM2IxZmJmLTQ1Y2MtNGU3Ny1iOTRlLWZiZjdlMmE3NjI2NHgxNzUzMzQ5NzYzMDMwOTIxMTg5&x-purpose=RepositoryAnalysis HTTP/1.1 
2350    4   2025-07-24 09:36:03.055967  127.0.0.1   9000    127.0.0.1   37944   HTTP    504 2703330078  2703330516  3011404330  HTTP/1.1 204 No Content 

There are also several concurrent listing attempts, all of which (acceptably) return this upload:

2324    1   2025-07-24 09:36:03.030075  127.0.0.1   37922   127.0.0.1   9000    HTTP    916 3202158901  3202159751  1110284842  GET /testbucket?uploads&prefix=temp-analysis-rePel9pSRFGvTITOcsB5-A%2Ftest-register-contended-hNi6kYzJSIi2-rybzqYvtQ&x-purpose=RepositoryAnalysis HTTP/1.1 
2326    1   2025-07-24 09:36:03.034061  127.0.0.1   9000    127.0.0.1   37922   HTTP/XML    2336    1110284842  1110287112  3202159751  HTTP/1.1 200 OK 

2331    3   2025-07-24 09:36:03.038564  127.0.0.1   37938   127.0.0.1   9000    HTTP    916 3137685382  3137686232  1205828525  GET /testbucket?uploads&prefix=temp-analysis-rePel9pSRFGvTITOcsB5-A%2Ftest-register-contended-hNi6kYzJSIi2-rybzqYvtQ&x-purpose=RepositoryAnalysis HTTP/1.1 
2333    3   2025-07-24 09:36:03.041362  127.0.0.1   9000    127.0.0.1   37938   HTTP/XML    2336    1205828525  1205830795  3137686232  HTTP/1.1 200 OK 

2334    3   2025-07-24 09:36:03.043022  127.0.0.1   37938   127.0.0.1   9000    HTTP    916 3137686232  3137687082  1205830795  GET /testbucket?uploads&prefix=temp-analysis-rePel9pSRFGvTITOcsB5-A%2Ftest-register-contended-hNi6kYzJSIi2-rybzqYvtQ&x-purpose=RepositoryAnalysis HTTP/1.1 
2340    3   2025-07-24 09:36:03.049518  127.0.0.1   9000    127.0.0.1   37938   HTTP/XML    2336    1205830795  1205833065  3137687082  HTTP/1.1 200 OK 

2336    4   2025-07-24 09:36:03.045071  127.0.0.1   37944   127.0.0.1   9000    HTTP    916 3011402510  3011403360  2703327808  GET /testbucket?uploads&prefix=temp-analysis-rePel9pSRFGvTITOcsB5-A%2Ftest-register-contended-hNi6kYzJSIi2-rybzqYvtQ&x-purpose=RepositoryAnalysis HTTP/1.1 
2341    4   2025-07-24 09:36:03.050199  127.0.0.1   9000    127.0.0.1   37944   HTTP/XML    2336    2703327808  2703330078  3011403360  HTTP/1.1 200 OK 

The first listing after the abort completes is this one, which still (unacceptably) returns the upload in its list:

2364    5   2025-07-24 09:36:03.091066  127.0.0.1   57320   127.0.0.1   9000    HTTP    916 1159969941  1159970791  2960446718  GET /testbucket?uploads&prefix=temp-analysis-rePel9pSRFGvTITOcsB5-A%2Ftest-register-contended-hNi6kYzJSIi2-rybzqYvtQ&x-purpose=RepositoryAnalysis HTTP/1.1 
2368    5   2025-07-24 09:36:03.099768  127.0.0.1   9000    127.0.0.1   57320   HTTP/XML    2336    2960446718  2960448988  1159970791  HTTP/1.1 200 OK 

We repeat the abort request here, receiving another 204 No Content success response:

3096    1   2025-07-24 09:36:04.178454  127.0.0.1   37922   127.0.0.1   9000    HTTP    1036    3202168270  3202169240  1110295867  DELETE /testbucket/temp-analysis-rePel9pSRFGvTITOcsB5-A/test-register-contended-hNi6kYzJSIi2-rybzqYvtQ?uploadId=YjM1ZThkMmYtZjZmNC00ZjQzLTllYjYtNjRhYzA5MTZmNzU4LjZkM2IxZmJmLTQ1Y2MtNGU3Ny1iOTRlLWZiZjdlMmE3NjI2NHgxNzUzMzQ5NzYzMDMwOTIxMTg5&x-purpose=RepositoryAnalysis HTTP/1.1 
3099    1   2025-07-24 09:36:04.182823  127.0.0.1   9000    127.0.0.1   37922   HTTP    504 1110295867  1110296305  3202169240  HTTP/1.1 204 No Content 

Even so, the next listing still returns this now-twice-aborted upload:

3102    5   2025-07-24 09:36:04.184039  127.0.0.1   57320   127.0.0.1   9000    HTTP    916 1160116163  1160117013  2960687678  GET /testbucket?uploads&prefix=temp-analysis-rePel9pSRFGvTITOcsB5-A%2Ftest-register-contended-hNi6kYzJSIi2-rybzqYvtQ&x-purpose=RepositoryAnalysis HTTP/1.1 
3104    5   2025-07-24 09:36:04.186561  127.0.0.1   9000    127.0.0.1   57320   HTTP/XML    1451    2960687678  2960689063  1160117013  HTTP/1.1 200 OK 
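The ordering argument can be written down as a small check. This is a sketch under stated assumptions: the timestamps are copied from the capture rows in this report, whether each listing response contained the upload is taken from the prose (the rows do not show response bodies), and `violations` is a hypothetical helper name. A listing counts as a violation if its request was sent after the first abort had already been acknowledged and its response still contained the upload.

```python
from datetime import datetime


def ts(text):
    """Parse a capture timestamp (time of day only)."""
    return datetime.strptime(text, "%H:%M:%S.%f")


# Time at which the first 204 No Content response was sent (frame 2350).
FIRST_ABORT_ACKED = ts("09:36:03.055967")

# (frame, time the listing request was sent, upload present in response)
LISTINGS = [
    (2324, ts("09:36:03.030075"), True),
    (2331, ts("09:36:03.038564"), True),
    (2334, ts("09:36:03.043022"), True),
    (2336, ts("09:36:03.045071"), True),
    (2364, ts("09:36:03.091066"), True),
    (3102, ts("09:36:04.184039"), True),
]


def violations(abort_acked, listings):
    """Return frames of listings that began after the abort was
    acknowledged yet still returned the aborted upload."""
    return [frame for frame, sent, present in listings
            if present and sent > abort_acked]


bad = violations(FIRST_ABORT_ACKED, LISTINGS)
print(bad)
```

Listings sent before the acknowledgement may legitimately return the upload, so only the two later requests are flagged.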

At the end of the run, the data remains on disk in MinIO's data path:

-rw-rw-r-- 1 davidturner davidturner   40 Jul 24 09:36 minio-data/.minio.sys/multipart/17cbdbdcc42fbe23382deea7139bc5928b55786100b8fa636d7ab64c000f85fa/6d3b1fbf-45cc-4e77-b94e-fbf7e2a76264x1753349763030921189/f7156f7b-bdd3-43d3-af32-a7e96f985e9c/part.1
-rw-rw-r-- 1 davidturner davidturner   65 Jul 24 09:36 minio-data/.minio.sys/multipart/17cbdbdcc42fbe23382deea7139bc5928b55786100b8fa636d7ab64c000f85fa/6d3b1fbf-45cc-4e77-b94e-fbf7e2a76264x1753349763030921189/f7156f7b-bdd3-43d3-af32-a7e96f985e9c/part.1.meta

(NB edited these last two lines, sorry, I shared the files from a different occurrence in my OP)
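The upload ID and the on-disk directory can be tied together by base64-decoding the ID. A minimal sketch follows; the interpretation of the decoded fields is an inference from this one sample rather than documented MinIO behaviour, and `split_upload_id` is a hypothetical name. The sample contains no characters that differ between the standard and URL-safe alphabets, so the choice of decoder does not matter here.

```python
import base64
from datetime import datetime, timezone

UPLOAD_ID = "YjM1ZThkMmYtZjZmNC00ZjQzLTllYjYtNjRhYzA5MTZmNzU4LjZkM2IxZmJmLTQ1Y2MtNGU3Ny1iOTRlLWZiZjdlMmE3NjI2NHgxNzUzMzQ5NzYzMDMwOTIxMTg5"


def split_upload_id(upload_id):
    """Decode an upload ID and split it at the first '.'."""
    # Restore any padding that was stripped from the encoded form.
    padded = upload_id + "=" * (-len(upload_id) % 4)
    decoded = base64.urlsafe_b64decode(padded).decode("ascii")
    prefix, _, upload_dir = decoded.partition(".")
    return prefix, upload_dir


prefix, upload_dir = split_upload_id(UPLOAD_ID)

# Assumption: the digits after 'x' are nanoseconds since the Unix epoch.
_, _, nanos = upload_dir.partition("x")
started = datetime.fromtimestamp(int(nanos) // 1_000_000_000, tz=timezone.utc)

print(upload_dir)
print(started.isoformat())
```

For this report's ID, the second decoded field equals the directory name in the listing above, and the trailing digits, read as nanoseconds, fall at 09:36:03 UTC on 2025-07-24, which is consistent with the initiation seen in the capture.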

Context

AWS S3 has strongly-consistent (i.e. linearizable) semantics in its multipart upload APIs, and Elasticsearch relies on the consistency of these APIs for correct operation of its snapshot functionality. Moreover, it's important that partially-uploaded objects are cleaned up on abort to avoid resource exhaustion.

Regression

Unclear.

Your Environment

  • Version used (minio --version): RELEASE.2025-07-23T15-54-02Z
  • Server setup and configuration: Single-node-single-disk, started by MINIO_ROOT_USER=testuser MINIO_ROOT_PASSWORD=password strace -o strace-minio.log -tttt -f ./minio server ${PWD}/minio-data. Data path is a regular ext4 filesystem.
  • Operating System and version (uname -a): Linux david-turner-test-runner-20250626 6.14.0-1011-gcp #11-Ubuntu SMP Thu Jul 10 23:06:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
