Conversation

@jiuker
Contributor

@jiuker jiuker commented May 12, 2025

fix: Use mime encode for Non-US-ASCII metadata

Community Contribution License

All community contributions in this pull request are licensed to the project maintainers
under the terms of the Apache 2 license.
By creating this pull request I represent that I have the right to license the
contributions to the project maintainers under the Apache 2 license.

Description

fix #21256

Motivation and Context

How to test this PR?

mc cp main.go minio9000/mytest/ --attr="minio=Miniö"

# mc stat main.go s3/mytest/main.go

Name      : main.go
Date      : 2025-05-12 15:50:33 CST
Size      : 113 B
ETag      : 5228a21c835cfe6107c7a19f079aeb01
VersionID : b0ad1ece-d742-4cf0-a881-ca15cd748b6c
Type      : file
Metadata  :
  X-Amz-Meta-Minio: =?UTF-8?b?TWluacO2?=
  Content-Type    : application/octet-stream

Before this change, mc stat showed the raw value: X-Amz-Meta-Minio: Miniö
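
For reference, the encoded value in the mc stat output above can be reproduced with Go's standard mime package. This is a minimal sketch, not the code from this PR: encodeMetadataValue is a hypothetical helper name, and the choice of B-encoding with the UTF-8 charset is an assumption based on the output shown.

```go
package main

import (
	"fmt"
	"mime"
)

// encodeMetadataValue is a hypothetical helper (not taken from the PR diff).
// mime.BEncoding.Encode returns plain printable-ASCII input unchanged and
// wraps anything else in an RFC 2047 encoded-word using Base64.
func encodeMetadataValue(v string) string {
	return mime.BEncoding.Encode("UTF-8", v)
}

func main() {
	// The value passed via --attr="minio=Miniö" in the test steps.
	fmt.Println(encodeMetadataValue("Miniö"))
	// An ASCII-only value is passed through as-is.
	fmt.Println(encodeMetadataValue("plain"))
}
```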

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Optimization (provides speedup with no functional changes)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Fixes a regression (If yes, please add commit-id or PR # here)
  • Unit tests added/updated
  • Internal documentation updated
  • Create a documentation update request here

@jiuker jiuker requested review from Copilot, harshavardhana, klauspost and shtripat and removed request for klauspost May 12, 2025 07:52

Copilot AI left a comment

Pull Request Overview

This PR fixes the encoding of Non-US-ASCII metadata by applying MIME encoding with conditional use of either B-encoding or Q-encoding based on the encoded length.

  • Added MIME and Base64 package imports for encoding
  • Updated metadata headers in setObjectHeaders with heuristic-based encoding
  • Introduced the helper function needsMimeEncoding to determine when encoding is required
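
The heuristic summarized above could look roughly like the following. This is a sketch under assumptions, not the merged implementation: only the name needsMimeEncoding comes from the review summary, its body here is a guess, and encodeShortest is a hypothetical helper that keeps whichever of the B and Q forms is shorter.

```go
package main

import (
	"fmt"
	"mime"
)

// needsMimeEncoding reports whether s contains any byte outside printable
// US-ASCII. The name is from the review summary; the body is an assumption.
func needsMimeEncoding(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] < 0x20 || s[i] > 0x7e {
			return true
		}
	}
	return false
}

// encodeShortest is a hypothetical helper: it encodes s with both
// B-encoding (Base64) and Q-encoding and returns the shorter result.
func encodeShortest(s string) string {
	if !needsMimeEncoding(s) {
		return s
	}
	b := mime.BEncoding.Encode("UTF-8", s)
	q := mime.QEncoding.Encode("UTF-8", s)
	if len(q) < len(b) {
		return q
	}
	return b
}

func main() {
	for _, v := range []string{"plain", "ö", "hello world ö"} {
		fmt.Printf("%q -> %s\n", v, encodeShortest(v))
	}
}
```

With this rule, values dominated by non-ASCII bytes tend to come out B-encoded, while mostly ASCII values tend to come out Q-encoded.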

@klauspost
Contributor

It seems like "How to test this PR?" just causes a regression the way you present it. Does AWS S3 do the same? Is there a bug in mc?

@jiuker

This comment was marked as off-topic.

@jiuker jiuker requested a review from klauspost May 12, 2025 08:24
fix: Use mime encode for Non-US-ASCII metadata
@jiuker jiuker force-pushed the fix-Use-mime-encode-for-Non-US-ASCII-metadata branch from e71c5e4 to 67d6f5e Compare May 12, 2025 09:34
@harshavardhana
Member

It seems like "How to test this PR?" just causes a regression the way you present it. Does AWS S3 do the same? Is there a bug in mc?

It is a bug in AWS S3, @klauspost.

@harshavardhana
Member

It looks like @jiuker didn't add the comment explaining that there is a bug in AWS S3. We should document that here, in this code path.

@klauspost
Contributor

Add tests with samples referenced against AWS, so sending [xyz...] in the header should produce the same response [qwe...] as AWS. If you do that on the Server we are fine.

Also fix up minio-go, so it is round-trip safe. That means you get back what you send. If you did the server correctly, that should work against both AWS and MinIO servers.
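
The round-trip property asked for here can be sketched with the standard library alone. This is an illustration under assumptions, not minio-go code: roundTrip is a hypothetical helper, and it assumes the server uses B-encoding with the UTF-8 charset.

```go
package main

import (
	"fmt"
	"mime"
)

// roundTrip is a hypothetical helper: it encodes v the way a server might
// emit it in a response header, then decodes it the way a client would.
func roundTrip(v string) (string, error) {
	encoded := mime.BEncoding.Encode("UTF-8", v)
	// DecodeHeader decodes every encoded-word found in the string.
	return new(mime.WordDecoder).DecodeHeader(encoded)
}

func main() {
	samples := []string{
		"ö",
		"ÄMÄZÕÑ S3",
		"öha, das 你好 sollte eigentlich funktionieren",
	}
	for _, v := range samples {
		got, err := roundTrip(v)
		fmt.Printf("sent=%q got=%q equal=%v err=%v\n", v, got, got == v, err)
	}
}
```

A test against a real server would replace the local encode step with an upload and a metadata read, then compare the decoded value with what was sent.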

@harshavardhana
Member

Add tests with samples referenced against AWS, so sending [xyz...] in the header should produce the same response [qwe...] as AWS. If you do that on the Server we are fine.

@klauspost AWS returns wrong characters. Here is the comment that was missing from the PR, which I had asked @jiuker to add.

// For metadata values like "ö", "ÄMÄZÕÑ S3", and "öha, das 你好 sollte eigentlich
// funktionieren", tested against a real AWS S3 bucket, S3 may encode incorrectly. For
// example, "ö" was encoded as =?UTF-8?B?w4PCtg==?=, producing invalid UTF-8 instead
// of =?UTF-8?B?w7Y=?=. This mirrors errors like the ä½ in another string.
//
// S3 uses B-encoding (Base64) for non-ASCII-heavy metadata and Q-encoding
// (quoted-printable) for mostly ASCII strings. Long strings are split at word
// boundaries to fit RFC 2047’s 75-character limit, ensuring HTTP parser
// compatibility.
//
// However, this splitting increases header size and can introduce errors, unlike Go’s
// mime package in MinIO, which correctly encodes strings with fixed B/Q encodings,
// avoiding S3’s heuristic-driven issues.
//
// For MinIO developers, decode S3 metadata with mime.WordDecoder, validate outputs,
// report encoding bugs to AWS, and use ASCII-only metadata to ensure reliable S3 API
// compatibility.
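
The two encoded-words quoted in the comment can be inspected with mime.WordDecoder, as the comment suggests. A small sketch (decodeWord is a hypothetical wrapper): the first word carries the bytes C3 83 C2 B6, which decode to "Ã¶". That is well-formed UTF-8 but a double-encoded form of "ö", while the second word carries C3 B6, the correct encoding.

```go
package main

import (
	"fmt"
	"mime"
)

// decodeWord is a hypothetical wrapper around mime.WordDecoder.Decode,
// which decodes a single RFC 2047 encoded-word.
func decodeWord(word string) (string, error) {
	return new(mime.WordDecoder).Decode(word)
}

func main() {
	words := []string{
		"=?UTF-8?B?w4PCtg==?=", // value attributed to AWS S3 in the comment
		"=?UTF-8?B?w7Y=?=",     // expected encoding of "ö"
	}
	for _, w := range words {
		s, err := decodeWord(w)
		// %q shows the decoded text, % x shows its raw bytes.
		fmt.Printf("%s -> %q (% x) err=%v\n", w, s, []byte(s), err)
	}
}
```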

doc
doc
@jiuker
Contributor Author

jiuker commented May 13, 2025

Add tests with samples referenced against AWS, so sending [xyz...] in the header should produce the same response [qwe...] as AWS. If you do that on the Server we are fine.

@klauspost AWS returns wrong characters. Here is the comment that was missing from the PR, which I had asked @jiuker to add.

// For metadata values like "ö", "ÄMÄZÕÑ S3", and "öha, das 你好 sollte eigentlich
// funktionieren", tested against a real AWS S3 bucket, S3 may encode incorrectly. For
// example, "ö" was encoded as =?UTF-8?B?w4PCtg==?=, producing invalid UTF-8 instead
// of =?UTF-8?B?w7Y=?=. This mirrors errors like the ä½ in another string.
//
// S3 uses B-encoding (Base64) for non-ASCII-heavy metadata and Q-encoding
// (quoted-printable) for mostly ASCII strings. Long strings are split at word
// boundaries to fit RFC 2047’s 75-character limit, ensuring HTTP parser
// compatibility.
//
// However, this splitting increases header size and can introduce errors, unlike Go’s
// mime package in MinIO, which correctly encodes strings with fixed B/Q encodings,
// avoiding S3’s heuristic-driven issues.
//
// For MinIO developers, decode S3 metadata with mime.WordDecoder, validate outputs,
// report encoding bugs to AWS, and use ASCII-only metadata to ensure reliable S3 API
// compatibility.

Added

@klauspost
Contributor

@harshavardhana Yes. I saw your comment and asked for clarification since only half of it makes sense.

This should still have tests, either here or in mint.

@jiuker
Contributor Author

jiuker commented May 13, 2025

@harshavardhana Yes. I saw your comment and asked for clarification since only half of it makes sense.

This should still have tests, either here or in mint.

I will add a test for this in mint, @klauspost.

Contributor

@shtripat shtripat left a comment

lgtm

@harshavardhana harshavardhana merged commit 12a6ea8 into minio:master May 22, 2025
22 of 23 checks passed

Development

Successfully merging this pull request may close these issues.

Non-US-ASCII metadata can be set using a presigned multipart/form-data request

4 participants