CJK Extension B characters cannot be used in Content-Disposition FileNameStar #28145

Closed
jayspadie opened this issue Dec 11, 2018 · 8 comments

@jayspadie

A file name with CJK Extension B characters, such as 𠀀𠀁𠀂𠀃𪛑𪛒𪛓𪛔𪛕𪛖​.txt, is not properly encoded for the FileNameStar property of the Content-Disposition header. These characters are replaced with \uFFFD, such as ��������������.txt.

Using the code below:

response.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment")
{
  FileName = fileName,
  FileNameStar = fileName
};

We can observe while debugging in Visual Studio:
image

This causes the file download in the browser to be named as ��������������.txt.

The encoding appears to be done in HeaderUtilities.IsInputEncoded5987(). Can anything be done to support these characters?

Thanks - Jay

@karelz
Member

karelz commented Dec 12, 2018

@Caesar1995 you looked into an encoding bug in HTTP headers recently. Can you please check this one too?

@caesar-chen
Contributor

Yes, I can take a look.

@caesar-chen
Contributor

RFC 6266 Appendix D:

Include a "filename*" parameter where the desired filename cannot
be expressed faithfully using the "filename" form. Note that
legacy user agents will not process this, and will fall back to
using the "filename" parameter's content.

From the screenshot, it seems the FileName field has the information you want. Why do you need the FileNameStar field?

@jayspadie
Author

@Caesar1995, also from RFC 6266 Appendix D:

Avoid using non-ASCII characters in the filename parameter.
Although most existing implementations will decode them as
ISO-8859-1, some will apply heuristics to detect UTF-8, and thus
might fail on certain names.

Since FileName is typically decoded as ISO-8859-1, FileNameStar is used by newer browsers to support UTF-8. The general recommendation is to set them both. I'm showing the debug values from Visual Studio in the bug description, but browsers handle it differently. Edge, for example, will download the file as ��������������.txt if FileNameStar is specified. If it is not specified, and we rely only on FileName, then Edge (and IE) incorrectly download the files as shown here:
image

Chrome is apparently smart enough to decode FileName correctly, but prefers the incorrect FileNameStar when both are available. Again, the general recommendation from many sources on the web is to set them both, which typically works great, but these Ext-B characters aren't getting encoded correctly for FileNameStar.

@wfurt
Member

wfurt commented Feb 22, 2019

cc: @rmkerr since you had fun with Unicode recently.

@rmkerr
Contributor

rmkerr commented Feb 22, 2019

The bug is in the encoding function linked in the original post:

https://github.com/dotnet/corefx/blob/144beee5993051c910410f7e6fd6bdf9cd12e6fa/src/System.Net.Http/src/System/Net/Http/Headers/HeaderUtilities.cs#L86-L92

The foreach loop on line 86 walks the input one UTF-16 code unit at a time, so it does not account for Unicode scalar values outside the Basic Multilingual Plane, which are stored as surrogate pairs. It tries to get the UTF-8 encoding of the high and low surrogates separately. Since surrogates are invalid on their own, each one gets replaced with � (U+FFFD).

We should rework this loop to handle surrogate pairs correctly.
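
To illustrate, here is a minimal sketch of one way to avoid the problem: convert the whole string to UTF-8 first, then percent-encode the resulting bytes, so a surrogate pair is always converted as a unit. `Rfc5987Sketch` and its `Encode` method are hypothetical names made up for this example. This is not the corefx code and not necessarily how the fix should be written; the set of unescaped characters is the attr-char production from RFC 5987.

```csharp
using System;
using System.Text;

static class Rfc5987Sketch
{
    // RFC 5987 attr-char: ALPHA / DIGIT / "!" / "#" / "$" / "&" / "+" /
    // "-" / "." / "^" / "_" / "`" / "|" / "~". Everything else is escaped.
    private static bool IsAttrChar(byte b)
    {
        char c = (char)b;
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
               (c >= '0' && c <= '9') || "!#$&+-.^_`|~".IndexOf(c) >= 0;
    }

    // Hypothetical helper (an assumption for illustration only).
    // Encoding.UTF8.GetBytes sees the full string, so a high/low surrogate
    // pair is turned into one 4-byte UTF-8 sequence instead of two U+FFFD.
    public static string Encode(string input)
    {
        var sb = new StringBuilder("utf-8''");
        foreach (byte b in Encoding.UTF8.GetBytes(input))
        {
            if (IsAttrChar(b))
            {
                sb.Append((char)b);
            }
            else
            {
                sb.Append('%').Append(b.ToString("X2"));
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // U+20000 (CJK Extension B), stored in UTF-16 as the pair D840 DC00.
        string name = "\U00020000.txt";

        // Encoding a single code unit reproduces the bug: a lone surrogate
        // has no UTF-8 form, so it becomes U+FFFD.
        byte[] lone = Encoding.UTF8.GetBytes(new[] { name[0] });
        Console.WriteLine(BitConverter.ToString(lone)); // EF-BF-BD

        Console.WriteLine(Encode(name)); // utf-8''%F0%A0%80%80.txt
    }
}
```

An alternative that keeps the per-character loop would be to check `char.IsSurrogatePair(input, i)` and encode both code units together when it returns true.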

@msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits added this to the Future milestone Feb 1, 2020
@ghost

ghost commented Aug 25, 2023

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of our issue cleanup automation.

@ghost added the backlog-cleanup-candidate and no-recent-activity labels Aug 25, 2023
@karelz removed the no-recent-activity and backlog-cleanup-candidate labels Aug 25, 2023
@ManickaP self-assigned this Apr 16, 2025
@ManickaP
Member

This seems to have been fixed long ago by dotnet/corefx#36627.
