Charset filter: improved validation of charset_map with utf-8. #553

Merged (1 commit) on Apr 9, 2025

@pluknet pluknet commented Feb 27, 2025

It was possible to write outside of the buffer boundary used to keep UTF-8 decoded values when parsing conversion table configuration.

Since the write happens before UTF-8 decoding, the fix is to check in advance whether a character code is longer than a 3-byte sequence. Note that this is already enforced later by checking the values decoded by ngx_utf8_decode() against 0xffff, the maximum value that can be encoded as a valid 3-byte sequence, so the fix should not result in functional changes.

Found with AddressSanitizer.
Fixes GitHub issue #529.

@arut arut left a comment

The patch looks ok. However, the commit log says that the check on the ngx_utf8_decode() return value enforces sequences of at most 3 bytes. This does not seem to be the case. Here's an example of a 4-byte sequence for which this function returns 0xb0, which passes the check. Placed at the last table position (0xff), this 4-byte sequence triggers a buffer overflow.

    charset_map foo utf-8 {
        ff c2f0f0f0;
    }

pluknet commented Mar 27, 2025

> The patch looks ok. However the commit log says that ngx_utf8_decode() return value check enforces <= 3 byte sequence. This does not seem to be the case. Here's an example of a 4-byte sequence, for which this function returns 0xb0, which passes the check. This 4-byte sequence at the last table position (0xff) triggers buffer overflow.
>
>     charset_map foo utf-8 {
>         ff c2f0f0f0;
>     }

0xc2f0f0f0 is an invalid sequence. The 0xc2 prefix introduces a 2-byte UTF-8 character, but here it is followed by extra trailing bytes ("f0f0"), which ngx_utf8_decode() simply ignores.
This behaviour is perfectly fine, because ngx_utf8_decode() is meant to be called on byte sequences that follow one another in a stream.
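
To make the 0xb0 result concrete, here is a minimal Python model of a lenient one-code-point decoder. It is only a sketch and not nginx's C code: the function name `lenient_utf8_decode`, the `INVALID` sentinel, and the assumption that a continuation byte is rejected only when it is below 0x80 are illustrative choices made to reproduce the behaviour described in this thread.

```python
INVALID = 0xFFFFFFFF  # hypothetical sentinel for "cannot decode"


def lenient_utf8_decode(data: bytes) -> tuple[int, int]:
    """Decode one code point from the start of `data`.

    Returns (value, bytes_consumed). Bytes after the first code point
    are left untouched for the next call.
    """
    lead = data[0]
    if lead < 0x80:
        return lead, 1

    # Pick the payload bits, the number of continuation bytes, and the
    # largest value that a shorter encoding could already represent.
    if 0xF0 <= lead < 0xF8:
        value, extra, shorter_max = lead & 0x07, 3, 0xFFFF
    elif 0xE0 <= lead < 0xF0:
        value, extra, shorter_max = lead & 0x0F, 2, 0x7FF
    elif 0xC2 <= lead < 0xE0:
        value, extra, shorter_max = lead & 0x1F, 1, 0x7F
    else:
        return INVALID, 1

    if len(data) < 1 + extra:
        return INVALID, len(data)

    for byte in data[1:1 + extra]:
        # Assumption: only bytes below 0x80 are rejected, so 0xF0 is
        # accepted as a continuation byte even though it is not 10xxxxxx.
        if byte < 0x80:
            return INVALID, 1
        value = (value << 6) | (byte & 0x3F)

    if value <= shorter_max:
        return INVALID, 1 + extra  # overlong encoding
    return value, 1 + extra
```

Under these assumptions, decoding `bytes.fromhex("c2f0f0f0")` consumes only the first two bytes and yields 0xb0. The decoded value alone therefore says nothing about how many bytes the configuration entry contained.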

It was possible to write outside of the buffer used to keep UTF-8
decoded values when parsing conversion table configuration.

Since this happened before UTF-8 decoding, the fix is to check in
advance if character codes are of more than 3-byte sequence.  Note
that this is already enforced by a later check for ngx_utf8_decode()
decoded values for 0xffff, which corresponds to the maximum value
encoded as a valid 3-byte sequence, so the fix does not affect the
valid values.

Found with AddressSanitizer.
Fixes GitHub issue nginx#529.
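
The length pre-check described in the commit message can be sketched in Python as follows. This is an illustrative model, not the actual patch: the slot layout (one length byte followed by up to three encoded bytes), the constants, and the `store_mapping` helper are assumptions introduced for the example.

```python
SLOT_SIZE = 4                 # assumed: 1 length byte + up to 3 encoded bytes
TABLE_ENTRIES = 256           # one slot per source byte value 0x00..0xff
MAX_SEQ_LEN = SLOT_SIZE - 1   # longest sequence that fits in a slot


def store_mapping(table: bytearray, src: int, hex_code: str) -> None:
    """Store the bytes given as hex digits into the slot for `src`."""
    seq = bytes.fromhex(hex_code)

    # The check happens before any byte is written and before any UTF-8
    # decoding, so an oversized entry can never spill past its slot.
    if len(seq) > MAX_SEQ_LEN:
        raise ValueError(f"invalid value {hex_code!r}: more than "
                         f"{MAX_SEQ_LEN} bytes")

    start = src * SLOT_SIZE
    table[start] = len(seq)
    table[start + 1:start + 1 + len(seq)] = seq


table = bytearray(TABLE_ENTRIES * SLOT_SIZE)
store_mapping(table, 0xFF, "e282ac")        # 3 bytes: fits the last slot
try:
    store_mapping(table, 0xFF, "c2f0f0f0")  # 4 bytes: rejected up front
except ValueError as err:
    print(err)
```

In this model, a 4-byte entry at position 0xff would need five bytes in a four-byte slot at the very end of the table. In C, that is a write past the end of the buffer, which is why the check has to come before the copy rather than after decoding.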
@pluknet pluknet merged commit a813c63 into nginx:master Apr 9, 2025
1 check passed
@pluknet pluknet deleted the charset branch April 9, 2025 15:37