Charset filter: improved validation of charset_map with utf-8. #553

pluknet · 2025-02-27T16:13:18Z

It was possible to write outside the bounds of the buffer used to keep UTF-8 decoded values when parsing the conversion table configuration.

Since this happens before UTF-8 decoding, the fix is to check in advance whether a character code is longer than a 3-byte sequence. Note that this is already enforced later by checking the values decoded by ngx_utf8_decode() against 0xffff, which is the maximum value that can be encoded as a valid 3-byte sequence, so the fix should not result in functional changes.

Found with AddressSanitizer.
Fixes GitHub issue #529.
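
A minimal sketch of the kind of pre-decode length check described above, not the actual patch. The names UTF_LEN, hex_digit() and store_utf8_entry(), and the slot layout (one length byte followed by up to three data bytes per source character), are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Assumed slot size: 1 length byte + up to 3 bytes of UTF-8 data. */
#define UTF_LEN  4u

/* Hypothetical helper: value of one hex digit, or -1 if it is not one. */
static int
hex_digit(char c)
{
    if (c >= '0' && c <= '9') { return c - '0'; }
    if (c >= 'a' && c <= 'f') { return c - 'a' + 10; }
    if (c >= 'A' && c <= 'F') { return c - 'A' + 10; }
    return -1;
}

/*
 * Hypothetical helper: store the bytes written as the hex string "hex"
 * into the slot for source character "src" in a 256 * UTF_LEN byte table.
 * Returns 0 on success and -1 on invalid input.
 */
static int
store_utf8_entry(unsigned char *table, unsigned src, const char *hex)
{
    int             hi, lo;
    size_t          i, len;
    unsigned char  *p;

    len = strlen(hex);

    if (src > 0xff || len == 0 || len % 2 != 0) {
        return -1;
    }

    /*
     * Reject over-long values before any byte is written, so the copy
     * loop below can never run past the end of the slot (or the table).
     */
    if (len / 2 > UTF_LEN - 1) {
        return -1;
    }

    p = &table[src * UTF_LEN];
    *p++ = (unsigned char) (len / 2);

    for (i = 0; i < len; i += 2) {
        hi = hex_digit(hex[i]);
        lo = hex_digit(hex[i + 1]);

        if (hi < 0 || lo < 0) {
            return -1;
        }

        *p++ = (unsigned char) (hi * 16 + lo);
    }

    /* Zero-pad the rest of the slot. */
    for (i = len / 2; i < UTF_LEN - 1; i++) {
        *p++ = 0;
    }

    return 0;
}
```

With this check in place, a 4-byte value for source character 0xff is refused up front instead of spilling one byte past the last slot.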

arut

The patch looks ok. However, the commit log says that the check of the ngx_utf8_decode() return value enforces a sequence of at most 3 bytes. This does not seem to be the case. Here's an example of a 4-byte sequence for which this function returns 0xb0, which passes the check. Placed at the last table position (0xff), this 4-byte sequence triggers a buffer overflow.

    charset_map foo utf-8 {
        ff c2f0f0f0;
    }

pluknet · 2025-03-27T17:03:26Z

0xc2f0f0f0 is an invalid sequence. The 0xc2 prefix introduces a 2-byte UTF-8 character, but here it is followed by extra trailing bytes ("f0f0"), which ngx_utf8_decode() rightly ignores.
This behaviour is perfectly fine, since ngx_utf8_decode() is meant to be called on combined byte sequences that follow one another.
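
To make the 0xb0 result concrete, here is a simplified decoder sketch. It is not the nginx source: sketch_utf8_decode() is a hypothetical name, and its logic is an assumption modeled on the behaviour described in this thread. The lead byte selects the sequence length, continuation bytes are only required to be >= 0x80, and any bytes past the sequence are left unread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical decoder: decodes one multi-byte character at *p and
 * advances *p past the bytes it consumed.  Returns the code point,
 * 0xffffffff for an invalid sequence, or 0xfffffffe for an incomplete one.
 * ASCII bytes are assumed to be handled by the caller.
 */
static uint32_t
sketch_utf8_decode(const unsigned char **p, size_t n)
{
    size_t    len;
    uint32_t  u, c, valid;

    if (n == 0) {
        return 0xfffffffe;
    }

    u = **p;

    if (u >= 0xf8) {
        (*p)++;
        return 0xffffffff;

    } else if (u >= 0xf0) {          /* 4-byte lead */
        u &= 0x07;
        valid = 0xffff;
        len = 3;

    } else if (u >= 0xe0) {          /* 3-byte lead */
        u &= 0x0f;
        valid = 0x7ff;
        len = 2;

    } else if (u >= 0xc2) {          /* 2-byte lead */
        u &= 0x1f;
        valid = 0x7f;
        len = 1;

    } else {
        (*p)++;
        return 0xffffffff;
    }

    if (n - 1 < len) {
        return 0xfffffffe;
    }

    (*p)++;

    while (len) {
        c = *(*p)++;

        /* Only ASCII is rejected here, so 0xf0 passes as a continuation. */
        if (c < 0x80) {
            return 0xffffffff;
        }

        u = (u << 6) | (c & 0x3f);
        len--;
    }

    /* Reject overlong encodings: the value must exceed the shorter range. */
    if (u > valid) {
        return u;
    }

    return 0xffffffff;
}
```

For the bytes c2 f0 f0 f0 this sketch consumes only c2 f0: the lead byte contributes 0x02, the next byte contributes 0xf0 & 0x3f = 0x30, and (0x02 << 6) | 0x30 = 0xb0. The trailing f0 f0 is never read, so a later check against 0xffff cannot detect that the configured value was four bytes long.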

It was possible to write outside of the buffer used to keep UTF-8 decoded values when parsing the conversion table configuration.

Since this happened before UTF-8 decoding, the fix is to check in advance whether a character code is longer than a 3-byte sequence. Note that this is already enforced by a later check of the values decoded by ngx_utf8_decode() against 0xffff, which is the maximum value that can be encoded as a valid 3-byte sequence, so the fix does not affect valid values.

Found with AddressSanitizer.
Fixes GitHub issue nginx#529.

pluknet mentioned this pull request Feb 27, 2025

ngx_http_charset_map: segmentation fault (read out-of-bounds) #529

Closed

Maryna-f5 added this to the nginx-1.27.5 milestone Mar 4, 2025

pluknet requested a review from arut March 4, 2025 19:27

arut reviewed Mar 20, 2025

arut approved these changes Apr 7, 2025

pluknet force-pushed the charset branch from 6be390c to 723369f Compare April 9, 2025 15:36

pluknet merged commit a813c63 into nginx:master Apr 9, 2025
1 check passed

pluknet deleted the charset branch April 9, 2025 15:37
