[uchardet] Encoding detection error for Chinese document containing Chinese and Japanese text

Notepad3 (x64) 6.26.227.1 beta

This is a condensed sample from a larger document.

Reproduction steps:
1. (Optional) Enable `DevDebugMode=1` under [Settings2] in Notepad3.ini.
2. Save the following text to a file encoded as Chinese GB18030: `寄生少女ゆうちゃん`
3. Reopen the file with Notepad3.
4. Result: the encoding is detected incorrectly and the content is displayed wrong. The title bar shows `UCD='EUC-JP' Conf=95% (reliable (90%)) [Invalid UTF-8] - OS-CP='ANSI(CP-936)`.

This happens for some files with similar content, but not all; it depends on the exact text. For example, a file with the following text is identified correctly: `图书 寄生少女ゆうちゃん`. But once you remove either the "图" or the "书" character, the error occurs.
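
A minimal sketch of why this text is ambiguous, using only Python's standard codecs. It does not reproduce uchardet's scoring; the helper name `decodes_as` is made up for illustration. It only checks that the GB18030 bytes of the sample are also a structurally valid EUC-JP sequence (and not valid UTF-8), which suggests a detector has to fall back on character statistics to tell the two apart.

```python
# Hypothetical helper, not part of Notepad3 or uchardet.
def decodes_as(data: bytes, codec: str) -> bool:
    """Return True if `data` decodes under `codec` without errors."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


sample = "寄生少女ゆうちゃん"
data = sample.encode("gb18030")  # the bytes Notepad3 sees on disk

# Report which candidate encodings accept the byte sequence at all.
for codec in ("gb18030", "euc_jp", "utf-8"):
    print(codec, decodes_as(data, codec))

# The hiragana occupy the same two-byte positions in GB2312 and in
# JIS X 0208 (as used by EUC-JP), so that part decodes identically.
print(data.decode("euc_jp")[-5:] == "ゆうちゃん")
```

Since more than half of this short sample is hiragana with identical byte values in both encodings, it seems plausible that the few Chinese characters do not outweigh the Japanese evidence, and that adding "图书" shifts the balance. That is an interpretation, not something confirmed from the uchardet source.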

EmEditor always detects this correctly and identifies the file as GB2312.
Any ideas on how to improve the detection?

[uchardet] Encoding detection error for Chinese document containing Chinese and Japanese text #5557
