[uchardet] Encoding detection error for Chinese document containing Chinese and Japanese text #5557

@yfdyh000

Notepad3 (x64) 6.26.227.1 beta

The sample below is condensed from a larger document.

Reproduction steps:
1. (Optional) Enable DevDebugMode=1 under [Settings2] in Notepad3.ini.
2. Save the following text to a file encoded as Chinese GB18030: 寄生少女ゆうちゃん
3. Reopen the file with Notepad3.
4. Result: the encoding is detected incorrectly and the content is displayed wrong. The title bar shows: UCD='EUC-JP' Conf=95% (reliable (90%)) [Invalid UTF-8] - OS-CP='ANSI(CP-936)
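To make step 2 reproducible without depending on a particular editor, the test file can be generated with a short script. This is only a sketch: the file name `sample_gb18030.txt` and the helper `write_sample` are hypothetical names chosen for illustration, and the script relies solely on Python's built-in `gb18030` codec.

```python
from pathlib import Path

# Sample text from the reproduction steps.
SAMPLE = "寄生少女ゆうちゃん"


def write_sample(path: Path, text: str = SAMPLE) -> bytes:
    """Encode `text` as GB18030, write it to `path`, and return the raw bytes."""
    data = text.encode("gb18030")
    path.write_bytes(data)
    return data


if __name__ == "__main__":
    # Hypothetical output file name; open this file in Notepad3 afterwards.
    raw = write_sample(Path("sample_gb18030.txt"))
    print(len(raw), "bytes:", raw.hex(" "))
```

Writing the bytes directly avoids any BOM or trailing newline that an editor might add, so the detector sees exactly the encoded sample.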

This happens only for some files with similar content, depending on the exact text. For example, a file containing the following text is identified correctly: 图书 寄生少女ゆうちゃん. But as soon as either the "图" or the "书" character is removed, the error occurs.
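A possible explanation, offered as an assumption rather than a confirmed diagnosis of uchardet: hiragana occupy the same row in GB2312 and in JIS X 0208, so the kana bytes are identical in GB18030 and EUC-JP, and the remaining double-byte sequences may also be structurally valid EUC-JP. The sketch below checks this with Python's standard codecs only; it does not call uchardet, and `try_decode` is a hypothetical helper.

```python
from typing import Optional

TEXT = "寄生少女ゆうちゃん"
KANA = "ゆうちゃん"


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


raw = TEXT.encode("gb18030")

# Decode the same bytes under each candidate encoding.
for enc in ("gb18030", "euc_jp", "utf-8"):
    print(f"{enc:8} -> {try_decode(raw, enc)!r}")

# The kana portion is byte-for-byte identical in both encodings.
print(KANA.encode("gb18030") == KANA.encode("euc_jp"))
```

If the EUC-JP decode succeeds, byte validity alone cannot separate the two encodings for this sample, and the detector has to rely on character statistics. That would be consistent with the result flipping when "图" or "书" is added or removed.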

EmEditor always detects the encoding correctly and identifies the file as GB2312.
Any ideas on how this could be changed?
