@mnako commented Sep 17, 2023

Description

As reported by @nikonov1101 in #49, `CharsetReader` did not correctly look up charset labels that start with `windows-`.

This PR uses the Thai language and its encodings to illustrate the problem, adds test cases, and fixes the bug. The fixed `CharsetReader` first looks up the original label. Only if that fails does it look up the normalised label (with `windows-` replaced by `cp`), and if that also fails, it returns an informative error. Because most labels are expected to be correct as given and need no string replacement, this ordering should also be the fastest approach.

Commits:

  1. Add test cases for the iso-8859-11, windows-874, and tis-620 encodings, using Thai as an example (the tests are expected to fail at this commit);
  2. Fix: modify `decoders.decodeHeader.CharsetReader` to look up the original label first, replace `windows-` with `cp` only if it is not found, and return an informative error if it is still not found. With this change, all test cases pass again.

@mnako mnako self-assigned this Sep 17, 2023
…p the original label, replace `windows-` with `cp` only if not found, and raise informative error, if not found again.
@mnako mnako marked this pull request as ready for review September 17, 2023 06:58
@mnako mnako merged commit da597ba into main Sep 17, 2023
@mnako mnako deleted the bugfix/49/windows-874-enc branch September 17, 2023 11:51