Codecs should raise precise UnicodeDecodeError or UnicodeEncodeError #85287
A number of codecs raise bare UnicodeError, rather than Unicode{Decode,Encode}Error. Example: File "/home/antoine/miniconda3/envs/pyarrow/lib/python3.7/encodings/utf_16.py", line 67, in _buffer_decode A more complete list can be found here: |
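The traceback above is truncated, so the exact failing call is not known. The following is a minimal sketch of one way to reach `_buffer_decode` in the `utf-16` codec, assuming the error in question is the missing-BOM check in the incremental decoder. Whether the exception is a bare `UnicodeError` or a `UnicodeDecodeError` depends on the Python version in use.

```python
import codecs

# Incremental UTF-16 decoder: with no byte order given, it expects a BOM first.
decoder = codecs.getincrementaldecoder("utf-16")()

caught = None
try:
    # Two bytes that are not a BOM.
    decoder.decode(b"ab")
except UnicodeError as exc:
    # UnicodeDecodeError is a subclass of UnicodeError, so this clause
    # catches both the bare and the precise exception.
    caught = exc
    print(type(exc).__name__, "-", exc)
```

Code that catches `UnicodeDecodeError` specifically would miss the bare `UnicodeError` variant, which is the practical cost of the behaviour reported here.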
This looks like an easy task. Shall I create a PR? |
Hi, IMO this can be marked as an easy issue. @thatiparthy please, go ahead |
UnicodeEncodeError and UnicodeDecodeError are used to report un(en|de)codable ranges in the source object, so it wouldn't make sense to use them for errors that have nothing to do with problems in the source object. Their constructor requires 5 arguments (encoding, object, start, end, reason), not just a simple message: e.g. UnicodeEncodeError("utf-8", "foo", 17, 23, "bad string"). But for reporting e.g. missing BOMs at the start it would be useful to use (0, 0) as the offending range. |
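To make the five-argument constructor concrete, here is a short sketch. The input bytes and the reason string are made up for illustration, and using (0, 0) for a missing BOM is only the suggestion from the comment above, not established codec behaviour.

```python
# Arguments, in order: encoding, object, start, end, reason.
data = b"ab"  # illustrative input with no BOM
bom_exc = UnicodeDecodeError(
    "utf-16", data, 0, 0, "stream does not start with BOM"
)

# The pieces remain available as attributes for error handlers and callers.
print(bom_exc.encoding, bom_exc.start, bom_exc.end, bom_exc.reason)

# A single message string is not accepted by the constructor.
rejected = False
try:
    UnicodeDecodeError("just a message")
except TypeError:
    rejected = True
```

This is why switching a codec from `UnicodeError("message")` to the precise subclasses is more than a rename: each raise site needs access to the source object and a range within it.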
@utk You could have taken some other easy issue from https://bugs.python.org/issue?status=1&@sort=-activity&@columns=id%2Cactivity%2Ctitle%2Ccreator%2Cstatus&@dispname=Easy%20issues&@startwith=0&@group=priority&keywords=6&@action=search&@filter=&@pagesize=50 instead of copy pasting my work. |
@thatiparthy These were the most logical changes, standard error messages, which were already there in the existing code, I just edited them as mentioned here. What part of your "work" do you think I copied? |
Hello @pitrou Is this issue still open? |
@SahillMulani Unfortunately yes, and we need a PR. Your contribution is welcomed, no assignment is required. |
@SahillMulani, I am looking to perhaps work on this issue, but will avoid it if you are. Please let me know what your status is. |
@pitrou Regarding the start and end values that are used to communicate where the bad values are, what should be done if the absolute location cannot be determined? There are portions of code, such as with punycode, where the encoded bytes are divided up for processing and the only positional data available at the time of the exception is the relative position within the given segment. Should 0,0 be used in a situation like this? |
@methane I was told you may be an interested party that could answer questions about unicode exception handling. Regarding my question above, what are your expectations with regard to the reported start and end positions? |
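One possible answer to the relative-position question, sketched under assumptions: if the caller that splits the input keeps track of where each segment starts, it can translate a segment-relative range into an absolute one before re-raising. The `decode_segments` helper below is hypothetical and uses plain ASCII decoding as a stand-in; it is not how the punycode codec is implemented.

```python
def decode_segments(data: bytes, sep: bytes = b".") -> list[str]:
    """Hypothetical helper: decode each sep-delimited segment as ASCII,
    reporting failures with offsets relative to the whole input."""
    results = []
    offset = 0  # absolute position of the current segment within data
    for segment in data.split(sep):
        try:
            results.append(segment.decode("ascii"))
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the segment; shift them
            # and attach the full input as the offending object.
            raise UnicodeDecodeError(
                exc.encoding, data,
                offset + exc.start, offset + exc.end, exc.reason,
            ) from exc
        offset += len(segment) + len(sep)
    return results


ok = decode_segments(b"abc.def")

err = None
try:
    decode_segments(b"abc.d\xffe.f")
except UnicodeDecodeError as exc:
    err = exc
    print(exc.start, exc.end)
```

Where no such bookkeeping is possible, falling back to (0, 0) as asked above would still satisfy the constructor, at the cost of a less useful position.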
It seems this issue is not fixed yet. |
Decided to give this a shot as I see it's still unresolved on the The biggest question is whether it's worth having the codec functions save their original arguments just so they can be shown if an exception happens. It might not even be extra memory overhead since the caller is probably still holding a reference to the argument object in most cases. Also this is my first time looking at the CPython codebase and my first time using the C API so please let me know if I'm making any beginner mistakes. |
…deDecodeError (#113674) Co-authored-by: Inada Naoki <[email protected]>
… UnicodeDecodeError (python#113674) Co-authored-by: Inada Naoki <[email protected]>
Isn't this fixed? |
Unicode{Encode, Decode}Error rather than plain UnicodeError #21170
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.