bpo-33305: Improve SyntaxError for invalid numerical literals. #6517
This could be great.
Out of curiosity, would it make sense to include the character causing the trouble in all cases?
```c
            "invalid digit '%c' in octal literal", c);
}
else {
    return syntaxerror(tok, "invalid octal literal");
```
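The logic of this branch can be modelled in isolation. The sketch below is a simplified, hypothetical stand-in, not the CPython tokenizer: the function `check_octal_literal` and its return codes are invented for illustration. After the `0o` prefix it requires a digit at the start and after each underscore. It reports `invalid digit '%c'` only when the offending character is a decimal digit and falls back to `invalid octal literal` otherwise, including at the end of input. It ignores whatever follows the digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of the octal branch under review; not the
 * actual CPython tokenizer code. Checks a literal such as "0o17" and, on
 * failure, writes one of the two messages from the diff into buf.
 * Returns 1 on error, 0 if the digits are valid, -1 if src lacks "0o". */
static int check_octal_literal(const char *src, char *buf, size_t n)
{
    const char *p;
    int c;

    assert(src != NULL && buf != NULL && n > 0);
    if (strncmp(src, "0o", 2) != 0 && strncmp(src, "0O", 2) != 0) {
        return -1;
    }
    p = src + 2;
    c = (unsigned char)*p;
    do {
        if (c == '_') {                   /* one underscore may separate digit groups */
            c = (unsigned char)*++p;
        }
        if (c < '0' || c >= '8') {        /* a digit was required here */
            if (isdigit(c)) {
                snprintf(buf, n, "invalid digit '%c' in octal literal", c);
            }
            else {                        /* anything else, including end of input */
                snprintf(buf, n, "invalid octal literal");
            }
            return 1;
        }
        do {                              /* consume a run of octal digits */
            c = (unsigned char)*++p;
        } while (c >= '0' && c < '8');
    } while (c == '_');
    if (isdigit(c)) {                     /* e.g. "0o18": 8 follows valid digits */
        snprintf(buf, n, "invalid digit '%c' in octal literal", c);
        return 1;
    }
    return 0;
}
```

With this model, `"0o8"` and `"0o18"` both produce `invalid digit '8' in octal literal`, while `"0o+2"` and `"0o_"` get the generic `invalid octal literal`.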
Would it be possible to include the value of `c` in this message as well?
This error is raised when an underscore or `0o` is not followed by a digit. What error messages would be helpful for `0o+2`, `0o + 2`, `(2+0o)`, or `0or[]`?
I don't know; I'd expect "invalid character '%c' in octal literal" to be useful in all cases.
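To see what this suggestion would produce for the inputs asked about above, here is a hypothetical sketch. The name `naive_octal_message` is invented for illustration. It finds the first `0o`, skips octal digits and underscores, and formats whatever byte comes next with `%c`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the suggestion applied naively: find the first
 * "0o" in src, skip octal digits and underscores, then format whatever
 * byte comes next with %c. */
static void naive_octal_message(const char *src, char *buf, size_t n)
{
    const char *p = strstr(src, "0o");

    assert(p != NULL && buf != NULL && n > 0);
    p += 2;                                       /* skip the "0o" prefix */
    while ((*p >= '0' && *p <= '7') || *p == '_') {
        p++;
    }
    /* %c writes exactly one byte, whatever that byte is */
    snprintf(buf, n, "invalid character '%c' in octal literal", *p);
}
```

For `0or[]` and `(2+0o)` the result is readable (`'r'` and `')'`), and `0o + 2` yields a quoted space. At the end of input, however, `%c` embeds a NUL byte, so the message is cut off after the opening quote. For a non-ASCII character such as `é`, only the first byte of its UTF-8 encoding is printed.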
It is easy to report the character only when it is an invalid digit (2-9 in a binary literal, 8-9 in an octal literal). In the general case there are many subtle details, and handling them would complicate the code too much:
- An invalid character does not always exist: this error can be raised at the end of the input.
- It can be non-ASCII. In that case we would need to decode a multibyte UTF-8 sequence to get the character.
- It can be non-printable.
- Even if it is printable from Unicode's point of view, it can look indistinguishable from other characters. For example, a non-breaking space looks like an ordinary space to humans, but not to the Python parser.
- Even in ASCII there are non-printable characters, or characters that need special handling: tab, newline, single quote, backslash, ...
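Handling the cases in this list takes noticeably more code than a single `%c`. The following is a rough sketch of what that might involve. It assumes a hypothetical helper, `describe_char`, that receives a pointer to the offending UTF-8 bytes. Nothing like it exists in the PR, and it deliberately skips validation details such as overlong forms and surrogates.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of CPython) that describes the character
 * starting at s for an error message, covering the cases listed above.
 * Minimal: it does not reject overlong UTF-8 forms or surrogates. */
static void describe_char(const char *s, char *buf, size_t n)
{
    static const char specials[] = "\t\n'\\";
    static const char *const escaped[] = { "'\\t'", "'\\n'", "'\\''", "'\\\\'" };
    const unsigned char *p = (const unsigned char *)s;
    const char *hit;
    unsigned int cp;
    int len, i;

    assert(s != NULL && buf != NULL && n > 0);
    if (*p == '\0') {                         /* no character at all */
        snprintf(buf, n, "end of input");
        return;
    }
    hit = strchr(specials, *p);               /* ASCII that needs an escape */
    if (hit != NULL) {
        snprintf(buf, n, "%s", escaped[hit - specials]);
        return;
    }
    if (*p < 0x80) {
        if (*p >= 0x20 && *p < 0x7f) {        /* printable ASCII */
            snprintf(buf, n, "'%c'", *p);
        }
        else {                                /* other control characters */
            snprintf(buf, n, "'\\x%02x'", (unsigned int)*p);
        }
        return;
    }
    /* Non-ASCII: the lead byte gives the length of the UTF-8 sequence. */
    if ((*p & 0xE0) == 0xC0)      { cp = *p & 0x1F; len = 2; }
    else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; len = 3; }
    else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; len = 4; }
    else {
        snprintf(buf, n, "invalid UTF-8 byte 0x%02x", (unsigned int)*p);
        return;
    }
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80) {          /* truncated or malformed sequence */
            snprintf(buf, n, "invalid UTF-8 byte 0x%02x", (unsigned int)*p);
            return;
        }
        cp = (cp << 6) | (p[i] & 0x3Fu);
    }
    /* Print the code point, not the glyph: U+00A0 (no-break space) would
     * otherwise look exactly like an ordinary space. */
    snprintf(buf, n, "U+%04X", cp);
}
```

In this sketch a no-break space is rendered as `U+00A0`, a tab as `'\t'`, and a lone lead byte `0xc3` as invalid UTF-8. A real implementation would also have to decide how each description fits into the individual error messages.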
It may be worth producing more specialized error messages for some cases, but simply reporting the next invalid character is not the way to go.
Oh, indeed, I hadn't thought about all these issues...
I'm going to merge this PR if there are no other suggestions.
The new messages look great to me; I haven't reviewed the C code changes in detail.
https://bugs.python.org/issue33305