Rationalize name-char #1008
Here is a first cut at changes for name-char. Feedback welcome.
A rather cursory initial pass.
In addition to line comments, this PR needs to be accompanied by some test cases.
spec/message.abnf
Outdated
/ %xA1-61B ; omit Cc %x7F-9F, Whitespace %xA0, Ascii 【`】 【{|}~】
/ %x61D-167F ; omit BidiControl %x61C
/ %x1681-1FFF ; omit Whitespace %x1680
In the other character range definitions, the "omit" comments are offset by one compared to how they're here, as in (showing these three lines only as an example):
/ %xA1-61B ; omit Cc %x7F-9F, Whitespace %xA0, Ascii 【`】 【{|}~】
/ %x61D-167F ; omit BidiControl %x61C
/ %x1681-1FFF ; omit Whitespace %x1680
/ %xA1-61B ; omit BidiControl %x61C
/ %x61D-167F ; omit Whitespace %x1680
/ %x1681-1FFF ; omit Whitespace %x2000-200A
The same style should be used in all these comments.
I don't see a difference on my screen. Do you want an additional space before or after 'omit', or a space deleted before or after 'omit'?
I changed the EA brackets to guillemets, since they line up better for monospace.
I meant offset in a vertical direction, so a comment like "omit BidiControl %x61C" should follow the range %xA1-61B, rather than the range %x61D-167F.
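To make the requested layout concrete, here is a minimal Python sketch that derives each "omit" comment from the gap that follows a range, so the comment sits on the line of the range before the gap. The range values are copied from the diff lines quoted in this thread; the helper names `gaps_after` and `fmt` are hypothetical and are not part of the spec or its tooling, and the category labels (BidiControl, Whitespace) are left out.

```python
# Ranges copied from the ABNF lines quoted in this thread.
RANGES = [
    (0xA1, 0x61B),
    (0x61D, 0x167F),
    (0x1681, 0x1FFF),
    (0x200B, 0x200D),
]


def gaps_after(ranges):
    """Pair each range (except the last) with the code points skipped before the next range."""
    pairs = []
    for (lo, hi), (next_lo, _next_hi) in zip(ranges, ranges[1:]):
        pairs.append(((lo, hi), (hi + 1, next_lo - 1)))
    return pairs


def fmt(lo, hi):
    """Format a code point or range in the ABNF hex style used in the diff."""
    return f"%x{lo:X}" if lo == hi else f"%x{lo:X}-{hi:X}"


# Print each range followed by the gap that comes after it.
for (lo, hi), (gap_lo, gap_hi) in gaps_after(RANGES):
    print(f"/ {fmt(lo, hi)} ; omit {fmt(gap_lo, gap_hi)}")
```

With this convention, the comment for %x61C lands on the %xA1-61B line, which matches the placement asked for above.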
Ok, will work on that.
spec/message.abnf
Outdated
/ %xA1-61B ; omit Cc %x7F-9F, Whitespace %xA0, Ascii 【`】 【{|}~】
/ %x61D-167F ; omit BidiControl %x61C
/ %x1681-1FFF ; omit Whitespace %x1680
/ %x200B-200D ; omit Whitespace %x2000-200A
This set is ZWSP, ZWNJ, and ZWJ. Should they really be included in name-start? That seems surprising to me, and with no positive utility.
We will need ZWNJ and ZWJ within names, though, so maybe it's fine for them to be here. But why ZWSP?
We still have name-char. Why not put the joiners in there?
I kind of also question ZWSP.
There is no particular utility to having ZWSP start a name-char, nor any utility to having it end a name-char.
It doesn't hurt to move that one (ZWSP) to name-char, but it doesn't really make a dent either — and we really wouldn't want to go too far down the very long and slippery slope. That's for linters and guidance.
That being said, if people want it out I can remove it.
The question here is why put these characters in name-start, where they have no utility? At least in name-char they would be enclosed or at the end?
What I'm afraid of is that if we move that to name-char, it will just open it up to people endlessly complaining that:
"ZWSP is in name-char instead of name-start: why is XXX in name-start when it should also just be in name-char???"
That is, the basic difference between name-char and name-start is that:
- name is used in identifiers and variables, and can't start with a digit, -, or .
- name-char is used in literals, and can start with a digit, -, or .
The syntactic motivation is clear: to make sure that identifiers and variables are distinguishable from numbers. That is a clear syntactic need.
ZWSP certainly isn't needed at the start of an identifier or variable, but there is a large and complicated list of characters that are also not needed at the start of identifiers and variables, and plucking just one of those characters out, without any syntactic need, doesn't actually provide much value.
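The syntactic point can be illustrated with a small Python sketch. It assumes a simplified, ASCII-only stand-in for the grammar: a name is one name-start character followed by any number of name-char characters, and an unquoted literal is one or more name-char characters. The character sets and the function names `is_name` and `is_unquoted_literal` are hypothetical simplifications for illustration only, not the actual ABNF.

```python
import re

# Hypothetical ASCII-only stand-ins for the real ABNF character classes.
NAME_START = r"[A-Za-z_]"          # no digits, no "-", no "."
NAME_CHAR = r"[A-Za-z_0-9.\-]"     # name-start plus digits, "-", "."

# Assumed shapes: name = name-start *name-char, literal = 1*name-char.
NAME = re.compile(rf"{NAME_START}{NAME_CHAR}*")
UNQUOTED_LITERAL = re.compile(rf"{NAME_CHAR}+")


def is_name(text):
    """True if text matches the simplified name rule."""
    return NAME.fullmatch(text) is not None


def is_unquoted_literal(text):
    """True if text matches the simplified unquoted-literal rule."""
    return UNQUOTED_LITERAL.fullmatch(text) is not None


for sample in ("count", "user-name", "1.5", "-x"):
    print(sample, is_name(sample), is_unquoted_literal(sample))
```

Under these assumptions, something that looks like a number, such as 1.5, can be a literal but never a name, which is the distinction the restriction on the first character is meant to guarantee.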
Given that we do not allow space characters or control characters, I'd prefer not allowing zero-width spaces in names or unquoted literals.
A zero-width space is not a space; that is just a name used for familiarity. It is a Format character, like many others.
https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7Bgc%3Dformat%7D&g=&i=
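The category claim can be checked locally with Python's standard `unicodedata` module. This is only a quick sketch: the results reflect whichever Unicode version ships with the Python build in use, and the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(code_point):
    """Return (name, general category, str.isspace()) for a code point."""
    ch = chr(code_point)
    return (unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch), ch.isspace())


# Characters mentioned in this thread: NBSP, OGHAM SPACE MARK, ZWSP, ZWNJ, ZWJ, ALM.
for cp in (0x00A0, 0x1680, 0x200B, 0x200C, 0x200D, 0x061C):
    name, category, is_space = describe(cp)
    print(f"U+{cp:04X} {name}: gc={category}, isspace={is_space}")
```

U+200B reports general category Cf (Format), while U+00A0 and U+1680 report Zs (Space_Separator).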
A good start. Some small things.
Co-authored-by: Eemeli Aro
Co-authored-by: Addison Phillips
This also needs to be fixed: message-format-wg/spec/syntax.md Lines 789 to 793 in 1fccd1e
spec/syntax.md
Outdated
They are similar to <cite>Namespaces in XML 1.0</cite>'s [NCName](https://www.w3.org/TR/xml-names/#NT-NCName),
but have been updated to be more consistent.
Given how far we've departed from that spec, this note seems only relevant within the history of the spec, and not with where we're ending up with this PR. I'd prefer dropping it, and moving the preceding sentence (if keeping) above the preceding note.
ok.
Co-authored-by: Addison Phillips
Co-authored-by: Eemeli Aro
/ %x2B ; «+» omit Ascii: «,-./0123456789:;<=>?@» «[\]^»
/ %x5F ; «_» omit Cc: %x7F-9F, Whitespace: %xA0, Ascii: «`» «{|}~»
Why are there two sets of guillemet-quoted ASCII characters here?
/ %x2B ; «+» omit Ascii: «,-./0123456789:;<=>?@» «[\]^»
/ %x5F ; «_» omit Cc: %x7F-9F, Whitespace: %xA0, Ascii: «`» «{|}~»
/ %x2B ; «+» omit Ascii: «,-./0123456789:;<=>?@[]^» and REVERSE SOLIDUS "\"
/ %x5F ; «_» omit Ascii: «`{|}~», Cc: %x7F-9F, Whitespace: %xA0
#724 Rationalize name-char