Commit b343bfa

macchiati, aphillips, and eemeli authored
Rationalize name-char (#1008)
* Rationalize name-char #724
* Make corresponding changes in syntax.md
* Fix long lines
* Fix long lines 2
* Update spec/syntax.md
  Co-authored-by: Eemeli Aro <[email protected]>
* Apply suggestions from code review
  Co-authored-by: Addison Phillips <[email protected]>
* Review comment re XML
* Drop Cs
* Remove Cs from exclusions
* Fix reference to XML
* Put 'omit' on line before
* Put omit on previous line
* Drop XML reference
* Apply suggestions from code review
  Co-authored-by: Addison Phillips <[email protected]>
  Co-authored-by: Eemeli Aro <[email protected]>

---------

Co-authored-by: Addison Phillips <[email protected]>
Co-authored-by: Eemeli Aro <[email protected]>
1 parent 1fccd1e commit b343bfa

2 files changed

+88
-20
lines changed

spec/message.abnf

Lines changed: 31 additions & 8 deletions
@@ -49,18 +49,41 @@ local = %s".local"
 match = %s".match"
 
 ; Names and identifiers
-; identifier matches https://www.w3.org/TR/REC-xml-names/#NT-QName
-; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName but excludes U+FFFD and U+061C
 identifier = [namespace ":"] name
 namespace = name
 name = [bidi] name-start *name-char [bidi]
-name-start = ALPHA / "_"
-           / %xC0-D6 / %xD8-F6 / %xF8-2FF
-           / %x370-37D / %x37F-61B / %x61D-1FFF / %x200C-200D
-           / %x2070-218F / %x2C00-2FEF / %x3001-D7FF
-           / %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
+name-start = ALPHA
+           ; omit Cc: %x0-1F, Whitespace: SPACE, Ascii: «!"#$%&'()*»
+           / %x2B ; «+» omit Ascii: «,-./0123456789:;<=>?@» «[\]^»
+           / %x5F ; «_» omit Cc: %x7F-9F, Whitespace: %xA0, Ascii: «`» «{|}~»
+           / %xA1-61B ; omit BidiControl: %x61C
+           / %x61D-167F ; omit Whitespace: %x1680
+           / %x1681-1FFF ; omit Whitespace: %x2000-200A
+           / %x200B-200D ; omit BidiControl: %x200E-200F
+           / %x2010-2027 ; omit Whitespace: %x2028-2029 %x202F, BidiControl: %x202A-202E
+           / %x2030-205E ; omit Whitespace: %x205F
+           / %x2060-2065 ; omit BidiControl: %x2066-2069
+           / %x206A-2FFF ; omit Whitespace: %x3000
+           / %x3001-D7FF ; omit Cs: %xD800-DFFF
+           / %xE000-FDCF ; omit NChar: %xFDD0-FDEF
+           / %xFDF0-FFFD ; omit NChar: %xFFFE-FFFF
+           / %x10000-1FFFD ; omit NChar: %x1FFFE-1FFFF
+           / %x20000-2FFFD ; omit NChar: %x2FFFE-2FFFF
+           / %x30000-3FFFD ; omit NChar: %x3FFFE-3FFFF
+           / %x40000-4FFFD ; omit NChar: %x4FFFE-4FFFF
+           / %x50000-5FFFD ; omit NChar: %x5FFFE-5FFFF
+           / %x60000-6FFFD ; omit NChar: %x6FFFE-6FFFF
+           / %x70000-7FFFD ; omit NChar: %x7FFFE-7FFFF
+           / %x80000-8FFFD ; omit NChar: %x8FFFE-8FFFF
+           / %x90000-9FFFD ; omit NChar: %x9FFFE-9FFFF
+           / %xA0000-AFFFD ; omit NChar: %xAFFFE-AFFFF
+           / %xB0000-BFFFD ; omit NChar: %xBFFFE-BFFFF
+           / %xC0000-CFFFD ; omit NChar: %xCFFFE-CFFFF
+           / %xD0000-DFFFD ; omit NChar: %xDFFFE-DFFFF
+           / %xE0000-EFFFD ; omit NChar: %xEFFFE-EFFFF
+           / %xF0000-FFFFD ; omit NChar: %xFFFFE-FFFFF
+           / %x100000-10FFFD ; omit NChar: %x10FFFE-10FFFF
 name-char = name-start / DIGIT / "-" / "."
-          / %xB7 / %x300-36F / %x203F-2040
 
 ; Restrictions on characters in various contexts
 simple-start-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
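
For review purposes, the new `name-start` and `name-char` rules can be checked with a small range table. The following is a minimal sketch, not part of the commit: the function names (`is_name_start`, `is_name_char`, `is_name_core`) are hypothetical, and it deliberately ignores the optional leading and trailing `[bidi]` in `name`, since the `bidi` rule is not shown in this diff.

```python
# Sketch only: transcribes the name-start ranges added in spec/message.abnf.
# All names here are illustrative assumptions, not part of the MessageFormat spec.
NAME_START_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A),  # ALPHA
    (0x2B, 0x2B),                # «+»
    (0x5F, 0x5F),                # «_»
    (0xA1, 0x61B),
    (0x61D, 0x167F),
    (0x1681, 0x1FFF),
    (0x200B, 0x200D),
    (0x2010, 0x2027),
    (0x2030, 0x205E),
    (0x2060, 0x2065),
    (0x206A, 0x2FFF),
    (0x3001, 0xD7FF),
    (0xE000, 0xFDCF),
    (0xFDF0, 0xFFFD),
] + [
    # Planes 1 through 16, each minus its two trailing noncharacters.
    (plane << 16, (plane << 16) | 0xFFFD) for plane in range(1, 17)
]


def is_name_start(ch: str) -> bool:
    """Return True if the single code point ch matches name-start."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_START_RANGES)


def is_name_char(ch: str) -> bool:
    """name-char = name-start / DIGIT / "-" / "." """
    return is_name_start(ch) or ch in "0123456789-."


def is_name_core(s: str) -> bool:
    """Check `name-start *name-char`, without the optional [bidi] wrappers."""
    return bool(s) and is_name_start(s[0]) and all(is_name_char(c) for c in s[1:])
```

Under this sketch, `is_name_core("a:b")` is false because U+003A is among the omitted ASCII characters, while U+00B7, which the old grammar allowed only as a `name-char`, is now accepted in first position as well.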

spec/syntax.md

Lines changed: 57 additions & 12 deletions
@@ -777,6 +777,8 @@ that is, if they consist of the same sequence of Unicode code points after
 [Unicode Normalization Form C](https://unicode.org/reports/tr15/) ("NFC")
 has been applied to both.
 
+The _names_ are [immutable identifiers](https://www.unicode.org/reports/tr31/#Immutable_Identifier_Syntax).
+
 > [!NOTE]
 > Implementations are not required to normalize all _names_.
 > Comparisons of _name_ values only need be done "as-if" normalization
@@ -786,12 +788,6 @@ has been applied to both.
 > implementations can often substitute checking for actually applying normalization
 > to _name_ values.
 
-Valid content for _names_ is based on <cite>Namespaces in XML 1.0</cite>'s
-[NCName](https://www.w3.org/TR/xml-names/#NT-NCName).
-This is different from XML's [Name](https://www.w3.org/TR/xml/#NT-Name)
-in that it MUST NOT contain a U+003A COLON `:`.
-Otherwise, the set of characters allowed in a _name_ is large.
-
 > [!NOTE]
 > _External variables_ can be passed in that are not valid _names_.
 > Such variables cannot be referenced in a _message_,
@@ -843,15 +839,64 @@ option = identifier o "=" o (literal / variable)
 identifier = [namespace ":"] name
 namespace = name
 name = [bidi] name-start *name-char [bidi]
-name-start = ALPHA / "_"
-           / %xC0-D6 / %xD8-F6 / %xF8-2FF
-           / %x370-37D / %x37F-61B / %x61D-1FFF / %x200C-200D
-           / %x2070-218F / %x2C00-2FEF / %x3001-D7FF
-           / %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
+name-start = ALPHA
+           ; omit Cc: %x0-1F, Whitespace: « », Ascii: «!"#$%&'()*»
+           / %x2B ; «+» omit Ascii: «,-./0123456789:;<=>?@» «[\]^»
+           / %x5F ; «_» omit Cc: %x7F-9F, Whitespace: %xA0, Ascii: «`» «{|}~»
+           / %xA1-61B ; omit BidiControl: %x61C
+           / %x61D-167F ; omit Whitespace: %x1680
+           / %x1681-1FFF ; omit Whitespace: %x2000-200A
+           / %x200B-200D ; omit BidiControl: %x200E-200F
+           / %x2010-2027 ; omit Whitespace: %x2028-2029 %x202F, BidiControl: %x202A-202E
+           / %x2030-205E ; omit Whitespace: %x205F
+           / %x2060-2065 ; omit BidiControl: %x2066-2069
+           / %x206A-2FFF ; omit Whitespace: %x3000
+           / %x3001-D7FF ; omit Cs: %xD800-DFFF
+           / %xE000-FDCF ; omit NChar: %xFDD0-FDEF
+           / %xFDF0-FFFD ; omit NChar: %xFFFE-FFFF
+           / %x10000-1FFFD ; omit NChar: %x1FFFE-1FFFF
+           / %x20000-2FFFD ; omit NChar: %x2FFFE-2FFFF
+           / %x30000-3FFFD ; omit NChar: %x3FFFE-3FFFF
+           / %x40000-4FFFD ; omit NChar: %x4FFFE-4FFFF
+           / %x50000-5FFFD ; omit NChar: %x5FFFE-5FFFF
+           / %x60000-6FFFD ; omit NChar: %x6FFFE-6FFFF
+           / %x70000-7FFFD ; omit NChar: %x7FFFE-7FFFF
+           / %x80000-8FFFD ; omit NChar: %x8FFFE-8FFFF
+           / %x90000-9FFFD ; omit NChar: %x9FFFE-9FFFF
+           / %xA0000-AFFFD ; omit NChar: %xAFFFE-AFFFF
+           / %xB0000-BFFFD ; omit NChar: %xBFFFE-BFFFF
+           / %xC0000-CFFFD ; omit NChar: %xCFFFE-CFFFF
+           / %xD0000-DFFFD ; omit NChar: %xDFFFE-DFFFF
+           / %xE0000-EFFFD ; omit NChar: %xEFFFE-EFFFF
+           / %xF0000-FFFFD ; omit NChar: %xFFFFE-FFFFF
+           / %x100000-10FFFD ; omit NChar: %x10FFFE-10FFFF
 name-char = name-start / DIGIT / "-" / "."
-          / %xB7 / %x300-36F / %x203F-2040
 ```
 
+> [!NOTE]
+> Syntactically, the definitions of `identifier` and `name-char` provide backwards compatibility over time by allowing a stable,
+> wide range of characters.
+> So when there is a new character in a version of Unicode, it can be used in any conformant implementation of MessageFormat.
+> The definition currently excludes:
+> * Most ASCII except for letters and characters used for numbers
+> * This avoids conflicts with syntax characters, and reserves some characters for future syntax.
+> * Bidirectional controls (`Bidi_C`)
+> * Control characters (`GC=Cc`, but not Format characters: `GC=Cf`)
+> * Whitespace characters (`WSpace`)
+> * Surrogate code points (`GC=Cs`)
+> * Non-Characters (`NChar`)
+
+This syntax allows a wide range of characters in _names_ and _identifiers_.
+Implementers and authors of _functions_ and _messages_,
+including _functions_, _options_, and _operands_ (variable names),
+SHOULD avoid creating _names_ that could produce confusion or harm usability
+by choosing names consistent with the following guidelines.
+MessageFormat tools, such as linters, SHOULD warn when _names_ chosen by users
+violate these constraints.
+>
+> 1. [Unicode Default Identifier Syntax](https://www.unicode.org/reports/tr31/#Default_Identifier_Syntax)
+> 2. [Unicode General Security Profile for Identifiers](https://www.unicode.org/reports/tr39/#General_Security_Profile)
+
 ### Escape Sequences
 
 An **_<dfn>escape sequence</dfn>_** is a two-character sequence starting with
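
The context lines of the first `spec/syntax.md` hunk state that _names_ are equal if their code point sequences match after NFC, and that implementations may check for normalization instead of applying it. A minimal sketch of that "as-if" comparison, using Python's standard `unicodedata` module, is shown here; the helper names `names_equal` and `needs_normalization` are hypothetical and not taken from the spec.

```python
import unicodedata


def names_equal(a: str, b: str) -> bool:
    """Compare two name values as if both had been normalized to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def needs_normalization(name: str) -> bool:
    """Cheap pre-check: True only when the name is not already in NFC.

    unicodedata.is_normalized requires Python 3.8 or later.
    """
    return not unicodedata.is_normalized("NFC", name)
```

An implementation could call `needs_normalization` first and skip the normalization step for the common case where a name is already in NFC.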
