diff --git a/spec/formatting.md b/spec/formatting.md
index 34b5c5f028..c141451217 100644
--- a/spec/formatting.md
+++ b/spec/formatting.md
@@ -768,7 +768,16 @@ That is, the text can can consist of a mixture of left-to-right and right-to-lef
The display of bidirectional text is defined by the
[Unicode Bidirectional Algorithm](http://www.unicode.org/reports/tr9/) [UAX9].
-The directionality of the message as a whole is provided by the _formatting context_.
+The directionality of the formatted _message_ as a whole is provided by the _formatting context_.
+
+> [!NOTE]
+> Keep in mind the difference between the formatted output of a _message_,
+> which is the topic of this section,
+> and the syntax of a _message_ prior to formatting.
+> The processing of a _message_ depends on the logical sequence of Unicode code points,
+> not on the presentation of the _message_.
+> The syntax provides affordances, such as bidirectional marks and isolates
+> permitted in _whitespace_, that give users appropriate control over
+> how the _message_'s syntax is displayed.
When a _message_ is formatted, _placeholders_ are replaced
with their formatted representation.
diff --git a/spec/message.abnf b/spec/message.abnf
index a5966ee0bf..0b251ce270 100644
--- a/spec/message.abnf
+++ b/spec/message.abnf
@@ -1,41 +1,41 @@
message = simple-message / complex-message
-simple-message = [s] [simple-start pattern]
+simple-message = o [simple-start pattern]
simple-start = simple-start-char / escaped-char / placeholder
pattern = *(text-char / escaped-char / placeholder)
placeholder = expression / markup
-complex-message = [s] *(declaration [s]) complex-body [s]
+complex-message = o *(declaration o) complex-body o
declaration = input-declaration / local-declaration
complex-body = quoted-pattern / matcher
-input-declaration = input [s] variable-expression
-local-declaration = local s variable [s] "=" [s] expression
+input-declaration = input o variable-expression
+local-declaration = local s variable o "=" o expression
-quoted-pattern = "{{" pattern "}}"
+quoted-pattern = o "{{" pattern "}}"
-matcher = match-statement s variant *([s] variant)
+matcher = match-statement s variant *(o variant)
match-statement = match 1*(s selector)
selector = variable
-variant = key *(s key) [s] quoted-pattern
+variant = key *(s key) quoted-pattern
key = literal / "*"
; Expressions
expression = literal-expression
/ variable-expression
/ function-expression
-literal-expression = "{" [s] literal [s function] *(s attribute) [s] "}"
-variable-expression = "{" [s] variable [s function] *(s attribute) [s] "}"
-function-expression = "{" [s] function *(s attribute) [s] "}"
+literal-expression = "{" o literal [s function] *(s attribute) o "}"
+variable-expression = "{" o variable [s function] *(s attribute) o "}"
+function-expression = "{" o function *(s attribute) o "}"
-markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}" ; open and standalone
- / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}" ; close
+markup = "{" o "#" identifier *(s option) *(s attribute) o ["/"] "}" ; open and standalone
+ / "{" o "/" identifier *(s option) *(s attribute) o "}" ; close
; Expression and literal parts
function = ":" identifier *(s option)
-option = identifier [s] "=" [s] (literal / variable)
+option = identifier o "=" o (literal / variable)
-attribute = "@" identifier [[s] "=" [s] (literal / variable)]
+attribute = "@" identifier [o "=" o (literal / variable)]
variable = "$" name
@@ -52,13 +52,13 @@ match = %s".match"
; Names and identifiers
; identifier matches https://www.w3.org/TR/REC-xml-names/#NT-QName
-; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName but excludes U+FFFD
+; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName but excludes U+FFFD and U+061C
identifier = [namespace ":"] name
namespace = name
-name = name-start *name-char
+name = [bidi] name-start *name-char [bidi]
name-start = ALPHA / "_"
/ %xC0-D6 / %xD8-F6 / %xF8-2FF
- / %x370-37D / %x37F-1FFF / %x200C-200D
+ / %x370-37D / %x37F-61B / %x61D-1FFF / %x200C-200D
/ %x2070-218F / %x2C00-2FEF / %x3001-D7FF
/ %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
name-char = name-start / DIGIT / "-" / "."
@@ -66,8 +66,8 @@ name-char = name-start / DIGIT / "-" / "."
; Restrictions on characters in various contexts
simple-start-char = content-char / "@" / "|"
-text-char = content-char / s / "." / "@" / "|"
-quoted-char = content-char / s / "." / "@" / "{" / "}"
+text-char = content-char / ws / "." / "@" / "|"
+quoted-char = content-char / ws / "." / "@" / "{" / "}"
content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
/ %x0B-0C ; omit CR (%x0D)
/ %x0E-1F ; omit SP (%x20)
@@ -83,5 +83,15 @@ content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
escaped-char = backslash ( backslash / "{" / "|" / "}" )
backslash = %x5C ; U+005C REVERSE SOLIDUS "\"
-; Whitespace
-s = 1*( SP / HTAB / CR / LF / %x3000 )
+; Required whitespace
+s = *bidi ws o
+
+; Optional whitespace
+o = *(ws / bidi)
+
+; Bidirectional marks and isolates
+; ALM / LRM / RLM / LRI, RLI, FSI & PDI
+bidi = %x061C / %x200E / %x200F / %x2066-2069
+
+; Whitespace characters
+ws = SP / HTAB / CR / LF / %x3000
diff --git a/spec/syntax.md b/spec/syntax.md
index aef6720684..ea55af8a06 100644
--- a/spec/syntax.md
+++ b/spec/syntax.md
@@ -134,17 +134,23 @@ A **_local variable_** is a _variable_ created as the result of a _lo
> > An exception to this is: whitespace inside a _pattern_ is **always** significant.
> [!NOTE]
-> The syntax assumes that each _message_ will be displayed with a left-to-right display order
+> The MessageFormat 2 syntax assumes that each _message_ will be displayed
+> with a left-to-right display order
> and be processed in the logical character order.
-> The syntax also permits the use of right-to-left characters in _identifiers_,
+> The syntax permits the use of right-to-left characters in _identifiers_,
> _literals_, and other values.
-> This can result in confusion when viewing the _message_.
+> This can result in confusion when viewing the _message_,
+> or lead users to insert bidi controls or marks that negatively affect
+> the output of the _message_.
+>
+> To assist with this, the syntax permits the use of various controls and
+> strongly directional marks in both optional and required _whitespace_
+> in a _message_, and encourages the use of isolating controls
+> with _expressions_ and _quoted patterns_.
+> See [whitespace](#whitespace) below for more information.
>
-> Additional restrictions or requirements,
-> such as permitting the use of certain bidirectional control characters in the syntax,
-> might be added during the Tech Preview to better manage bidirectional text.
-> Feedback on the creation and management of _messages_
-> containing bidirectional tokens is strongly desired.
+> Additional restrictions or requirements might be added during the
+> Tech Preview to better manage bidirectional text.
A _message_ can be a _simple message_ or it can be a _complex message_.
@@ -160,7 +166,7 @@ Whitespace at the start or end of a _simple message_ is significant,
and a part of the _text_ of the _message_.
```abnf
-simple-message = [s] [simple-start pattern]
+simple-message = o [simple-start pattern]
simple-start = simple-start-char / escaped-char / placeholder
```
@@ -176,7 +182,7 @@ Whitespace at the start or end of a _complex message_ is not significant,
and does not affect the processing of the _message_.
```abnf
-complex-message = [s] *(declaration [s]) complex-body [s]
+complex-message = o *(declaration o) complex-body o
```
### Declarations
@@ -193,8 +199,8 @@ A **_local-declaration_** binds a _variable_ to the resolved value of
```abnf
declaration = input-declaration / local-declaration
-input-declaration = input [s] variable-expression
-local-declaration = local s variable [s] "=" [s] expression
+input-declaration = input o variable-expression
+local-declaration = local s variable o "=" o expression
```
_Variables_, once declared, MUST NOT be redeclared.
@@ -254,7 +260,7 @@ A _quoted pattern_ starts with a sequence of two U+007B LEFT CURLY BRACKET `{{`
and ends with a sequence of two U+007D RIGHT CURLY BRACKET `}}`.
```abnf
-quoted-pattern = "{{" pattern "}}"
+quoted-pattern = o "{{" pattern "}}"
```
A _quoted pattern_ MAY be empty.
@@ -285,8 +291,8 @@ be preserved during formatting.
```abnf
simple-start-char = content-char / "@" / "|"
-text-char = content-char / s / "." / "@" / "|"
-quoted-char = content-char / s / "." / "@" / "{" / "}"
+text-char = content-char / ws / "." / "@" / "|"
+quoted-char = content-char / ws / "." / "@" / "{" / "}"
content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
/ %x0B-0C ; omit CR (%x0D)
/ %x0E-1F ; omit SP (%x20)
@@ -352,7 +358,7 @@ otherwise, a corresponding _Data Model Error_ will be produced during processing
_Literal_ _keys_ are compared by their contents, not their syntactical appearance.
```abnf
-matcher = match-statement s variant *([s] variant)
+matcher = match-statement s variant *(o variant)
match-statement = match 1*(s selector)
```
@@ -425,7 +431,7 @@ Each _key_ is separated from each other by whitespace.
Whitespace is permitted but not required between the last _key_ and the _quoted pattern_.
```abnf
-variant = key *(s key) [s] quoted-pattern
+variant = key *(s key) quoted-pattern
key = literal / "*"
```
@@ -461,9 +467,9 @@ A **_function-expression_** contains a _function_ without an _operand
expression = literal-expression
/ variable-expression
/ function-expression
-literal-expression = "{" [s] literal [s function] *(s attribute) [s] "}"
-variable-expression = "{" [s] variable [s function] *(s attribute) [s] "}"
-function-expression = "{" [s] function *(s attribute) [s] "}"
+literal-expression = "{" o literal [s function] *(s attribute) o "}"
+variable-expression = "{" o variable [s function] *(s attribute) o "}"
+function-expression = "{" o function *(s attribute) o "}"
```
There are several types of _expression_ that can appear in a _message_.
@@ -549,7 +555,7 @@ and will produce a _Duplicate Option Name_ error during processing.
The order of _options_ is not significant.
```abnf
-option = identifier [s] "=" [s] (literal / variable)
+option = identifier o "=" o (literal / variable)
```
> Examples of _functions_ with _options_
@@ -594,8 +600,8 @@ It MAY include _options_.
is a _pattern_ part ending a span.
```abnf
-markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}" ; open and standalone
- / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}" ; close
+markup = "{" o "#" identifier *(s option) *(s attribute) o ["/"] "}" ; open and standalone
+ / "{" o "/" identifier *(s option) *(s attribute) o "}" ; close
```
> A _message_ with one `button` markup span and a standalone `img` markup element:
@@ -637,7 +643,7 @@ all but the last _attribute_ with the same _identifier_ are ignored.
The order of _attributes_ is not otherwise significant.
```abnf
-attribute = "@" identifier [[s] "=" [s] literal]
+attribute = "@" identifier [o "=" o literal]
```
> Examples of _expressions_ and _markup_ with _attributes_:
@@ -727,7 +733,12 @@ A **_name_** is a character sequence used in an _identifier_
or as the name for a _variable_
or the value of an _unquoted literal_.
-_Variable_ names are prefixed with `$`.
+A _name_ can be preceded or followed by bidirectional marks or isolating controls
+to aid in presenting names that contain right-to-left or neutral characters.
+These characters are **not** part of the value of the _name_ and MUST be treated as if they were not present
+when matching _name_ or _identifier_ strings or _unquoted literal_ values.
+
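As an illustration of the rule above, an implementation might remove the permitted marks from both ends of a _name_ before comparing it. The following is a minimal Python sketch, not part of any MessageFormat 2 API: the helpers `normalize_name` and `names_match` are hypothetical, and the `BIDI` set is assumed to mirror the ABNF `bidi` rule (U+061C, U+200E, U+200F, U+2066 through U+2069). Because none of these characters can occur as `name-start` or `name-char`, stripping them from the ends never removes part of the _name_ itself.

```python
# Bidi marks and isolates from the ABNF `bidi` rule:
# ALM, LRM, RLM, LRI, RLI, FSI, PDI.
BIDI = frozenset("\u061C\u200E\u200F\u2066\u2067\u2068\u2069")


def normalize_name(raw: str) -> str:
    """Return the value of a name with surrounding bidi marks removed.

    Hypothetical helper: bidi characters are excluded from name-start
    and name-char, so trimming them only drops presentation controls.
    """
    start, end = 0, len(raw)
    while start < end and raw[start] in BIDI:
        start += 1
    while end > start and raw[end - 1] in BIDI:
        end -= 1
    return raw[start:end]


def names_match(a: str, b: str) -> bool:
    """Compare two names as if the bidi marks were not present."""
    return normalize_name(a) == normalize_name(b)


if __name__ == "__main__":
    # A Hebrew name wrapped in LRI ... PDI compared with the bare name.
    print(names_match("\u2066\u05e9\u05dd\u2069", "\u05e9\u05dd"))  # True
    print(normalize_name("\u200Fcount\u200E"))  # count
```
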
+_Variable_ _names_ are prefixed with `$`.
Valid content for _names_ is based on Namespaces in XML 1.0's
[NCName](https://www.w3.org/TR/xml-names/#NT-NCName).
@@ -763,14 +774,14 @@ in this release.
```abnf
variable = "$" name
-option = identifier [s] "=" [s] (literal / variable)
+option = identifier o "=" o (literal / variable)
identifier = [namespace ":"] name
namespace = name
-name = name-start *name-char
+name = [bidi] name-start *name-char [bidi]
name-start = ALPHA / "_"
/ %xC0-D6 / %xD8-F6 / %xF8-2FF
- / %x370-37D / %x37F-1FFF / %x200C-200D
+ / %x370-37D / %x37F-61B / %x61D-1FFF / %x200C-200D
/ %x2070-218F / %x2C00-2FEF / %x3001-D7FF
/ %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
name-char = name-start / DIGIT / "-" / "."
@@ -803,24 +814,112 @@ and inside _patterns_ only escape `{` and `}`.
### Whitespace
-**_Whitespace_** is defined as one or more of
-U+0009 CHARACTER TABULATION (tab),
-U+000A LINE FEED (new line),
-U+000D CARRIAGE RETURN,
-U+3000 IDEOGRAPHIC SPACE,
-or U+0020 SPACE.
+The syntax limits **_whitespace_** characters outside of a _pattern_ to the following:
+`U+0009 CHARACTER TABULATION` (tab),
+`U+000A LINE FEED` (new line),
+`U+000D CARRIAGE RETURN`,
+`U+3000 IDEOGRAPHIC SPACE`,
+or `U+0020 SPACE`.
Inside _patterns_ and _quoted literals_,
whitespace is part of the content and is recorded and stored verbatim.
Whitespace is not significant outside translatable text, except where required by the syntax.
+There are two whitespace productions in the syntax.
+**_Optional whitespace_** is whitespace that is not required by the syntax,
+but which users might want to include to increase the readability of a _message_.
+**_Required whitespace_** is whitespace that is required by the syntax.
+
+Both types of whitespace can also contain the bidirectional isolate controls
+and certain strongly directional marks.
+These can assist users in presenting _messages_ that contain right-to-left
+text, _literals_, or _names_ (including those for _functions_, _options_,
+_option values_, and _keys_).
+
+_Messages_ that contain right-to-left (RTL) characters SHOULD use one of the
+following mechanisms to make _messages_ display intelligibly in plain-text editors:
+
+1. Use paired isolating bidi controls `U+2066 LEFT-TO-RIGHT ISOLATE` ("LRI")
+   and `U+2069 POP DIRECTIONAL ISOLATE` ("PDI"), as permitted by the ABNF, around
+   parts of any _message_ containing RTL characters:
+   - _inside_ of _placeholder_ markers `{` and `}`
+   - _outside_ of _quoted pattern_ markers `{{` and `}}`
+   - _outside_ of a _variable_, _function_, _markup_, or _attribute_,
+     including the identifying sigil (e.g. `$var` or `:ns:name`)
+2. Use the 'local-effect' bidi marks
+   `U+061C ARABIC LETTER MARK`, `U+200E LEFT-TO-RIGHT MARK`, or
+   `U+200F RIGHT-TO-LEFT MARK`, as permitted by the ABNF, before or after _identifiers_,
+   _names_, unquoted _literals_, or _option_ values,
+   especially when the values contain a mix of neutral, weakly directional, and
+   strongly directional characters.
+
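A serializer could apply the first mechanism automatically. The Python sketch below is hypothetical (the helpers `has_rtl`, `isolate`, and `variable_placeholder` are illustrative names only). It assumes that a token needs isolating when it contains a strong right-to-left character, detected with the standard library's `unicodedata.bidirectional` (bidi classes `R` and `AL`), and it places LRI and PDI inside the placeholder braces so that they enclose the `$` or `:` sigil.

```python
import unicodedata
from typing import Optional

LRI = "\u2066"  # U+2066 LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # U+2069 POP DIRECTIONAL ISOLATE


def has_rtl(text: str) -> bool:
    """Return True if text contains a strong RTL character (bidi class R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def isolate(token: str) -> str:
    """Wrap a token, sigil included, in LRI ... PDI if it contains RTL text."""
    return f"{LRI}{token}{PDI}" if has_rtl(token) else token


def variable_placeholder(name: str, function: Optional[str] = None) -> str:
    """Build a variable-expression such as {$name :function}.

    Hypothetical helper: the isolates sit inside the braces, where the
    ABNF accepts them as optional (o) or required (s) whitespace.
    """
    parts = [isolate("$" + name)]
    if function is not None:
        parts.append(isolate(":" + function))
    return "{" + " ".join(parts) + "}"


if __name__ == "__main__":
    print(variable_placeholder("count", "number"))  # {$count :number}
    print(repr(variable_placeholder("\u05e9\u05dd", "number")))
```

Left-to-right names pass through unchanged, so the helper only adds invisible characters where they help the plain-text display.
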
+> [!IMPORTANT]
+> Always take care **not** to add bidirectional controls or marks
+> where they would be semantically significant
+> or where they would unintentionally become part of the _message_'s output:
+> - do not put them inside of a _literal_ except when they are part of the value
+>   (instead, put them outside of the _literal_ quotes `|...|`)
+> - do not put them inside _quoted patterns_ except when they are part of the text
+>   (instead, put them outside of the _quoted pattern_ markers `{{...}}`)
+> - do not put them outside _placeholders_
+>   (instead, put them inside the _placeholder_, such as `{$foo :number}`)
+>
+> Controls placed inside _literal_ quotes or _quoted patterns_ are part of the _literal_
+> or _pattern_.
+> Controls in a _pattern_ will appear in the output of the _message_.
+> Controls inside _literal_ quotes will be considered in operations
+> such as matching a _key_ to a _selector_.
+
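To make the distinction concrete, here is a hypothetical Python sketch of how a _key_ might be reduced to the value used for matching. It assumes the key arrives as its raw source text without surrounding whitespace: for a quoted _literal_ (`|...|`) the contents are kept verbatim, controls included, while for an unquoted _literal_ the surrounding bidi marks are dropped, as the _name_ rules require. The helper `key_value` is illustrative only, and escape handling inside quoted literals is omitted.

```python
# Bidi marks and isolates from the ABNF `bidi` rule.
BIDI = "\u061C\u200E\u200F\u2066\u2067\u2068\u2069"


def key_value(token: str) -> str:
    """Return the value a key is compared by, given its source text.

    Hypothetical helper. Escape sequences in quoted literals are not handled.
    """
    if len(token) >= 2 and token.startswith("|") and token.endswith("|"):
        # Quoted literal: everything between the bars is content,
        # including any bidi controls placed there.
        return token[1:-1]
    # Unquoted literal: surrounding bidi marks are not part of the value.
    return token.strip(BIDI)


if __name__ == "__main__":
    selector_value = "\u05e9\u05dd"
    print(key_value("\u200F\u05e9\u05dd") == selector_value)  # True
    print(key_value("|\u200F\u05e9\u05dd|") == selector_value)  # False
```

The second comparison fails because the RLM sits inside the quotes and therefore belongs to the _literal_.
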
+> [!NOTE]
+> Users cannot be expected to create or manage bidirectional controls or
+> marks in _messages_, since the characters are invisible and can be difficult
+> to edit.
+> Tools (such as resource editors or translation editors)
+> and other implementations of MessageFormat 2 serialization are strongly
+> encouraged to provide paired isolates around any right-to-left
+> syntax as described above so that _messages_ display appropriately as plain text.
+
+These definitions of _whitespace_ implement
+[UAX#31 Requirement R3a-2](https://www.unicode.org/reports/tr31/#R3a-2).
+It is a profile of R3a-1 in that specification because:
+- The following pattern whitespace characters are not allowed:
+  `U+000B LINE TABULATION` (vertical tab),
+  `U+000C FORM FEED`,
+  `U+0085 NEXT LINE`,
+  `U+2028 LINE SEPARATOR`, and
+  `U+2029 PARAGRAPH SEPARATOR`.
+- The character `U+3000 IDEOGRAPHIC SPACE`
+  _is_ interpreted as whitespace.
+- The following directional marks and isolates
+  are treated as ignorable format controls:
+  `U+061C ARABIC LETTER MARK`,
+  `U+200E LEFT-TO-RIGHT MARK`,
+  `U+200F RIGHT-TO-LEFT MARK`,
+  `U+2066 LEFT-TO-RIGHT ISOLATE`,
+  `U+2067 RIGHT-TO-LEFT ISOLATE`,
+  `U+2068 FIRST STRONG ISOLATE`,
+  and `U+2069 POP DIRECTIONAL ISOLATE`.
+  (The character `U+061C` is an addition according to R3a.)
+
+
> [!NOTE]
> The character U+3000 IDEOGRAPHIC SPACE is included in whitespace for
> compatibility with certain East Asian keyboards and input methods,
> in which users might accidentally create these characters in a _message_.
```abnf
-s = 1*( SP / HTAB / CR / LF / %x3000 )
+; Required whitespace
+s = *bidi ws o
+
+; Optional whitespace
+o = *(ws / bidi)
+
+; Bidirectional marks and isolates
+; ALM / LRM / RLM / LRI, RLI, FSI & PDI
+bidi = %x061C / %x200E / %x200F / %x2066-2069
+
+; Whitespace characters
+ws = SP / HTAB / CR / LF / %x3000
```
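
The `s` and `o` rules above translate directly into regular expressions. The Python sketch below is illustrative only: the names `WS`, `BIDI`, `O_RE`, `S_RE`, `is_optional_ws`, and `is_required_ws` are hypothetical, and each function tests a whole string with `re.fullmatch` rather than scanning a full _message_.

```python
import re

# ws = SP / HTAB / CR / LF / %x3000
WS = "[ \t\r\n\u3000]"
# bidi = %x061C / %x200E / %x200F / %x2066-2069
BIDI = "[\u061C\u200E\u200F\u2066-\u2069]"

# o = *(ws / bidi): any run of whitespace and bidi characters, possibly empty.
O_RE = re.compile(f"(?:{WS}|{BIDI})*")
# s = *bidi ws o: at least one real whitespace character, optionally
# preceded by bidi characters and followed by optional whitespace.
S_RE = re.compile(f"{BIDI}*{WS}(?:{WS}|{BIDI})*")


def is_optional_ws(text: str) -> bool:
    """True if the whole string matches the `o` production."""
    return O_RE.fullmatch(text) is not None


def is_required_ws(text: str) -> bool:
    """True if the whole string matches the `s` production."""
    return S_RE.fullmatch(text) is not None
```

For example, `is_optional_ws("\u2066")` is true, while `is_required_ws("\u2066")` is false: a bidi control alone never satisfies _required whitespace_, which needs at least one `ws` character.
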
## Complete ABNF