MF2.0 compromise syntax #266

markusicu · 2022-05-13T23:41:01Z

No description provided.

aphillips

Lots of tiny comments, but I think this is directionally correct and similar to what we've been discussing.

spec/compromise-syntax.md

aphillips · 2022-05-14T00:03:56Z

spec/compromise-syntax.md

+instead of using an argument name;
+and we also allow for invoking functions without using argument names or value literals.
+```
+{$name}


I guess I'm discovering a little reluctance around using $ as the variable identifier, mainly based on "I have tons of strings with placeholders like {someVar} that need to be {$someVar}". It also means that I can't just take my arg map--I need to decorate the variable names with a $ before I can use it. Since function and format names are decorated with a : and literals are delimited with <>, do we need the $?

This compromise syntax builds on what the committee has done. AFAICT the dollar prefix seemed part of what consensus was able to form. I am personally not particularly wedded to it.

For a parser, it would be slightly easier to look for one of very few special characters. If argument names didn't have a prefix, then a parser would have to look for any identifier-start character. Given that it has to do so anyway immediately after a prefix character, dropping the prefix would not really add significant complication. It just comes down to what we think developers reading and writing message strings will find helpful or confusing.
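To make the parsing point concrete, here is a minimal sketch of how a parser might classify the start of a placeholder body, with and without a mandatory `$` prefix. The function name `classify` and the `require_dollar` flag are hypothetical, and `<` as the literal delimiter is only one of the options mentioned in this thread.

```python
def classify(expr: str, require_dollar: bool = True) -> str:
    """Classify a placeholder body by its first non-space character."""
    first = expr.lstrip()[:1]
    if first == ":":
        return "function"
    if first == "<":
        return "literal"
    if first == "$":
        return "argument"
    # Without a mandatory prefix, any identifier-start character
    # begins an argument name (approximated here with isalpha/underscore).
    if not require_dollar and (first.isalpha() or first == "_"):
        return "argument"
    raise ValueError(f"unexpected start of placeholder: {expr!r}")
```

The only difference between the two modes is the extra identifier-start check, which a parser needs anyway for the character that follows a prefix.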

aphillips · 2022-05-14T00:08:25Z

spec/compromise-syntax.md

+then the formatting function is inferred from the run-time type of the argument value.
+For example, a string value would simply be inserted,
+and a numeric type could be formatted using some kind of default number formatter.
+- TODO: In the registry, specify the default formatters for a small set of value types.


Perhaps this is upside down. The registry should specify a set of formatters (to which an implementation can add) and these can "register" what types they service (and in what priority order). At Amazon our message formatter has a currency formatter function (PriceFormat) that handles Price objects--the Price object extends Number, but PriceFormat takes priority for that type.

With “the registry” I mean the future CLDR file that defines functions with their names, options, and semantics. That should include what formatter to use for a numeric argument when no function is explicitly specified in the message. This registry could specify a different formatter for a subtype.

It sounds like what you are referring to would be some runtime object that can dynamically handle types and formatters. I think that's out of scope for this document.
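As an illustration of "a default formatter per value type, with a more specific formatter for a subtype", here is a minimal sketch. All names (`DEFAULT_FORMATTERS`, `register_default`, `format_default`, and the toy `Price` class) are hypothetical and are not part of any proposed registry; the number and price formats are placeholders, not locale-aware formatting.

```python
from typing import Any, Callable, Dict

# Hypothetical table: value type -> default formatter used when a
# placeholder names no function.
DEFAULT_FORMATTERS: Dict[type, Callable[[Any], str]] = {}

def register_default(value_type: type, formatter: Callable[[Any], str]) -> None:
    DEFAULT_FORMATTERS[value_type] = formatter

def format_default(value: Any) -> str:
    # Walk the method resolution order so the most specific registered
    # type wins (a subtype formatter takes priority over its base type).
    for cls in type(value).__mro__:
        formatter = DEFAULT_FORMATTERS.get(cls)
        if formatter is not None:
            return formatter(value)
    raise LookupError(f"no default formatter for {type(value).__name__}")

class Price(float):
    """Toy numeric subtype that also carries a currency code."""
    def __new__(cls, amount: float, currency: str) -> "Price":
        obj = super().__new__(cls, amount)
        obj.currency = currency
        return obj

register_default(str, lambda v: v)
register_default(float, lambda v: f"{v:,.2f}")
register_default(Price, lambda v: f"{v.currency} {float(v):,.2f}")
```

With these registrations, a `Price` value is handled by the price formatter even though it is also a `float`, while a plain `float` falls back to the generic number formatter.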

aphillips · 2022-05-14T00:09:38Z

spec/compromise-syntax.md

+then the function is usually a formatter for its expected input types.
+- TODO: There still seems to be discussion about the function prefix character.
+  It could be some other ASCII punctuation, for example `@`.
+- TODO: Functions must be listed in a registry.


... or installed by the implementation

Probably specify that unrecognized formats are an error or run toString equivalent?

Right, “private use” functions need not be in the CLDR registry. I suspect that each organization would have its own registry of some kind, but mostly what this means is that there is documentation for the name, options, and semantics of each function. I don't expect this sort of registry to be parsed by implementations to actually implement formatters -- only to do validation and linting. So the formatter implementations are of course implementation-defined.

I think that a message formatting library should by default fail with an error when it does not recognize a function name. That includes functions that are registered, but not supported by a particular implementation.
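A minimal sketch of that default behavior, assuming a registry that only documents names and options (used for linting) and a separate set of functions that a given implementation actually supports. The names `REGISTRY`, `UnknownFunctionError`, and `check_function` are hypothetical, and the registry entries are illustrative only.

```python
from typing import Dict, Iterable, Set

# Hypothetical documentation-only registry: function name -> option names.
REGISTRY: Dict[str, Set[str]] = {
    "number": {"style"},
    "plural": {"type"},
}

class UnknownFunctionError(ValueError):
    pass

def check_function(name: str, options: Iterable[str], implemented: Set[str]) -> None:
    # Fail by default when the implementation does not recognize the name,
    # even if the name is listed in the registry.
    if name not in implemented:
        raise UnknownFunctionError(f"unsupported function :{name}")
    # Lint options only for functions the registry documents; private-use
    # functions that are absent from the registry are not checked here.
    documented = REGISTRY.get(name)
    if documented is not None:
        unknown = set(options) - documented
        if unknown:
            raise ValueError(f"unknown options for :{name}: {sorted(unknown)}")
```

For example, `check_function("plural", ["type"], {"number"})` raises `UnknownFunctionError` because `:plural` is registered but not implemented in that hypothetical runtime.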

aphillips · 2022-05-14T00:31:54Z

spec/compromise-syntax.md

+   For example, selectors for plural variants
+   (different selectors for cardinal-number vs. ordinal-number variants)


that's not formatting: that's the selector type. If I say [{$count :plural type=ordinal}] I expect to get keywords out like one, few, etc. or access the numeric value of $count for selectors such as =2---just like plural rules work today.

I agree that options are needed for the selector (as shown), but tend to expect that I can still format the value with a placeholder later. In fact, I might format differently several times.

> that's not formatting: that's the selector type. If I say [{$count :plural type=ordinal}] I expect to get keywords out like one, few, etc. or access the numeric value of $count for selectors such as =2---just like plural rules work today.

I assumed that there would be different function names for plural/cardinal vs. plural/ordinal, like we have in ICU MessageFormat. But yes, it could be one "plural" function with an option.

> I agree that options are needed for the selector (as shown), but tend to expect that I can still format the value with a placeholder later. In fact, I might format differently several times.

That should be strongly discouraged, especially looking at plurals. Formatting differently from what the selection was based on creates a jarring mismatch. We should design this to make it easy to do the right thing. If you need different formatting in a different part of the sentence, you can pass the same value in another argument, and you can also define a named expression for it.

aphillips · 2022-05-14T00:32:51Z

spec/compromise-syntax.md

+   For example, selectors for plural variants
+   (different selectors for cardinal-number vs. ordinal-number variants)
+   have to take into account how the number is formatted.
+2. Format-only functions can be used as selectors via


but only if the output doesn't contain spaces?

We can decide to support spaces by allowing or requiring delimiters around the variant values.

aphillips · 2022-05-14T00:33:51Z

spec/compromise-syntax.md

+3. Select-only functions select among variant values, but they cannot be used in pattern placeholders.
+
+There is a simple format-only function that can be used for simple string matching.
+TODO: Decide on a name for this format-only function. Consider `:string`.


:select recommends itself, since we already have one just like this? Or is this different?

The suggestion is to build on allowing format-only functions as selectors. Calling a formatter :string makes more sense than calling it :select.

aphillips · 2022-05-14T00:38:09Z

spec/compromise-syntax.md

+Inside selected patterns,
+the selector argument variables must not be used with the normal `$` placeholder syntax –
+for example, the patterns in the preceding example must not use `{$count}`.


So I can't write:

[{$count :plural}] [=0] {You have no items in your cart} [one] {You have {$count :number style=spellout} item in your cart} [_] {You have {$count :number style=spellout} items in your cart}

This seems hard for users to understand. They passed the argument by name. Why can't they format it? It isn't like the value has been consumed by whatever selector ate it previously.

For plurals in particular, the formatting and selection are tied at the hip. If the spelled-out version of the number does not work grammatically like the :plural select-and-format function expected, then you get unhappy users.

Ignore the style. The point I'm making is that your text says I cannot use the variable $count and a different formatter after having used it with the plural selector.

Yes, for the stated reasons. Don't give users rope to hang themselves if we can avoid it :-)

There is a problem with the following

[{$count :plural}] [=0] {You have no items in your cart} [one] {You have {$count :number style=spellout} item in your cart} [_] {You have {$count :number style=spellout} items in your cart}

The plural categories are tied to the hip with the formatting. With the input number 1.01d, in some languages the category is 'one' if the format is an integer, but 'other' if the format has one (or more) decimals. So you can't actually correctly compute the plural category until you've formatted.

There are two ways to solve this:

1. tie the plural category to the formatted value, by having the formatting information up front, or
2. require the formatting information to be identical for every instance of the placeholder (e.g., it is an error if they are different)

It actually works pretty nicely to have the formatter return the plural category as an (optional) byproduct of formatting, because an intermediate step to producing the formatted number is typically the exact data necessary to compute the plural category. So the cleanest is to have a syntax that draws on that in some way. There are of course a few ways to do that. One is to use an assignment, and the other would be to have the formatting options in the selector, e.g.

[{$count :number style=spellout}] [=0] {You have no items in your cart} [one] {You have {$count} item in your cart} [_] {You have {$count} items in your cart}
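The dependency between formatting and plural category can be shown with a small sketch. It assumes the English cardinal rule from CLDR (category `one` only when the integer part is 1 and there are no visible fraction digits) and uses plain fixed-point formatting as a stand-in for a real number formatter; the function name `format_and_categorize` is hypothetical.

```python
from typing import Tuple

def format_and_categorize(value: float, fraction_digits: int) -> Tuple[str, str]:
    """Format a number and derive the English plural category from the
    formatted digits, returning both as one result."""
    text = f"{value:.{fraction_digits}f}"
    integer_part, _, fraction_part = text.partition(".")
    i = abs(int(integer_part))   # integer digits of the formatted number
    v = len(fraction_part)       # count of visible fraction digits
    category = "one" if i == 1 and v == 0 else "other"
    return text, category

# Same input, different formatting, different category:
# format_and_categorize(1.01, 0) -> ("1", "one")
# format_and_categorize(1.01, 2) -> ("1.01", "other")
```

Because the category is computed from the formatted digits, returning it as a byproduct of formatting keeps selection and display consistent by construction.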

eemeli

The MFWG has recently spent a considerable amount of effort in coming up with a single starting point for our syntax that is sufficiently good to act as a base for further discussions. Many of those further discussion topics have been identified, with the intent that we might be able to discuss and resolve them somewhat independently.

This PR upends that working model quite thoroughly, and once again sets us up with two entire solutions pitted against each other. Our past experience of discussions around the data model in particular would indicate that this is not a likely source of joy and success.

While there are certainly good ideas in this PR, it does need to be split up into multiple PRs modifying spec/syntax.md so that each part may be considered on its own merits.

markusicu · 2022-05-19T05:37:57Z

> The MFWG has recently spent a considerable amount of effort in coming up with a single starting point for our syntax that is sufficiently good to act as a base for further discussions. Many of those further discussion topics have been identified, with the intent that we might be able to discuss and resolve them somewhat independently.

The current syntax.md in the "develop" branch has a lot of good points, but I and others have also pointed out a number of problems and made friendly suggestions on the previous slide deck version and then on the pull request that were largely not accommodated before merging.

I didn't think that having a file in the "develop" branch gives it special status in the WG's process; maybe I was wrong.

> This PR upends that working model quite thoroughly, and once again sets us up with two entire solutions pitted against each other. Our past experience of discussions around the data model in particular would indicate that this is not a likely source of joy and success.
>
> While there are certainly good ideas in this PR, it does need to be split up into multiple PRs modifying spec/syntax.md so that each part may be considered on its own merits.

It might work to debate lots of feedback items in isolation, but that can also lead to going around in circles; I have experienced that myself in another standards effort last year. It's like getting lost in the trees and not seeing the forest. In the end, in that other committee we had to consider many issues together and look at what the whole system looks like with one whole set of choices vs. another whole set.

So this is what I am offering here: Roll a whole set of choices, intended to deal with multiple problems together, into one complete and coherent compromise syntax.

Note that I did not start from scratch and just invent my own thing out of whole cloth. I started from what I think are the good points of "develop" syntax.md and tried to fix what I think are the problems with it.

As I worked on this compromise version of the whole thing, I actually ended up doing some things differently from some of my own earlier feedback, especially the part about starting in "code" mode -- because as I was looking at the whole thing, it became clear that that is the better option.

I also tried to provide a rationale for every part and choice of the compromise syntax.

I left various TODOs for details where I think there is no obvious right answer, or it's really just a style preference.

echeran · 2022-05-20T00:11:33Z

Prior to post-CLDR-committee syntax discussions, my preferred syntax was the one put forth in the EM proposal. It's slightly more verbose than what was merged in #230, but I like the consistency and readability of it. As mentioned previously (in the PR and mtg), I don't feel that syntax and some other comments were reflected much in #230.

This PR uses #230 as a starting point, so it differs from what I would like, but it does avoid taking on some of the things that I have found confusing or unnecessary, etc. I do appreciate that the rationale for the significant design points is explained well, and the areas of options/bike-shedding are demarcated with comments. Some of the explanations and differences are things I/we had not considered previously but are important. So I'm okay with the differences between this PR as it is now and my original preferred syntax -- they definitely feel acceptable.

I agree with earlier comments that this PR is heading in the right direction. I think that this PR gets us to a place that would be closer to a solution than before.

zbraniecki · 2022-05-20T02:42:03Z

I'm concerned about conflating the value of the arguments made in this PR with the format in which they're proposed. Elango's comment in particular feels like it links the two.

> I think that this PR gets us to a place that would be closer to a solution than before.

Would you write the same if this PR was, in your opinion, moving us away from the solution you'd like us to end up with?

In other words, I think as a WG we're bouncing back and forth between collaborative ("Let's find a common ground") and competitive ("Here's my proposal and here are people that agree with me") approaches and micro ("Let's zero in on question X in isolation") and macro ("Let's propose a cohesive solution to a class of questions") approaches.
I think the variations are healthy, but the way they are introduced feels hectic and deteriorates trust in process we establish.

In particular, Eemeli and Stas asked everyone multiple times if they should take the task of coming up with a single proposal that could be merged into the tree as a "competitive macro" approach, and serve to start "collaborative micro" discussions, issues and PRs out of.
They did this, in my understanding, because they wanted to make sure that they have support of all of us to go ahead, take all the feedback and arguments we all laid down over the last two years, break the ties and design a cohesive, opinionated solution. They promised to take all of the feedback and arguments into account, but asked for license to decide on their own which ones to follow.
That is, they asked whether, if they decided to dismiss one of our preferences, we would accept this outcome as a starting point and file an issue, or dismiss the result of their work as "not sufficiently incorporating what I want".

I felt we, as a WG, explicitly responded to their question by giving them authority to go on.

They spent the last months debating every argument and every spectrum we disagreed on, taking feedback from CLDR-TC and all stakeholders, and wrote #230.

Now, this PR does a similar thing, with, I assume, slightly less background, since Markus did not spend 2 years in every WG meeting debating every point till exhaustion.
This is valuable, since some of those disagreements are more likely to lead to "win by persistence" than "the best solution wins", but it also asks everyone who notices that Markus' PR does not include something we discussed to now explain that to Markus, which repeats the work we've all done over those two years of surfacing all possible arguments.
I consider Stas and Eemeli to possess a deeper understanding of the MF2.0 WG problem space.

My main point is that I think there are two separate themes in response to this PR:

1. Merit value of this PR and this proposal. How good is it, how many "checkboxes" it checks, what tradeoffs it makes and how aligned it is with our goals.
2. How it affects the work group

On the merit, I think it's a good proposal. I don't think it's better than what Stas and Eemeli put forth, but I'd be happy to see a revision of Stas and Eemeli's proposal with Markus' key themes incorporated.

On the effect on the group, I imagine it may be deteriorating trust in the system if the WG asks people to do a very daunting and challenging task based on an agreed process, and then discards it on a whim.
I can see Elango's frustration that the result of Stas and Eemeli's work does not align with Elango's positions - but I believe this is what we asked them to do. Decide. And they trusted that if they take a challenging task of making decisions, we will accept them (as a starting point) even if those decisions diverge from our preference.
And they asked us to go on and file issues against their solution, rather than introduce a second full proposal.

If we as a group believe that that process was not optimal, and it's better to have a new proposal be the starting point, I think we should explicitly discuss it, recognize the change and maybe ask ourselves how can we avoid in the future putting people in position that we put Stas and Eemeli in.

echeran · 2022-05-23T16:30:11Z

> ...the way they are introduced feels hectic and deteriorates trust in process we establish.

These are the thoughts and feelings behind my previous concerns about #230 and the discussion we had in Monday's meeting. It sounds like, as a group, we still need to discuss these issues, because these particular non-technical issues are important to how well we work as a group.

> ... dismiss the result of their work as "not sufficiently incorporating what I want".

To be clear, I'm most interested in getting the best technical solution, and the technical discussions to get us there. (...which is why I care about the thoroughness of our technical arguments, too.) It's not so much about my ideas as it is about having the best ideas percolate to the top. Of course, ideas need to be represented to have a fair shake at that. I don't enjoy the non-technical discussions that take us away from the technical ones, but these particular concerns are important to hash out for the health of the group.

> ...discards it on a whim. I can see Elango's frustration that the result of Stas and Eemeli's work does not align with Elango's positions

Please don't misrepresent. We clearly need to talk this out.

> ... how can we avoid in the future putting people in position that we put Stas and Eemeli in.

Keep in mind that we also had other people who were not included or whose previous feedback was at risk of the same, as we discussed earlier.

> ...I think we should explicitly discuss...

Yes

gibson042 · 2022-05-31T18:11:38Z

spec/compromise-syntax.md

+- TODO: Value literals need to be delimited (they may contain spaces),
+  and the starting delimiter needs to be distinct from the prefixes for
+  argument names and functions.
+  Reasonable choices include `<>`, `()`, `[]`, `||`, or a pair of `` characters.


I am not sure how to use Markdown to show a pair of grave accents in code style...

By using a longer string of enclosing backticks (cf. CommonMark Code spans).

Suggested change

-  Reasonable choices include `<>`, `()`, `[]`, `||`, or a pair of `` characters.
+  Reasonable choices include `<>`, `()`, `[]`, `||`, or ``` `` ```.
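The CommonMark rule behind this suggestion can be sketched as a small helper: enclose the text in a backtick string longer than any backtick run it contains, and pad with one space when the text itself starts or ends with a backtick. The helper name `code_span` is hypothetical.

```python
import re

def code_span(text: str) -> str:
    """Wrap text in a CommonMark code span that survives inner backticks."""
    runs = re.findall(r"`+", text)
    longest = max((len(run) for run in runs), default=0)
    fence = "`" * (longest + 1)
    # A single leading and trailing space is stripped by the renderer, so
    # padding keeps the inner backticks from merging with the fence.
    pad = " " if text.startswith("`") or text.endswith("`") else ""
    return f"{fence}{pad}{text}{pad}{fence}"
```

Applied to a pair of grave accents, this produces the same markup as the suggested change.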

macchiati · 2022-05-31T18:16:37Z

If at all possible, you want string delimiters that are less likely to occur in strings.

stasm · 2022-10-18T14:16:24Z

@markusicu We're cleaning up old branches and PRs in the repo. For posterity, would you mind moving the contents of your branch to the experiments folder on the experiments branch?

eemeli · 2023-01-23T15:29:06Z

The syntax proposed in this PR is now available here: https://github.com/unicode-org/message-format-wg/blob/experiments/experiments/markus/compromise-syntax.md

eemeli · 2023-02-01T06:54:02Z

Closing the PR, as agreed at the last meeting.

MF2.0 compromise syntax

0475f8f

markusicu requested review from aphillips, stasm, zbraniecki, eemeli, echeran, mihnita, romulocintra and nbouvrette May 13, 2022 23:41

aphillips approved these changes May 14, 2022

eemeli requested changes May 14, 2022

macchiati reviewed May 22, 2022

markusicu added 2 commits May 30, 2022 15:38

use * for selection wildcard

572caeb

as suggested in #270

discuss more value literal delimiters

64743ff

markusicu force-pushed the markus-compromise-syntax branch from 645d2d3 to 64743ff Compare May 30, 2022 23:02

markusicu mentioned this pull request May 31, 2022

Preamble: If we start in the "code" mode, do we need to delimit the preamble? #251

Closed

gibson042 reviewed May 31, 2022

arg names with dots, fn name like Java pkg, option value $arg

df28b99

markusicu mentioned this pull request Jun 7, 2022

References to runtime values and attributes #209

Closed

echeran mentioned this pull request Jun 22, 2022

Drop restriction on using keywords in the syntax #286

Closed

romulocintra mentioned this pull request Oct 19, 2022

Prepare for the tech preview release #303

Closed

10 tasks

eemeli added the resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. label Jan 23, 2023

eemeli closed this Feb 1, 2023

eemeli deleted the markus-compromise-syntax branch February 1, 2023 06:54
