MF2.0 compromise syntax #266

Closed
wants to merge 4 commits into from

markusicu
Member

No description provided.

Member

@aphillips aphillips left a comment

Lots of tiny comments, but I think this is directionally correct and similar to what we've been discussing.

instead of using an argument name;
and we also allow for invoking functions without using argument names or value literals.
```
{$name}
```
Member

I guess I'm discovering a little reluctance around using $ as the variable identifier, mainly based on "I have tons of strings with placeholders like {someVar} that need to be {$someVar}." It also means that I can't just take my arg map--I need to decorate the variable names with a $ before I can use it. Since function and format names are decorated with a : and literals are delimited with <>, do we need the $?

Member Author

This compromise syntax builds on what the committee has done. AFAICT the dollar prefix seemed part of what consensus was able to form. I am personally not particularly wedded to it.

For a parser, it would be slightly easier to look for one of very few special characters. If argument names didn't have a prefix, then a parser would have to look for any identifier-start character. Given that it has to do so anyway immediately after a prefix character, it would not really add significant complication. It just comes down to what we think developers reading and writing message strings will find helpful or confusing.
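To make the parser argument concrete, here is a minimal Python sketch (not taken from the PR) of how a placeholder token might be classified by its first character. The prefix table, the `Part` type, and both function names are illustrative assumptions; the thread has not settled on any of these characters, and Python's `str.isidentifier()` only stands in for whatever identifier rule the syntax ends up with.

```python
from dataclasses import dataclass

# Hypothetical prefix table; the actual characters are still under discussion.
PREFIX_KINDS = {"$": "variable", ":": "function", "<": "literal"}


@dataclass
class Part:
    kind: str
    text: str


def classify_prefixed(token: str) -> Part:
    """Classify a token when every kind of part carries a prefix character."""
    kind = PREFIX_KINDS.get(token[:1])
    if kind is None:
        raise ValueError(f"unexpected start of token: {token!r}")
    if kind == "literal":
        if len(token) < 2 or not token.endswith(">"):
            raise ValueError(f"unterminated literal: {token!r}")
        return Part(kind, token[1:-1])
    return Part(kind, token[1:])


def classify_unprefixed(token: str) -> Part:
    """Variant without `$`: a bare identifier is read as a variable name."""
    if token[:1] in (":", "<"):
        return classify_prefixed(token)
    # str.isidentifier() is a stand-in for the real identifier-start rule.
    if token.isidentifier():
        return Part("variable", token)
    raise ValueError(f"unexpected start of token: {token!r}")
```

With prefixes, dispatch is a single table lookup. Without the `$`, the only extra work is an identifier check on the fallback path, which is consistent with the point that the added complication is small.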

then the formatting function is inferred from the run-time type of the argument value.
For example, a string value would simply be inserted,
and a numeric type could be formatted using some kind of default number formatter.
- TODO: In the registry, specify the default formatters for a small set of value types.
Member

Perhaps this is upside down. The registry should specify a set of formatters (to which an implementation can add) and these can "register" what types they service (and in what priority order). At Amazon our message formatter has a currency formatter function (PriceFormat) that handles Price objects--the Price object extends Number, but PriceFormat takes priority for that type.

Member Author

With “the registry” I mean the future CLDR file that defines functions with their names, options, and semantics. That should include what formatter to use for a numeric argument when no function is explicitly specified in the message. This registry could specify a different formatter for a subtype.

It sounds like what you are referring to would be some runtime object that can dynamically handle types and formatters. I think that's out of scope for this document.
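As an illustration of inferring the formatter from the run-time type, with a more specific formatter taking priority for a subtype, here is a small Python sketch. Everything in it is hypothetical: the `Price` class only stands in for the Price object mentioned above, and the `DEFAULT_FORMATTERS` table and its output styles are assumptions, not part of any registry.

```python
class Price(float):
    """Hypothetical numeric subtype; the currency attribute is an assumption."""
    currency = "USD"


def format_number(value) -> str:
    # Placeholder for "some kind of default number formatter".
    return f"{value:,}"


def format_price(value: Price) -> str:
    return f"{float(value):.2f} {value.currency}"


# Hypothetical mapping from value types to their default formatters.
DEFAULT_FORMATTERS = {
    str: lambda value: value,  # a string value is simply inserted
    int: format_number,
    float: format_number,
    Price: format_price,       # more specific than float
}


def format_default(value) -> str:
    """Pick the formatter registered for the most specific matching type."""
    # The method resolution order lists the value's own class first,
    # so a subtype entry wins over the entry for its base type.
    for cls in type(value).__mro__:
        if cls in DEFAULT_FORMATTERS:
            return DEFAULT_FORMATTERS[cls](value)
    raise LookupError(f"no default formatter for {type(value).__name__}")
```

Whether such a table lives in a static registry file or in a runtime object is exactly the scoping question raised in the two comments above; the sketch takes no position on that.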

then the function is usually a formatter for its expected input types.
- TODO: There still seems to be discussion about the function prefix character.
It could be some other ASCII punctuation, for example `@`.
- TODO: Functions must be listed in a registry.
Member

... or installed by the implementation

Probably specify that unrecognized formats are an error or run toString equivalent?

Member Author

Right, “private use” functions need not be in the CLDR registry. I suspect that each organization would have its own registry of some kind, but mostly what this means is that there is documentation for the name, options, and semantics of each function. I don't expect this sort of registry to be parsed by implementations to actually implement formatters -- only to do validation and linting. So the formatter implementations are of course implementation-defined.

I think that a message formatting library should by default fail with an error when it does not recognize a function name. That includes functions that are registered, but not supported by a particular implementation.
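A minimal sketch of that fail-by-default behavior, in Python. The names `REGISTERED_NAMES`, `SUPPORTED_FUNCTIONS`, `UnknownFunctionError`, and `resolve_function` are hypothetical, and the function names are just the ones that appear in this thread.

```python
class UnknownFunctionError(LookupError):
    """Raised when a message names a function that cannot be run."""


# Hypothetical set of names documented in some registry.
REGISTERED_NAMES = {"number", "plural", "string"}

# Hypothetical subset that this particular implementation provides.
SUPPORTED_FUNCTIONS = {
    "number": lambda value: f"{value:,}",
    "string": str,
}


def resolve_function(name: str):
    """Return the implementation for a function name, or fail with an error."""
    try:
        return SUPPORTED_FUNCTIONS[name]
    except KeyError:
        if name in REGISTERED_NAMES:
            reason = "registered but not supported by this implementation"
        else:
            reason = "not registered"
        raise UnknownFunctionError(f":{name} is {reason}") from None
```

Both failure cases raise the same error type, so a caller that prefers a toString-style fallback could catch it in one place.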

Comment on lines +179 to +180
For example, selectors for plural variants
(different selectors for cardinal-number vs. ordinal-number variants)
Member

that's not formatting: that's the selector type. If I say [{$count :plural type=ordinal}] I expect to get keywords out like one, few, etc. or access the numeric value of $count for selectors such as =2---just like plural rules work today.

I agree that options are needed for the selector (as shown), but tend to expect that I can still format the value with a placeholder later. In fact, I might format differently several times.

Member Author

> that's not formatting: that's the selector type. If I say [{$count :plural type=ordinal}] I expect to get keywords out like one, few, etc. or access the numeric value of $count for selectors such as =2---just like plural rules work today.

I assumed that there would be different function names for plural/cardinal vs. plural/ordinal, like we have in ICU MessageFormat. But yes, it could be one "plural" function with an option.

> I agree that options are needed for the selector (as shown), but tend to expect that I can still format the value with a placeholder later. In fact, I might format differently several times.

That should be strongly discouraged, especially looking at plurals. Formatting differently from what the selection was based on creates a jarring mismatch. We should design this to make it easy to do the right thing. If you need different formatting in a different part of the sentence, you can pass the same value in another argument, and you can also define a named expression for it.
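For readers unfamiliar with how keys such as `one` and `=2` interact, here is a small Python sketch of variant matching in the style of existing ICU plural messages, where an explicit `=N` key takes precedence over a plural keyword. The `select_variant` function, the dictionary representation, and the use of `_` as the catch-all key are assumptions for illustration; the plural category is passed in rather than computed.

```python
def select_variant(count: int, category: str, variants: dict[str, str]) -> str:
    """Pick a pattern: exact-value key first, then keyword, then catch-all."""
    for key in (f"={count}", category, "_"):
        if key in variants:
            return variants[key]
    raise KeyError("message has no catch-all variant")


# Hypothetical variants, written as a plain mapping from key to pattern.
cart = {
    "=0": "You have no items in your cart",
    "one": "You have {$count} item in your cart",
    "_": "You have {$count} items in your cart",
}
```

For example, `select_variant(0, "other", cart)` returns the `=0` pattern even though the keyword for zero in English is `other`.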

For example, selectors for plural variants
(different selectors for cardinal-number vs. ordinal-number variants)
have to take into account how the number is formatted.
2. Format-only functions can be used as selectors via
Member

but only if the output doesn't contain spaces?

Member Author

We can decide to support spaces by allowing or requiring delimiters around the variant values.

3. Select-only functions select among variant values, but they cannot be used in pattern placeholders.

There is a simple format-only function that can be used for simple string matching.
TODO: Decide on a name for this format-only function. Consider `:string`.
Member

:select recommends itself, since we already have one just like this? Or is this different?

Member Author

The suggestion is to build on allowing format-only functions as selectors. Calling a formatter :string makes more sense than calling it :select.

Comment on lines +198 to +200
Inside selected patterns,
the selector argument variables must not be used with the normal `$` placeholder syntax –
for example, the patterns in the preceding example must not use `{$count}`.
Member

So I can't write:

[{$count :plural}]
[=0] {You have no items in your cart}
[one] {You have {$count :number style=spellout} item in your cart}
[_] {You have {$count :number style=spellout} items in your cart}

This seems hard for users to understand. They passed the argument by name. Why can't they format it? It isn't like the value has been consumed by whatever selector ate it previously.

Member Author

For plurals in particular, the formatting and selection are joined at the hip. If the spelled-out version of the number does not work grammatically like the :plural select-and-format function expected, then you get unhappy users.

Member

Ignore the style. The point I'm making is that your text says I cannot use the variable $count and a different formatter after having used it with the plural selector.

Member Author

Yes, for the stated reasons. Don't give users rope to hang themselves if we can avoid it :-)

Member

There is a problem with the following

[{$count :plural}]
[=0] {You have no items in your cart}
[one] {You have {$count :number style=spellout} item in your cart}
[_] {You have {$count :number style=spellout} items in your cart}

The plural categories are joined at the hip with the formatting. With the input number 1.01d, in some languages the category is 'one' if the format is an integer, but 'other' if the format has one (or more) decimals. So you can't actually correctly compute the plural category until you've formatted.

There are two ways to solve this:

  1. tie the plural category to the formatted value, by having the formatting information up front, or
  2. require the formatting information to be identical for every instance of the placeholder (e.g., it is an error if they are different)

It actually works pretty nicely to have the formatter return the plural category as an (optional) byproduct of formatting, because an intermediate step to producing the formatted number is typically the exact data necessary to compute the plural category. So the cleanest is to have a syntax that draws on that in some way. There are of course a few ways to do that. One is to use an assignment, and the other would be to have the formatting options in the selector, e.g.

[{$count :number style=spellout}]
[=0] {You have no items in your cart}
[one] {You have {$count} item in your cart}
[_] {You have {$count} items in your cart}
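The idea of returning the plural category as a byproduct of formatting can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `format_with_category` is a hypothetical name, the only formatting option modeled is a fixed number of fraction digits, and the category rule is the English cardinal rule (category `one` when the integer part is 1 and there are no visible fraction digits).

```python
from decimal import Decimal, ROUND_HALF_EVEN


def format_with_category(value, fraction_digits: int) -> tuple[str, str]:
    """Format a number and report the plural category of the formatted text."""
    quantum = Decimal(10) ** -fraction_digits  # 1, 0.1, 0.01, ...
    rounded = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_EVEN)
    text = f"{rounded:f}"

    # The category is derived from the digits that were actually produced,
    # not from the raw input value.
    integer_part, _, fraction = text.partition(".")
    is_one = integer_part.lstrip("-") == "1" and fraction == ""
    return text, ("one" if is_one else "other")
```

With the input 1.01, formatting with zero fraction digits yields `("1", "one")`, while formatting with two yields `("1.01", "other")`, which is the mismatch described above when selection and formatting use different options.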

Collaborator

@eemeli eemeli left a comment


The MFWG has recently spent a considerable amount of effort in coming up with a single starting point for our syntax that is sufficiently good to act as a base for further discussions. Many of those further discussion topics have been identified, with the intent that we might be able to discuss and resolve them somewhat independently.

This PR upends that working model quite thoroughly, and once again sets us up with two entire solutions pitted against each other. Our past experience of discussions around the data model in particular would indicate that this is not a likely source of joy and success.

While there are certainly good ideas in this PR, it does need to be split up into multiple PRs modifying spec/syntax.md so that each part may be considered on its own merits.

@markusicu
Member Author

> The MFWG has recently spent a considerable amount of effort in coming up with a single starting point for our syntax that is sufficiently good to act as a base for further discussions. Many of those further discussion topics have been identified, with the intent that we might be able to discuss and resolve them somewhat independently.

The current syntax.md in the "develop" branch has a lot of good points, but I and others have also pointed out a number of problems and made friendly suggestions on the previous slide deck version and then on the pull request that were largely not accommodated before merging.

I didn't think that having a file in the "develop" branch gives it special status in the WG's process; maybe I was wrong.

> This PR upends that working model quite thoroughly, and once again sets us up with two entire solutions pitted against each other. Our past experience of discussions around the data model in particular would indicate that this is not a likely source of joy and success.
>
> While there are certainly good ideas in this PR, it does need to be split up into multiple PRs modifying spec/syntax.md so that each part may be considered on its own merits.

It might work to debate lots of feedback items in isolation, but that can also lead to going around in circles; I have experienced that myself in another standards effort last year. It's like getting lost in the trees and not seeing the forest. In the end, in that other committee we had to consider many issues together and look at what the whole system looks like with one whole set of choices vs. another whole set.

So this is what I am offering here: Roll a whole set of choices, intended to deal with multiple problems together, into one complete and coherent compromise syntax.

Note that I did not start from scratch and just invent my own thing out of whole cloth. I started from what I think are the good points of "develop" syntax.md and tried to fix what I think are the problems with it.

As I worked on this compromise version of the whole thing, I actually ended up doing some things differently from some of my own earlier feedback, especially the part about starting in "code" mode -- because as I was looking at the whole thing, it became clear that that is the better option.

I also tried to provide a rationale for every part and choice of the compromise syntax.

I left various TODOs for details where I think there is no obvious right answer, or it's really just a style preference.

@echeran
Collaborator

echeran commented May 20, 2022

Prior to post-CLDR-committee syntax discussions, my preferred syntax was the one put forth in the EM proposal. It's slightly more verbose than what was merged in #230, but I like the consistency and readability of it. As mentioned previously (in the PR and mtg), I don't feel that syntax and some other comments were reflected much in #230.

This PR uses #230 as a starting point, so it differs from what I would like, but it does avoid taking on some of the things that I have found confusing or unnecessary, etc. I do appreciate that the rationale for the significant design points is explained well, and the areas of options/bike-shedding are demarcated with comments. Some of the explanations and differences are things I/we had not considered previously but are important. So I'm okay with the differences between this PR as it is now and my original preferred syntax -- they definitely feel acceptable.

I agree with earlier comments that this PR is heading in the right direction. I think that this PR gets us to a place that would be closer to a solution than before.

@zbraniecki
Member

zbraniecki commented May 20, 2022

I'm concerned about conflating the value of the arguments made in this PR with the format in which they're proposed. Elango's comment in particular feels like it links the two.

> I think that this PR gets us to a place that would be closer to a solution than before.

Would you write the same if this PR was, in your opinion, moving us away from the solution you'd like us to end up with?

In other words, I think as a WG we're bouncing back and forth between collaborative ("Let's find a common ground") and competitive ("Here's my proposal and here are people that agree with me") approaches, and between micro ("Let's zero in on question X in isolation") and macro ("Let's propose a cohesive solution to a class of questions") approaches.
I think the variations are healthy, but the way they are introduced feels hectic and deteriorates trust in process we establish.

In particular, Eemeli and Stas asked everyone multiple times if they should take the task of coming up with a single proposal that could be merged into the tree as a "competitive macro" approach, and serve to start "collaborative micro" discussions, issues and PRs out of.
They did this, in my understanding, because they wanted to make sure that they had the support of all of us to go ahead, take all the feedback and arguments we all laid down over the last two years, break the ties and design a cohesive, opinionated solution. They promised to take all of the feedback and arguments into account, but asked for license to decide on their own which ones to follow.
That means they asked whether, if they decide to dismiss one of our preferences, we will accept this outcome as a starting point and file an issue, or dismiss the result of their work as "not sufficiently incorporating what I want".

I felt we, as a WG, explicitly responded to their question by giving them authority to go on.

They spent the last months debating every argument and every spectrum we disagreed on, taking feedback from CLDR-TC and all stakeholders, and wrote #230.

Now, this PR does a similar thing with, I assume, slightly less background, since Markus did not spend 2 years in every WG meeting debating every point till exhaustion.
This is valuable, since some of those disagreements are more likely to lead to "win by persistence" than "the best solution wins", but it also asks everyone who notices Markus' PR not including something we discussed to now explain that to Markus, which repeats the work we've all done over those two years of surfacing all possible arguments.
I consider Stas and Eemeli to possess a deeper understanding of the MF2.0 WG problem space.

My main point is that I think there are two separate themes in response to this PR:

  • Merit value of this PR and this proposal. How good is it, how many "checkboxes" it checks, what tradeoffs it makes and how aligned it is with our goals.
  • How it affects the work group

On the merit, I think it's a good proposal. I don't think it's better than what Stas and Eemeli put forth, but I'd be happy to see a revision of Stas and Eemeli's proposal with Markus' key themes incorporated.

On the effect on the group, I imagine it may be deteriorating trust in the system if the WG asks people to do a very daunting and challenging task based on an agreed process, and then discards it on a whim.
I can see Elango's frustration that the result of Stas and Eemeli's work does not align with Elango's positions - but I believe this is what we asked them to do. Decide. And they trusted that if they took on the challenging task of making decisions, we would accept them (as a starting point) even if those decisions diverge from our preference.
And they asked us to go on and file issues against their solution, rather than introduce a second full proposal.

If we as a group believe that that process was not optimal, and it's better to have a new proposal be the starting point, I think we should explicitly discuss it, recognize the change and maybe ask ourselves how can we avoid in the future putting people in position that we put Stas and Eemeli in.

@echeran
Collaborator

echeran commented May 23, 2022

> ...the way they are introduced feels hectic and deteriorates trust in process we establish.

These are the thoughts and feelings behind my previous concerns about #230 and the discussion we had in Monday's meeting. It sounds like, as a group, we still need to discuss these issues, because these particular non-technical issues are important to how well we work as a group.

> ... dismiss the result of their work as "not sufficiently incorporating what I want".

To be clear, I'm most interested in getting the best technical solution, and the technical discussions to get us there. (...which is why I care about the thoroughness of our technical arguments, too.) It's not so much about my ideas as it is about having the best ideas percolate to the top. Of course, ideas need to be represented to have a fair shake at that. I don't enjoy the non-technical discussions that take us away from the technical ones, but these particular concerns are important to hash out for the health of the group.

> ...discards it on a whim. I can see Elango's frustration that the result of Stas and Eemeli's work does not align with Elango's positions

Please don't misrepresent. We clearly need to talk this out.

> ... how can we avoid in the future putting people in position that we put Stas and Eemeli in.

Keep in mind that we also had other people who were not included, or whose previous feedback was at risk of the same, as we discussed earlier.

> ...I think we should explicitly discuss...

Yes

- TODO: Value literals need to be delimited (they may contain spaces),
and the starting delimiter needs to be distinct from the prefixes for
argument names and functions.
Reasonable choices include `<>`, `()`, `[]`, `||`, or a pair of `` characters.
Collaborator

> I am not sure how to use Markdown to show a pair of grave accents in code style...

By using a longer string of enclosing backticks (cf. CommonMark Code spans).

Suggested change
- Reasonable choices include `<>`, `()`, `[]`, `||`, or a pair of `` characters.
+ Reasonable choices include `<>`, `()`, `[]`, `||`, or ``` `` ```.
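The backtick rule behind this suggestion can be expressed as a short Python helper. This is a sketch based on the CommonMark code span rules, not part of the PR; `code_span` is a hypothetical name, and the helper does not handle content that both begins and ends with a space.

```python
import re


def code_span(text: str) -> str:
    """Wrap text in a Markdown code span, even if it contains backticks."""
    # The delimiter must be a backtick run that does not occur in the content,
    # so use one more backtick than the longest run found inside.
    runs = re.findall(r"`+", text)
    longest = max((len(run) for run in runs), default=0)
    fence = "`" * (longest + 1)

    # Content that starts or ends with a backtick needs one space of padding
    # so that it is not read as part of the delimiter.
    pad = " " if text.startswith("`") or text.endswith("`") else ""
    return f"{fence}{pad}{text}{pad}{fence}"
```

For a pair of grave accents this produces three backticks, a space, the two accents, a space, and three backticks, which is the form used in the suggested change.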

@macchiati
Member

macchiati commented May 31, 2022 via email

@stasm
Collaborator

stasm commented Oct 18, 2022

@markusicu We're cleaning up old branches and PRs in the repo. For posterity, would you mind moving the contents of your branch to the experiments folder on the experiments branch?

@eemeli
Collaborator

eemeli commented Jan 23, 2023

The syntax proposed in this PR is now available here: https://github.com/unicode-org/message-format-wg/blob/experiments/experiments/markus/compromise-syntax.md

@eemeli eemeli added the resolve-candidate label ("This issue appears to have been answered or resolved, and may be closed soon.") Jan 23, 2023
@eemeli
Collaborator

eemeli commented Feb 1, 2023

Closing the PR, as agreed at the last meeting.

@eemeli eemeli closed this Feb 1, 2023
@eemeli eemeli deleted the markus-compromise-syntax branch February 1, 2023 06:54
8 participants