MF2.0 compromise syntax #266
Lots of tiny comments, but I think this is directionally correct and similar to what we've been discussing.
instead of using an argument name;
and we also allow for invoking functions without using argument names or value literals.
```
{$name}
```
I guess I'm discovering a little reluctance around using `$` as the variable identifier, mainly based on "I have tons of strings with placeholders like `{someVar}` that need to be `{$someVar}`." It also means that I can't just take my arg map; I need to decorate the variable names with a `$` before I can use it. Since function and format names are decorated with a `:` and literals are delimited with `<>`, do we need the `$`?
This compromise syntax builds on what the committee has done. AFAICT the dollar prefix seemed part of what consensus was able to form. I am personally not particularly wedded to it.
For a parser, it would be slightly easier to look for one of very few special characters. If argument names didn't have a prefix, then a parser would have to look for any identifier-start character. Given that it has to anyway do so immediately after a prefix character, it would not really add significant complication. It just comes down to what we think developers reading and writing message strings will find helpful or confusing.
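To make the parsing argument concrete, here is a minimal Python sketch (not part of the proposal) of how a parser might classify a placeholder body by its first character. It assumes the prefixes discussed in this PR: `$` for argument names, `:` for functions, and `<` as the still-undecided opening delimiter for value literals. The helper name `classify` is hypothetical.

```python
def classify(body: str) -> str:
    """Classify a placeholder body such as "$name" or ":number".

    Hypothetical helper: assumes `$` marks argument names, `:` marks
    functions, and `<` opens a value literal.
    """
    body = body.lstrip()
    if not body:
        raise ValueError("empty placeholder")
    first = body[0]
    if first == "$":
        return "variable"
    if first == ":":
        return "function"
    if first == "<":
        return "literal"
    # Without a `$` prefix, an argument name would be recognized by any
    # identifier-start character: the same check a parser already makes
    # right after a prefix character.
    if first.isidentifier():
        return "variable"
    raise ValueError(f"unexpected character {first!r}")


print(classify("$count"))   # variable
print(classify(":number"))  # function
print(classify("<hello>"))  # literal
print(classify("someVar"))  # variable
```

Either way the unprefixed case costs one extra branch, which supports the point that the choice is mainly about what message authors find clearer.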
then the formatting function is inferred from the run-time type of the argument value.
For example, a string value would simply be inserted,
and a numeric type could be formatted using some kind of default number formatter.
- TODO: In the registry, specify the default formatters for a small set of value types.
Perhaps this is upside down. The registry should specify a set of formatters (to which an implementation can add) and these can "register" what types they service (and in what priority order). At Amazon our message formatter has a currency formatter function (`PriceFormat`) that handles `Price` objects; the `Price` object extends `Number`, but `PriceFormat` takes priority for that type.
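The type-priority idea in this comment can be sketched as a lookup that walks a value's class hierarchy, so that a formatter registered for a subtype wins over one registered for its base type. This is a minimal Python sketch under stated assumptions: `Price`, `price_format`, `number_format`, and `FORMATTERS` are hypothetical stand-ins (not Amazon's actual API), and Python's method resolution order plays the role of the priority order.

```python
from decimal import Decimal


class Price(Decimal):
    """Hypothetical stand-in for a Price type that extends a numeric type."""


def number_format(value) -> str:
    # Hypothetical default number formatter: grouping separators only.
    return f"{value:,}"


def price_format(value) -> str:
    # Hypothetical currency formatter: fixed symbol, two fraction digits.
    return f"${value:,.2f}"


# Formatters registered per type (hypothetical registry contents).
FORMATTERS = {
    str: str,
    int: number_format,
    Decimal: number_format,
    Price: price_format,
}


def format_default(value) -> str:
    """Use the formatter registered for the most specific matching type."""
    for cls in type(value).__mro__:
        formatter = FORMATTERS.get(cls)
        if formatter is not None:
            return formatter(value)
    raise TypeError(f"no default formatter for {type(value).__name__}")


print(format_default(1234))             # 1,234
print(format_default(Price("1234.5")))  # $1,234.50
```

Because `Price` comes before `Decimal` in `Price.__mro__`, the price formatter takes priority, mirroring the `PriceFormat` example.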
By “the registry” I mean the future CLDR file that defines functions with their names, options, and semantics. That should include which formatter to use for a numeric argument when no function is explicitly specified in the message. This registry could specify a different formatter for a subtype.
It sounds like what you are referring to would be some runtime object that can dynamically handle types and formatters. I think that's out of scope for this document.
then the function is usually a formatter for its expected input types.
- TODO: There still seems to be discussion about the function prefix character.
It could be some other ASCII punctuation, for example `@`.
- TODO: Functions must be listed in a registry.
...or installed by the implementation.
Probably specify that unrecognized formats are either an error or run the equivalent of `toString`?
Right, “private use” functions need not be in the CLDR registry. I suspect that each organization would have its own registry of some kind, but mostly what this means is that there is documentation for the name, options, and semantics of each function. I don't expect implementations to parse this sort of registry to actually implement formatters, only to do validation and linting. So the formatter implementations are of course implementation-defined.
I think that a message formatting library should by default fail with an error when it does not recognize a function name. That includes functions that are registered, but not supported by a particular implementation.
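The default argued for here, failing on an unrecognized function name instead of silently falling back to a `toString`-style conversion, could look like this minimal Python sketch. `FUNCTIONS`, `call_function`, and the `lenient` flag are hypothetical; the flag only shows where an opt-in fallback would sit.

```python
from typing import Callable, Dict


class UnknownFunctionError(LookupError):
    """Raised when a message names a function the implementation lacks."""


# Hypothetical functions supported by this particular implementation.
FUNCTIONS: Dict[str, Callable[[object], str]] = {
    "number": lambda value: format(value, ","),
    "string": str,
}


def call_function(name: str, value: object, *, lenient: bool = False) -> str:
    fn = FUNCTIONS.get(name)
    if fn is None:
        if lenient:
            # Opt-in fallback: plain string conversion.
            return str(value)
        # Default: a function unknown to this implementation is an error,
        # even if it is listed in some registry.
        raise UnknownFunctionError(f"unknown function :{name}")
    return fn(value)


print(call_function("number", 1234))                  # 1,234
print(call_function("datetime", 42, lenient=True))    # 42
```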
For example, selectors for plural variants
(different selectors for cardinal-number vs. ordinal-number variants)
that's not formatting: that's the selector type. If I say `[{$count :plural type=ordinal}]` I expect to get keywords out like `one`, `few`, etc., or access the numeric value of `$count` for selectors such as `=2`, just like plural rules work today.
I agree that options are needed for the selector (as shown), but tend to expect that I can still format the value with a placeholder later. In fact, I might format differently several times.
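The selection behavior described in this comment (exact-value keys such as `=2` alongside plural keywords such as `one` and `few`) can be sketched as follows. This is a minimal Python sketch, not the proposal's algorithm: it hard-codes simplified English rules for non-negative integers only (CLDR's rules cover many languages and decimal values), tries exact keys before keywords, and uses `_` as the catch-all key as in the examples in this thread. The `kind` parameter is a hypothetical stand-in for `type=ordinal`.

```python
def english_plural_category(n: int, kind: str = "cardinal") -> str:
    """Simplified English plural rules for non-negative integers."""
    if kind == "ordinal":
        if n % 10 == 1 and n % 100 != 11:
            return "one"   # 1st, 21st
        if n % 10 == 2 and n % 100 != 12:
            return "two"   # 2nd, 22nd
        if n % 10 == 3 and n % 100 != 13:
            return "few"   # 3rd, 23rd
        return "other"     # 4th, 11th, 12th, 13th
    return "one" if n == 1 else "other"


def select_variant(count: int, variants: dict, kind: str = "cardinal") -> str:
    # 1. An exact numeric key like "=2" matches first.
    exact = f"={count}"
    if exact in variants:
        return variants[exact]
    # 2. Then the plural keyword for the count.
    category = english_plural_category(count, kind)
    if category in variants:
        return variants[category]
    # 3. Finally the catch-all variant.
    return variants["_"]


ordinal = {"one": "{n}st", "two": "{n}nd", "few": "{n}rd", "_": "{n}th"}
print(select_variant(22, ordinal, "ordinal"))  # {n}nd
print(select_variant(12, ordinal, "ordinal"))  # {n}th
```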
> that's not formatting: that's the selector type. If I say `[{$count :plural type=ordinal}]` I expect to get keywords out like `one`, `few`, etc., or access the numeric value of `$count` for selectors such as `=2`, just like plural rules work today.
I assumed that there would be different function names for plural/cardinal vs. plural/ordinal, like we have in ICU MessageFormat. But yes, it could be one "plural" function with an option.
> I agree that options are needed for the selector (as shown), but tend to expect that I can still format the value with a placeholder later. In fact, I might format differently several times.
That should be strongly discouraged, especially looking at plurals. Formatting differently from what the selection was based on creates a jarring mismatch. We should design this to make it easy to do the right thing. If you need different formatting in a different part of the sentence, you can pass the same value in another argument, and you can also define a named expression for it.
For example, selectors for plural variants
(different selectors for cardinal-number vs. ordinal-number variants)
have to take into account how the number is formatted.
2. Format-only functions can be used as selectors via
but only if the output doesn't contain spaces?
We can decide to support spaces by allowing or requiring delimiters around the variant values.
3. Select-only functions select among variant values, but they cannot be used in pattern placeholders.

There is a simple format-only function that can be used for simple string matching.
TODO: Decide on a name for this format-only function. Consider `:string`.
`:select` recommends itself, since we already have one just like this? Or is this different?
The suggestion is to build on allowing format-only functions as selectors. Calling a formatter `:string` makes more sense than calling it `:select`.
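A format-only function used as a selector could work by formatting the value and then matching the resulting string against the variant keys. Here is a minimal Python sketch, assuming a hypothetical `:string` function that is plain string conversion and `_` as the catch-all key. In message syntax, keys containing spaces would still need the delimiters discussed earlier in this thread; a Python dict has no such limitation.

```python
from typing import Callable, Mapping


def string_function(value: object) -> str:
    """Hypothetical `:string` format-only function: plain conversion."""
    return str(value)


def select_by_formatted(
    value: object,
    formatter: Callable[[object], str],
    variants: Mapping[str, str],
) -> str:
    # Format first, then compare the formatted text with the variant keys.
    key = formatter(value)
    if key in variants:
        return variants[key]
    return variants["_"]


pronouns = {"female": "She", "male": "He", "_": "They"}
print(select_by_formatted("female", string_function, pronouns))   # She
print(select_by_formatted("unknown", string_function, pronouns))  # They
```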
Inside selected patterns,
the selector argument variables must not be used with the normal `$` placeholder syntax –
for example, the patterns in the preceding example must not use `{$count}`.
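A linter could enforce this rule by scanning each variant pattern for placeholders that reference a selector variable. This minimal Python sketch assumes, for illustration only, that variable names match `[A-Za-z_][A-Za-z0-9_]*` and that a placeholder starts with `{`, optional spaces, then `$`; the actual identifier and placeholder grammar is not settled here, and `find_selector_reuse` is a hypothetical name.

```python
import re
from typing import Iterable, List, Set, Tuple

# Simplified: "{", optional spaces, "$", then an identifier.
PLACEHOLDER_VAR = re.compile(r"\{\s*\$([A-Za-z_][A-Za-z0-9_]*)")


def find_selector_reuse(
    selector_vars: Set[str], patterns: Iterable[str]
) -> List[Tuple[int, str]]:
    """Return (pattern index, variable) pairs that break the rule."""
    problems = []
    for index, pattern in enumerate(patterns):
        for name in PLACEHOLDER_VAR.findall(pattern):
            if name in selector_vars:
                problems.append((index, name))
    return problems


patterns = [
    "You have no items in your cart",
    "You have {$count :number} items in your cart",
    "{$user} has items in the cart",
]
print(find_selector_reuse({"count"}, patterns))  # [(1, 'count')]
```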
So I can't write:
[{$count :plural}]
[=0] {You have no items in your cart}
[one] {You have {$count :number style=spellout} item in your cart}
[_] {You have {$count :number style=spellout} items in your cart}
This seems hard for users to understand. They passed the argument by name. Why can't they format it? It isn't like the value has been consumed by whatever selector ate it previously.
For plurals in particular, the formatting and selection are tied at the hip. If the spelled-out version of the number does not work grammatically like the `:plural` select-and-format function expected, then you get unhappy users.
Ignore the style. The point I'm making is that your text says I cannot use the variable `$count` with a different formatter after having used it with the plural selector.
Yes, for the stated reasons. Don't give users rope to hang themselves if we can avoid it :-)
There is a problem with the following:
[{$count :plural}]
[=0] {You have no items in your cart}
[one] {You have {$count :number style=spellout} item in your cart}
[_] {You have {$count :number style=spellout} items in your cart}
The plural categories are tied at the hip to the formatting. With the input number 1.01d, in some languages the category is 'one' if the format is an integer, but 'other' if the format has one (or more) decimals. So you can't actually compute the plural category correctly until you've formatted.
There are two ways to solve this:
- tie the plural category to the formatted value, by having the formatting information up front, or
- require the formatting information to be identical for every instance of the placeholder (e.g., it is an error if they are different)
It actually works pretty nicely to have the formatter return the plural category as an (optional) byproduct of formatting, because an intermediate step in producing the formatted number is typically exactly the data needed to compute the plural category. So the cleanest approach is a syntax that draws on that in some way. There are of course a few ways to do that. One is to use an assignment; the other would be to have the formatting options in the selector, e.g.:
[{$count :number style=spellout}]
[=0] {You have no items in your cart}
[one] {You have {$count} item in your cart}
[_] {You have {$count} items in your cart}
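The 1.01 example can be checked with a small sketch in which the number formatter returns the plural category as a byproduct, as suggested in this comment. This is a minimal Python sketch assuming only the English cardinal rule from CLDR (`one` when the integer part i = 1 and the number of visible fraction digits v = 0, otherwise `other`); `format_number` and its `fraction_digits` parameter are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN
from typing import Tuple


def format_number(value: Decimal, fraction_digits: int) -> Tuple[str, str]:
    """Format with a fixed number of fraction digits and return
    (formatted text, English cardinal plural category)."""
    quantum = Decimal(1).scaleb(-fraction_digits)  # e.g. 0.01 for 2 digits
    rounded = value.quantize(quantum, rounding=ROUND_HALF_EVEN)
    text = format(rounded, "f")  # fixed-point, never exponent notation
    integer_part, _, fraction = text.partition(".")
    i = abs(int(integer_part))  # CLDR operand i: integer digits
    v = len(fraction)           # CLDR operand v: visible fraction digits
    category = "one" if i == 1 and v == 0 else "other"
    return text, category


print(format_number(Decimal("1.01"), 0))  # ('1', 'one')
print(format_number(Decimal("1.01"), 2))  # ('1.01', 'other')
print(format_number(Decimal("1"), 1))     # ('1.0', 'other')
```

Because the category is derived from the same rounding step that produces the text, the selected variant and the displayed number cannot disagree.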
The MFWG has recently spent a considerable amount of effort in coming up with a single starting point for our syntax that is sufficiently good to act as a base for further discussions. Many of those further discussion topics have been identified, with the intent that we might be able to discuss and resolve them somewhat independently.
This PR upends that working model quite thoroughly, and once again sets us up with two entire solutions pitted against each other. Our past experience of discussions around the data model in particular would indicate that this is not a likely source of joy and success.
While there are certainly good ideas in this PR, it does need to be split up into multiple PRs modifying `spec/syntax.md` so that each part may be considered on its own merits.
The current syntax.md in the "develop" branch has a lot of good points, but I and others have also pointed out a number of problems and made friendly suggestions on the previous slide deck version and then on the pull request that were largely not accommodated before merging. I didn't think that having a file in the "develop" branch gives it special status in the WG's process; maybe I was wrong.
It might work to debate lots of feedback items in isolation, but that can also lead to going around in circles; I have experienced that myself in another standards effort last year. It's like getting lost in the trees and not seeing the forest. In the end, in that other committee we had to consider many issues together and look at what the whole system looks like with one whole set of choices vs. another whole set. So this is what I am offering here: rolling a whole set of choices, intended to deal with multiple problems together, into one complete and coherent compromise syntax. Note that I did not start from scratch and just invent my own thing out of whole cloth. I started from what I think are the good points of "develop" syntax.md and tried to fix what I think are the problems with it. As I worked on this compromise version of the whole thing, I actually ended up doing some things differently from some of my own earlier feedback, especially the part about starting in "code" mode, because as I was looking at the whole thing, it became clear that that is the better option. I also tried to provide a rationale for every part and choice of the compromise syntax. I left various TODOs for details where I think there is no obvious right answer, or it's really just a style preference.
Prior to post-CLDR-committee syntax discussions, my preferred syntax was the one put forth in the EM proposal. It's slightly more verbose than what was merged in #230, but I like the consistency and readability of it. As mentioned previously (in the PR and meeting), I don't feel that syntax and some other comments were reflected much in #230. This PR uses #230 as a starting point, so it differs from what I would like, but it does avoid taking on some of the things that I have found confusing or unnecessary, etc. I do appreciate that the rationale for the significant design points is explained well, and the areas of options/bike-shedding are demarcated with comments. Some of the explanations and differences are things I/we had not considered previously but are important. So I'm okay with the differences between this PR as it is now and my original preferred syntax; they definitely feel acceptable. I agree with earlier comments that this PR is heading in the right direction. I think that this PR gets us to a place that would be closer to a solution than before.
I'm concerned about conflating the value of the arguments made in this PR with the format in which they're proposed. Elango's comment in particular feels like it links the two.
Would you write the same if this PR were, in your opinion, moving us away from the solution you'd like us to end up with? In other words, I think as a WG we're bouncing back and forth between collaborative ("Let's find a common ground") and competitive ("Here's my proposal and here are people that agree with me") approaches, and between micro ("Let's zero in on question X in isolation") and macro ("Let's propose a cohesive solution to a class of questions") approaches. In particular, Eemeli and Stas asked everyone multiple times whether they should take on the task of coming up with a single proposal that could be merged into the tree as a "competitive macro" approach, and serve as the starting point for "collaborative micro" discussions, issues, and PRs. I felt we, as a WG, explicitly responded to their question by giving them authority to go ahead. They spent the last few months debating every argument and every spectrum we disagreed on, taking feedback from the CLDR-TC and all stakeholders, and wrote #230. Now, this PR does a similar thing with, I assume, slightly less background, since Markus did not spend 2 years in every WG meeting debating every point to exhaustion. My main point is that I think there are two separate themes in response to this PR:
- On the merit, I think it's a good proposal. I don't think it's better than what Stas and Eemeli put forth, but I'd be happy to see a revision of Stas and Eemeli's proposal with Markus' key themes incorporated.
- On the effect on the group, I imagine it may erode trust in the system if the WG asks people to do a very daunting and challenging task based on an agreed process, and then discards the result on a whim. If we as a group believe that that process was not optimal, and that it's better to have a new proposal be the starting point, I think we should explicitly discuss it, recognize the change, and maybe ask ourselves how we can avoid putting people in the position we put Stas and Eemeli in.
These are the thoughts and feelings behind my previous concerns about #230 and the discussion we had in Monday's meeting. It sounds like, as a group, we still need to discuss these issues, because these particular non-technical issues are important to how well we work as a group.
To be clear, I'm most interested in getting the best technical solution, and in the technical discussions that get us there (which is why I care about the thoroughness of our technical arguments, too). It's not so much about my ideas as about having the best ideas percolate to the top. Of course, ideas need to be represented to get a fair shake at that. I don't enjoy the non-technical discussions that take us away from the technical ones, but these particular concerns are important to hash out for the health of the group.
Please don't misrepresent. We clearly need to talk this out.
Keep in mind that there were also other people who were not included, or whose previous feedback was at risk of the same, as we discussed earlier.
Yes
The branch was updated from 645d2d3 to 64743ff.
- TODO: Value literals need to be delimited (they may contain spaces),
and the starting delimiter needs to be distinct from the prefixes for
argument names and functions.
Reasonable choices include `<>`, `()`, `[]`, `||`, or a pair of `` characters.
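Because value literals may contain spaces, a parser has to scan to the closing delimiter rather than stopping at whitespace. The following minimal Python sketch leaves the delimiter pair as parameters, since the choice is still a TODO; the default `<` `>` is only one of the listed candidates, `read_literal` is a hypothetical name, single-character delimiters are assumed, and escaping of the closing delimiter inside a literal is deliberately not handled.

```python
from typing import Tuple


def read_literal(
    text: str, start: int, open_ch: str = "<", close_ch: str = ">"
) -> Tuple[str, int]:
    """Read a delimited value literal that begins at text[start].

    Returns the literal's contents and the index just past the closing
    delimiter. Spaces inside the literal are kept as-is.
    """
    if open_ch in ("$", ":"):
        # The opening delimiter must differ from the argument and
        # function prefixes, or the parser could not tell them apart.
        raise ValueError("literal delimiter clashes with a prefix")
    if text[start:start + 1] != open_ch:
        raise ValueError(f"expected {open_ch!r} at position {start}")
    end = text.find(close_ch, start + 1)
    if end == -1:
        raise ValueError("unterminated literal")
    return text[start + 1:end], end + 1


print(read_literal("{<hello world> :string}", 1))  # ('hello world', 14)
print(read_literal("{|a b|}", 1, "|", "|"))        # ('a b', 6)
```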
I am not sure how to use Markdown to show a pair of grave accents in code style...
By using a longer string of enclosing backticks (cf. CommonMark Code spans).
Suggested change:
-Reasonable choices include `<>`, `()`, `[]`, `||`, or a pair of `` characters.
+Reasonable choices include `<>`, `()`, `[]`, `||`, or ``` `` ```.
If at all possible, you want string delimiters that are less likely to occur in strings.
On Tue, May 31, 2022 at 11:11 AM, Richard Gibson commented on this pull request in spec/compromise-syntax.md:
+Options are not allowed when no function is specified.
+
+Value literals are important for developers to control the output.
+For example, certain strings may need to be inlined as literals so that
+they are not changed during translation.
+Numeric constants need to be formatted differently depending on the target language
+(e.g., which digits and separators, and the grouping style).
+Date constants need to be formatted according to the target language’s calendar system.
+
+If only a value literal is given, without specifying a function,
+then its string value is used verbatim and it is read-only for translators.
+- TODO: Value literals need to be delimited (they may contain spaces),
+ and the starting delimiter needs to be distinct from the prefixes for
+ argument names and functions.
+ Reasonable choices include `<>`, `()`, `[]`, `||`, or a pair of `` characters.
@markusicu We're cleaning up old branches and PRs in the repo. For posterity, would you mind moving the contents of your branch to the
The syntax proposed in this PR is now available here: https://github.com/unicode-org/message-format-wg/blob/experiments/experiments/markus/compromise-syntax.md
Closing the PR, as agreed at the last meeting.