Describe number selection fully #621

@@ -7,10 +7,12 @@ Status: **Accepted**
<dl>
<dt>Contributors</dt>
<dd>@eemeli</dd>
<dd>@aphillips</dd>
<dt>First proposed</dt>
<dd>2023-09-06</dd>
<dt>Pull Request</dt>
<dd><a href="https://github.com/unicode-org/message-format-wg/pull/471">#471</a></dd>
<dd><a href="https://github.com/unicode-org/message-format-wg/pull/621">#621</a></dd>
</dl>
</details>

@@ -45,6 +47,7 @@ but ordinal rules use `one` (_1st_, _21st_, etc.), `few` (_2nd_, _22nd_, etc.),
Additionally,
MF1 provides `ChoiceFormat` selection based on a complex rule set
(which allows determining whether a number falls into a specific range).
This capability is not supported by the default functions of MF2.

Both JS and ICU PluralRules implementations provide for determining the plural category
of a range based on its start and end values.

@@ -92,44 +95,303 @@ ICU MF1 messages using `plural` and `selectordinal` should be representable in M

## Proposed Design

### Number Selection

Given that we already have a `:number` function,
it makes sense to add a `<matchSignature>` to it with an option:

```xml
<option name="select" values="plural ordinal exact" default="plural" />
```

Number selection has three modes:
- `exact` selection matches the operand to explicit numeric keys exactly
- `plural` selection matches the operand to explicit numeric keys exactly
  or to plural rule categories if there is no explicit match
- `ordinal` selection matches the operand to explicit numeric keys exactly
  or to ordinal rule categories if there is no explicit match

### Functions

The following functions use numeric selection:

- The function `:number` is the default selector for numeric values.
- The function `:integer` provides a reduced set of options for selecting
  and formatting numeric values as integers.

### Operands

The _operand_ of a number function is either an implementation-defined type or
a literal that matches the `number-literal` production in the [ABNF](/main/spec/message.abnf).
All other values produce a _Selection Error_ when evaluated for selection
or a _Formatting Error_ when attempting to format the value.

> For example, in Java, any subclass of `java.lang.Number` plus the primitive
> types (`byte`, `short`, `int`, `long`, `float`, `double`, etc.)
> might be considered as the "implementation-defined numeric types".
> Implementations in other programming languages would define different types
> or classes according to their local needs.

> [!NOTE]
> String values passed as variables in the _formatting context_'s
> _input mapping_ can be formatted as numeric values as long as their
> contents match the `number-literal` production in the [ABNF](/main/spec/message.abnf).
>
> For example, if the value of the variable `num` were the string
> `-1234.567`, it would behave identically to the local
> variable in this example:
> ```
> .local $example = {|-1234.567| :number}
> {{{$num :number} == {$example}}}
> ```
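
As a rough illustration of the operand rule above, the following Python sketch checks whether a string matches `number-literal`. It assumes that `number-literal` follows the JSON number grammar of RFC 8259 (as the section on exact literal matching states); the regular expression and the function name `is_number_literal` are illustrative and are not copied from the ABNF file.

```python
import re

# JSON number grammar (RFC 8259, section 6):
#   [ "-" ] ( "0" / digit1-9 *DIGIT ) [ "." 1*DIGIT ] [ ( "e" / "E" ) [ "-" / "+" ] 1*DIGIT ]
# [0-9] is used instead of \d so that only ASCII digits are accepted.
NUMBER_LITERAL = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")


def is_number_literal(text: str) -> bool:
    """Return True if the whole string matches the JSON-number-style literal."""
    return NUMBER_LITERAL.fullmatch(text) is not None
```

With this check, the string `-1234.567` from the example qualifies for numeric formatting, while strings such as `01`, `1.`, `+1`, or `1,000` do not.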

> [!NOTE]
> Implementations are encouraged to provide support for compound types or data structures
> that provide additional semantic meaning to the formatting of number-like values.
> For example, in ICU4J, the type `com.ibm.icu.util.Measure` can be used to communicate
> a value that includes a unit,
> or the type `com.ibm.icu.util.CurrencyAmount` can be used to set the currency and related
> options (such as the number of fraction digits).

### Options

The following options and their values are required in the default registry to be available on the
function `:number`:
- `select`
  - `plural` (default)
  - `ordinal`
  - `exact`
- `compactDisplay` (this option only has meaning when combined with the option `notation=compact`)
  - `short` (default)
  - `long`
- `notation`
  - `standard` (default)
  - `scientific`
  - `engineering`
  - `compact`
- `numberingSystem`
  - valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
    (default is locale-specific)
- `signDisplay`
  - `auto` (default)
  - `always`
  - `exceptZero`
  - `negative`
  - `never`
- `style`
  - `decimal` (default)
  - `percent` (see [Percent Style](#percent-style) below)
- `useGrouping`
  - `auto` (default)
  - `always`
  - `never`
  - `min2`
- `minimumIntegerDigits`
  - (non-negative integer, default: `1`)

> [!NOTE]
> The following options do not have default values because they are only to be used
> as overrides for an existing locale-and-value dependent implementation-defined
> default.

- `minimumFractionDigits`
  - (non-negative integer)
- `maximumFractionDigits`
  - (non-negative integer)
- `minimumSignificantDigits`
  - (non-negative integer)
- `maximumSignificantDigits`
  - (non-negative integer)
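
The enumerated values above lend themselves to table-driven validation. The following Python sketch shows one hypothetical way to check a set of `:number` options; the function name, the error strings, and the decision to leave `numberingSystem` and any unlisted options unchecked are assumptions of this sketch, not requirements of the registry.

```python
import re

# Enumerated option values for `:number`, copied from the list above.
NUMBER_ENUM_OPTIONS = {
    "select": {"plural", "ordinal", "exact"},
    "compactDisplay": {"short", "long"},
    "notation": {"standard", "scientific", "engineering", "compact"},
    "signDisplay": {"auto", "always", "exceptZero", "negative", "never"},
    "style": {"decimal", "percent"},
    "useGrouping": {"auto", "always", "never", "min2"},
}

# Options whose value must be a non-negative integer.
DIGIT_OPTIONS = {
    "minimumIntegerDigits",
    "minimumFractionDigits",
    "maximumFractionDigits",
    "minimumSignificantDigits",
    "maximumSignificantDigits",
}


def validate_number_options(options: dict[str, str]) -> list[str]:
    """Return a list of problems found in `options` (hypothetical helper)."""
    problems = []
    for name, value in options.items():
        if name in NUMBER_ENUM_OPTIONS:
            if value not in NUMBER_ENUM_OPTIONS[name]:
                problems.append(f"bad value {value!r} for option {name!r}")
        elif name in DIGIT_OPTIONS:
            # "non-negative integer": one or more ASCII digits, no sign
            if not re.fullmatch(r"[0-9]+", value):
                problems.append(f"option {name!r} needs a non-negative integer")
        # numberingSystem and other options are left to the implementation here.
    return problems
```

For example, `{"select": "ordinal", "minimumFractionDigits": "2"}` passes, while `{"select": "cardinal"}` reports one problem.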

The following options and their values are required in the default registry to be available on the
function `:integer`:
- `select`
  - `plural` (default)
  - `ordinal`
  - `exact`
- `numberingSystem`
  - valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
    (default is locale-specific)
- `signDisplay`
  - `auto` (default)
  - `always`
  - `exceptZero`
  - `negative`
  - `never`
- `style`
  - `decimal` (default)
  - `percent` (see [Percent Style](#percent-style) below)
- `useGrouping`
  - `auto` (default)
  - `true`
  - `false`
  - `min2`
  - `always`
- `minimumIntegerDigits`
  - (non-negative integer, default: `1`)

> [!NOTE]
> The following option does not have a default value because it is only to be used
> as an override for an existing locale-and-value dependent implementation-defined
> default.

- `maximumSignificantDigits`
  - (non-negative integer)

> [!NOTE]
> The following options or option values are being developed during the Technical Preview
> period.

The following values for the option `style` are _not_ part of the default registry.
Implementations SHOULD avoid creating option values that conflict with these, but
are encouraged to track development of these values during Tech Preview:
- `currency`
- `unit`

The following options are _not_ part of the default registry.
Implementations SHOULD avoid creating options that conflict with these, but
are encouraged to track development of these options during Tech Preview:
- `currency`
  - valid [Unicode Currency Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeCurrencyIdentifier)
    (no default)
- `currencyDisplay`
  - `symbol` (default)
  - `narrowSymbol`
  - `code`
  - `name`
- `currencySign`
  - `accounting`
  - `standard` (default)
- `unit`
  - (anything not empty)
- `unitDisplay`
  - `long`
  - `short` (default)
  - `narrow`

### Default Value of `select` Option

The value `plural` is the default for the option `select`
because it is the most common use case for numeric selection.
It can be used for exact value matches but also allows for the grammatical needs of other
languages using CLDR's plural rules.
It also affords the least bad fallback when used incorrectly:
using `plural` where `exact` was intended still selects exactly matching cases,
whereas using `exact` where `plural` was intended will not select LDML category matches.
The latter mistake might not be noticeable in the source language (particularly English),
but can cause problems in target locales that the original developer is not considering.

> For example, a naive developer might use a special message for the value `1` without
> considering other locales' need for a `one` plural:
>
> ```
> .match {$var}
> 1 {{You have one last chance}}
> one {{You have {$var} chance remaining}} // needed by languages such as Polish or Russian
> // such locales typically require other keywords
> // such as two, few, many, and so forth
> * {{You have {$var} chances remaining}}
> ```
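
To see why `plural` is the safer default, consider the example message in Russian. The sketch below hand-codes the CLDR cardinal rules for Russian, restricted to integers to keep it short, and compares which keys match under `select=plural` and `select=exact`. The helper names are hypothetical, and only these two modes are modeled.

```python
def russian_cardinal(n: int) -> str:
    """CLDR cardinal category for a Russian integer (fractions not modeled)."""
    n = abs(n)
    if n % 10 == 1 and n % 100 != 11:
        return "one"  # 1, 21, 31, ...
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"  # 2-4, 22-24, ...
    return "many"  # every other integer; `other` is reserved for fractions


def matching_keys(n: int, keys: list[str], select: str) -> list[str]:
    """Keys matching `n`, exact matches first; only `plural` and `exact` are modeled."""
    exact = [k for k in keys if k.isdigit() and int(k) == n]
    keyword = russian_cardinal(n) if select == "plural" else ""
    by_rule = [k for k in keys if k == keyword]
    return exact + by_rule


# Keys of the example message; the `*` catch-all is handled separately.
KEYS = ["1", "one"]

print(matching_keys(1, KEYS, "plural"))   # ['1', 'one']
print(matching_keys(21, KEYS, "plural"))  # ['one']
print(matching_keys(21, KEYS, "exact"))   # [] -> falls back to `*`
```

Under `exact`, the value 21 matches no key and falls through to `*`, producing the wrong grammatical form in Russian; under `plural` it still reaches the `one` variant, and the value 1 still prefers the exact `1` key.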

Additional options such as `minimumFractionDigits` and others already supported by `:number`
should also be supported.

If PR [#532](https://github.com/unicode-org/message-format-wg/pull/532) is accepted,
also add the `<alias>` definitions for `plural` and `ordinal`, listed at the end of the
Proposed Design section, to `<function name="number">`.

### Percent Style

When implementing `style=percent`, the numeric value of the operand
MUST be multiplied by 100 for the purposes of formatting.
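
For comparison, Python's format-spec mini-language implements percent formatting by multiplying the value by 100, so an operand of 0.25 is shown as 25%. The helper name below is hypothetical, and the fraction-digit parameter only loosely mirrors `maximumFractionDigits`.

```python
def format_percent(value: float, fraction_digits: int = 0) -> str:
    """Format `value` as a percentage; the `%` presentation type scales by 100."""
    return f"{value:.{fraction_digits}%}"


print(format_percent(0.25))      # 25%
print(format_percent(0.125, 1))  # 12.5%
```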

### Selection

When implementing [`MatchSelectorKeys`](spec/formatting.md#resolve-preferences),
numeric selectors perform as described below.

- Let `return_value` be a new empty list of strings.
- Let `operand` be the resolved value of the _operand_.
  If the `operand` is not a number type, emit a _Selection Error_
  and return `return_value`.
- Let `keys` be a list of strings containing keys to match.
  (Hint: this list is an argument to `MatchSelectorKeys`)
- For each string `key` in `keys`:
  - If the value of `key` matches the production `number-literal`:
    - If the parsed value of `key` is an [exact match](#determining-exact-literal-match)
      of the value of the `operand`, then `key` matches the selector.
      Add `key` to the front of the `return_value` list.
  - Else, if the value of `key` is one of the _plural/ordinal keywords_:
    - Let `keyword` be a string which is the result of [rule selection](#rule-selection).
    - If `keyword` equals `key`, then `key` matches the selector.
      Append `key` to the end of the `return_value` list.
  - Else, `key` is invalid;
    emit a _Selection Error_.
    Do not add `key` to `return_value`.
- Return `return_value`.
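
The steps above can be sketched in Python as follows. This is not the normative algorithm: operands are assumed to arrive as Python `int` or `float` values (parsing string operands, which the Operands section allows, is left out), rule selection is passed in as a function that returns `""` for `select=exact`, and Selection Errors are collected in a list rather than emitted. The names `match_selector_keys`, `rule_select`, `errors`, and `english_cardinal` are hypothetical.

```python
import json
import re
from typing import Callable, Union

# JSON-number-style literal, standing in for the `number-literal` production
NUMBER_LITERAL = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")
KEYWORDS = {"zero", "one", "two", "few", "many", "other"}


def match_selector_keys(
    operand: object,
    keys: list[str],
    rule_select: Callable[[Union[int, float]], str],
    errors: list[str],
) -> list[str]:
    """Sketch of MatchSelectorKeys for a numeric selector."""
    return_value: list[str] = []
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(operand, bool) or not isinstance(operand, (int, float)):
        errors.append("Selection Error: operand is not a number")
        return return_value
    for key in keys:
        if NUMBER_LITERAL.fullmatch(key):
            # Exact literal match: compare with the operand's JSON-number text.
            if json.dumps(operand) == key:
                return_value.insert(0, key)  # exact matches go to the front
        elif key in KEYWORDS:
            # rule_select returns "" for select=exact, which matches no key.
            if rule_select(operand) == key:
                return_value.append(key)  # keyword matches go to the end
        else:
            errors.append(f"Selection Error: invalid key {key!r}")
    return return_value


def english_cardinal(n: Union[int, float]) -> str:
    """Stand-in for CLDR data: in English only an integer 1 (no fraction digits) is `one`."""
    return "one" if isinstance(n, int) and abs(n) == 1 else "other"


errors: list[str] = []
print(match_selector_keys(1, ["one", "1", "other", "banana"], english_cardinal, errors))  # ['1', 'one']
print(errors)
```

The exact match `1` ends up ahead of the keyword match `one`, which is what lets the more specific variant win during variant resolution.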

### Plural/Ordinal Keywords

The _plural/ordinal keywords_ are: `zero`, `one`, `two`, `few`, `many`, and
`other`.

### Rule Selection

If the option `select` is set to `exact`, rule-based selection is not used.
Return the empty string.

> [!NOTE]
> Since keys cannot be the empty string in a numeric selector, returning the
> empty string disables keyword selection.

If the option `select` is set to `plural`, selection should be based on CLDR plural rule data
of type `cardinal`. See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html)
for examples.

If the option `select` is set to `ordinal`, selection should be based on CLDR plural rule data
of type `ordinal`. See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html)
for examples.

Apply the rules defined by CLDR to the resolved value of the operand and the function options,
and return the resulting keyword.
If no rules match, return `other`.
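
A minimal sketch of rule selection follows, assuming integer operands and hand-coding the English cardinal and ordinal rules from the CLDR charts linked above in place of real CLDR data. The function names are hypothetical; a real implementation would load the rules for the formatting locale, for example through ICU's `PluralRules`.

```python
def english_cardinal(n: int) -> str:
    # CLDR en cardinal: one -> i = 1 and v = 0 (integers only here)
    return "one" if abs(n) == 1 else "other"


def english_ordinal(n: int) -> str:
    # CLDR en ordinal: one -> 1st, 21st; two -> 2nd, 22nd; few -> 3rd, 23rd
    n = abs(n)
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if n % 10 == 2 and n % 100 != 12:
        return "two"
    if n % 10 == 3 and n % 100 != 13:
        return "few"
    return "other"  # no rule matched


def rule_selection(n: int, select: str = "plural") -> str:
    if select == "exact":
        return ""  # the empty string never equals a key, so keyword matching is off
    if select == "plural":
        return english_cardinal(n)
    if select == "ordinal":
        return english_ordinal(n)
    raise ValueError(f"unsupported select value: {select!r}")
```

Because 11, 12, and 13 are excluded explicitly, `rule_selection(11, "ordinal")` yields `other` (_11th_), while 21 yields `one` (_21st_).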

> **Example.**
> In CLDR 44, the Czech (`cs`) plural rule set can be found
> [here](https://www.unicode.org/cldr/charts/44/supplemental/language_plural_rules.html#cs).
>
> A message in Czech might be:
> ```
> .match {$numDays :number}
> one {{{$numDays} den}}
> few {{{$numDays} dny}}
> many {{{$numDays} dne}}
> * {{{$numDays} dní}}
> ```
> Using the rules found above, the results of various `operand` values might look like:
>
> | Operand value | Keyword | Formatted Message |
> |---|---|---|
> | 1 | `one` | 1 den |
> | 2 | `few` | 2 dny |
> | 5 | `other` | 5 dní |
> | 22 | `other` | 22 dní |
> | 27 | `other` | 27 dní |
> | 2.4 | `many` | 2,4 dne |
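
The following sketch applies the Czech cardinal rules as given in the CLDR chart (`one`: i = 1 and v = 0; `few`: i = 2..4 and v = 0; `many`: v != 0; otherwise `other`). Under these rules `few` covers only the integers 2 to 4, so 22 falls into `other`. The operand is taken as its literal string so that the count of visible fraction digits `v` survives; the function name and the restriction to non-negative plain decimal strings (no exponent) are assumptions of this sketch.

```python
def czech_cardinal(literal: str) -> str:
    """CLDR `cs` cardinal category for a non-negative decimal string such as "2.4"."""
    integer_part, _, fraction = literal.partition(".")
    i = int(integer_part)  # CLDR operand i: integer digits
    v = len(fraction)      # CLDR operand v: number of visible fraction digits
    if i == 1 and v == 0:
        return "one"
    if 2 <= i <= 4 and v == 0:
        return "few"
    if v != 0:
        return "many"
    return "other"  # the message has no `other` key, so `*` is used


# Variant text per keyword, taken from the example message (`other` stands for `*`).
FORMS = {"one": "den", "few": "dny", "many": "dne", "other": "dní"}

for literal in ["1", "2", "5", "22", "27", "2.4"]:
    keyword = czech_cardinal(literal)
    # Czech formatting uses a decimal comma, hence the replace().
    print(literal, keyword, f"{literal.replace('.', ',')} {FORMS[keyword]}")
```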

### Determining Exact Literal Match

> [!IMPORTANT]
> The exact behavior of exact literal match is only defined for non-zero-filled
> integer values.
> Annotations that use fraction digits or significant digits might work in specific
> implementation-defined ways.
> Users should avoid depending on these types of keys in message selection.

Number literals in the MessageFormat 2 syntax use the
[format defined for a JSON number](https://www.rfc-editor.org/rfc/rfc8259#section-6).
The resolved value of an `operand` exactly matches a numeric literal `key`
if, when the `operand` is serialized using the format for a JSON number,
the two strings are equal.
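
This definition can be tried out directly with Python's `json` module, whose `json.dumps` emits JSON number text for `int` and `float` values. The sketch (the function name is hypothetical) also shows why reliable matching is limited to non-zero-filled integers: a float operand serializes with a fraction part, and a zero-filled key such as `2.50` never equals any serialization.

```python
import json
from typing import Union


def is_exact_match(operand: Union[int, float], key: str) -> bool:
    """True if `operand`, serialized as a JSON number, equals the literal `key`."""
    return json.dumps(operand) == key


print(is_exact_match(1, "1"))        # True
print(is_exact_match(1.0, "1"))      # False: json.dumps(1.0) is "1.0"
print(is_exact_match(1.0, "1.0"))    # True
print(is_exact_match(2.50, "2.50"))  # False: json.dumps(2.50) is "2.5"
```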

> [!NOTE]
> Implementations are not expected to implement this exactly as written,
> as there are clearly optimizations that can be applied.

> [!NOTE]
> Only integer matching is required in the Technical Preview.
> Feedback describing use cases for fractional and significant digits-based
> selection would be helpful.
> In the meantime, users should avoid matching on fractional numbers or significant digits.

The `<alias>` definitions referred to earlier, to be added to `<function name="number">`
if PR [#532](https://github.com/unicode-org/message-format-wg/pull/532) is accepted, are:

```xml
<alias name="plural" supports="match">
  <setOption name="select" value="plural"/>
</alias>
<alias name="ordinal" supports="match">
  <setOption name="select" value="ordinal"/>
</alias>
```

## Alternatives Considered