Describe number selection fully #621

Merged
merged 34 commits into from
Feb 15, 2024
ea23789
Describe number selection fully
aphillips Feb 3, 2024
8514254
fix typo and some formatting
aphillips Feb 3, 2024
309bf73
Apply suggestions from code review
aphillips Feb 3, 2024
2c6399a
Address comments with a major rewrite
aphillips Feb 4, 2024
cb2e108
Tweak operand handling
aphillips Feb 4, 2024
1e52c6c
Typos
aphillips Feb 4, 2024
338198f
Improve Czech example
aphillips Feb 5, 2024
065b3df
Update number-selection.md
aphillips Feb 5, 2024
e09c370
Address @macchiati's comments
aphillips Feb 5, 2024
d4a096e
Add text for determining exact literal match
aphillips Feb 5, 2024
772fdf5
Update number-selection.md
aphillips Feb 5, 2024
e21fd36
typo
aphillips Feb 5, 2024
22576b1
Implement numeric literal selection using strings
aphillips Feb 5, 2024
304c7ed
Address comment about plural example
aphillips Feb 5, 2024
a944dc4
remove "boolean"
aphillips Feb 6, 2024
47957d2
Apply suggestions from code review
aphillips Feb 6, 2024
1c0bf19
Fix the operands section
aphillips Feb 7, 2024
e878d88
Update exploration/number-selection.md
aphillips Feb 7, 2024
c3e6edd
Add significant digits to key matching
aphillips Feb 7, 2024
5aa134e
Fix `:ordinal` as a formatter
aphillips Feb 12, 2024
55f9591
Changes based on 2024-02-14 call
aphillips Feb 14, 2024
986ad71
Fix implementation defined types
aphillips Feb 14, 2024
0448d35
remove fraction digits from `:integer`
aphillips Feb 14, 2024
8cc6969
Address comments, fix useGrouping
aphillips Feb 14, 2024
82feb78
Fix `signDisplay`
aphillips Feb 14, 2024
9c76366
Fix a typo
aphillips Feb 14, 2024
f3cc38d
Remove signif digits from `:integer`; add note about no min/max defaults
aphillips Feb 14, 2024
8fc4958
Fix minimumIntegerDigits
aphillips Feb 14, 2024
3175c44
Update exploration/number-selection.md
aphillips Feb 15, 2024
09838e8
Update number-selection.md
aphillips Feb 15, 2024
a8746ed
only use integer matching
aphillips Feb 15, 2024
2b56b1e
Update exploration/number-selection.md
aphillips Feb 15, 2024
f9b521c
Update exploration/number-selection.md
aphillips Feb 15, 2024
10b0f03
Update exploration/number-selection.md
aphillips Feb 15, 2024
312 changes: 287 additions & 25 deletions exploration/number-selection.md
@@ -7,10 +7,12 @@ Status: **Accepted**
<dl>
<dt>Contributors</dt>
<dd>@eemeli</dd>
<dd>@aphillips</dd>
<dt>First proposed</dt>
<dd>2023-09-06</dd>
<dt>Pull Request</dt>
<dd><a href="https://github.com/unicode-org/message-format-wg/pull/471">#471</a></dd>
<dd><a href="https://github.com/unicode-org/message-format-wg/pull/621">#621</a></dd>
</dl>
</details>

@@ -45,6 +47,7 @@ but ordinal rules use `one` (_1st_, _21st_, etc.), `few` (_2nd_, _22nd_, etc.),
Additionally,
MF1 provides `ChoiceFormat` selection based on a complex rule set
(and which allows determining if a number falls into a specific range).
This capability is not supported by the default functions of MF2.

Both JS and ICU PluralRules implementations provide for determining the plural category
of a range based on its start and end values.
@@ -92,44 +95,303 @@ ICU MF1 messages using `plural` and `selectordinal` should be representable in M

## Proposed Design

Given that we already have a `:number`,
it makes sense to add a `<matchSignature>` to it with an option
### Number Selection

```xml
<option name="select" values="plural ordinal exact" default="plural" />
```
Number selection has three modes:
- `exact` selection matches the operand to explicit numeric keys exactly
- `plural` selection matches the operand to explicit numeric keys exactly
or to plural rule categories if there is no explicit match
- `ordinal` selection matches the operand to explicit numeric keys exactly
or to ordinal rule categories if there is no explicit match


### Functions

The following functions use numeric selection:

The function `:number` is the default selector for numeric values.

The function `:integer` provides a reduced set of options for selecting
and formatting numeric values as integers.

### Operands

The _operand_ of a number function is either an implementation-defined type or
a literal that matches the `number-literal` production in the [ABNF](/main/spec/message.abnf).
All other values produce a _Selection Error_ when evaluated for selection
or a _Formatting Error_ when attempting to format the value.

> For example, in Java, any subclass of `java.lang.Number` plus the primitive
> types (`byte`, `short`, `int`, `long`, `float`, `double`, etc.)
> might be considered as the "implementation-defined numeric types".
> Implementations in other programming languages would define different types
> or classes according to their local needs.

> [!NOTE]
> String values passed as variables in the _formatting context_'s
> _input mapping_ can be formatted as numeric values as long as their
> contents match the `number-literal` production in the [ABNF](/main/spec/message.abnf).
>
> For example, if the value of the variable `num` were the string
> `-1234.567`, it would behave identically to the local
> variable in this example:
> ```
> .local $example = {|-1234.567| :number}
> {{{$num :number} == {$example}}}
> ```
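
The operand rules above can be sketched in Python as follows. This is only an illustration: the function name `resolve_numeric_operand`, the choice of `int`, `float`, and `Decimal` as the "implementation-defined numeric types", and the regular expression (a hand transcription of the `number-literal` production that should be checked against the ABNF) are all assumptions, and `ValueError` stands in for a _Selection Error_ or _Formatting Error_.

```python
import re
from decimal import Decimal

# Hand-written approximation of the `number-literal` ABNF production (assumption).
NUMBER_LITERAL = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")

# Hypothetical stand-in for the "implementation-defined numeric types".
NUMERIC_TYPES = (int, float, Decimal)


def resolve_numeric_operand(value):
    """Return a numeric value for `value`, or raise ValueError if unsupported."""
    # bool is a subclass of int in Python; this sketch does not treat it as numeric.
    if isinstance(value, bool):
        raise ValueError("not a numeric operand")
    if isinstance(value, NUMERIC_TYPES):
        return value
    # Strings are accepted only when they match the number-literal syntax.
    if isinstance(value, str) and NUMBER_LITERAL.fullmatch(value):
        return Decimal(value)
    raise ValueError("not a numeric operand")
```

Under these assumptions, the string `-1234.567` and the literal `|-1234.567|` resolve to the same `Decimal` value, while strings such as `01` or `1.` are rejected.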

The default `plural` value is presumed to be the most common use case,
and it affords the least bad fallback when used incorrectly:
Using "plural" for "exact" still selects exactly matching cases,
whereas using "exact" for "plural" will not select LDML category matches.
This might not be noticeable in the source language,
> [!NOTE]
> Implementations are encouraged to provide support for compound types or data structures
> that provide additional semantic meaning to the formatting of number-like values.
> For example, in ICU4J, the type `com.ibm.icu.util.Measure` can be used to communicate
> a value that includes a unit,
> or the type `com.ibm.icu.util.CurrencyAmount` can be used to set the currency and related
> options (such as the number of fraction digits).


### Options

The following options and their values are required in the default registry to be available on the
function `:number`:
- `select`
  - `plural` (default)
  - `ordinal`
  - `exact`
- `compactDisplay` (this option only has meaning when combined with the option `notation=compact`)
  - `short` (default)
Member

question, isn't the default for compactDisplay=none? That is, for the other cases, the default means "what happens if the option is not present"

Member Author @aphillips, Feb 14, 2024

MDN says:

> Whether to use short or long form when using compact notation. This is the value provided in the options.compactDisplay argument of the constructor, or the default value: "short". The value is only present if notation is set to "compact", and otherwise is undefined.

So probably this option should be moved under (depend on) notation

Member Author

Done. See above.

  - `long`
- `notation`
  - `standard` (default)
  - `scientific`
  - `engineering`
  - `compact`
- `numberingSystem`
  - valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
    (default is locale-specific)
- `signDisplay`
  - `auto` (default)
  - `always`
  - `exceptZero`
  - `negative`
  - `never`
- `style`
  - `decimal` (default)
  - `percent` (see [Percent Style](#percent-style) below)
- `useGrouping`
  - `auto` (default)
  - `always`
  - `never`
  - `min2`
- `minimumIntegerDigits`
  - (non-negative integer, default: `1`)

> [!NOTE]
> The following options do not have default values because they are only to be used
> as overrides for an existing locale- and value-dependent implementation-defined
> default.

- `minimumFractionDigits`
Member

minimum default is zero?

Member Author

These items (min*/max*) don't have defaults because they are overrides. The default is implementation defined.

  - (non-negative integer)
- `maximumFractionDigits`
Member

is there a maximum? Or if unstated would I see 0.3333333333333333333333333333333333 for the value of 1/3?

Member

etc

Member Author

see above

  - (non-negative integer)
- `minimumSignificantDigits`
  - (non-negative integer)
- `maximumSignificantDigits`
  - (non-negative integer)

The following options and their values are required in the default registry to be available on the
function `:integer`:
- `select`
  - `plural` (default)
  - `ordinal`
  - `exact`
- `numberingSystem`
  - valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
    (default is locale-specific)
- `signDisplay`
  - `auto` (default)
  - `always`
  - `exceptZero`
  - `negative`
  - `never`
- `style`
  - `decimal` (default)
  - `percent` (see [Percent Style](#percent-style) below)
- `useGrouping`
  - `auto` (default)
  - `true`
  - `false`
  - `min2`
  - `always`
- `minimumIntegerDigits`
  - (non-negative integer, default: `1`)

> [!NOTE]
> The following option does not have a default value because it is only to be used
> as an override for an existing locale- and value-dependent implementation-defined
> default.

- `maximumSignificantDigits`
  - (non-negative integer)

> [!NOTE]
> The following options or option values are being developed during the Technical Preview
> period.

The following values for the option `style` are _not_ part of the default registry.
Implementations SHOULD avoid creating options that conflict with these, but
are encouraged to track development of these options during Tech Preview:
- `currency`
- `unit`

The following options are _not_ part of the default registry.
Implementations SHOULD avoid creating options that conflict with these, but
are encouraged to track development of these options during Tech Preview:
- `currency`
  - valid [Unicode Currency Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeCurrencyIdentifier)
    (no default)
- `currencyDisplay`
  - `symbol` (default)
  - `narrowSymbol`
  - `code`
  - `name`
- `currencySign`
  - `accounting`
  - `standard` (default)
- `unit`
  - (anything not empty)
- `unitDisplay`
  - `long`
  - `short` (default)
  - `narrow`

### Default Value of `select` Option

The value `plural` is the default for the option `select`
because it is the most common use case for numeric selection.
It can be used for exact value matches but also allows for the grammatical needs of other
languages using CLDR's plural rules.
Relying on exact matches alone might not cause noticeable problems in the source language (particularly English),
but it can cause problems in target locales that the original developer is not considering.

> For example, a naive developer might use a special message for the value `1` without
> considering other locales' need for a `one` plural:
>
> ```
> .match {$var}
> [1] {{You have one last chance}}
> [one] {{You have {$var} chance remaining}} // needed by languages such as Polish or Russian
> [*] {{You have {$var} chances remaining}}
> 1 {{You have one last chance}}
> one {{You have {$var} chance remaining}} // needed by languages such as Polish or Russian
> // such locales typically require other keywords
> // such as two, few, many, and so forth
> * {{You have {$var} chances remaining}}
> ```

Additional options such as `minimumFractionDigits` and others already supported by `:number`
should also be supported.

If PR [#532](https://github.com/unicode-org/message-format-wg/pull/532) is accepted,
also add the following `<alias>` definitions to `<function name="number">`:
### Percent Style

When implementing `style=percent`, the numeric value of the operand
MUST be multiplied by 100 for the purposes of formatting.

### Selection

When implementing [`MatchSelectorKeys`](spec/formatting.md#resolve-preferences),
numeric selectors perform as described below.

- Let `return_value` be a new empty list of strings.
- Let `operand` be the resolved value of the _operand_.
Member

Suggested change
- Let `operand` be the resolved value of the _operand_.
- Let `operand` be the decimal value of _operand_ after formatting options from `:number` have been applied conforming to [the `sampleValue` grammar of UTS 35](https://unicode.org/reports/tr35/tr35-numbers.html#Plural_rules_syntax).

The intermediate formatted value may have notation information associated with it, such as "1.0c3" for "1.0K" or potentially "2.3e4" for "2.3*10^4". We should be clear that this is the amount of information we want from the intermediate value, and it also allows us to select on compact values, which may be important in French, for example.

Member Author

"resolved value" has special meaning in our specification (even though it is not highlighted here). It's important to say that operand is the "resolved value". We can add requirements after that, such as what you suggest.

Note that operand is untyped by us. It is either an object or it is a literal.

We say what to do with the literal (which ones are supported and how to parse them)

We mustn't define what the object is. That's up to the implementation. The implementer is responsible for resolving the decimal value, if that's what they need.

Member

Hmm, are you referencing this language?

> The form of the resolved value is implementation defined and the
> value might not be evaluated or formatted yet.
> However, it needs to be "formattable", i.e. it contains everything required
> by the eventual formatting.

So the "resolved value" may or may not be the intermediate value?

If the `operand` is not a number type, emit a _Selection Error_
and return `return_value`.
- Let `keys` be a list of strings containing keys to match.
  (Hint: this list is an argument to `MatchSelectorKeys`)
- For each string `key` in `keys`:
  - If the value of `key` matches the production `number-literal`:
    - If the parsed value of `key` is an [exact match](#determining-exact-literal-match)
      of the value of the `operand`, then `key` matches the selector.
      Add `key` to the front of the `return_value` list.
  - Else, if the value of `key` is a keyword:
    - Let `keyword` be a string which is the result of [rule selection](#rule-selection).
    - If `keyword` equals `key`, then `key` matches the selector.
      Append `key` to the end of the `return_value` list.
  - Else, `key` is invalid;
    emit a _Selection Error_.
    Do not add `key` to `return_value`.
- Return `return_value`.
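
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: `match_selector_keys`, `rule_selection`, `english_plural`, and the `errors` list are hypothetical names, errors are collected in a list rather than emitted through any real API, the regular expression is a hand transcription of `number-literal`, and exact matching is limited to integers compared by their decimal string.

```python
import re

# Hand-written approximation of the `number-literal` ABNF production (assumption).
NUMBER_LITERAL = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")
KEYWORDS = {"zero", "one", "two", "few", "many", "other"}


def match_selector_keys(operand, keys, rule_selection, errors):
    """Return the matching keys, exact matches first, keyword matches last."""
    return_value = []
    # Hypothetical choice of "number types": int and float, excluding bool.
    if isinstance(operand, bool) or not isinstance(operand, (int, float)):
        errors.append("Selection Error: operand is not a number")
        return return_value
    for key in keys:
        if NUMBER_LITERAL.fullmatch(key):
            # Integer-only exact match: compare the serialized operand to the key.
            if isinstance(operand, int) and str(operand) == key:
                return_value.insert(0, key)
        elif key in KEYWORDS:
            # `rule_selection` returns "" when `select=exact`, which never equals a key.
            if rule_selection(operand) == key:
                return_value.append(key)
        else:
            errors.append(f"Selection Error: invalid key {key!r}")
    return return_value


def english_plural(n):
    """Toy cardinal rule for English integers (illustration only)."""
    return "one" if n == 1 else "other"


errors = []
print(match_selector_keys(1, ["one", "1", "other"], english_plural, errors))
# → ['1', 'one']
```

Placing exact matches at the front of the list is what lets a key such as `1` take precedence over the keyword `one` when both are present.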

### Plural/Ordinal Keywords
The _plural/ordinal keywords_ are: `zero`, `one`, `two`, `few`, `many`, and
`other`.

### Rule Selection

If the option `select` is set to `exact`, rule-based selection is not used.
Return the empty string.

> [!NOTE]
> Since keys cannot be the empty string in a numeric selector, returning the
> empty string disables keyword selection.

If the option `select` is set to `plural`, selection should be based on CLDR plural rule data
of type `cardinal`. See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html)
for examples.

If the option `select` is set to `ordinal`, selection should be based on CLDR plural rule data
of type `ordinal`. See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html)
for examples.

Apply the rules defined by CLDR to the resolved value of the operand and the function options,
and return the resulting keyword.
If no rules match, return `other`.

> **Example.**
> In CLDR 44, the Czech (`cs`) plural rule set can be found
> [here](https://www.unicode.org/cldr/charts/44/supplemental/language_plural_rules.html#cs).
>
> A message in Czech might be:
> ```
> .match {$numDays :number}
> one {{{$numDays} den}}
> few {{{$numDays} dny}}
> many {{{$numDays} dne}}
> * {{{$numDays} dní}}
> ```
> Using the rules found above, the results of various `operand` values might look like:
> | Operand value | Keyword | Formatted Message |
> |---|---|---|
> | 1 | `one` | 1 den |
> | 2 | `few` | 2 dny |
> | 5 | `other` | 5 dní |
> | 22 | `other` | 22 dní |
> | 27 | `other` | 27 dní |
> | 2.4 | `many` | 2,4 dne |
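
To make the rule application concrete, here is a small Python sketch of cardinal rule selection for Czech. The function name `czech_cardinal` is hypothetical, and the rules are hard-coded from a reading of the CLDR 44 `cs` cardinal rules (`one`: i = 1 and v = 0; `few`: i = 2..4 and v = 0; `many`: v != 0), so they should be checked against the CLDR data linked above. A real implementation would load the rules from CLDR rather than hard-code them. The operand is passed as a plain decimal string (no exponent) so that the number of visible fraction digits is preserved.

```python
def czech_cardinal(source: str) -> str:
    """Return the plural keyword for a plain decimal string such as "2.4"."""
    text = source.lstrip("-")
    integer_part, _, fraction_part = text.partition(".")
    i = int(integer_part)   # CLDR operand i: integer digits
    v = len(fraction_part)  # CLDR operand v: count of visible fraction digits
    if v != 0:
        return "many"
    if i == 1:
        return "one"
    if 2 <= i <= 4:
        return "few"
    # If no rules match, return `other`.
    return "other"


for value in ("1", "2", "5", "2.4"):
    print(value, czech_cardinal(value))
```

Because `v` counts visible fraction digits, the string `1.0` selects `many` in this sketch even though it is numerically equal to 1, which is why formatting options can affect the selected keyword.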



### Determining Exact Literal Match

> [!IMPORTANT]
> The exact behavior of exact literal match is only defined for non-zero-filled
> integer values.
> Annotations that use fraction digits or significant digits might work in specific
> implementation-defined ways.
> Users should avoid depending on these types of keys in message selection.


Number literals in the MessageFormat 2 syntax use the
[format defined for a JSON number](https://www.rfc-editor.org/rfc/rfc8259#section-6).
The resolved value of an `operand` exactly matches a numeric literal `key`
if, when the `operand` is serialized using the format for a JSON number,
the two strings are equal.
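
As a rough illustration of this comparison, the following Python sketch serializes an integer operand with the standard `json` module and compares the result to the key. The function name `is_exact_match` is hypothetical, and the sketch deliberately handles only integers, since that is the only case with defined behavior.

```python
import json


def is_exact_match(operand, key: str) -> bool:
    """Compare an integer operand, serialized as a JSON number, to a literal key."""
    # Only plain integers are handled; bool is excluded because it subclasses int.
    if isinstance(operand, bool) or not isinstance(operand, int):
        return False
    return json.dumps(operand) == key
```

With this approach the operand `1` matches the key `1` but not the key `1.0`, because the comparison is made on strings rather than on numeric values.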

> [!NOTE]
> Implementations are not expected to implement this exactly as written,
> as there are clearly optimizations that can be applied.

> [!NOTE]
> Only integer matching is required in the Technical Preview.
> Feedback describing use cases for fractional and significant digits-based
> selection would be helpful.
Otherwise, users should avoid using matching with fractional numbers or significant digits.

```xml
<alias name="plural" supports="match">
<setOption name="select" value="plural"/>
</alias>
<alias name="ordinal" supports="match">
<setOption name="select" value="ordinal"/>
</alias>
```

## Alternatives Considered
