Data model feedback: I think we should have string and numeric literals #712

Closed
mihnita opened this issue Mar 8, 2024 · 12 comments
Labels
blocker-candidate: The submitter thinks this might be a block for the next release
data model: Issues related to the Interchange Data Model
LDML46.1: MF2.0 Draft Candidate
Preview-Feedback: Feedback gathered during the technical preview
resolve-candidate: This issue appears to have been answered or resolved, and may be closed soon.

Comments

@mihnita
Collaborator

mihnita commented Mar 8, 2024

At this point the data model only has string literals:

interface Literal {
  type: "literal";
  value: string;
}

The grammar also has a number-literal production:

literal        = quoted / unquoted
quoted         = "|" *(quoted-char / quoted-escape) "|"
unquoted       = name / number-literal

When we format a message we use the data model only.
Which means there is no way to tell the difference between "...{|123456789|}..." and "...{123456789}..."
Because in the data model we only have a string, and "The presence or absence of quotes is not preserved by the data model."

But I think one would expect {|123456.789|} to result in "123456.789" (because it is a string),
and would expect {123456.789} to be formatted as "123,456.789" (or "123.456,789", maybe with alternate digits),
because "it is a number".

It means the placeholders without functions are not intuitive:
"...{123456789}..." => "...123456789..."
"...{123456789 :number}..." => "...123,456,789..."

Numeric literals are also found in options: ...{$foo :function opt1=bar opt2=baz opt3=42}..., and in decision keys.


TLDR:
We have numeric literals in syntax.
We need to know if a literal was numeric when we format to string.
But we drop that info in the data model, which sits in the middle.
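
To make the loss concrete, here is a minimal sketch of how a quoted and an unquoted literal collapse into the same data model value. The helper `literalFromSource` is hypothetical and its escape handling is simplified; it is not taken from any implementation.

```typescript
// The data model Literal, as quoted above.
interface Literal {
  type: "literal";
  value: string;
}

// Hypothetical helper: turns the source text of a literal into the data model.
// Quotes are stripped and escapes resolved; nothing records that quotes were present.
function literalFromSource(source: string): Literal {
  const quoted =
    source.length >= 2 &&
    source.charAt(0) === "|" &&
    source.charAt(source.length - 1) === "|";
  const value = quoted
    ? source.slice(1, -1).replace(/\\([\\{|}])/g, "$1")
    : source;
  return { type: "literal", value };
}

const fromQuoted = literalFromSource("|123456789|");
const fromUnquoted = literalFromSource("123456789");

// Both results are { type: "literal", value: "123456789" }.
console.log(JSON.stringify(fromQuoted) === JSON.stringify(fromUnquoted)); // true
```

Anything that consumes only the data model, such as a formatter, sees identical input for both spellings.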

@mihnita mihnita added the LDML46 LDML46 Release (Tech Preview - October 2024) label Mar 8, 2024
@eemeli
Collaborator

eemeli commented Mar 9, 2024

When a _literal_ is used as an _operand_
or on the right-hand side of an _option_,
the formatting function MUST treat its resolved value the same
whether its value was originally _quoted_ or _unquoted_.

@aphillips
Member

Numeric literals are not numbers.
They are a sub-production of literal that makes it convenient to use numeric values in the syntax.
We have number-literal instead of mutating name a bunch.

That is, it is acceptable to add quotes to any numeric-literal.

I see the problem that you're grappling with, @mihnita, which is that you can't reflect off of a string in a placeholder to get a number. You might like number-literal to turn into a number, but {|123|} is just as valid as {123}. What I think you'll have to do to get the intuitive behavior you're after is check if the literal parses as a number in order to support automatic assignment of :number instead of :string.
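
A rough sketch of that check, assuming number-literal follows a JSON-like number grammar. The regular expression and the helper name `defaultFunctionFor` are illustrative assumptions and should be checked against the ABNF in the spec.

```typescript
// Assumed pattern mirroring the number-literal production (not copied from the spec).
const NUMBER_LITERAL = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][-+]?\d+)?$/;

// Hypothetical: choose the default function for a literal placeholder without an annotation.
function defaultFunctionFor(value: string): "number" | "string" {
  return NUMBER_LITERAL.test(value) ? "number" : "string";
}

console.log(defaultFunctionFor("123"));   // number
console.log(defaultFunctionFor("1.00"));  // number
console.log(defaultFunctionFor("hello")); // string
console.log(defaultFunctionFor("0x1F"));  // string
```

Because the check runs on the string value, {|123|} and {123} both get :number here. That matches the rule that quoting is not significant, but it does not give the quoted/unquoted distinction requested in the original post.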

@macchiati
Member

macchiati commented Mar 9, 2024 via email

@mihnita
Collaborator Author

mihnita commented Mar 9, 2024

"What I think you'll have to do to get the intuitive behavior you're after is check if the literal parses as a number in order to support automatic assignment of :number instead of :string."

This is independent of :number
We have places where we take numbers in options (...{$foo :bar opt=21}...).

The function :bar should not know about :number
Every single function taking "numeric options" will need a way to parse a string to a number :-(
Without calling the :number (internal) parser.

And the MF2 implementation itself should not know about :number (which is a function like any other, it "just happens" to be standard).

And when I say "intuitive behavior" I am mostly thinking about users of MF2, someone writing messages.
Intuition works without thinking. If I have to read the registry and decide "ok, this is parseable by :number" then it is not intuition anymore. It is "learn to live with it" against intuition.

And that intuition might be programming language dependent :-)
1 == "1" is true in JavaScript and Perl, but not in Java or Python.

"Literals in the message text are always strings"

Absolutely.
But here we are talking about the data model.

@aphillips
Member

@mihnita I think options are the same thing. Functions need to specify what string serialization they accept. For an expression like {$count :number minimumFractionDigits=1}, the 1 has to be a specific pattern which the :number backing function parses into the value.

In your case, you're probably using NumberFormatter as your ultimate formatter, but you'll have some code that parses the option value to make it into a number (or kvetches that it isn't sufficiently numeric).

In the data model, the value of the option is a string. In the function registry, the value of that string might be constrained.
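
As an illustration of a function parsing its own option values, here is a sketch in which the backing code for :number validates the string before handing it to the platform formatter. The helper `parseDigitCountOption` and the pattern it accepts are assumptions for this example, not registry text.

```typescript
// Hypothetical helper for a digit-count option such as minimumFractionDigits.
// The data model supplies a string; the function decides which strings it accepts.
function parseDigitCountOption(optionName: string, value: string): number {
  if (!/^(?:0|[1-9]\d*)$/.test(value)) {
    throw new RangeError(
      `Option ${optionName} expects a non-negative integer, got "${value}"`
    );
  }
  return Number(value);
}

// {$count :number minimumFractionDigits=1}: the option value arrives as the string "1".
const minimumFractionDigits = parseDigitCountOption("minimumFractionDigits", "1");
const formatted = new Intl.NumberFormat("en-US", { minimumFractionDigits }).format(42);
console.log(formatted); // 42.0
```

Each function that takes numeric options needs some equivalent of this step, which is the duplication objected to earlier in the thread.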

@eemeli
Collaborator

eemeli commented Mar 10, 2024

@mihnita How would you represent the operand of this expression in the data model?

{ 1.00 :x:number }

@mihnita
Collaborator Author

mihnita commented Mar 10, 2024

@mihnita How would you represent the operand of this expression in the data model?

{ 1.00 :x:number }

Same as today, except that 1.00 would be a NumberLiteral instead of Literal.
And {|1.00| :x:number} would be a StringLiteral.

Same as JS and most programming languages, 1.00 is a number, "1.00" or '1.00' is a string.
So 1.00 == 1.0, but |1.00| != |1.0|

type Literal = StringLiteral | NumberLiteral

interface StringLiteral {
  type: "string-literal";
  value: string;
}

interface NumberLiteral {
  type: "number-literal";
  value: number;
  source: string; // Maybe, TBD
}
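
A sketch of how a formatter might consume this proposed union when a placeholder has no function. The function `formatBareLiteral` is hypothetical, and `Intl.NumberFormat` stands in for whatever number formatter an implementation would use.

```typescript
type Literal = StringLiteral | NumberLiteral;

interface StringLiteral {
  type: "string-literal";
  value: string;
}

interface NumberLiteral {
  type: "number-literal";
  value: number;
  source: string; // original spelling, kept because `value` cannot distinguish 1.00 from 1.0
}

// Hypothetical default formatting for a placeholder without an annotation.
function formatBareLiteral(literal: Literal, locale: string): string {
  switch (literal.type) {
    case "string-literal":
      return literal.value; // emitted verbatim
    case "number-literal":
      return new Intl.NumberFormat(locale).format(literal.value);
  }
}

const quotedLiteral: Literal = { type: "string-literal", value: "123456.789" };
const unquotedLiteral: Literal = { type: "number-literal", value: 123456.789, source: "123456.789" };

console.log(formatBareLiteral(quotedLiteral, "en-US"));   // 123456.789
console.log(formatBareLiteral(unquotedLiteral, "en-US")); // 123,456.789

// 1.00 and 1.0 compare equal as numbers; only `source` keeps the spelling apart.
const a: NumberLiteral = { type: "number-literal", value: 1.00, source: "1.00" };
const b: NumberLiteral = { type: "number-literal", value: 1.0, source: "1.0" };
console.log(a.value === b.value, a.source === b.source); // true false
```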

@macchiati
Member

macchiati commented Mar 10, 2024 via email

@aphillips aphillips added Preview-Feedback Feedback gathered during the technical preview data model Issues related to the Interchange Data Model labels Mar 10, 2024
@catamorphism
Collaborator

catamorphism commented Mar 11, 2024

IMO distinguishing between types of literals isn't too useful without introducing a type system.

On the one hand we have "all literals are strings". On the other hand, we could introduce typing rules, which could mean requiring input variables to be annotated with types, or could mean a sort of hybrid approach where type errors involving only literals are statically checked (that is, checked whenever data models are checked). My feeling is that points on the design spectrum between those two points aren't too helpful, because eventually you stumble into a type system and you might as well start out with one.

I'm not against a type system, but it might take some thought to figure out how to let custom function writers specify the types of their functions in a programming-language-neutral way. It would be a hard problem how to reconcile a type system for MessageFormat with the ability to write custom functions and the possibility that those functions might be implemented in a unityped language like JS.

@macchiati
Member

macchiati commented Mar 11, 2024 via email

@aphillips
Member

@macchiati

"The" data model in our discussions refers to the data model defined in the specification. It is intended as an interchange format and thus can be formalized. Implementations are not required to implement it (or any other data model) and we say this explicitly. They can also extend "the" data model.

I think the spec should be neutral as to whether the implementation uses
strong typing, weak typing, or completely untyped. That is, a data model in
a real implementation should be able to use strong typing, but we should
not prescribe it.

We go out of our way not to be typed or to favor a given type system, but we recognize that implementations cannot avoid typing. The whole point of message formatting, after all, is to insert data values in a locale-appropriate way into a string. This dichotomy is why the spec has tortured locutions about "implementation defined types": we never say what these types are and we generally restrict discussion of them to registry.md. The only way to coerce a type is via a function (annotation). MF never knows nor cares about the types of operands or any values. Only the (locally-supplied) functions care. At the same time, we don't require implementations to remove typing information either.

@mihnita mihnita added the blocker-candidate The submitter thinks this might be a block for the next release label Aug 28, 2024
@aphillips
Member

This is related to the discussion we had in the 2024-09-16 call, whose resolution we deferred until 46.1.

@aphillips aphillips added LDML46.1 MF2.0 Draft Candidate and removed LDML46 LDML46 Release (Tech Preview - October 2024) labels Sep 16, 2024
@aphillips aphillips added the resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. label Oct 15, 2024