Data model feedback: I think we should have string and numeric literals #712
message-format-wg/spec/formatting.md Lines 175 to 178 in e761964
|
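Numeric literals are not numbers. They are a sub-production of literal that makes it convenient to use numeric values in the syntax. We have number-literal instead of mutating name a bunch. That is, it is acceptable to add quotes to any numeric-literal. I see the problem that you're grappling with, @mihnita, which is that you can't reflect off of a string in a placeholder to get a number. You might like number-literal to turn into a number, but {|123|} is just as valid as {123}. What I think you'll have to do to get the intuitive behavior you're after is check if the literal parses as a number in order to support automatic assignment of :number instead of :string. |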
Literals in the message text are always strings. Their structure is given
meaning by the function that looks at them. We might have some specific
formats defined by the standard registry functions that can be shared
across other functions, inside and outside the registry, but a custom
function can define its own structure for a literal. Moreover, there are
different environments where literals are found, and the structure may be
specific to them.
For example:
.local $consumption = {|1-liter-per-100-kilometers| :u:measure}
.match {$distance}
|[0,5)-meter| {{Not far enough.}}.
...
The second literal above represents a range (open, closed, or half-open).
It would not be acceptable as the operand of a .local for :u:measure, but
it might be for a :u:measureRange.
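To illustrate how a function can give structure to a literal that the message syntax treats as an opaque string, here is a minimal TypeScript sketch of a parser for the half-open range literal above. The `RangeLiteral` type, the `parseRangeLiteral` name, and the exact bracket-and-unit format are assumptions made for illustration; nothing like this is defined by the specification.

```typescript
// Hypothetical structure for a range literal such as "[0,5)-meter".
interface RangeLiteral {
  lower: number;
  upper: number;
  lowerInclusive: boolean; // "[" means closed, "(" means open
  upperInclusive: boolean; // "]" means closed, ")" means open
  unit: string;
}

// Assumed shape: bracket, number, comma, number, bracket, "-", unit name.
const RANGE_PATTERN =
  /^([\[(])(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)([\])])-([a-z]+(?:-[a-z]+)*)$/;

// Returns the parsed range, or undefined when the literal has another shape.
function parseRangeLiteral(source: string): RangeLiteral | undefined {
  const match = RANGE_PATTERN.exec(source);
  if (match === null) {
    return undefined;
  }
  return {
    lower: Number(match[2]),
    upper: Number(match[3]),
    lowerInclusive: match[1] === "[",
    upperInclusive: match[4] === "]",
    unit: match[5],
  };
}
```

A function such as the hypothetical :u:measureRange would call a parser like this on its operand, while :u:measure would reject the same string.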
I think literals as part of a message part, because they have no function,
cannot be anything other than strings; they are typically quoted because
some character needs escaping.
…On Fri, Mar 8, 2024 at 4:32 PM Addison Phillips ***@***.***> wrote:
Numeric literals are not numbers.
They are a sub-production of literal that makes it convenient to use
numeric values in the syntax.
We have number-literal instead of mutating name a bunch.
That is, it is acceptable to add quotes to any numeric-literal.
I see the problem that you're grappling with, @mihnita
<https://github.com/mihnita>, which is that you can't reflect off of a
string in a placeholder to get a number. You might like number-literal to
turn into a number. but {|123|} is just as valid as {123}. What I think
you'll have to do to get the intuitive behavior you're after is check if
the literal parses as a number in order to support automatic assignment of
:number instead of :string.
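The check suggested in the quoted comment could be sketched as follows. This is a minimal TypeScript illustration, not part of any implementation: the regular expression is an approximation of the number-literal production and should be treated as an assumption, and `defaultFunctionFor` is a hypothetical name.

```typescript
// Approximates the number-literal production of the MF2 syntax:
// optional "-", integer part without leading zeros, optional fraction,
// optional exponent. This pattern is an assumption, not normative text.
const NUMBER_LITERAL = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][-+]?\d+)?$/;

type DefaultFunction = ":number" | ":string";

// Picks a default function for an unannotated literal from its string value.
// Quoting is irrelevant here: |123| and 123 both arrive as the string "123".
function defaultFunctionFor(literalValue: string): DefaultFunction {
  return NUMBER_LITERAL.test(literalValue) ? ":number" : ":string";
}
```

Under this approach {|123|} and {123} behave identically, because the decision is made from the string value alone.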
|
This is independent of The function And the MF2 implementation itself should not know about And when I say "intuitive behavior" I am mostly thinking about users of MF2, people writing messages. And that intuition might be programming language dependent :-)
Absolutely. |
@mihnita I think options are the same thing. Functions need to specify what string serialization they accept. For an expression like In your case, you're probably using NumberFormatter as your ultimate formatter, but you'll have some code that parses the option value to make it into a number (or kvetches that it isn't sufficiently numeric). In the data model, the value of the option is a string. In the function registry, the value of that string might be constrained. |
@mihnita How would you represent the operand of this expression in the data model?
{ 1.00 :x:number }
|
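Same as today, except that 1.00 would be a NumberLiteral instead of Literal. And {|1.00| :x:number} would be a StringLiteral.
Same as JS and most programming languages, 1.00 is a number, "1.00" or '1.00' is a string. So 1.00 == 1.0, but |1.00| != |1.0|
type Literal = StringLiteral | NumberLiteral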
interface StringLiteral {
type: "string-literal";
value: string;
}
interface NumberLiteral {
type: "number-literal";
value: number;
source: string; // Maybe, TBD
} |
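As a rough sketch of how a parser might populate the two proposed literal types, consider the TypeScript below. The `parseLiteralToken` and `literalsEqual` names are hypothetical, the numeric pattern only approximates the number-literal production, and escape sequences inside quoted literals are ignored for brevity.

```typescript
interface StringLiteral {
  type: "string-literal";
  value: string;
}
interface NumberLiteral {
  type: "number-literal";
  value: number;
  source: string;
}
type Literal = StringLiteral | NumberLiteral;

// Approximation of the number-literal production (an assumption).
const NUMERIC_TOKEN = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][-+]?\d+)?$/;

// `token` is the literal exactly as written in the message source.
function parseLiteralToken(token: string): Literal {
  const quoted =
    token.length >= 2 &&
    token.charAt(0) === "|" &&
    token.charAt(token.length - 1) === "|";
  if (quoted) {
    // Quoted literals stay strings, even when the content looks numeric.
    return { type: "string-literal", value: token.slice(1, -1) };
  }
  if (NUMERIC_TOKEN.test(token)) {
    return { type: "number-literal", value: Number(token), source: token };
  }
  return { type: "string-literal", value: token };
}

// Equality under the proposal: numbers compare by value, strings by text.
function literalsEqual(a: Literal, b: Literal): boolean {
  return a.type === b.type && a.value === b.value;
}
```

With these definitions, 1.00 and 1.0 compare equal while |1.00| and |1.0| do not, which matches the behavior described in the comment.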
I think it can be misleading to talk about 'the' data model, without
context.
"This section defines *a* data model representation of MessageFormat 2
*messages*.
Implementations are not required to use this data model for their internal
representation of messages. Neither are they required to provide an
interface that accepts or produces representations of this data model."
So I presume you mean, in *the* data model (used by *a particular*
implementation).
And there can be a lot of variation.
An implementation could produce a 'deep' data model where as much as
possible is transformed into internal data types and optimized for runtime
use. In this particular case, it would do that by calling *x:number* to
parse the literal operand, and that 1.00 might be represented as a double,
a BigNumber, a Rational, a ComplexNumber, or some other datatype.
Similarly, in {$var :number *numberingSystem=arab*} the option value might
be converted to a NumberingSystem enum, so that at runtime it does not need
to be parsed in order to pass it to a UnlocalizedNumberFormatter. In fact,
most of the :number options could be used to build
an UnlocalizedNumberFormatter, and then at runtime the only additional
parameters that are needed are the locale and the value of $var.
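The idea of converting options once and leaving only the locale and the value for runtime can be sketched in TypeScript. This sketch swaps in the standard `Intl.NumberFormat` for the ICU UnlocalizedNumberFormatter mentioned above, handles only `minimumFractionDigits` as a stand-in for the full option set, and uses the hypothetical names `PreparedNumber` and `prepareNumber`.

```typescript
// Runtime half of a prepared :number expression: only locale and value remain.
type PreparedNumber = (locale: string, value: number) => string;

// Converts string option values once, when the message is loaded.
// Only minimumFractionDigits is handled here, purely as an illustration.
function prepareNumber(options: Record<string, string>): PreparedNumber {
  const resolved: Intl.NumberFormatOptions = {};
  const digits = options["minimumFractionDigits"];
  if (digits !== undefined) {
    if (!/^\d+$/.test(digits)) {
      // Bad option values are reported at load time, not at format time.
      throw new Error(`bad minimumFractionDigits: ${digits}`);
    }
    resolved.minimumFractionDigits = Number(digits);
  }
  return (locale, value) =>
    new Intl.NumberFormat(locale, resolved).format(value);
}
```

A caller would build the formatter once, for example `const fmt = prepareNumber({ minimumFractionDigits: "2" })`, and then invoke `fmt(locale, value)` for each formatting request.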
…On Sun, Mar 10, 2024 at 1:28 AM Mihai Nita ***@***.***> wrote:
@mihnita <https://github.com/mihnita> How would you represent the operand
of this expression in the data model?
{ 1.00 :x:number }
Same as today, except that 1.00 would be a NumberLiteral instead of
Literal.
And {|1.00| :x:number} would be a StringLiteral.
Same as JS and most programming languages, 1.00 is a number, "1.00" or
'1.00' is a string.
So 1.00 == 1.0, but |1.00| != |1.0|
type Literal = StringLiteral | NumberLiteral
interface StringLiteral {
type: "string-literal";
value: string;
}
interface NumberLiteral {
type: "number-literal";
value: number;
source: string; // Maybe, TBD
}
|
IMO distinguishing between types of literals isn't too useful without introducing a type system. On the one hand we have "all literals are strings". On the other hand, we could introduce typing rules, which could mean requiring input variables to be annotated with types, or could mean a sort of hybrid approach where type errors involving only literals are statically checked (that is, checked whenever data models are checked). My feeling is that points on the design spectrum between those two points aren't too helpful, because eventually you stumble into a type system and you might as well start out with one. I'm not against a type system, but it might take some thought to figure out how to let custom function writers specify the types of their functions in a programming-language-neutral way. It would be a hard problem how to reconcile a type system for MessageFormat with the ability to write custom functions and the possibility that those functions might be implemented in a unityped language like JS. |
I think the spec should be neutral as to whether the implementation uses
strong typing, weak typing, or completely untyped. That is, a data model in
a real implementation should be able to use strong typing, but we should
not prescribe it.
That is, it would be perfectly fine to have an implementation generate a
data model where a string in the message source like
.local $var = {1 :x:number style=compact foo=bar}
turns into the strongly typed, front-loaded:
MFFunction f = registry.lookup("x:number");
put("var", new Expression(f.parseLiteral("1"), f,
f.parseOptions("style=compact foo=bar")));
where
f.parseLiteral("1") produces new Complex(1,0)
f.parseOptions produces Map.of("style", Style.valueOf("compact"))
Or, it could turn into the completely untyped, back-loaded:
put("var", new Expression("1", "x:number", "style=compact, foo=bar"))
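The same contrast can be written out in TypeScript as two possible shapes for one expression. Everything named here (`RegistryFunction`, `UntypedExpression`, `TypedExpression`, `frontLoad`, `xNumber`) is hypothetical and only illustrates that both shapes can be derived from the same message source.

```typescript
// Hypothetical registry function that knows how to read its own
// operand and options from their string forms.
interface RegistryFunction<Operand, Options> {
  parseLiteral(source: string): Operand;
  parseOptions(source: Record<string, string>): Options;
}

// Back-loaded: everything stays a string until format time.
interface UntypedExpression {
  operand: string;
  functionName: string;
  options: Record<string, string>;
}

// Front-loaded: operand and options are converted when the message loads.
interface TypedExpression<Operand, Options> {
  operand: Operand;
  fn: RegistryFunction<Operand, Options>;
  options: Options;
}

// Converts the back-loaded shape into the front-loaded one.
function frontLoad<Operand, Options>(
  raw: UntypedExpression,
  fn: RegistryFunction<Operand, Options>
): TypedExpression<Operand, Options> {
  return {
    operand: fn.parseLiteral(raw.operand),
    fn,
    options: fn.parseOptions(raw.options),
  };
}

// Toy stand-in for x:number; unknown options such as foo are dropped.
const xNumber: RegistryFunction<number, { style: "compact" | "standard" }> = {
  parseLiteral: (source) => Number(source),
  parseOptions: (source) => ({
    style: source["style"] === "compact" ? "compact" : "standard",
  }),
};
```

Neither shape needs to be prescribed: the untyped record is closer to an interchange form, and the typed one is an optimization an implementation may choose.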
…On Mon, Mar 11, 2024 at 11:26 AM Tim Chevalier ***@***.***> wrote:
IMO distinguishing between types of literals isn't too useful without
introducing a type system. I'm not against that, but it might take some
thought to figure out how to let custom function writers specify the types
of their functions in a programming-language-neutral way. It would be a
hard problem how to reconcile a type system for MessageFormat with the
ability to write custom functions and the possibility that those functions
might be implemented in a unityped language like JS.
|
"The" data model in our discussions refers to the data model defined in the specification. It is intended as an interchange format and thus can be formalized. Implementations are not required to implement it (or any other data model) and we say this explicitly. They can also extend "the" data model.
We go out of our way not to be typed or to favor a given type system, but we recognize that implementations cannot avoid typing. The whole point of message formatting, after all, is to insert data values in a locale-appropriate way into a string. This dichotomy is why the spec has tortured locutions about "implementation defined types": we never say what these types are and we generally restrict discussion of them to |
This is related to the discussion we had in the 2024-09-16 call, whose resolution we deferred until 46.1 |
At this point the data model only has string literals:
The parser also has number-literal.
When we format a message we use the data model only.
Which means there is no way to tell the difference between
"...{|123456789|}..."
and
"...{123456789}..."
Because in the data model we only have a string, and "The presence or absence of quotes is not preserved by the data model."
But I think one would expect
{|123456.789|}
to result in "123456.789" (because it is a string), and would expect
{123456.789}
to be formatted as "123,456.789" (or "123.456,789", maybe with alternate digits), because "it is a number".
It means the placeholders without functions are not intuitive:
"...{123456789}..."
=> "...123456789..."
"...{123456789 :number}..."
=> "...123,456,789..."
Numeric literals are also found in options:
...{$foo :function opt1=bar opt2=baz opt3=42}...
and in decision keys.
TLDR:
We have numeric literals in syntax.
We need to know if a literal was numeric when we format to string.
But we drop that info in the data model, which sits in the middle.
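The loss of information described above can be shown with a small TypeScript sketch. The `DataModelLiteral` shape is a simplified view of the literal in the spec's data model (a string value only), `toDataModel` is a hypothetical name, and escape sequences inside quoted literals are ignored.

```typescript
// Simplified view of a data model literal: a string value and nothing else.
interface DataModelLiteral {
  type: "literal";
  value: string;
}

// Toy conversion from a source token to the data model.
// Quotes are stripped, so nothing records whether they were present.
function toDataModel(token: string): DataModelLiteral {
  const quoted =
    token.length >= 2 &&
    token.charAt(0) === "|" &&
    token.charAt(token.length - 1) === "|";
  return { type: "literal", value: quoted ? token.slice(1, -1) : token };
}
```

Here `toDataModel("|123456789|")` and `toDataModel("123456789")` produce structurally identical objects, so a formatter working only from the data model cannot treat them differently.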