Discussion thread for code mode introducer #526
Comments
Out of interest, I went and looked at the frequencies with which the sigils suggested for Option D
So if going this way, I think One negative for If we go with |
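One way to run this kind of frequency check is to count how often existing message text already begins with a candidate sigil. The sketch below is only an illustration: the candidate set (`.`, `>`, `~`) is an assumption taken from sigils named elsewhere in this thread, `leading_sigil_counts` is a hypothetical helper, and the sample strings are made up rather than drawn from any real corpus.

```python
from collections import Counter

# Candidate sigils are an assumption for illustration only.
CANDIDATE_SIGILS = (".", ">", "~")


def leading_sigil_counts(messages):
    """Count how many messages begin with each candidate sigil."""
    counts = Counter({sigil: 0 for sigil in CANDIDATE_SIGILS})
    for text in messages:
        stripped = text.lstrip()
        # Only the first non-whitespace character matters here.
        if stripped and stripped[0] in CANDIDATE_SIGILS:
            counts[stripped[0]] += 1
    return counts


# Made-up sample messages, not real data.
sample = ["Hello", ".5 seconds left", "> quoted reply", "~10 items", ".net"]
counts = leading_sigil_counts(sample)
```

A sigil that rarely starts existing simple messages would need escaping less often.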
In #521 @macchiati made this comment, which I copy here for visibility and because I am closing that thread. Note that @duerst also made a comment about non-ASCII there, which I'm not moving over.

It seems to me that there is a higher-level breakdown, where 'special sequence' below is a sequence of one or more ASCII punctuation/symbol characters.

Alpha: mark the start of a complex message with a special sequence, and either the end of the declarations or the end of the complex message with a special sequence (Option A, F)
Beta: mark the start of a complex message with a special sequence (Option B, C, E)
Gamma: mark the start of each keyword that starts statements with a special sequence (Option D)

Personally, I don't think either Alpha or Gamma adds much value; the key to Beta is to pick a sequence that is distinctive, doesn't collide with any other sequence, and is very unlikely to start a simple message (thus probably a multi-character sequence of symbols instead of a single symbol).
|
@macchiati I think that's one way one could break down the options. I think your "Alpha" would be more succinctly put as: "enclose all or part of a complex message with a special sequence". There's nothing wrong with this (many, many syntaxes feature enclosure; our own syntax already encloses both patterns and expressions [placeholders]). But my feeling here is that the enclosure is one layer too many.

My preference for "Gamma" (option D) is precisely that it adds nothing to the syntax. One has to learn the keywords anyway: ours are just spelled funny. It's easy to understand because there's only one concept.

I could be okay with "Beta", but the challenge (as Mark points out) is finding the right sequence. My concern would be that I don't see what the value is to the user with B/C/E (except, of course, that the message parses correctly). It's a piece of extra gunk that we require everyone to type in order to serve the parser.

@stasm proposed Option F, which is a preamble syntax, and he's said in teleconference (and elsewhere) that it's still underdeveloped as a proposal. I like the idea of having a conceptual model for messages that makes it easier to manage the code and patterns. I also like the idea of getting rid of On the other hand, I don't see how the preamble concept works by dividing
I think it's useful to spend time writing messages in a syntax to get a feel for it. Thinking about messages in the abstract doesn't give me much finger knowledge or "feel" for the syntax compared to actually trying to do different things with it. |
My preference would be to use multiple
A vast majority of messages with selectors will only ever have one selector; for those, this change would let us get rid of one |
Option F proposes to shift from thinking about I see F as a solution to the problem of having all this additional stuff that we want to decorate the message with. Selectors, input, locals... where do we put it all? We only have the message itself to work with, so it needs to be in-band. Instead of solving this through nesting (text → code → text), F solves it with the preamble (text → code; text). The preamble is where we declare inputs, locals, and selectors. It's a bit as if the message was a map with a semantic comment:
You're right, parsers will do fine without an explicit closing delimiter for the preamble; that's how it works right now after all. I meant that the closing delimiter can help people understand how the message is structured. On a related note, I'm not fond of the fact that both |
My first preference is for F, but I also see benefits to A which I'd like to make sure we don't omit. The closing
We'd need to figure out the exact syntax and whitespace handling for simple messages (perhaps wrap them in
This way we could use a single brace, followed by a |
Actually, it's the other way around. Consider for instance this YAML representation of your example:

key-simple: A simple message
key-complex: '{{
input {$count :number}
match {$count :plural}
when one {{One thing}}
when * {{Many things}}
}}'
another-key: '{{ {{Pattern}} }}'

First of all, many such formats consider Second, that's actually invalid YAML. Can you see the error? |
Can you give a few other examples, besides YAML?
I don't know YAML well enough to be good at parsing it visually. |
@stasm mentioned:
Totally. For me, people's "association" between @eemeli suggested:
What I don't like about this is that the match statements are laid out vertically while the keys are horizontal. I agree with @stasm that |
At least some JSONC variants and Hjson will allow for some messages to go unquoted, but will require quotes if they start with
The error is that the last line of the This is effectively one of the corner-cases of YAML, and it will not be caught by all YAML processors. The way that e.g. the syntax highlighter here doesn't catch the error will make the experience even worse. |
I consider this a bad pattern that should be avoided rather than emulated—even being familiar with it, I can't tell you how many times I've been burned when converting |
I don't have as strong an intuition for the set of options for the question at hand, but the illustration of this point makes me elevate Option A to be an equal first choice because it allows one to write messages in a way that avoids that problem. I don't think Option F allows for a "simple message" pattern to be a degenerate subset of a "complex message". Option D still seems like a good first choice since it strikes the best compromise for the new requirement set as of the last 2 months. The caveats of the escaping that it requires for a simple message are the narrowest.
Oof. This example makes me shudder. I still think of our syntax as describing the input data to the API. I know we've made affordances for declarations, which I can rationalize as the "refactoring" of repetitive data, and it seems fine to me. I can still squint and look at it sideways and call it data. I don't want us going any further along the cliché of unwittingly recreating Yet Another Micro Language. If the |
The thing is, none of the options allow for that.
These options are fundamentally the same, and e.g. removing the prefix from a complex message results in dramatically different treatment of the remainder (such as interpreting
These options are slightly different, but suffer from the same flaw—consider
Same issue again, with the added complexity that the result of unwrapping |
I didn't mean to trigger the discussion about simple messages with my original comment. I was only advocating for considering wrapping messages in single curly braces, in combination with dot-prefixed I agree that we should be mindful of iteration hazards of text-first mode. I filed #512 which I think would be a good place to continue this topic. |
What I meant by "the message is a map with a semantic comment", was that it's a map of variant keys to variant patterns, and that declarations are extra metadata which describes how to handle this map. The actual data that we encode is the translation content; multiple patterns keyed on some CLDR categories for the most part. |
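The "map with a semantic comment" view can be sketched as a small data structure. This is a minimal illustration, not the working group's data model: the names `Message`, `declarations`, `selectors`, `variants`, and `pick_pattern` are hypothetical, and the lookup is a simplified "exact key, else `*`" rule rather than real plural selection.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    # Extra metadata describing how to handle the map (hypothetical names).
    declarations: list[str] = field(default_factory=list)
    selectors: list[str] = field(default_factory=list)
    # The translation content: variant keys -> variant patterns.
    variants: dict[tuple[str, ...], str] = field(default_factory=dict)


def pick_pattern(message: Message, keys: tuple[str, ...]) -> str:
    """Return the pattern for `keys`, falling back to the catch-all `*` key."""
    if keys in message.variants:
        return message.variants[keys]
    fallback = ("*",) * len(message.selectors)
    return message.variants[fallback]


count_message = Message(
    declarations=["input {$count :number}"],
    selectors=["$count :plural"],
    variants={("one",): "One thing", ("*",): "Many things"},
)

# A message with no selectors is a map with a single, empty key.
simple_message = Message(variants={(): "A simple message"})
```

In this shape the declarations and selectors only describe how to read the map; the patterns themselves are the data being encoded.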
I don't understand this statement? The
That's correct: they are variations on the magic starter sigil theme and the difference between them is basically trying to find the right tension between the need to escape the starter sequence and the need for people to type an elaborate (and otherwise non-functional) starter.

--

I think your fundamental argument is that we should start in code mode? If so, I'm not sure that's very useful to a conversation about how to switch from (starting in) text mode to code mode. If our syntax choice were different, we might have different choices here. But this is our syntax. If we don't like

--

I have the idea that there is something conceptually to "F", but I haven't seen a compelling and understandable syntax emerging from it. Ease of authoring and ease of understanding are maybe the most fundamental things to me. It occurs to me that the problem with (F) is that the enclosing gorp (as with (A)) does nothing except provide an opportunity for syntax errors. And it is conceptually hard for the average user because the
Anyway, I took (F) out of my vote today because I don't see how to make it into a compact, clean syntax. |
No, my fundamental argument is that "simple message" should be a degenerate case of "complex message" (having no declarations or matcher) rather than a distinct grammar—and that can be achieved with a decision to start in code mode (e.g. by requiring patterns to always be quoted) or to start in text mode (e.g. with a sigil to indicate a declaration or match).
Agreed, which is why I have not voted. But I felt it important to emphasize that no option on the ballot addresses this issue. |
They feel like they belong together because we've been equating
I think the square brackets do help, but what's jarring to me about this example is that it looks like weird text, maybe some instruction that shouldn't even be sent for translation but accidentally was, but it doesn't (to me) look like part of the message's syntax. I think that the text-first mode comes with a promise: it's all text and always text, unless it's wrapped in brackets. B, C, and D (and E to some extent) break that promise. This is why I've been thinking out loud whether it wouldn't be best to combine A and D:
|
@gibson042 noted:
Starting in text mode with a sigil for declaration or match is option D, no? I mean, I see what you're saying, which is that you'd like to have unquoted patterns (so the degenerate form is a simple message), but isn't that the syntax decision we just made? That could only work if we quoted the variant keys, a la:
I think we either need to put changes like this out of scope ("disagree and commit") or recognize we have a kind of "failure mode" for this project. FWIW, I did offer up the syntax choice in the UTW presentation after showing the whitespace trimming options including multiline. Sadly the recording of the presentation was somehow lost. But the result was effectively a 50/50 split of the room (people on this thread were in the room, so it's not just me saying that's what happened).

@stasm suggested:
What does the external decoration of |
I have a request for people sharing complex message examples in this thread: The vast majority of real-world complex messages will only ever have one selector, with 2-3 variants. Unless there's a specific need to use a differently shaped complex message, could the examples used to determine the appearance of complex messages prefer that shape over others, so that we do not unduly favour edge cases in our decision-making? |
I'm uneasy about the bare That said, I can see how and why I acknowledge that the closing
Yep, I'm not very happy about it, but it's all doable, and frankly, not all that much worse than having to recognize I'm also not fond of the fact that A is sort of saying: in our text-first syntax, you can have a message that's effectively a single placeholder and nothing else, and then inside that placeholder you can have extra logic. It sounds so close to MF1's nested selectors that I'm worried that people will keep attempting to nest variants long after we're done with MF2.
None of us enjoys A's closing To sum up, I'm not fond of either option. I dislike bare We haven't had much time to iterate and discuss this, and I'm not overly confident in my vote. I think allowing ourselves to use Fundamentally, I think we should first find the syntax for expressing the map of variant keys and variant patterns, and only then think about where and how to put declarations. |
@stasm noted:
(chair hat OFF) If you mean
I think this would be a conversation about the structure of the It occurs to me that some of this discussion stems from our being an "in-a-string" format. We solved this in PUFF (and Fluent, etc.) by using document structure to literally make keys be keys and values values. If we were a file format, we wouldn't be having this conversation!

I think it is interesting that a pattern cannot start a message unless there is no code. This makes options like (C) possible.

So: I like D because it takes away special knowledge and extra typing. Once you've typed the keyword (with its dot) you're just in code mode. But I could accept most of the other proposals.

--

(chair hat ON) Only a few hours left to vote. I look forward to our discussion Monday! |
I'm sorry I missed the deadline. But for YAML, options B/C/D, and in particular sigils '.', '>', and '~' are clearly better than the others. |
@duerst Do you want to submit a vote? |
@aphillips Yes, but that is not sufficient on its own to fulfill the property I am espousing. I don't have any particular attachment to sigil-prefixed keywords or even to starting in code mode vs. text mode, I have an attachment to simplicity of message structure so tool creators and developers and translators are subject to a minimum count of new concepts, friction, and surprise.
I don't care about unquoted patterns for their own sake; they're just necessary to fulfill the property I care about if the format starts in text mode. The problem right now is that patterns must be quoted in a complex message and must not be quoted in a simple one.
The recent decisions have been made in isolation—start in code mode vs. text mode, trim pattern whitespace or not, require vs. support vs. prohibit quoting in complex messages—and AFAICT all have assumed the very bifurcation that I find problematic, and which only #512 seems to even touch on.
This thread is definitely not the right scope. I hope #512 is, although it's currently framed as "document" rather than "decide" and I don't even know if the aspects I'm complaining about were discussed in Seville. But if they were/are discussed and ultimately rejected by the group, then I will comfortably disagree and commit. |
Done. But as I'm late, no need to count. Also, what matters for YAML isn't so much the grammatical structure (which is the main issue being voted on) but the exact choice of introductory character. BTW, some of the options are specific about the introductory character ("the sigil ."), while others are not ("use a sigil"), which may skew the ballot a bit (although I don't think it will create too much of a skew). |
I counted all of the votes, even the late ones, since this particular ballot is input to a technical debate. (FWIW, none of the late votes changed the outcome) |
@duerst Is YAML widely used as a format for resource bundle files? |
(chair hat ON) 🎩 In the 2023-11-20 call the WG consensus was to adopt Option D (sigil introducer) using the
vs.
Closing this issue. |
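A rough sketch of what a sigil introducer means for a parser's first decision: the message stays in text mode unless it begins with a dot-prefixed keyword. The keyword set below (`input`, `local`, `match`) is assumed from the declarations and selectors discussed in this thread, `starts_in_code_mode` is a hypothetical helper rather than part of any implementation, and allowing leading whitespace before the keyword is also an assumption.

```python
import re

# Assumed keyword set; the thread mentions inputs, locals, and match statements.
KEYWORDS = ("input", "local", "match")

# Optional whitespace, a dot, a keyword, then a non-name character or end of string.
_CODE_START = re.compile(r"\s*\.(?:%s)(?![A-Za-z0-9_-])" % "|".join(KEYWORDS))


def starts_in_code_mode(message: str) -> bool:
    """True if the message opens with a dot-prefixed keyword (Option D style)."""
    return _CODE_START.match(message) is not None
```

Under this reading, a simple message would only need escaping when it happens to start with one of those exact sequences; text such as `... and then` or `.matchbox` would remain plain text.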
All I know is that it's used for resources in Ruby on Rails, and that Ruby on Rails is a widely used web application framework. YAML's advantages are that it introduces very little visual clutter, and it allows data representations even closer to programs than JSON. The disadvantages are that it can be quite brittle, and that some features have serious security problems. |
This is where you can freely discuss technical issues related to the ballot in #525