data-model: Better represent annotations in the data model, particularly when unsupported #554

Merged
aphillips merged 7 commits into unicode-org:main from gh-552-expression-data-model
Dec 8, 2023

Conversation

gibson042
Collaborator

More accurately captures annotations in the data model, particularly when they are unsupported, and also just generally increases alignment between data model representations and the ABNF.

Fixes #552

@gibson042 gibson042 requested review from aphillips and eemeli December 5, 2023 03:49
Member

@aphillips aphillips left a comment

Not sure I agree with the ABNF changes. The data model stuff looks good tho.

Comment on lines 23 to 30
+            / private-use-expression / reserved-expression
  literal-expression = "{" [s] literal [s annotation] [s] "}"
  variable-expression = "{" [s] variable [s annotation] [s] "}"
- function-expression = "{" [s] annotation [s] "}"
- annotation = (function *(s option))
-            / reserved-annotation
-            / private-use-annotation
+ function-expression = "{" [s] function-annotation [s] "}"
+ private-use-expression = "{" [s] private-use-annotation [s] "}"
+ reserved-expression = "{" [s] reserved-annotation [s] "}"
+ annotation = function-annotation / private-use-annotation / reserved-annotation
+ function-annotation = function *(s option)
Member

Where did this come from?

I like the existing tight definition of annotation and am not very thrilled by lots of different expression types. The syntax doesn't need the proliferation of expressions. The data model (above) captures the deeper split between functions, private-use (maybe supported), and reserved (unsupported) and that is an outcome of parsing the syntax. But we don't, I guess, need to go all-in with every kind of placeholder. Thoughts?

Collaborator Author

We don't need to make this change, but if we don't then the less direct translation from ABNF to data model can be unnecessarily confusing. Is that worth having fewer productions in the grammar?

expression = literal-expression / variable-expression / nullary-expression
literal-expression = "{" [s] literal [s annotation] [s] "}"
variable-expression = "{" [s] variable [s annotation] [s] "}"
nullary-expression = "{" [s] annotation [s] "}"
annotation = (function *(s option))
           / reserved-annotation
           / private-use-annotation

Member

The (normative!) grammar is way more important to me than the (informative) data-model. It's important that the grammar be easily understood and as conceptually clear as possible (using Einstein's simplicity maxim--as simple as possible but no simpler).

Reserved annotations don't really matter--no one can use them. The private use ones are where any additional heat comes into play. The nature of how we set up private-use is that, while they might be "function-like" they don't have to be.

I don't think nullary-expression is as clean as function-expression or maybe annotation-expression?

Collaborator Author

> I don't think nullary-expression is as clean as function-expression or maybe annotation-expression?

function-expression is definitely wrong, and part of the reason for this PR existing in the first place. I originally started with annotation-expression, but abandoned it because {:func} doesn't really have a better claim to that label than {$var :func}—the distinguishing characteristic isn't the presence of an annotation (which is also valid in literal-expression and variable-expression), but rather the absence of an operand, for which "nullary" is the best label of which I am aware.

Member

I guess it is all in where one places the stress in the description? An annotation-expression contains only an annotation. Note that annotations are optional in literal and variable expressions. In any case, I don't think saying annotation-expression will confuse anyone. nullary-expression requires the explanation you gave above (and the nullary-ness is invisible by definition)

Collaborator Author

Sure. Done.


"literal-expression": {
  "type": "object",
  "properties": {
    "type": { "const": "literal-expression" },
Collaborator

I think adding a type at this level would be unnecessary and noisy. It's effectively duplicating information that's already available in the model as

arg ? arg.type : annotation.type

Collaborator Author

@gibson042 gibson042 Dec 6, 2023

There are many levels where type is unnecessary (e.g., SelectMessage has selectors and variants while PatternMessage has pattern; VariableRef has name while Literal has value), and in fact the sole exception that I see could be refactored (inputDeclaration.name must always be equal to inputDeclaration.value.arg.name). So what guidelines should be used? What I have done here is include type in the constituents of any union—e.g., Expression = LiteralExpression | VariableExpression | …, and all the subtypes can be trivially discriminated by type.

> arg ? arg.type : annotation.type

I want to avoid the need for consumers to invent such cleverness, especially since it is not robust in the face of adding private-use expressions.
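
As a rough illustration of the "type on every constituent of a union" guideline described above, the sketch below shows how a consumer could discriminate expression subtypes with a plain switch. The interface shapes and the `describeExpression` helper are hypothetical simplifications for this example, not the definitions from the PR.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
interface FunctionAnnotation { type: "function"; name: string }

interface LiteralExpression {
  type: "literal-expression";
  arg: Literal;
  annotation?: FunctionAnnotation;
}
interface VariableExpression {
  type: "variable-expression";
  arg: VariableRef;
  annotation?: FunctionAnnotation;
}
interface FunctionExpression {
  type: "function-expression";
  annotation: FunctionAnnotation;
}
type Expression = LiteralExpression | VariableExpression | FunctionExpression;

// Each branch is selected by the explicit `type` tag alone, so no
// inspection of `arg` or `annotation` is needed to tell the shapes apart.
function describeExpression(expr: Expression): string {
  switch (expr.type) {
    case "literal-expression":
      return "literal " + expr.arg.value;
    case "variable-expression":
      return "variable $" + expr.arg.name;
    case "function-expression":
      return "function :" + expr.annotation.name;
  }
}
```

Adding a further member to the union (for example a private-use expression) would then only require a new `case`, rather than a revised derivation rule.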

Collaborator

The way I'm looking at this, an expression is a container, i.e. the {…} part of the syntax, whose contents might include an operand and might include an annotation, but in any case the surrounding { and } stay the same. So the data model should also change as little as possible across those variations.

As it does make sense to encode in the data model the "expression must not be empty" bit, we do need at least two different shapes for the expression contents, i.e. { arg: …, func?: … } | { func: … }. But why go further than that?
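
A minimal sketch of how those two shapes could rule out an empty expression at the type level. The `Literal`, `VariableRef`, and `FunctionRef` definitions are simplified assumptions, and `ExpressionContents` and `hasOperand` are hypothetical names introduced only for this example.

```typescript
// Simplified, assumed shapes; not the PR's actual definitions.
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
interface FunctionRef { type: "function"; name: string }

// Either an operand with an optional function, or a function alone.
type ExpressionContents =
  | { arg: Literal | VariableRef; func?: FunctionRef }
  | { func: FunctionRef };

// The presence of the `arg` key is enough to tell the two shapes apart.
function hasOperand(contents: ExpressionContents): boolean {
  return "arg" in contents;
}

const withOperand: ExpressionContents = { arg: { type: "variable", name: "count" } };
const functionOnly: ExpressionContents = { func: { type: "function", name: "now" } };
// const empty: ExpressionContents = {}; // rejected by the type checker: neither shape matches
```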

I do get your argument about private-use, but if we're renaming its slot in the expression from func to annotation, then I don't see how it needs to impact the expression-level syntax. OTOH, if we were to continue with a dedicated slot for functions, then I could see something like this making sense:

type Expression = ValueExpression | FunctionExpression | UnsupportedExpression;

interface ValueExpression {
  type: "expression";
  arg: Literal | VariableRef;
  func?: FunctionRef;
}

interface FunctionExpression {
  type: "expression";
  arg?: null;
  func: FunctionRef;
}

interface UnsupportedExpression {
  type: "unsupported";
  arg?: Literal | VariableRef | null;
  annotation: UnsupportedAnnotation;
}

with UnsupportedAnnotation as you currently have it.

One previously unstated optimization target I have here is the developer experience of working with MF2 data models, which shows up here in two different ways:

  1. If I need to write a script that'll e.g. add a function to all unannotated placeholders matching some variable name, needing to change the expression type in addition to the func values feels pretty clumsy.
  2. Needing to work with string identifiers that are over 20 characters long is just asking for typos.
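
The first point can be sketched as follows, using the `ValueExpression` / `FunctionExpression` shapes proposed above. The `Literal`, `VariableRef`, `FunctionRef`, and `Pattern` definitions and the `annotateVariable` helper are hypothetical and exist only for this example.

```typescript
// Assumed, simplified shapes; only the expression interfaces mirror the proposal above.
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
interface FunctionRef { type: "function"; name: string }
interface ValueExpression { type: "expression"; arg: Literal | VariableRef; func?: FunctionRef }
interface FunctionExpression { type: "expression"; arg?: null; func: FunctionRef }
type Expression = ValueExpression | FunctionExpression;
type Pattern = Array<string | Expression>;

// Adds `funcName` to every unannotated placeholder whose operand is the
// variable `varName`. Only the `func` slot changes; `type` stays as it was.
function annotateVariable(pattern: Pattern, varName: string, funcName: string): Pattern {
  return pattern.map((part): string | Expression => {
    if (typeof part === "string") return part;
    if (part.arg && part.arg.type === "variable" && part.arg.name === varName && !part.func) {
      const func: FunctionRef = { type: "function", name: funcName };
      const updated: ValueExpression = { type: "expression", arg: part.arg, func };
      return updated;
    }
    return part;
  });
}
```

With per-shape expression types, the same script would also have to rewrite the `type` value whenever the added function changed which shape the placeholder belongs to.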

Collaborator Author

I removed the new type fields.

@aphillips aphillips added the syntax, data model, and specification labels Dec 5, 2023
@gibson042 gibson042 force-pushed the gh-552-expression-data-model branch from 96228bd to a5553ce Compare December 8, 2023 17:15
Member

@aphillips aphillips left a comment

... or ship it...

~~~~𖠳~~~~

@aphillips
Member

@eemeli If you approve, I can merge

@aphillips aphillips merged commit 7c00820 into unicode-org:main Dec 8, 2023
Labels
data model (Issues related to the Interchange Data Model), specification (Issue affects the specification), syntax (Issues related with syntax or ABNF)
Development

Successfully merging this pull request may close these issues.

Create a PR for renaming FunctionExpression
3 participants