raphael-proust
Contributor

This is a proposal for an addition to the syntax of OCaml: punning in record type definitions.

This is fully backwards compatible in that existing programs are completely unaffected. (Unless I got something wrong?) It allows punning in record type definitions (and in in-line records in variant definitions).

E.g.,

type address = string
type name = string
type dob = Date.t

type registered_user = {
  name;
  dob;
  address;
}
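
For comparison, here is a minimal sketch of what the punned definition would presumably expand to in syntax that is accepted today, assuming each punned field `f` means `f : f`. The `Date` module below is a hypothetical stand-in, since the example does not say which library provides it.

```ocaml
(* Hypothetical stand-in for the Date module used in the example. *)
module Date = struct
  type t = { year : int; month : int; day : int }
end

type address = string
type name = string
type dob = Date.t

(* Assumed expansion: each punned field `f` becomes `f : f`. *)
type registered_user = {
  name : name;
  dob : dob;
  address : address;
}

let alice : registered_user = {
  name = "Alice";
  dob = { Date.year = 1990; month = 1; day = 1 };
  address = "1 Example Street";
}
```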

Note that I am unfamiliar with the structure of the compiler. As a result, the implementation of this MR may be somewhat naïve. I'd welcome guidance on how to implement this better.

@nojb
Contributor

nojb commented Aug 17, 2023

Incidentally, the LexiFi compiler had this exact feature for a long time, but a few years ago we removed it in order to reduce the diff with upstream. Can't remember right now if it was ever proposed for upstream inclusion.

@bluddy
Contributor

bluddy commented Aug 17, 2023

What is the advantage of this punning? It seems like record type definition is not that common.

@raphael-proust
Contributor Author

What is the advantage of this punning? It seems like record type definition is not that common.

This syntax change also allows the punning of inline record definitions:

type t =
  | Registered of { name; dob; address; }
  | Unregistered of { name; }

The advantage is conciseness as well as similarity with the expressions.
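
For reference, this is the existing punning in expressions and patterns that the proposal mirrors. A small sketch with made-up names (`user`, `make_user`, `describe`):

```ocaml
type user = { name : string; age : int }

(* Expression punning: `{ name; age }` means `{ name = name; age = age }`. *)
let make_user name age = { name; age }

(* Pattern punning: `{ name; age }` binds the variables `name` and `age`. *)
let describe { name; age } = Printf.sprintf "%s (%d)" name age
```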

@haochenx
Member

+1 for this proposal. There are countless times that I have written foo : foo as a field and hoped we had such a feature.

It's a quite common practice in my experience to create toplevel types for groups of fields that would otherwise go to a single record type.

One question (about a potential enhancement and I'm certainly not suggesting it being a blocker):
what if you have to qualify the field type? e.g.

module M = struct
  type foo
  type bar
end

type t = {
  foo : M.foo;
  bar : M.bar;
}

(correct me if I'm wrong but) I don't think this works currently:

type t = M.{ foo; bar; }

but it's probably the most desirable solution.
#12343 seems related but is probably not the same thing.

An alternative sane and useful (but maybe weirder) choice would be

type t = {
  M.foo; M.bar;
}

@raphael-proust
Contributor Author

(correct me if I'm wrong but) I don't think this works currently:

Indeed, this is not supported. This hack could work I guess

module LocalForInclusion = struct
  open M
  type t = { foo; bar; }
end
include LocalForInclusion

But that's not really nice.

I think it'd make sense to support { M.foo; M.bar; }.
And I think that, considering #12044, it'd also make sense to support M.{ foo; bar; }.
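
For reference, a sketch of the same open-then-include workaround in syntax that compiles today, where every field still carries its annotation. The concrete definitions of `M.foo` and `M.bar` are assumptions made only so the sketch is self-contained.

```ocaml
module M = struct
  type foo = int      (* concrete types are assumptions for this sketch *)
  type bar = string
end

module LocalForInclusion = struct
  open M
  (* Without the proposal, each field needs its explicit annotation. *)
  type t = { foo : foo; bar : bar }
end
include LocalForInclusion

let v = { foo = 1; bar = "one" }
```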

@dbuenzli
Contributor

dbuenzli commented Aug 18, 2023

The advantage is conciseness as well as similarity with the expressions.

That's precisely why I dislike this proposal.

I prefer if the eye can easily spot in which language (type or expression) a definition is. This proposal muddles that at the expense of the person reading the code, because we lose an important contextual signifier (the name : type construct).

When I read the examples here, my brain is constantly wondering whether I'm not reading a record value and expanding the definitions to normal record field declarations. I wish I could simply read that instead (yes I'm very lazy).

Current punning notations make the reasonable assumption that variable names often coincide with record field names or variable names (for the horrendously obscure let* punning). This PR makes the assumption that type names often coincide with record field names.

In my opinion that's a wrong and undesirable assumption. You should have fewer type names than record field names because types hint at the regularities of your data.
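
As an illustration of the let* punning mentioned above, here is a minimal sketch, assuming OCaml 4.13 or later, where `let* x in` is shorthand for `let* x = x in`. The function name `add_opt` is made up.

```ocaml
(* Binding operator for options. *)
let ( let* ) = Option.bind

let add_opt x y =
  (* `let* x in` is shorthand for `let* x = x in`. *)
  let* x in
  let* y in
  Some (x + y)
```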

Suppose I add a creation date and a modification date to your registered_user. Do I really want to read:

type address = string
type name = string
type dob = Date.t
type creation_date = Date.t
type modification_date = Date.t

type registered_user = {
  name;
  creation_date;
  modification_date;
  dob;
  address;
}

or perhaps:

type address = string
type name = string
type dob = Date.t

type registered_user = {
  name;
  creation_date : Date.t;
  modification_date : Date.t;
  dob;
  address;
}

Personally I'd rather read:

type address = string
type name = string

type registered_user = {
  name : name;
  creation_date : Date.t;
  modification_date : Date.t;
  date_of_birth : Date.t;
  address : address;
}

This proposal increases the entropy in the language. But I guess it works if you want to give more work to people who devise code formatting tools.

@nojb
Contributor

nojb commented Aug 18, 2023

Suppose I add a creation date and a modification date to your registered_user. Do I really want to read:

I find this to be a compelling argument against this feature.

A different argument against is that conciseness of type declarations does not matter much because type declarations are only written once, as opposed to record expressions and patterns which are written repeatedly.

@bluddy
Contributor

bluddy commented Aug 18, 2023

I agree with @dbuenzli and have the same reaction when reading the code examples above. I see no purpose to this proposal.
Record type definitions are relatively uncommon and they should stand out and be as clear as possible - they're the most critical information to anyone reading the code. Making them seem like something else in the language (record creation) is a bad idea IMO.

@haochenx
Member

haochenx commented Aug 18, 2023

I'm biased towards this feature but

I prefer if the eye can easily spot in which language (type or expression) a definition is.

I agree with this point in general, but think it is more or less sacrificeable in this case since IIUC there is no intertwining of expression language and record declaration at the syntax level. Adding this feature does reduce the ability to infer syntactical context at a glance to a degree but does not create intersection between type expressions and value expressions.

[..] This PR makes the assumption that type names often coincide with record field names.

In my opinion that's a wrong and undesirable assumption. You should have fewer type names than record field names because types hint at the regularities of your data.

I disagree. For "common types" like string, Date.t it is arguably a bad coding style, but when the types you are punning are some arbitrary application specific types, the situation changes.

(edit (clarification): I do not disagree that type names coinciding with record field names is less frequent, but I disagree that it is so rare and uncommon that the convenience of this syntax should be disregarded. I think the opposite is true: when type names (especially long and descriptive ones) do coincide with record field names, this syntax is very convenient and useful. Although it's arguably not without drawbacks, it's a neat feature and worth considering, and I am in favor of it as it will materially benefit the codebases that I work with day in and day out.)

e.g. when you have the following "inner" types,

type student_data = {
  home_class : string;
  enrollment_year : int;
}
type faculty_data = {
  office_room_number : string;
  faculty_id : string;
}
type personal_info = {
  first_name : string;
  last_name :  string;
}

it is very desirable to be able to write

type person =
 | Faculty of { personal_info; faculty_data }
 | Student of { personal_info; student_data }

instead of

type person =
 | Faculty of {
    personal_info : personal_info;
    faculty_data : faculty_data;
   }
 | Student of {
    personal_info : personal_info;
    student_data : student_data;
   }

In this case, this feature does not only save keystrokes, it also ensures one does not accidentally declare a field with a typo in its field name such as person_info : personal_info (in other words, it is immediate that the field name and the type name are the same). This kind of error can easily happen when refactoring type names, and such mistakes are especially annoying to deal with after the refactoring has been shipped.

In other words, the usefulness of this syntax differs depending on the type of program. I'd like to argue it is unfair to assume one should not write programs in OCaml where this coding style is beneficial, and when this coding style suits the best, this syntax offers great benefits.
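
To make the typo scenario concrete, here is a small sketch in today's syntax. The declaration with the misspelled field is accepted, and the mismatch only surfaces later at use sites. The function `make_faculty` is a made-up helper.

```ocaml
type personal_info = { first_name : string; last_name : string }
type faculty_data = { office_room_number : string; faculty_id : string }

type person =
  | Faculty of {
      person_info : personal_info;  (* typo: accepted by the compiler *)
      faculty_data : faculty_data;
    }

let make_faculty personal_info faculty_data =
  (* The punned form `Faculty { personal_info; faculty_data }` would be
     rejected here, because the field is actually named `person_info`. *)
  Faculty { person_info = personal_info; faculty_data }
```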

@haochenx
Member

haochenx commented Aug 18, 2023

A different argument against is that conciseness of type declarations does not matter much because type declarations are only written once, as opposed to record expressions and patterns which are written repeatedly.

I would also like to disagree with this. There are valid cases to have record types whose field name labels are used only once (or twice) in value expression contexts. One prominent example is when interfacing with external systems that talk in JSON. In this scenario it is not uncommon to declare a type and use it in the value expression context only once (or twice).
When the OCaml program is the receiving party, it is also common not to write all of the field name labels in the value expression context even once.

@dbuenzli
Contributor

it also ensures one does not accidentally declare a field with a typo in its field name such as person_info : personal_info

Looks harmless to me.

Besides once you factor in modularity and program evolution, I don't think it makes much sense. At a certain point you might want or need to factor out that personal_info to its own Person.t module without changing your own person type, i.e. have personal_info : Person.t, so you end up using the normal notation anyways.

I don't think there's much gain to link field names and type names; on the contrary. But even if you think there is, the notation only seems to bring tiny benefits in edge cases which you trade against decreased overall simplicity, clarity and regularity.

@glondu
Contributor

glondu commented Aug 18, 2023

IMHO, I find the proposal interesting.

@haochenx
Member

haochenx commented Aug 18, 2023

it also ensures one does not accidentally declare a field with a typo in its field name such as person_info : personal_info

Looks harmless to me.

In a codebase where every other longish field name coincides with its type name, it defies expectation, and in most cases it constitutes a genuine typo. Well, if the argument is that typos in variable / field / type names are harmless, that's another discussion. At least my opinion is that typos should be avoided, and typos in the inconsistency category are outright harmful as they are a breeding ground for genuine mistakes.

Besides once you factor in modularity and program evolution, I don't think it makes much sense. At a certain point you might want or need to factor out that personal_info to its own Person.t module without changing your own person type, i.e. have personal_info : Person.t, so you end up using the normal notation anyways.

Well, this is an (intuitive) feature and nothing is taken away from the language. IMO you have nothing to lose except that someone may write code that does not fit everyone's preference. I don't see your point about modularity and program evolution. When the time comes, just desugar and do it the old way. If your points are true, then there is no utility value in the existing field punning syntax.

I don't think there's much gain to link field names and type names; on the contrary. But even if you think there is, the notation only seems to bring tiny benefits in edge cases which you trade against decreased overall simplicity, clarity and regularity.

Well, edge cases or not really depends on what you use OCaml for. As OCaml is a general purpose language, and since both (1) it is reasonable to use long and descriptive type names and (2) there are certain users who find this feature useful, IMO it is unfair to dismiss those cases.

In cases where this syntax is useful, the benefits are not tiny:

  • it helps make programs more concise while saving keystrokes
  • it establishes easy properties about your program
  • it helps reduce a category of programming errors (however humble typos may seem, I'd argue that they're a genuine class of programming errors)

Again, I don't think it is fair to overlook the benefits just because your code base / preferred coding style would not benefit from it.

Also IMO this syntax does not outrightly decrease simplicity or clarity. You can indeed argue the opposite:

  • it increases simplicity since it allows more concise code to be written (even if it disproportionately benefits a verbose style of coding, conciseness is still valuable in verbose code)
  • it increases clarity since it establishes the guarantee that field names and type names are the same

The only thing it costs is the regularity of the language, but this is purely a problem of preference and taste. You can configure your formatter to enforce your preferred style in codebases that you control.

(sorry for the many edits and my bad grammar..)

@dbuenzli
Contributor

If your points are true, then there is no utility value in the existing field punning syntax.

I'm not sure exactly how you come to this conclusion. The existing field punning syntax makes sense to me because it links the same kind of objects, named values and I find this equality to be pervasive in the code.

The punning of this PR makes less sense to me because it links named values to their type name which I see as less essential. In general types are here to sort your values, not necessarily match the name you give them (unless you are strongly into Hungarian notation :-).

Again, I don't think it is fair to overlook the benefits just because your code base / preferred coding style would not benefit from it.

I'm not overlooking the benefits, I'm weighing them. Languages are a delicate thing and like any system you design more features and choices do not necessarily translate into better systems.

The only thing it costs is the regularity of the language, but this is purely a problem of preference and taste. You can configure your formatter to enforce your preferred style in codebases that you control.

I find this view of a language rather problematic. The goal of a programming language is to have a common notation and semantics to explain computational processes to other humans[1]. If we all live in different dialects, the benefits are lost.

Footnotes

  1. And possibly give them to execute to a machine if that's your thing.

@haochenx
Member

If your points are true, then there is no utility value in the existing field punning syntax.

I'm not sure exactly how you come to this conclusion. The existing field punning syntax makes sense to me because it links the same kind of objects, named values and I find this equality to be pervasive in the code.

The punning of this PR makes less sense to me because it links named values to their type name which I see as less essential.

"No utility value" was definitely an overstatement, as the situations are indeed different in some respects. That was short-sighted of me.

Although it is less pervasive and its utility probably won't be universally accepted, I'd like to argue that linking field names and type names is reasonable, and there exist programmers doing that from time to time (and for good reasons). My understanding is that this syntax makes as much sense as the existing field punning for such programmers (me included obviously) in those situations.

A syntactical feature does not have to be essential / super-widely applicable to be helpful. I'm not arguing that this feature is unconditionally a good addition to OCaml, I just like to advocate that this feature would be very useful to us and would like to argue that the benefits outweigh the drawbacks. Therefore I'd like to see this proposal being accepted.

[..] In general types are here to sort your values, not necessarily match the name you give them (unless you are strongly into Hungarian notation :-).

It is unarguably the case where type names in general need not match the field (/ variable / method / ..) names. But my point is that there exist valid, non-edge, and reasonable cases where such matching makes sense and is extremely useful.

  • One example is a record holding fields where the type of each gives you enough information about the field. In this case, the best option is to share the type name.
  • Another example is when you need / want to nest record types, in which case you have to create an intermediate type. In this case, the best choice for the name of the intermediate type might be the field name where you need the type to be nested.

Naming things in code is hard but important, so avoiding coming up with new names (when it's reasonable to do so) is a good thing.

(And no, I'm not into Hungarian notation (I do occasionally adopt a similar naming scheme locally and selectively when it helps code readability))

Languages are a delicate thing and like any system you design more features and choices do not necessarily translate into better systems.

Completely agree. Yet my understanding was that you were saying that the benefits of this syntax are "tiny" because the codebases you work with and/or your preferred coding style do not benefit much from it. That is a fine comment to make, but IMO unfair to this discussion given that this syntax is helpful to others and (arguably) in principle.

The only thing it costs is the regularity of the language, but this is purely a problem of preference and taste. You can configure your formatter to enforce your preferred style in codebases that you control.

I find this view of a language rather problematic. The goal of a programming language is to have a common notation and semantics to explain computational processes to other humans[1]. If we all live in different dialects, the benefits are lost.

My apologies: my bad wording could be understood as claiming that sacrificing the regularity of a language by adding features is purely a problem of preference and taste in the general sense. I do share the view of your counter argument and think such a sacrifice should be taken seriously and be thought through in general.

What I was trying to say is that in this case (by introducing the proposed syntax), the reduction of regularity is limited and it causes only minor (?) frustrations to some programmers but not fundamental problems to all programmers (at least in my understanding and from what I read from the comments.) Thus it boils down to a problem of preference and can be solved (at least partially) by formatters.

I'd also like to stress that this syntax does not stand out as an oddity on its own, as OCaml already has the field punning syntax: again, the reduction in linguistic regularity is limited (although from a different perspective than the discussion in the last paragraph.)

@xvw
Contributor

xvw commented Aug 21, 2023

Maybe I'm missing something and my question isn't so much related to the punning proposal but, @bluddy :

  • It seems like record type definition is not that common.
  • Record type definitions are relatively uncommon
    You seem to be insisting on ... the marginality of defining product types via records, and in my experience, I often define and therefore ... use them, so where does the intuition come from that defining a record would be uncommon?

@bluddy
Contributor

bluddy commented Aug 21, 2023

@xvw It's based on experience. Even if one defines many record types, it's a tiny percentage of the code compared to the number of times one builds records. The high churn of needing to create and update records is what makes a compelling argument for punning.

I guess one could get into the habit of having every function return a record type of its return values, but the very act of defining a type creates a burden on the programmer that discourages this kind of behavior, making tuples a better choice. Is this what you do? Do you define a record type per function return value? Otherwise I can't really understand the need to make record definition more nimble than it already is.

@raphael-proust
Contributor Author

Current punning notations make the reasonable assumption that variable names often coincide with record field names or variable names (for the horrendously obscure let* punning). This PR makes the assumption that type names often coincide with record field names.

In my opinion that's a wrong and undesirable assumption. You should have fewer type names than record field names because types hint at the regularities of your data.

I think this assumption holds more or less true depending on the kind of code you are writing. Of course when you are writing libraries your types describe a generic kind of data (dates, formats, locale, etc.) or data-structures (lists, trees, etc.). But when you are writing an application your types describe specific kinds of data which are only to be found locally. In this case it makes sense to name the types the same way you'll name variables holding that data (and thus fields holding them).

In this respect, the example on my initial message was not convincing. It should read something more like

type registered_user = {
  name: string;
  dob: Date.t;
  address: string;
  email_address: string;
}
type product = {
  name: string;
  id: string;
  price: int;
}

type sales_record = {
  registered_user;
  product;
  quantity: int;
  date: Date.t
}
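
In today's syntax, the same application-level types can be sketched as follows. The point of the sketch is that variables holding such data tend to carry the same names as the fields and types, so the existing expression punning already applies. `Date` is a hypothetical stand-in module, and `make_sale` and `total_price` are made-up helpers.

```ocaml
(* Hypothetical stand-in for the Date module used in the example. *)
module Date = struct
  type t = { year : int; month : int; day : int }
end

type registered_user = {
  name : string;
  dob : Date.t;
  address : string;
  email_address : string;
}

type product = { name : string; id : string; price : int }

(* Explicit form of the punned `registered_user;` and `product;` fields. *)
type sales_record = {
  registered_user : registered_user;
  product : product;
  quantity : int;
  date : Date.t;
}

(* Expression punning works because variable names match field names. *)
let make_sale registered_user product quantity date =
  { registered_user; product; quantity; date }

let total_price (r : sales_record) = r.product.price * r.quantity
```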

Moreover, all this discussion around variable names made me reconsider something I had put aside in order to make the proposal smaller:

Maybe type variables should be allowed in punned fields.

type ('data, 'tag) entry = {
  'data;
  'tag;
  priority: int;
}

This would encourage the use of significant names for type variables.

@raphael-proust
Contributor Author

I guess one could get into the habit of having every function return a record type of its return values, but the very act of defining a type creates a burden on the programmer that discourages this kind of behavior, making tuples a better choice. Is this what you do? Do you define a record type per function return value? Otherwise I can't really understand the need to make record definition more nimble than it already is.

One thing I'd really like to see is named return values.

I think we could use the syntax of record types (without defining a record type) to provide this.

val fold_map : 
  ('acc -> 'i -> 'acc * 'o)
  -> 'acc
  -> 'i list
  -> { acc: 'acc; outs: 'o list }

And, just like inline records, you wouldn't be able to bind on the "pseudo-record": you would need to destruct it. Conceptually, the function doesn't return a value, it returns several (just like constructors with an inline record don't hold a value but rather several).

let { acc; outs; } = fold_map …

This is the counterpart to named parameters which are already very useful in making function signature self-documenting. Consider

val time_of_epoch : int -> { year: int; month: int; day: int; seconds: int }
val summarise : int list -> { min: int; max: int; median: int; mean: float }
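
Until something like this exists, a similar signature can be approximated with an explicitly declared record type. A minimal sketch, where the type name `fold_map_result` and the helper `running_sums` are made up for the illustration:

```ocaml
(* Hypothetical result type standing in for the "named return values". *)
type ('acc, 'o) fold_map_result = { acc : 'acc; outs : 'o list }

let fold_map f init xs =
  let acc, rev_outs =
    List.fold_left
      (fun (acc, outs) x ->
        let acc, o = f acc x in
        (acc, o :: outs))
      (init, []) xs
  in
  { acc; outs = List.rev rev_outs }

(* Destructuring at the call site, as in the proposal. *)
let running_sums xs =
  let { acc; outs } = fold_map (fun acc x -> (acc + x, acc + x)) 0 xs in
  (acc, outs)
```

The difference from the proposal is that the record type has to be declared and named up front, which is exactly the burden discussed above.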

@haochenx
Member

Maybe type variables should be allowed in punned fields.

Although I could think of places where punning type variables may be helpful in saving key strokes, it seems like an unnecessary feature to me while adding completely novel grammar. The expected behavior for punning a field name with its field type would be immediate (or at least make sense, I believe) for programmers who are familiar with the existing field name punning in value expressions, while punning type variables would be a completely new syntax and I doubt we can say the same.

Also, I kind of doubt its utility value. As type variables are completely locally scoped, there is less incentive for punning IMO: you can just abbreviate! Punning a field name with a type name is different: both field names and type names are non-local (to the type definition) and they have to have the correct (and sometimes long) name.

@haochenx
Member

haochenx commented Aug 21, 2023

I’m not an expert on type systems, but it seems to me that the named return value proposal you gave would easily mess up type inference. And worse (?), if the record type is indeed private, then .. it would cause so many problems.

You’d want the type to be structural, and not a private record type to use the return values sanely. (At least if the goal here is to have “named return value” instead of some special kind of abstraction: some function you can only call as the expression being matched upon)

You really want the return type to be structural. And you already have a good candidate: object type. A closer alternative would be module type (not sure whether you can write it as a literal) but you won’t get variance support (I think?)
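
A minimal sketch of the object-type alternative, reusing the `summarise` example from above with a reduced set of fields. The implementation is made up for the illustration. No type declaration is needed because object types are structural.

```ocaml
let summarise (xs : int list) : < min : int; max : int; mean : float > =
  match xs with
  | [] -> invalid_arg "summarise: empty list"
  | x :: rest ->
    let mn = List.fold_left min x rest in
    let mx = List.fold_left max x rest in
    let sum = List.fold_left ( + ) 0 xs in
    let mean = float_of_int sum /. float_of_int (List.length xs) in
    (* The object plays the role of the "named return values". *)
    object
      method min = mn
      method max = mx
      method mean = mean
    end
```

Results are read with method calls such as `(summarise [1; 2; 3])#max` rather than by record destructuring.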

What’s closer to what you want is probably the private record type for polymorphic variant types. I remember I asked Jacques Garrigue why there’s no such feature already. I cannot remember his exact answer but it’s along the lines of “could be done, but not easily, and with a lot of design considerations (e.g. subtyping) to think of”.

I use a technique called Poor man’s record type: 'a -> [`name of string] * [`age of int] sometimes but it’s not nice to use.
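
A small sketch of that technique, with made-up names (`person_of_id`, `greeting`) and made-up data:

```ocaml
(* Each component carries its own tag, so the tuple documents itself. *)
let person_of_id (_id : int) : [ `name of string ] * [ `age of int ] =
  (`name "Ada", `age 36)

let greeting =
  (* The pattern is exhaustive because each component has a single tag. *)
  let (`name name, `age age) = person_of_id 1 in
  Printf.sprintf "%s is %d" name age
```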

Anyway, this direction of discussion is definitely off-topic and should probably happen somewhere else if we want/need to continue.

@raphael-proust
Contributor Author

if the record type is indeed private

To be clear, I don't want a record type. I just want to reuse the record syntax (type and expression and pattern) but I don't want there to be a type. It's just like inline records in patterns | Foo of {x: int; y: int} which doesn't introduce a type for {x: int; y: int}.

@haochenx
Member

haochenx commented Sep 29, 2023

if the record type is indeed private

To be clear, I don't want a record type.

Alas, I used terrible and misleading terminology. By private I meant that the type isn't addressable by the programmers. The correct term should be inline (as so-called in the OCaml manual). For example, you cannot create an alias to the record type being created for the variant's argument in the following type:

type foo = A of { field1: unit; field2: unit }

The type for the variant's argument is nominal but isn't addressable by the programmer.
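
A minimal sketch of what "not addressable" means in practice. The field types differ from the example above (`int` and `string` instead of `unit`) only so that the functions have something to compute, and the function names are made up.

```ocaml
type foo = A of { field1 : int; field2 : string }

(* Destructuring the inline record in a pattern is fine. *)
let sum_of (A { field1; field2 }) = field1 + String.length field2

(* Binding the inline record is allowed as long as it is only used
   for field access. *)
let first (A r) = r.field1

(* By contrast, letting the record itself escape is rejected by the
   type checker, since its type has no name of its own:

   let escape (A r) = r
*)
```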

What I was trying to argue is that having non-programmer-addressable inline nominal types as function return values is probably not a good idea. A better alternative is structural record(-like) types, which can already be achieved using object types. If the focus is on syntax, I can see it being a wishful thing, but it definitely needs more discussion (which is probably going to be even more controversial than this PR..)

(I think a new issue/discussion should be created for this conversation, but I'm answering here for easier cross-reference. It would be nice if the repo admin could move these comments)


7 participants