Text markup
(
What(Text markup language)
How(Compromise between XML, JSON, and simplicity)
Why(Reject Status Quo)
)
JSON and XML are both effective, but imperfect enough to ponder.
I believe they can be unified. Hopefully you will too in 2 minutes.
You might come up with gotchas whilst reading this document. This is by design: addressing everything here would make it too long. Please refer to GOTCHAS.md.
Let's sum up the pros and cons of each format.
JSON pros
- Simple
- Pretty enough
JSON cons
- Lots of quotes, braces, and colons: do we need that many?
- Not fluent as a markup language (lots of quoting for small strings, no multiline for long strings)
- No comments
XML pros
- Fluent as a markup language
XML cons
- Redundant closing tags
- Calls for a lot of choices (data vs metadata, attribute vs child node)
- Ugly for config files, or as a data format
- No types
JSON and XML both represent trees.
They are transported as text.
We want a text fluent JSON variant.
and
We want a simpler, less verbose XML.
It is possible, and the solution is the same for both goals.
Let's agree on two lemmas first.
Given this example:
XML A
<element metadata_0="foo">
<data_1>bar</data_1>
</element>
vs
XML B
<element>
<metadata_0>foo</metadata_0>
<data_1>bar</data_1>
</element>
A receiver of XML A needs to know the possible metadata beforehand.
For instance, your browser knows to read the class attribute to apply CSS classes to elements.
It is a convention.
Promoting this data to metadata (XML B to XML A) does not actually bring any benefit once you accept that there is a convention anyway.
Another way to say it: you cannot replace convention by promoting data to metadata.
You might as well not have attributes (and metadata) in the first place, and focus on convention. This is where JSON shines.
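To make the lemma concrete, here is a minimal Python sketch that demotes attributes to child elements, turning XML A into XML B. The function name `demote_attributes` is mine, and placing the demoted attributes before the element's own text is an assumption, not something this document specifies.

```python
import xml.etree.ElementTree as ET

def demote_attributes(element):
    """Turn every attribute into a leading child element, recursively."""
    # Recurse into the existing children first, before adding new ones.
    for child in list(element):
        demote_attributes(child)
    promoted = []
    for name, value in element.attrib.items():
        child = ET.Element(name)
        child.text = value
        promoted.append(child)
    if promoted:
        # Assumption: the element's leading text moves after the demoted attributes.
        promoted[-1].tail = element.text
        element.text = None
        for index, child in enumerate(promoted):
            element.insert(index, child)
        element.attrib.clear()
    return element

xml_a = '<element metadata_0="foo"><data_1>bar</data_1></element>'
root = demote_attributes(ET.fromstring(xml_a))
print(ET.tostring(root, encoding="unicode"))
```

Nothing is lost in the process: the receiver still needs the same convention to know what metadata_0 means, whichever shape it receives.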
A serious JSON user will validate and cast incoming data.
No value can be trusted to have the expected type right after a JSON.parse.
JSON A
{
"id": 5,
"name": "foo",
"job": null
}
JSON B
{
"id": "5",
"name": "foo",
"job": "null"
}
If a receiver knows that id should be a number, it will validate and cast that piece of data. If a receiver does not know a value's type beforehand, it is probably just an intermediary to another, more informed, terminal receiver.
Whether a terminal value is a string or a number in the transport format is irrelevant either way.
Having primitive types in JSON is thus useless (as a format, outside of being a very useful subset of JavaScript).
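As an illustration, here is a small Python sketch of a receiver that validates and casts according to its own convention. The function `cast_employee` and its field rules are hypothetical, and reading the text "null" as a missing job is my assumption for this sketch only (GOTCHAS.md covers why that distinction matters in general).

```python
import json

def cast_employee(raw):
    """Validate and cast a decoded record according to the receiver's convention."""
    job = raw["job"]
    return {
        "id": int(raw["id"]),      # raises ValueError if id is not numeric
        "name": str(raw["name"]),
        # Assumption for this sketch: the text "null" means "no job".
        "job": None if job in (None, "null") else str(job),
    }

json_a = '{"id": 5, "name": "foo", "job": null}'
json_b = '{"id": "5", "name": "foo", "job": "null"}'

record_a = cast_employee(json.loads(json_a))
record_b = cast_employee(json.loads(json_b))
print(record_a == record_b)
```

Both transport forms end up as the same record once the receiver applies what it already knows.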
If you remove types from JSON, quotes become half as useful, and text becomes first class. This is where XML shines.
If we apply those lemmas to JSON, or to XML, we can end up with the same language.
Here we go.
Metadata is data => we remove attributes from the language.
Text is first class => we do not introduce types, and rely on existing conventions.
Let's start with a typical sample:
<html>
<body>
<div class="test">
My First Heading
<p>My first paragraph.</p>
</div>
</body>
</html>
We demote metadata:
<html>
<body>
<div>
<class>test</class>
My First Heading
<p>My first paragraph.</p>
</div>
</body>
</html>
Now we can simplify the grammar for tags. No need to both open and close those chevrons:
html<
body<
div<
class<test class>
My First Heading
p<My first paragraph. p>
div>
body>
html>
Now let's remove the closing tag redundancy, and switch chevrons to parentheses:
html(
body(
div(
class(test)
My First Heading
p(My first paragraph.)
)
)
)
This is the final form: TMA.
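The whole progression can be sketched as one Python function that walks an XML tree and prints TMA. `xml_to_tma` is a name I made up, the two-space indentation is purely cosmetic, and parentheses inside text are not escaped since that syntax is still a TODO.

```python
import xml.etree.ElementTree as ET

def xml_to_tma(element, depth=0):
    """Render an element as a list of TMA lines (attributes become child nodes)."""
    pad = "  " * depth
    inner = "  " * (depth + 1)
    lines = [f"{inner}{name}({value})" for name, value in element.attrib.items()]
    if element.text and element.text.strip():
        lines.append(inner + element.text.strip())
    for child in element:
        lines.extend(xml_to_tma(child, depth + 1))
        # Text that follows a child still belongs to the current element.
        if child.tail and child.tail.strip():
            lines.append(inner + child.tail.strip())
    if not lines:
        return [f"{pad}{element.tag}()"]
    if len(element) == 0 and not element.attrib:
        # Leaf holding only text: keep it on one line.
        return [f"{pad}{element.tag}({lines[0].strip()})"]
    return [f"{pad}{element.tag}("] + lines + [f"{pad})"]

sample = """<html>
<body>
<div class="test">
My First Heading
<p>My first paragraph.</p>
</div>
</body>
</html>"""
print("\n".join(xml_to_tma(ET.fromstring(sample))))
```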
Now the same exercise, starting from JSON:
{
"id": 5,
"name": "foo",
"job": null
}
Text is first class, and we rely on existing conventions for types, so we only keep strings.
{
"id": "5",
"name": "foo",
"job": "null"
}
Quotes look redundant now. Let's remove them, and take care of the now-meaningful whitespace.
{
id:5,
name:foo,
job:null, // trailing comma to disambiguate whitespace
}
Let's switch : for ( and , for )
{
id(5)
name(foo)
job(null)
}
Do we actually need braces? Let's switch them for parentheses too.
(
id(5)
name(foo)
job(null)
)
We end up with the same grammar.
This is a proof-of-concept progression; I do not suggest losing the distinction between null and "null". See GOTCHAS.md.
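The JSON progression can be sketched in Python as well. `json_to_tma` is a hypothetical helper: it drops spaces from keys so they fit an identifier, renders list items as anonymous nodes, and writes every scalar as plain text, so it loses the null vs "null" distinction exactly as the progression does.

```python
import json

def scalar_text(value):
    """Write a JSON scalar as plain text: 5 -> 5, None -> null, True -> true."""
    return value if isinstance(value, str) else json.dumps(value)

def json_to_tma(value, name="", depth=0):
    """Render decoded JSON as TMA text, one node per line."""
    pad = "  " * depth
    if isinstance(value, dict):
        # Assumption: spaces are simply removed from keys ("Born At" -> BornAt).
        children = [json_to_tma(v, k.replace(" ", ""), depth + 1) for k, v in value.items()]
    elif isinstance(value, list):
        # List items have no name of their own, so they become anonymous nodes.
        children = [json_to_tma(v, "", depth + 1) for v in value]
    else:
        return f"{pad}{name}({scalar_text(value)})"
    return "\n".join([f"{pad}{name}("] + children + [f"{pad})"])

print(json_to_tma(json.loads('{"id": 5, "name": "foo", "job": null}')))
```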
It is a format:
- It accepts that conventions exist
- Text is first class
- It only relies on ( and ) to shape data
TODO: introduce syntax for comments
pseudocode:
<document> ::= <node>
<node> ::= <identifier>? "(" <content> ")"
<content> ::= (<string> | <node>)*
<identifier> ::= ( [a-z] | [A-Z] | "_" ) ( [a-z] | [A-Z] | [0-9] | "_" )*
<string> ::= ( [a-z] | [A-Z] | [0-9] | "_" | "-" | " " | "\n")+
TODO:
- add escaped parentheses to <string>
- add the rest of UTF-8
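The grammar above is small enough to sketch as a recursive-descent parser in Python. This is only one possible reading: `parse_tma` is my name for it, it accepts any character other than parentheses inside strings (broader than `<string>`), and it trims surrounding whitespace and drops whitespace-only strings, which the grammar does not say.

```python
import re

# An identifier sitting right before "(" names the node that follows.
IDENT_AT_END = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def parse_tma(text):
    """Parse a TMA document into nested (name, children) tuples."""
    pos = 0

    def parse_content():
        nonlocal pos
        children, buffer = [], ""
        while pos < len(text) and text[pos] != ")":
            char = text[pos]
            pos += 1
            if char != "(":
                buffer += char
                continue
            match = IDENT_AT_END.search(buffer)
            name = match.group() if match else ""
            leading = buffer[: len(buffer) - len(name)].strip()
            if leading:
                children.append(leading)
            buffer = ""
            children.append((name, parse_content()))
            if pos >= len(text):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume ")"
        if buffer.strip():
            children.append(buffer.strip())
        return children

    content = parse_content()
    if pos < len(text):
        raise ValueError("unexpected closing parenthesis")
    if len(content) != 1 or isinstance(content[0], str):
        raise ValueError("a document is exactly one node")
    return content[0]

print(parse_tma("(\nid(5)\nname(foo)\njob(null)\n)"))
```

Every leaf comes back as a string; casting id to a number stays the receiver's job, per the second lemma.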
{
"menu": {
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{ "value": "New", "onclick": "CreateNewDoc" },
{ "value": "Open", "onclick": "OpenDoc" },
{ "value": "Close", "onclick": "CloseDoc" }
]
}
}
}
=>
menu(
id(file)
value(File)
popup(
menuitem(
(value(New) onclick(CreateNewDoc))
(value(Open) onclick(OpenDoc))
(value(Close) onclick(CloseDoc))
)
)
)
{
"Actors": [
{
"name": "Tom Cruise",
"age": 56,
"Born At": "Syracuse, NY",
"Birthdate": "July 3, 1962",
"photo": "https://jsonformatter.org/img/tom-cruise.jpg"
},
{
"name": "Robert Downey Jr.",
"age": 53,
"Born At": "New York City, NY",
"Birthdate": "April 4, 1965",
"photo": "https://jsonformatter.org/img/Robert-Downey-Jr.jpg"
}
]
}
=>
(
Actors(
(
name(Tom Cruise)
age(56)
BornAt(Syracuse, NY)
Birthdate(July 3, 1962)
photo(https://jsonformatter.org/img/tom-cruise.jpg)
)
(
name(Robert Downey Jr.)
age(53)
BornAt(New York City, NY)
Birthdate(April 4, 1965)
photo(https://jsonformatter.org/img/Robert-Downey-Jr.jpg)
)
)
)
<html>
<body>
<div class="test">
My First Heading
<p>My first paragraph.</p>
</div>
</body>
</html>
=>
html(
body(
div(class(test)
My First Heading
p(My first paragraph.))
)
)
Name, Age, City
Alice, 30, New York
Bob, 25, Los Angeles
Charlie, 35, Chicago
=>
( Name(Alice) Age(30) City(New York) )
( Name(Bob) Age(25) City(Los Angeles) )
( Name(Charlie) Age(35) City(Chicago) )
or, to allow for better scaling (at the cost of relying on the CSV header convention):
( (Name) (Age) (City) )
( (Alice) (30) (New York) )
( (Bob) (25) (Los Angeles) )
( (Charlie) (35) (Chicago) )
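Both CSV renderings can be produced by a short Python sketch. `csv_to_tma` and its `named` flag are hypothetical; cells are trimmed, and parentheses inside cells are not escaped since that is still a TODO.

```python
import csv
import io

def csv_to_tma(text, named=True):
    """Render CSV text as TMA rows, with named cells or with a header row."""
    rows = [[cell.strip() for cell in row] for row in csv.reader(io.StringIO(text))]
    header, records = rows[0], rows[1:]
    if named:
        # Every cell carries its column name.
        lines = [" ".join(f"{key}({cell})" for key, cell in zip(header, row)) for row in records]
    else:
        # The header stays a plain first row, as in CSV.
        lines = [" ".join(f"({cell})" for cell in row) for row in rows]
    return "\n".join(f"( {line} )" for line in lines)

table = "Name, Age, City\nAlice, 30, New York\nBob, 25, Los Angeles\nCharlie, 35, Chicago"
print(csv_to_tma(table))
print(csv_to_tma(table, named=False))
```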