JSON?

I love how compact the JmdictFurigana.txt file format is but it certainly is non-standard and not trivial to parse correctly. I wonder if you've thought about distributing the data as some kind of JSON?

I have a [package](https://github.com/fasiha/jmdict-furigana-node) that parses the compact JmdictFurigana.txt into a line-delimited JSON file, with lines like these:
```json
{"text":"アカガエル科","reading":"アカガエルか","furigana":["アカガエル",{"ruby":"科","rt":"か"}]}
{"text":"給料明細","reading":"きゅうりょうめいさい","furigana":[{"ruby":"給","rt":"きゅう"},{"ruby":"料","rt":"りょう"},{"ruby":"明","rt":"めい"},{"ruby":"細","rt":"さい"}]}
```
Each line is valid JSON with the following schema (in TypeScript notation; the type names such as `Ruby` and `Entry` are only labels and never appear in the generated JSON):
```ts
type Ruby = {
  ruby: string,
  rt: string,
};

type Furigana = string|Ruby;

type Entry = {
  furigana: Furigana[],
  reading: string,
  text: string,
};
```
I use `ruby`/`rt` to match [HTML](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/ruby).
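
To illustrate how little code a consumer would need, here is a minimal sketch that parses one line and renders it as HTML. The helper names `parseLine` and `toHtml` are hypothetical and not part of my package, and the sketch assumes the input is trusted, so it does no HTML escaping or schema validation:

```ts
type Ruby = { ruby: string; rt: string };
type Furigana = string | Ruby;
type Entry = { furigana: Furigana[]; reading: string; text: string };

// Hypothetical helper: parse one line of the line-delimited JSON file.
// Assumes the line matches the Entry schema; nothing is validated here.
function parseLine(line: string): Entry {
  return JSON.parse(line) as Entry;
}

// Hypothetical helper: plain strings pass through unchanged, while
// annotated pieces become <ruby>base<rt>reading</rt></ruby>.
function toHtml(entry: Entry): string {
  return entry.furigana
    .map((f) => (typeof f === "string" ? f : `<ruby>${f.ruby}<rt>${f.rt}</rt></ruby>`))
    .join("");
}

const line =
  '{"text":"アカガエル科","reading":"アカガエルか","furigana":["アカガエル",{"ruby":"科","rt":"か"}]}';
const entry = parseLine(line);
console.log(toHtml(entry)); // アカガエル<ruby>科<rt>か</rt></ruby>
```

Reading a whole file would just mean splitting on newlines and calling `parseLine` on each non-empty line.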

This particular line-delimited JSON format expands the 8.7 MB original to 24 MB, but with gzip compression the difference over the wire is much smaller: 2.5 MB for the original versus 3.8 MB for the JSON. If we wanted to reduce file size, or stay closer to the current JmdictFurigana.txt format, I can imagine replacing the `Entry` schema above with a simpler array-based one, something like `type Entry = [string, string, Furigana[]]`.
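
As a rough sketch of that array-based idea, the conversion in both directions is only a few lines. The `CompactEntry` name, the helper names, and the `[text, reading, furigana]` element order are all assumptions for illustration; nothing here is an agreed format:

```ts
type Ruby = { ruby: string; rt: string };
type Furigana = string | Ruby;
type Entry = { furigana: Furigana[]; reading: string; text: string };

// Hypothetical compact form; the order [text, reading, furigana] is an assumption.
type CompactEntry = [string, string, Furigana[]];

// Drop the object keys and keep the values in a fixed order.
function toCompact(e: Entry): CompactEntry {
  return [e.text, e.reading, e.furigana];
}

// Restore the keyed object from the positional array.
function fromCompact([text, reading, furigana]: CompactEntry): Entry {
  return { furigana, reading, text };
}

const entry: Entry = {
  text: "アカガエル科",
  reading: "アカガエルか",
  furigana: ["アカガエル", { ruby: "科", rt: "か" }],
};

console.log(JSON.stringify(toCompact(entry)));
// ["アカガエル科","アカガエルか",["アカガエル",{"ruby":"科","rt":"か"}]]
```

This only removes the three top-level keys per line; the `ruby`/`rt` keys inside each annotated piece are kept as they are.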

Feel free to say no if you've thought about this and didn't want to support it! I've parsed the file in three languages now (JavaScript, Clojure, and again TypeScript/JavaScript) and it's sufficiently tricky that I thought I'd ask. Thank you!
