Thanks to visit codestin.com
Credit goes to github.com

Skip to content

zengm-games/json-web-streams

Repository files navigation

json-web-streams

Build Status GitHub Repo stars GitHub npm npm

  • Stream large JSON files without loading everything into memory
  • Built on the Web Streams API so it runs in web browsers, Node.js, and more
  • Query with JSONPath to extract only the data you need
  • Integrated schema validation with full TypeScript support
  • Tested on the JSON Parsing Test Suite and other edge cases

Installation

npm install json-web-streams

Getting started

Imagine you have some JSON like:

[{ "x": 1 }, { "x": 2 }, { "x": 3 }]

You can just JSON.parse it and then do whatever you want. But what if it's so large that parsing it all at once is slow or impossible?

By using json-web-streams, you can stream through the JSON object without having to read it all into memory. Here's an example that prints out each object as it is parsed:

import { JSONParseStream } from "json-web-streams";

// data.json contains: [{ "x": 1 }, { "x": 2 }, { "x": 3 }]
const response = await fetch("https://example.com/data.json");
await response.body
	.pipeThrough(new TextDecoderStream())
	.pipeThrough(new JSONParseStream(["$[*]"]))
	.pipeTo(
		new WritableStream({
			write({ value }) {
				console.log(value);
			},
		}),
	);

// Output:
// {"x": 1}
// {"x": 2}
// {"x": 3}

Tip

If you don't have to support Safari, most other environments let you use a nicer syntax for consuming stream output as an async iterator:

const stream = response.body
	.pipeThrough(new TextDecoderStream())
	.pipeThrough(new JSONParseStream(["$[*]"]));
for await (const { value } of stream) {
	console.log(value);
}

API

const jsonParseStream = new JSONParseStream(
    jsonPaths: (JSONPath | { path: JSONPath; key?: Key; schema?: StandardSchemaV1 })[],
    options?: { multi?: boolean },
);

jsonPaths: (JSONPath | { path: JSONPath; key?: Key; schema?: StandardSchemaV1 })[]

The first argument to JSONParseStream is an array specifying what objects to emit from the stream. The JSONPath type is a string containing a JSONPath query. JSONPath is a query language for JSON to let you pick out specific values from a JSON object.

(The JSONPath type is just a slightly more restrictive version of a string. I wish it could completely parse and validate the JSONPath syntax, but currently it just enforces some little things like that it must start with a $.)

In many cases, you'll just have one JSONPath query, like:

["$.foo[*]"]

but you can have as many as you want:

["$.foo[*]", "$.bar", "$.bar.baz"]

Important

json-web-streams only supports a subset of JSONPath. See the JSONPath section below for more details and examples. But to briefly explain what a typical JSONPath query means:

$.foo[*] can be broken into three parts:

$ refers to the root node of the JSON object
.foo is similar to accessing an object property in JS, so this is the foo property of the root node
[*] means "every value in this array or object"

In total this query means "emit every value in the array/object in the foo property of the overall JSON object". So for this JSON { foo: ["A", "B", "C"] } it would emit the three values "A", "B", and "C".

The values of the jsonPaths array can either be JSONPath strings, or objects like { path: JSONPath; key?: Key; schema?: StandardSchemaV1 } to specify the additional optional key and schema properties.

key?: Key

If key is defined, it will be propagated through to the key property on output objects. So it can be any type, but typically is probably something like a number, string, or Symbol that you want to use to discriminate between types of outputs when you have multiple entries in the jsonPaths array. When key is not defined, then the value of path is used instead. See the JSONParseStream output section below for examples.

schema?: StandardSchemaV1

schema is a schema validator from any library supporting the Standard Schema specification such as Zod, Valibot, or ArkType. When you supply a schema like this, each value will be validated before it is emitted by the stream, and emitted values will have correct TypeScript types rather than being unknown. For more details, see the Schema validation and types for JSONParseStream output section below.

options?: { multi?: boolean }

There are various JSON streaming formats that make it more convenient to stream JSON by omitting the opening/closing tags and instead emitting multiple JSON objects sequentially. Some of these formats are:

  • JSON Lines (JSONL) aka Newline Delimited JSON (NDJSON) - JSON objects are separated by \n
  • JSON Text Sequences aka json-seq - JSON objects are separated by the unicode record separator character ␞
  • Concatenated JSON - JSON objects are simply concatenated with nothing in between.

Setting multi to true enables support for all of those streaming JSON formats. It's actually a little more permissive - it allows any combination of whitespace and the unicode record separator between JSON objects.

Tip

If you want to emit every one of these individual JSON objects, use the JSONPath query $ which means "emit the entire object", so in multi mode it will emit each of the individual objects.

JSONParseStream input

new JSONParseStream(jsonPaths) returns a TransformStream, meaning that it receives some input (e.g. from a ReadableStream) and emits some output (e.g. to a WritableStream).

Inputs to the JSONParseStream stream must be strings. If you have a stream emitting some binary encoded text (such as from fetch), pipe it through TextDecoderStream first:

const response = await fetch("https://example.com/data.json");
const stream = response.body
	.pipeThrough(new TextDecoderStream())
	.pipeThrough(new JSONParseStream(["$.foo[*]"]));

JSONParseStream output

Output from JSONParseStream has this format:

type JSONParseStreamOutput<T = unknown> = {
	value: T;
	key: Key;
	wildcardKeys?: string[];
};

value is the value selected by one of your JSONPath queries.

key is the JSONPath query (from the jsonPaths parameter of JSONParseStream) that matched value, or the value from the key property inside jsonPaths if that was supplied.

If you only have one JSONPath query, you can ignore key. But if you have more than one, key may be helpful when processing stream output to distinguish between different types of values. For example:

await new ReadableStream({
		start(controller) {
			controller.enqueue('{ "foo": [1, 2], "bar": ["a", "b", "c"] }');
			controller.close();
		},
	})
	.pipeThrough(new JSONParseStream(["$.bar[*]", "$.foo[*]"]))
	.pipeTo(
		new WritableStream({
			write(record) {
				if (record.key === "$.bar[*]") {
					// Do something with the values from bar
				} else {
					// Do something with the values from foo
				}
			},
		}),
	);

Or with a manually defined key:

await new ReadableStream({
		start(controller) {
			controller.enqueue('{ "foo": [1, 2], "bar": ["a", "b", "c"] }');
			controller.close();
		},
	})
	.pipeThrough(new JSONParseStream([{ key: "bar", path: "$.bar[*]" }, "$.foo[*]"]))
	.pipeTo(
		new WritableStream({
			write(record) {
				if (record.key === "bar") {
					// Do something with the values from bar
				} else {
					// Do something with the values from foo
				}
			},
		}),
	);

wildcardKeys is defined when you have a wildcard in an object (not an array) somewhere in your JSONPath. For example:

await new ReadableStream({
		start(controller) {
			controller.enqueue('{ "foo": [1, 2], "bar": ["a", "b", "c"] }');
			controller.close();
		},
	})
	.pipeThrough(new JSONParseStream(["$[*]"]))
	.pipeTo(
		new WritableStream({
			write(record) {
				console.log(record);
			},
		}),
	);
// Output:
// { key: "$[*]", value: [1, 2], wildcardKeys: ["foo"] },
// { key: "$[*]", value: ["a", "b", "c"], wildcardKeys: ["bar"] },

The purpose of wildcardKeys is to allow you to easily distinguish different types of objects. wildcardKeys has one entry for each wildcard object in your JSONPath query.

Warning

It is possible to have two JSONPath queries that output overlapping objects, like if your data is { "foo": [1, 2] } and you query for both $ and $.foo. This will emit two objects: { foo: [1, 2] } and [1, 2]. Due to how json-web-streams works internally, both of those objects share the same array instance, meaning that if the array in one is mutated it will affect the other.

Some schema validation libraries do a deep clone of objects they validate. In that case, you won't have this issue. Otherwise, in the rare case that you query for overlapping objects, you will have to handle this problem, such as by deep cloning one of the objects.

Schema validation and types for JSONParseStream output

If you want to validate the objects as they stream in, json-web-streams integrates with any schema validation library that supports the Standard Schema specification, such as Zod, Valibot, and ArkType.

To use schema validation for a JSONPath query, then pass an object { path: JSONPath; schema: StandardSchemaV1 } rather just a string JSONPath. Then each value will be validated before being output by the stream, and the correct TypeScript types will be propagated through the stream as well.

import * as z from "zod";

await new ReadableStream({
		start(controller) {
			controller.enqueue('{ "foo": [1, 2], "bar": ["a", "b", "c"] }');
			controller.close();
		},
	})
	.pipeThrough(
		new JSONParseStream([
			{ path: "$.foo[*]", schema: z.number() },
			{ path: "$.bar[*]", schema: z.string() },
		]),
	)
	.pipeTo(
		new WritableStream({
			write(record) {
				if (record.key === "$.foo[*]") {
					// Type of record.value is `number`
				} else {
					// Type of record.value is `string`
				}
			},
		}),
	);

For JSONPath queries with no schema, emitted values will have the unknown type.

The type of the key property will be either the string literal path from the input paramter (such as "$.foo[*]) or whatever you put in the key property of the input, so you can use it to discriminate between object types in the output. For example:

import * as z from "zod";

await new ReadableStream({
		start(controller) {
			controller.enqueue('{ "foo": [1, 2], "bar": ["a", "b", "c"] }');
			controller.close();
		},
	})
	.pipeThrough(
		new JSONParseStream([
			{ key: "foo", path: "$.foo[*]", schema: z.number() },
			{ key: "bar", path: "$.bar[*]", schema: z.string() },
		]),
	)
	.pipeTo(
		new WritableStream({
			write(record) {
				if (record.key === "foo") {
					// Type of record.value is `number`
				} else {
					// Type of record.value is `string`
				}
			},
		}),
	);

Tip

If you only want to validate values or override keys for some queries, you can mix { path: JSONPath; key?: Key; schema?: StandardSchemaV1 } and JSONPath in the jsonPaths array.

JSONPath

json-web-streams supports a subset of JSONPath. Currently the supported components are:

  • The root node, represented by the symbol $ which must be the first character of any JSONPath query.

  • Name selectors which are like accessing a property in a JS object. For instance if you have an object like { "foo": { bar: 5 } }, then $.foo.bar refers to the value 5. You can also write this in the more verbose bracket notation like $["foo"]["bar"] or $["foo", "bar"], which is useful if your key names include characters that need escaping. You can also mix them like $.foo["bar"] or use single quotes like $['foo']['bar'] - all of these JSONPath queries have the same meaning.

  • Wildcard selectors which select every value in an array or object. With this JSON { "foo": { "a": 1, "b": 2, "c": 3 } }, the JSONPath query $.foo[*] would emit the three individual numbers 1, 2, and 3. If the inner object was changed to an array like { "foo": [1, 2, 3] }, the same JSONPath query would emit the same values.

You can combine these selectors as deep as you want. For instance, if instead you have an array of objects you can select values inside those individual objects with a query like $.foo[*].bar. Applying that to this data:

{ "foo": [{ "bar": 1 }, { "bar": 2 }, { "bar": 3 }] }

would emit 1, 2, and 3.

Or if the array is at the root if the object like this data:

[{ "x": 1 }, { "x": 2 }, { "x": 3 }]

then you'd write something like $[*] to emit each object ({ x: 1}, {x: 2}, {x: 3}), or $[*].key to emit just the numbers (1, 2, 3).

To emit the whole object at once (okay in that case you wouldn't use this library, but maybe just for testing, or for multi mode) you just use $.

Tip

If you want to play around with JSONPath queries to make sure you understand what they are doing, jsonpath.com is a great website that lets you easily run a JSONPath query on some data.

JSONParseStream examples

There are several examples above in code blocks throughout the README, and here are a few more!

wildcardKeys vs. multiple entries in jsonPaths

Sometimes there are multiple ways to achieve your goal.

Let's say you have this JSON:

{ "foo": [1, 2], "bar": ["a", "b", "c"] }

You want to get all the values in foo and all the values in bar. You could define them as two separate JSONPath queries and then distinguish the output with .key:

await new ReadableStream({
		start(controller) {
			controller.enqueue('{ "foo": [1, 2], "bar": ["a", "b", "c"] }');
			controller.close();
		},
	})
	.pipeThrough(
		new JSONParseStream(["$.foo[*]", "$.bar[*]"]),
	)
	.pipeTo(
		new WritableStream({
			write(record) {
				if (record.key === "$.foo[*]") {
					// 1, 2
				} else {
					// a, b, c
				}
			},
		}),
	);

Or you could use one JSONPath query with a wildcard, and then use .wildcardKeys to distinguish the objects:

await new ReadableStream({
		start(controller) {
			controller.enqueue('{ "foo": [1, 2], "bar": ["a", "b", "c"] }');
			controller.close();
		},
	})
	.pipeThrough(
		new JSONParseStream(["$[*][*]"]),
	)
	.pipeTo(
		new WritableStream({
			write(record) {
				if (record.wildcardKeys[0] === "foo") {
					// 1, 2
				} else {
					// a, b, c
				}
			},
		}),
	);

Using multiple JSONPath queries is a little more explicit, but using wildcard keys is more concise, especially if you had more than just two types of objects. And instead of known keys like foo and bar your JSON had some unknown keys, then using a wildcard would be your only option.

But a nice thing about multiple JSONPath queries is that you can add schema validation to ensure your data is the correct format and give you nice TypeScript types. Whereas if you are using wildcardKeys to distinguish types, there is currently no way to use that information in schema validation.

In this example, Zod schemas enforce that record.value is either a string or number as appropriate, rather than unknown:

import * as z from "zod";

await new ReadableStream({
		start(controller) {
			controller.enqueue('{ "foo": [1, 2], "bar": ["a", "b", "c"] }');
			controller.close();
		},
	})
	.pipeThrough(
		new JSONParseStream([
			{ path: "$.foo[*]", schema: z.number() },
			{ path: "$.bar[*]", schema: z.string() },
		]),
	)
	.pipeTo(
		new WritableStream({
			write(record) {
				if (record.key === "$.foo[*]") {
					// Type of record.value is `number` rather than `unknown`
				} else {
					// Type of record.value is `string` rather than `unknown`
				}
			},
		}),
	);

About

Streaming JSON parser built on top of the Web Streams API

Topics

Resources

License

Stars

Watchers

Forks