77.. sectionauthor:: Bob Ippolito <[email protected]>
88
99`JSON (JavaScript Object Notation) <http://json.org>`_, specified by
10- :rfc:`4627`, is a lightweight data interchange format based on a subset of
11- `JavaScript <http://en.wikipedia.org/wiki/JavaScript>`_ syntax (`ECMA-262 3rd
12- edition <http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%203rd%20edition,%20December%201999.pdf>`_).
10+ :rfc:`7159` (which obsoletes :rfc:`4627`) and by
11+ `ECMA-404 <http://www.ecma-international.org/publications/standards/Ecma-404.htm>`_,
12+ is a lightweight data interchange format inspired by
13+ `JavaScript <http://en.wikipedia.org/wiki/JavaScript>`_ object literal syntax
14+ (although it is not a strict subset of JavaScript [#rfc-errata]_ ).
1315
1416:mod:`json` exposes an API familiar to users of the standard library
1517:mod:`marshal` and :mod:`pickle` modules.
@@ -465,18 +467,18 @@ Encoders and Decoders
465467 mysocket.write(chunk)
466468
467469
468- Standard Compliance
469- -------------------
470+ Standard Compliance and Interoperability
471+ ----------------------------------------
470472
471- The JSON format is specified by :rfc:`4627`. This section details this
472- module's level of compliance with the RFC. For simplicity,
473- :class:`JSONEncoder` and :class:`JSONDecoder` subclasses, and parameters other
474- than those explicitly mentioned, are not considered.
473+ The JSON format is specified by :rfc:`7159` and by
474+ `ECMA-404 <http://www.ecma-international.org/publications/standards/Ecma-404.htm>`_.
475+ This section details this module's level of compliance with the RFC.
476+ For simplicity, :class:`JSONEncoder` and :class:`JSONDecoder` subclasses, and
477+ parameters other than those explicitly mentioned, are not considered.
475478
476479This module does not comply with the RFC in a strict fashion, implementing some
477480extensions that are valid JavaScript but not valid JSON. In particular:
478481
479- - Top-level non-object, non-array values are accepted and output;
480482- Infinite and NaN number values are accepted and output;
481483- Repeated names within an object are accepted, and only the value of the last
482484 name-value pair is used.
@@ -488,43 +490,29 @@ default settings.
488490Character Encodings
489491^^^^^^^^^^^^^^^^^^^
490492
491- The RFC recommends that JSON be represented using either UTF-8, UTF-16, or
492- UTF-32, with UTF-8 being the default.
493+ The RFC requires that JSON be represented using either UTF-8, UTF-16, or
494+ UTF-32, with UTF-8 being the recommended default for maximum interoperability.
493495
494496As permitted, though not required, by the RFC, this module's serializer sets
495497*ensure_ascii=True* by default, thus escaping the output so that the resulting
496498strings only contain ASCII characters.
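
As a small illustration of the *ensure_ascii* behaviour described above, the
following sketch compares the default output with ``ensure_ascii=False``. The
sample string and variable names are chosen only for this example.

```python
import json

text = "caf\u00e9"  # 'café': one non-ASCII character

# Default (ensure_ascii=True): non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(text)

# With ensure_ascii=False the character is emitted as-is in the resulting str.
unescaped = json.dumps(text, ensure_ascii=False)

print(escaped)    # "caf\u00e9"
print(unescaped)  # "café"
```

Both results decode back to the same Python string with ``json.loads``.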
497499
498500Other than the *ensure_ascii* parameter, this module is defined strictly in
499501terms of conversion between Python objects and
500- :class:`Unicode strings <str>`, and thus does not otherwise address the issue
501- of character encodings.
502+ :class:`Unicode strings <str>`, and thus does not otherwise directly address
503+ the issue of character encodings.
502504
505+ The RFC prohibits adding a byte order mark (BOM) to the start of a JSON text,
506+ and this module's serializer does not add a BOM to its output.
507+ The RFC permits, but does not require, JSON deserializers to ignore an initial
508+ BOM in their input. This module's deserializer raises a :exc:`ValueError`
509+ when an initial BOM is present.
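
One way to cope with input that may carry a UTF-8 BOM is to strip it while
decoding, before the text reaches the deserializer. This is a minimal sketch
that assumes the raw bytes are UTF-8; the sample payload is invented for the
example.

```python
import json

raw = b'\xef\xbb\xbf{"x": 1}'  # UTF-8 BOM followed by a JSON object

# The "utf-8-sig" codec drops a leading BOM if one is present.
data = json.loads(raw.decode("utf-8-sig"))

# Decoding with plain "utf-8" keeps the BOM as U+FEFF at the start of the
# str, and the deserializer rejects it.
try:
    json.loads(raw.decode("utf-8"))
except ValueError as exc:
    bom_error = exc
else:
    bom_error = None
```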
503510
504- Top-level Non-Object, Non-Array Values
505- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
506-
507- The RFC specifies that the top-level value of a JSON text must be either a
508- JSON object or array (Python :class:`dict` or :class:`list`). This module's
509- deserializer also accepts input texts consisting solely of a
510- JSON null, boolean, number, or string value::
511-
512- >>> just_a_json_string = '"spam and eggs"' # Not by itself a valid JSON text
513- >>> json.loads(just_a_json_string)
514- 'spam and eggs'
515-
516- This module itself does not include a way to request that such input texts be
517- regarded as illegal. Likewise, this module's serializer also accepts single
518- Python :data:`None`, :class:`bool`, numeric, and :class:`str`
519- values as input and will generate output texts consisting solely of a top-level
520- JSON null, boolean, number, or string value without raising an exception::
521-
522- >>> neither_a_list_nor_a_dict = "spam and eggs"
523- >>> json.dumps(neither_a_list_nor_a_dict) # The result is not a valid JSON text
524- '"spam and eggs"'
525-
526- This module's serializer does not itself include a way to enforce the
527- aforementioned constraint.
511+ The RFC does not explicitly forbid JSON strings which contain byte sequences
512+ that don't correspond to valid Unicode characters (e.g. unpaired UTF-16
513+ surrogates), but it does note that they may cause interoperability problems.
514+ By default, this module accepts and outputs (when present in the original
515+ :class:`str`) codepoints for such sequences.
528516
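
The following sketch shows what accepting and outputting such code points can
look like in practice, using an unpaired high surrogate as the sample value.
It also shows why the result may be awkward downstream: the resulting
:class:`str` cannot be encoded as UTF-8.

```python
import json

# The JSON text contains an escape for an unpaired high surrogate.
lone = json.loads('"\\ud800"')

# Serializing it again escapes the code point (ensure_ascii=True by default).
round_tripped = json.dumps(lone)

# The str holds a lone surrogate, which the UTF-8 codec refuses to encode.
try:
    lone.encode("utf-8")
except UnicodeEncodeError:
    encodable = False
else:
    encodable = True
```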
529517
530518Infinite and NaN Number Values
@@ -554,7 +542,7 @@ Repeated Names Within an Object
554542^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
555543
556544The RFC specifies that the names within a JSON object should be unique, but
557- does not specify how repeated names in JSON objects should be handled. By
545+ does not mandate how repeated names in JSON objects should be handled. By
558546default, this module does not raise an exception; instead, it ignores all but
559547the last name-value pair for a given name::
560548
@@ -563,3 +551,48 @@ the last name-value pair for a given name::
563551 {'x': 3}
564552
565553The *object_pairs_hook* parameter can be used to alter this behavior.
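
As one possible use of *object_pairs_hook*, the sketch below rejects repeated
names instead of silently keeping the last value. The function name
``reject_duplicates`` is hypothetical and not part of this module.

```python
import json


def reject_duplicates(pairs):
    """Hypothetical hook: build a dict, raising on a repeated name."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError("repeated name in JSON object: %r" % (name,))
        result[name] = value
    return result


weird_json = '{"x": 1, "x": 2, "x": 3}'

# Passing list as the hook keeps every name-value pair, in order.
all_pairs = json.loads(weird_json, object_pairs_hook=list)

try:
    json.loads(weird_json, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    duplicate_error = exc
else:
    duplicate_error = None
```

The hook is called for every JSON object in the input, including nested ones,
so the check applies at each level.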
554+
555+
556+ Top-level Non-Object, Non-Array Values
557+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
558+
559+ The old version of JSON specified by the obsolete :rfc:`4627` required that
560+ the top-level value of a JSON text must be either a JSON object or array
561+ (Python :class:`dict` or :class:`list`), and could not be a JSON null,
562+ boolean, number, or string value. :rfc:`7159` removed that restriction, and
563+ this module does not and has never implemented that restriction in either its
564+ serializer or its deserializer.
565+
566+ Regardless, for maximum interoperability, you may wish to voluntarily adhere
567+ to the restriction yourself.
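
If you do want to adhere to the old restriction, a thin wrapper around
``json.loads`` is enough. The helper below is a sketch only; the name
``loads_container`` is hypothetical and not provided by this module.

```python
import json


def loads_container(text):
    """Hypothetical helper: parse text, rejecting top-level scalar values."""
    value = json.loads(text)
    if not isinstance(value, (dict, list)):
        raise ValueError("top-level JSON value is not an object or array")
    return value


accepted = loads_container('["spam", "eggs"]')

try:
    loads_container('"spam and eggs"')
except ValueError as exc:
    scalar_error = exc
else:
    scalar_error = None
```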
568+
569+
570+ Implementation Limitations
571+ ^^^^^^^^^^^^^^^^^^^^^^^^^^
572+
573+ Some JSON deserializer implementations may set limits on:
574+
575+ * the size of accepted JSON texts
576+ * the maximum level of nesting of JSON objects and arrays
577+ * the range and precision of JSON numbers
578+ * the content and maximum length of JSON strings
579+
580+ This module does not impose any such limits beyond those of the relevant
581+ Python datatypes themselves or the Python interpreter itself.
582+
583+ When serializing to JSON, beware any such limitations in applications that may
584+ consume your JSON. In particular, it is common for JSON numbers to be
585+ deserialized into IEEE 754 double precision numbers and thus subject to that
586+ representation's range and precision limitations. This is especially relevant
587+ when serializing Python :class:`int` values of extremely large magnitude, or
588+ when serializing instances of "exotic" numerical types such as
589+ :class:`decimal.Decimal`.
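
To make the precision concern concrete, this sketch serializes an integer
just above 2**53 and then mimics a consumer that stores every JSON number as
an IEEE 754 double. It also shows two ways to keep decimal precision: the
*parse_float* parameter on the decoding side, and, as one possible
convention, sending :class:`decimal.Decimal` values as JSON strings via
``default=str``.

```python
import json
from decimal import Decimal

big = 2**53 + 1

# This module writes the integer exactly.
text = json.dumps(big)

# A double-based consumer would effectively do this and lose the last digit.
as_double = float(json.loads(text))
precision_lost = int(as_double) != big

# parse_float lets this module's decoder keep the decimal digits as written.
price = json.loads("1.10", parse_float=Decimal)

# Decimal is not serializable by default; default=str emits it as a string.
payload = json.dumps({"price": Decimal("1.10")}, default=str)
```

Sending numbers as strings only helps if the consumer knows to convert them
back, so it has to be agreed on by both sides.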
590+
591+
592+ .. rubric:: Footnotes
593+
594+ .. [#rfc-errata] As noted in `the errata for RFC 7159
595+ <http://www.rfc-editor.org/errata_search.php?rfc=7159>`_,
596+ JSON permits literal U+2028 (LINE SEPARATOR) and
597+ U+2029 (PARAGRAPH SEPARATOR) characters in strings, whereas JavaScript
598+ (as of ECMAScript Edition 5.1) does not.