Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Commit a0cef58

Browse files
committed
Serialisation: add reference to MessagePack.
1 parent 1f3ee31 commit a0cef58

File tree

1 file changed

+115
-28
lines changed

1 file changed

+115
-28
lines changed

SERIALISATION.md

Lines changed: 115 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -16,32 +16,79 @@ I2C or SPI. All these require the data to be presented as linear sequences of
1616
bytes. The problem is how to convert an arbitrary Python object to such a
1717
sequence, and how subsequently to restore the object.
1818

19-
I am aware of four ways of achieving this, each with their own advantages and
20-
drawbacks. In two cases the encoded strings comprise ASCII characters, in the
21-
other two they are binary (bytes can take all possible values).
19+
There are numerous standards for achieving this, five of which are readily
20+
available to MicroPython. Each has its own advantages and drawbacks. In two
21+
cases the encoded strings aim to be human readable and comprise ASCII
22+
characters. In the others they comprise binary `bytes` objects where bytes can
23+
take all possible values. The following are the formats with MicroPython
24+
support:
2225

2326
1. ujson (ASCII, official)
2427
2. pickle (ASCII, official)
2528
3. ustruct (binary, official)
26-
4. protobuf [binary, unofficial](https://github.com/dogtopus/minipb)
27-
28-
The first two are self-describing: the format includes a definition of its
29+
4. MessagePack [binary, unofficial](https://github.com/peterhinch/micropython-msgpack)
30+
5. protobuf [binary, unofficial](https://github.com/dogtopus/minipb)
31+
32+
The `ujson` and `pickle` formats produce human-readable byte sequences. These
33+
aid debugging. The use of ASCII data means that a delimiter can be used to
34+
identify the end of a message. This is because it is possible to guarantee that
35+
the delimiter will never occur within a message. A delimiter cannot be used
36+
with binary formats because a message byte can take all possible values
37+
including that of the delimiter. The drawback of ASCII formats is inefficiency:
38+
the byte sequences are relatively long.
39+
40+
Numbers 1, 2 and 4 are self-describing: the format includes a definition of its
2941
structure. This means that the decoding process can re-create the object in the
3042
absence of information on its structure, which may therefore change at runtime.
31-
Further, `ujson` and `pickle` produce human-readable byte sequences which aid
32-
debugging. The drawback is inefficiency: the byte sequences are relatively
33-
long. They are variable length. This means that the receiving process must be
34-
provided with a means to determine when a complete string has been received.
43+
Self describing formats inevitably are variable length. This means that the
44+
receiving process must be provided with a means to determine when a complete
45+
message has been received. In the case of ASCII formats a delimiter may be used
46+
but in the case of `MessagePack` this presents something of a challenge.
47+
48+
The `ustruct` format is binary: the byte sequence comprises binary data which
49+
is neither human readable nor self-describing. The problem of message framing
50+
is solved by hard coding a fixed message structure and length which is known to
51+
transmitter and receiver. In simple cases of fixed format data, `ustruct`
52+
provides a simple, efficient solution.
53+
54+
In `protobuf` and `MessagePack` messages are variable length; both can handle
55+
data whose length varies at runtime. `MessagePack` also allows the message
56+
structure to change at runtime. It is also extensible to enable the efficient
57+
coding of additional Python types or instances of user defined classes.
58+
59+
The `protobuf` standard requires transmitter and receiver to share a schema
60+
which defines the message structure. Message length may change at runtime, but
61+
structure may not.
62+
63+
## 1.1 Transmission over unreliable links
64+
65+
Consider a system where a transmitter periodically sends messages to a receiver
66+
over a communication link. An aspect of the message framing problem arises if
67+
that link is unreliable, meaning that bytes may be lost or corrupted in
68+
transit. In the case of ASCII formats with a delimiter the receiver, once it
69+
has detected the problem, can discard characters until the delimiter is
70+
received and then wait for a complete message.
71+
72+
In the case of binary formats it is generally impossible to re-synchronise to a
73+
continuous stream of data. In the case of regular bursts of data a timeout can
74+
be used. Otherwise "out of band" signalling is required where the receiver
75+
signals the transmitter to request retransmission.
76+
77+
## 1.2 Concurrency
78+
79+
In `uasyncio` systems the transmitter presents no problem. A message is created
80+
using synchronous code, then transmitted using asynchronous code typically with
81+
a `StreamWriter`. In the case of ASCII protocols a delimiter - usually `b"\n"`
82+
is appended.
83+
84+
In the case of ASCII protocols the receiver can use `StreamReader.readline()`
85+
to await a complete message.
3586

36-
The `ustruct` and `protobuf` solutions are binary formats: the byte sequences
37-
comprise binary data which is neither human readable nor self-describing.
38-
Binary sequences require that the receiver has information on their structure
39-
in order to decode them. In the case of `ustruct` sequences are of a fixed
40-
length which can be determined from the structure. `protobuf` sequences are
41-
variable length requiring handling discussed below.
87+
`ustruct` also presents a simple case in that the number of expected bytes is
88+
known to the receiver which simply awaits that number.
4289

43-
The benefit of binary sequences is efficiency: sequence length is closer to the
44-
information-theoretic minimum, compared to the ASCII options.
90+
The variable length binary protocols present a difficulty in that the message
91+
length is unknown in advance. A solution is available for `MessagePack`.
4592

4693
# 2. ujson and pickle
4794

@@ -198,7 +245,47 @@ Output:
198245
(11, 22, b'the quick brown fox jumps over')
199246
```
200247

201-
# 4. Protocol Buffers
248+
# 4. MessagePack
249+
250+
Of the binary formats this is the easiest to use and can be a "drop in"
251+
replacement for `ujson` as it supports the same four methods `dump`, `dumps`,
252+
`load` and `loads`. An application might initially be developed with `ujson`,
253+
the protocol being changed to `MessagePack` later. Creation of a `MessagePack`
254+
string can be done with:
255+
```python
256+
import umsgpack
257+
obj = [1.23, 2.56, 89000]
258+
msg = umsgpack.dumps(obj) # msg is a bytes object
259+
```
260+
Retrieval of the object is as follows:
261+
```python
262+
import umsgpack
263+
# Retrieve the message msg
264+
obj = umsgpack.dumps(msg)
265+
```
266+
An ingenious feature of the standard is its extensibility. This can be used to
267+
add support for additional Python types or user defined classes. This example
268+
shows `complex` data being supported as if it were a native type:
269+
```python
270+
import umsgpack
271+
from umsgpack_ext import mpext
272+
with open('data', 'wb') as f:
273+
umsgpack.dump(mpext(1 + 4j), f) # mpext() handles extension type
274+
```
275+
Reading back:
276+
```python
277+
import umsgpack
278+
import umsgpack_ext # Decoder only needs access to this module
279+
with open('data', 'rb') as f:
280+
z = umsgpack.load(f)
281+
print(z) # z is complex
282+
```
283+
Please see [this repo](https://github.com/peterhinch/micropython-msgpack). The
284+
docs include references to the standard and to other implementations. The repo
285+
includes an asynchronous receiver which enables incoming messages to be decoded
286+
as they arrive while allowing other tasks to run concurrently.
287+
288+
# 5. Protocol Buffers
202289

203290
This is a [Google standard](https://developers.google.com/protocol-buffers/)
204291
described in [this Wikipedia article](https://en.wikipedia.org/wiki/Protocol_Buffers).
@@ -230,15 +317,15 @@ inner `tuple` are strings, with element 0 defining the field's key. Subsequent
230317
elements define the field's data type; in most cases the data type is defined
231318
by a single string.
232319

233-
## 4.1 Installation
320+
## 5.1 Installation
234321

235322
The library comprises a single file `minipb.py`. It has a dependency, the
236323
`logging` module `logging.py` which may be found in
237324
[micropython-lib](https://github.com/micropython/micropython-lib/tree/master/logging).
238325
On RAM constrained platforms `minipb.py` may be cross-compiled or frozen as
239326
bytecode for even lower RAM consumption.
240327

241-
## 4.2 Data types
328+
## 5.2 Data types
242329

243330
These are listed in
244331
[the docs](https://github.com/dogtopus/minipb/wiki/Schema-Representations).
@@ -256,14 +343,14 @@ a subset may be used which maps onto Python data types:
256343
other platforms with special firmware builds.
257344
7. 'X' An empty field.
258345

259-
## 4.2.1 Required and Optional fields
346+
## 5.2.1 Required and Optional fields
260347

261348
If a field is prefixed with `*` it is a `required` field, otherwise it is
262349
optional. The field must still exist in the data: the only difference is that
263350
a `required` field cannot be set to `None`. Optional fields can be useful,
264351
notably for boolean types which can then represent three states.
265352

266-
## 4.3 Application design
353+
## 5.3 Application design
267354

268355
The following is a minimal example which can be pasted at the REPL:
269356
```python
@@ -287,7 +374,7 @@ being saved to a binary file, the file will need an index. Where data is to
287374
be transmitted over and interface each string should be prepended with a fixed
288375
length "size" field. The following example illustrates this.
289376

290-
## 4.4 Transmitter/Receiver example
377+
## 5.4 Transmitter/Receiver example
291378

292379
These examples can't be cut and pasted at the REPL as they assume `send(n)` and
293380
`receive(n)` functions which access the interface.
@@ -329,7 +416,7 @@ while True:
329416
# Do something with the received dict
330417
```
331418

332-
## 4.5 Repeating fields
419+
## 5.5 Repeating fields
333420

334421
This feature enables variable length lists to be encoded. List elements must
335422
all be of the same (declared) data type. In this example the `value` and `txt`
@@ -357,13 +444,13 @@ tx = w.encode(data)
357444
rx = w.decode(tx)
358445
print(rx)
359446
```
360-
### 4.5.1 Packed repeating fields
447+
### 5.5.1 Packed repeating fields
361448

362449
The author of `minipb` [does not recommend](https://github.com/dogtopus/minipb/issues/6)
363450
their use. Their purpose appears to be in the context of fixed-length fields
364451
which are outside the scope of pure Python programming.
365452

366-
## 4.6 Message fields (nested dicts)
453+
## 5.6 Message fields (nested dicts)
367454

368455
The concept of message fields is a Protocol Buffer notion. In MicroPython
369456
terminology a message field contains a `dict` whose contents are defined by
@@ -404,7 +491,7 @@ print(rx)
404491
print(rx['nested'][2]['str2']) # Access inner dict instances
405492
```
406493

407-
### 4.6.1 Recursion
494+
### 5.6.1 Recursion
408495

409496
This is surely overkill in most MicroPython applications, but for the sake of
410497
completeness message fields can be recursive:

0 commit comments

Comments
 (0)