@@ -16,32 +16,79 @@ I2C or SPI. All these require the data to be presented as linear sequences of
bytes. The problem is how to convert an arbitrary Python object to such a
sequence, and how subsequently to restore the object.

- I am aware of four ways of achieving this, each with their own advantages and
- drawbacks. In two cases the encoded strings comprise ASCII characters, in the
- other two they are binary (bytes can take all possible values).
+ There are numerous standards for achieving this, five of which are readily
+ available to MicroPython. Each has its own advantages and drawbacks. In two
+ cases the encoded strings aim to be human readable and comprise ASCII
+ characters. In the others they comprise binary `bytes` objects where bytes can
+ take all possible values. The following are the formats with MicroPython
+ support:

1. ujson (ASCII, official)
2. pickle (ASCII, official)
3. ustruct (binary, official)
- 4. protobuf [binary, unofficial](https://github.com/dogtopus/minipb)
-
- The first two are self-describing: the format includes a definition of its
+ 4. MessagePack [binary, unofficial](https://github.com/peterhinch/micropython-msgpack)
+ 5. protobuf [binary, unofficial](https://github.com/dogtopus/minipb)
+
+ The `ujson` and `pickle` formats produce human-readable byte sequences, which
+ aids debugging. Because the data is ASCII, a delimiter can be used to identify
+ the end of a message: it is possible to guarantee that the delimiter will never
+ occur within a message. A delimiter cannot be used with binary formats because
+ a message byte can take all possible values, including that of the delimiter.
+ The drawback of ASCII formats is inefficiency: the byte sequences are
+ relatively long.
+
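As a rough illustration of delimiter framing, the sketch below appends a newline to each JSON message and splits a receive buffer on it. The `frame` and `unframe` helpers and the choice of `b"\n"` are assumptions made for this example, not part of any library; the import falls back to CPython's `json` so the code can be tried on a PC.

```python
try:
    import ujson as json  # MicroPython
except ImportError:
    import json  # CPython fallback

DELIMITER = b"\n"  # Assumed delimiter: JSON escapes newlines inside strings


def frame(obj):
    """Encode obj as one delimited ASCII message (hypothetical helper)."""
    return json.dumps(obj).encode() + DELIMITER


def unframe(buf):
    """Split a receive buffer into decoded objects plus any trailing partial message."""
    parts = buf.split(DELIMITER)
    partial = parts.pop()  # Bytes after the last delimiter: message still incomplete
    return [json.loads(p.decode()) for p in parts if p], partial
```

The receiver would keep the returned partial message and prepend it to the next block of bytes that arrives.
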
+ Numbers 1, 2 and 4 are self-describing: the format includes a definition of its
structure. This means that the decoding process can re-create the object in the
absence of information on its structure, which may therefore change at runtime.
- Further, `ujson` and `pickle` produce human-readable byte sequences which aid
- debugging. The drawback is inefficiency: the byte sequences are relatively
- long. They are variable length. This means that the receiving process must be
- provided with a means to determine when a complete string has been received.
+ Self-describing formats are inevitably variable length. This means that the
+ receiving process must be provided with a means to determine when a complete
+ message has been received. In the case of ASCII formats a delimiter may be used
+ but in the case of `MessagePack` this presents something of a challenge.
+
+ The `ustruct` format is binary: the byte sequence comprises binary data which
+ is neither human readable nor self-describing. The problem of message framing
+ is solved by hard coding a fixed message structure and length which is known to
+ transmitter and receiver. In simple cases of fixed format data, `ustruct`
+ provides a simple, efficient solution.
+
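A minimal sketch of the fixed-structure approach follows. The record layout (`"<HHf"`: two unsigned 16-bit integers and a float, little-endian) is a hypothetical choice for illustration; what matters is that both ends share the same format string and can therefore compute the message length.

```python
try:
    import ustruct as struct  # MicroPython
except ImportError:
    import struct  # CPython fallback

FMT = "<HHf"  # Hypothetical record layout shared by transmitter and receiver
SIZE = struct.calcsize(FMT)  # Message length follows from the format alone


def encode(a, b, x):
    """Pack one record into a fixed-length bytes object."""
    return struct.pack(FMT, a, b, x)


def decode(msg):
    """Unpack one record; msg must be exactly SIZE bytes long."""
    return struct.unpack(FMT, msg)  # Returns a tuple (a, b, x)
```

Because `SIZE` never changes, the receiver needs no delimiter or length field: it simply reads that many bytes.
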
+ In `protobuf` and `MessagePack` messages are variable length; both can handle
+ data whose length varies at runtime. `MessagePack` also allows the message
+ structure to change at runtime. It is also extensible to enable the efficient
+ coding of additional Python types or instances of user defined classes.
+
+ The `protobuf` standard requires transmitter and receiver to share a schema
+ which defines the message structure. Message length may change at runtime, but
+ structure may not.
+
+ ## 1.1 Transmission over unreliable links
+
+ Consider a system where a transmitter periodically sends messages to a receiver
+ over a communication link. An aspect of the message framing problem arises if
+ that link is unreliable, meaning that bytes may be lost or corrupted in
+ transit. In the case of ASCII formats with a delimiter the receiver, once it
+ has detected the problem, can discard characters until the delimiter is
+ received and then wait for a complete message.
+
+ In the case of binary formats it is generally impossible to re-synchronise to a
+ continuous stream of data. In the case of regular bursts of data a timeout can
+ be used. Otherwise "out of band" signalling is required where the receiver
+ signals the transmitter to request retransmission.
+
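The delimiter-based recovery described above for ASCII formats might look something like the following sketch. The `receive` generator is a hypothetical helper that takes any iterable of byte chunks; here a failed decode is treated as the sign of a damaged message, which is one possible detection method among several.

```python
try:
    import ujson as json  # MicroPython
except ImportError:
    import json  # CPython fallback

DELIMITER = b"\n"  # Assumed message delimiter


def receive(chunks):
    """Yield decoded objects from byte chunks, skipping damaged messages."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        while DELIMITER in buf:
            line, buf = buf.split(DELIMITER, 1)
            try:
                yield json.loads(line.decode())
            except ValueError:
                # Damaged message: everything up to the delimiter is dropped
                # and decoding resumes with the next complete message.
                pass
```
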
+ ## 1.2 Concurrency
+
+ In `uasyncio` systems the transmitter presents no problem. A message is created
+ using synchronous code, then transmitted using asynchronous code, typically with
+ a `StreamWriter`. In the case of ASCII protocols a delimiter, usually `b"\n"`,
+ is appended.
+
+ In the case of ASCII protocols the receiver can use `StreamReader.readline()`
+ to await a complete message.

- The `ustruct` and `protobuf` solutions are binary formats: the byte sequences
- comprise binary data which is neither human readable nor self-describing.
- Binary sequences require that the receiver has information on their structure
- in order to decode them. In the case of `ustruct` sequences are of a fixed
- length which can be determined from the structure. `protobuf` sequences are
- variable length requiring handling discussed below.
+ `ustruct` also presents a simple case in that the number of expected bytes is
+ known to the receiver which simply awaits that number.

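The two simple receive cases (delimited ASCII and fixed-length `ustruct`) can be sketched as below. This version runs under CPython's `asyncio`, with a hand-fed `StreamReader` standing in for a real interface; on MicroPython the equivalent would presumably use `uasyncio` with a `StreamReader` wrapping a UART or socket. The record format `"<HHf"` and the function names are assumptions for illustration.

```python
import asyncio  # CPython stand-in for uasyncio
import json
import struct

FMT = "<HHf"  # Hypothetical fixed-length record


async def read_ascii(reader):
    line = await reader.readline()  # Waits until the b"\n" delimiter arrives
    return json.loads(line)


async def read_fixed(reader):
    # The byte count is known in advance from the format string
    data = await reader.readexactly(struct.calcsize(FMT))
    return struct.unpack(FMT, data)


async def demo():
    # Simulated link: feed one ASCII message then one binary record
    reader = asyncio.StreamReader()
    reader.feed_data(json.dumps([1, 2, 3]).encode() + b"\n")
    reader.feed_data(struct.pack(FMT, 11, 22, 0.5))
    reader.feed_eof()
    return await read_ascii(reader), await read_fixed(reader)
```

While either coroutine is waiting for data, other tasks remain free to run.
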
- The benefit of binary sequences is efficiency: sequence length is closer to the
- information-theoretic minimum, compared to the ASCII options.
+ The variable length binary protocols present a difficulty in that the message
+ length is unknown in advance. A solution is available for `MessagePack`.

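One generic way to frame variable-length binary messages, and the one this document suggests for `protobuf` in the application design section, is to prepend a fixed-length size field. The sketch below is an assumption-laden illustration rather than the `MessagePack` solution mentioned above: the 2-byte little-endian header and both helper names are hypothetical.

```python
try:
    import ustruct as struct  # MicroPython
except ImportError:
    import struct  # CPython fallback

HDR = "<H"  # Assumed 2-byte length field: payloads of up to 65535 bytes
HDR_SIZE = struct.calcsize(HDR)


def add_length(payload):
    """Prepend the payload length so the receiver knows how much to read."""
    return struct.pack(HDR, len(payload)) + payload


def split_messages(buf):
    """Return (list of complete payloads, unconsumed remainder)."""
    msgs = []
    while len(buf) >= HDR_SIZE:
        (n,) = struct.unpack(HDR, buf[:HDR_SIZE])
        if len(buf) < HDR_SIZE + n:
            break  # Payload not yet fully received
        msgs.append(buf[HDR_SIZE:HDR_SIZE + n])
        buf = buf[HDR_SIZE + n:]
    return msgs, buf
```

Note that this scheme offers no way to re-synchronise if a byte is lost: a damaged length field misaligns every subsequent message.
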
# 2. ujson and pickle

@@ -198,7 +245,47 @@ Output:
(11, 22, b'the quick brown fox jumps over')
```

- # 4. Protocol Buffers
+ # 4. MessagePack
+
+ Of the binary formats this is the easiest to use and can be a "drop in"
+ replacement for `ujson` as it supports the same four methods `dump`, `dumps`,
+ `load` and `loads`. An application might initially be developed with `ujson`,
+ the protocol being changed to `MessagePack` later. Creation of a `MessagePack`
+ message can be done with:
+ ```python
+ import umsgpack
+ obj = [1.23, 2.56, 89000]
+ msg = umsgpack.dumps(obj)  # msg is a bytes object
+ ```
+ Retrieval of the object is as follows:
+ ```python
+ import umsgpack
+ # Retrieve the message msg
+ obj = umsgpack.loads(msg)  # loads() decodes; dumps() encodes
+ ```
+ An ingenious feature of the standard is its extensibility. This can be used to
+ add support for additional Python types or user defined classes. This example
+ shows `complex` data being supported as if it were a native type:
+ ```python
+ import umsgpack
+ from umsgpack_ext import mpext
+ with open('data', 'wb') as f:
+     umsgpack.dump(mpext(1 + 4j), f)  # mpext() handles extension type
+ ```
+ Reading back:
+ ```python
+ import umsgpack
+ import umsgpack_ext  # Decoder only needs access to this module
+ with open('data', 'rb') as f:
+     z = umsgpack.load(f)
+ print(z)  # z is complex
+ ```
+ Please see [this repo](https://github.com/peterhinch/micropython-msgpack). The
+ docs include references to the standard and to other implementations. The repo
+ includes an asynchronous receiver which enables incoming messages to be decoded
+ as they arrive while allowing other tasks to run concurrently.
+
+ # 5. Protocol Buffers

This is a [Google standard](https://developers.google.com/protocol-buffers/)
described in [this Wikipedia article](https://en.wikipedia.org/wiki/Protocol_Buffers).
@@ -230,15 +317,15 @@ inner `tuple` are strings, with element 0 defining the field's key. Subsequent
elements define the field's data type; in most cases the data type is defined
by a single string.

- ## 4.1 Installation
+ ## 5.1 Installation

The library comprises a single file `minipb.py`. It has a dependency, the
`logging` module `logging.py` which may be found in
[micropython-lib](https://github.com/micropython/micropython-lib/tree/master/logging).
On RAM constrained platforms `minipb.py` may be cross-compiled or frozen as
bytecode for even lower RAM consumption.

- ## 4.2 Data types
+ ## 5.2 Data types

These are listed in
[the docs](https://github.com/dogtopus/minipb/wiki/Schema-Representations).
@@ -256,14 +343,14 @@ a subset may be used which maps onto Python data types:
other platforms with special firmware builds.
7. 'X' An empty field.

- ## 4.2.1 Required and Optional fields
+ ## 5.2.1 Required and Optional fields

If a field is prefixed with `*` it is a `required` field, otherwise it is
optional. The field must still exist in the data: the only difference is that
a `required` field cannot be set to `None`. Optional fields can be useful,
notably for boolean types which can then represent three states.

- ## 4.3 Application design
+ ## 5.3 Application design

The following is a minimal example which can be pasted at the REPL:
```python
@@ -287,7 +374,7 @@ being saved to a binary file, the file will need an index. Where data is to
be transmitted over an interface each string should be prepended with a fixed
length "size" field. The following example illustrates this.

- ## 4.4 Transmitter/Receiver example
+ ## 5.4 Transmitter/Receiver example

These examples can't be cut and pasted at the REPL as they assume `send(n)` and
`receive(n)` functions which access the interface.
@@ -329,7 +416,7 @@ while True:
# Do something with the received dict
```

- ## 4.5 Repeating fields
+ ## 5.5 Repeating fields

This feature enables variable length lists to be encoded. List elements must
all be of the same (declared) data type. In this example the `value` and `txt`
@@ -357,13 +444,13 @@ tx = w.encode(data)
rx = w.decode(tx)
print(rx)
```
- ### 4.5.1 Packed repeating fields
+ ### 5.5.1 Packed repeating fields

The author of `minipb` [does not recommend](https://github.com/dogtopus/minipb/issues/6)
their use. Their purpose appears to be in the context of fixed-length fields
which are outside the scope of pure Python programming.

- ## 4.6 Message fields (nested dicts)
+ ## 5.6 Message fields (nested dicts)

The concept of message fields is a Protocol Buffer notion. In MicroPython
terminology a message field contains a `dict` whose contents are defined by
@@ -404,7 +491,7 @@ print(rx)
print(rx['nested'][2]['str2'])  # Access inner dict instances
```

- ### 4.6.1 Recursion
+ ### 5.6.1 Recursion

This is surely overkill in most MicroPython applications, but for the sake of
completeness message fields can be recursive: