
Commit b00e8f1

Added cookbook example for BOM insertion.
1 parent ee9e485 commit b00e8f1

1 file changed

Lines changed: 44 additions & 0 deletions

Doc/howto/logging-cookbook.rst

@@ -1544,3 +1544,47 @@ works::
    if __name__ == '__main__':
        main()

Inserting a BOM into messages sent to a SysLogHandler
-----------------------------------------------------

`RFC 5424 <http://tools.ietf.org/html/rfc5424>`_ requires that a
Unicode message be sent to a syslog daemon as a sequence of bytes with the
following structure: an optional pure-ASCII component, followed by a UTF-8 Byte
Order Mark (BOM), followed by Unicode encoded using UTF-8. (See the `relevant
section of the specification <http://tools.ietf.org/html/rfc5424#section-6>`_.)

In Python 2.6 and 2.7, code was added to
:class:`~logging.handlers.SysLogHandler` to insert a BOM into the message, but
unfortunately, it was implemented incorrectly, with the BOM appearing at the
beginning of the message and hence not allowing any pure-ASCII component to
appear before it.

As this behaviour is broken, the incorrect BOM insertion code is being removed
from Python 2.7.4 and later. However, it is not being replaced, and if you
want to produce RFC 5424-compliant messages which include a BOM, an optional
pure-ASCII sequence before it and arbitrary Unicode after it, encoded using
UTF-8, then you need to do the following:

#. Attach a :class:`~logging.Formatter` instance to your
   :class:`~logging.handlers.SysLogHandler` instance, with a format string
   such as::

      u"ASCII section\ufeffUnicode section"

   The Unicode code point ``u'\ufeff'``, when encoded using UTF-8, will be
   encoded as a UTF-8 BOM, the bytestring ``'\xef\xbb\xbf'``.

#. Replace the ASCII section with whatever placeholders you like, but make sure
   that the data that appears in there after substitution is always ASCII (that
   way, it will remain unchanged after UTF-8 encoding).

#. Replace the Unicode section with whatever placeholders you like; if the data
   which appears there after substitution is Unicode, that's fine: it will be
   encoded using UTF-8.

If the formatted message is Unicode, it *will* be encoded using UTF-8
by ``SysLogHandler``. If you follow these rules, you should be able to produce
RFC 5424-compliant messages. If you don't, logging may not complain, but your
messages will not be RFC 5424-compliant, and your syslog daemon may complain.
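
The steps above can be sketched roughly as follows. This is a minimal
illustration rather than part of the change itself. It is written for Python 3,
where ordinary string literals are already Unicode. The format string
``BOM_FORMAT``, the helper names ``make_syslog_handler`` and
``format_as_bytes``, the ``("localhost", 514)`` address and the sample record
are all assumptions chosen for the example.

```python
import logging
import logging.handlers

# Hypothetical format string: placeholders that always expand to ASCII come
# before the BOM (U+FEFF); the free-form message comes after it.
BOM_FORMAT = "%(levelname)s %(name)s \ufeff%(message)s"


def make_syslog_handler(address=("localhost", 514)):
    """Return a SysLogHandler using the BOM format (the address is an assumption)."""
    handler = logging.handlers.SysLogHandler(address=address)
    handler.setFormatter(logging.Formatter(BOM_FORMAT))
    return handler


def format_as_bytes(record, fmt=BOM_FORMAT):
    """Format a record and encode the result as UTF-8.

    This only approximates what reaches the daemon: the handler itself also
    adds a priority prefix, which is not reproduced here.
    """
    return logging.Formatter(fmt).format(record).encode("utf-8")


# Build a sample record by hand so that no syslog daemon is needed.
record = logging.LogRecord(
    name="app", level=logging.INFO, pathname="example.py", lineno=1,
    msg="caf\u00e9 ouvert", args=None, exc_info=None,
)
payload = format_as_bytes(record)

# Split at the UTF-8 BOM: ASCII part, the BOM itself, UTF-8 encoded part.
head, bom, tail = payload.partition(b"\xef\xbb\xbf")
print(head, bom, tail)
```

Splitting the encoded payload at the BOM makes it easy to check that everything
before it is plain ASCII, which is the property the second step asks you to
preserve.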
