@@ -1544,3 +1544,47 @@ works::
15441544 if __name__ == '__main__':
15451545     main()
15461546
1547+
1548+ Inserting a BOM into messages sent to a SysLogHandler
1549+ -----------------------------------------------------
1550+
1551+ `RFC 5424 <http://tools.ietf.org/html/rfc5424>`_ requires that a
1552+ Unicode message be sent to a syslog daemon as a set of bytes which have the
1553+ following structure: an optional pure-ASCII component, followed by a UTF-8 Byte
1554+ Order Mark (BOM), followed by Unicode encoded using UTF-8. (See the `relevant
1555+ section of the specification <http://tools.ietf.org/html/rfc5424#section-6>`_.)
1556+
1557+ In Python 2.6 and 2.7, code was added to
1558+ :class:`~logging.handlers.SysLogHandler` to insert a BOM into the message, but
1559+ unfortunately, it was implemented incorrectly, with the BOM appearing at the
1560+ beginning of the message and hence not allowing any pure-ASCII component to
1561+ appear before it.
1562+
1563+ As this behaviour is broken, the incorrect BOM insertion code is being removed
1564+ from Python 2.7.4 and later. However, it is not being replaced, so if you
1565+ want to produce RFC 5424-compliant messages which include a BOM, an optional
1566+ pure-ASCII sequence before it and arbitrary Unicode after it, encoded using
1567+ UTF-8, then you need to do the following:
1568+
1569+ #. Attach a :class:`~logging.Formatter` instance to your
1570+    :class:`~logging.handlers.SysLogHandler` instance, with a format string
1571+    such as::
1572+
1573+       u"ASCII section\ufeffUnicode section"
1574+
1575+    The Unicode code point ``u'\ufeff'``, when encoded using UTF-8, will be
1576+    encoded as a UTF-8 BOM -- the bytestring ``'\xef\xbb\xbf'``.
1577+
1578+ #. Replace the ASCII section with whatever placeholders you like, but make sure
1579+    that the data that appears in there after substitution is always ASCII (that
1580+    way, it will remain unchanged after UTF-8 encoding).
1581+
1582+ #. Replace the Unicode section with whatever placeholders you like; if the data
1583+    which appears there after substitution is Unicode, that's fine -- it will be
1584+    encoded using UTF-8.
1585+
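The encoding behaviour that the steps above rely on can be checked directly.
The following is a minimal sketch and not part of the logging package; the
names ``fmt`` and ``BOM_UTF8`` are chosen here for illustration only. It uses
``u""`` and ``b""`` literals so that it should run on Python 2.7 as well as on
Python 3.3 and later.

```python
# The example format string from step 1: ASCII text, then U+FEFF, then
# free-form text.
fmt = u"ASCII section\ufeffUnicode section"

# The three bytes that U+FEFF becomes under UTF-8 (the same value is also
# available as codecs.BOM_UTF8).
BOM_UTF8 = b"\xef\xbb\xbf"

# Encode the whole string once, as a UTF-8 encoder would.
encoded = fmt.encode("utf-8")

# Split the encoded bytes around the first BOM.
ascii_part, bom, unicode_part = encoded.partition(BOM_UTF8)

# The ASCII section survives UTF-8 encoding byte for byte, which is why
# step 2 asks for ASCII-only data before the BOM.
print(repr(ascii_part))
print(repr(bom))
print(repr(unicode_part))
```

Because the ASCII section is unchanged by the encoding, whatever you place
before the BOM reaches the syslog daemon exactly as written.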
1586+ If the formatted message is Unicode, it *will* be encoded using UTF-8
1587+ by ``SysLogHandler``. If you follow these rules, you should be able to produce
1588+ RFC 5424-compliant messages. If you don't, logging may not complain, but your
1589+ messages will not be RFC 5424-compliant, and your syslog daemon may complain.
1590+
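As a sketch of how these rules might be applied in practice, the example below
attaches a formatter whose format string puts ASCII-only fields before the BOM
and the message text after it. The format string, the logger name ``myapp``
and the helper names ``SYSLOG_FORMAT``, ``make_syslog_handler`` and
``format_as_bytes`` are assumptions made for illustration. The address
``('localhost', 514)`` is the handler's default, and your daemon may listen
somewhere else. The last part formats a record by hand and encodes it, to
approximate the bytes the handler would send after its priority prefix.

```python
import logging
import logging.handlers

# Hypothetical format string: ASCII-only fields, then U+FEFF, then the message.
# This only stays ASCII before the BOM if logger names are themselves ASCII.
SYSLOG_FORMAT = u"%(name)s %(levelname)s \ufeff%(message)s"


def make_syslog_handler(address=("localhost", 514)):
    """Return a SysLogHandler whose formatter inserts a BOM before the message."""
    handler = logging.handlers.SysLogHandler(address=address)
    handler.setFormatter(logging.Formatter(SYSLOG_FORMAT))
    return handler


def format_as_bytes(formatter, record):
    """Approximate the UTF-8 encoding step applied to a formatted message."""
    return formatter.format(record).encode("utf-8")


# Build a record by hand so the result can be inspected without a daemon.
record = logging.LogRecord(
    "myapp", logging.INFO, "example.py", 1, u"caf\xe9 opened", None, None
)
data = format_as_bytes(logging.Formatter(SYSLOG_FORMAT), record)

# Everything before the BOM should decode as plain ASCII; decode() raises
# UnicodeDecodeError if a non-ASCII byte has slipped into that section.
prefix = data.split(b"\xef\xbb\xbf")[0]
prefix.decode("ascii")
```

To send real messages, add the handler to a logger in the usual way, for
example ``logging.getLogger('myapp').addHandler(make_syslog_handler())``.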