Description
I have seen the standard library email message parser (invoked via email.parser.BytesParser) lose headers when a message contains a header line with a space between the header name and the delimiting colon.
Sample which reproduces this:
Subject: test-ignore
To: [email protected]
x-fred :dead
Date: Fri, 20 May 2022 18:13:19 +1200
From: [email protected]

Hello, this is a test message
When this is parsed with email.parser.BytesParser, the resulting EmailMessage object has these headers, as dumped by calling .items():
[('Subject', 'test-ignore'),
('To', ': [email protected];')
]
Meanwhile, the dropped headers end up in the message payload, as seen from calling .get_payload():
x-fred :dead
Date: Fri, 20 May 2022 18:13:19 +1200
From: [email protected]

Hello, this is a test message
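For anyone wanting to reproduce this, here is a minimal sketch of how the sample can be fed to the parser. The addresses are placeholders of my choosing (the real ones are redacted above), and I am assuming policy.default, since that is what yields an EmailMessage object. The exact output may differ between Python versions, so none is shown here.

```python
from email import policy
from email.parser import BytesParser

# Placeholder addresses: the originals are redacted in this report.
RAW = (
    b"Subject: test-ignore\n"
    b"To: recipient@example.com\n"
    b"x-fred :dead\n"  # note the space before the colon
    b"Date: Fri, 20 May 2022 18:13:19 +1200\n"
    b"From: sender@example.com\n"
    b"\n"
    b"Hello, this is a test message\n"
)

# policy.default makes the parser return an email.message.EmailMessage.
msg = BytesParser(policy=policy.default).parsebytes(RAW)

headers = msg.items()        # list of (name, value) pairs the parser kept
payload = msg.get_payload()  # everything the parser treated as body text

print(headers)
print(payload)
print(msg.defects)           # any defects the parser recorded while parsing
```

If the problem is present, Date and From are missing from the printed headers and show up at the top of the payload instead.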
This failure to cope gracefully with a header in obsolete syntax puts Python 3's standard library email parser in breach of RFC 5322. Section 4.5 states: "...any amount of white space is allowed [in the header] before the ":" at the end of the [header] field name...", and Section 4 requires conformant receivers to accept and parse the obsolete syntax.
Because the parser does not handle this archaic header format, I've had to implement a crude workaround: detect whether any expected headers are absent from the resulting email.message.EmailMessage object and, if so, manually 'massage' the header part of the raw message and re-submit it to the parser.
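A rough sketch of that kind of workaround follows. It is not my exact code: the function names, the regular expression and the default list of expected headers are all illustrative assumptions. It only rewrites lines in the header block that look like a field name followed by whitespace and a colon, and leaves folded continuation lines and the body alone.

```python
import re
from email import policy
from email.parser import BytesParser

# A field name (printable ASCII except colon) followed by spaces/tabs and
# then the colon, anchored at the start of a line. Hypothetical helper.
_SPACE_BEFORE_COLON = re.compile(rb"^([!-9;-~]+)[ \t]+:")


def strip_space_before_colon(raw: bytes) -> bytes:
    """Drop whitespace between a field name and its colon, headers only."""
    out = []
    in_headers = True
    for line in raw.splitlines(keepends=True):
        if in_headers:
            if not line.strip(b"\r\n"):
                in_headers = False  # first blank line ends the header block
            else:
                line = _SPACE_BEFORE_COLON.sub(rb"\1:", line, count=1)
        out.append(line)
    return b"".join(out)


def parse_tolerant(raw: bytes, expected=("Date", "From")):
    """Parse normally; re-parse a cleaned copy if expected headers are missing."""
    parser = BytesParser(policy=policy.default)
    msg = parser.parsebytes(raw)
    if all(name in msg for name in expected):
        return msg
    # Something expected is absent, so massage the headers and try again.
    return parser.parsebytes(strip_space_before_colon(raw))
```

Calling parse_tolerant(raw_bytes) in place of BytesParser(...).parsebytes(raw_bytes) is the intended use. The obvious weakness is that it depends on knowing which headers to expect, which is why I call it crude.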
However, I'd suggest this really does need to be fixed within Python's standard library module.