please make `get_body_text` more robust

Hi,

alot keeps crashing on me, especially when it encounters malformed spam email, with tracebacks like this:

```
  File "/usr/share/alot/alot/widgets/search.py", line 187, in <genexpr>
    lastcontent = ' '.join(m.get_body_text() for m in msgs)
  File "/usr/share/alot/alot/db/message.py", line 287, in get_body_text
    return extract_body_part(self.get_mime_part())
  File "/usr/share/alot/alot/db/utils.py", line 497, in extract_body_part
    rendered_payload = render_part(
  File "/usr/share/alot/alot/db/utils.py", line 345, in render_part
    raw_payload = remove_cte(part)
  File "/usr/share/alot/alot/db/utils.py", line 440, in remove_cte
    bp = base64.b64decode(payload)
  File "/usr/lib/python3.9/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
```

or

```
  File "/usr/share/alot/alot/widgets/search.py", line 187, in <genexpr>
    lastcontent = ' '.join(m.get_body_text() for m in msgs)
  File "/usr/share/alot/alot/db/message.py", line 287, in get_body_text
    return extract_body_part(self.get_mime_part())
  File "/usr/share/alot/alot/db/utils.py", line 497, in extract_body_part
    rendered_payload = render_part(
  File "/usr/share/alot/alot/db/utils.py", line 345, in render_part
    raw_payload = remove_cte(part)
  File "/usr/share/alot/alot/db/utils.py", line 436, in remove_cte
    bp = quopri.decodestring(payload.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 8114-8123: ordinal not in range(128)
```
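Both tracebacks end in the strict decoding calls inside `remove_cte`. As a rough illustration of what more forgiving decoding could look like, here is a minimal sketch using only the standard library. The helper names `lenient_b64decode` and `lenient_qp_decode` are made up for this example and are not part of alot; dropping junk characters and re-padding is also only one possible policy, and it will mangle input that has padding in the middle.

```python
import base64
import binascii
import quopri

# Characters that may legitimately appear in a base64 body (padding excluded).
_B64_ALPHABET = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)


def lenient_b64decode(payload: str) -> bytes:
    """Decode base64 text, tolerating stray characters and missing padding.

    Hypothetical helper, not part of alot.
    """
    # Drop whitespace, padding and any junk outside the base64 alphabet.
    cleaned = "".join(ch for ch in payload if ch in _B64_ALPHABET)
    # A single leftover character cannot encode a full byte, so discard it.
    if len(cleaned) % 4 == 1:
        cleaned = cleaned[:-1]
    # Re-add the padding that a truncated payload is missing.
    cleaned += "=" * (-len(cleaned) % 4)
    try:
        return base64.b64decode(cleaned)
    except binascii.Error:
        # Give up quietly rather than crash the caller.
        return b""


def lenient_qp_decode(payload: str) -> bytes:
    """Decode quoted-printable text that may contain non-ASCII characters.

    Hypothetical helper, not part of alot.
    """
    # Encoding as UTF-8 (instead of ASCII) keeps stray non-ASCII characters
    # from raising UnicodeEncodeError; unencodable ones become "?".
    return quopri.decodestring(payload.encode("utf-8", errors="replace"))
```

With helpers like these, a truncated payload such as `"aGVsbG8"` (missing its trailing `=`) would still decode instead of raising `binascii.Error`.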

I'm currently running alot with the following patch:

```patch
--- a/alot/db/message.py	2022-04-21 14:03:34.085067550 +0200
+++ b/alot/db/message.py	2022-04-21 12:17:26.415798127 +0200
@@ -284,7 +284,10 @@
 
     def get_body_text(self):
         """ returns bodystring extracted from this mail """
-        return extract_body_part(self.get_mime_part())
+        try:
+            return extract_body_part(self.get_mime_part())
+        except:
+            return "ERROR"
 
     def matches(self, querystring):
         """tests if this messages is in the resultset for `querystring`"""
```

This replaces the message body with `ERROR`, which is fine because those messages are spam anyway, and at least alot doesn't quit. If a message makes alot quit, it's quite time-consuming to find the one spam message that tripped it up. With this patch, such messages can be quickly identified and marked as spam. Certainly something more descriptive than `ERROR` should be returned, maybe even a traceback that helps identify the problem?
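To make the "something more descriptive" idea concrete, here is a minimal sketch of a wrapper that returns the exception type and message, and optionally the traceback, instead of a bare `ERROR`. The name `safe_body_text`, its `include_traceback` parameter, and the placeholder wording are assumptions made for this example, not existing alot code.

```python
import traceback


def safe_body_text(extract, include_traceback=False):
    """Call `extract()` and return its text, or a placeholder on failure.

    Hypothetical helper: `extract` is any zero-argument callable that
    returns the body text and may raise on malformed input.
    """
    try:
        return extract()
    except Exception as exc:
        # `except Exception` (unlike a bare `except:`) still lets
        # KeyboardInterrupt and SystemExit propagate.
        summary = "[could not decode message body: {}: {}]".format(
            type(exc).__name__, exc
        )
        if include_traceback:
            # format_exc() describes the exception currently being handled.
            summary += "\n" + traceback.format_exc()
        return summary
```

Inside `get_body_text` this could be called as `safe_body_text(lambda: extract_body_part(self.get_mime_part()))`. The one-line placeholder keeps the search view readable, while the traceback variant might be more useful when the message is opened in a thread view.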

please make `get_body_text` more robust #1601
