Thanks to visit codestin.com
Credit goes to github.com

Skip to content

ParserBot: erroneous raw line recovery in error handling #1850

@ghost

Description

This logic here:

intelmq/intelmq/lib/bot.py

Lines 1005 to 1031 in 7ba8b62

for line in self.parse(report):
if not line:
continue
try:
value = self.parse_line(line, report)
if value is None:
continue
elif type(value) is list or isinstance(value, types.GeneratorType):
# filter out None
events = list(filter(bool, value))
else:
events = [value]
except Exception:
self.logger.exception('Failed to parse line.')
self.__failed.append((traceback.format_exc(), line))
else:
events_count += len(events)
self.send_message(*events)
for exc, line in self.__failed:
report_dump = report.copy()
report_dump.change('raw', self.recover_line(line))
if self.parameters.error_dump_message:
self._dump_message(exc, report_dump)
if self._Bot__destination_queues and '_on_error' in self._Bot__destination_queues:
self.send_message(report_dump, path='_on_error')

does not work with all recover_line_* methods. Some methods use the parameter line, others use self.current_line. The overall logic is fine, but there is a major bug:

process collects all fails (self.__failed is appended with line) in the first loop (for line in self.parse(report))
In the second loop (for exc, line in self.__failed), recover_line is called with line.
If recover_line accesses self.current_line, the data is wrong, as self.current_line is then the last line of the report, not the actual one.

Unfortunately, simply fixing some recover_line_* functions is not enough, the process in self.process needs to be thought through and eventually adapted as well.

  • self.current_line should be deleted after the parsing end to prohibit this error in the future.
  • self.recover_line behaviour should be harmonized, making it applicable for use in self.parse_line and in self.process
  • self.process should be investigated

In the future we also need better tests, but that's a bigger task and I'm afraid we can't stem that on a short term. And unfortunately the issue is important, as it leads to bogus (wrong/duplicated) data in the dumps and therefore loss of data.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugIndicates an unexpected problem or unintended behaviorcomponent: corehelp wantedIndicates that a maintainer wants help on an issue or pull request

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions