ParserBot: erroneous raw line recovery in error handling

This logic here:

https://github.com/certtools/intelmq/blob/7ba8b625a0189b7b2cbf86eb8d1825eeb4aac729/intelmq/lib/bot.py#L1005-L1031

does not work with all `recover_line_*` methods. Some methods use the parameter `line`, others use `self.current_line`. The overall logic is fine, but there is a major bug:

`process` collects all failures in the first loop (`for line in self.parse(report)`), appending each failed `line` together with its traceback to `self.__failed`.
In the second loop (`for exc, line in self.__failed`), `recover_line` is called with `line`.
If `recover_line` accesses `self.current_line`, the data is wrong, because by then `self.current_line` is the last line of the report, not the line that failed.
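
To make the failure mode concrete, here is a minimal, self-contained sketch. It is not IntelMQ code: `ToyParser` and its attributes are hypothetical stand-ins that only mimic the two-loop structure of `process` and a `recover_line` variant that ignores its `line` parameter.

```python
class ToyParser:
    """Hypothetical stand-in for a ParserBot; not IntelMQ's real class."""

    def __init__(self):
        self.current_line = None
        self.failed = []

    def parse(self, report):
        for line in report.splitlines():
            # current_line is updated as a side effect of iterating
            self.current_line = line
            yield line

    def parse_line(self, line):
        if line.startswith("bad"):
            raise ValueError("cannot parse %r" % line)
        return line

    def recover_line(self, line):
        # Mimics the recover_line_* variants that ignore the parameter
        # and read self.current_line instead.
        return self.current_line

    def process(self, report):
        # First loop: parse everything and only collect the failures.
        for line in self.parse(report):
            try:
                self.parse_line(line)
            except ValueError as exc:
                self.failed.append((str(exc), line))
        # Second loop: recover afterwards, when current_line is stale.
        return [self.recover_line(line) for _, line in self.failed]


recovered = ToyParser().process("good-1\nbad-2\ngood-3")
print(recovered)  # ['good-3'], not the failing line 'bad-2'
```

The dump ends up containing the last line of the report instead of the line that actually failed, which matches the wrong/duplicated data described in this issue.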

Unfortunately, simply fixing some `recover_line_*` functions is not enough. The flow in `self.process` needs to be thought through and possibly adapted as well.

- `self.current_line` should be deleted after parsing ends, to prevent this error in the future.
- The behaviour of `self.recover_line` should be harmonized so that it can be used both in `self.parse_line` and in `self.process`.
- `self.process` should be investigated.
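
One possible direction for these points, as a rough sketch under stated assumptions rather than a proposed patch: recover the raw line at the moment of failure, let `recover_line` accept an explicit `line` with a fallback to `self.current_line`, and clear `self.current_line` once parsing ends so a stale access fails loudly. `FixedToyParser` and all of its internals are hypothetical and do not reflect IntelMQ's actual API.

```python
class FixedToyParser:
    """Hypothetical sketch of the proposed behaviour; not IntelMQ's real class."""

    def __init__(self):
        self.current_line = None
        self.failed = []

    def parse(self, report):
        try:
            for line in report.splitlines():
                self.current_line = line
                yield line
        finally:
            # Clear the state after parsing so stale reads cannot go unnoticed.
            self.current_line = None

    def parse_line(self, line):
        if line.startswith("bad"):
            raise ValueError("cannot parse %r" % line)
        return line

    def recover_line(self, line=None):
        # Harmonized: prefer the explicit argument, fall back to current_line.
        if line is None:
            line = self.current_line
        if line is None:
            raise ValueError("no line available for recovery")
        return line

    def process(self, report):
        for line in self.parse(report):
            try:
                self.parse_line(line)
            except ValueError as exc:
                # Recover immediately, while current_line still matches `line`.
                self.failed.append((str(exc), self.recover_line(line)))
        # The second loop only needs the already recovered raw data.
        return [raw for _, raw in self.failed]


parser = FixedToyParser()
recovered = parser.process("good-1\nbad-2\ngood-3")
print(recovered)            # ['bad-2']
print(parser.current_line)  # None
```

Storing the recovered raw data in the failure list means the later dump loop no longer depends on any parser state, at the cost of doing the recovery work eagerly for every failed line.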

In the future we also need better tests, but that's a bigger task and I'm afraid we can't manage it in the short term. Unfortunately, the issue is important, as it leads to bogus (wrong/duplicated) data in the dumps and therefore to loss of data.

	for line in self.parse(report):

	    if not line:
	        continue
	    try:
	        value = self.parse_line(line, report)
	        if value is None:
	            continue
	        elif type(value) is list or isinstance(value, types.GeneratorType):
	            # filter out None
	            events = list(filter(bool, value))
	        else:
	            events = [value]
	    except Exception:
	        self.logger.exception('Failed to parse line.')
	        self.__failed.append((traceback.format_exc(), line))
	    else:
	        events_count += len(events)
	        self.send_message(*events)

	for exc, line in self.__failed:
	    report_dump = report.copy()
	    report_dump.change('raw', self.recover_line(line))
	    if self.parameters.error_dump_message:
	        self._dump_message(exc, report_dump)
	    if self._Bot__destination_queues and '_on_error' in self._Bot__destination_queues:
	        self.send_message(report_dump, path='_on_error')

ParserBot: erroneous raw line recovery in error handling #1850

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

ParserBot: erroneous raw line recovery in error handling #1850

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions