[adapters] Fix data interleaving in HTTP ingress connector. #5498
Conversation
Thanks to Bruno Rucy @brurucy and Abhinav Gyawali @abhizer for help with this issue.
Fixes: #3495
Signed-off-by: Ben Pfaff <[email protected]>
Pull request overview
This PR fixes a data interleaving issue in the HTTP ingress connector by moving the StreamSplitter from shared endpoint state to per-request local scope.
Changes:
- Removed `StreamSplitter` from the `HttpInputEndpointDetails` struct to prevent concurrent requests from sharing state (sketched below)
- Refactored the `push` method to accept parsed chunks directly instead of managing splitting logic
- Created a new `StreamSplitter` instance within the request handling loop to ensure each request has its own isolated splitter
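As a rough illustration of that scoping change, here is a minimal, self-contained sketch. The type and function names are invented stand-ins, not the connector's actual API, and the newline-based splitting is a simplification of what `StreamSplitter` does for JSON with default settings.

```rust
// Schematic only: `SharedEndpointState` and `handle_request` are invented
// stand-ins for illustration, not the real connector types.

// Before: one splitter buffer lived on the shared endpoint state, so every
// in-flight request appended its bytes into the same buffer.
#[allow(dead_code)]
struct SharedEndpointState {
    splitter_buffer: Vec<u8>, // shared across requests (the bug)
}

// After: the endpoint keeps no splitter; each request builds its own and
// hands only complete, newline-terminated records to `push`.
fn handle_request(body_chunks: Vec<Vec<u8>>, mut push: impl FnMut(&[u8])) {
    let mut splitter_buffer: Vec<u8> = Vec::new(); // fresh per request
    for chunk in body_chunks {
        splitter_buffer.extend_from_slice(&chunk);
        // Emit every complete record; keep any partial record buffered.
        while let Some(pos) = splitter_buffer.iter().position(|&b| b == b'\n') {
            let rest = splitter_buffer.split_off(pos + 1);
            let record = std::mem::replace(&mut splitter_buffer, rest);
            push(record.as_slice());
        }
    }
}
```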
    while let Some(chunk) = splitter.next(eoi) {
        num_errors += self.push(chunk, &mut errors, timestamp);
    }
Copilot (AI) commented on Jan 23, 2026:
The timestamp variable is captured once before the loop but reused for all chunks. If the splitter produces multiple chunks, they will all receive the same timestamp even though they may be processed at slightly different times. Consider capturing a fresh timestamp inside the loop for each chunk to ensure accurate temporal ordering of events.
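If that suggestion were taken, the loop might look roughly like the sketch below. `now_millis` is an invented stand-in for however the connector actually obtains event timestamps, and the loop itself is quoted in comment form because its surrounding types are not reproduced here.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Stand-in timestamp source, for illustration only.
fn now_millis() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_millis() as u64
}

// Sketch of the suggested variant (as comments, since the real `splitter`,
// `push`, and error types are not reproduced here):
//
//     while let Some(chunk) = splitter.next(eoi) {
//         let timestamp = now_millis(); // fresh timestamp per chunk
//         num_errors += self.push(chunk, &mut errors, timestamp);
//     }
```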
    }
    Ok(None) => true,
    };
    while let Some(chunk) = splitter.next(eoi) {
So the fix is because this loop cannot be interrupted?
It's more than that. Each call to complete_request receives a payload, which gets read chunk by chunk. If the request is short, there's only one chunk total, and if that chunk ends in a newline (which is what the splitter looks for in the case of JSON with default settings), then push will split it properly. But if the request is long and the chunks do not end in newlines (they would end that way only by luck), then push will feed in all the full records in the current chunk and leave the start of the next record in its buffer. Then, the next chunk from any request will be appended. Obviously, the start of one record followed by part of another record from a different source will cause something bad to happen, and that's what was happening.
By making each request break its input data into full records, and then only passing the full records to the parser, we avoid the problem.
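As a concrete illustration of that failure mode, here is a small, self-contained toy model (not the connector's code; the newline splitting below is a simplification of what StreamSplitter does for JSON with default settings). Two requests each send one record split across chunks that do not end in a newline; with a single shared buffer the emitted "records" are interleaved garbage, while per-request buffers emit clean records.

```rust
fn main() {
    // Two concurrent requests, each sending one JSON record split across
    // two body chunks that do not end in a newline.
    let request_a = [&b"{\"name\": \"al"[..], &b"ice\"}\n"[..]];
    let request_b = [&b"{\"name\": \"bo"[..], &b"b\"}\n"[..]];

    // Buggy shape: one splitter buffer shared by both requests. Chunks
    // arrive interleaved, so the tail of one record gets glued onto bytes
    // from the other request.
    let mut shared = Vec::new();
    for chunk in [request_a[0], request_b[0], request_a[1], request_b[1]] {
        shared.extend_from_slice(chunk);
        emit_records(&mut shared, "shared splitter");
    }

    // Fixed shape: each request owns its splitter, so only its own chunks
    // are ever appended to its buffer and every record comes out intact.
    for (name, request) in [("request A", request_a), ("request B", request_b)] {
        let mut own = Vec::new();
        for chunk in request {
            own.extend_from_slice(chunk);
            emit_records(&mut own, name);
        }
    }
}

// Pop every newline-terminated record out of `buffer` and print it.
fn emit_records(buffer: &mut Vec<u8>, source: &str) {
    while let Some(pos) = buffer.iter().position(|&b| b == b'\n') {
        let rest = buffer.split_off(pos + 1);
        let record = std::mem::replace(buffer, rest);
        println!("{source}: {:?}", String::from_utf8_lossy(&record[..record.len() - 1]));
    }
}
```

Running this, the shared-splitter half prints a corrupted record such as `{"name": "al{"name": "boice"}` followed by the orphaned fragment `b"}`, while the per-request half prints `{"name": "alice"}` and `{"name": "bob"}` cleanly.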
I will test this as soon as it is merged. In my use case I routinely see tens of parsing failures every day.
It's easy to overlook an issue if it's not solved quickly. Please ping us again if you have bugs like this one so we can assign them higher priority.