fix(v2-stable): task_done exception on dropped item #1336
Conversation
1 file reviewed, no comments
Thank you for raising this issue. The event is only dropped in the truncate_item_in_place function if the body does not exist or both the input and output are empty. However, truncation is only necessary if the input, output, or metadata are too large. In your events, are the input and output missing, with only the metadata being large?
Hey! Yes, metadata was very large in some of our events, and input/outputs were missing.
In that case, this PR could be simplified by extending the drop condition to also check for missing metadata: the event would only be dropped if input, output, and metadata are all missing.
Hey! Unless, of course, there's no actual way for the drop condition to be triggered, in which case the drop condition itself might be redundant.
The drop condition should never be triggered: if all of input / output / metadata are missing, there is no other field that can hold a value with size exceeding the limit. And yes, it is only an additional safeguard here.
yes 👍
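For reference, the guard agreed on above could look something like the sketch below; _should_drop is a hypothetical helper name for illustration, not the SDK's actual function:

```python
def _should_drop(body: dict | None) -> bool:
    """Drop only when nothing that could have exceeded the size limit is present."""
    if not body:
        return True
    # Per the discussion: require input, output, AND metadata to all be missing
    # before dropping, since only those fields can carry oversized payloads.
    return not any(body.get(field) for field in ("input", "output", "metadata"))
```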
@hassiebp I've done the changes, let me know if it's fine or if anything else needs to be done.
Thanks for your contribution, @newmanifold
Released in 2.60.10
Problem:
For large traces we sometimes see the exception ValueError('task_done() called too many times'). After a few of these exceptions, traces/spans stop making it into Langfuse.

Cause:
In ingestion_consumer, events dropped inside _truncate_item_in_place were still being appended to the batch events in the _next method, which caused the task_done call on the ingestion queue in the upload method to fail with an exception. The issue is that task_done is already called for dropped items inside _truncate_item_in_place.
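For context on why this raises, here is a minimal standalone sketch using Python's plain queue.Queue (not the SDK code): the queue tracks unfinished tasks and raises exactly this error when task_done() is called more times than items were retrieved.

```python
import queue

q = queue.Queue()
q.put({"type": "span", "body": {}})

item = q.get()

# One task_done() here balances the get() above; this is roughly what
# happens when _truncate_item_in_place drops the item.
q.task_done()

# A second task_done() for the same item (what upload() ends up doing when
# the dropped item is still sitting in the batch) raises the error above.
try:
    q.task_done()
except ValueError as err:
    print(err)  # task_done() called too many times
```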
Changes:
Important

Fixes task_done exception in ingestion_consumer.py by preventing dropped items from being appended to batch events.

- Fixes ValueError('task_done() called too many times') in ingestion_consumer.py by preventing dropped items from being appended to batch events.
- Updates _truncate_item_in_place() to indicate if an item was dropped; it now returns a tuple (item_size, dropped).
- Updates _next() to handle the new return value from _truncate_item_in_place() and append only non-dropped items to events.

This description was created automatically for bffe753. You can customize this summary; it will automatically update as commits are pushed.
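To make the change concrete, here is a rough sketch of the flow after the fix. The bodies below are simplified stand-ins (the size limit constant, the size calculation, and the batch loop are assumptions for illustration); only the method names and the (item_size, dropped) return shape come from the PR, not the actual Langfuse implementation.

```python
import queue
from typing import Any, Dict, List, Tuple

MAX_MSG_SIZE = 1_000_000  # assumed limit, for illustration only


def _truncate_item_in_place(q: queue.Queue, event: Dict[str, Any]) -> Tuple[int, bool]:
    """Return (item_size, dropped); call task_done() only when the item is dropped."""
    item_size = len(str(event).encode("utf-8"))
    if item_size <= MAX_MSG_SIZE:
        return item_size, False

    # Try to shrink the event by clearing the large optional fields.
    body = event.get("body") or {}
    for field in ("input", "output", "metadata"):
        body.pop(field, None)
    item_size = len(str(event).encode("utf-8"))
    if item_size <= MAX_MSG_SIZE:
        return item_size, False

    # Still too large with nothing left to truncate: drop it and settle its
    # queue slot here, so upload() must never see it again.
    q.task_done()
    return item_size, True


def _next(q: queue.Queue, batch_size: int) -> List[Dict[str, Any]]:
    """Build the next batch, appending only items that were not dropped."""
    events: List[Dict[str, Any]] = []
    while len(events) < batch_size and not q.empty():
        event = q.get()
        _, dropped = _truncate_item_in_place(q, event)
        if not dropped:  # the core of the fix
            events.append(event)
    # upload() later calls q.task_done() exactly once per event returned here.
    return events
```

Before the fix, _next appended the event unconditionally, so a dropped event had its queue slot settled twice: once at drop time and once again in upload().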
Disclaimer: Experimental PR review
Greptile Summary
Updated On: 2025-09-15 05:00:00 UTC
This PR fixes a critical bug in the task queue management system for Langfuse's ingestion consumer. The issue occurred when processing large traces that exceeded size limits: the _truncate_item_in_place() method would drop oversized events and call task_done() on the ingestion queue, but these dropped events were still being added to the batch for processing. Later, when the upload() method processed the batch, it would call task_done() again for each event in the batch, including the already-dropped ones, resulting in ValueError('task_done() called too many times').

The fix modifies the _truncate_item_in_place() method to return a tuple containing both the item size and a boolean flag indicating whether the item was dropped. The _next() method now checks this flag and only appends non-dropped events to the batch. This ensures that task_done() is called exactly once per queue item: either during the truncation/drop process or during batch processing, but never both.

The change maintains backward compatibility in terms of functionality while fixing the queue state management. The ingestion consumer continues to handle oversized events by truncating or dropping them as before, but now properly tracks which items were dropped to prevent double-counting in the task queue. This fix is essential for maintaining the integrity of Langfuse's async task processing system, particularly when dealing with large traces that require size-based filtering.
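As a standalone toy illustration of that exactly-once accounting (plain queue.Queue, not the SDK's consumer), the queue only joins cleanly when each retrieved item is settled on exactly one of the two paths:

```python
import queue

q = queue.Queue()
for event in ({"kept": True}, {"kept": False}):
    q.put(event)

batch = []
while not q.empty():
    event = q.get()
    if event["kept"]:
        batch.append(event)   # settled later, once, on the upload path
    else:
        q.task_done()         # settled here, once, at drop time

for _ in batch:
    q.task_done()             # the upload path settles its own items

q.join()  # returns immediately: every get() matched exactly one task_done()
```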
Confidence score: 4/5
Key areas to review: _truncate_item_in_place() and its usage in _next()