feat(evals): improve evals UI with tool groups and duration fix #10133
Merged
- Fix backend race condition in runTask.ts where TaskTokenUsageUpdated could arrive before TaskStarted handler set taskMetricsId
- Add Promise-based synchronization (taskMetricsReady) for event handlers
- Fix UI to fall back to database timestamps (startedAt/finishedAt) when streaming duration is unavailable (e.g., page loaded after TaskStarted)
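The race described above can be illustrated with a minimal sketch. Only `taskMetricsId`, `taskMetricsReady`, `TaskStarted`, and `TaskTokenUsageUpdated` come from the commit message; `createDeferred`, `createTaskHandlers`, `createTaskMetrics`, and the `updates` array are hypothetical names used for illustration, and the real runTask.ts may be structured differently.

```typescript
// Hypothetical sketch of gating event handlers on a Promise.
interface Deferred {
  promise: Promise<void>
  resolve: () => void
}

function createDeferred(): Deferred {
  let resolve: () => void = () => {}
  const promise = new Promise<void>((res) => {
    resolve = () => res()
  })
  return { promise, resolve }
}

// createTaskMetrics stands in for whatever creates the metrics row (assumption).
function createTaskHandlers(createTaskMetrics: () => Promise<number>) {
  let taskMetricsId: number | undefined
  const taskMetricsReady = createDeferred()
  // Records the writes that would go to the database (illustrative only).
  const updates: Array<{ taskMetricsId: number; tokens: number }> = []

  return {
    updates,
    async onTaskStarted(): Promise<void> {
      taskMetricsId = await createTaskMetrics()
      taskMetricsReady.resolve() // unblock handlers that arrived early
    },
    async onTaskTokenUsageUpdated(tokens: number): Promise<void> {
      // Wait until TaskStarted has set taskMetricsId (or the client disconnected).
      await taskMetricsReady.promise
      const id = taskMetricsId
      if (id === undefined) return // disconnected before the task started
      updates.push({ taskMetricsId: id, tokens })
    },
    onDisconnect(): void {
      // Resolve on disconnect so pending handlers never hang.
      taskMetricsReady.resolve()
    },
  }
}
```

With this shape, a `TaskTokenUsageUpdated` event that arrives first simply waits on the promise instead of reading an unset id.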
- Add tool groups feature with customizable name and icon
- Groups aggregate tool usage stats in table columns
- Persist groups to localStorage
- Tools can only belong to one group
- Each group displays only its icon in the header, with a tooltip showing its name and tools
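The rules in this list (one group per tool, aggregated stats, persistence) could be sketched roughly as follows. Every name here is an assumption rather than the PR's actual code: the `ToolGroup` and `ToolUsage` shapes, the `attempts`/`failures` fields, the storage key, and the helper functions are all hypothetical. The store is typed as a small interface so that `window.localStorage` can be passed in the browser.

```typescript
// Hypothetical data shapes; the PR's real types may differ.
interface ToolGroup {
  id: string
  name: string
  icon: string
  tools: string[]
}

interface ToolUsage {
  attempts: number
  failures: number
}

// Subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
}

const STORAGE_KEY = "evals-tool-groups" // assumed key name

// Move a tool into one group, removing it from every other group first,
// so a tool never belongs to more than one group.
function assignTool(groups: ToolGroup[], groupId: string, tool: string): ToolGroup[] {
  return groups.map((group) => {
    const without = group.tools.filter((t) => t !== tool)
    return { ...group, tools: group.id === groupId ? [...without, tool] : without }
  })
}

// Sum usage stats over the tools in a group; tools without stats are skipped.
function aggregateGroup(group: ToolGroup, usage: Record<string, ToolUsage>): ToolUsage {
  return group.tools.reduce<ToolUsage>(
    (acc, tool) => {
      const u = usage[tool]
      return u ? { attempts: acc.attempts + u.attempts, failures: acc.failures + u.failures } : acc
    },
    { attempts: 0, failures: 0 },
  )
}

function saveGroups(store: KeyValueStore, groups: ToolGroup[]): void {
  store.setItem(STORAGE_KEY, JSON.stringify(groups))
}

// Returns an empty list when nothing is stored or the stored value is unreadable.
function loadGroups(store: KeyValueStore): ToolGroup[] {
  const raw = store.getItem(STORAGE_KEY)
  if (!raw) return []
  try {
    const parsed = JSON.parse(raw)
    return Array.isArray(parsed) ? (parsed as ToolGroup[]) : []
  } catch {
    return []
  }
}
```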
Contributor
Re-review complete. No blocking items remain.
Contributor
Fixed the |
… data
Add fallback case for finished tasks where DB metrics are empty and streaming usage is unavailable. Duration is now calculated from startedAt/finishedAt timestamps in all cases.
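One way to picture the fallback order is the sketch below. Only `startedAt` and `finishedAt` come from the PR text; the `TaskRow` shape, the `metricsDuration` field, and `resolveDuration` are hypothetical, and using the current time for a task that has not finished is an assumption rather than something the PR states.

```typescript
// Hypothetical row shape; only startedAt/finishedAt are named in the PR.
interface TaskRow {
  startedAt: Date | null
  finishedAt: Date | null
  metricsDuration?: number | null // duration in ms from stored metrics, if any
}

// Pick a duration in milliseconds, preferring stored metrics, then the
// streamed value, then the database timestamps.
function resolveDuration(
  task: TaskRow,
  streamingDuration?: number,
  now: Date = new Date(),
): number | undefined {
  if (task.metricsDuration) return task.metricsDuration
  if (streamingDuration !== undefined) return streamingDuration
  if (task.startedAt) {
    // Assumption: a task that is still running is measured up to `now`.
    const end = task.finishedAt ?? now
    return Math.max(0, end.getTime() - task.startedAt.getTime())
  }
  return undefined
}
```

For a page loaded after TaskStarted, `streamingDuration` would be undefined, so the timestamps decide the result.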
cte
approved these changes
Dec 16, 2025
Labels
bug
Something isn't working
Issue/PR - Triage
New issue. Needs quick review to confirm validity and assign labels.
lgtm
This PR has been approved by a maintainer
size:XL
This PR changes 500-999 lines, ignoring generated files.
UI/UX
UI/UX related or focused
Summary
This PR improves the evals UI with two changes:
Changes
Commits
Important
Enhance evals UI with tool groups for better visualization and fix duration reporting by calculating from timestamps when streaming data is unavailable.
- … run.tsx and runs.tsx.
- … run.tsx by calculating from timestamps if streaming data is unavailable.
- … runTask.ts, ensure task metrics are updated correctly by waiting for taskMetricsId to be set before processing certain events.
- … runTask.ts by resolving taskMetricsReady on disconnect.

This description was created automatically for b9aa6b5 and updates as commits are pushed.