Clean up orchestration and utils layout #2194

NolanTrem · 2025-05-20T21:40:55Z

Important

Refactor token counting by consolidating the logic into a single num_tokens function in base_utils.py and removing the duplicate count_tokens_for_text functions from the ingestion workflows.
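As a rough illustration of the consolidation, here is a minimal sketch of what a shared token counter might look like. The signature, the cl100k_base fallback, and the injectable encode parameter are assumptions made for this example and are not taken from the diff; only the name num_tokens, the use of tiktoken, and the 'gpt-4o' default come from this PR. The helper whitespace_encode and the function chunk_token_total are hypothetical.

```python
from typing import Callable, Iterable, Optional, Sequence


def num_tokens(
    text: str,
    model: str = "gpt-4o",
    encode: Optional[Callable[[str], Sequence]] = None,
) -> int:
    """Count tokens in `text`.

    `encode` is a hypothetical hook (not in the PR) that lets callers and
    tests supply their own tokenizer instead of loading tiktoken.
    """
    if encode is None:
        # Lazy import so the module can be loaded without tiktoken installed.
        import tiktoken

        try:
            encoding = tiktoken.encoding_for_model(model)
        except KeyError:
            # Assumed fallback for model names tiktoken does not recognize.
            encoding = tiktoken.get_encoding("cl100k_base")
        encode = encoding.encode
    return len(encode(text))


def whitespace_encode(text: str) -> list:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()


def chunk_token_total(
    chunks: Iterable[str],
    encode: Optional[Callable[[str], Sequence]] = None,
) -> int:
    """Sum token counts over chunks, as an ingestion workflow might do."""
    return sum(num_tokens(chunk, encode=encode) for chunk in chunks)
```

With one shared helper, both ingestion workflows can import the same function rather than each defining its own count_tokens_for_text, so a change to the default model or the fallback behaviour only has to be made in one place.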

Refactoring:
- Remove the count_tokens_for_text function from ingestion_workflow.py in both the hatchet and simple directories.
- Replace calls to count_tokens_for_text with num_tokens from base_utils.py in ingestion_workflow.py in both the hatchet and simple directories.

Utilities:
- Add a num_tokens function to base_utils.py to handle token counting.
- Use a set for membership checks in _detect_result_type in base_utils.py.
- Simplify the logic in format_search_results_for_llm in base_utils.py by using extend to append multiple lines at once.

Miscellaneous:
- Remove unused imports and TYPE_CHECKING blocks in base_utils.py.
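The two smaller utility changes listed above (a set for membership checks, and extend for appending several lines) follow a common Python pattern. The sketch below shows that pattern in isolation; the function names, the constant, and the output format are hypothetical and do not reproduce the actual bodies of _detect_result_type or format_search_results_for_llm.

```python
# The three type names come from the review notes on this PR; the constant
# name and everything else here is illustrative.
GRAPH_RESULT_TYPES = {"entity", "relationship", "community"}


def is_graph_result(result_type: str) -> bool:
    """Membership test against a set rather than a chain of == comparisons."""
    return result_type in GRAPH_RESULT_TYPES


def format_chunk_lines(source_id: str, text: str) -> list:
    """Build output lines with one extend call instead of repeated append calls."""
    lines = []
    lines.extend(
        [
            f"Source ID [{source_id}]:",
            text,
            "",  # blank separator line between results
        ]
    )
    return lines
```

For example, is_graph_result("entity") is true while is_graph_result("chunk") is false, and format_chunk_lines("abc", "hello") returns a three-element list ending in an empty separator string.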

This description was created by ellipsis-dev for d432f2f. It will automatically update as commits are pushed.

ellipsis-dev

Important

Looks good to me! 👍

Reviewed everything up to d432f2f in 1 minute and 16 seconds.

Reviewed 315 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 9 draft comments. View those below.

1. py/core/main/orchestration/hatchet/ingestion_workflow.py:5

Draft comment:
Removed unused 'tiktoken' import since token counting now uses the shared num_tokens function.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

2. py/core/main/orchestration/hatchet/ingestion_workflow.py:31

Draft comment:
Deleted the duplicate 'count_tokens_for_text' function; relying on a centralized num_tokens improves maintainability.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

3. py/core/main/orchestration/hatchet/ingestion_workflow.py:115

Draft comment:
Replaced calls to 'count_tokens_for_text' with the shared 'num_tokens' function; ensure this covers all token-model scenarios.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

4. py/core/main/orchestration/simple/ingestion_workflow.py:5

Draft comment:
Removed the unused 'tiktoken' import and duplicate token counting function in favor of the shared num_tokens function.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

5. py/shared/utils/base_utils.py:587

Draft comment:
Simplified key conversion in convert_nonserializable_objects by checking if the key is a string before converting.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

6. py/shared/utils/base_utils.py:266

Draft comment:
Refactored tokens_count_for_message to initialize 'num_tokens' inline with tokens_per_message for brevity.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

7. py/shared/utils/base_utils.py:323

Draft comment:
Streamlined the results setter in SearchResultsCollector for clearer type checking and iteration over inputs.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

8. py/shared/utils/base_utils.py:452

Draft comment:
Refactored _detect_result_type to use a set literal for checking 'entity', 'relationship', 'community', which improves readability.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

9. py/shared/utils/base_utils.py:649

Draft comment:
Updated the default model in num_tokens from 'gpt-4.1' to 'gpt-4o' with a FIXME note due to tiktoken limitations.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_eJjaU2dsgF1uWm7H


Clean up orchestration and utils layout

d432f2f

ellipsis-dev bot reviewed May 20, 2025


NolanTrem merged commit d561443 into main May 20, 2025
41 of 42 checks passed

NolanTrem deleted the Nolan/CondenseTokenCount branch May 20, 2025 21:53
