

Collaborator

@NolanTrem NolanTrem commented May 20, 2025

Important

Refactors token counting: consolidates the logic into a single num_tokens function in base_utils.py and removes the duplicate count_tokens_for_text functions from the ingestion workflows.

  • Refactoring:
    • Remove count_tokens_for_text function from ingestion_workflow.py in both hatchet and simple directories.
    • Replace count_tokens_for_text with num_tokens from base_utils.py in ingestion_workflow.py in both hatchet and simple directories.
  • Utilities:
    • Add num_tokens function to base_utils.py to handle token counting.
    • Use set for membership checks in _detect_result_type in base_utils.py.
    • Simplify logic in format_search_results_for_llm in base_utils.py by using extend for appending multiple lines.
  • Miscellaneous:
    • Remove unused imports and TYPE_CHECKING blocks in base_utils.py.
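
For readers unfamiliar with the pattern, a shared token counter of this kind might look roughly like the sketch below. This is not the code from base_utils.py: the fallback to the cl100k_base encoding, the word-count fallback when tiktoken is unavailable, and the disallowed_special=() argument are assumptions made for illustration. Only the function name num_tokens and the 'gpt-4o' default come from this PR's discussion.

```python
def num_tokens(text: str, model: str = "gpt-4o") -> int:
    """Return a token count for `text` (illustrative sketch only)."""
    try:
        import tiktoken  # treated as optional in this sketch

        try:
            encoding = tiktoken.encoding_for_model(model)
        except KeyError:
            # Model name unknown to tiktoken: assume a general-purpose encoding.
            encoding = tiktoken.get_encoding("cl100k_base")
        # disallowed_special=() encodes special-token text as ordinary text
        # instead of raising an error.
        return len(encoding.encode(text, disallowed_special=()))
    except Exception:
        # Assumption for this sketch: if tiktoken is missing or its encoding
        # files cannot be loaded, fall back to a rough whitespace word count.
        return len(text.split())


if __name__ == "__main__":
    print(num_tokens("Consolidate token counting in one place."))
```

With a single helper like this, both ingestion workflows can import one function instead of each defining its own counter, so a change of default model or encoding happens in one place.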

This description was created by Ellipsis for d432f2f. You can customize this summary. It will automatically update as commits are pushed.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to d432f2f in 1 minute and 16 seconds. Click for details.
  • Reviewed 315 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 9 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. py/core/main/orchestration/hatchet/ingestion_workflow.py:5
  • Draft comment:
    Removed unused 'tiktoken' import since token counting now uses the shared num_tokens function.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. py/core/main/orchestration/hatchet/ingestion_workflow.py:31
  • Draft comment:
    Deleted the duplicate 'count_tokens_for_text' function; relying on a centralized num_tokens improves maintainability.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. py/core/main/orchestration/hatchet/ingestion_workflow.py:115
  • Draft comment:
    Replaced calls to 'count_tokens_for_text' with the shared 'num_tokens' function; ensure this covers all token-model scenarios.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
4. py/core/main/orchestration/simple/ingestion_workflow.py:5
  • Draft comment:
    Removed the unused 'tiktoken' import and duplicate token counting function in favor of the shared num_tokens function.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
5. py/shared/utils/base_utils.py:587
  • Draft comment:
    Simplified key conversion in convert_nonserializable_objects by checking if the key is a string before converting.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
6. py/shared/utils/base_utils.py:266
  • Draft comment:
    Refactored tokens_count_for_message to initialize 'num_tokens' inline with tokens_per_message for brevity.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
7. py/shared/utils/base_utils.py:323
  • Draft comment:
    Streamlined the results setter in SearchResultsCollector for clearer type checking and iteration over inputs.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
8. py/shared/utils/base_utils.py:452
  • Draft comment:
    Refactored _detect_result_type to use a set literal for checking 'entity', 'relationship', 'community', which improves readability.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
9. py/shared/utils/base_utils.py:649
  • Draft comment:
    Updated the default model in num_tokens from 'gpt-4.1' to 'gpt-4o' with a FIXME note due to tiktoken limitations.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
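
As an illustration of the set-literal change mentioned in draft comment 8, a membership check of that shape could be sketched as follows. The function below is hypothetical: the "result_type" key and the "graph" and "unknown" return labels are assumptions, and only the three values 'entity', 'relationship', and 'community' come from the comment itself.

```python
# The three values named in the review comment, held in a set so that
# the membership test reads as a single expression.
GRAPH_RESULT_TYPES = {"entity", "relationship", "community"}


def detect_result_type(obj: dict) -> str:
    """Classify a result dict (hypothetical stand-in for _detect_result_type)."""
    kind = obj.get("result_type")  # assumed key name
    # The isinstance guard avoids a TypeError if the value is unhashable.
    if isinstance(kind, str) and kind in GRAPH_RESULT_TYPES:
        return "graph"
    return "unknown"


if __name__ == "__main__":
    print(detect_result_type({"result_type": "entity"}))
    print(detect_result_type({"result_type": "chunk"}))
```

For three items the performance difference between a list and a set is negligible; the gain is mainly that the set states the intent (membership, not order).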

Workflow ID: wflow_eJjaU2dsgF1uWm7H

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@NolanTrem NolanTrem merged commit d561443 into main May 20, 2025
41 of 42 checks passed
@NolanTrem NolanTrem deleted the Nolan/CondenseTokenCount branch May 20, 2025 21:53
