Fix: improve file filtering, fix yaml parsing, add openrouter support. #71
utils/crawl_local_files.py
Outdated
relpath = os.path.relpath(filepath, directory)
else:
relpath = filepath
rel_root = os.path.relpath(root, directory) if use_relative_paths else root
Shouldn't this already have been addressed by another PR (#67)?
Could you check whether that PR already made this optimization?
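For reference, folder-level exclusion during a directory walk is usually done by pruning `dirs` in place, so excluded folders are never descended into. The sketch below only illustrates that technique; it is not the code from this PR or from #67. The function name `crawl_with_dir_pruning` and the `exclude_patterns` parameter are hypothetical, and it always returns relative paths rather than honoring `use_relative_paths`.

```python
import fnmatch
import os


def crawl_with_dir_pruning(directory, exclude_patterns=()):
    """Return relative file paths under `directory`, skipping excluded folders.

    Hypothetical helper: a pattern is matched against both the bare name
    and the path relative to `directory`.
    """
    def excluded(name, rel_path):
        return any(
            fnmatch.fnmatch(name, pattern) or fnmatch.fnmatch(rel_path, pattern)
            for pattern in exclude_patterns
        )

    found = []
    for root, dirs, files in os.walk(directory):
        rel_root = os.path.relpath(root, directory)

        def rel(name):
            # Entries directly under `directory` get no "./" prefix.
            return name if rel_root == "." else os.path.join(rel_root, name)

        # Slice assignment prunes the walk: a top-down os.walk only
        # descends into the directories left in `dirs`.
        dirs[:] = [d for d in dirs if not excluded(d, rel(d))]

        for name in files:
            if not excluded(name, rel(name)):
                found.append(rel(name))
    return sorted(found)
```

Pruning at the folder level avoids listing every file inside large excluded trees (for example a dependency folder), which is where most of the saving on big repos would come from.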
utils/fix_yaml.py
Outdated
@@ -0,0 +1,20 @@
import re

def add_indentation(text):
I would prefer not to fix indentation as a post-processing step.
If an LLM fails to output valid YAML, there could be many causes. I'm a bit worried that patching a problematic output will still give a result that's off in content. I would just let the LLM retry.
I agree that fixing YAML indentation is only a temporary solution. The need actually comes from running on large repos, so a new issue might be a more suitable place to address it.
FYI:
I tested it on some internal projects with deepseek, gemini, qwen, and claude.
When the total is below 100k tokens, things usually work well.
At ~700k tokens, gemini-2.5 sometimes succeeds after a retry; gemini-2.0 fails nearly every time (at the same point, with the indentation problem).
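To make the "just let the LLM retry" option concrete, here is a minimal sketch of a retry loop that regenerates the output instead of patching it. It is not this project's implementation: `call_with_retry`, `generate`, and `parse` are hypothetical names. In practice `parse` could wrap PyYAML's `yaml.safe_load` plus any content checks, but that wiring is an assumption and is left out so the sketch has no third-party dependency.

```python
import time


def call_with_retry(generate, parse, max_retries=3, wait_seconds=0.0):
    """Call `generate()` until `parse` accepts its output, or give up.

    `generate` returns raw model text; `parse` returns the parsed value
    or raises on malformed output. Both are supplied by the caller.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        raw = generate()
        try:
            # Any parse or validation failure triggers a fresh generation
            # rather than an attempt to repair the broken text.
            return parse(raw)
        except Exception as err:
            last_error = err
            if attempt < max_retries and wait_seconds:
                time.sleep(wait_seconds)
    raise RuntimeError(
        f"no valid output after {max_retries} attempts"
    ) from last_error
```

The trade-off raised in this thread still applies: at ~700k tokens each retry is expensive, and a model that fails at the same point every time will not be rescued by retrying alone.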
nodes.py
Outdated
@@ -117,7 +121,7 @@ def exec(self, prep_res):
{context}

{language_instruction}Analyze the codebase context.
Identify the top 5-10 core most important abstractions to help those new to the codebase.
Identify the top 5-20 core most important abstractions to help those new to the codebase.
Do you find that 20 is a good change? I tuned this before and found that abstractions beyond 10 become less interesting and a bit too low-level to read. Maybe make this a tunable parameter?
sure
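One way the tunable version could look, as a rough sketch only: the upper bound becomes a parameter that defaults to the previous value of 10. The function name `build_abstraction_prompt` and the `min_abstractions` / `max_abstractions` parameters are hypothetical, and the prompt text is limited to the lines visible in the diff above.

```python
def build_abstraction_prompt(context, language_instruction="",
                             min_abstractions=5, max_abstractions=10):
    """Build the abstraction-identification prompt with a tunable range.

    Hypothetical helper: only the prompt lines shown in the diff are
    reproduced; the rest of the real prompt is omitted.
    """
    if not 1 <= min_abstractions <= max_abstractions:
        raise ValueError("expected 1 <= min_abstractions <= max_abstractions")
    return (
        f"{context}\n\n"
        f"{language_instruction}Analyze the codebase context.\n"
        f"Identify the top {min_abstractions}-{max_abstractions} core most "
        "important abstractions to help those new to the codebase."
    )
```

Keeping 10 as the default preserves the current behavior, while large repos can opt in to a higher limit such as `max_abstractions=20`.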
Love it! Could you check whether it has already been implemented by a previous PR?
I'd rather just let the LLM retry. In my experience, when an LLM messes up the indentation, it usually also messes up the content a bit.
Could you make it tunable?
Love them!
now addressed in #74
crawl_local_files.py: with folder-level exclusion, partially solves "Can it support large repo?" (#23)
fix_yaml.py: utility for YAML indentation fixes
nodes.py: to support up to 20 core abstractions ("Can it support large repo?", #23)