adding support for thinking models #882

AmineF0 · 2025-05-12T13:05:40Z

Updated VannaBase.extract_sql to strip out <reasoning>, <reason>, <thoughts>, and <think> tags (used by models like Alibaba's Qwen3 to wrap intermediate reasoning) before parsing SQL. Without this, the extractor sometimes misread the reasoning as part of the query.

What Changed:

Tag Removal: We loop through known reasoning tags and use non-greedy regex (e.g. …) to clean them out.

Whitespace Cleanup: Removes extra blank lines to avoid confusing the parser.

Safe Defaults: If no tags are present (like with normal LLMs), nothing changes.
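The three steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper name `strip_reasoning_tags` and the blank-line regex are assumptions, while the tag list and the tag-removal pattern follow the snippet quoted in the review.

```python
import re

# Tag names as listed in the PR's reasoning_tags.
REASONING_TAGS = ["reasoning", "reason", "thoughts", "think"]


def strip_reasoning_tags(llm_response: str) -> str:
    """Hypothetical helper: drop reasoning blocks, then tidy blank lines."""
    cleaned = llm_response
    for tag_name in REASONING_TAGS:
        # Non-greedy match so each block is removed on its own.
        pattern = rf"<{re.escape(tag_name)}>(.*?)</{re.escape(tag_name)}>"
        cleaned = re.sub(pattern, "", cleaned, flags=re.DOTALL)
    # Collapse runs of blank lines left behind by removed blocks
    # (assumed approach; the PR's exact cleanup is not shown).
    cleaned = re.sub(r"\n\s*\n+", "\n", cleaned)
    return cleaned.strip()


response = (
    "<think>\nThe user wants totals.\nA SUM is needed.\n</think>\n\n"
    "SELECT SUM(amount) FROM orders;"
)
print(strip_reasoning_tags(response))
```

A response without any of these tags only has its blank lines and surrounding whitespace normalised, which matches the "safe defaults" claim for the tag-removal step.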

Why It Matters:
Qwen3 and similar LLMs output thoughts in XML-style tags. We now clean those up so only the final SQL remains, ensuring compatibility without breaking existing behavior.

Edge Cases:
Only known tags are removed; new ones won't be stripped unless they are added to the list.

Assumes well-formed tags; malformed ones may slip through.

Regex is cautious to avoid removing too much.

Minimal performance cost.
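To make these edge cases concrete, here is a small sketch of how the quoted pattern behaves. The function name `strip_tags` and the `<scratchpad>` tag are made up for illustration and do not come from the PR.

```python
import re


def strip_tags(text, tags=("reasoning", "reason", "thoughts", "think")):
    """Illustrative stand-in for the PR's tag-removal loop."""
    for tag_name in tags:
        pattern = rf"<{re.escape(tag_name)}>(.*?)</{re.escape(tag_name)}>"
        text = re.sub(pattern, "", text, flags=re.DOTALL)
    return text


# Unknown tag (hypothetical name): left untouched.
unknown = "<scratchpad>notes</scratchpad>SELECT 1;"

# Malformed input with no closing tag: nothing matches, so it slips through.
unclosed = "<think>notes SELECT 1;"

# Two blocks around the SQL: the non-greedy match keeps the query.
two_blocks = "<think>a</think>SELECT 1;<think>b</think>"

# A greedy variant would swallow the query between the two blocks.
greedy_result = re.sub(r"<think>(.*)</think>", "", two_blocks, flags=re.DOTALL)
```

The last comparison is the reason the non-greedy quantifier matters: with `.*` instead of `.*?`, everything between the first opening tag and the last closing tag would be deleted.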

llamapreview

Auto Pull Request Review from LlamaPReview

1. Overview

1.1 Core Changes

Primary purpose and scope: Enhance SQL extraction to handle LLM responses with XML-style reasoning tags
Key components modified: VannaBase.extract_sql method
Cross-component impacts: Affects all SQL parsing logic using cleaned responses
Business value alignment: Improves compatibility with modern LLMs while maintaining backward compatibility

1.2 Technical Architecture

System design modifications: Added preprocessing layer for response sanitization
Integration points impact: Changes input processing for all SQL extraction patterns
Dependency changes: Introduces new regex patterns without new dependencies

2. Critical Findings

2.1 Must Fix (P0🔴)

Issue: Incorrect variable usage in WITH clause extraction

Analysis Confidence: High
Impact: Fails to apply cleaning to WITH-style queries, breaking core functionality
Resolution: Change llm_response to cleaned_response in WITH clause regex

Issue: Missing test coverage

Analysis Confidence: High
Impact: Risk of undetected regressions and edge case failures
Resolution: Add comprehensive unit tests for tag removal scenarios
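As a starting point for the requested tests, something along these lines could work. It is only a sketch: `strip_reasoning` is a local stand-in for the preprocessing step, since the real tests would presumably call VannaBase.extract_sql through whatever fixture the project uses.

```python
import re
import unittest


def strip_reasoning(text):
    """Local stand-in for the PR's tag-removal loop (not the real method)."""
    for tag_name in ["reasoning", "reason", "thoughts", "think"]:
        pattern = rf"<{re.escape(tag_name)}>(.*?)</{re.escape(tag_name)}>"
        text = re.sub(pattern, "", text, flags=re.DOTALL)
    return text.strip()


class TagRemovalTests(unittest.TestCase):
    def test_multiline_think_block_is_removed(self):
        response = "<think>\nline one\nline two\n</think>\nSELECT 1;"
        self.assertEqual(strip_reasoning(response), "SELECT 1;")

    def test_response_without_tags_is_unchanged(self):
        self.assertEqual(strip_reasoning("SELECT 1;"), "SELECT 1;")

    def test_mixed_tags_are_all_removed(self):
        response = "<reasoning>a</reasoning><thoughts>b</thoughts>SELECT 1;"
        self.assertEqual(strip_reasoning(response), "SELECT 1;")

    def test_unclosed_tag_is_left_in_place(self):
        response = "<think>never closed SELECT 1;"
        self.assertEqual(strip_reasoning(response), response)
```

Saved as a test module, these cases can be run with `python -m unittest`.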

2.2 Should Fix (P1🟡)

Issue: Potential false positives in SQL comments/strings

Analysis Confidence: Medium
Impact: Could corrupt valid SQL containing tag-like patterns
Suggested Solution: Add test cases and document limitation
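A short example of the false positive described here, using a made-up query against a hypothetical `logs` table whose string literals happen to contain both an opening and a closing tag:

```python
import re

# Same shape as the PR's pattern, shown for a single tag.
pattern = r"<think>(.*?)</think>"

# Valid SQL that searches for the literal tag text (hypothetical table).
sql = "SELECT * FROM logs WHERE body LIKE '%<think>%' OR body LIKE '%</think>%';"

# The regex has no notion of SQL strings, so it removes the span in between.
corrupted = re.sub(pattern, "", sql, flags=re.DOTALL)
print(corrupted)
```

The query still parses after the substitution, but its meaning has changed, which is why this limitation is worth a test case and a note in the documentation.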

2.3 Consider (P2🟢)

Area: Configurable tag list

Analysis Confidence: Medium
Improvement Opportunity: Future-proof against new LLM tag formats

Area: Code documentation

Analysis Confidence: High
Improvement Opportunity: Better maintainability through inline comments
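One possible shape for the configurable tag list is sketched below. The `reasoning_tags` parameter, the `DEFAULT_REASONING_TAGS` constant, and the `<plan>` tag are all assumptions for illustration; the PR does not define them.

```python
import re
from typing import Iterable, Optional

# Defaults mirror the list hard-coded in the PR.
DEFAULT_REASONING_TAGS = ("reasoning", "reason", "thoughts", "think")


def strip_reasoning(
    llm_response: str,
    reasoning_tags: Optional[Iterable[str]] = None,
) -> str:
    """Hypothetical variant that lets callers supply their own tag names."""
    tags = DEFAULT_REASONING_TAGS if reasoning_tags is None else tuple(reasoning_tags)
    cleaned = llm_response
    for tag_name in tags:
        # re.escape keeps unusual tag names from being read as regex syntax.
        pattern = rf"<{re.escape(tag_name)}>(.*?)</{re.escape(tag_name)}>"
        cleaned = re.sub(pattern, "", cleaned, flags=re.DOTALL)
    return cleaned


# "plan" is an invented tag name standing in for a future model's format.
custom = strip_reasoning(
    "<plan>steps</plan>SELECT 1;",
    reasoning_tags=[*DEFAULT_REASONING_TAGS, "plan"],
)
```

Keeping the default equal to the current list means existing callers would see no change in behavior.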

2.4 Summary of Action Items

Fix WITH clause variable (P0 - immediate)
Add test suite (P0 - before merge)
Document tag-in-SQL limitation (P1 - post-merge)
Consider tag configuration (P2 - future)

3. Technical Analysis

3.1 Code Logic Analysis

📁 src/vanna/base/base.py - VannaBase.extract_sql

Submitted PR Code:

    # WITH clause extraction
    sqls = re.findall(r"\bWITH\b .*?;", llm_response, re.DOTALL | re.IGNORECASE)

Analysis:
- Uses original uncleaned response for WITH queries
- Defeats purpose of preprocessing step
- Creates inconsistent parsing behavior

LlamaPReview Suggested Improvements:

    sqls = re.findall(r"\bWITH\b .*?;", cleaned_response, re.DOTALL | re.IGNORECASE)

Improvement rationale:
- Ensures consistent preprocessing
- Maintains feature integrity
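The effect of the variable mix-up can be reproduced with a made-up response in which the reasoning text itself contains the word WITH. The sample strings are illustrative only; the WITH regex is the one quoted above.

```python
import re

llm_response = (
    "<think>Start WITH a CTE; then aggregate.</think>\n"
    "WITH totals AS (SELECT 1 AS n) SELECT n FROM totals;"
)

# Preprocessing step, shown for a single tag.
cleaned_response = re.sub(
    r"<think>(.*?)</think>", "", llm_response, flags=re.DOTALL
)

with_pattern = r"\bWITH\b .*?;"

# Submitted code: searches the raw response, so reasoning text leaks in.
from_raw = re.findall(with_pattern, llm_response, re.DOTALL | re.IGNORECASE)

# Suggested fix: searches the cleaned response.
from_cleaned = re.findall(with_pattern, cleaned_response, re.DOTALL | re.IGNORECASE)

print(from_raw[0])      # fragment taken from the reasoning block
print(from_cleaned[0])  # the actual CTE query
```

On the raw response the first match is a sentence fragment from the reasoning block rather than SQL, whereas the cleaned response yields only the real query.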

📁 src/vanna/base/base.py - Tag Removal Logic

Submitted PR Code:

    reasoning_tags = ["reasoning", "reason", "thoughts", "think"]
    for tag_name in reasoning_tags:
        pattern = rf"<{re.escape(tag_name)}>(.*?)</{re.escape(tag_name)}>"
        cleaned_response = re.sub(pattern, "", cleaned_response, flags=re.DOTALL)

Analysis:
- Effective non-greedy pattern with proper escaping
- Lacks context awareness for SQL structure
- No protection against comment/string matches

Suggested Improvements:

    # Remove reasoning sections (may affect SQL comments/strings)
    for tag_name in reasoning_tags:
        pattern = rf"<{re.escape(tag_name)}>(.*?)</{re.escape(tag_name)}>"
        cleaned_response = re.sub(pattern, "", cleaned_response, flags=re.DOTALL)

3.2 Key Quality Aspects

Testing strategy: Requires validation of multiline tags and mixed content
Documentation needs: Public API docs should mention tag-stripping behavior

4. Overall Evaluation

Technical assessment: Effective solution with critical implementation gaps
Business impact: High value for LLM compatibility once fixed
Risk evaluation: Medium risk without test coverage
Notable positives: Good regex safety practices with re.escape
Implementation quality: Requires critical fixes before production-ready
Final recommendation: Request Changes (Address P0 issues first)

💡 LlamaPReview Community
Have feedback on this AI Code review tool? Join our GitHub Discussions to share your thoughts and help shape the future of LlamaPReview.

adding support for thinking models

4c7ddb9

llamapreview bot reviewed May 12, 2025
