sql match bug fix #921

1023097618 · 2025-07-05T06:03:11Z

When the LLM response contains SQL like this:

    SELECT
        ig.investor_group_id,
        COUNT(DISTINCT CASE WHEN ch.is_major_shareholder = 1 THEN ch.company_id END) AS num_major_shareholder_banks,
        COUNT(DISTINCT CASE WHEN ch.shares_held = (
            SELECT MAX(ch2.shares_held)
            FROM c_holdings ch2
            WHERE ch2.company_id = ch.company_id
        ) AND ch.is_major_shareholder = 1 THEN ch.company_id END) AS num_controlled_banks
    FROM
        c_investors ig
    JOIN
        c_holdings ch ON ig.investor_id = ch.investor_id
    WHERE
        ig.investor_group_id IS NOT NULL
    GROUP BY
        ig.investor_group_id
    HAVING
        num_major_shareholder_banks > 2
        OR num_controlled_banks > 1;

this line

    sqls = re.findall(r"\bSELECT\b .*?;", llm_response, re.DOTALL | re.IGNORECASE)

matches only the text that starts at the subquery's SELECT:

    SELECT MAX(ch2.shares_held)
            FROM c_holdings ch2
            WHERE ch2.company_id = ch.company_id
        ) AND ch.is_major_shareholder = 1 THEN ch.company_id END) AS num_controlled_banks
    FROM
        c_investors ig
    JOIN
        c_holdings ch ON ig.investor_id = ch.investor_id
    WHERE
        ig.investor_group_id IS NOT NULL
    GROUP BY
        ig.investor_group_id
    HAVING
        num_major_shareholder_banks > 2
        OR num_controlled_banks > 1;

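The bug occurs because the outer SELECT is followed directly by \n, while the pattern requires a literal space after the trailing \b. The \b itself matches at that position, but the space does not, so the first match starts at the inner SELECT, which is followed by a space. The regular expression should be changed to:

    sqls = re.findall(r"\bSELECT.*?;", llm_response, re.DOTALL | re.IGNORECASE)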
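The difference between the two patterns can be reproduced in isolation. The following is a minimal sketch, assuming a shortened version of the query above as the LLM response; the variable names are illustrative and not taken from base.py.

```python
import re

# Shortened version of the query from the description: the outer SELECT is
# followed directly by a newline, the inner SELECT by a space.
llm_response = (
    "SELECT\n"
    "    ig.investor_group_id,\n"
    "    COUNT(DISTINCT CASE WHEN ch.shares_held = (\n"
    "        SELECT MAX(ch2.shares_held)\n"
    "        FROM c_holdings ch2\n"
    "        WHERE ch2.company_id = ch.company_id\n"
    "    ) THEN ch.company_id END) AS num_controlled_banks\n"
    "FROM\n"
    "    c_investors ig\n"
    "JOIN\n"
    "    c_holdings ch ON ig.investor_id = ch.investor_id\n"
    "GROUP BY\n"
    "    ig.investor_group_id;"
)

FLAGS = re.DOTALL | re.IGNORECASE
old_pattern = r"\bSELECT\b .*?;"  # requires a literal space after SELECT
new_pattern = r"\bSELECT.*?;"     # accepts any character after SELECT

old_matches = re.findall(old_pattern, llm_response, FLAGS)
new_matches = re.findall(new_pattern, llm_response, FLAGS)

# The old pattern skips the outer SELECT and starts at the subquery.
print(old_matches[0].startswith("SELECT MAX(ch2.shares_held)"))
# The new pattern returns the whole statement.
print(new_matches[0] == llm_response)
```

Both print statements should report True: the old pattern yields a single match beginning at the subquery, and the new pattern yields the full statement.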

llamapreview

Auto Pull Request Review from LlamaPReview

1. Overview

1.1 Core Changes

Primary purpose and scope: Fixes SQL extraction bug when handling nested subqueries by modifying regex pattern
Key components modified: SQL extraction logic in base.py
Cross-component impacts: Improves SQL extraction for responses in which SELECT is followed by a newline rather than a space
Business value alignment: Maintains core functionality reliability for text-to-SQL generation

1.2 Technical Architecture

System design modifications: Improved regex pattern for SQL query detection
Component interaction changes: No architectural changes
Integration points impact: Maintains existing integration contracts
Dependency changes and implications: No new dependencies introduced

2. Critical Findings

2.1 Must Fix (P0🔴)

Issue: Current regex fails to extract the complete SQL query when the outer SELECT is followed by a newline and a nested subquery's SELECT is followed by a space

Analysis Confidence: High
Impact: Causes incorrect SQL parsing, leading to potential query execution failures
Resolution: Implement the proposed regex change to r"\bSELECT.*?;"

2.2 Should Fix (P1🟡)

Issue: Regex-based approach may not handle all complex SQL edge cases

Analysis Confidence: Medium
Impact: Potential for future extraction failures with unconventional SQL syntax
Suggested Solution: Consider SQL parsing library (e.g., sqlparse) for long-term robustness
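
Two edge cases of the kind meant here can be shown with the standard library alone. This is a sketch under assumptions: the sample responses are invented for illustration, and `strict_pattern` is a hypothetical variant that keeps the trailing \b; it is not part of this PR.

```python
import re

FLAGS = re.DOTALL | re.IGNORECASE
pr_pattern = r"\bSELECT.*?;"        # pattern proposed in this PR
strict_pattern = r"\bSELECT\b.*?;"  # hypothetical variant, not part of the PR

# Case 1: a prose word that begins with "select" precedes the query.
response_1 = "Selected rows are shown below; SELECT\n    id\nFROM t;"
pr_1 = re.findall(pr_pattern, response_1, FLAGS)
strict_1 = re.findall(strict_pattern, response_1, FLAGS)
print(pr_1)      # includes the prose fragment as a spurious match
print(strict_1)  # only the real statement

# Case 2: a semicolon inside a string literal ends the lazy match early.
# Neither pattern can tell that the first ';' is inside quotes.
response_2 = "SELECT name FROM t WHERE note = 'a;b';"
pr_2 = re.findall(pr_pattern, response_2, FLAGS)
strict_2 = re.findall(strict_pattern, response_2, FLAGS)
print(pr_2)
print(strict_2)
```

Keeping the trailing \b avoids the first problem, but the second is inherent to matching up to the first semicolon, which is where a parsing library would help.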

2.3 Consider (P2🟢)

Area: Enhanced logging for SQL extraction process

Analysis Confidence: Medium
Improvement Opportunity: Improved debugging capabilities for future extraction issues
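
One possible shape for such logging, as a rough sketch: `extract_select_statements` is a hypothetical helper written for this example, not a function in base.py, and the log levels and messages are assumptions.

```python
import logging
import re
from typing import List

logger = logging.getLogger(__name__)

SELECT_PATTERN = re.compile(r"\bSELECT.*?;", re.DOTALL | re.IGNORECASE)


def extract_select_statements(llm_response: str) -> List[str]:
    """Hypothetical helper: return SELECT statements found in an LLM response."""
    sqls = SELECT_PATTERN.findall(llm_response)
    if not sqls:
        # A warning makes silent extraction failures visible in the logs.
        logger.warning(
            "No SELECT statement found in response of %d characters",
            len(llm_response),
        )
    else:
        # Log only a short prefix to avoid dumping whole queries.
        logger.debug(
            "Extracted %d SELECT statement(s); first starts with %r",
            len(sqls),
            sqls[0][:40],
        )
    return sqls
```

Calling `extract_select_statements("no sql here")` returns an empty list and emits the warning, while a response containing a terminated SELECT statement returns it unchanged.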

2.4 Summary of Action Items

Implement regex fix immediately (P0)
Evaluate SQL parsing libraries for future-proofing (P1)
Consider adding extraction failure logging (P2)

3. Technical Analysis

3.1 Code Logic Analysis

📁 src/vanna/base/base.py - SQL extraction logic

Submitted PR Code:

    sqls = re.findall(r"\bSELECT.*?;", llm_response, re.DOTALL | re.IGNORECASE)

Analysis:
- Fixes the case where the outer query was dropped because its SELECT is followed by a newline rather than a space
- Handles cases where SELECT is immediately followed by a parenthesis
- Maintains existing case-insensitive matching
- Preserves core extraction functionality for other patterns

Improvement rationale:
- Maintains backward compatibility with existing patterns
- Minimal-risk change with high reliability impact

3.2 Key Quality Aspects

Testing strategy: Recommend adding test cases for nested subqueries
Documentation: Suggest documenting regex limitations in code comments
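
The recommended test cases might look like the following sketch. It exercises the regex directly rather than the real extraction method in base.py, and the class name and sample queries are assumptions made for illustration.

```python
import re
import unittest

# Limitation worth documenting next to the pattern: the match ends at the
# first ';', even if that semicolon sits inside a string literal.
PATTERN = r"\bSELECT.*?;"
FLAGS = re.DOTALL | re.IGNORECASE


class SelectExtractionTests(unittest.TestCase):
    def test_newline_after_outer_select_with_subquery(self):
        response = (
            "SELECT\n"
            "    a,\n"
            "    (SELECT MAX(b) FROM t2) AS max_b\n"
            "FROM t1;"
        )
        self.assertEqual(re.findall(PATTERN, response, FLAGS), [response])

    def test_space_after_select_still_matches(self):
        response = "SELECT a FROM t1;"
        self.assertEqual(re.findall(PATTERN, response, FLAGS), [response])

    def test_lowercase_select(self):
        response = "select\n    a\nfrom t1;"
        self.assertEqual(re.findall(PATTERN, response, FLAGS), [response])
```

Saved as a test module, this can be run with `python -m unittest`.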

4. Overall Evaluation

Technical assessment: Effective solution for critical extraction bug
Business impact: Maintains core functionality reliability
Risk evaluation: Low-risk, high-value fix
Notable positive aspects: Simple, focused solution to specific issue
Implementation quality: Clean, minimal change with clear purpose
Final recommendation: Approve ✅ with suggestion for future robustness improvements


sql match bug fix

173ea97

llamapreview bot reviewed Jul 5, 2025
