grahamplata (Member) commented May 20, 2025

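Add SQL model validation in the form of tests. Each test is a SQL query that SHOULD NOT return any rows: if it returns rows, the test fails. A test can be written either as a full sql query or as a shorter assert condition.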

Example table
SELECT * FROM range(5) results in a range column containing 0,1,2,3,4

Example model.yaml
This example contains a mixture of both sql and assert syntax

# model.yaml
type: model
sql: SELECT * FROM range(5)

tests:
  # Test that all values are in valid range
  - name: Valid Range
    assert: range >= 0 AND range <= 4

  # Test that all values are in valid range (sql)
  - name: Valid Range SQL
    sql: SELECT * FROM model WHERE range < 0 OR range > 4

  # Test row count is exactly 5
  - name: Exact Row Count
    sql: SELECT 'Wrong row count' as error WHERE (SELECT COUNT(*) FROM model) != 5

  # Test no null values exist
  - name: No Nulls
    assert: range IS NOT NULL

  # Test all values are non-negative
  - name: Non-negative Values
    assert: range >= 0

  # Test maximum value doesn't exceed 4
  - name: No Values Above 4
    assert: range <= 4

  # Test using BETWEEN syntax
  - name: Range Between
    assert: range BETWEEN 0 AND 4

  # Test that specific values exist
  - name: Value 0 Exists
    sql: SELECT 'Value 0 missing' WHERE (SELECT COUNT(*) FROM model WHERE range = 0) = 0

  - name: Value 4 Exists
    sql: SELECT 'Value 4 missing' WHERE (SELECT COUNT(*) FROM model WHERE range = 4) = 0

  # Test no duplicates (each value appears exactly once)
  - name: No Duplicates
    sql: SELECT range, COUNT(*) as count FROM model GROUP BY range HAVING COUNT(*) > 1

  # Test arithmetic properties
  - name: Sum Check
    sql: SELECT 'Sum should be 10' WHERE (SELECT SUM(range) FROM model) != 10

  - name: Average Check
    sql: SELECT 'Average should be 2' WHERE (SELECT AVG(range) FROM model) != 2.0

  # Test min/max values
  - name: Min Value Check
    sql: SELECT 'Min should be 0' WHERE (SELECT MIN(range) FROM model) != 0

  - name: Max Value Check
    sql: SELECT 'Max should be 4' WHERE (SELECT MAX(range) FROM model) != 4

  # Test even numbers exist
  - name: Even Numbers Present
    sql: SELECT 'No even numbers' WHERE (SELECT COUNT(*) FROM model WHERE range % 2 = 0) = 0

  # Test specific range exclusions
  - name: No Negative Values
    assert: range >= 0

  - name: No Large Values
    assert: range < 100

  # Test using IN clause
  - name: Valid Values Only
    assert: range IN (0, 1, 2, 3, 4)

  # Test data completeness
  - name: All Expected Values Present
    sql: |
      SELECT missing_value FROM (
        VALUES (0), (1), (2), (3), (4)
      ) AS expected(missing_value)
      WHERE missing_value NOT IN (SELECT range FROM model)
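
The pass/fail rule used by the tests above can be sketched as follows. This is a minimal illustration, not the implementation in this PR: it uses Python's sqlite3 module in place of the model's actual database, the helper name failing_rows is hypothetical, and it assumes an assert condition is checked by selecting the rows that violate it (inferred from the "Valid Range" and "Valid Range SQL" pair, which express the same check).

```python
import sqlite3


def failing_rows(conn, test):
    """Return the rows that make a test fail; an empty list means it passed."""
    if "assert" in test:
        # Assumption: an assert is checked by selecting the rows that violate it.
        query = f"SELECT * FROM model WHERE NOT ({test['assert']})"
    else:
        # A sql test is run as written; any returned row is a failure.
        query = test["sql"]
    return conn.execute(query).fetchall()


# Stand-in for the example model: one "range" column holding 0..4.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE model ("range" INTEGER)')
conn.executemany("INSERT INTO model VALUES (?)", [(i,) for i in range(5)])

tests = [
    {"name": "Valid Range", "assert": '"range" >= 0 AND "range" <= 4'},
    {"name": "Valid Range SQL", "sql": 'SELECT * FROM model WHERE "range" < 0 OR "range" > 4'},
    {"name": "Below Three", "assert": '"range" < 3'},  # deliberately failing
]
results = {t["name"]: failing_rows(conn, t) for t in tests}
```

In this sketch the first two tests return no rows and pass, while the deliberately failing "Below Three" test returns the rows holding 3 and 4. Rows where the condition evaluates to NULL are not returned by NOT (...), so a real implementation would need to decide how to treat them.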

Checklist:

  • Covered by tests
  • Ran it and it works as intended
  • Reviewed the diff before requesting a review
  • Checked for unhandled edge cases
  • Linked the issues it closes PLAT-3
  • Checked if the docs need to be updated - I plan to update these once we collect feedback
  • Intend to cherry-pick into the release branch
  • I'm proud of this work!

avaitla commented May 23, 2025

If a test fails, the resulting table view is empty. This is painful if you want an iterative write/test/fix workflow, where you first write a SQL model, then add tests to it, and fix the model whenever a test fails until it passes. When the data goes blank because a test failed, you can't iterate effectively, because you can no longer see what's going on.

[Image: Screenshot 2025-05-23 at 2 34 16 PM]

Ideally the tests could show a warning at the bottom listing which ones failed, but still allow the model to exist. This approach lets people incrementally adopt testing into their workflow if they're not used to it. Otherwise, the hard block on a failed assert makes the feature much harder for an end user to adopt and ease into.

avaitla commented May 23, 2025

Longer term, it would be nice if the SQL test could also display what the failure was (the returned rows can include concatenated strings describing what broke, to help us debug further). It would also help to surface failures on end-user explore dashboards. For instance, if you're in finance looking at your finance dashboard and the underlying model has test failures, it is good to know that the data you're looking at failed some integrity tests.
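
One way the returned rows could be turned into a failure message is sketched below. This only illustrates the idea and is not how the PR reports failures: it uses Python's sqlite3 module as a stand-in database, and the function describe_failure, the table columns, and the message format are all hypothetical.

```python
import sqlite3


def describe_failure(conn, name, sql, max_rows=3):
    """Run a test query; return None on pass, else a message with sample rows."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchmany(max_rows)  # cap the sample so messages stay short
    if not rows:
        return None
    samples = "; ".join(
        ", ".join(f"{col}={val!r}" for col, val in zip(columns, row)) for row in rows
    )
    return f"test {name!r} failed: {samples}"


# Hypothetical model with one row that violates the test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO model VALUES (?, ?)", [(1, 10.0), (2, -5.0), (3, 7.5)])

# The test query concatenates a readable explanation into the returned row.
message = describe_failure(
    conn,
    "Non-negative Amounts",
    "SELECT 'negative amount for id ' || id AS error FROM model WHERE amount < 0",
)
passing = describe_failure(conn, "No Huge Amounts", "SELECT * FROM model WHERE amount > 100")
```

Here the failing test yields a message naming the offending id, and the passing test yields None.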

grahamplata changed the title from "Add SQL model validation for full and incremental models" to "[ENG-620] - Add SQL model validation for full and incremental models" on May 30, 2025
grahamplata self-assigned this on May 30, 2025

avaitla commented Jun 2, 2025

Anything left to do here?

grahamplata changed the title from "[ENG-620] - Add SQL model validation for full and incremental models" to "Add SQL model validation for full and incremental models" on Jun 5, 2025

avaitla commented Jun 10, 2025

Wanted to bump here @begelundmuller

begelundmuller (Contributor) commented Jun 12, 2025

> Wanted to bump here @begelundmuller

@avaitla We've run into some tricky edge cases around how tests interact with model execution that are taking a while to get right. We're hoping to get this shipped soon!

avaitla commented Jun 24, 2025

I like the mix of sql and assert: it makes the simple cases easier but still gives the power of flexible queries. Is there a simple form of assert for saying that "user_id" is unique, or that ("user_id", "date") is a unique composite? That's another common case.

Also curious, when a few tests fail, where they will appear and what the message says. Will the dashboard viewer see it somewhere, so they know the dashboard they're looking at has errors in the underlying model?

@grahamplata (Member Author)

> I like the mix of sql and assert: it makes the simple cases easier but still gives the power of flexible queries. Is there a simple form of assert for saying that "user_id" is unique, or that ("user_id", "date") is a unique composite? That's another common case.
>
> Also curious, when a few tests fail, where they will appear and what the message says. Will the dashboard viewer see it somewhere, so they know the dashboard they're looking at has errors in the underlying model?

Initially it will be surfaced through the existing "error handling popover" that appears in the lower third of the table results. I will be setting up some time with the frontend folks. Happy to capture your thoughts.

avaitla commented Jun 25, 2025

If an error is shown for tests, will the end dashboard still be up for end users? Ideally I imagine it could be.

Would this be the best way to encode a unique column test?

SELECT 1 FROM table GROUP BY uniq_key HAVING COUNT(*) > 1

This is a good reference for some similar tests: https://cloud.google.com/bigquery/docs/reference/standard-sql/debugging-statements
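
For the single-column and composite cases raised in this thread, the GROUP BY / HAVING pattern can be sketched as follows. This is only an illustration under stated assumptions: it runs against Python's sqlite3 module rather than the model's actual database, the helper duplicate_key_query is hypothetical, and the table contents are made up. The generated query text is what would go into a test's sql field.

```python
import sqlite3


def duplicate_key_query(table, key_columns):
    """Build a query returning one row per key value that occurs more than once."""
    keys = ", ".join(key_columns)
    return (
        f"SELECT {keys}, COUNT(*) AS count FROM {table} "
        f"GROUP BY {keys} HAVING COUNT(*) > 1"
    )


# Made-up data: user_id repeats, and one (user_id, date) pair repeats.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (user_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO model VALUES (?, ?)",
    [(1, "2025-06-01"), (1, "2025-06-02"), (2, "2025-06-01"), (2, "2025-06-01")],
)

# Single-column uniqueness: both user ids appear twice, so both are reported.
single = conn.execute(duplicate_key_query("model", ["user_id"])).fetchall()

# Composite uniqueness: only the repeated (user_id, date) pair is reported.
composite = conn.execute(duplicate_key_query("model", ["user_id", "date"])).fetchall()
```

Selecting the key columns and the count, as the "No Duplicates" test in the PR description does, makes the offending keys visible in the returned rows, whereas SELECT 1 only signals that some duplicate exists.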

@grahamplata (Member Author)

> If an error is shown for tests, will the end dashboard still be up for end users? Ideally I imagine it could be.
>
> Would this be the best way to encode a unique column test?
>
> SELECT 1 FROM table GROUP BY uniq_key HAVING COUNT(*) > 1
>
> This is a good reference for some similar tests: https://cloud.google.com/bigquery/docs/reference/standard-sql/debugging-statements

The model test errors will not prevent dashboards from being available to end users. If model tests fail, the reconciler returns an error, but it does not delete or roll back the model output.
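
The behavior described above (tests report an error, but the model output stays in place) can be sketched roughly as follows. This is a simplified illustration, not the reconciler code from this PR: it uses Python's sqlite3 module as a stand-in database, and reconcile, ModelTestError, and the sample model are hypothetical names.

```python
import sqlite3


class ModelTestError(Exception):
    """Hypothetical error type listing the tests that returned rows."""


def reconcile(conn, model_sql, tests):
    # Materialize the model output first; nothing below drops or rolls it back.
    conn.execute("DROP TABLE IF EXISTS model")
    conn.execute(f"CREATE TABLE model AS {model_sql}")
    # A test fails if its query returns at least one row.
    failed = [name for name, sql in tests if conn.execute(sql).fetchone() is not None]
    if failed:
        raise ModelTestError(f"failed tests: {', '.join(failed)}")


conn = sqlite3.connect(":memory:")
tests = [("No Large Values", "SELECT * FROM model WHERE n > 2")]
try:
    reconcile(conn, "SELECT 1 AS n UNION ALL SELECT 5", tests)
    error = None
except ModelTestError as exc:
    error = str(exc)

# The model table is still queryable even though a test failed.
rows_after_failure = conn.execute("SELECT n FROM model ORDER BY n").fetchall()
```

In this sketch the error names the failing test, and the model table still contains both rows afterwards, so anything built on top of it can keep reading the data.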

grahamplata merged commit 1cf6d43 into main on Jul 4, 2025
13 checks passed
grahamplata deleted the gplata/model-tests branch on July 4, 2025 14:30
grahamplata added a commit that referenced this pull request Jul 8, 2025
* feat: add model tests functionality

* update supporting model tests

* add model tests hash

* make proto

* fix: reorder web-local and web-integration in package-lock.json

* Review

* Prevent unnecessary state updates

---------

Co-authored-by: Benjamin Egelund-Müller <[email protected]>
royendo added a commit that referenced this pull request Oct 31, 2025
- Create comprehensive model validation documentation page
- Document both 'assert' and 'sql' test syntaxes
- Include examples for common validation patterns (null checks, ranges, duplicates, etc.)
- Add complete example with various validation types
- Document test execution behavior and best practices
- Include guidance on test naming, organization, and performance
- Add sections on working with incremental and partitioned models
- Update models index to reference new validation page

Related to PR #7344

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>