ValidateLite

PyPI version | Python 3.8+ | License: MIT | Code Coverage

ValidateLite: A lightweight, scenario-driven data validation tool for modern data practitioners.

Whether you're a data scientist cleaning a messy CSV, a data engineer building robust pipelines, or a developer needing a quick check, ValidateLite provides powerful, focused commands for your use case:

  • vlite check: For quick, ad-hoc data checks. Need to verify if a column is unique or not null right now? The check command gets you an answer in seconds, zero config required.

  • vlite schema: For robust, repeatable, and automated validation. Define your data's contract in a JSON schema and let ValidateLite verify everything from data types and ranges to complex type-conversion feasibility. (Both commands are sketched just below.)
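
At a glance, a minimal sketch using only the flags demonstrated later in this README:

# Ad-hoc: one rule, one command, no config
vlite check --conn "customers.csv" --table customers --rule "unique(email)"

# Repeatable: validate a schema contract you keep in version control
vlite schema --conn "mysql://user:pass@host/db" --rules ./schemas/customers_schema.json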


Who is it for?

For the Data Scientist: Preparing Data for Analysis

You have a messy dataset (legacy_data.csv) where everything is a string. Before you can build a model, you need to clean it up and convert columns to their proper types (integer, float, date). How much work will it be?

Instead of writing complex cleaning scripts first, use vlite schema to assess the feasibility of the cleanup.

1. Define Your Target Schema (rules.json)

Create a schema file that describes the current type and the desired type.

{
  "legacy_users": {
    "rules": [
      {
        "field": "user_id",
        "type": "string",
        "desired_type": "integer",
        "required": true
      },
      {
        "field": "salary",
        "type": "string",
        "desired_type": "float(10,2)",
        "required": true
      },
      {
        "field": "bio",
        "type": "string",
        "desired_type": "string(500)",
        "required": false
      }
    ]
  }
}

2. Run the Validation

vlite schema --conn legacy_data.csv --rules rules.json

ValidateLite will generate a report telling you exactly what can and cannot be converted, saving you hours of guesswork.

FIELD VALIDATION RESULTS
========================

Field: user_id
  ✓ Field exists (string)
  ✓ Not Null constraint
  ✗ Type Conversion Validation (string → integer): 15 incompatible records found

Field: salary
  ✓ Field exists (string)
  ✗ Type Conversion Validation (string → float(10,2)): 8 incompatible records found

Field: bio
  ✓ Field exists (string)
  ✓ Length Constraint Validation (string → string(500)): PASSED

For the Data Engineer: Ensuring Data Integrity in CI/CD

You need to prevent breaking schema changes and bad data from ever reaching production. Embed ValidateLite into your CI/CD pipeline to act as a quality gate.

Example Workflow (.github/workflows/ci.yml)

This workflow automatically validates the database schema on every pull request.

jobs:
  validate-db-schema:
    name: Validate Database Schema
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install ValidateLite
        run: pip install validatelite

      - name: Run Schema Validation
        run: |
          vlite schema --conn "mysql://${{ secrets.DB_USER }}:${{ secrets.DB_PASS }}@${{ secrets.DB_HOST }}/sales" \
                       --rules ./schemas/customers_schema.json \
                       --fail-on-error

This same approach can be used to monitor data quality at every stage of your ETL/ELT pipelines, preventing "garbage in, garbage out."
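
For example, a post-load step in a nightly pipeline might look like this (a sketch: the connection string, table name, and thresholds are placeholders for your own):

# After loading the staging table, verify the batch before promoting it
vlite check --conn "mysql://etl_user:pass@warehouse/staging" --table orders \
  --rule "not_null(order_id)" \
  --rule "unique(order_id)" \
  --rule "range(amount, 0, 1000000)"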


Quick Start: Ad-Hoc Checks with check

For temporary, one-off validation needs, the check command is your best friend. You can run multiple rules on any supported data source (files or databases) directly from the command line.

1. Install (if you haven't already):

pip install validatelite

2. Run a check:

# Check for nulls and uniqueness in a CSV file
vlite check --conn "customers.csv" --table customers \
  --rule "not_null(id)" \
  --rule "unique(email)"

# Check value ranges and formats in a database table
vlite check --conn "mysql://user:pass@host/db" --table customers \
  --rule "range(age, 18, 99)" \
  --rule "enum(status, 'active', 'inactive')"

Learn More


📝 Development Blog

Follow the journey of building ValidateLite through our development blog posts.


📄 License

This project is licensed under the MIT License.