LLMSchemaCompareOperator / @task.llm_schema_compare #62734

@kaxil

Cross-system schema drift detection powered by LLM reasoning.

What

Compare schemas across different database systems (PostgreSQL, Snowflake, S3 Parquet, etc.) and identify mismatches that would break data loading. The LLM handles complex cross-system type mapping that simple equality checks miss (e.g., varchar(255) vs string, timestamp vs timestamptz).
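To illustrate the gap, here is a minimal sketch of the kind of simple equality check the operator is meant to improve on. The column names and the two schema dictionaries are made up for illustration; only the type pairs come from the description above.

```python
# Hypothetical column -> type mappings for the same logical table in two systems.
source_cols = {"id": "integer", "name": "varchar(255)", "updated_at": "timestamptz"}
target_cols = {"id": "integer", "name": "string", "updated_at": "timestamp"}


def naive_diff(left, right):
    """Return shared columns whose type strings differ (case-insensitive)."""
    shared = left.keys() & right.keys()
    return sorted(col for col in shared if left[col].lower() != right[col].lower())


# Both differences are reported the same way; the check cannot say which
# one is harmless and which one would actually break a load.
print(naive_diff(source_cols, target_cols))  # ['name', 'updated_at']
```

A string comparison flags every textual difference equally, which is why the proposal delegates the compatibility judgment to the LLM.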

Design

  • Accepts multiple data_sources (or db_conn_ids + table_names) for cross-system comparison
  • Schema introspection from each source via the appropriate hook (DbApiHook, S3Hook, etc.)
  • System prompt includes schema context from all sources with clear labeling (database name, dialect)
  • reasoning_mode=True is strongly recommended, since complex cross-system type mapping benefits from step-by-step analysis
  • context_strategy="full" for thorough analysis (includes constraints, indexes, clustering keys)
  • Structured output: list of mismatches, severity, suggested migration actions
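The labeled schema context and the structured output could look roughly like the sketch below. Everything here is an assumption for discussion, not the operator's actual interface: the names `Severity`, `SchemaMismatch`, `SchemaCompareResult`, and `build_schema_context`, the severity levels, and the prompt layout are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(str, Enum):
    # Hypothetical severity levels; the issue only says "severity".
    INFO = "info"
    WARNING = "warning"
    BREAKING = "breaking"


@dataclass
class SchemaMismatch:
    column: str
    source_type: str
    target_type: str
    severity: Severity
    suggested_action: str


@dataclass
class SchemaCompareResult:
    mismatches: list[SchemaMismatch] = field(default_factory=list)

    @property
    def has_breaking(self) -> bool:
        return any(m.severity is Severity.BREAKING for m in self.mismatches)


def build_schema_context(schemas: dict[tuple[str, str], dict[str, str]]) -> str:
    """Render one labeled section per source: (name, dialect) -> {column: type}."""
    sections = []
    for (name, dialect), columns in schemas.items():
        lines = [f"### Source: {name} (dialect: {dialect})"]
        lines += [f"- {col}: {col_type}" for col, col_type in columns.items()]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)
```

Keeping the result as typed objects rather than free text would let downstream tasks branch on severity without parsing the LLM response themselves.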

Use Cases

  • Detect breaking schema changes before ETL runs
  • Generate migration plans during maintenance windows
  • Validate schema consistency across data warehouse replicas
  • Compare source system schemas against downstream expectations
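For the first use case, a downstream task could gate the ETL run on the comparison result. The sketch below assumes the result arrives as a list of dicts with `column`, `severity`, and `detail` keys and that `"breaking"` is one of the severity values; both are assumptions, since the output format is not yet specified.

```python
def assert_no_breaking_drift(mismatches):
    """Raise if any mismatch is marked breaking; otherwise return the count."""
    breaking = [m for m in mismatches if m.get("severity") == "breaking"]
    if breaking:
        details = "; ".join(f"{m['column']}: {m['detail']}" for m in breaking)
        raise RuntimeError(f"Breaking schema drift detected: {details}")
    return len(mismatches)


# Hypothetical result: one non-breaking difference, so the ETL may proceed.
example = [{"column": "name", "severity": "warning", "detail": "varchar(255) vs string"}]
assert_no_breaking_drift(example)
```

In a DAG, the raised exception would fail the gating task and keep the load tasks downstream of it from running.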

Example

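# Proposed API. customer_s3, customer_postgres, and customer_snowflake are
# data source objects assumed to be defined elsewhere.
schema_drift = LLMSchemaCompareOperator(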
    task_id="detect_schema_drift",
    data_sources=[customer_s3, customer_postgres, customer_snowflake],
    prompt="Identify schema mismatches that would break data loading between systems",
    reasoning_mode=True,
    context_strategy="full",
    llm_conn_id="openai_default",
)

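# Decorator version: the function's return value supplies the prompt.
# check_migration_window() is a user-defined helper, not shown here.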
@task.llm_schema_compare(
    db_conn_ids=["postgres_source", "snowflake_target"],
    table_names=["customers"],
)
def check_migration_readiness():
    is_maintenance = check_migration_window()
    if is_maintenance:
        return "Compare schemas and generate migration plan for maintenance window"
    return "Compare schemas and flag breaking changes — no migrations allowed"

Dependencies

  • LLMOperator (merged)
  • Multi-datasource support (for cross-database introspection)

Phase

Phase 3

Status

In progress
