ramprasadowk/spark-s3-delta-comparator

Spark S3 vs Delta Table Comparator

This PySpark utility compares JSON-based S3 source tables with Delta Lake tables to validate data integrity.

🔍 Features

  • Compares row counts
  • Aligns records by primary key(s)
  • Identifies mismatches at column level
  • Outputs matched, mismatched, or missing tables
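
The repository's own PySpark code is not reproduced on this page, so the following is only a minimal sketch of the checks listed above, written in plain Python with lists of dicts standing in for the S3 and Delta tables. The names `compare_tables` and `index_by_key`, the result keys, and the sample rows are all hypothetical, and primary keys are assumed to be unique on each side.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]


def index_by_key(rows: List[Row], keys: Sequence[str]) -> Dict[Tuple, Row]:
    """Index rows by their primary-key tuple (assumes keys are unique)."""
    return {tuple(row[k] for k in keys): row for row in rows}


def compare_tables(
    source: List[Row], target: Optional[List[Row]], keys: Sequence[str]
) -> Dict[str, Any]:
    """Hypothetical helper: compare a source table with a target table."""
    # A target of None stands for a table that does not exist on the Delta side.
    if target is None:
        return {"status": "missing"}

    src = index_by_key(source, keys)
    tgt = index_by_key(target, keys)

    # Primary keys present on only one side.
    missing_in_target = sorted(set(src) - set(tgt))
    missing_in_source = sorted(set(tgt) - set(src))

    # Column-level differences for keys present on both sides.
    mismatches = []
    for key in sorted(set(src) & set(tgt)):
        for col in sorted(set(src[key]) | set(tgt[key])):
            left, right = src[key].get(col), tgt[key].get(col)
            if left != right:
                mismatches.append(
                    {"key": key, "column": col, "source": left, "target": right}
                )

    clean = (
        len(source) == len(target)
        and not missing_in_target
        and not missing_in_source
        and not mismatches
    )
    return {
        "status": "matched" if clean else "mismatched",
        "source_count": len(source),
        "target_count": len(target),
        "missing_in_target": missing_in_target,
        "missing_in_source": missing_in_source,
        "mismatches": mismatches,
    }


# Made-up sample rows, for illustration only.
source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
target = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}]

result = compare_tables(source, target, keys=["id"])
print(result["status"], result["missing_in_target"], result["mismatches"])
```

In PySpark the same alignment would typically be a full outer join of the two DataFrames on the key columns, with `DataFrame.count()` supplying the row counts.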

📂 Folder Structure

About

Compares data between S3 and a Databricks Delta table.
