TPC-H benchmark data generation in pure Rust
Auron is an accelerator for distributed computing frameworks (e.g., Spark) that leverages native vectorized execution to speed up query processing.
A library for building efficient set-membership filters and dictionaries based on the Satisfiability problem.
A composable and fully extensible C++ execution engine library for data management systems.
Apache DataFusion Comet Spark Accelerator
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
A configuration framework that enhances Claude Code with specialized commands, cognitive personas, and development methodologies.
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
ingestr is a CLI tool that seamlessly copies data between any databases with a single command.
Brotli4j provides Brotli compression and decompression for Java.
A library that provides useful extensions to Apache Spark and PySpark.
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
PySpark methods to enhance developer productivity 📣 👯 🎉
FHIR Core / OpenSRP 2 is a Kotlin application for delivering offline-capable, mobile-first healthcare project implementations from local community to national and international scale using FHIR and…
A collection of tools for extracting FHIR resources and analytics services on top of that data.
One line of code for data quality profiling and exploratory data analysis of Pandas and Spark DataFrames.
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Metadata-driven Databricks Lakeflow Declarative Pipelines framework for bronze/silver pipelines.
Apache Beam is a unified programming model for batch and streaming data processing.