sshpuntoff


Apache DataFusion SQL Query Engine

Rust 7,967 1,725 Updated Nov 1, 2025

Parallel S3 and local filesystem execution tool.

Go 3,707 308 Updated Jun 13, 2025

TPC-H benchmark data generation in pure Rust

Rust 205 45 Updated Sep 9, 2025

Simple Go-based setuid+setgid+setgroups+exec

Shell 4,892 348 Updated Oct 1, 2025

The Auron accelerator for distributed computing frameworks (e.g., Spark) leverages native vectorized execution to accelerate query processing

Rust 1,631 187 Updated Oct 31, 2025

A library for building efficient set-membership filters and dictionaries based on the Satisfiability problem.

C 80 21 Updated Aug 4, 2022

A composable and fully extensible C++ execution engine library for data management systems.

C++ 3,932 1,387 Updated Nov 1, 2025

Apache DataFusion Comet Spark Accelerator

Scala 1,060 247 Updated Oct 31, 2025

Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)

Scala 452 77 Updated Aug 8, 2025

JDBC for Databricks SQL

Java 19 22 Updated Oct 30, 2025

A configuration framework that enhances Claude Code with specialized commands, cognitive personas, and development methodologies.

Python 17,504 1,550 Updated Oct 31, 2025

Official Dockerfile for Apache Spark

Dockerfile 150 49 Updated Oct 31, 2025

Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

Python 647 37 Updated Oct 20, 2025

ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

Python 3,325 111 Updated Oct 31, 2025

Brotli4j provides Brotli compression and decompression for Java.

Java 135 40 Updated Oct 1, 2025

Hadoop Codec for Brotli

Java 9 4 Updated Feb 16, 2021
Go 85 12 Updated May 5, 2025

A library that provides useful extensions to Apache Spark and PySpark.

Scala 231 30 Updated Jul 22, 2025

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Scala 3,533 573 Updated Aug 27, 2025

PySpark methods to enhance developer productivity 📣 👯 🎉

Python 675 99 Updated Mar 6, 2025

FHIR Core / OpenSRP 2 is a Kotlin application for delivering offline-capable, mobile-first healthcare project implementations from local community to national and international scale using FHIR and…

Kotlin 66 70 Updated Oct 30, 2025

A collection of tools for extracting FHIR resources and analytics services on top of that data.

Jupyter Notebook 199 118 Updated Oct 31, 2025

Python API for Deequ

Jupyter Notebook 801 147 Updated Apr 1, 2025

S3 Filesystem

Python 977 290 Updated Oct 30, 2025

Type System for Data Analysis in Python

Python 213 19 Updated Feb 1, 2025

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

Python 13,226 1,752 Updated Oct 15, 2025

Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.

Scala 96 19 Updated Sep 30, 2025

Metadata driven Databricks Lakeflow Declarative Pipelines framework for bronze/silver pipelines

Python 222 100 Updated Oct 28, 2025

Apache Beam is a unified programming model for Batch and Streaming data processing.

Java 8,353 4,429 Updated Nov 1, 2025