thunderpoot 💨

Sponsors

@pbernicchi
@blacknite4ever

Highlights

  • Pro

Organizations

@commoncrawl @mlcommons @telehack-foundation @londonpixelexchange @vaughantype


Automatically visualize your pandas dataframe via a single print! 📊 💡

Python 5,365 377 Updated Mar 20, 2024

A collection of public presentations from the Common Crawl Foundation

9 1 Updated Oct 22, 2025

Examples for using the Daft data engine

Jupyter Notebook 9 2 Updated Jan 6, 2026

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

Rust 5,120 385 Updated Jan 17, 2026

Magnificent app which corrects your previous console command.

Python 95,212 3,816 Updated Jul 19, 2024

tall, condensed, bitmap font for geeks

Vim Script 2,052 32 Updated Nov 4, 2023

Internet Archive's Sparkling Data Processing Library

Scala 15 2 Updated Dec 22, 2025

Introduction to WebGraphs - Workshop at the IIPC Web Archiving Conference 2025

Shell 3 Updated Apr 10, 2025

Materials for the workshop held at Born-Digital Collections, Archives and Memory conference 2025.

2 Updated Apr 10, 2025

Lightweight Python utility for retrieving individual pages from the Common Crawl archives.

Python 7 1 Updated Mar 2, 2025

A very simple Python module to suppress `KeyboardInterrupt` traceback spam when pressing `^C`.

Python 4 Updated Feb 28, 2025

ripgrep recursively searches directories for a regex pattern while respecting your gitignore

Rust 58,972 2,370 Updated Dec 17, 2025

Vocabulary for Expressing Content Preferences for AI Training

Makefile 2 Updated Jan 15, 2026

Common Voice is part of Mozilla's initiative to help teach machines how real people speak.

TypeScript 3,440 874 Updated Jan 16, 2026

An open-source handbook of applied guidance and tools for sustainable software development and maintenance.

Jupyter Notebook 23 4 Updated Jan 12, 2026

Yet another Markdown renderer, but this one extends standard Markdown with additional features, offering enhanced functionality and flexibility for content creators.

Go 5 Updated Dec 11, 2024

yq is a portable command-line YAML, JSON, XML, CSV, TOML, HCL and properties processor

Go 14,715 730 Updated Jan 15, 2026

Statistics of Common Crawl monthly Web Graphs

Python 5 1 Updated Dec 23, 2025

Working repo to support the Alliance's Open Trusted Data Initiative

Jupyter Notebook 10 7 Updated Jan 16, 2026

Book repository for The Turing Way: a how-to guide for reproducible, ethical and collaborative data science

TeX 2,121 740 Updated Jan 16, 2026

A utility to do detailed analysis of gzip files.

Rust 3 Updated Nov 19, 2024

LaTeX to image converter with web UI using Node.js / Docker

JavaScript 294 43 Updated Oct 4, 2023

Run safety benchmarks against AI models and view detailed reports showing how well they performed.

Python 118 27 Updated Jan 16, 2026

Open source project for data preparation for GenAI applications

HTML 891 240 Updated Jan 16, 2026

Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ for the code.

68 86 Updated Jan 7, 2026

⏩ Ship faster with Continuous AI. Open-source CLI that can be used in TUI mode as a coding agent or Headless mode to run background agents

TypeScript 30,916 4,056 Updated Jan 16, 2026

Turn ASCII art into SVG

JavaScript 97 17 Updated Jan 15, 2026

X-SAMPA to IPA and IPA to X-SAMPA converter

JavaScript 3 2 Updated May 19, 2021

A JavaScript utility for web pages that creates dynamic, human-readable dates, times, and relative time descriptions from UNIX timestamps.

JavaScript 3 Updated Aug 28, 2024

The day-to-day front-end to the IETF database for people who work on IETF standards.

Python 898 681 Updated Jan 16, 2026