Awesome LLM Books: Curated list of books on Large Language Models

1,031 154 Updated Apr 17, 2025

kafkaproxy is a reverse proxy for the wire protocol of Apache Kafka.

Java 86 12 Updated Jun 13, 2023
Scala 36 3 Updated Aug 24, 2022

Self-contained demo using Flink SQL and Debezium to build a CDC-based analytics pipeline. All you need is Docker! 🐳

Dockerfile 25 33 Updated May 11, 2021

Yahoo! Cloud Serving Benchmark

Java 5,152 2,313 Updated Aug 29, 2025

Kafka Connect Examples

Shell 43 19 Updated Sep 27, 2022

Paragraph-by-paragraph close reading of classic and new deep learning papers

31,721 2,733 Updated Mar 22, 2025

Testbench for experimenting with Apache Hive at any data scale.

Java 64 193 Updated Jul 10, 2017

TPC-DS benchmark kit with some modifications/fixes

C 100 79 Updated Aug 13, 2024

A topic-centric list of HQ open datasets.

69,739 10,841 Updated Oct 15, 2025

A set of notebooks that explore and explain core concepts of Apache Hudi, such as file layouts, file sizing, compaction, and clustering.

Jupyter Notebook 10 3 Updated Aug 22, 2023

A powerful CLI tool for the automated installation of Apache Ranger and AWS EMR and their integration with OpenLDAP and Windows AD. It supports both Open-Source Ranger and EMR-Native Ranger, supports OpenLD…

Shell 9 15 Updated Jan 30, 2023

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Jupyter Notebook 11,501 4,485 Updated Oct 15, 2025

A serverless data lake project and framework based on AWS S3, Glue, Athena, MWAA, and QuickSight. Through a series of best practices, it guides you through building a serverless data lake.

Shell 16 5 Updated Nov 22, 2022

Backup for NYC TLC data for the DE Zoomcamp course

188 52 Updated Jul 19, 2022

The Metadata Platform for your Data and AI Stack

Java 11,159 3,243 Updated Oct 24, 2025

This command-line tool is a useful complement to aws-cli. It offers a suite of utilities for managing and operating EC2, EMR, and other AWS services.

Shell 1 Updated Jul 4, 2023

Star Schema Benchmark Tool for Apache Kylin

C 97 47 Updated Aug 26, 2021

Star Schema Benchmark dbgen

C 125 85 Updated Mar 11, 2024

A prototype of a big data platform; the source code for the book Big Data Platform Architecture and Prototype.

Java 198 144 Updated Aug 12, 2020

Apache Hadoop docker image

Shell 2,298 1,388 Updated Feb 1, 2024

Exercises for the book Advanced Analytics with Spark (《spark高级数据分析》)

Scala 23 42 Updated Jun 9, 2018

New Last.fm Dataset 2020 for music auto-tagging purposes.

Python 31 2 Updated Jul 6, 2023

A Hadoop cluster based on Docker, including Hive and Spark.

Shell 81 31 Updated Nov 13, 2022

Multi-container environment with Hadoop, Spark and Hive

Shell 224 164 Updated May 5, 2025