Laurence Geng bluishglc

Architect and author of the book Big Data Platform Architecture and Prototype Implementation. Sales page: https://item.jd.com/12677623.html

46 followers · 0 following

Shanghai, China
https://laurence.blog.csdn.net/

Stars

Jason2Brownlee / awesome-llm-books

Awesome LLM Books: Curated list of books on Large Language Models

1,031 154 Updated Apr 17, 2025

oryanmoshe / debezium-timestamp-converter

Java 44 40 Updated Sep 3, 2024

dajudge / kafkaproxy

kafkaproxy is a reverse proxy for the wire protocol of Apache Kafka.

Java 86 12 Updated Jun 13, 2023

sjwiesman / flink-scala-3

Scala 36 3 Updated Aug 24, 2022

yangyichao-mango / flink-study

Java 463 172 Updated Sep 17, 2022

morsapaes / flink-sql-CDC

Self-contained demo using Flink SQL and Debezium to build a CDC-based analytics pipeline. All you need is Docker! 🐳

Dockerfile 25 33 Updated May 11, 2021

brianfrankcooper / YCSB

Yahoo! Cloud Serving Benchmark

Java 5,152 2,313 Updated Aug 29, 2025

tmcgrath / kafka-connect-examples

Kafka Connect Examples

Shell 43 19 Updated Sep 27, 2022

mli / paper-reading

Paragraph-by-paragraph close reading of classic and new deep learning papers

31,721 2,733 Updated Mar 22, 2025

aws-samples / emr-spark-benchmark

Shell 25 3 Updated Mar 12, 2024

cartershanklin / hive-testbench

Testbench for experimenting with Apache Hive at any data scale.

Java 64 193 Updated Jul 10, 2017

databricks / tpcds-kit

Forked from gregrahn/tpcds-kit

TPC-DS benchmark kit with some modifications/fixes

C 100 79 Updated Aug 13, 2024

databricks / spark-sql-perf

Scala 611 411 Updated Feb 26, 2022

hortonworks / hive-testbench

Java 391 293 Updated Jan 25, 2024

awesomedata / awesome-public-datasets

A topic-centric list of HQ open datasets.

69,739 10,841 Updated Oct 15, 2025

bluishglc / apache-hudi-core-conceptions

A set of notebooks that explore and explain core concepts of Apache Hudi, such as file layouts, file sizing, compaction, and clustering.

Jupyter Notebook 10 3 Updated Aug 22, 2023

bluishglc / ranger-emr-cli-installer

A powerful CLI tool for the automated installation of Apache Ranger and AWS EMR and their integration with OpenLDAP and Windows AD. It supports both open-source Ranger and EMR-native Ranger, supports OpenLD…

Shell 9 15 Updated Jan 30, 2023

ageron / handson-ml3

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Jupyter Notebook 11,501 4,485 Updated Oct 15, 2025

bluishglc / serverless-datalake-example

A serverless data lake project and framework based on AWS S3, Glue, Athena, MWAA, and QuickSight. Through a series of best practices, it guides you in building a serverless data lake.

Shell 16 5 Updated Nov 22, 2022

DataTalksClub / nyc-tlc-data

Backup for NYC TLC data for the DE Zoomcamp course

188 52 Updated Jul 19, 2022

datahub-project / datahub

The Metadata Platform for your Data and AI Stack

Java 11,159 3,243 Updated Oct 24, 2025

bluishglc / aws-cli-plus

This command-line tool is a useful complement to aws-cli. It offers a suite of utilities that manage and operate EC2, EMR, and other AWS services.

Shell 1 Updated Jul 4, 2023

Kyligence / ssb-kylin

Star Schema Benchmark Tool for Apache Kylin

C 97 47 Updated Aug 26, 2021

electrum / ssb-dbgen

Star Schema Benchmark dbgen

C 125 85 Updated Mar 11, 2024

bluishglc / bdp

A prototype big data platform project, containing the source code for the book Big Data Platform Architecture and Prototype

Java 198 144 Updated Aug 12, 2020

big-data-europe / docker-hadoop

Apache Hadoop docker image

Shell 2,298 1,388 Updated Feb 1, 2024

libaoquan95 / aasPractice

Exercises for the book Advanced Analytics with Spark

Scala 23 42 Updated Jun 9, 2018

renesemela / lastfm-dataset-2020

New Last.fm Dataset 2020 for music auto-tagging purposes.

Python 31 2 Updated Jul 6, 2023

bambrow / docker-hadoop-workbench

A Hadoop cluster based on Docker, including Hive and Spark.

Shell 81 31 Updated Nov 13, 2022

Marcel-Jan / docker-hadoop-spark

Forked from big-data-europe/docker-hadoop

Multi-container environment with Hadoop, Spark and Hive

Shell 224 164 Updated May 5, 2025