Apache Flink studies, notes and samples

The vision for this repository is to create a comprehensive, up-to-date, and practical body of knowledge around Apache Flink and its ecosystem, including integration with related technologies (Kafka, Iceberg, CDC, Kubernetes, etc.), accessible as a GitHub Pages site and a set of demonstrations or study code.

This content is best read in BOOK format.

Goals and Repository Organisation

The first goal is to keep notes, best practices, and how-tos from my deeper dive into Apache Flink. The content may be relevant to others, so it is open sourced. All the content comes from public documentation. The broader goal is to present a body of knowledge around the Flink ecosystem.

The repository is organized into:

docs: a living book on Flink subjects
code: Flink programming code in SQL, Python and Java
deployment: infrastructure as code. Some of the older code and demos ran locally with Docker Compose and Docker Desktop, but because of the licensing challenge, and to keep up with Kubernetes deployment, the deployments are moving to local Kubernetes and should be easy to port to any Kubernetes cluster hosted by a cloud provider. For local Kubernetes, minikube has some challenges, and Colima on Mac seems a better experience. I have two laptops: one on macOS and one on Windows with Linux (WSL).

Some end-to-end demos are also implemented to cover integration scenarios or to provide more polished demonstrations.

Implementation-specific documentation is kept in the respective code folders, in README.md files. This makes it easier to maintain documentation alongside code changes.

Docs

The docs folder includes:

Core Flink concepts and architecture
Flink SQL and Table API deep dives
Stateful stream processing patterns
Event time processing and watermarks
Fault tolerance and exactly-once semantics
State backends and state management
Deployment options (standalone, YARN, Kubernetes)
Performance tuning and monitoring
Integration guides (Kafka, Iceberg, MySQL CDC)
End-to-end tutorials and examples
Troubleshooting and best practices

Code

flink-java: Java main classes for learning Flink DataStream processing, such as simple filtering and join operations, using a Quarkus app.
flink-sql: a set of SQL examples.
e2e-demos: a set of end-to-end demonstrations with more components. The e-com-sale-simulator tutorial integrates Kafka, MySQL, Elasticsearch, and Kibana with Flink SQL to analyze e-commerce user behavior in real time.
Flink SQL local Java app
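
As a rough illustration of the kind of filtering and join logic the flink-java samples explore, here is a minimal sketch in plain Python. It does not use the Flink API, and the record fields, sample values, and function names (`filter_orders`, `join_on_user`) are hypothetical, chosen only for this example. It works on small in-memory lists rather than unbounded streams.

```python
# Hypothetical sample records; field names and values are illustrative only.
orders = [
    {"order_id": 1, "user_id": "u1", "amount": 30.0},
    {"order_id": 2, "user_id": "u2", "amount": 5.0},
    {"order_id": 3, "user_id": "u1", "amount": 12.5},
    {"order_id": 4, "user_id": "u9", "amount": 50.0},  # no matching user
]
users = [
    {"user_id": "u1", "country": "FR"},
    {"user_id": "u2", "country": "US"},
]


def filter_orders(records, min_amount):
    """Keep only the orders whose amount is at least min_amount."""
    return [r for r in records if r["amount"] >= min_amount]


def join_on_user(order_records, user_records):
    """Inner-join orders with users on user_id using a lookup table."""
    by_user = {u["user_id"]: u for u in user_records}
    joined = []
    for order in order_records:
        user = by_user.get(order["user_id"])
        if user is not None:
            # Enrich the order with the user's country.
            joined.append({**order, "country": user["country"]})
    return joined


large_orders = filter_orders(orders, 10.0)
enriched = join_on_user(large_orders, users)

for row in enriched:
    print(row["order_id"], row["user_id"], row["country"])
```

In a streaming engine the same two steps run continuously over records as they arrive, and the join has to keep state for the keys it has seen; this sketch only shows the per-record logic.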

Deployment

k8s: all local deployments target Colima with its Kubernetes engine.
product-tar: Flink and Kafka binaries.
cc-terraform: Terraform configuration for the Confluent Cloud environment, Kafka cluster, and Flink compute pools.

🙏 Support my work

Love it? Give it a ⭐️ by clicking below:


jbcodeforce/flink-studies

Folders and files

Latest commit

History

Repository files navigation

Apache Flink studies, notes and samples

Goals and Repository Organisation

Docs

Code

Deployment

🙏 Support my work

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages