Big Data Workshops with hands-on tutorials for working with S3, Spark, Delta Lake, Trino, ...
These workshops are used in the Big Data and Spark Ecosystem Module of the Data Engineering CAS at the Berner Fachhochschule.
All the workshops can be run on a container-based infrastructure that uses Docker Compose for container orchestration. The environment can run either on a local machine or in a cloud environment. Check `01-environment` for instructions on how to set up the infrastructure.
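
Once the environment is running, a quick way to verify that the Minio object storage is reachable is a small boto3 script; the same client API also applies to the optional AWS S3 workshop. This is only a minimal sketch: the endpoint URL, credentials, and bucket name below are placeholders, not the actual values used by `01-environment`.

```python
import boto3

# Endpoint and credentials are placeholders; substitute the values
# configured in your 01-environment setup (for AWS S3, drop endpoint_url).
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",      # assumed local Minio endpoint
    aws_access_key_id="minio-access-key",      # placeholder
    aws_secret_access_key="minio-secret-key",  # placeholder
)

# Create a demo bucket if needed, then round-trip a small object.
bucket = "workshop-test"  # hypothetical bucket name
if bucket not in [b["Name"] for b in s3.list_buckets()["Buckets"]]:
    s3.create_bucket(Bucket=bucket)

s3.put_object(Bucket=bucket, Key="hello.txt", Body=b"hello from the workshop")
print(s3.get_object(Bucket=bucket, Key="hello.txt")["Body"].read())
```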
- Working with Minio Object Storage (see the boto3 connectivity sketch above)
- Working with AWS S3 Object Storage (optional)
- Getting Started using Spark RDD and DataFrames (see the PySpark sketch after this list)
- Data Reading and Writing using DataFrames
- Graph Analysis using Spark GraphFrames (see the GraphFrames sketch after this list)
- Working with different data types
- Working with Delta Lake Table Format (see the Delta Lake sketch after this list)
- Working with Trino (see the Trino sketch after this list)
- Data Ingestion with Apache NiFi
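
The following minimal PySpark sketch gives a flavor of the DataFrame workshops; it runs in local mode on in-memory data, and the column names and session settings are illustrative assumptions (the workshops themselves read from and write to object storage):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Local-mode session; the workshop environment configures Spark differently.
spark = SparkSession.builder.appName("getting-started").master("local[*]").getOrCreate()

# A tiny in-memory DataFrame standing in for the workshop datasets.
df = spark.createDataFrame(
    [("alice", 34), ("bob", 45), ("carol", 29)],
    ["name", "age"],
)

# Typical DataFrame operations: filter, derive a column, aggregate.
adults = df.filter(F.col("age") >= 30).withColumn("age_next_year", F.col("age") + 1)
adults.show()
print(df.agg(F.avg("age").alias("avg_age")).first()["avg_age"])

# Reading and writing files works the same way, e.g.:
#   df.write.mode("overwrite").parquet("s3a://some-bucket/some-path")  # needs S3 connector config
spark.stop()
```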
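For the graph analysis workshop, a GraphFrames sketch might look like the following; it assumes the graphframes package is on the Spark classpath (the exact package coordinates depend on your Spark version):

```python
from pyspark.sql import SparkSession
from graphframes import GraphFrame

# Assumes Spark was started with the graphframes package, e.g.
#   --packages graphframes:graphframes:0.8.3-spark3.5-s_2.12   (version is an assumption)
spark = SparkSession.builder.appName("graph-demo").getOrCreate()

# Vertices need an "id" column; edges need "src" and "dst" columns.
vertices = spark.createDataFrame(
    [("a", "Alice"), ("b", "Bob"), ("c", "Carol")], ["id", "name"])
edges = spark.createDataFrame(
    [("a", "b", "follows"), ("b", "c", "follows"), ("c", "a", "follows")],
    ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
g.inDegrees.show()                                             # simple degree analysis
g.pageRank(resetProbability=0.15, maxIter=5).vertices.show()   # PageRank scores
```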
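For the Delta Lake workshop, a minimal round trip with the delta-spark PyPI package could look like this; the local temporary path is a placeholder for the object storage locations used in the workshop:

```python
import tempfile

from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

# Standard Delta Lake session configuration (requires the delta-spark package).
builder = (
    SparkSession.builder.appName("delta-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

path = tempfile.mkdtemp()  # placeholder; the workshop writes to s3a:// paths

# Write a Delta table, append to it, then read it back.
spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"]) \
    .write.format("delta").mode("overwrite").save(path)
spark.createDataFrame([(3, "c")], ["id", "value"]) \
    .write.format("delta").mode("append").save(path)

spark.read.format("delta").load(path).show()
```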
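And for Trino, queries can be sent from Python with the trino client package; host, port, and user below are assumptions and depend on how Trino is exposed in `01-environment`:

```python
import trino

# Connection parameters are assumptions; adjust them to your environment.
conn = trino.dbapi.connect(host="localhost", port=8080, user="workshop")
cur = conn.cursor()

cur.execute("SHOW CATALOGS")   # list the catalogs this Trino instance knows about
print(cur.fetchall())

cur.execute("SELECT 1 + 1")    # trivial query to confirm connectivity
print(cur.fetchall())
```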