Thanks to visit codestin.com
Credit goes to github.com

Skip to content

This project is to serve as a practice environment for Databricks medallion architecture. This will be limited because I am going to be using Databvricks free

Notifications You must be signed in to change notification settings

henriqueetges/MyDatalake

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MyDatalake

This project is a sandbox for me to capture different kinds of data and load them into databricks environment. Goal is to implement a medallion architecture. This is being built on top of a free version of Databricks, so some features may not be explored at this time.

Data Architecture

  • Raw Layer: File data stored in managed storage, using json files
  • Bronze Layer: Delta lake tables, based on raw data
  • Silver Layer: Denormalized and cleansed tables
  • Gold Layer: Reporting Layer

Project Structure

  • Lib: Contains helper functions
  • Layers: Contain the ingestion scripts, their scripts, queries and metadata stored as yml files.

About

This project is to serve as a practice environment for Databricks medallion architecture. This will be limited because I am going to be using Databvricks free

Resources

Stars

Watchers

Forks