Project: Data Modeling with Postgres

This project models a Postgres database whose tables are designed to optimize queries for song play analysis at a startup called Sparkify.
The objective is to create a database schema and an ETL pipeline for analyzing the data Sparkify has been collecting on songs and user activity in its new music streaming app.

Project Structure

ETL Pipeline

etl.py

Builds the ETL pipeline that loads the song and log datasets into the database tables.

process_data
- Iterates over the files in a dataset and applies process_song_file or process_log_file to each one
process_song_file
- Processes a song file to insert records into the songs and artists dimension tables
process_log_file
- Processes a log file to insert records into the time and users dimension tables and the songplays fact table
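The dataset walk described above can be sketched with the standard library alone. This is a minimal, hypothetical version: process_data here takes only a root directory and a per-file callback (database cursor and connection handling are left out), and read_song_file assumes each song file holds a single JSON object with the field names shown, which are assumptions about the song dataset rather than details taken from the repository.

```python
import json
import os
from typing import Callable, List, Tuple


def find_json_files(root: str) -> List[str]:
    """Return sorted absolute paths of every .json file under root, recursively."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".json"):
                paths.append(os.path.abspath(os.path.join(dirpath, name)))
    return sorted(paths)


def process_data(root: str, func: Callable[[str], None]) -> int:
    """Call func on every JSON file under root and return how many files were seen."""
    files = find_json_files(root)
    for path in files:
        func(path)
    return len(files)


def read_song_file(path: str) -> Tuple[tuple, tuple]:
    """Read one song file (assumed: a single JSON object) and split it into a
    songs row and an artists row. The field names are hypothetical."""
    with open(path) as f:
        record = json.load(f)
    song_row = (record["song_id"], record["title"], record["artist_id"],
                record["year"], record["duration"])
    artist_row = (record["artist_id"], record["artist_name"],
                  record["artist_location"], record["artist_latitude"],
                  record["artist_longitude"])
    return song_row, artist_row
```

In this sketch, calling process_data with a song directory and a handler built on read_song_file would visit every song file once; a real handler would then pass the two rows to the songs and artists insert statements.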

create_tables.py

Creates the fact and dimension table schema.

create_database
drop_tables
create_tables
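drop_tables and create_tables presumably run the *_table_drop and *_table_create statements in turn. A minimal sketch of that pattern, assuming the statements are collected into lists (the names drop_table_queries and create_table_queries and the cut-down DDL are hypothetical). It accepts any DB-API 2.0 connection, such as one from psycopg2 for Postgres; sqlite3 is used only so the sketch runs without a database server. create_database is not shown.

```python
import sqlite3
from typing import Sequence

# Hypothetical statement lists; in the project these would come from sql_queries.py.
# songplays is dropped first because it references users.
drop_table_queries = [
    "DROP TABLE IF EXISTS songplays",
    "DROP TABLE IF EXISTS users",
]
# users is created first so songplays can reference it.
create_table_queries = [
    "CREATE TABLE IF NOT EXISTS users (user_id INT PRIMARY KEY, level VARCHAR)",
    "CREATE TABLE IF NOT EXISTS songplays (songplay_id INT PRIMARY KEY, "
    "user_id INT REFERENCES users (user_id))",
]


def run_all(conn, queries: Sequence[str]) -> None:
    """Execute each statement in order, then commit once."""
    cur = conn.cursor()
    for query in queries:
        cur.execute(query)
    conn.commit()


def drop_tables(conn) -> None:
    run_all(conn, drop_table_queries)


def create_tables(conn) -> None:
    run_all(conn, create_table_queries)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    drop_tables(conn)    # safe on an empty database thanks to IF EXISTS
    create_tables(conn)
    conn.close()
```

Running the drop step before the create step makes the script repeatable: each run starts from an empty schema.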

sql_queries.py

Helper SQL query statements for etl.py and create_tables.py

*_table_drop
*_table_create
*_table_insert
song_select
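The naming pattern above suggests one drop, create, and insert statement per table, plus a song_select lookup. A hypothetical excerpt in that style for the users table, assuming psycopg2-style %s placeholders; the column types and the join used by song_select are assumptions, not taken from the repository.

```python
# Hypothetical excerpt in the style of sql_queries.py (Postgres syntax).

user_table_drop = "DROP TABLE IF EXISTS users;"

user_table_create = """
CREATE TABLE IF NOT EXISTS users (
    user_id    INT PRIMARY KEY,
    first_name VARCHAR,
    last_name  VARCHAR,
    gender     CHAR(1),
    level      VARCHAR
);
"""

user_table_insert = """
INSERT INTO users (user_id, first_name, last_name, gender, level)
VALUES (%s, %s, %s, %s, %s);
"""

# Assumed lookup: find song_id and artist_id for a log event
# from its song title, artist name, and duration.
song_select = """
SELECT s.song_id, a.artist_id
FROM songs s
JOIN artists a ON s.artist_id = a.artist_id
WHERE s.title = %s AND a.name = %s AND s.duration = %s;
"""

# Hypothetical lists that a create_tables.py loop could iterate over.
drop_table_queries = [user_table_drop]
create_table_queries = [user_table_create]
```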

Database Schema

Fact table

songplays
- songplay_id PRIMARY KEY
- start_time REFERENCES time (start_time)
- user_id REFERENCES users (user_id)
- level
- song_id REFERENCES songs (song_id)
- artist_id REFERENCES artists (artist_id)
- session_id
- location
- user_agent
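To illustrate the kind of song play analysis this fact table is meant to serve, the sketch below builds cut-down copies of songplays, time, and users in an in-memory SQLite database and counts plays per hour of day, split by level. SQLite stands in for Postgres only so the example runs standalone; the column types, the sample rows, and the query itself are illustrative assumptions.

```python
import sqlite3

# Cut-down star schema: the fact table references two dimensions.
SCHEMA = """
CREATE TABLE time (start_time TEXT PRIMARY KEY, hour INT);
CREATE TABLE users (user_id INT PRIMARY KEY, level TEXT);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY,
    start_time  TEXT REFERENCES time (start_time),
    user_id     INT  REFERENCES users (user_id),
    level       TEXT
);
"""


def plays_per_hour(conn):
    """Count song plays per hour of day, split by the level recorded on each play."""
    query = """
        SELECT t.hour, sp.level, COUNT(*) AS plays
        FROM songplays sp
        JOIN time t ON sp.start_time = t.start_time
        GROUP BY t.hour, sp.level
        ORDER BY t.hour, sp.level
    """
    return conn.execute(query).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create the schema and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO time VALUES (?, ?)",
                     [("2018-11-15 07:30:00", 7), ("2018-11-15 07:45:00", 7),
                      ("2018-11-15 21:05:00", 21)])
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "free"), (2, "paid")])
    conn.executemany(
        "INSERT INTO songplays (start_time, user_id, level) VALUES (?, ?, ?)",
        [("2018-11-15 07:30:00", 1, "free"), ("2018-11-15 07:45:00", 2, "paid"),
         ("2018-11-15 21:05:00", 2, "paid")])
    return conn


if __name__ == "__main__":
    print(plays_per_hour(build_demo_db()))
    # [(7, 'free', 1), (7, 'paid', 1), (21, 'paid', 1)]
```

The time dimension keeps the hour precomputed, so this analysis needs a single join and no per-row date arithmetic.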

Dimension tables

users
- user_id PRIMARY KEY
- first_name
- last_name
- gender
- level
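Since level is stored both on songplays and on users, it presumably can change over time, so loading users needs a rule for rows whose user_id already exists. One common choice, sketched here as an assumption rather than what the repository does, is an upsert that keeps the latest level. The ON CONFLICT ... DO UPDATE syntax works in Postgres (with %s placeholders under psycopg2) and in SQLite 3.24 or newer, which the sketch uses with ? placeholders so it runs standalone; the sample rows are made up.

```python
import sqlite3

USERS_CREATE = """
CREATE TABLE IF NOT EXISTS users (
    user_id    INT PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT,
    gender     TEXT,
    level      TEXT
)
"""

# On a duplicate user_id, keep the existing name and gender but take the newest level.
USERS_UPSERT = """
INSERT INTO users (user_id, first_name, last_name, gender, level)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (user_id) DO UPDATE SET level = excluded.level
"""


def upsert_users(conn, rows):
    """Insert user rows in order, so later rows overwrite earlier levels."""
    conn.executemany(USERS_UPSERT, rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(USERS_CREATE)
    upsert_users(conn, [(10, "Ada", "L", "F", "free"), (10, "Ada", "L", "F", "paid")])
    print(conn.execute("SELECT user_id, level FROM users").fetchall())  # [(10, 'paid')]
```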

songs
- song_id PRIMARY KEY
- title
- artist_id
- year
- duration

artists
- artist_id PRIMARY KEY
- name
- location
- latitude
- longitude

time
- start_time PRIMARY KEY
- hour
- day
- week
- month
- year
- weekday
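Each time row breaks a play's start_time into the calendar parts listed above. A minimal standard-library sketch, assuming the raw log stores timestamps as milliseconds since the Unix epoch (an assumption about the log dataset), that times are interpreted in UTC, that week means the ISO week number, and that weekday counts Monday as 0. The project itself may derive these columns differently, for example with pandas.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class TimeRow(NamedTuple):
    """One row of the time dimension table."""
    start_time: datetime
    hour: int
    day: int
    week: int
    month: int
    year: int
    weekday: int


def time_row_from_ms(ts_ms: int) -> TimeRow:
    """Convert a millisecond epoch timestamp (assumed format) into a time row, in UTC."""
    start = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return TimeRow(
        start_time=start,
        hour=start.hour,
        day=start.day,
        week=start.isocalendar()[1],  # ISO week number
        month=start.month,
        year=start.year,
        weekday=start.weekday(),      # Monday == 0
    )
```

For a play at 2018-11-15 07:30 UTC this yields hour 7, day 15, ISO week 46, month 11, year 2018, and weekday 3 (Thursday).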

Repository: srajeevan/data-modeling-postgres
