In this project, the aim was to learn how to build a data pipeline using Python and SQL. The project was divided into four parts:
- Data Exploration, Cleaning, and Transformation
- Loading data into a PostgreSQL database, using Kafka and ZooKeeper to stream data
- Using Spark to process the data and load it into a PostgreSQL database
- Creating a dashboard to visualize the data
The data used in this project was financial data from an online retail store. It was in CSV format and contained information on the loans given out by the company. The data was cleaned and transformed to remove null values, duplicates, and other inconsistencies, and was then loaded into a PostgreSQL database.
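A minimal sketch of the cleaning step, assuming pandas; the file name and column names (`loans.csv`, `loan_amnt`, `issue_d`, `grade`) are placeholders for illustration, not the project's actual schema:

```python
import pandas as pd

# Load the raw CSV (file name and columns are assumptions for illustration).
df = pd.read_csv("loans.csv")

# Drop rows that are entirely null and remove exact duplicates.
df = df.dropna(how="all").drop_duplicates()

# Drop rows missing key fields and normalise types.
df = df.dropna(subset=["loan_amnt", "issue_d", "grade"])
df["loan_amnt"] = pd.to_numeric(df["loan_amnt"], errors="coerce")
df["issue_d"] = pd.to_datetime(df["issue_d"], errors="coerce")
df = df.dropna(subset=["loan_amnt", "issue_d"])

# Save the cleaned data for the loading step.
df.to_csv("loans_clean.csv", index=False)
```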
The data was loaded into the PostgreSQL database using a Python script. Kafka was added to the pipeline to stream new data into the database after cleaning, and ZooKeeper was used to manage the Kafka cluster.
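A rough sketch of the streaming load, assuming `kafka-python` and `psycopg2`; the topic name, table name, and connection details are hypothetical:

```python
import json

import psycopg2
from kafka import KafkaConsumer

# Consume cleaned loan records from a Kafka topic (placeholder name)
# and insert them into Postgres.
consumer = KafkaConsumer(
    "loans_clean",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

conn = psycopg2.connect(
    dbname="loans", user="postgres", password="postgres", host="localhost"
)
cur = conn.cursor()

for message in consumer:
    record = message.value
    cur.execute(
        "INSERT INTO loans (loan_amnt, grade, issue_d) VALUES (%s, %s, %s)",
        (record["loan_amnt"], record["grade"], record["issue_d"]),
    )
    conn.commit()
```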
Spark was used as an exercise to redo part one of the project. It was not strictly necessary here, since scale was not an issue, but it was a good learning experience. The data was processed with Spark and loaded into a PostgreSQL database.
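A minimal PySpark sketch of that exercise; the file name, table name, JDBC driver version, and credentials are assumptions, not the project's actual configuration:

```python
from pyspark.sql import SparkSession

# Redo the cleaning step with Spark and write the result to Postgres over JDBC.
spark = (
    SparkSession.builder
    .appName("loan-pipeline")
    .config("spark.jars.packages", "org.postgresql:postgresql:42.7.3")
    .getOrCreate()
)

# Read the raw CSV and apply the same basic cleaning as part one.
df = spark.read.csv("loans.csv", header=True, inferSchema=True)
df = df.dropDuplicates().dropna(subset=["loan_amnt", "grade", "issue_d"])

# Write the cleaned data to a Postgres table via JDBC.
(
    df.write.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/loans")
    .option("dbtable", "loans_spark")
    .option("user", "postgres")
    .option("password", "postgres")
    .mode("overwrite")
    .save()
)
```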
A dashboard was created using Dash to visualize the data. The dashboard presents information on the loans given out by the company, with the aim of answering and giving insights on the following five questions (see the sketch after the list):
- What is the trend of loan issuance over the months for each year?
- What is the percentage distribution of loan grades in the dataset?
- What is the distribution of loan amounts across different grades?
- Which states have the highest average loan amount?
- How does the loan amount relate to annual income across states?
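A minimal Dash sketch covering the first question (loan issuance trend per month, by year), assuming pandas and Plotly Express; the input file and column names are placeholders:

```python
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html

# Aggregate loan counts per month and year from the cleaned data.
df = pd.read_csv("loans_clean.csv", parse_dates=["issue_d"])
df["year"] = df["issue_d"].dt.year
df["month"] = df["issue_d"].dt.month
trend = df.groupby(["year", "month"]).size().reset_index(name="loans_issued")

# Single-page dashboard with one line chart per year.
app = Dash(__name__)
app.layout = html.Div(
    [
        html.H1("Loan Issuance Trend"),
        dcc.Graph(figure=px.line(trend, x="month", y="loans_issued", color="year")),
    ]
)

if __name__ == "__main__":
    app.run(debug=True)
```

The remaining questions follow the same pattern: aggregate with pandas (or SQL against the Postgres database) and render each result as its own `dcc.Graph` in the layout.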

