Data engineering and big data
UNDERSTANDING DATA ENGINEERING
Hadrien Lacroix
Content Developer at DataCamp
About the course
Conceptual course
No coding involved
Objectives
Be able to communicate with data engineers
Provide a solid foundation to learn more
Chapter 1
What is data engineering?
1. Data engineering and big data
2. Data engineers vs. data scientists
3. Data pipelines
Chapter 2
How data storage works
1. Structured vs unstructured data
2. SQL
3. Data warehouse and data lakes
Chapter 3
How to move and process data
1. Processing data
2. Scheduling data
3. Parallel computing
4. Cloud computing
Data workflow
Data engineers
Data engineers deliver:
the correct data
in the right form
to the right people
as efficiently as possible
A data engineer's responsibilities
Ingest data from different sources
Optimize databases for analysis
Remove corrupted data
Develop, construct, test and maintain data architectures
Data engineers and big data
Big data is becoming the norm => data engineers are increasingly needed
Big data:
Is so large that you have to think about how to deal with its size
Is so large that traditional methods no longer work
Big data growth
Sensors and devices
Social media
Enterprise data
VoIP (voice communication, multimedia sessions)
1 Data Age 2025, Seagate, November 2018
The five Vs
Volume (how much?)
Variety (what kind?)
Velocity (how frequent?)
Veracity (how accurate?)
Value (how useful?)
Summary
What's waiting for you
How data flows through an organization
At which stage a data engineer steps in
What their responsibilities are
How data engineering relates to big data
Let's practice!
Data engineers vs. data scientists
Data workflow
Data engineers
Data scientists
Data engineers enable data scientists
Data engineer            | Data scientist
Ingest and store data    | Exploit data
Set up databases         | Access databases
Build data pipelines     | Use pipeline outputs
Strong software skills   | Strong analytical skills
Summary
At which stages data engineers and data scientists step in
How data engineers enable data scientists
Let's practice!
The data pipeline
If data is the new oil...
1 The Economist, 2017-05-06, by David Parkins
Back to data engineering
Ingest
Process
Store
Need pipelines
Automate flow from one station to the next
Provide up-to-date, accurate, relevant data
Data pipelines ensure an efficient flow of the data
Automate:
Extracting
Transforming
Combining
Validating
Loading
Reduce:
Human intervention
Errors
Time it takes data to flow
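For readers who want to see the idea in code, here is a minimal sketch of the automated steps listed above (extracting, transforming, combining, validating, loading) chained so that data flows from one step to the next without human intervention. Everything in it is an assumption made for illustration: the USERS and PLAYS records, the function names and the list used as a destination are hypothetical and do not come from the course.

```python
# Hypothetical in-memory sources standing in for real systems (assumption).
USERS = [{"user_id": 1, "name": " Ada "}, {"user_id": 2, "name": "Grace"}]
PLAYS = [
    {"user_id": 1, "song": "A"},
    {"user_id": 1, "song": "B"},
    {"user_id": 3, "song": "C"},  # refers to a user we do not know
]


def extract():
    """Extracting: read raw records from each source."""
    return list(USERS), list(PLAYS)


def transform(users):
    """Transforming: tidy up a field (here, strip stray whitespace from names)."""
    return [{**u, "name": u["name"].strip()} for u in users]


def combine(users, plays):
    """Combining: count plays per known user and attach the count to the user."""
    counts = {u["user_id"]: 0 for u in users}
    for p in plays:
        if p["user_id"] in counts:
            counts[p["user_id"]] += 1
    return [{**u, "plays": counts[u["user_id"]]} for u in users]


def validate(rows):
    """Validating: keep only rows with a non-empty name and a non-negative count."""
    return [r for r in rows if r["name"] and r["plays"] >= 0]


def load(rows, destination):
    """Loading: append the rows to the destination (a plain list here)."""
    destination.extend(rows)
    return destination


def run_pipeline(destination):
    """Run every step in order, with no manual hand-off between steps."""
    users, plays = extract()
    return load(validate(combine(transform(users), plays)), destination)


warehouse = run_pipeline([])
```

In a real setting each function would talk to an actual source or database, and a scheduler would call run_pipeline; the point of the sketch is only that the hand-offs between steps are automated.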
ETL and data pipelines
ETL:
Popular framework for designing data pipelines
1) Extract data
2) Transform extracted data
3) Load transformed data to another database
Data pipelines:
Move data from one system to another
May follow ETL
Data may not be transformed
Data may be directly loaded in applications
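As an optional illustration, the three ETL steps above can be sketched with nothing but the Python standard library. The RAW_CSV text, the songs table and its columns are hypothetical names chosen for this example, and the in-memory SQLite database merely stands in for "another database".

```python
import csv
import io
import sqlite3

# Hypothetical raw export standing in for a source system (assumption).
RAW_CSV = "title,duration_ms\nSong A,180000\nSong B,240500\n"


def extract(raw):
    """1) Extract: parse the raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows):
    """2) Transform: convert milliseconds to whole seconds."""
    return [(r["title"], int(r["duration_ms"]) // 1000) for r in rows]


def load(rows, conn):
    """3) Load: write the transformed rows to another database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS songs (title TEXT, duration_s INTEGER)"
    )
    conn.executemany("INSERT INTO songs VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database used as the destination
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT title, duration_s FROM songs ORDER BY title"
).fetchall()
```

A pipeline that does not follow ETL could skip transform entirely and pass the extracted rows straight to an application, which is the nuance the slide points out.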
Summary
What a data pipeline is
What it does
Why it's important
How data pipelines are implemented at Spotflix
What ETL is and its nuances
Let's practice!