Data Engineering and Big Data: Hadrien Lacroix

The document provides an overview of data engineering, emphasizing its role in managing big data and the responsibilities of data engineers. It outlines key concepts such as data pipelines, data storage, and the differences between data engineers and data scientists. Additionally, it discusses the importance of automation and frameworks like ETL in data processing and flow management.

Data engineering and big data

UNDERSTANDING DATA ENGINEERING

Hadrien Lacroix
Content Developer at DataCamp
About the course
Conceptual course
No coding involved

Objectives
Be able to communicate with data engineers

Provide a solid foundation to learn more

Chapter 1
What is data engineering?

1. Data engineering and big data

2. Data engineers vs. data scientists


3. Data pipelines

Chapter 2
How data storage works

1. Structured vs unstructured data

2. SQL
3. Data warehouse and data lakes

Chapter 3
How to move and process data

1. Processing data

2. Scheduling data
3. Parallel computing

4. Cloud computing

Data workflow

Data engineers
Data engineers deliver:

the correct data

in the right form


to the right people

as efficiently as possible

A data engineer's responsibilities
Ingest data from different sources
Optimize databases for analysis

Remove corrupted data

Develop, construct, test and maintain data architectures

Data engineers and big data
Big data is becoming the norm => data engineers are increasingly needed
Big data:
Data so large that you have to think about how to deal with its size

So large that traditional methods no longer work

Big data growth
Sensors and devices
Social media

Enterprise data

VoIP (voice communication, multimedia sessions)

1 Data Age 2025, Seagate, November 2018

The five Vs
Volume (how much?)
Variety (what kind?)

Velocity (how frequent?)

Veracity (how accurate?)

Value (how useful?)

Summary
What's waiting for you
How data flows through an organization

When a data engineer intervenes

What their responsibilities are

How data engineering relates to big data

Let's practice!

Data engineers vs. data scientists

Data engineers enable data scientists

Data engineer          | Data scientist
Ingest and store data  | Exploit data
Set up databases       | Access databases
Build data pipelines   | Use pipeline outputs
Strong software skills | Strong analytical skills

Summary
At which stages data engineers and data scientists intervene
How data engineers enable data scientists

Let's practice!

The data pipeline

If data is the new oil...

1 The Economist, 2017-05-06, by David Parkins

Back to data engineering
Ingest
Process

Store

Need pipelines

Automate flow from one station to the next

Provide up-to-date, accurate, relevant data
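
The ingest, process and store stations above can be sketched as a tiny pipeline in Python. This is only an illustration under assumptions, not material from the course: the function names (`ingest`, `process`, `store`, `run_pipeline`), the sample records, and the plain list standing in for a database are all hypothetical.

```python
Record = dict


def ingest() -> list[Record]:
    # Hypothetical source: in practice this could be an API, a log file, or a sensor feed.
    return [
        {"user": "ana", "plays": "3"},
        {"user": "ben", "plays": "7"},
    ]


def process(records: list[Record]) -> list[Record]:
    # Convert the raw string counts into integers so they are ready for analysis.
    return [{"user": r["user"], "plays": int(r["plays"])} for r in records]


def store(records: list[Record], database: list[Record]) -> None:
    # Hypothetical storage: a plain list stands in for a real database.
    database.extend(records)


def run_pipeline(database: list[Record]) -> None:
    # Each station hands its output to the next one, with no manual step in between.
    store(process(ingest()), database)


database: list[Record] = []
run_pipeline(database)
print(database)
```

Because `run_pipeline` chains the stations itself, a scheduler could call it repeatedly to keep the stored data up to date without anyone moving data by hand.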

Data pipelines ensure an efficient flow of the data

Automate     | Reduce
Extracting   | Human intervention
Transforming | Errors
Combining    | Time it takes data to flow
Validating   |
Loading      |
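
The five automated steps listed above (extracting, transforming, combining, validating, loading) might look like this in Python. It is a rough sketch under assumptions: the two sources, the field names, the validation rule (a row needs a play count), and the list used as a warehouse are all invented for illustration.

```python
def extract_users() -> list[dict]:
    # Hypothetical source 1: user accounts.
    return [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]


def extract_plays() -> list[dict]:
    # Hypothetical source 2: play counts as strings; the second row has a missing value.
    return [{"id": 1, "plays": "12"}, {"id": 2, "plays": None}]


def transform(plays: list[dict]) -> list[dict]:
    # Turn string counts into integers, leaving missing values untouched.
    return [
        {**p, "plays": int(p["plays"]) if p["plays"] is not None else None}
        for p in plays
    ]


def combine(users: list[dict], plays: list[dict]) -> list[dict]:
    # Join the two sources on the shared "id" field.
    plays_by_id = {p["id"]: p["plays"] for p in plays}
    return [{**u, "plays": plays_by_id.get(u["id"])} for u in users]


def validate(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    # Assumed rule: a row is valid only if it has a play count.
    valid = [r for r in rows if r["plays"] is not None]
    rejected = [r for r in rows if r["plays"] is None]
    return valid, rejected


def load(rows: list[dict], table: list[dict]) -> None:
    # A list stands in for the destination table.
    table.extend(rows)


warehouse: list[dict] = []
combined = combine(extract_users(), transform(extract_plays()))
valid_rows, rejected_rows = validate(combined)
load(valid_rows, warehouse)
print(warehouse)
print(rejected_rows)
```

Keeping the rejected rows instead of silently dropping them is a design choice made here so that errors stay visible for later inspection.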

ETL and data pipelines

ETL
Popular framework for designing data pipelines
1) Extract data
2) Transform extracted data
3) Load transformed data to another database

Data pipelines
Move data from one system to another
May follow ETL
Data may not be transformed
Data may be directly loaded in applications
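
To make the distinction concrete, here is a hedged Python sketch of one pipeline that follows ETL and one that does not. The track data, the `tracks` table, and the function names are assumptions made up for this example; the only real library used is Python's built-in `sqlite3` module, with an in-memory database standing in for "another database".

```python
import sqlite3


def extract() -> list[tuple]:
    # Hypothetical raw rows: (track title, duration in milliseconds).
    return [("Song A", 185000), ("Song B", 242500)]


def transform(rows: list[tuple]) -> list[tuple]:
    # Convert milliseconds to seconds.
    return [(title, ms / 1000) for title, ms in rows]


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    # Load the transformed rows into a database table.
    conn.execute("CREATE TABLE IF NOT EXISTS tracks (title TEXT, seconds REAL)")
    conn.executemany("INSERT INTO tracks VALUES (?, ?)", rows)
    conn.commit()


def etl_pipeline(conn: sqlite3.Connection) -> None:
    # Follows ETL: extract, then transform, then load.
    load(transform(extract()), conn)


def direct_pipeline(application_feed: list) -> None:
    # Still a data pipeline, but not ETL: the data is moved as-is, with no
    # transform step, straight into an application (a list stands in for it).
    application_feed.extend(extract())


conn = sqlite3.connect(":memory:")
etl_pipeline(conn)
rows_in_db = conn.execute(
    "SELECT title, seconds FROM tracks ORDER BY title"
).fetchall()

feed: list = []
direct_pipeline(feed)
print(rows_in_db)
print(feed)
```

Both functions move data from one system to another, so both are data pipelines; only `etl_pipeline` goes through all three ETL steps.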

Summary
What a data pipeline is
What it does

Why it's important

How data pipelines are implemented at Spotflix

What ETL is and its nuances

Let's practice!