Lecture 1 - Intro & Foundations
What should you be able to do after this week?
Describe the characteristics of the recent shift in data analytics
Explain different forms of parallelism and scalability
Distinguish between scalability and performance
What is this course about?
Storing data
Previously, storage was very limited, devices were huge, and they were expensive.
Now we have huge storage capacity in small, affordable devices.
Over the years, storage cost has decreased while capacity has increased.
Why was there a shift in Data & Analytics?
Previously:
Data traditionally used for measurement
Descriptive, backward-facing view of what happened in the past
Nowadays:
Data leveraged for strategic analysis, centered on growth
Data is used in a predictive, forward-facing function.
What changed?
Data generation exploded!
Previously:
Businesses captured well-understood, well-defined transaction data (e.g., data about orders and payments)
Nowadays:
The advent of the web and mobile phones produces an unprecedented amount of much less structured and less well-defined interaction data
How does Modern Data Analytics work?
Previously:
IT department had monopoly on access to data
End users had to go through IT (via ticketing systems) for data analysis
Slow and tedious
Nowadays:
Data centrally stored in the cloud, IT department manages cloud
End users can directly access and analyse data
Challenges in working with Big Data
The four V’s of Big Data
Volume: We have to process a lot of data
Velocity: The data is arriving very fast
Variety: We have structured, semi-structured and unstructured data from many different sources
Veracity: We have data of highly varying quality and trustworthiness
Challenges with Volume & Velocity
Can’t we just use lots of machines to process lots of data really fast?
Unfortunately, programming distributed systems (i.e., working with lots of computers) is really hard, because of:
Coordination
Concurrency
Fault tolerance
We need ways to write simple but efficient programs which execute in parallel on large datasets.
Challenges with Variety & Veracity
Can’t we just feed all our data into machine learning models which magically find the right answers for us?
Unfortunately not: most data scientists spend the majority of their time preparing, cleaning and
organizing data instead of analysing data and training models…
Many data-driven ML applications are found to reproduce and amplify existing bias and discrimination.
Parallelism & Scalability
Task Parallelism
Also known as “multi-tasking”
Execute many independent tasks at once
Example: Operating system executing different processes at once on a multi-core machine
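Task parallelism can be sketched in a few lines of Python: several unrelated tasks run concurrently and need no coordination beyond collecting their results. The task functions here are made-up placeholders, not part of any real system.

```python
# Task parallelism sketch: independent tasks running at once,
# loosely analogous to an OS running different processes on a
# multi-core machine. The three tasks are hypothetical examples.
from concurrent.futures import ThreadPoolExecutor

def fetch_report():      # hypothetical task 1
    return "report ready"

def resize_images():     # hypothetical task 2
    return "images resized"

def send_emails():       # hypothetical task 3
    return "emails sent"

def run_tasks():
    tasks = [fetch_report, resize_images, send_emails]
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        # The tasks are independent: they can run in any order,
        # on any worker, without communicating with each other.
        futures = [pool.submit(task) for task in tasks]
        return [f.result() for f in futures]
```

Note that the tasks differ from each other; this is what distinguishes task parallelism from data parallelism, where the *same* task runs on different data.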
Data Parallelism
Execute the same task in parallel on different slices of the data
Example: query processing in modern cloud databases, which store partitions of the data on different machines
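As a minimal sketch of data parallelism: the same task (here, a toy word count) is applied to different slices of the data in parallel, and the partial results are combined at the end. The data, the partitioning scheme, and the task are illustrative assumptions, not a real query engine.

```python
# Data parallelism sketch: the SAME task (word counting) runs in
# parallel on different slices (partitions) of the data.
from multiprocessing import Pool

def count_words(partition):
    # The same task, applied to one slice of the data.
    return sum(len(line.split()) for line in partition)

def parallel_word_count(lines, workers=4):
    # Split the data into roughly equal slices, one per worker.
    chunk = max(1, len(lines) // workers)
    partitions = [lines[i:i + chunk] for i in range(0, len(lines), chunk)]
    with Pool(workers) as pool:
        partial_counts = pool.map(count_words, partitions)
    # Combine the partial results from all slices.
    return sum(partial_counts)

if __name__ == "__main__":
    data = ["hello world", "big data analytics"] * 100
    print(parallel_word_count(data))
```

In a cloud database the partitions would live on different machines rather than in one process's memory, but the shape of the computation (partition, process in parallel, combine) is the same.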
Pipeline Parallelism
Break tasks into a sequence of processing stages
Each stage takes the result from the previous stage as input, with results being passed downstream immediately
Example: instruction pipelining in modern CPUs
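The stage structure can be sketched with threads connected by queues: each stage runs concurrently and forwards each result downstream as soon as it is ready, a software analogue of CPU instruction pipelining. The stage functions and sentinel convention are assumptions for illustration.

```python
# Pipeline parallelism sketch: a task broken into a sequence of
# stages, each running in its own thread, passing results
# downstream through queues as soon as they are produced.
from queue import Queue
from threading import Thread

SENTINEL = None  # marks the end of the input stream

def stage(func, inbox, outbox):
    # Repeatedly take an item from upstream, process it, and
    # immediately pass the result downstream.
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)
            break
        outbox.put(func(item))

def run_pipeline(items, funcs):
    # One queue between each pair of adjacent stages.
    queues = [Queue() for _ in range(len(funcs) + 1)]
    threads = [Thread(target=stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(funcs)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)
    results = []
    while (out := queues[-1].get()) is not SENTINEL:
        results.append(out)
    for t in threads:
        t.join()
    return results
```

While item 3 is in the first stage, item 2 can already be in the second stage and item 1 in the third; all stages are busy at once even though each item flows through them sequentially.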
Scalability
Ability of a system to handle a growing amount of work by adding resources to the system
A distinction is often made based on how resources are added:
Scale-up: replace machine with “beefier” machine (More RAM, more Cores)
Scale-out: add more machines of the same type
Desired goal in practice:
Linear scalability with number of machines/cores in scale-out settings
“Elastic” scaling in cloud environments
Scalability ≠ Performance
A common misconception is that scalable systems are also automatically performant.
Scalability often comes with increased overheads, especially in distributed settings (e.g., network communication, coordination overhead).