Evolution of Data Management
Database Management Systems
Early days → DBMS
• Collecting, storing, processing and retrieving data
• Applications built on top of file systems
Examples:
• Bank
• Hospital
Redundancies → same info in more than one place
Inconsistencies → different values for the same info
• Plethora of drawbacks
1. Data redundancy and inconsistency
◦ Multiple data formats, duplication in different files
2. Difficulty in accessing data
◦ Need to write a new program to carry out each new task
3. Data isolation
◦ Multiple files and formats
4. Integrity problems
◦ Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly
◦ Hard to add new constraints or change existing ones
5. Atomicity of updates
◦ Example: Transfer of funds from one account to another should either be completed or not happen at all (see the sketch after this list)
◦ Failures may leave data in an inconsistent state with partial updates carried
out
6. Concurrent access by multiple users
◦ Needed for performance
◦ Example: Two people reading a balance (e.g., 100) and then withdrawing money (e.g., 50 each) at the same time
◦ Uncontrolled concurrent accesses can lead to inconsistencies
7. Security problems
◦ Hard to provide user access to some, but not all, data
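To make the integrity and atomicity drawbacks (4 and 5) concrete, here is a minimal sketch of how a DBMS states a constraint declaratively and applies a multi-step update atomically, using Python's built-in sqlite3 module; the account table, its values, and the transfer amounts are illustrative assumptions rather than anything from the notes.

```python
import sqlite3

# In-memory database; schema and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")  # constraint stated explicitly, not buried in code
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move `amount` between accounts atomically: both updates commit or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on any error
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
    except sqlite3.IntegrityError:
        # e.g., the CHECK constraint rejected an overdraft; no partial update remains
        print("transfer aborted, balances unchanged")

transfer(1, 2, 70)   # succeeds
transfer(1, 2, 500)  # would violate balance >= 0, so it is rolled back as a whole
print(conn.execute("SELECT id, balance FROM account").fetchall())  # [(1, 30), (2, 120)]
```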
Relational Database Management Systems
Following Developments → RDBMS
◦ Designed to address DBMS drawbacks/inefficiencies
◦ Data is stored in the form of tables
◦ Maintaining the relationships among tables
◦ Supports large data sizes, distribution, many users, multiple levels of data security, integrity constraints, etc. (a minimal schema sketch follows)
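A minimal sketch of the points above (data stored in tables, with the relationships among tables maintained by the system), again using Python's sqlite3; the customer/account schema and the query are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce relationships between tables

# Data lives in tables; the foreign key maintains the relationship among them.
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE account  (id INTEGER PRIMARY KEY,
                       owner_id INTEGER NOT NULL REFERENCES customer(id),
                       balance  INTEGER NOT NULL);
INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO account  VALUES (10, 1, 100), (11, 2, 50);
""")

# Declarative access: one SQL query instead of a hand-written file-reading program.
rows = conn.execute("""
    SELECT c.name, SUM(a.balance)
    FROM customer c JOIN account a ON a.owner_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # [('Alice', 100), ('Bob', 50)]

# The relationship is enforced: an account owned by a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO account VALUES (12, 99, 10)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```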
Internet growth
- The example of Wikipedia
• Free content encyclopedia
• Among the popular websites
• Written/maintained: community of volunteer contributors
• Various actions available to the average Web user!
• Create new articles
• Extend existing articles
• Translate to other languages
• Events appear within minutes
Big Data and its challenges
• Reality: ever-increasing data, demanding users
Big Data
• Information assets that require new forms of processing to enable enhanced decision
making and insight discovery
• Definition of Big Data expressed through Vs
1. Volume:
◦ Amount of generated and stored data
◦ Wikipedia: 6.5M English articles, users, other languages, etc.
2. Velocity:
◦ Rate/Speed at which the data is generated, received, collected, and
(perhaps) processed
◦ Wikipedia: e.g., 6,000 editors make more than 100 edits per month on the English articles
3. Variety:
◦ Different types of data that are available
◦ RDBMS: structured data that fits neatly
◦ Web systems, e.g., Wikipedia: unstructured and semistructured data types,
such as text, audio, and video
◦ Requires additional preprocessing to derive meaning and support metadata
4. Veracity:
◦ Quality of captured data
◦ Truthfulness of the data and how much we can rely on it
◦ Low veracity → high percentage of meaningless data (e.g., noise)
5. Value:
◦ Refers to the inherent wealth (i.e., economic and social) embedded in the
data
◦ Consider the biggest tech companies: a large part of their value comes from their data, which they constantly analyze to improve efficiency and develop new products
Even more Big Data Characteristics
• Visualization:
◦ Display the data
◦ Technical issues due to limitations of in-memory technology, scalability,
response time, etc.
• Volatility:
◦ Everything changes … thus we always need to be aware of whether data is now irrelevant, historic, or just not useful
• Vulnerability:
◦ New security concerns
Example
Data Integration
• Entities encode a large part of our knowledge
• Valuable asset for numerous current applications and (Web) systems
• A plethora of different objects share the same name
• Example: London
Entity Resolution
• Task that identifies and aggregates the different descriptions that refer to the same real-world objects (a minimal sketch follows the list of domains below)
• Primary usefulness:
◦ Improves data quality and integrity
◦ Fosters re-use of existing data sources
• Example application domains:
◦ Linked Data
◦ Building Knowledge Graphs
◦ Census data
◦ Price comparison portals
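As a rough illustration of the task (not a method taken from the notes), the sketch below compares descriptions pairwise with a simple character-level similarity and aggregates the pairs above a threshold; the toy records, the chosen attributes, and the 0.8 threshold are all assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Toy descriptions of real-world entities; note the two variants of London, UK
# and the unrelated London in Canada.
descriptions = [
    {"id": 1, "name": "London", "country": "United Kingdom"},
    {"id": 2, "name": "Londn",  "country": "United Kingdom"},   # misspelled duplicate
    {"id": 3, "name": "London", "country": "Canada"},           # London, Ontario
    {"id": 4, "name": "Paris",  "country": "France"},
]

def similarity(a, b):
    """Average character-level similarity over the shared attributes."""
    fields = ["name", "country"]
    scores = [SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in fields]
    return sum(scores) / len(scores)

# Pairs above the threshold are taken to describe the same real-world object
# and would be aggregated into a single profile.
matches = [(a["id"], b["id"])
           for a, b in combinations(descriptions, 2)
           if similarity(a, b) >= 0.8]
print(matches)  # [(1, 2)] -- only the two descriptions of London, UK
```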
Data Management
• Challenges arise from the application settings
• Examples:
◦ Data characteristics
◦ System and resources
◦ Time restrictions
• Evolving nature of the application settings implies a constant modification of the challenges
• Primary reason for the plethora of Entity Resolution methods
Challenges
Veracity
• No longer only structured data with known semantics and quality
• Dealing with high levels of description noise
+ Volume
• Very large number of descriptions
+ Variety
• Large volumes of semi-structured, unstructured or highly heterogeneous structured
data
+ Velocity
• Continuously increasing volume of available data
Challenges in time
Big Data refers to the inherent wealth, economic and social, embedded in any data collection
- Data storage
- Finding the needle in the haystack
- Data processing
- Scalability
Architectural choices to consider
• Storage layer
• Programming model & execution engine
• Scheduling
• Optimizations
• Fault tolerance
• Load balancing
Scalability in data management (Chronological order)
Traditional databases
◦ Constrained functionality: SQL only
◦ Efficiency limited by server capacity
- Memory
- CPU (central processing unit)
- HDD (hard disk drive)
- Network
• Scaling can be done by
◦ Adding more hardware
◦ Creating better algorithms
- But we can still reach the limits
Distributed databases
• Innovation:
◦ Add more DBMSs & partition the data
• Constrained functionality:
◦ Answer SQL queries
• Efficiency limited by network, #servers
• API offers location transparency
◦ User/application always sees a single machine
◦ User/application does not care about data location (a minimal sketch follows this list)
• Scaling: add more/better servers, faster network
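A minimal sketch of partitioning data across several servers behind a location-transparent API; the DistributedStore class and its dictionary-backed "servers" are illustrative assumptions standing in for real DBMS instances reached over the network.

```python
class DistributedStore:
    """The application talks to this one object and never learns which server holds a key."""

    def __init__(self, num_servers):
        # Each server is modelled as an in-memory dict; in reality these would be
        # separate DBMS instances on different machines.
        self.servers = [dict() for _ in range(num_servers)]

    def _server_for(self, key):
        # Hash partitioning decides where each record lives.
        return self.servers[hash(key) % len(self.servers)]

    def put(self, key, value):
        self._server_for(key)[key] = value

    def get(self, key):
        return self._server_for(key).get(key)

store = DistributedStore(num_servers=3)
store.put("alice", {"balance": 100})
store.put("bob", {"balance": 50})
print(store.get("alice"))  # the caller sees a single store, not three partitions
```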
Massively parallel processing platforms
• Innovation:
◦ Connect computers (nodes) over a LAN & make development, parallelization, and
robustness easy
• Functionality:
◦ Generic data-intensive computing
• Efficiency relies on network, #computers, and algorithms
• API offers location & parallelism transparency
◦ Developers don’t need to know where data is stored or how the code will be parallelized
• Scaling:
◦ Add more and/or better computers
Cloud
• Massively parallel processing platforms running over rented hardware
• Innovation: Elasticity, standardization
◦ Amazon requires huge computational capacity near holidays
◦ A university requires very few resources during holidays
• Resources can be automatically adjusted (elasticity)
• API offers location and parallelism transparency
• Scaling: it’s magic!
Big Data models
Store, manage, and process Big Data by harnessing large clusters of commodity nodes
• MapReduce family: simpler, more constrained (a single-machine sketch follows this list)
• 2nd generation: enables more complex processing and data, optimization opportunities
- Apache Spark, Google Pregel, Microsoft Dryad
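A single-machine sketch of the MapReduce programming model named above (word count): the developer writes only the map and reduce functions, while a real platform would distribute them over a cluster and handle the shuffle, scheduling, and fault tolerance. The function names and sample documents are illustrative.

```python
from collections import defaultdict

# Developer-supplied logic: map emits (key, value) pairs, reduce aggregates all
# values collected for one key.
def map_fn(line):
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    return word, sum(counts)

def mapreduce(lines):
    """Single-process stand-in for the platform's parallel map, shuffle, and reduce phases."""
    groups = defaultdict(list)
    for line in lines:                      # map phase
        for key, value in map_fn(line):
            groups[key].append(value)       # shuffle: group values by key
    return dict(reduce_fn(k, v) for k, v in groups.items())   # reduce phase

documents = ["big data needs new processing models",
             "MapReduce models data processing over clusters"]
print(mapreduce(documents))  # e.g. {'big': 1, 'data': 2, ..., 'models': 2, ...}
```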
Big Data Analytics (according to IBM)
• Driven by artificial intelligence, mobile devices, social media and the Internet of Things
(IoT)
• Data sources are becoming more complex than those for traditional data
◦ e.g., Web applications allow user-generated data
- Deliver deeper insights
- Power innovative data applications
- Better and faster decision-making
- Predicting future outcomes
- Enhanced business intelligence
Analytics
• Traditional computation (e.g., SQL):
◦ Exact and all answers over the whole data collection
Interactive processing:
• Users give their opinion during the processing
• Thus:
- Users understand the problem
- Users influence decisions
• ER: system users are asked to help during the processing, i.e., their answers are
considered as part of the algorithm
Crowdsourcing processing:
• Difficult tasks or opinion questions in the processing are given to a group of people
• ER: humans are asked about the relation between profiles for a small compensation per
reply
Approximate processing:
• Use a representative sample instead of the entire input data collection
• Give approximate output rather than exact answers
• Answers are given with guarantees (see the sampling sketch below)
• ER: profiles are the same with 95% certainty
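A minimal sketch of answering from a representative sample: it estimates the fraction of matching pairs and reports a normal-approximation 95% interval instead of an exact count; the synthetic is_match check, population size, and sample size are assumptions.

```python
import math
import random

random.seed(0)

def is_match(pair_id):
    """Stand-in for an expensive comparison; pretend ~10% of all pairs are true matches."""
    return pair_id % 10 == 0

population = range(1_000_000)                 # the entire input collection of pairs
sample = random.sample(population, k=2_000)   # representative sample instead of all of it

p_hat = sum(is_match(p) for p in sample) / len(sample)
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / len(sample))  # ~95% confidence half-width

print(f"estimated match rate: {p_hat:.3f} ± {margin:.3f} (95% confidence)")
```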
Progressive processing:
• Efficiently process the data given the limited time and/or computational resources that are currently available
• ER: results are shown as soon as they are available (a small generator-based sketch follows)
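A small generator-based sketch of the progressive idea: matches are emitted as soon as they are found, and comparison simply stops when the time budget runs out; the toy matching rule, record layout, and budget are assumptions.

```python
import time
from itertools import combinations

def progressive_matches(descriptions, budget_seconds):
    """Yield each match as soon as it is found; stop when the time budget is spent."""
    deadline = time.monotonic() + budget_seconds
    for a, b in combinations(descriptions, 2):
        if time.monotonic() > deadline:
            return                                    # budget exhausted; partial results were already emitted
        if a["name"].lower() == b["name"].lower():    # toy matching rule
            yield a["id"], b["id"]

people = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "ann"}, {"id": 3, "name": "Bob"}]
for match in progressive_matches(people, budget_seconds=0.5):
    print("found so far:", match)   # results appear as soon as they are available
```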
Incremental processing:
• The rate of data updates is often high, which quickly makes the previous results obsolete
• Update existing processing information rather than recomputing from scratch (see the sketch below)
• Allow leveraging new evidence from updates to:
• Fix previous inconsistencies or
• Complete the information
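A minimal sketch of the incremental idea for ER: each arriving description is compared only against already-indexed descriptions that share a cheap blocking key, so earlier results are refined rather than recomputed from scratch; the blocking key, toy matching rule, and records are assumptions.

```python
from collections import defaultdict

# Index of already-processed descriptions, grouped by a cheap blocking key.
index = defaultdict(list)

def blocking_key(desc):
    return desc["name"].strip().lower()

def add_description(desc):
    """Process one update: compare only within its block, then index it."""
    key = blocking_key(desc)
    matches = [old["id"] for old in index[key]
               if old["country"].lower() == desc["country"].lower()]  # toy matching rule
    index[key].append(desc)
    return matches

# Initial data, then an update arriving later; previous results are refined, not rebuilt.
print(add_description({"id": 1, "name": "London", "country": "UK"}))       # []
print(add_description({"id": 2, "name": "London", "country": "Canada"}))   # []
print(add_description({"id": 3, "name": "london ", "country": "uk"}))      # [1]
```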