Lecture 1 Introduction

PDM - Introduction

International University, VNU-HCMC

School of Computer Science and Engineering

Lecture 1:
INTRODUCTION

Instructor: Nguyen Thi Thuy Loan


https://nttloan.wordpress.com/

Course Website
• Blackboard IU
• Please check frequently for updates!
Assoc. Prof. Nguyen Thi Thuy Loan, PhD


Acknowledgement
• The following slides draw on material from Duke University.

Outline
• Data/ Information/ Knowledge
• Database
• Database management system


Data/ Information/ Knowledge



What comes to your mind when you think about “databases”?

OpenAI – DALL-E


But these use databases too…



Data → science & health


https://www.npr.org/sections/goatsandsoda/2023/04/21/1171245878/how-do-you-get-equal-health-care-for-all-a-huge-new-database-holds-clues

Data → fun and profit



Democratizing data (and analysis)


• Democratization of data: more data, relevant to you and to
society, are being collected
o “Smart planet”: sensors for phones and cars, roads and
bridges, buildings and forests, …
o “Government in the sunshine”: spending reports, school
performance, crime reports, corporate filings, campaign
contributions…
• But few people know how to analyze them
o Even fewer know how to analyze them responsibly and at
scale
• You will learn how to help bridge this divide

Computational challenge
• Moore’s Law:
Processing power doubles every 18 months
• But the amount of data doubles every 9 months
o Disk sales (# of bits) double every nine months
o Parkinson’s Law:
Data expands to fill the space available for storage.

The storage
http://www.micronautomata.com/big_data

Moore’s Law reversed


Time to process all data doubles every 18 months!
• Does your attention span double every 18 months?
o No, so we need smarter data management and
processing techniques.
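
The 18-month figure can be checked with a short calculation. This is a sketch under an assumption the slide does not state: that the time T(t) to process all data is proportional to the data volume D(t) divided by the processing power P(t). With t in months, data doubling every 9 months and processing power doubling every 18 months:

```latex
D(t) = D_0 \cdot 2^{t/9}, \qquad P(t) = P_0 \cdot 2^{t/18}
\quad\Longrightarrow\quad
T(t) \propto \frac{D(t)}{P(t)}
     = \frac{D_0}{P_0} \cdot 2^{t/9 - t/18}
     = \frac{D_0}{P_0} \cdot 2^{t/18}
```

So under that assumption the processing time itself doubles every 18 months.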


What is a database?
• In computing, a database is an organized collection of
data stored and accessed electronically. Small
databases can be stored on a file system, while large
databases are hosted on computer clusters or cloud
storage.
• Ex:

https://en.wikipedia.org/

What is a database system?


From Oxford Dictionary:
• Database: an organized body of related information
• Database system, DataBase Management System
(DBMS): a software system that facilitates the
creation, maintenance, and use of an electronic
database


What do you want from a DBMS?


• Keep data around (persistent)
• Answer questions (queries) about data
• Update data


What do you want from a DBMS?


• Example: a traditional banking application
• Data: Each account belongs to a branch, has a number,
an owner, a balance, …; each branch has a location, a
manager, …
• Persistence: A balance can’t disappear after a power
outage
• Query: What’s the balance in Homer Simpson’s
account? What’s the difference in average balance between
Springfield and Capitol City accounts?
• Modification: Homer withdraws $100; charge a $5 fee to
accounts with a balance below $500.


Sounds simple!
1001#Springfield#Mr. Morgan
... ...
00987-00654#Ned Flanders#2500.00
00123-00456#Homer Simpson#400.00
00142-00857#Montgomery Burns#1000000000.00
... ...

• Text files
• Accounts/branches separated by newlines
• Fields separated by #’s


Query by programming
1001#Springfield#Mr. Morgan
... ...
00987-00654#Ned Flanders#2500.00
00123-00456#Homer Simpson#400.00
00142-00857#Montgomery Burns#1000000000.00
... ...

• What’s the balance in Homer Simpson’s account?


• A simple script
• Scan through the accounts file
• Look for the line containing “Homer Simpson”
• Print out the balance
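
The steps above can be sketched in Python. This is a minimal sketch, assuming each account line has exactly the three #-separated fields shown in the sample data; the names ACCOUNTS and balance_of are hypothetical, and the sketch compares the owner field exactly rather than searching the whole line for the name.

```python
# Sample account records in the slide's format: number#owner#balance.
# In practice these lines would be read from the accounts file.
ACCOUNTS = """\
00987-00654#Ned Flanders#2500.00
00123-00456#Homer Simpson#400.00
00142-00857#Montgomery Burns#1000000000.00
"""

def balance_of(owner, lines):
    """Scan every record and return the balance of the first match, or None."""
    for line in lines:
        fields = line.rstrip("\n").split("#")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        number, name, balance = fields
        if name == owner:
            return float(balance)
    return None

print(balance_of("Homer Simpson", ACCOUNTS.splitlines()))  # prints 400.0
```

Every query pays for a full scan of the file, which is what motivates the tricks on the next slide.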


Query processing tricks


• Tens of thousands of accounts are not Homer’s
o Cluster accounts by owner’s initial: those owned by
“A...” go into file A; those owned by “B...” go into file B;
etc. → decide which file to search using the initial
o Keep accounts sorted by owner name → binary search?
o Hash accounts using owner name → compute file offset
directly
o Index accounts by owner name: index entries have the
form (owner_name, file_offset) → search index to get
file offset
o And the list goes on…

• What happens when the query changes to: What’s the
balance in account 00142-00857?
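
Two of the tricks above, an index of (owner_name, file_offset) entries and binary search, can be combined in a short Python sketch. It is only an illustration: an in-memory io.BytesIO buffer stands in for the accounts file, and build_index and lookup are hypothetical helper names.

```python
import bisect
import io

DATA = (
    b"00987-00654#Ned Flanders#2500.00\n"
    b"00123-00456#Homer Simpson#400.00\n"
    b"00142-00857#Montgomery Burns#1000000000.00\n"
)

def build_index(f):
    """Return a sorted list of (owner_name, file_offset) index entries."""
    entries = []
    f.seek(0)
    while True:
        offset = f.tell()          # where the next record starts
        line = f.readline()
        if not line:
            break
        owner = line.decode().split("#")[1]
        entries.append((owner, offset))
    entries.sort()
    return entries

def lookup(f, index, owner):
    """Binary-search the index, then read one record at the stored offset."""
    i = bisect.bisect_left(index, (owner, 0))
    if i == len(index) or index[i][0] != owner:
        return None
    f.seek(index[i][1])
    number, name, balance = f.readline().decode().rstrip("\n").split("#")
    return float(balance)

accounts = io.BytesIO(DATA)
index = build_index(accounts)
print(lookup(accounts, index, "Homer Simpson"))  # prints 400.0
```

This index is keyed by owner name, so it does not help with a lookup by account number; that query would need a second index or a full scan.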


Observations
• There are many techniques, not only in storage and
query processing but also in concurrency control,
recovery, etc.
• These techniques get used over and over again in
different applications
• Different techniques may work better in different
usage scenarios.


The birth of DBMS – 1


From Hans-J. Schek’s VLDB 2000 slides


The birth of DBMS – 2


From Hans-J. Schek’s VLDB 2000 slides


The birth of DBMS – 3


From Hans-J. Schek’s VLDB 2000 slides


Early efforts
• “Factoring out” data management functionalities from
applications and standardizing these functionalities is an
important first step
• CODASYL standard (circa 1960s)
• Bachman got a Turing Award for this in 1973
• However, getting the abstraction right (the API between
applications and the DBMS) is still tricky.


CODASYL
• Query: Who has accounts with 0 balance managed by a branch
in Springfield?
• Pseudo-code of a CODASYL application:
Use the index on account(balance) to get accounts with 0 balance;
For each account record:
    Get the branch ID of this account;
    Use the index on branch(id) to get the branch record;
    If the branch record’s location field reads “Springfield”:
        Output the owner field of the account record.
• The programmer controls “navigation”: accounts → branches
o How about branches → accounts?
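
The navigation in the pseudo-code can be mimicked in Python to show how much the programmer has to know. This is only an illustration, not CODASYL syntax: dictionaries stand in for the two indexes, and all records, field names, and the function zero_balance_owners_in are hypothetical.

```python
# Hypothetical records; the field names and values are illustrative only.
branches = [
    {"id": 1001, "location": "Springfield"},
    {"id": 1002, "location": "Capitol City"},
]
accounts = [
    {"number": "A-1", "owner": "Homer Simpson", "balance": 0, "branch_id": 1001},
    {"number": "A-2", "owner": "Ned Flanders", "balance": 2500, "branch_id": 1001},
    {"number": "A-3", "owner": "Montgomery Burns", "balance": 0, "branch_id": 1002},
]

# "Indexes" the programmer must know about: account(balance) and branch(id).
accounts_by_balance = {}
for acct in accounts:
    accounts_by_balance.setdefault(acct["balance"], []).append(acct)
branch_by_id = {b["id"]: b for b in branches}

def zero_balance_owners_in(location):
    """Navigate accounts -> branches, following the pseudo-code step by step."""
    owners = []
    for acct in accounts_by_balance.get(0, []):
        branch = branch_by_id[acct["branch_id"]]
        if branch["location"] == location:
            owners.append(acct["owner"])
    return owners

print(zero_balance_owners_in("Springfield"))  # prints ['Homer Simpson']
```

Navigating branches → accounts instead would require a different index (for example, on the branch ID of each account) and a rewritten loop, so the application code changes whenever the access path does.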

What’s wrong?
• The best navigation strategy & the best way of organizing the
data depend on data/workload characteristics

With the CODASYL approach:
• To write correct code, programmers need to know how data is
organized physically (e.g., which indexes exist)
• To write efficient code, programmers also need to worry about
data/workload characteristics.
• Can’t cope with changes in data/workload characteristics.


The relational revolution (1970s)

• A simple model: data is stored in relations (tables)
• A declarative query language: SQL
SELECT Account.owner
FROM Account, Branch
WHERE Account.balance = 0
  AND Branch.location = 'Springfield'
  AND Account.branch_id = Branch.branch_id;
• The programmer specifies what answers a query should
return but not how the query is executed.
• The DBMS picks the best execution strategy based on the
availability of indexes, data/workload characteristics, etc.
• This provides physical data independence
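
To see the declarative query at work, it can be run against a small in-memory database using Python's standard sqlite3 module. This is a sketch under assumptions: the table definitions, the extra columns, and the sample rows are hypothetical, chosen only so that the slide's query has something to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, not taken from the slides.
conn.executescript("""
CREATE TABLE Branch (branch_id INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE Account (number TEXT PRIMARY KEY, owner TEXT,
                      balance REAL, branch_id INTEGER);
INSERT INTO Branch VALUES (1001, 'Springfield'), (1002, 'Capitol City');
INSERT INTO Account VALUES
    ('A-1', 'Homer Simpson', 0, 1001),
    ('A-2', 'Ned Flanders', 2500, 1001),
    ('A-3', 'Montgomery Burns', 0, 1002);
""")

QUERY = """
SELECT Account.owner
FROM Account, Branch
WHERE Account.balance = 0
  AND Branch.location = 'Springfield'
  AND Account.branch_id = Branch.branch_id;
"""

rows_before = conn.execute(QUERY).fetchall()
# Changing the physical design (adding an index) needs no change to the query.
conn.execute("CREATE INDEX idx_account_balance ON Account(balance)")
rows_after = conn.execute(QUERY).fetchall()
print(rows_before, rows_after)
```

The query text is identical before and after the index is created; whether and how the index is used is left to the DBMS.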


Physical data independence


• Applications should NOT worry about how data is
physically structured and stored
• Applications should work with a logical data model and
declarative query language
• Leave the implementation details and optimization to
DBMS
• The single most important reason behind the success of
DBMS today
o And a Turing Award for E. F. Codd in 1981


Standard DBMS features


• Persistent storage of data
• Logical data model; declarative queries and updates →
physical data independence
o The relational model is the dominant technology
today
• What else?


DBMS is multi-user
• Example
get account balance from the database;
if balance > amount of withdrawal, then
    balance = balance - amount of withdrawal;
    dispense cash;
    store new balance into the database;
• Homer at ATM1 withdraws $100
• Marge at ATM2 withdraws $50
• Initial balance = $400, final balance = ?
o Should be $250 no matter who goes first


Final balance = $300


Homer withdraws $100:                  Marge withdraws $50:
read balance; $400
                                       read balance; $400
                                       if balance > amount then
                                         balance = balance - amount; $350
                                       write balance; $350
if balance > amount then
  balance = balance - amount; $300
write balance; $300


Final balance = $350


Homer withdraws $100:                  Marge withdraws $50:
read balance; $400                     read balance; $400
if balance > amount then
  balance = balance - amount; $300     if balance > amount then
write balance; $300                      balance = balance - amount; $350
                                       write balance; $350
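
The two interleavings above, and the correct serial case, can be replayed deterministically in Python. This is a minimal simulation, not how a DBMS executes transactions: run_schedule is a hypothetical helper, each customer keeps a private copy of the balance between the read and the write, and no real threads are involved.

```python
def run_schedule(schedule, initial=400):
    """Replay read/write steps of two withdrawals against one shared balance.

    schedule is a list of (customer, step) pairs, where step is "read" or
    "write"; each customer computes on a private copy between the two steps.
    """
    amounts = {"Homer": 100, "Marge": 50}
    balance = initial          # the value stored in the database
    local = {}                 # each customer's private copy
    for who, step in schedule:
        if step == "read":
            local[who] = balance
        elif step == "write":
            if local[who] > amounts[who]:
                balance = local[who] - amounts[who]
    return balance

serial = [("Homer", "read"), ("Homer", "write"),
          ("Marge", "read"), ("Marge", "write")]
homer_last = [("Homer", "read"), ("Marge", "read"),
              ("Marge", "write"), ("Homer", "write")]
marge_last = [("Homer", "read"), ("Marge", "read"),
              ("Homer", "write"), ("Marge", "write")]

print(run_schedule(serial))      # prints 250
print(run_schedule(homer_last))  # prints 300
print(run_schedule(marge_last))  # prints 350
```

In both bad schedules the second writer overwrites the first writer's update because it computed from a stale read, so one withdrawal is lost.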


Concurrency control in DBMS


• Similar to concurrent programming problems?
o But data, not main-memory variables
• Similar to file system concurrent access?
o Lock the whole table before access
§ This was the approach taken by MySQL in the old days
§ Coarse-grained locking is still used by SQLite (as of Version 3)
§ But we want to control it at a much finer granularity.
Otherwise, one withdrawal would lock up all
accounts!


Recovery in DBMS
• Example: balance transfer
decrement the balance of account X by $100;
increment the balance of account Y by $100;
• Scenario 1: Power goes out after the first operation
• Scenario 2: DBMS buffers and updates data in memory (for
efficiency); before the updates are written back to disk, power
goes out
How can the DBMS deal with these failures?
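
One part of the answer is to treat the two operations as a single all-or-nothing transaction. The sketch below illustrates that idea with Python's standard sqlite3 module; the table, the starting balances, and the transfer function are hypothetical, and a raised exception only stands in for a failure after the first operation (it is not a real power outage).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical accounts X and Y with illustrative starting balances.
conn.execute("CREATE TABLE Account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO Account VALUES ('X', 500), ('Y', 200)")
conn.commit()

def transfer(conn, amount, fail_midway=False):
    """Move amount from X to Y as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Account SET balance = balance - ? WHERE name = 'X'",
                (amount,))
            if fail_midway:
                raise RuntimeError("simulated failure after the first operation")
            conn.execute(
                "UPDATE Account SET balance = balance + ? WHERE name = 'Y'",
                (amount,))
    except RuntimeError:
        pass  # the partial update has already been rolled back

def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM Account"))

transfer(conn, 100, fail_midway=True)
after_failure = balances(conn)
transfer(conn, 100)
after_success = balances(conn)
print(after_failure, after_success)
```

After the simulated failure neither account has changed; after the successful run both have. Surviving an actual crash additionally requires the DBMS to record enough information on disk to undo or redo work at restart, which this in-memory sketch does not show.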


Standard DBMS features: summary


• Persistent storage of data
• Logical data model; declarative queries and updates
→ physical data independence
• Multi-user concurrent access
• Safety from system failures
• Performance, performance, performance
o Massive amounts of data (terabytes to petabytes)
o High throughput (thousands to millions of transactions/hour)
o High availability (≥ 99.999% uptime)


Standard DBMS architecture


• Much of the OS may be bypassed for performance
and safety.
• Throughout the semester, we will fill in many details
of the DBMS box.


AYBABTU?
“Us” = relational databases
• Most data are not in them!
o Personal data, web, scientific data, system data, …
• Text and semi-structured data management
o XML, JSON, …
• “NoSQL” and “NewSQL” movement
o MongoDB, Cassandra, BigTable, HBase, Spanner, HANA,
Spark…
• This course will look beyond relational databases

Use of AYBABTU inspired by Garcia-Molina


Image: http://upload.wikimedia.org/wikipedia/en/0/03/Aybabtu.png


Thank you for your attention!
