Course Intro & Relational Model
Lecture #01
Database Systems 15-445/15-645, Fall 2017
Andy Pavlo, Computer Science Dept., Carnegie Mellon Univ.
Wait List
Overview
Course Logistics
Relational Model
Homework #1
CMU 15-445/645 (Fall 2017)
WAIT LIST
There are currently 130 people on
the waiting list.
Max capacity is 90.
We will enroll people from the
waiting list in the order that you
complete Homework #1.
COURSE OVERVIEW
This course is on the design and
implementation of disk-oriented
database management systems.
This is not a course on how to use
a database to build applications
or how to administer a database.
→ See CMU 95-703 (Heinz College)
COURSE OUTLINE
Relational Databases
Storage
Execution
Concurrency Control
Recovery
Distributed Databases
Potpourri
COURSE LOGISTICS
Course Policies + Schedule:
→ Refer to course web page.
Academic Honesty:
→ Refer to CMU policy page.
All discussion + announcements
will be on Canvas.
TEXTBOOK
Database System Concepts
6th Edition
Silberschatz, Korth, & Sudarshan
We will also provide lecture notes
that cover topics not found in
the textbook.
COURSE RUBRIC
Homeworks (15%)
Projects (45%)
Midterm Exam (20%)
Final Exam (20%)
Extra Credit (+10%)
HOMEWORKS
Six homework assignments
throughout the semester.
First homework is a SQL
assignment. The rest will be
pencil-and-paper assignments.
All homeworks should be done
individually.
PROJECTS
Four programming projects based
on the SQLite DBMS.
→ You will build your own storage
manager from scratch.
We will not teach you how to
write/debug C++11 code.
See 2015 video from SQLite
creator for more info.
LATE POLICY
You are allowed 4 slip days for
either homeworks or projects.
An assignment loses points for every 24hrs it is late.
Mark on your submission (1) how
many days you are late and (2)
how many late days you have left.
PLAGIARISM WARNING
The homeworks and projects
must be your own work.
You may not copy source code
from other groups or the web.
Plagiarism will not be tolerated.
See CMU's Policy on Academic
Integrity for additional information.
EXAMS
Mid-term Exam (October 18)
Final Exam (End of Semester)
Closed book.
One sheet of handwritten notes.
EXTRA CREDIT
Pick a DBMS and get standard
database benchmarks to run on it.
→ Can be either OLTP or OLAP system.
→ We already have an open-source
testing framework that you can use.
→ We will give you EC2 credits.
→ Groups of at most three people.
We will provide more information
later in the semester.
Databases
DATABASE
Organized collection of inter-related
data that models some aspect of
the real world.
Databases are the core
component of most computer
applications.
DATABASE EXAMPLE
Create a database that models a
digital music store.
Things we need to store:
→ Information about Artists
→ What Albums those Artists released
→ The Tracks on those Albums
ENTITY-RELATIONSHIP DIAGRAM
Artists have names, year that they
started, and country of origin.
Albums have names, release year.
Tracks have a name and number.
An Album has one or more Artists.
An Album has multiple Tracks.
A Track can appear only on one
Album.
[Diagram: Artist (name, year, country) n:n "has" Album (name, year); Album 1:n "has" Track (name, number)]
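As a rough illustration, the entities and relationships above could be written down as plain Python data classes. The class and field names are assumptions chosen to mirror the diagram, and the track names are placeholders rather than real data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Artist:
    name: str
    year: int      # year the artist started
    country: str   # country of origin


@dataclass
class Track:
    name: str
    number: int


@dataclass
class Album:
    name: str
    year: int                                            # release year
    artists: List[Artist] = field(default_factory=list)  # one or more Artists
    tracks: List[Track] = field(default_factory=list)    # multiple Tracks


# Artist and album values come from the lecture's examples;
# the two tracks are hypothetical placeholders.
wu_tang = Artist("Wu Tang Clan", 1992, "USA")
album = Album(
    "Enter the Wu Tang",
    1993,
    artists=[wu_tang],
    tracks=[Track("Placeholder Track A", 1), Track("Placeholder Track B", 2)],
)
print(album.artists[0].name, [t.number for t in album.tracks])
```

Nothing in this sketch enforces the rule that a Track appears on only one Album; that would be the application's job.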
FLAT FILE STRAWMAN
Store the data in comma-separated value (CSV) files.
→ Use a separate file per entity.
→ The application has to parse the files each time it wants to read/update records.
Artist(name, year, country)
"Wu Tang Clan",1992,"USA"
"Notorious BIG",1992,"USA"
"Ice Cube",1989,"USA"
Album(name, artist, year)
"Enter the Wu Tang","Wu Tang Clan",1993
"St.Ides Mix Tape","Wu Tang Clan",1994
Example: Get the year that Ice Cube went solo.
    import csv
    with open("artists.csv", newline="") as f:  # file name is an assumption
        for record in csv.reader(f):
            if record[0] == "Ice Cube":
                print(int(record[1]))
FLAT FILES: DATA INTEGRITY
How do we ensure that the artist
is the same for each album entry?
What if somebody overwrites the
album year with an invalid string?
How do we store that there are
multiple artists on an album?
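A minimal sketch of the second question above, assuming an Album(name, artist, year) CSV layout. Nothing in the file format stops a writer from storing a non-numeric year; the problem only surfaces when a reader tries to interpret the value. The corrupted row and the helper name `read_album_years` are hypothetical.

```python
import csv
import io

# Hypothetical contents of an Album(name, artist, year) file in which the
# second row's year has been overwritten with an invalid string.
ALBUM_CSV = (
    '"Enter the Wu Tang","Wu Tang Clan",1993\n'
    '"St.Ides Mix Tape","Wu Tang Clan","not-a-year"\n'
)


def read_album_years(text):
    """Return (valid, invalid): album name -> year, and names with bad years."""
    valid, invalid = {}, []
    for name, _artist, year in csv.reader(io.StringIO(text)):
        try:
            valid[name] = int(year)
        except ValueError:
            # The file itself never rejected this value, so every
            # application that reads it has to re-check the data.
            invalid.append(name)
    return valid, invalid


valid_years, invalid_albums = read_album_years(ALBUM_CSV)
print(valid_years)     # {'Enter the Wu Tang': 1993}
print(invalid_albums)  # ['St.Ides Mix Tape']
```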
FLAT FILES: IMPLEMENTATION
How do you find a particular
record?
What if we now want to create a
new application that uses the
same database?
What if two threads try to write to
the same file at the same time?
FLAT FILES: DURABILITY
What if the machine crashes while
What if we want to replicate the
database on multiple machines
for high availability?
DATABASE MANAGEMENT SYSTEM
A DBMS is software that allows
applications to store and analyze
information in a database.
A general-purpose DBMS is
designed to allow the definition,
creation, querying, update, and
administration of databases.
DATABASE MANAGEMENT SYSTEM
DBMSs are used in almost every
application, web site, and software
system that you can think of.
Think about the other types of
software that CMU SCS does not
DBMS TYPES: TARGET WORKLOADS
OLTP: On-line Transaction Processing
→ Fast operations that only read/update
a small amount of data each time.
OLAP: On-line Analytical Processing
→ Complex queries that read a lot of
data to compute aggregates.
HTAP: Hybrid Transaction + Analytical Processing
→ OLTP + OLAP together on the same
database instance
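To make the distinction concrete, here is a sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and its rows are hypothetical and not from the lecture. The first pair of statements is OLTP-style (it touches one row by key); the last query is OLAP-style (it reads every row to compute aggregates).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
    [(1, "alice", 10), (2, "bob", 25), (3, "alice", 5)],
)

# OLTP-style: a short operation that reads/updates a small amount of data.
conn.execute("UPDATE orders SET amount = amount + 1 WHERE id = ?", (2,))
oltp_row = conn.execute("SELECT amount FROM orders WHERE id = ?", (2,)).fetchone()

# OLAP-style: a query that reads many rows to compute aggregates.
olap_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(oltp_row)   # (26,)
print(olap_rows)  # [('alice', 15), ('bob', 26)]
```

Running both kinds of statement against one connection, as here, is loosely what HTAP refers to, although a toy in-memory table says nothing about how well a real system handles the mix.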
DBMS TYPES: DATA MODEL
Relational
Key/Value
Graph
Document
Column-family
Array / Matrix
Hierarchical
Network
RELATIONAL MODEL
A relation is an unordered set that contains the relationship of attributes that represent entities.
A tuple is a sequence of attribute values in the relation.
Artist(name, year, country)
name              year  country
Wu Tang Clan      1992  USA
Notorious B.I.G.  1992  USA
Ice Cube          1989  USA
Integrity Constraints:
→ Primary Keys
→ Foreign Keys
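The definitions of relation and tuple can be mimicked with a Python set of tuples. This is only an analogy under the assumption that attribute order within a tuple is fixed as (name, year, country); a real DBMS stores relations differently.

```python
# A relation as an unordered set of tuples; each tuple is a sequence of
# attribute values in (name, year, country) order.
artist = {
    ("Wu Tang Clan", 1992, "USA"),
    ("Notorious B.I.G.", 1992, "USA"),
    ("Ice Cube", 1989, "USA"),
}

# The same tuples listed in a different order form the same relation.
same_relation = {
    ("Ice Cube", 1989, "USA"),
    ("Wu Tang Clan", 1992, "USA"),
    ("Notorious B.I.G.", 1992, "USA"),
}
print(artist == same_relation)  # True

# Adding a tuple that is already present does not change the set.
artist.add(("Ice Cube", 1989, "USA"))
print(len(artist))  # 3
```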
RELATIONAL MODEL: PRIMARY KEYS
A primary key uniquely identifies a single tuple.
Some DBMSs support auto-generation of unique integer primary keys:
→ SEQUENCE (SQL:2003)
→ AUTO_INCREMENT (MySQL)
Artist(id, name, year, country)
id   name              year  country
123  Wu Tang Clan      1992  USA
456  Notorious B.I.G.  1992  USA
789  Ice Cube          1989  USA
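A small sketch of auto-generated keys, using SQLite through Python's sqlite3 module. SQLite is not one of the two mechanisms listed above: it assigns an integer automatically when a column declared `INTEGER PRIMARY KEY` is left out of the insert. The generated ids here start at 1, so they will not match the 123/456/789 values in the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Artist (
        id INTEGER PRIMARY KEY,  -- SQLite fills this in when it is omitted
        name TEXT,
        year INTEGER,
        country TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO Artist (name, year, country) VALUES (?, ?, ?)",
    [
        ("Wu Tang Clan", 1992, "USA"),
        ("Notorious B.I.G.", 1992, "USA"),
        ("Ice Cube", 1989, "USA"),
    ],
)
ids = [row[0] for row in conn.execute("SELECT id FROM Artist ORDER BY id")]
print(ids)  # [1, 2, 3]

# Reusing an existing key value violates the primary key constraint.
try:
    conn.execute(
        "INSERT INTO Artist (id, name, year, country) "
        "VALUES (1, 'Ice Cube', 1989, 'USA')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```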
RELATIONAL MODEL: FOREIGN KEYS
A foreign key specifies that an attribute from one relation has to map to a tuple in another relation.
Artist(id, name, year, country)
id   name              year  country
123  Wu Tang Clan      1992  USA
456  Notorious B.I.G.  1992  USA
789  Ice Cube          1989  USA
Album(id, name, artists, year)
id  name               artists  year
11  Enter the Wu Tang  123      1993
22  St.Ides Mix Tape   ???      1994
RELATIONAL MODEL: FOREIGN KEYS
A foreign key specifies that an attribute from one relation has to map to a tuple in another relation.
Artist(id, name, year, country)
id   name              year  country
123  Wu Tang Clan      1992  USA
456  Notorious B.I.G.  1992  USA
789  Ice Cube          1989  USA
ArtistAlbum(artist_id, album_id)
artist_id  album_id
123        11
123        22
789        22
Album(id, name, year)
id  name               year
11  Enter the Wu Tang  1993
22  St.Ides Mix Tape   1994
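The three relations can be tried out with Python's sqlite3 module. This is a sketch rather than the course's own code: the column types are assumptions, and SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript(
    """
    CREATE TABLE Artist (id INTEGER PRIMARY KEY, name TEXT, year INTEGER, country TEXT);
    CREATE TABLE Album  (id INTEGER PRIMARY KEY, name TEXT, year INTEGER);
    CREATE TABLE ArtistAlbum (
        artist_id INTEGER REFERENCES Artist(id),
        album_id  INTEGER REFERENCES Album(id)
    );
    INSERT INTO Artist VALUES (123, 'Wu Tang Clan', 1992, 'USA'),
                              (456, 'Notorious B.I.G.', 1992, 'USA'),
                              (789, 'Ice Cube', 1989, 'USA');
    INSERT INTO Album VALUES (11, 'Enter the Wu Tang', 1993),
                             (22, 'St.Ides Mix Tape', 1994);
    INSERT INTO ArtistAlbum VALUES (123, 11), (123, 22), (789, 22);
    """
)

# Album 22 maps to two artists through ArtistAlbum.
artists_on_22 = [
    row[0]
    for row in conn.execute(
        "SELECT Artist.name FROM Artist "
        "JOIN ArtistAlbum ON Artist.id = ArtistAlbum.artist_id "
        "WHERE ArtistAlbum.album_id = 22 ORDER BY Artist.id"
    )
]
print(artists_on_22)  # ['Wu Tang Clan', 'Ice Cube']

# A row pointing at a non-existent artist is rejected by the DBMS.
try:
    conn.execute("INSERT INTO ArtistAlbum VALUES (999, 11)")  # no Artist 999
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True
print(dangling_rejected)  # True
```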
RELATIONAL MODEL: QUERIES
The relational model is independent of any query language implementation.
    import csv
    with open("artists.csv", newline="") as f:  # file name is an assumption
        for record in csv.reader(f):
            if record[0] == "Ice Cube":
                print(int(record[1]))
SQL is the de facto standard.
    SELECT year FROM artists
     WHERE name = 'Ice Cube';
Next Class: We will define an algebra + calculus for querying relations.
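For readers who want to run the SQL version, here is a sketch using Python's sqlite3 module with an in-memory `artists` table. The table definition and column types are assumptions; the rows are the Artist examples from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (name TEXT, year INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO artists VALUES (?, ?, ?)",
    [
        ("Wu Tang Clan", 1992, "USA"),
        ("Notorious B.I.G.", 1992, "USA"),
        ("Ice Cube", 1989, "USA"),
    ],
)

# Declarative: the query states which rows are wanted and leaves it to
# the DBMS to decide how to find them.
rows = conn.execute(
    "SELECT year FROM artists WHERE name = 'Ice Cube'"
).fetchall()
print(rows)  # [(1989,)]
```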
CONCLUSION
Databases are ubiquitous.
The relational model is the most
common data model because it is
the most flexible.
HOMEWORK #1
Write SQL queries to perform
basic data analysis on court data.
I will not be teaching basic SQL.
Read the textbook.
Due: Wed Sept 13th @ 11:59pm
http://15445.courses.cs.cmu.edu/fall2017/homework1