1 - Database Design

The document discusses various database designs, starting from naive implementations to more advanced structures like hashmaps, SSTables, LSM trees, and B-Trees. It highlights the trade-offs between read and write speeds, emphasizing the need for efficient data handling and indexing methods. The conclusion stresses the importance of choosing the right database engine based on specific use cases for optimal performance.

Uploaded by

amit dutta

Database Design

What is a database?
Objectives of a database
● Fast reads
● Fast writes
● Persistent data
Quick comment about disks
Generally, we are using hard drives, which are much cheaper than SSDs but slower

Random reads on a hard drive are slow because the disk head has to seek to each location

We should aim for sequential operations wherever possible

Naive Database Implementation
Literally just a list, O(n) reads and updates

Id  Name             Favorite Color
1   Alan Turing      Red
2   Mark Zuckerberg  LightBlue
3   Tech Lead        Doesn’t see color
4   Mitch McConnell  White

Naive Database Implementation
Literally just a list, O(n) reads and updates. To update a row we scan the list for it and overwrite it in place; here row 3 changes to Pink

Id  Name             Favorite Color
1   Alan Turing      Red
2   Mark Zuckerberg  LightBlue
3   Tech Lead        Pink
4   Mitch McConnell  White
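
A minimal sketch of the naive list, assuming rows are plain Python dicts held in memory (the names `rows`, `read`, and `update` are made up for illustration). Both operations walk the list from the front, which is where the O(n) comes from.

```python
# Hypothetical in-memory "database": just a list of rows.
rows = [
    {"id": 1, "name": "Alan Turing", "color": "Red"},
    {"id": 2, "name": "Mark Zuckerberg", "color": "LightBlue"},
    {"id": 3, "name": "Tech Lead", "color": "Doesn't see color"},
    {"id": 4, "name": "Mitch McConnell", "color": "White"},
]


def read(rows, row_id):
    """Linear scan: O(n) in the number of rows."""
    for row in rows:
        if row["id"] == row_id:
            return row
    return None


def update(rows, row_id, color):
    """Find the row with a linear scan, then overwrite it in place."""
    row = read(rows, row_id)
    if row is None:
        return False
    row["color"] = color
    return True


update(rows, 3, "Pink")
print(read(rows, 3)["color"])  # Pink
```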

Slightly Better Database Implementation
Append-only log on disk to take advantage of sequential writes

Id  Name             Favorite Color
1   Alan Turing      Red
2   Mark Zuckerberg  LightBlue
3   Tech Lead        Doesn’t see color
4   Mitch McConnell  White

Slightly Better Database Implementation
An update is appended as a new row instead of overwriting the old one; the most recent row for an Id wins

Id  Name             Favorite Color
1   Alan Turing      Red
2   Mark Zuckerberg  LightBlue
3   Tech Lead        Doesn’t see color
4   Mitch McConnell  White
3   Tech Lead        Pink
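
A rough sketch of the append-only log, assuming one `key,value` record per line in a text file (the class name `AppendOnlyLog` and the record format are illustrative, not from the slides). Writes only ever append; reads still scan the whole file and keep the last match.

```python
import os
import tempfile


class AppendOnlyLog:
    """Hypothetical append-only log: one 'key,value' record per line."""

    def __init__(self, path):
        self.path = path
        open(path, "a").close()  # create the file if it does not exist

    def write(self, key, value):
        # Sequential write: always append, never seek back and overwrite.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{key},{value}\n")

    def read(self, key):
        # O(n) scan; later records override earlier ones for the same key.
        latest = None
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                k, _, v = line.rstrip("\n").partition(",")
                if k == key:
                    latest = v
        return latest


log = AppendOnlyLog(os.path.join(tempfile.mkdtemp(), "db.log"))
log.write("3", "Doesn't see color")
log.write("3", "Pink")
print(log.read("3"))  # Pink
```

This sketch assumes keys never contain a comma or a newline; a real format would need escaping or length prefixes.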

Better database implementation
Hashmap, O(1) reads and writes

However, this does not scale: once there is too much data to fit in memory, the hashmap has to go on disk, where lookups turn into slow random reads
Indexes - making read times much faster
Keep extra data on each write to improve database read times

Pro: Faster reads

Con: Slower writes (only use indexes if you need them, do not declare an index for every field)
Types of index implementations
● Hash Indexes
● LSM trees + SSTables
● B-Trees
Hash Index
Keep an in-memory hash table of each key mapped to the location (offset on disk) of the corresponding data, occasionally write to disk for persistence

Pros: Easy to implement and very fast (disks are slow, RAM is fast)

Cons: All of the keys must fit in memory, bad for range queries

Key     Offset on Disk
jordan  0x04444511
kobe    0x01112365
lebron  0x06128989
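
One way to picture the hash index is as a dict of byte offsets sitting on top of an append-only data file. The sketch below assumes that layout; `HashIndexedLog` and its `key,value` line format are hypothetical names for illustration, and it leaves out persisting the index itself.

```python
import os
import tempfile


class HashIndexedLog:
    """Hypothetical store: append-only data file plus an in-memory offset index."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the latest record for that key
        open(path, "ab").close()

    def put(self, key, value):
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()  # where this record will start
            f.write(record)
        self.index[key] = offset  # the extra work done on every write

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # a single seek instead of a full scan
            line = f.readline().decode("utf-8").rstrip("\n")
        _, _, value = line.partition(",")
        return value


store = HashIndexedLog(os.path.join(tempfile.mkdtemp(), "data.log"))
store.put("jordan", "23")
store.put("kobe", "24")
store.put("jordan", "45")  # appended; the index now points at the newer record
print(store.get("jordan"))  # 45
```

A range query such as "all keys between j and l" still has to look at every key, since the dict keeps no ordering, which is the con listed above.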
SSTables and LSM trees
Write first goes to an in-memory balanced binary search tree (memtable), eventually written to disk

              4: brees
             /        \
    2: manning        5: montana
     /       \                \
1: brady   3: rodgers       6: favre

When the tree becomes too large, write the contents of it (sorted by key) to an SSTable file

For durability, keep a log on disk of memtable writes so the memtable can be restored in the event of a crash.
SSTables and LSM trees
A new write (here 7: young) is inserted into the memtable first

              4: brees
             /        \
    2: manning        5: montana
     /       \                \
1: brady   3: rodgers       6: favre
                                  \
                                7: young
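
A minimal sketch of the write path, under some loud assumptions: a plain dict that gets sorted at flush time stands in for the balanced tree, the flush threshold is a tiny entry count, and the names (`MemTable`, `max_entries`, the file names) are made up for illustration.

```python
import os
import tempfile


class MemTable:
    """Hypothetical memtable: a dict stands in for the balanced search tree."""

    def __init__(self, directory, max_entries=3):
        self.directory = directory
        self.max_entries = max_entries
        self.data = {}
        self.wal_path = os.path.join(directory, "memtable.wal")
        self.sstable_paths = []  # oldest first

    def put(self, key, value):
        # Log the write on disk first so the memtable can be rebuilt after a crash.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(f"{key},{value}\n")
        self.data[key] = value
        if len(self.data) >= self.max_entries:
            self.flush()

    def flush(self):
        # Write the contents, sorted by key, to a new SSTable file.
        path = os.path.join(self.directory, f"sstable_{len(self.sstable_paths)}.txt")
        with open(path, "w", encoding="utf-8") as f:
            for key in sorted(self.data):
                f.write(f"{key},{self.data[key]}\n")
        self.sstable_paths.append(path)
        self.data.clear()
        open(self.wal_path, "w").close()  # the flushed writes no longer need the log


table = MemTable(tempfile.mkdtemp(), max_entries=3)
for key, name in [(4, "brees"), (2, "manning"), (5, "montana")]:
    table.put(key, name)

with open(table.sstable_paths[0], encoding="utf-8") as f:
    print(f.read().splitlines())  # ['2,manning', '4,brees', '5,montana']
```

Sorting at flush time costs O(n log n); a real memtable keeps keys ordered as they arrive, which is why the slides use a balanced tree.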
SSTables and LSM trees continued
Recall: Tree gets written to SSTable files, where the keys are sorted

SSTable file 1     SSTable file 2
0  Westbrook       3  Iverson
7  Anthony         5  Garnett
9  Wade            13 Harden
23 Jordan          23 Lebron
33 Jabbar          33 Pippen
34 Olajuwon        34 Giannis
SSTables and LSM trees continued
Recall: Since we are only using append-only logs, there will be duplicate keys. In the two files above, keys 23, 33, and 34 each appear twice
SSTables and LSM trees continued
Can be merged in O(n) time; in case of a duplicate key, take the more recent value

SSTable file 1     SSTable file 2     Compacted SSTable
0  Westbrook       3  Iverson         0  Westbrook
7  Anthony         5  Garnett         3  Iverson
9  Wade            13 Harden          5  Garnett
23 Jordan          23 Lebron          7  Anthony
33 Jabbar          33 Pippen          9  Wade
34 Olajuwon        34 Giannis         13 Harden
                                      23 Lebron
                                      33 Pippen
                                      34 Giannis
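
The merge can be sketched as the usual two-pointer pass over two sorted lists. This assumes each SSTable is already loaded as a list of `(key, value)` pairs and that file 2 is the newer one, which matches the compacted result on the slide; `merge_sstables` is a hypothetical name.

```python
def merge_sstables(older, newer):
    """Merge two key-sorted lists in O(n); on a duplicate key the newer value wins."""
    merged = []
    i = j = 0
    while i < len(older) and j < len(newer):
        if older[i][0] < newer[j][0]:
            merged.append(older[i])
            i += 1
        elif older[i][0] > newer[j][0]:
            merged.append(newer[j])
            j += 1
        else:
            merged.append(newer[j])  # same key: keep the more recent value
            i += 1
            j += 1
    merged.extend(older[i:])  # at most one of these two tails is non-empty
    merged.extend(newer[j:])
    return merged


file_1 = [(0, "Westbrook"), (7, "Anthony"), (9, "Wade"),
          (23, "Jordan"), (33, "Jabbar"), (34, "Olajuwon")]
file_2 = [(3, "Iverson"), (5, "Garnett"), (13, "Harden"),
          (23, "Lebron"), (33, "Pippen"), (34, "Giannis")]

compacted = merge_sstables(file_1, file_2)
print([key for key, _ in compacted])  # [0, 3, 5, 7, 9, 13, 23, 33, 34]
```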
SSTables and LSM trees continued
Let’s discuss how to quickly read a value by its key!

First: MemTable

              4: brees
             /        \
    2: manning        5: montana
     /       \                \
1: brady   3: rodgers       6: favre

Second: SSTable n     Third: SSTable n-1
28 Peterson           28 Taylor
44 Bradshaw           80 Rice
81 Moss               85 Gates

Keep going through SSTables, newest first, until you either find the key or run out!
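
The lookup order can be sketched in a few lines. Here plain dicts stand in for the memtable and for each SSTable, purely to show the order of the checks; `lsm_get` is a hypothetical helper, and `sstables` is assumed to be ordered newest first.

```python
def lsm_get(key, memtable, sstables):
    """Check the memtable, then each SSTable from newest to oldest."""
    if key in memtable:
        return memtable[key]
    for table in sstables:  # newest first, so the most recent value is hit first
        if key in table:
            return table[key]
    return None  # ran out of SSTables: the key does not exist


memtable = {1: "brady", 2: "manning", 3: "rodgers", 4: "brees", 5: "montana", 6: "favre"}
sstables = [
    {28: "Peterson", 44: "Bradshaw", 81: "Moss"},  # SSTable n (newest)
    {28: "Taylor", 80: "Rice", 85: "Gates"},       # SSTable n-1
]

print(lsm_get(28, memtable, sstables))  # Peterson
print(lsm_get(80, memtable, sstables))  # Rice
print(lsm_get(99, memtable, sstables))  # None
```

A missing key is the worst case, because every table has to be checked before giving up.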
SSTables and LSM trees continued
For each SSTable, keep a sparse in-memory hashmap of some of its keys with their offsets in the file. Since each table is sorted, a key that is not in the hashmap must lie between two keys that are, so we only need to binary search that small range of the SSTable to find its value!

In memory hash table        SSTable
Alice    0x00000000         Alice    22
Bob      0x00000080         Andy     61
Charlie  0x000000f0         Anna     40
                            Bob      80
                            Brian    15
                            Charlie  35
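
A small sketch of the sparse-index lookup, with two simplifications that are assumptions rather than anything from the slides: the SSTable is an in-memory list, so the index stores list positions instead of byte offsets, and the indexed keys are kept in a sorted list (not a hashmap) so that `bisect` can find the nearest indexed key at or before the target.

```python
import bisect

# Sorted SSTable contents as (key, value) pairs.
sstable = [("Alice", 22), ("Andy", 61), ("Anna", 40),
           ("Bob", 80), ("Brian", 15), ("Charlie", 35)]

# Sparse index: only some keys, each with the position of its record.
sparse_keys = ["Alice", "Bob", "Charlie"]
sparse_positions = [0, 3, 5]


def sparse_lookup(key):
    # Find the last indexed key that is <= the key we want.
    slot = bisect.bisect_right(sparse_keys, key) - 1
    if slot < 0:
        return None  # smaller than every key in this SSTable
    lo = sparse_positions[slot]
    hi = sparse_positions[slot + 1] if slot + 1 < len(sparse_positions) else len(sstable)
    # Binary search only the records in [lo, hi).
    pos = bisect.bisect_left(sstable, (key,), lo, hi)
    if pos < hi and sstable[pos][0] == key:
        return sstable[pos][1]
    return None


print(sparse_lookup("Brian"))  # 15
print(sparse_lookup("Anna"))   # 40
print(sparse_lookup("Zed"))    # None
```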
SSTables and LSM trees summarized
Pros:

● High write throughput due to writes going to in memory buffer

● Good for range queries due to internal sorting of data in the index

Cons:

● Slow reads, especially if the key we are looking for is old or does not exist
● Merging process of log segments can take up background resources
B-Trees

Root block:    A | REF | E | REF | P | REF | Z

Child blocks:  A | REF | C | REF | E        P | REF | T | REF | Z

Leaf block:    Thomas 33 | Vlad 24 | Zeke 50

B-Trees continued
To read: traverse through the tree and find the value

To update: traverse through the tree and change the value

To write: traverse through the tree. If there is extra space in the block where the value belongs, add the key. Otherwise, split that block in two, add the key, and then update the parent block to reflect the split. Writes can be made durable in the event of crashes using a write-ahead log.
B-trees summarized
Pros:

● Relatively fast reads, most B-trees are only 3 or 4 levels deep

● Good for range queries as data is kept internally sorted

Cons:

● Relatively slow writes, have to write to disk as opposed to memory
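
A quick back-of-the-envelope check on the "3 or 4 levels" point. The branching factor of 500 references per block is an illustrative assumption, not a number from the slides, and the result is a rough upper bound on how many entries a tree of that depth can reach.

```python
def max_entries(branching_factor, levels):
    """Rough upper bound: every block fans out to branching_factor children."""
    return branching_factor ** levels


for levels in (3, 4):
    print(levels, max_entries(500, levels))
# 3 125000000
# 4 62500000000
```

Because capacity grows exponentially with depth, a read only has to touch a handful of blocks even for very large datasets.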

Conclusion
In a system, it is important to know what type of database engine/design you are
using so that you can optimize for writes or reads.

Hash indexes: fast but only useful on small datasets

SSTables and LSM-Trees: better for writing, slower for reading

B-Trees: better for reading, slower for writing
