Database Design
What is a database?
Objectives of a database
● Fast reads
● Fast writes
● Persistent data
Quick comment about disks
Generally, we are using hard drives: much cheaper than SSDs, but slower
Random reads are slow because the disk head has to seek to each location
We should always aim for sequential operations
Naive Database Implementation
Literally just a list, O(n) reads and updates
Id Name Favorite Color
1 Alan Turing Red
2 Mark Zuckerberg LightBlue
3 Tech Lead Doesn’t see color
4 Mitch McConnell White
Naive Database Implementation
Literally just a list, O(n) reads and writes; an update scans for the row and overwrites it in place (row 3 below)
Id Name Favorite Color
1 Alan Turing Red
2 Mark Zuckerberg LightBlue
3 Tech Lead Pink
4 Mitch McConnell White
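A minimal sketch of the naive design, assuming the whole table fits in one Python list. The function names `read` and `update` are illustrative and not from the slides.

```python
# Each record is an (id, name, favorite_color) tuple kept in insertion order.
records = [
    (1, "Alan Turing", "Red"),
    (2, "Mark Zuckerberg", "LightBlue"),
    (3, "Tech Lead", "Doesn't see color"),
    (4, "Mitch McConnell", "White"),
]

def read(record_id):
    """Scan from the start until the id matches: O(n)."""
    for record in records:
        if record[0] == record_id:
            return record
    return None

def update(record_id, name, color):
    """Scan for the id, then overwrite that slot in place: O(n)."""
    for i, record in enumerate(records):
        if record[0] == record_id:
            records[i] = (record_id, name, color)
            return True
    return False

update(3, "Tech Lead", "Pink")
```

Both operations walk the list, which is where the O(n) cost comes from.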
Slightly Better Database Implementation
Append-only log on disk to take advantage of sequential writes
Id Name Favorite Color
1 Alan Turing Red
2 Mark Zuckerberg LightBlue
3 Tech Lead Doesn’t see color
4 Mitch McConnell White
3 Tech Lead Pink (update appended as a new record; on read, the latest record for an id wins)
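A rough sketch of the append-only log, assuming a comma-separated text file whose fields contain no commas or newlines. The class name `AppendOnlyLog` and the file name are hypothetical.

```python
import os
import tempfile

class AppendOnlyLog:
    def __init__(self, path):
        self.path = path
        open(self.path, "a").close()  # create the file if it is missing

    def write(self, record_id, name, color):
        # Inserts and updates are the same operation: append at the end.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{record_id},{name},{color}\n")

    def read(self, record_id):
        # Still O(n): scan the whole log and keep the last match.
        latest = None
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                rid, name, color = line.rstrip("\n").split(",")
                if int(rid) == record_id:
                    latest = (name, color)
        return latest

log_path = os.path.join(tempfile.mkdtemp(), "people.log")
db = AppendOnlyLog(log_path)
db.write(3, "Tech Lead", "Doesn't see color")
db.write(4, "Mitch McConnell", "White")
db.write(3, "Tech Lead", "Pink")  # update = another append
```

Writes become sequential, but reads still scan the file, which motivates the indexes that follow.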
Better database implementation
Hashmap, O(1) reads and writes
However, this does not scale: once the data no longer fits in memory, the hashmap has to go on disk, where random access is slow
Indexes - making read times much faster
Keep extra data on each write to improve database read times
Pro: Faster reads
Con: Slower writes (only use indexes if you need them, do not declare an index for every field)
Types of index implementations
● Hash Indexes
● LSM trees + SSTables
● B-Trees
Hash Index
Keep an in-memory hash table mapping each key to the disk offset of the corresponding data; occasionally write the table to disk for persistence
Pros: Easy to implement and very fast (disks are slow, RAM is fast)
Cons: All of the keys must fit in memory, bad for range queries
Key       Offset on Disk
jordan    0x04444511
kobe      0x01112365
lebron    0x06128989
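A minimal sketch of a hash index over an append-only data file, assuming tab-separated records and keys and values without tabs or newlines. The class name `HashIndexedLog` is hypothetical, and persisting the index itself is left out.

```python
import os
import tempfile

class HashIndexedLog:
    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the key's latest record
        open(self.path, "ab").close()

    def put(self, key, value):
        record = f"{key}\t{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # where this record will start
            f.write(record)
        self.index[key] = offset  # older offsets for the key are forgotten

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # one seek, then read a single record
            line = f.readline().decode("utf-8").rstrip("\n")
        return line.split("\t", 1)[1]

players = HashIndexedLog(os.path.join(tempfile.mkdtemp(), "players.db"))
players.put("jordan", "23")
players.put("kobe", "24")
players.put("jordan", "45")  # the index now points at the newer record
```

A lookup costs one dictionary access plus one disk seek, but every key has to sit in the dictionary, and nothing here helps with a range of keys.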
SSTables and LSM trees
Write first goes to an in-memory balanced binary search tree (memtable), eventually written to disk
When the tree becomes too large, write its contents (sorted by key) to an SSTable file
To increase durability, keep a log on disk of memtable writes so the memtable can be restored in the event of a crash.
Memtable contents, one tree level per line:
4: brees
2: manning   5: montana
1: brady   3: rodgers   6: favre
7: young (newly written key)
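A rough sketch of the memtable write path under some loud assumptions: a plain dict sorted at flush time stands in for the balanced binary search tree, records are JSON lines, and the names `MemTable`, `max_entries`, and the file names are all hypothetical. Recovering the list of existing SSTable files after a restart is not shown.

```python
import json
import os
import tempfile

class MemTable:
    def __init__(self, directory, max_entries=4):
        self.directory = directory
        self.max_entries = max_entries
        self.entries = {}        # stand-in for a balanced BST
        self.sstable_paths = []  # flushed files, oldest first
        self.wal_path = os.path.join(directory, "memtable.wal")
        self._replay_wal()

    def _replay_wal(self):
        # After a crash, rebuild the memtable from the log on disk.
        if os.path.exists(self.wal_path):
            with open(self.wal_path, encoding="utf-8") as wal:
                for line in wal:
                    key, value = json.loads(line)
                    self.entries[key] = value

    def put(self, key, value):
        # Log first, then apply in memory.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps([key, value]) + "\n")
        self.entries[key] = value
        if len(self.entries) >= self.max_entries:
            self.flush()

    def flush(self):
        # Write the contents sorted by key to a new SSTable file.
        name = f"sstable_{len(self.sstable_paths)}.jsonl"
        path = os.path.join(self.directory, name)
        with open(path, "w", encoding="utf-8") as out:
            for key in sorted(self.entries):
                out.write(json.dumps([key, self.entries[key]]) + "\n")
        self.sstable_paths.append(path)
        self.entries.clear()
        if os.path.exists(self.wal_path):
            os.remove(self.wal_path)  # the log is no longer needed

directory = tempfile.mkdtemp()
table = MemTable(directory, max_entries=4)
for key, name in [(4, "brees"), (2, "manning"), (5, "montana")]:
    table.put(key, name)

recovered = MemTable(directory, max_entries=4)  # simulates a restart
restored = dict(recovered.entries)              # rebuilt from the log
recovered.put(1, "brady")                       # fourth entry triggers a flush
```

A real engine would use a sorted structure so that range scans over the memtable are cheap; the dict is only there to keep the sketch short.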
SSTables and LSM trees continued
Recall: Tree gets written to SSTable files, where the keys are sorted
SSTable file 1    SSTable file 2
0 Westbrook       3 Iverson
7 Anthony         5 Garnett
9 Wade            13 Harden
23 Jordan         23 Lebron
33 Jabbar         33 Pippen
34 Olajuwon       34 Giannis
SSTables and LSM trees continued
Recall: Since we are only using append only logs, there will be duplicate keys
SSTable file 1    SSTable file 2    Compacted SSTable
0 Westbrook       3 Iverson         0 Westbrook
7 Anthony         5 Garnett         3 Iverson
9 Wade            13 Harden         5 Garnett
23 Jordan         23 Lebron         7 Anthony
33 Jabbar         33 Pippen         9 Wade
34 Olajuwon       34 Giannis        13 Harden
                                    23 Lebron
                                    33 Pippen
                                    34 Giannis
SSTables and LSM trees continued
Sorted files can be merged in O(n) time with a merge-sort-style pass; in case of a duplicate key, take the more recent value (here SSTable file 2 is the newer one)
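The compaction step can be sketched as a two-pointer merge. This assumes each SSTable is an in-memory list of (key, value) pairs sorted by key and that file 2 is the newer one, which matches the compacted table on the slide. The function name `merge_sstables` is illustrative.

```python
def merge_sstables(older, newer):
    """Merge two key-sorted lists in one pass; the newer value wins on ties."""
    merged = []
    i = j = 0
    while i < len(older) and j < len(newer):
        if older[i][0] < newer[j][0]:
            merged.append(older[i])
            i += 1
        elif older[i][0] > newer[j][0]:
            merged.append(newer[j])
            j += 1
        else:
            merged.append(newer[j])  # duplicate key: keep the recent value
            i += 1
            j += 1
    merged.extend(older[i:])  # at most one of these two is non-empty
    merged.extend(newer[j:])
    return merged

file_1 = [(0, "Westbrook"), (7, "Anthony"), (9, "Wade"),
          (23, "Jordan"), (33, "Jabbar"), (34, "Olajuwon")]
file_2 = [(3, "Iverson"), (5, "Garnett"), (13, "Harden"),
          (23, "Lebron"), (33, "Pippen"), (34, "Giannis")]
compacted = merge_sstables(file_1, file_2)
```

Each entry is visited once, so the cost is linear in the combined size of the two files.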
SSTables and LSM trees continued
Let’s discuss how to quickly read a value by its key!
First: MemTable
4: brees
2: manning   5: montana
1: brady   3: rodgers   6: favre
Second: SSTable n (the newest file)
28 Peterson
44 Bradshaw
81 Moss
Third: SSTable n-1
28 Taylor
80 Rice
85 Gates
Keep going through SSTables until you either find the key or run out!
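A minimal sketch of that read path, assuming the memtable is a dict, each SSTable is a key-sorted list of (key, value) pairs, stored values are never `None`, and deletes are not handled. The names `lookup` and `search_sstable` are hypothetical.

```python
import bisect

def search_sstable(table, key):
    # (key,) sorts just before (key, value), so this finds the key's slot.
    i = bisect.bisect_left(table, (key,))
    if i < len(table) and table[i][0] == key:
        return table[i][1]
    return None

def lookup(key, memtable, sstables_newest_first):
    if key in memtable:  # first: the in-memory table
        return memtable[key]
    for table in sstables_newest_first:  # then newest file to oldest
        value = search_sstable(table, key)
        if value is not None:
            return value
    return None  # ran out of SSTables

memtable = {4: "brees", 2: "manning", 5: "montana",
            1: "brady", 3: "rodgers", 6: "favre"}
sstable_n = [(28, "Peterson"), (44, "Bradshaw"), (81, "Moss")]
sstable_n_minus_1 = [(28, "Taylor"), (80, "Rice"), (85, "Gates")]
tables = [sstable_n, sstable_n_minus_1]
```

Key 28 resolves to the value in SSTable n because the newer file is checked first. A missing key is the worst case, since every table gets searched.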
SSTables and LSM trees continued
For each SSTable, keep a sparse in-memory table that maps some of the keys to their offsets in the SSTable file. Since each SSTable is sorted, we can jump to the nearest indexed key at or before the one we want and binary search only the short stretch of the file up to the next indexed key!
In-memory hash table (key, offset in file):
Alice     0x00000000
Bob       0x00000080
Charlie   0x000000f0
SSTable (key, value):
Alice     22
Andy      61
Anna      40
Bob       80
Brian     15
Charlie   35
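A rough sketch of a sparse index lookup using the slide's data. As assumptions, list positions stand in for byte offsets, and the index is kept as a sorted list rather than a hash table so that the nearest smaller key can be found. The name `sparse_get` is hypothetical.

```python
import bisect

sstable = [("Alice", 22), ("Andy", 61), ("Anna", 40),
           ("Bob", 80), ("Brian", 15), ("Charlie", 35)]

# Only some keys are indexed; the second field is the position in sstable.
sparse_index = [("Alice", 0), ("Bob", 3), ("Charlie", 5)]
index_keys = [key for key, _ in sparse_index]

def sparse_get(key):
    # Find the last indexed key that is <= the key we want.
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # sorts before the first key in the file
    start = sparse_index[slot][1]
    if slot + 1 < len(sparse_index):
        end = sparse_index[slot + 1][1]
    else:
        end = len(sstable)
    # Binary search only the stretch between two indexed positions.
    i = bisect.bisect_left(sstable, (key,), start, end)
    if i < end and sstable[i][0] == key:
        return sstable[i][1]
    return None
```

Looking up "Anna" lands on the "Alice" entry and searches only positions 0 to 2, so most of the file is never touched.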
SSTables and LSM trees summarized
Pros:
● High write throughput due to writes going to in memory buffer
● Good for range queries due to internal sorting of data in the index
Cons:
● Slow reads, especially if the key we are looking for is old or does not exist
● Merging process of log segments can take up background resources
B-Trees
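Root block: A | REF | E | REF | P | REF | Z
Child blocks: A | REF | C | REF | E and P | REF | T | REF | Z
Leaf block (keys between T and Z): Thomas 33 | Vlad 24 | Zeke 50
Each REF points to the child block that holds the keys between its two neighboring keys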
B-Trees continued
To read: traverse through the tree and find the value
To update: traverse through the tree and change the value
To write: traverse through the tree to the block where the key belongs. If the block has extra space, add the key. Otherwise, split the block in two, add the key, and update the parent block to reflect the split.
Can be made durable in the event of crashes using a write-ahead log.
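The read, update, and write steps above can be sketched in memory. Loud assumptions: Python objects stand in for disk blocks, a block holds at most `MAX_KEYS` keys (a tiny, hypothetical capacity chosen so splits happen quickly), values are stored in every block, and the write-ahead log is omitted.

```python
import bisect

MAX_KEYS = 3  # hypothetical block capacity

class Node:
    def __init__(self, keys=None, values=None, children=None):
        self.keys = keys or []
        self.values = values or []
        self.children = children or []  # empty for a leaf block

def search(node, key):
    while True:
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i]
        if not node.children:
            return None
        node = node.children[i]  # follow the reference for this key range

def _insert(node, key, value):
    """Return (key, value, right_block) if this block split, else None."""
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        node.values[i] = value  # update: change the value in place
        return None
    if not node.children:
        node.keys.insert(i, key)
        node.values.insert(i, value)
    else:
        split = _insert(node.children[i], key, value)
        if split is None:
            return None
        sep_key, sep_value, right = split
        node.keys.insert(i, sep_key)  # parent records the child's split
        node.values.insert(i, sep_value)
        node.children.insert(i + 1, right)
    if len(node.keys) <= MAX_KEYS:
        return None  # there was extra space in the block
    mid = len(node.keys) // 2  # block is over capacity: split it in two
    sep_key, sep_value = node.keys[mid], node.values[mid]
    right = Node(node.keys[mid + 1:], node.values[mid + 1:],
                 node.children[mid + 1:])
    node.keys, node.values = node.keys[:mid], node.values[:mid]
    node.children = node.children[:mid + 1]
    return sep_key, sep_value, right

def insert(root, key, value):
    split = _insert(root, key, value)
    if split is None:
        return root
    sep_key, sep_value, right = split
    return Node([sep_key], [sep_value], [root, right])  # tree grows a level

root = Node()
for number in range(20):
    root = insert(root, number, f"value-{number}")
root = insert(root, 7, "updated")
```

Splits propagate upward only when a parent is also full, so the tree grows at the root and every leaf stays at the same depth.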
B-trees summarized
Pros:
● Relatively fast reads, most B-trees are only 3 or 4 levels deep
● Good for range queries as data is kept internally sorted
Cons:
● Relatively slow writes, each write has to go to a block on disk as opposed to an in-memory buffer
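A quick back-of-the-envelope check on the "3 or 4 levels" claim, assuming each block references about 500 children. That branching factor is an illustrative assumption, since real values depend on block and key sizes. The helper `levels_needed` is hypothetical.

```python
def levels_needed(num_keys, branching_factor):
    """Smallest depth whose capacity (branching_factor ** depth) covers num_keys."""
    levels = 1
    capacity = branching_factor
    while capacity < num_keys:
        capacity *= branching_factor
        levels += 1
    return levels

million = levels_needed(1_000_000, 500)
ten_billion = levels_needed(10_000_000_000, 500)
```

Under this assumption a million keys fit in 3 levels and ten billion keys fit in 4, because capacity grows as a power of the branching factor.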
Conclusion
In a system, it is important to know what type of database engine/design you are
using so that you can optimize for writes or reads.
Hash indexes: fast but only useful on small datasets
SSTables and LSM-Trees: better for writing, slower for reading
B-Trees: better for reading, slower for writing