Maps and Dictionaries
Data structures and Algorithms
Acknowledgement:
These slides are adapted from slides provided with Data Structures and Algorithms in C++
Goodrich, Tamassia and Mount (Wiley, 2004)
Outline
Maps (9.1)
Hash tables (9.2)
Dictionaries (9.3)
Skip Lists (9.4)
Phạm Bảo Sơn - DSA
Maps and Dictionaries 2
Maps & Dictionaries
Map ADT and Dictionary ADT:
model a searchable collection of key-value entries
main operations are searching, inserting, and deleting entries
Map: multiple entries with the same key are not allowed
Dictionary: multiple entries with the same key are allowed
Map applications:
address book
student-record database
Dictionary applications:
word-definition pairs
credit card authorizations
DNS mapping of host names (e.g., datastructures.net) to internet IP addresses (e.g., 128.148.34.101)
Maps
The Map ADT
Map ADT methods:
get(k): if the map M has an entry with key k, return its
associated value; else, return null
put(k, v): insert entry (k, v) into the map M; if key k is
not already in M, then return null; else, return old value
associated with k
remove(k): if the map M has an entry with key k, remove
it from M and return its associated value; else, return null
size(), isEmpty()
keys(): return an iterator of the keys in M
values(): return an iterator of the values in M
Example
Operation Output Map
isEmpty() true Ø
put(5,A) null (5,A)
put(7,B) null (5,A),(7,B)
put(2,C) null (5,A),(7,B),(2,C)
put(8,D) null (5,A),(7,B),(2,C),(8,D)
put(2,E) C (5,A),(7,B),(2,E),(8,D)
get(7) B (5,A),(7,B),(2,E),(8,D)
get(4) null (5,A),(7,B),(2,E),(8,D)
get(2) E (5,A),(7,B),(2,E),(8,D)
size() 4 (5,A),(7,B),(2,E),(8,D)
remove(5) A (7,B),(2,E),(8,D)
remove(2) E (7,B),(8,D)
get(2) null (7,B),(8,D)
isEmpty() false (7,B),(8,D)
A Simple List-Based Map
We can implement a map using an unsorted list
We store the items of the map in a list S (based on a
doubly-linked list), in arbitrary order
[Figure: doubly-linked list S with header and trailer sentinels; the nodes/positions between them store the entries, here with keys 9, 6, 5, 8]
The put(k,v) Algorithm
put(k,v): insert entry (k, v) into the map M; if key k is not already in M, then return null; else, return old value associated with k
M is implemented in the list S
Algorithm put(k,v):
  B = S.positions()
  while B.hasNext() do
    p = B.next()
    if p.element().key() = k then
      t = p.element().value()
      B.replace(p,(k,v))
      return t {return the old value}
  S.insertLast((k,v))
  n = n + 1 {increment variable storing number of entries}
  return null {there was no previous entry with key equal to k}
The remove(k) Algorithm
remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null
M is implemented in the list S
Algorithm remove(k):
  B = S.positions()
  while B.hasNext() do
    p = B.next()
    if p.element().key() = k then
      t = p.element().value()
      S.remove(p)
      n = n – 1 {decrement number of entries}
      return t {return the removed value}
  return null {there is no entry with key equal to k}
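The put and remove algorithms can be sketched in Python as follows. This is only an illustration under assumptions: a built-in Python list stands in for the doubly-linked list S, and the class name ListMap and the method name is_empty are made up for the example.

```python
class ListMap:
    """Unsorted list-based map (sketch); a Python list stands in for the list S."""

    def __init__(self):
        self._entries = []  # each entry is a [key, value] pair, in arbitrary order

    def get(self, k):
        # Linear scan: O(n) in the worst case (key not present).
        for key, value in self._entries:
            if key == k:
                return value
        return None

    def put(self, k, v):
        # Search first; replace and return the old value if k already exists.
        for entry in self._entries:
            if entry[0] == k:
                old = entry[1]
                entry[1] = v
                return old
        self._entries.append([k, v])  # corresponds to S.insertLast((k,v))
        return None

    def remove(self, k):
        for idx, (key, value) in enumerate(self._entries):
            if key == k:
                del self._entries[idx]
                return value
        return None

    def size(self):
        return len(self._entries)

    def is_empty(self):
        return not self._entries


# Replaying part of the Map ADT example
m = ListMap()
for key, value in [(5, "A"), (7, "B"), (2, "C"), (8, "D")]:
    m.put(key, value)
print(m.put(2, "E"))   # C
print(m.get(4))        # None
print(m.remove(5))     # A
```

Because None doubles as the "not found" result, this sketch cannot distinguish a missing key from a key stored with the value None.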
Performance of a List-Based Map
put takes O(n) time since we have to search the
sequence to check if the given key exists (O(1) if keys
are always unique)
get and remove take O(n) time since in the worst
case (the item is not found) we traverse the entire
sequence to look for an item with the given key
The unsorted list implementation is effective only for
maps of small size, or
maps in which puts are the most common operations with
unique keys, while searches and removals are rarely
performed (e.g., historical record of logins to a workstation)
How to do better than that?
Hash Tables
Hash table
Expected time of search, put: O(1)
Bucket array
Hash function
Hash Functions and Hash Tables
A hash function h maps keys of a given type to
integers in a fixed interval [0, N − 1]
Example: h(x) = x mod N
is a hash function for integer keys
The integer h(x) is called the hash value of key x
A hash table for a given key type consists of
Hash function h
Array (called table) of size N
When implementing a map with a hash table, the goal is to store item (k, o) at index i = h(k)
Example
We design a hash table for a map storing entries as (SSN, Name), where SSN (social security number) is a nine-digit positive integer
Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x
Index  Entry
0      ∅
1      025-612-0001
2      981-101-0002
3      ∅
4      451-229-0004
…
9997   ∅
9998   200-751-9998
9999   ∅
Hash Functions
A hash function is usually specified as the composition of two functions:
Hash code: h1: keys → integers
Compression function: h2: integers → [0, N − 1]
The hash code map is applied first, and the compression map is applied next on the result, i.e., h(x) = h2(h1(x))
The goal of the hash function is to “disperse” the keys in an apparently random way, so as to minimize collisions
Hash Codes
Memory address:
We reinterpret the memory address of the key object as an integer
Good in general, except for numeric and string keys (same key should have the same hash code)
Integer cast:
We reinterpret the bits of the key as an integer
Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in C/C++)
Component sum:
We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows)
Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double)
Hash Codes (cont.)
Polynomial accumulation:
Order is important
We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits): $a_0 a_1 \dots a_{n-1}$
We evaluate the polynomial $p(z) = a_{n-1} + a_{n-2} z + a_{n-3} z^2 + \dots + a_0 z^{n-1}$ at a fixed value z, ignoring overflows
Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words)
Polynomial p(z) can be evaluated in O(n) time using Horner’s rule:
The following polynomials are successively computed, each from the previous one in O(1) time
$p_0(z) = a_0$
$p_i(z) = a_i + z\,p_{i-1}(z)$ (i = 1, 2, …, n − 1)
We have $p(z) = p_{n-1}(z)$
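A minimal Python sketch of polynomial accumulation for string keys. It assumes that each character's code point is one component, that the first character receives the highest power of z (as in p(z) above), and that "ignoring overflows" is modeled by masking to a fixed number of bits. The function name and the bits parameter are illustrative only.

```python
def polynomial_hash_code(key: str, z: int = 33, bits: int = 32) -> int:
    """Evaluate the polynomial hash code of `key` at z with one pass (Horner's rule)."""
    mask = (1 << bits) - 1      # keep only the low `bits` bits, i.e. ignore overflow
    code = 0
    for ch in key:
        # Multiply the running value by z, then add the next component.
        code = (code * z + ord(ch)) & mask
    return code


print(polynomial_hash_code("ab"))  # 97 * 33 + 98 = 3299
print(polynomial_hash_code("ba"))  # 98 * 33 + 97 = 3331
```

The two calls use the same characters in a different order and give different codes, which is the sense in which order is important.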
Compression Functions
Division:
h2(y) = y mod N
The size N of the hash table is usually chosen to be a prime
- Reason: reduce collisions
- How: the argument relies on number theory and is beyond the scope of this course
Multiply, Add and Divide (MAD):
h2(y) = (ay + b) mod N
N is prime, a and b are nonnegative integers such that a mod N ≠ 0
Otherwise, every integer would map to the same value b mod N
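Both compression functions are short enough to write out directly. The sketch below is illustrative only; the function names are assumptions, and the values N = 13, a = 3, b = 5 are arbitrary choices for the demonstration.

```python
def division_compress(y: int, N: int) -> int:
    """Division method: h2(y) = y mod N."""
    return y % N


def make_mad_compress(N: int, a: int, b: int):
    """Return the MAD function h2(y) = (a*y + b) mod N for fixed N, a, b."""
    if a % N == 0:
        # With a mod N = 0 every y would be sent to the same cell, b mod N.
        raise ValueError("a mod N must be nonzero")

    def h2(y: int) -> int:
        return (a * y + b) % N

    return h2


h2 = make_mad_compress(N=13, a=3, b=5)
print(division_compress(27, 13))  # 27 mod 13 = 1
print(h2(10))                     # (3 * 10 + 5) mod 13 = 9
```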
Collision Handling
Collisions occur when different elements are mapped to
the same cell
Two ways to handle collisions
Separate chaining & Linear probing
Separate Chaining: let each cell in the table point to a linked list of entries that map there
Index  Chain
0      ∅
1      025-612-0001
2      ∅
3      ∅
4      451-229-0004 → 981-101-0004
Load factor: n/N < 1
Separate chaining is simple, but requires additional
memory outside the table
Exercise: chaining
Assume you have a hash table H with
N=9 slots (H[0,8]) and let the hash
function be h(k)=k mod N.
Demonstrate (by picture) the insertion
of the following keys into a hash table
with collisions resolved by chaining.
5, 28, 19, 15, 20, 33, 12, 17, 10
Map Methods with Separate Chaining used for Collisions
Delegate operations to a list-based map at each cell:
Algorithm get(k):
  Output: The value associated with the key k in the map, or null if there is no entry with key equal to k in the map
  return A[h(k)].get(k) {delegate the get to the list-based map at A[h(k)]}
Algorithm put(k,v):
  Output: If there is an existing entry in our map with key equal to k, then we return its value (replacing it with v); otherwise, we return null
  t = A[h(k)].put(k,v) {delegate the put to the list-based map at A[h(k)]}
  if t = null then {k is a new key}
    n = n + 1
  return t
Algorithm remove(k):
  Output: The (removed) value associated with key k in the map, or null if there is no entry with key equal to k in the map
  t = A[h(k)].remove(k) {delegate the remove to the list-based map at A[h(k)]}
  if t ≠ null then {k was found}
    n = n - 1
  return t
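A compact Python sketch of the same delegation idea. It is an illustration under assumptions: each bucket is a plain Python list of [key, value] pairs rather than a separate list-based map object, Python's built-in hash() plays the role of the hash code, and the class name ChainHashMap is made up for the example.

```python
class ChainHashMap:
    """Hash map with separate chaining (sketch); each cell holds a small unsorted list."""

    def __init__(self, capacity=11):
        self._buckets = [[] for _ in range(capacity)]  # bucket array A
        self._n = 0                                    # number of entries

    def _bucket(self, k):
        # Hash code (built-in hash) followed by division compression.
        return self._buckets[hash(k) % len(self._buckets)]

    def get(self, k):
        for key, value in self._bucket(k):
            if key == k:
                return value
        return None

    def put(self, k, v):
        bucket = self._bucket(k)
        for entry in bucket:
            if entry[0] == k:          # existing key: replace, return old value
                old, entry[1] = entry[1], v
                return old
        bucket.append([k, v])          # new key: the bucket grows
        self._n += 1
        return None

    def remove(self, k):
        bucket = self._bucket(k)
        for idx, (key, value) in enumerate(bucket):
            if key == k:
                del bucket[idx]
                self._n -= 1
                return value
        return None

    def size(self):
        return self._n


m = ChainHashMap(capacity=5)
m.put(4, "four")
m.put(9, "nine")      # 4 and 9 fall in the same cell when the capacity is 5
print(m.get(9))       # nine
```

Counting entries inside put and remove, as done here, avoids relying on a null return value to detect a new key.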
Linear Probing
Open addressing: the colliding item is placed in a different cell of the table
Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell
Each table cell inspected is referred to as a “probe”
Colliding items lump together, causing future collisions to cause a longer sequence of probes
Example:
h(x) = x mod 13
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
Index: 0  1  2  3  4  5  6  7  8  9  10 11 12
Key:   ∅  ∅  41 ∅  ∅  18 44 59 32 22 31 73 ∅
Search with Linear Probing
Consider a hash table A that uses linear probing
get(k)
We start at cell h(k)
We probe consecutive locations until one of the following occurs
An item with key k is found, or
An empty cell is found, or
N cells have been unsuccessfully probed
Algorithm get(k)
  i ← h(k)
  p ← 0
  repeat
    c ← A[i]
    if c = ∅
      return null
    else if c.key() = k
      return c.element()
    else
      i ← (i + 1) mod N
      p ← p + 1
  until p = N
  return null
Updates with Linear Probing
To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements
remove(k)
We search for an entry with key k
If such an entry (k, o) is found, we replace it with the special item AVAILABLE and we return element o
Else, we return null
put(k, o)
We throw an exception if the table is full
We start at cell h(k)
We probe consecutive cells until one of the following occurs
A cell i is found that is either empty or stores AVAILABLE, or
N cells have been unsuccessfully probed
We store entry (k, o) in cell i
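The search and update rules for linear probing can be sketched in Python as below. The sketch makes several assumptions: keys are integers hashed with k mod N, keys passed to put are distinct (as in the description, put does not look for an existing entry with the same key), and the class name LinearProbingTable is invented for the example.

```python
AVAILABLE = object()  # sentinel that replaces deleted entries


class LinearProbingTable:
    def __init__(self, capacity=13):
        self.N = capacity
        self.cells = [None] * capacity  # None plays the role of the empty cell
        self.n = 0

    def _h(self, k):
        return k % self.N

    def get(self, k):
        i = self._h(k)
        for _ in range(self.N):            # at most N probes
            c = self.cells[i]
            if c is None:                   # empty cell: k is not in the table
                return None
            if c is not AVAILABLE and c[0] == k:
                return c[1]
            i = (i + 1) % self.N            # next cell, circularly
        return None

    def put(self, k, v):
        if self.n == self.N:
            raise OverflowError("table is full")
        i = self._h(k)
        for _ in range(self.N):
            c = self.cells[i]
            if c is None or c is AVAILABLE:  # first free cell wins
                self.cells[i] = (k, v)
                self.n += 1
                return
            i = (i + 1) % self.N

    def remove(self, k):
        i = self._h(k)
        for _ in range(self.N):
            c = self.cells[i]
            if c is None:
                return None
            if c is not AVAILABLE and c[0] == k:
                self.cells[i] = AVAILABLE    # keep later entries reachable
                self.n -= 1
                return c[1]
            i = (i + 1) % self.N
        return None


t = LinearProbingTable(13)
for key in [18, 41, 22, 44, 59, 32, 31, 73]:
    t.put(key, str(key))
print([c[0] if isinstance(c, tuple) else None for c in t.cells])
```

With h(x) = x mod 13 and this insertion order, the printed layout places 41 in cell 2 and 18, 44, 59, 32, 22, 31, 73 in cells 5 through 11. Marking a removed cell AVAILABLE instead of emptying it is what lets a later get(32) probe past the cell that 44 used to occupy.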
Double Hashing
Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series
h(k,i) = (h(k) + i·d(k)) mod N for i = 0, 1, … , N − 1
The secondary hash function d(k) cannot have zero values
The table size N must be a prime to allow probing of all the cells
Common choice of compression function for the secondary hash function:
d2(k) = q − (k mod q)
where
q < N
q is a prime
The possible values for d2(k) are 1, 2, … , q
Example of Double Hashing
Consider a hash table storing integer keys that handles collision with double hashing
N = 13
h(k) = k mod 13
d(k) = 7 − (k mod 7)
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
k   h(k)  d(k)  Probes
18  5     3     5
41  2     1     2
22  9     6     9
44  5     5     5, 10
59  7     4     7
32  6     3     6
31  5     4     5, 9, 0
73  8     4     8
Index: 0  1  2  3  4  5  6  7  8  9  10 11 12
Key:   31 ∅  41 ∅  ∅  18 32 59 73 22 44 ∅  ∅
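The probe sequences in this example can be reproduced with a few lines of Python. This is a sketch for integer keys only; the function name double_hash_insert and its return format (the final table plus a per-key list of probed cells) are choices made for the illustration.

```python
def double_hash_insert(keys, N=13, q=7):
    """Insert integer keys using h(k) = k mod N and d(k) = q - (k mod q)."""
    table = [None] * N
    probes = {}
    for k in keys:
        h, d = k % N, q - (k % q)
        seq = []
        for i in range(N):
            cell = (h + i * d) % N      # i-th cell of the probe series
            seq.append(cell)
            if table[cell] is None:     # first available cell
                table[cell] = k
                break
        else:
            raise OverflowError("no free cell found")
        probes[k] = seq
    return table, probes


table, probes = double_hash_insert([18, 41, 22, 44, 59, 32, 31, 73])
print(probes[44])  # [5, 10]
print(probes[31])  # [5, 9, 0]
print(table)
```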
Performance of Hashing
In the worst case, searches, insertions and removals on a hash table take O(n) time
The worst case occurs when all the keys inserted into the map collide
The load factor α = n/N affects the performance of a hash table
Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is 1 / (1 − α)
The expected running time of all the dictionary ADT operations in a hash table is O(1)
In practice, hashing is very fast provided the load factor is not close to 100%
Applications of hash tables:
small databases
compilers
browser caches
Open addressing saves space compared with the chaining method, but it is not faster; it is worth considering mainly when space is an issue.
Example
Counting Word Frequencies.
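One way to count word frequencies with a map, sketched in Python. The built-in dict (itself a hash table) is used as the map, and the tokenization rule (lowercase, split on whitespace, strip common punctuation) is an assumption made for the example.

```python
def word_frequencies(text: str) -> dict:
    """Map each word in `text` to the number of times it occurs."""
    counts = {}
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'()")   # drop surrounding punctuation
        if word:
            # get(k) returns the current count (0 if absent); put the updated count back.
            counts[word] = counts.get(word, 0) + 1
    return counts


print(word_frequencies("The cat saw the other cat."))
# {'the': 2, 'cat': 2, 'saw': 1, 'other': 1}
```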
Dictionaries
Dictionary ADT
The dictionary ADT models a searchable collection of key-value entries: ordered and unordered.
The main operations of a dictionary are searching, inserting, and deleting items
Multiple items with the same key are allowed
Applications:
word-definition pairs
credit card authorizations
DNS mapping of host names (e.g., datastructures.net) to internet IP addresses (e.g., 128.148.34.101)
Dictionary ADT methods:
find(k): if the dictionary has an entry with key k, returns it, else, returns null
findAll(k): returns an iterator of all entries with key k
insert(k, o): inserts and returns the entry (k, o)
remove(e): remove the entry e from the dictionary
entries(): returns an iterator of the entries in the dictionary
size(), isEmpty()
Example
Operation Output Dictionary
insert(5,A) (5,A) (5,A)
insert(7,B) (7,B) (5,A),(7,B)
insert(2,C) (2,C) (5,A),(7,B),(2,C)
insert(8,D) (8,D) (5,A),(7,B),(2,C),(8,D)
insert(2,E) (2,E) (5,A),(7,B),(2,C),(8,D),(2,E)
find(7) (7,B) (5,A),(7,B),(2,C),(8,D),(2,E)
find(4) null (5,A),(7,B),(2,C),(8,D),(2,E)
find(2) (2,C) (5,A),(7,B),(2,C),(8,D),(2,E)
findAll(2) (2,C),(2,E) (5,A),(7,B),(2,C),(8,D),(2,E)
size() 5 (5,A),(7,B),(2,C),(8,D),(2,E)
remove(find(5)) (5,A) (7,B),(2,C),(8,D),(2,E)
find(5) null (7,B),(2,C),(8,D),(2,E)
Implement Dictionary ADT
Unordered dictionary
List-based dictionary
Hash table
Ordered dictionary
Array-based dictionary – search table
Skip list
A List-Based Dictionary
A log file or audit trail is a dictionary implemented by means of
an unsorted sequence
We store the items of the dictionary in a sequence (based on a
doubly-linked list or array), in arbitrary order
Performance:
insert takes O(1) time since we can insert the new item at the
beginning or at the end of the sequence
find and remove take O(n) time since in the worst case (the item is
not found) we traverse the entire sequence to look for an item with
the given key
The log file is effective only for dictionaries of small size or for
dictionaries on which insertions are the most common
operations, while searches and removals are rarely performed
(e.g., historical record of logins to a workstation)
The findAll(k) Algorithm
Algorithm findAll(k):
  Input: A key k
  Output: An iterator of entries with key equal to k
  Create an initially-empty list L
  B = D.entries()
  while B.hasNext() do
    e = B.next()
    if e.key() = k then
      L.insertLast(e)
  return L.elements()
The insert and remove Methods
Algorithm insert(k,v):
  Input: A key k and value v
  Output: The entry (k,v) added to D
  Create a new entry e = (k,v)
  S.insertLast(e) {S is unordered}
  return e
Algorithm remove(e):
  Input: An entry e
  Output: The removed entry e or null if e was not in D
  {We don’t assume here that e stores its location in S}
  B = S.positions()
  while B.hasNext() do
    p = B.next()
    if p.element() = e then
      S.remove(p)
      return e
  return null {there is no entry e in D}
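The list-based dictionary operations can be sketched in Python as follows. The assumptions: a Python list stands in for the sequence S, entries are (key, value) tuples, remove compares entries by identity so that only the exact entry object is removed, and the class and method names (ListDictionary, find_all) are illustrative.

```python
class ListDictionary:
    """Unordered list-based dictionary (log file) sketch; duplicate keys are allowed."""

    def __init__(self):
        self._entries = []  # (key, value) tuples in insertion order

    def insert(self, k, v):
        e = (k, v)
        self._entries.append(e)   # O(1): no search for an existing key
        return e

    def find(self, k):
        for e in self._entries:   # O(n) scan, returns the first match
            if e[0] == k:
                return e
        return None

    def find_all(self, k):
        return [e for e in self._entries if e[0] == k]

    def remove(self, e):
        for idx, candidate in enumerate(self._entries):
            if candidate is e:    # the entry itself, not just an equal one
                del self._entries[idx]
                return e
        return None

    def size(self):
        return len(self._entries)


d = ListDictionary()
for k, v in [(5, "A"), (7, "B"), (2, "C"), (8, "D"), (2, "E")]:
    d.insert(k, v)
print(d.find(2))              # (2, 'C')
print(d.find_all(2))          # [(2, 'C'), (2, 'E')]
print(d.remove(d.find(5)))    # (5, 'A')
```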
Search Table
A search table (or lookup table) is a dictionary implemented by
means of a sorted array
We store the items of the dictionary in an array-based sequence,
sorted by key
We use an external comparator for the keys
Performance:
find takes O(log n) time, using binary search
insert takes O(n) time since in the worst case we have to shift n
items to make room for the new item
remove takes O(n) time since in the worst case we have to shift n
items to compact the items after the removal
A search table is effective only for dictionaries of small size or
for dictionaries on which searches are the most common
operations, while insertions and removals are rarely performed
(e.g., credit card authorizations)
Binary Search
Ordered dictionaries.
Binary search performs operation find(k) on a dictionary implemented by
means of an array-based sequence, sorted by key
similar to the high-low game
at each step, the number of candidate items is halved
terminates after a logarithmic number of steps
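Example: find(7)
Index: 0  1  2  3  4  5  6  7  8  9  10 11 12
Key:   0  1  3  4  5  7  8  9  11 14 16 18 19
Step 1: l = 0, h = 12, m = 6 (key 8); 7 < 8, so continue in the left half
Step 2: l = 0, h = 5, m = 2 (key 3); 7 > 3, so continue in the right half
Step 3: l = 3, h = 5, m = 4 (key 5); 7 > 5, so continue in the right half
Step 4: l = m = h = 5 (key 7); found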
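A short Python sketch of binary search on a sorted array of keys. It returns the index of the key rather than an entry, which is a simplification chosen for the example, and the function name is illustrative.

```python
def binary_search(keys, k):
    """Return the index of k in the sorted list `keys`, or None if k is absent."""
    low, high = 0, len(keys) - 1
    while low <= high:
        mid = (low + high) // 2
        if keys[mid] == k:
            return mid
        elif k < keys[mid]:
            high = mid - 1      # discard the right half
        else:
            low = mid + 1       # discard the left half
    return None


keys = [0, 1, 3, 4, 5, 7, 8, 9, 11, 14, 16, 18, 19]
print(binary_search(keys, 7))   # 5
print(binary_search(keys, 6))   # None
```

Each iteration halves the range [low, high], so the loop runs O(log n) times.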
Hash Table Implementation of Dictionary ADT
Unordered dictionaries.
We can also create a hash-table
dictionary implementation.
If we use separate chaining to handle
collisions, then each operation can be
delegated to a list-based dictionary
stored at each hash table cell.
Skip Lists
What is a skip list (§8.6)
Operations
Search
Insertion
Deletion
Implementation
Analysis
Space usage
Search and update times
S3 −∞ +∞
S2 −∞ 15 +∞
S1 −∞ 15 23 +∞
S0 −∞ 10 15 23 36 +∞
What is a Skip List
A skip list for a set S of distinct (key, value) items is a series of
lists S0, S1 , … , Sh such that
Each list Si contains the special keys +∞ and −∞
List S0 contains the keys of S in nondecreasing order
Each list is a subsequence of the previous one, i.e.,
S0 ⊇ S1 ⊇ … ⊇ Sh
List Sh contains only the two special keys
We show how to use a skip list to implement the dictionary ADT
S3 −∞ +∞
S2 −∞ 31 +∞
S1 −∞ 23 31 34 64 +∞
S0 −∞ 12 23 26 31 34 44 56 64 78 +∞
Search
We search for a key x in a skip list as follows:
We start at the first position of the top list
At the current position p, we compare x with y ← key(next(p))
x = y: we return element(next(p))
x > y: we “scan forward”
x < y: we “drop down”
If we try to drop down past the bottom list, we return null
Example: search for 78
S3 −∞ +∞
S2 −∞ 31 +∞
S1 −∞ 23 31 34 64 +∞
S0 −∞ 12 23 26 31 34 44 56 64 78 +∞
Randomized Algorithms
A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution
It contains statements of the type
  b ← random()
  if b = 0
    do A …
  else { b = 1}
    do B …
Its running time depends on the outcomes of the coin tosses
We analyze the expected running time of a randomized algorithm under the following assumptions
the coins are unbiased, and
the coin tosses are independent
The worst-case running time of a randomized algorithm is often large but has very low probability (e.g., it occurs when all the coin tosses give “heads”)
We use a randomized algorithm to insert items into a skip list
Insertion
To insert an entry (x, o) into a skip list, we use a randomized
algorithm:
We repeatedly toss a coin until we get tails, and we denote with i
the number of times the coin came up heads
If i ≥ h, we add to the skip list new lists $S_{h+1}, \dots, S_{i+1}$, each
containing only the two special keys
We search for x in the skip list and find the positions p0, p1 , …, pi
of the entries with largest key less than x in each list S0, S1, … , Si
For j ← 0, …, i, we insert entry (x, o) into list Sj after position pj
Example: insert key 15, with i = 2
Before (p0, p1, p2 are the positions in S0, S1, S2 after which 15 is inserted):
S2 −∞ +∞
S1 −∞ 23 +∞
S0 −∞ 10 23 36 +∞
After:
S3 −∞ +∞
S2 −∞ 15 +∞
S1 −∞ 15 23 +∞
S0 −∞ 10 15 23 36 +∞
Deletion
To remove an entry with key x from a skip list, we proceed as
follows:
We search for x in the skip list and find the positions p0, p1 , …, pi
of the entries with key x, where position pj is in list Sj
We remove positions p0, p1 , …, pi from the lists S0, S1, … , Si
We remove all but one list containing only the two special keys
Example: remove key 34
Before (p0, p1, p2 are the positions of key 34 in S0, S1, S2):
S3 −∞ +∞
S2 −∞ 34 +∞
S1 −∞ 23 34 +∞
S0 −∞ 12 23 34 45 +∞
After:
S2 −∞ +∞
S1 −∞ 23 +∞
S0 −∞ 12 23 45 +∞
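Search, insertion and deletion in a skip list can be sketched together in Python. This is a simplified illustration under several assumptions: nodes carry only next and below links (not the four links of a quad-node), keys are distinct numbers so that float infinities can serve as the special keys, search always descends to S0 instead of stopping at the first match, and the coin is injectable so that an example can be replayed deterministically. All class, method and parameter names are made up for the sketch.

```python
import random

NEG_INF, POS_INF = float("-inf"), float("inf")


class Node:
    def __init__(self, key, value=None):
        self.key, self.value = key, value
        self.next = self.below = None


class SkipList:
    def __init__(self, coin=lambda: random.random() < 0.5):
        self.coin = coin                    # returns True for heads
        self.h = 0                          # index of the top list S_h
        self.top = self._new_level(None)    # S_0 with only the special keys

    def _new_level(self, below_head):
        head, tail = Node(NEG_INF), Node(POS_INF)
        head.next, head.below = tail, below_head
        return head

    def _predecessors(self, x):
        """preds[j] is the position in S_j holding the largest key less than x."""
        preds, p = [], self.top
        while p is not None:
            while p.next.key < x:           # scan forward
                p = p.next
            preds.append(p)
            p = p.below                     # drop down
        return preds[::-1]

    def search(self, x):
        cand = self._predecessors(x)[0].next
        return cand.value if cand.key == x else None

    def insert(self, x, value):
        i = 0
        while self.coin():                  # count heads before the first tails
            i += 1
        while i >= self.h:                  # keep the top list empty
            self.top = self._new_level(self.top)
            self.h += 1
        preds, below = self._predecessors(x), None
        for j in range(i + 1):              # splice x into S_0 .. S_i
            node = Node(x, value)
            node.next, preds[j].next = preds[j].next, node
            node.below, below = below, node

    def remove(self, x):
        removed = None
        for p in self._predecessors(x):
            if p.next.key == x:
                removed = p.next.value
                p.next = p.next.next
        # keep exactly one list that contains only the special keys
        while self.h > 0 and self.top.below.next.key == POS_INF:
            self.top = self.top.below
            self.h -= 1
        return removed

    def levels(self):
        """Keys of S_0, S_1, ..., S_h as lists (special keys omitted)."""
        out, head = [], self.top
        while head is not None:
            keys, p = [], head.next
            while p.key != POS_INF:
                keys.append(p.key)
                p = p.next
            out.append(keys)
            head = head.below
        return out[::-1]


# Replaying the insertion example: 10 (i = 0), 23 (i = 1), 36 (i = 0), then 15 (i = 2)
flips = iter([False, True, False, False, True, True, False])
s = SkipList(coin=lambda: next(flips))
for key in [10, 23, 36, 15]:
    s.insert(key, str(key))
print(s.levels())  # [[10, 15, 23, 36], [15, 23], [15], []]
```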
Implementation
We can implement a skip list with quad-nodes
A quad-node stores:
entry
link to the node prev
link to the node next
link to the node below
link to the node above
Also, we define special keys PLUS_INF and MINUS_INF, and we modify the key comparator to handle them
Space Usage
The space used by a skip list depends on the random bits used by each invocation of the insertion algorithm
We use the following two basic probabilistic facts:
Fact 1: The probability of getting i consecutive heads when flipping a coin is $1/2^i$
Fact 2: If each of n entries is present in a set with probability p, the expected size of the set is np
Consider a skip list with n entries
By Fact 1, we insert an entry in list $S_i$ with probability $1/2^i$
By Fact 2, the expected size of list $S_i$ is $n/2^i$
The expected number of nodes used by the skip list is
$\sum_{i=0}^{h} \frac{n}{2^i} = n \sum_{i=0}^{h} \frac{1}{2^i} < 2n$
Thus, the expected space usage of a skip list with n items is O(n)
Height
The running time of the search and insertion algorithms is affected by the height h of the skip list
We show that with high probability, a skip list with n items has height O(log n)
We use the following additional probabilistic fact:
Fact 3: If each of n events has probability p, the probability that at least one event occurs is at most np
Consider a skip list with n entries
By Fact 1, we insert an entry in list $S_i$ with probability $1/2^i$
By Fact 3, the probability that list $S_i$ has at least one item is at most $n/2^i$
By picking i = 3 log n, we have that the probability that $S_{3\log n}$ has at least one entry is at most
$n/2^{3\log n} = n/n^3 = 1/n^2$
Thus a skip list with n entries has height at most 3 log n with probability at least $1 - 1/n^2$
Search and Update Time
The search time in a skip list is proportional to
the number of drop-down steps, plus
the number of scan-forward steps
The drop-down steps are bounded by the height of the skip list and thus are O(log n) with high probability
To analyze the scan-forward steps, we use yet another probabilistic fact:
Fact 4: The expected number of coin tosses required in order to get tails is 2
When we scan forward in a list, the destination key does not belong to a higher list
A scan-forward step is associated with a former coin toss that gave tails
By Fact 4, in each list the expected number of scan-forward steps is 2
Thus, the expected number of scan-forward steps is O(log n)
We conclude that a search in a skip list takes O(log n) expected time
The analysis of insertion and deletion gives similar results
Summary
A skip list is a data structure for dictionaries that uses a randomized insertion algorithm
In a skip list with n entries
The expected space used is O(n)
The expected search, insertion and deletion time is O(log n)
Using a more complex probabilistic analysis, one can show that these performance bounds also hold with high probability
Skip lists are fast and simple to implement in practice