Linked lists
• A linked list is a data structure in which the objects are arranged in a
linear order.
• Linked list vs. array
• Array: the linear order is determined by the array indices
• Linked list: the order is determined by a pointer in each object
• Each element of a doubly linked list 𝐿 is an object with an attribute
𝑘𝑒𝑦 and two pointer attributes: 𝑛𝑒𝑥𝑡 and 𝑝𝑟𝑒𝑣
• The object may also contain satellite data.
Linked list
• Given an element 𝑥 in the list, 𝑥. 𝑛𝑒𝑥𝑡 points to its successor in the
linked list, and 𝑥. 𝑝𝑟𝑒𝑣 points to its predecessor.
• If 𝑥. 𝑝𝑟𝑒𝑣 = NIL, then the element 𝑥 has no predecessor and is
therefore the first element of the list.
• If 𝑥. 𝑛𝑒𝑥𝑡 = NIL, then the element 𝑥 has no successor and is therefore
the last element of the list.
• If a list is singly linked, then we drop the 𝑝𝑟𝑒𝑣 pointer.
• If a list is sorted, the linear order of the list corresponds to the linear
order of the keys stored in the elements of the list.
List-search
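A minimal Python sketch of list search in the spirit of the LIST-SEARCH pseudocode; the Node class here (with 𝑘𝑒𝑦, 𝑝𝑟𝑒𝑣, and 𝑛𝑒𝑥𝑡 attributes) is an illustrative assumption, not part of the original slides.

    class Node:
        def __init__(self, key):
            self.key = key     # satellite data could be stored alongside the key
            self.prev = None   # predecessor; None plays the role of NIL
            self.next = None   # successor

    def list_search(head, k):
        """Return the first node with key k, or None if no such node exists.
        Worst case Theta(n): the whole list may be scanned."""
        x = head
        while x is not None and x.key != k:
            x = x.next
        return x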
List-insert
• Given an element 𝑥 whose key attribute has already been set, the LIST-INSERT procedure
“splices” 𝑥 onto the front of the linked list (see the sketch below).
• Our attribute notation can cascade, so that 𝐿. ℎ𝑒𝑎𝑑. 𝑝𝑟𝑒𝑣 denotes the 𝑝𝑟𝑒𝑣 attribute of the
object that 𝐿. ℎ𝑒𝑎𝑑 points to.
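A sketch of front insertion under the same assumptions; the LinkedList class with a head attribute is hypothetical, standing in for the list object 𝐿.

    class LinkedList:
        def __init__(self):
            self.head = None          # L.head = NIL for an empty list

        def insert(self, x):
            """Splice node x onto the front of the list in O(1) time."""
            x.next = self.head
            if self.head is not None:
                self.head.prev = x    # the cascaded L.head.prev update
            self.head = x
            x.prev = None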
List-delete
• The procedure LIST-DELETE removes an element 𝑥 from a linked list 𝐿.
• It must be given a pointer to 𝑥.
• It then splices 𝑥 out of the list by updating the pointers, as the
sketch below illustrates.
• If we wish to delete an element with a
given key, we must first call LIST-SEARCH.
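Continuing the hypothetical sketch: given a pointer to 𝑥, deletion takes O(1) time; only the search needed to find 𝑥 by key costs Θ(n).

    def list_delete(L, x):
        """Splice node x out of list L by updating the surrounding pointers."""
        if x.prev is not None:
            x.prev.next = x.next
        else:
            L.head = x.next           # x was the first element
        if x.next is not None:
            x.next.prev = x.prev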
Sentinels
• The code for LIST-DELETE would be simpler if we could ignore the
boundary conditions at the head and tail of the list.
• A sentinel is a dummy object that allows us to simplify boundary
conditions.
• For the list 𝐿 we provide an object 𝐿. 𝑛𝑖𝑙 that represents NIL but has
all the attributes of the other objects in the list.
• Wherever we have a reference to NIL in list code, we replace it by a
reference to the sentinel 𝐿. 𝑛𝑖𝑙
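A sketch of a circular, doubly linked list with a sentinel, reusing the hypothetical Node class from above; note that delete needs no boundary tests at all.

    class SentinelList:
        def __init__(self):
            self.nil = Node(None)      # the sentinel L.nil: a dummy object
            self.nil.next = self.nil   # an empty list points back to itself
            self.nil.prev = self.nil

        def insert(self, x):
            """Splice x in right after the sentinel (the front of the list)."""
            x.next = self.nil.next
            self.nil.next.prev = x
            self.nil.next = x
            x.prev = self.nil

        def delete(self, x):
            """No boundary cases: every node has real prev/next neighbors."""
            x.prev.next = x.next
            x.next.prev = x.prev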
Hash tables
• Insert, search, and delete are the dictionary operations.
• A hash table is an effective data structure for implementing dictionaries.
• Although searching for an element in a hash table can take as long as
searching for an element in a linked list – Θ(𝑛) time in the worst case
– in practice, hashing performs extremely well.
• Under reasonable assumptions, the average time to search for an
element in a hash table is 𝑂(1).
Direct-address tables
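• In direct addressing, the element with key 𝑘 is stored directly in slot 𝑘
of a table 𝑇[0 . . 𝑚 − 1]; this works well when the universe 𝑈 of keys is
reasonably small.

A minimal sketch with illustrative names; elements are assumed to carry a key attribute, as in the Node class above.

    class DirectAddressTable:
        def __init__(self, m):
            self.slots = [None] * m    # T[0..m-1]: one slot per possible key

        def search(self, k):
            return self.slots[k]       # O(1): the key is the index

        def insert(self, x):
            self.slots[x.key] = x      # O(1)

        def delete(self, x):
            self.slots[x.key] = None   # O(1)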
Hash tables
• A hash function ℎ computes the slot from the key 𝑘.
• A hash function maps the universe 𝑈 of keys into the slots of a hash table
𝑇[0 . . 𝑚 − 1]:
ℎ: 𝑈 → {0, 1, … , 𝑚 − 1},
where the size 𝑚 of the hash table is typically much less than |𝑈|.
• We say that an element with key 𝑘 hashes to slot ℎ(𝑘).
• We also say that ℎ(𝑘) is the hash value of key 𝑘.
Resolving collisions
• If two keys have the same hash, we call this situation a collision.
• Collision resolution by chaining
• In chaining, we place all the elements that hash to the same slot into the same
linked list. Slot 𝑗 contains a pointer to the head of the list of all stored elements
that hash to 𝑗. If there are no such elements, slot 𝑗 contains NIL.
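A sketch of chaining, with Python lists standing in for the per-slot linked lists; the class name is illustrative, and the division method is assumed as the hash function.

    class ChainedHashTable:
        def __init__(self, m):
            self.m = m
            self.table = [[] for _ in range(m)]   # slot j holds the chain for j

        def _h(self, k):
            return k % self.m                     # assumed division-method hash

        def insert(self, k):
            self.table[self._h(k)].insert(0, k)   # splice onto the chain's front

        def search(self, k):
            return k in self.table[self._h(k)]    # scan only the chain for h(k)

        def delete(self, k):
            self.table[self._h(k)].remove(k)      # assumes k is present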
Analysis of hashing with chaining
• In a hash table in which collisions are resolved by chaining, an
unsuccessful search takes average-case time Θ(1 + 𝛼), under the
assumption of simple uniform hashing.
• In a hash table in which collisions are resolved by chaining, a
successful search takes average-case time Θ(1 + 𝛼), under the
assumption of simple uniform hashing.
Analysis of hashing with chaining
• Given a hash table 𝑇 with 𝑚 slots that stores 𝑛 elements, we define the
load factor 𝛼 for 𝑇 as 𝑛/𝑚, that is, the average number of elements
stored in a chain.
• Our analysis will be in terms of 𝛼, which can be less than, equal to, or
greater than 1. For example, 𝑛 = 250 elements in 𝑚 = 100 slots gives
𝛼 = 2.5, so the average chain holds 2.5 elements.
• Simple Uniform Hashing:
• It is assumed that any given element is equally likely to hash into any of the 𝑚
slots, independently of where any other element has hashed to.
• For 𝑗 = 0, 1, … , 𝑚 − 1, let us denote the length of the list 𝑇[𝑗] by 𝑛ⱼ, so that
𝑛 = 𝑛₀ + 𝑛₁ + ⋯ + 𝑛ₘ₋₁. The expected value of 𝑛ⱼ is E[𝑛ⱼ] = 𝛼 = 𝑛/𝑚.
Hash functions
• The division method
• ℎ(𝑘) = 𝑘 mod 𝑚
• Avoid taking 𝑚 to be a power of 2: if 𝑚 = 2ᵖ, then ℎ(𝑘) is just the 𝑝
lowest-order bits of 𝑘.
• A prime number not too close to an exact power of 2 is often a good choice.
• The multiplication method:
• First multiply 𝑘 by a constant 𝐴 in the range 0 < 𝐴 < 1
• Extract the fractional part of the number 𝑘𝐴
• Multiply this value by 𝑚 and take the floor of the result
ℎ(𝑘) = ⌊𝑚 (𝑘𝐴 mod 1)⌋
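A sketch of both methods; the default constant A = (√5 − 1)/2 is Knuth's suggestion and is an illustrative choice, not mandated by the method.

    import math

    def h_division(k, m):
        """Division method: h(k) = k mod m (m is best a prime not too
        close to an exact power of 2)."""
        return k % m

    def h_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
        """Multiplication method: h(k) = floor(m * (k*A mod 1))."""
        frac = (k * A) % 1.0           # fractional part of k*A
        return math.floor(m * frac)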
Hashing with chaining
• In a hash table in which collisions are resolved by chaining, an
unsuccessful search takes average-case time Θ(1 + 𝛼), under the
assumption of simple uniform hashing.
• Under the assumption of simple uniform hashing, any key 𝑘 not
already stored in the table is equally likely to hash to any of the 𝑚
slots. The expected time to search unsuccessfully for a key 𝑘 is the
expected time to search to the end of list 𝑇[ℎ(𝑘)], which has expected
length E[𝑛ₕ₍ₖ₎] = 𝛼. Thus, the expected number of elements
examined in an unsuccessful search is 𝛼, and the total time required
(including the time for computing ℎ(𝑘)) is Θ(1 + 𝛼).
Hashing with chaining
• In a hash table in which collisions are resolved by chaining, a
successful search takes average-case time Θ(1 + 𝛼), under the
assumption of simple uniform hashing.
• Let 𝑥ᵢ denote the 𝑖th element inserted into the table, for 𝑖 = 1, 2, … , 𝑛,
and let 𝑘ᵢ = 𝑥ᵢ. 𝑘𝑒𝑦. For keys 𝑘ᵢ and 𝑘ⱼ, define the indicator random
variable 𝑋ᵢⱼ = 𝐼{ℎ(𝑘ᵢ) = ℎ(𝑘ⱼ)}. Under simple uniform hashing,
Pr{ℎ(𝑘ᵢ) = ℎ(𝑘ⱼ)} = 1/𝑚, so E[𝑋ᵢⱼ] = 1/𝑚.
• The number of elements examined in a successful search for 𝑥ᵢ is one
more than the number of elements that appear before 𝑥ᵢ in its list; since
new elements are spliced onto the front, these are exactly the elements
that hashed to the same slot and were inserted after 𝑥ᵢ. The expected
number of elements examined in a successful search is therefore
E[(1/𝑛) ∑ᵢ₌₁ⁿ (1 + ∑ⱼ₌ᵢ₊₁ⁿ 𝑋ᵢⱼ)] = 1 + (𝑛 − 1)/(2𝑚) = 1 + 𝛼/2 − 𝛼/(2𝑛) = Θ(1 + 𝛼).
Open addressing
• In open addressing, all elements occupy the hash table itself. That is,
each table entry contains either an element of the dynamic set or NIL.
• When searching for an element, we systematically examine table slots
until we find the desired element or have ascertained that the element
is not in the table.
• No lists/elements are stored outside the table, unlike in chaining.
• In open addressing, the hash table can “fill up” so that no further
insertions can be made.
• The load factor 𝛼 = 𝑛/𝑚 can never exceed 1.
Open addressing
• Advantages
• It avoids pointers altogether.
• We compute the sequence of slots to be examined.
• The extra memory freed by not storing pointers provides the hash table with a
larger number of slots for the same amount of memory, potentially yielding
fewer collisions and faster retrieval.
Probing
• To perform insertion using open addressing, we successively examine,
or probe the hash table until we find an empty slot in which to put the
key.
• The sequence of positions probed depends on the key being inserted.
• To determine which slots to probe, we extend the hash function to
include the probe number (starting from 0) as a second input. Thus the
hash function becomes
ℎ: 𝑈 × {0, 1, … , 𝑚 − 1} → {0, 1, … , 𝑚 − 1}
• The probe sequence is ⟨ℎ(𝑘, 0), ℎ(𝑘, 1), … , ℎ(𝑘, 𝑚 − 1)⟩; for every key 𝑘,
we require it to be a permutation of ⟨0, 1, … , 𝑚 − 1⟩, so that every slot is
eventually considered.
Linear probing
• Given an ordinary hash function ℎ′: 𝑈 → {0, 1, … , 𝑚 − 1}, which we
refer to as an auxiliary hash function, the method of linear probing
uses the hash function ℎ(𝑘, 𝑖) = (ℎ′(𝑘) + 𝑖) mod 𝑚 for 𝑖 =
0, 1, … , 𝑚 − 1.
• Given key 𝑘, we first probe 𝑇[ℎ′(𝑘)], i.e., the slot given by the
auxiliary hash function.
• We next probe slot 𝑇[ℎ′(𝑘) + 1], and so on up to slot 𝑇[𝑚 − 1]. We then
wrap around to slots 𝑇[0], 𝑇[1], … until we finally probe slot
𝑇[ℎ′(𝑘) − 1].
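A sketch of open addressing with linear probing; hash_insert and hash_search follow the standard probe-until-done pattern, with the division method assumed as the auxiliary hash function ℎ′.

    def probe(k, i, m):
        """Linear probing: h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m."""
        return (k % m + i) % m

    def hash_insert(T, k):
        """Probe until an empty slot (None) is found; the table can fill up."""
        m = len(T)
        for i in range(m):
            j = probe(k, i, m)
            if T[j] is None:
                T[j] = k
                return j
        raise RuntimeError("hash table overflow")

    def hash_search(T, k):
        """Probe the same sequence; an empty slot proves k is absent."""
        m = len(T)
        for i in range(m):
            j = probe(k, i, m)
            if T[j] is None:
                return None
            if T[j] == k:
                return j
        return None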
Linear probing
• Linear probing is easy to implement, but it suffers from a problem
known as primary clustering.
• Long runs of occupied slots build up, increasing the average search
time.
• Clusters arise because an empty slot preceded by 𝑖 full slots gets filled
next with probability (𝑖 + 1)/𝑚.
• Long runs of occupied slots tend to get longer and the average search
time increases.
Quadratic probing
• Quadratic probing uses a hash function of the form
ℎ(𝑘, 𝑖) = (ℎ′(𝑘) + 𝑐₁𝑖 + 𝑐₂𝑖²) mod 𝑚,
where ℎ′ is an auxiliary hash function, 𝑐₁ and 𝑐₂ are positive auxiliary
constants, and 𝑖 = 0, 1, … , 𝑚 − 1.
• Quadratic probing suffers from a milder form of clustering, called
secondary clustering: if two keys have the same initial probe position
ℎ′(𝑘), their entire probe sequences are identical.
Double hashing
• Double hashing offers one of the best methods available for open
addressing because the permutations produced have many of the
characteristics of randomly chosen permutations. Double hashing
uses a hash function of the form
ℎ(𝑘, 𝑖) = (ℎ₁(𝑘) + 𝑖ℎ₂(𝑘)) mod 𝑚,
where both ℎ₁ and ℎ₂ are auxiliary hash functions.
• The value ℎ₂(𝑘) must be relatively prime to the hash-table size 𝑚 for
the entire table to be searched.
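The probe function in the earlier sketch can be swapped for quadratic or double-hashing variants; the constants below are illustrative, and ℎ₂ follows a common choice for prime 𝑚.

    def probe_quadratic(k, i, m, c1=1, c2=3):
        """Quadratic probing: h(k, i) = (h'(k) + c1*i + c2*i^2) mod m.
        c1 and c2 must be chosen so the probe sequence covers the table."""
        return (k % m + c1 * i + c2 * i * i) % m

    def probe_double(k, i, m):
        """Double hashing: h(k, i) = (h1(k) + i*h2(k)) mod m, with
        h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)); for prime m,
        h2(k) is relatively prime to m, so every slot is reached."""
        return (k % m + i * (1 + k % (m - 1))) % m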
Perfect hashing
• Perfect hashing supports searches in 𝑂(1) worst-case time when the set
of keys stored in the table is static, using a two-level hashing scheme.