Detailed Notes on Hashing
### Introduction to Hash Tables
- **Comparison-based Models**: Traditional searching methods rely on comparing keys. Binary search, for example, takes O(log n) time on sorted data. The question arises: Can we do better?
- **Comparison model**: In this model, an algorithm learns about keys only by comparing them, and it is proven that any search algorithm using this model must take Ω(log n) time in the worst case, i.e. at least on the order of log n comparisons.
- **Improvement**: By going beyond the comparison model, we can achieve faster operations. For example, **hashing** allows us to search, insert, and delete items in O(1) expected time.
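
The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration: `binary_search` is a helper written for these notes, and Python's built-in `set` stands in for a hash table.

```python
from bisect import bisect_left

def binary_search(sorted_keys, target):
    """Comparison-based lookup: O(log n) comparisons on sorted data."""
    i = bisect_left(sorted_keys, target)
    return i < len(sorted_keys) and sorted_keys[i] == target

keys = [3, 8, 15, 23, 42]   # must be sorted for binary search
key_set = set(keys)         # hash-based container, no ordering needed

print(binary_search(keys, 23))  # True
print(23 in key_set)            # True, found by hashing rather than comparing
```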
### Basics of Hashing
- **Hash Function**: A hash function is used to map keys to integers, typically within a fixed interval
[0, N-1],
where N is the size of the hash table.
- **Example**: For integer keys, a simple hash function could be h(x) = x mod N, where N is the
size of the array (hash table).
- **Hash Value**: The output of the hash function for a given key is called the **hash value**. For
instance, if x = 15
and N = 10, the hash value is 15 mod 10 = 5.
- **Hash Table**: It consists of:
  - A **hash function** to compute indices.
  - An **array** (or table) of size N.

When storing an item, the key is hashed to determine its index in the table.
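
A minimal sketch of these pieces, assuming integer keys and N = 10 as in the example above. The helper names `put` and `get` are chosen for illustration, and the sketch deliberately ignores what happens when two keys map to the same index.

```python
N = 10  # table size, taken from the x = 15, N = 10 example

def h(x):
    """Hash function h(x) = x mod N."""
    return x % N

table = [None] * N  # the array of size N

def put(key, value):
    # Store the item at the index given by the hash value.
    # An existing entry at that index is simply overwritten here.
    table[h(key)] = (key, value)

def get(key):
    entry = table[h(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

put(15, "fifteen")
print(h(15))    # 5
print(get(15))  # fifteen
```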
### Example of a Hash Table
- Imagine we want to store people's Aadhar (ID) numbers along with their names. A simple way to do this is to use a hash function that takes the last four digits of the Aadhar number as the hash value. For instance, the entry for Aadhar number "451-229-0004" would be stored at index 4 (its last four digits, 0004).
- **Size of the table**: Here, we assume the table size N = 10,000, and the hash function is defined
as h(x) = last four
digits of x.
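
The Aadhar example could look roughly like this in Python. The function name `h`, the handling of dashes, and the name stored in the table are assumptions made for illustration.

```python
N = 10_000  # table size from the example

def h(aadhar_number: str) -> int:
    """Hash value = last four digits of the ID, read as an integer."""
    digits = "".join(ch for ch in aadhar_number if ch.isdigit())
    return int(digits[-4:])

table = [None] * N

# "Asha" is a made-up name used only to have something to store.
table[h("451-229-0004")] = ("451-229-0004", "Asha")

print(h("451-229-0004"))  # 4
```

Any two ID numbers that end in the same four digits land on the same index with this function.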
### Hash Function Breakdown
A hash function is often made up of two parts:
1. **Hash Code**: Maps keys to integers.
2. **Compression Function**: Converts those integers into indices within the bounds of the table [0,
N - 1].
- **Example**: If you use a hash function like h(x) = h_2(h_1(x)), then h_1 is the hash code and h_2 is the compression function. The hash code is applied first, and the compression function is then applied to its result to ensure the final index falls within [0, N - 1].
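
The two-part structure can be sketched as follows. The polynomial hash code with multiplier 31 and the mod-N compression are common choices assumed here for illustration; the notes do not prescribe specific functions.

```python
def hash_code(key: str) -> int:
    """h_1: map a string key to an integer (polynomial hash, multiplier 31 assumed)."""
    code = 0
    for ch in key:
        code = 31 * code + ord(ch)
    return code

def compress(code: int, N: int) -> int:
    """h_2: map an arbitrary integer into the range [0, N - 1]."""
    return code % N

def h(key: str, N: int) -> int:
    """h(x) = h_2(h_1(x))."""
    return compress(hash_code(key), N)

print(hash_code("ab"))  # 3105
print(h("ab", 10))      # 5
```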
### Handling Collisions
When two keys hash to the same index, a **collision** occurs. Several strategies exist to handle
collisions:
1. **Separate Chaining**: Each cell in the hash table points to a linked list of entries that hash to the
same index.
2. **Linear Probing**: If a collision occurs, the algorithm checks the following cells one at a time (probing), wrapping around to the start of the table, until it finds an available cell.
3. **Double Hashing**: A secondary hash function determines how far to jump when resolving
collisions.
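
The three strategies can be compared on a small table. This is a sketch under stated assumptions: N = 7, h(x) = x mod N, and the secondary function `h2` are illustrative choices, Python lists stand in for linked lists, and deletion is left out.

```python
N = 7  # small prime table size, assumed for illustration

def h(key):
    return key % N

def h2(key):
    """Secondary hash for double hashing (assumed form); never returns 0."""
    return 5 - (key % 5)

class ChainedTable:
    """Separate chaining: each cell holds a list of (key, value) entries."""
    def __init__(self):
        self.buckets = [[] for _ in range(N)]

    def put(self, key, value):
        bucket = self.buckets[h(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:               # key already present: update it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self.buckets[h(key)]:
            if k == key:
                return v
        return None

def probe_insert(table, key, step):
    """Open addressing: start at h(key) and jump by step(key) until a cell is free."""
    i = h(key)
    for _ in range(N):
        if table[i] is None or table[i] == key:
            table[i] = key
            return i
        i = (i + step(key)) % N        # wrap around circularly
    raise RuntimeError("table is full")

linear = [None] * N
double = [None] * N
for key in (10, 17, 24):               # all three keys hash to index 3
    probe_insert(linear, key, lambda k: 1)  # linear probing: step of 1
    probe_insert(double, key, h2)           # double hashing: step of h2(key)

print(linear)  # [None, None, None, 10, 17, 24, None]
print(double)  # [None, None, None, 10, 24, None, 17]
```

With linear probing the colliding keys pile up in adjacent cells, while double hashing scatters them because each key gets its own step size.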
### Performance of Hashing
- In the worst case (when all keys collide), operations like search, insert, and delete could take O(n)
time.
- **Load Factor** alpha = n/N, where n is the number of items, and N is the table size. As the load
factor increases
(closer to 1), the chance of collisions increases, reducing performance.
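- In practice, hashing is very efficient, with operations taking O(1) expected time, provided the load factor is kept under control. With linear probing or double hashing the table cannot hold more than N items, so the load factor must stay below 1, and implementations commonly grow the table well before it is full.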
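
A small sketch of the load factor and of the worst case, assuming separate chaining with h(x) = x mod N. The helper names `load_factor` and `chain_lengths` are made up for this example.

```python
def load_factor(n, N):
    """alpha = n / N for n stored items and table size N."""
    return n / N

def chain_lengths(keys, N):
    """Number of keys landing in each cell under h(x) = x mod N."""
    lengths = [0] * N
    for key in keys:
        lengths[key % N] += 1
    return lengths

N = 10
spread = list(range(8))                  # 0..7 land in eight different cells
clustered = [10 * i for i in range(8)]   # multiples of 10 all land in cell 0

print(load_factor(8, N))                 # 0.8 for both key sets
print(max(chain_lengths(spread, N)))     # 1
print(max(chain_lengths(clustered, N)))  # 8, the O(n) worst case
```

Both key sets give the same load factor, yet the second one forces every search to scan a single chain of length n, which is why the choice of hash function matters as much as the table size.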
### Applications of Hashing
Hash tables are widely used in areas like:
- Small databases.
- Caching systems (e.g., web browser caches).
- Algorithms that detect duplicates or find distinct elements in large datasets.
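
As an illustration of the last application, duplicate detection and distinct-element extraction can be sketched with Python's hash-based `set` and `dict`. The function names are hypothetical.

```python
def find_duplicates(items):
    """Return the set of items that occur more than once."""
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:        # hash-based membership test
            duplicates.add(item)
        else:
            seen.add(item)
    return duplicates

def distinct_in_order(items):
    """Return the distinct items, keeping first-occurrence order."""
    return list(dict.fromkeys(items))

data = [3, 1, 3, 2, 1]
print(find_duplicates(data))    # {1, 3}
print(distinct_in_order(data))  # [3, 1, 2]
```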