Hashing in Computer Science
1. Introduction to Hashing
Hashing is a technique that converts input data of arbitrary size into a fixed-size value, which
can then be used to identify or locate the data quickly. The output, known as a hash value or
hash code, is produced by a function called the hash function. Because the set of possible outputs
is finite, two different inputs can occasionally produce the same hash value.
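As a small illustration of the fixed-size property, the sketch below uses CRC-32 from Python's standard zlib module. CRC-32 is a checksum used here only as a convenient stand-in for a hash function, and the helper name checksum32 is made up for this example.

```python
import zlib

def checksum32(data: bytes) -> int:
    """Map input of any length to a 32-bit unsigned integer using CRC-32."""
    return zlib.crc32(data)

short_value = checksum32(b"a")            # 1-byte input
long_value = checksum32(b"a" * 10_000)    # 10,000-byte input

# Both results fit in 32 bits, no matter how long the input was.
print(short_value < 2**32, long_value < 2**32)
```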
Hashing is widely used in various applications such as database management, cryptography, and
data storage for fast retrieval.
2. Key Terminology
• Hash Function: A function that maps input data to a fixed-size hash value.
• Hash Table: A data structure that stores key-value pairs and uses a hash function to decide which bucket holds each key.
• Bucket: A slot in the hash table where elements are stored.
• Collision: A scenario where two distinct inputs produce the same hash value.
• Load Factor: The ratio of the number of elements in the hash table to the total number of
buckets.
3. Hash Function
A good hash function should:
1. Minimize Collisions: Map distinct inputs to distinct hash values as often as possible.
2. Be Fast: Quickly compute hash values.
3. Distribute Uniformly: Spread the data across the table to avoid clustering.
3.1 Examples of Hash Functions
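1. Division Method:
h(k) = k mod m
Here, k is the key, and m is the size of the hash table.
2. Multiplication Method:
h(k) = ⌊m (kA mod 1)⌋
Where A is a constant with 0 < A < 1, m is the size of the table, and kA mod 1 is the fractional part of kA.
3. Universal Hashing: The hash function is chosen at random from a family of functions, so that no fixed set of keys triggers the worst case for every choice.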
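The three methods above can be sketched in Python as follows. The function names, the choice A = (√5 − 1)/2, and the prime p = 2^31 − 1 are assumptions made for this illustration; the universal family shown, ((a·k + b) mod p) mod m, is one common construction and assumes integer keys smaller than p.

```python
import math
import random

def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m

# Assumed constant; any A with 0 < A < 1 fits the formula.
A = (math.sqrt(5) - 1) / 2

def multiplication_hash(k: int, m: int, a: float = A) -> int:
    """Multiplication method: h(k) = floor(m * (k*A mod 1))."""
    return math.floor(m * ((k * a) % 1))

def make_universal_hash(m: int, p: int = 2_147_483_647, rng=None):
    """Pick one function at random from the family ((a*k + b) mod p) mod m."""
    rng = rng or random.Random()
    a = rng.randrange(1, p)   # a is never zero
    b = rng.randrange(0, p)

    def h(k: int) -> int:
        return ((a * k + b) % p) % m

    return h

print(division_hash(1234, 10))             # 4
print(multiplication_hash(123456, 16384))  # 67
```

Each call to make_universal_hash returns a different function, which is the point of the technique: an adversary cannot pick keys that collide under every possible choice.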
4. Collision Handling
Collisions occur when multiple keys hash to the same bucket. Strategies to handle collisions
include:
4.1 Separate Chaining
Each bucket points to a linked list or chain containing all elements that hash to the same value.
Advantages:
• Simple to implement.
• The table never fills up; a bucket can hold any number of colliding elements.
Disadvantages:
• Performance degrades as chains grow longer; in the worst case a lookup scans every element.
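A minimal sketch of separate chaining is shown below. It is an illustration rather than a production design: the class name ChainedHashTable is hypothetical, Python lists stand in for linked lists, and the built-in hash() supplies the hash value.

```python
class ChainedHashTable:
    """Hash table where each bucket holds a chain of (key, value) pairs."""

    def __init__(self, num_buckets: int = 8):
        # Python lists are used as chains here instead of linked lists.
        self._buckets = [[] for _ in range(num_buckets)]

    def _index(self, key) -> int:
        return hash(key) % len(self._buckets)

    def put(self, key, value) -> None:
        bucket = self._buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # new key: extend the chain

    def get(self, key, default=None):
        for existing_key, value in self._buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = ChainedHashTable(num_buckets=8)
table.put(1, "one")
table.put(9, "nine")   # 1 and 9 land in the same bucket when there are 8 buckets
print(table.get(1), table.get(9))
```

Both lookups succeed even though the keys collide, because get() walks the chain and compares keys.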
4.2 Open Addressing
Collisions are resolved by finding an empty bucket using a probing sequence. Common probing
techniques include:
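1. Linear Probing: Check the following buckets one by one (wrapping around at the end of the table) until an empty bucket is found.
2. Quadratic Probing: Check buckets at offsets that grow quadratically from the original position (for example 1, 4, 9, ...).
3. Double Hashing: Use a second hash function to determine the step size between probes.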
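Linear probing can be sketched as follows. The class name LinearProbingTable is hypothetical, the table has a fixed capacity, and deletion is left out because it normally needs extra bookkeeping (such as tombstone markers) that would obscure the idea.

```python
class LinearProbingTable:
    """Fixed-capacity open-addressing table using linear probing."""

    _EMPTY = object()  # sentinel marking an unused bucket

    def __init__(self, capacity: int = 8):
        self._keys = [self._EMPTY] * capacity
        self._values = [None] * capacity

    def _probe(self, key):
        """Yield bucket indices in probing order, visiting each bucket once."""
        capacity = len(self._keys)
        start = hash(key) % capacity
        for step in range(capacity):
            yield (start + step) % capacity

    def put(self, key, value) -> None:
        for i in self._probe(key):
            if self._keys[i] is self._EMPTY or self._keys[i] == key:
                self._keys[i] = key
                self._values[i] = value
                return
        raise RuntimeError("hash table is full")

    def get(self, key, default=None):
        for i in self._probe(key):
            if self._keys[i] is self._EMPTY:
                return default            # reached a gap: key is absent
            if self._keys[i] == key:
                return self._values[i]
        return default


table = LinearProbingTable(capacity=8)
for key in (1, 9, 17):                    # all three map to bucket 1
    table.put(key, str(key))
print(table.get(17))
```

Quadratic probing and double hashing would change only the index computed in _probe; the rest of the structure stays the same.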
5. Load Factor and Resizing
The load factor (α) measures the utilization of a hash table:
α = (Number of elements) / (Table size)
To maintain efficiency, hash tables are resized when the load factor exceeds a threshold.
Resizing Strategy:
1. Create a larger table.
2. Rehash all elements into the new table.
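The resizing strategy can be sketched on top of a chained table. The class name ResizingHashTable, the threshold of 0.75, and the choice to double the table are assumptions for this example; real implementations vary.

```python
class ResizingHashTable:
    """Chained hash table that grows when the load factor passes a threshold."""

    def __init__(self, num_buckets: int = 4, max_load: float = 0.75):
        self._buckets = [[] for _ in range(num_buckets)]
        self._size = 0
        self._max_load = max_load  # assumed threshold

    @property
    def load_factor(self) -> float:
        return self._size / len(self._buckets)

    def _resize(self, new_num_buckets: int) -> None:
        old_items = [item for bucket in self._buckets for item in bucket]
        self._buckets = [[] for _ in range(new_num_buckets)]   # 1. larger table
        for key, value in old_items:                           # 2. rehash everything
            self._buckets[hash(key) % new_num_buckets].append((key, value))

    def put(self, key, value) -> None:
        bucket = self._buckets[hash(key) % len(self._buckets)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self._size += 1
        if self.load_factor > self._max_load:
            self._resize(2 * len(self._buckets))

    def get(self, key, default=None):
        for existing_key, value in self._buckets[hash(key) % len(self._buckets)]:
            if existing_key == key:
                return value
        return default


table = ResizingHashTable()
for n in range(10):
    table.put(n, n * n)
print(len(table._buckets), table.load_factor)
```

Every element must be rehashed because its bucket index depends on the table size, which is why a resize costs time proportional to the number of stored elements.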
6. Applications of Hashing
Hashing is used in various real-world scenarios:
6.1 Hash Maps and Dictionaries
• Data storage for key-value pairs.
• Retrieval in constant time, O(1), on average.
6.2 Cryptography
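• Cryptographic hash functions such as SHA-256 are used to verify data integrity; the older MD5 and SHA-1 are no longer considered collision-resistant.
• Used for digital signatures and password storage. Password storage normally relies on salted, deliberately slow functions (for example bcrypt, scrypt, or Argon2) rather than a single fast hash.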
6.3 Caching
• Used in web caching for mapping URLs to cached pages.
6.4 Databases
• Indexing data for efficient retrieval.
6.5 Bloom Filters
• A space-efficient probabilistic data structure using hashing to test membership.
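A minimal Bloom filter sketch follows. The class name, the default sizes, and the way several hash positions are derived (SHA-256 over a seed prefix plus the item) are assumptions chosen for readability, not a tuned design.

```python
import hashlib

class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self._num_bits = num_bits
        self._num_hashes = num_hashes
        self._bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item: str):
        """Derive num_hashes bit positions by hashing the item with different seeds."""
        for seed in range(self._num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self._num_bits

    def add(self, item: str) -> None:
        for position in self._positions(item):
            self._bits[position] = 1

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self._bits[position] for position in self._positions(item))


seen = BloomFilter()
seen.add("apple")
print(seen.might_contain("apple"))  # True
```

The filter stores only bits, never the items themselves, which is where the space saving comes from. The price is that might_contain can return True for an item that was never added.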
7. Cryptographic Hash Functions
Cryptographic hash functions are designed for security and possess the following properties:
1. Deterministic: Same input always produces the same hash.
2. Fast: Compute hash values quickly.
3. Collision-Resistant: Hard to find two inputs producing the same hash.
4. Preimage-Resistant: Hard to find any input that produces a given hash.
Examples:
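• MD5: Fast, but practical collision attacks exist; unsuitable for security purposes.
• SHA-1: Stronger than MD5, but practical collisions have been demonstrated; not recommended for modern use.
• SHA-256: Part of the SHA-2 family; widely used, including in blockchain technologies.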
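The deterministic property, and the way a small input change produces a completely different digest, can be observed with Python's standard hashlib module. The helper name sha256_hex is made up for this sketch.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

first = sha256_hex("hello")
again = sha256_hex("hello")
changed = sha256_hex("hellp")   # one character differs

print(first == again)    # True: same input, same hash
print(first == changed)  # False: a one-character change alters the digest
print(len(first))        # 64
```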
8. Hashing vs Other Data Structures
Feature          Hash Table   Array   Binary Search Tree
Access Time      O(1)*        O(1)    O(log n)**
Search Time      O(1)*        O(n)    O(log n)**
Ordered Storage  No           Yes     Yes
* O(1) in average case; O(n) in worst case due to collisions.
** Assumes a balanced tree; an unbalanced tree can degrade to O(n).
9. Advantages of Hashing
1. Efficiency: Average-case constant time for lookups and insertions.
2. Scalability: Effective for large datasets.
3. Flexibility: Adaptable to various applications.
10. Disadvantages of Hashing
1. Collisions: Can degrade performance.
2. Space Overhead: Requires additional memory for empty buckets.
3. Complexity: Designing a good hash function is challenging.
11. Real-World Example
Caching with Hashing
Web browsers use hashing to map URLs to cached pages, enabling faster access. For example:
1. Compute a hash of the URL.
2. Use the hash to index the cache and retrieve the content.
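The two steps above can be sketched as a small fixed-size cache. This is a hypothetical illustration, not a description of how any particular browser works: the class name, the slot count, and the policy of overwriting a slot on collision are all assumptions.

```python
import hashlib

class UrlCache:
    """Fixed-size cache where each URL maps to exactly one slot."""

    def __init__(self, num_slots: int = 64):
        self._slots = [None] * num_slots

    def _index(self, url: str) -> int:
        # Step 1: compute a hash of the URL and reduce it to a slot index.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._slots)

    def put(self, url: str, content: bytes) -> None:
        # A colliding URL simply replaces whatever was in the slot.
        self._slots[self._index(url)] = (url, content)

    def get(self, url: str):
        # Step 2: use the hash to index the cache, then confirm the URL matches.
        entry = self._slots[self._index(url)]
        if entry is not None and entry[0] == url:
            return entry[1]
        return None


cache = UrlCache()
cache.put("https://example.com/", b"<html>home</html>")
print(cache.get("https://example.com/"))
```

Storing the URL next to the content lets get() detect the case where a different URL hashed to the same slot, so a collision yields a cache miss rather than the wrong page.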
12. Practical Implementation
Python Example Using Dictionaries:
# Using Python's dictionary to demonstrate hashing
hash_table = {}
# Insert key-value pairs
hash_table["name"] = "Alice"
hash_table["age"] = 25
# Retrieve value by key
print(hash_table["name"]) # Output: Alice
# Delete key-value pair
del hash_table["age"]
13. Future of Hashing
1. Quantum Computing: Adapting hash functions for quantum-resistant cryptography.
2. Blockchain: Enhanced hashing techniques for secure and scalable blockchain systems.