Hash Function
30 September, 2024
Motivation
I Hash function is an important primitive in cryptography.
I Hash function is used as a primary building block for diverse
cryptographic applications.
I Extremely hard to construct due to the demands on security and
performance requirements.
I Some examples of hash function: SHA-1, RIPEMD-160, SHA-2
family (including SHA-256, SHA-384, SHA-512 etc).
I SHA-3 (Keccak): latest addition, standardised by NIST in 2015.
Definition
I A hash function is a mapping H such that:
1. H maps inputs of arbitrary bit-lengths to outputs of a fixed bit-length:
H : {0, 1}∗ → {0, 1}n
More generally, H maps elements of a set D to a set R where
|D| ≥ |R|.
2. H(x) can be efficiently computed for all x ∈ {0, 1}∗ .
Definition
I A hash function is a mapping H such that:
1. H maps inputs of arbitrary bit-lengths to outputs of a fixed bit-length:
H : {0, 1}∗ → {0, 1}n
More generally, H maps elements of a set D to a set R where
|D| ≥ |R|.
2. H(x) can be efficiently computed for all x ∈ {0, 1}∗ .
I H is called an n-bit hash function.
I H(x) is called the hash value, hash digest or message digest of x.
I Description of H is public.
I Typically no secret key is involved in the description of H.
Security Properties
H:D→R
I Preimage Resistance: Given a hash digest y ∈R R, it is
computationally infeasible to find (with non-negligible probability of
success) any input x ∈ D such that H(x) = y .
I x is called a preimage of y .
I H is called one-way or preimage resistant if the preimage cannot be
computed efficiently.
Security Properties
H:D→R
I Preimage Resistance: Given a hash digest y ∈R R, it is
computationally infeasible to find (with non-negligible probability of
success) any input x ∈ D such that H(x) = y .
I x is called a preimage of y .
I H is called one-way or preimage resistant if the preimage cannot be
computed efficiently.
I Second Preimage Resistance: Given an input x ∈R D, it is
computationally infeasible to find (with non-negligible probability of
success) a second input x 0 6= x such that H(x) = H(x 0 ).
I H is called second preimage resistant if the second preimage cannot be
computed efficiently.
Security Properties (Contd.)
I Collision Resistance: It is computationally infeasible to find (with
non-negligible probability of success) two distinct inputs x, x 0 such
that H(x) = H(x 0 ).
I The pair (x, x 0 ) is called a collision for H.
I H is called collision resistant if a collision cannot be computed
efficiently.
Some applications of Hash Function
Some applications of Hash Function
1. Password protection: The server stores huserid, H(password)i in a
password file.
I If attacker gets access to the password file, (s)he cannot retrieve any
password.
I Which security property H must satisfy?
Some applications of Hash Function
1. Password protection: The server stores huserid, H(password)i in a
password file.
I If attacker gets access to the password file, (s)he cannot retrieve any
password.
I Which security property H must satisfy? Preimage Resistance
Some applications of Hash Function
1. Password protection: The server stores huserid, H(password)i in a
password file.
I If attacker gets access to the password file, (s)he cannot retrieve any
password.
I Which security property H must satisfy? Preimage Resistance
2. Modification Detection Codes (MDC): To ensure that a message m is
not modified by unauthorized means, compute H(m) and protect
H(m) from unauthorized modification.
I Used in intrusion detection – Tripwire.
I Which security property H has to satisfy?
Some applications of Hash Function
1. Password protection: The server stores huserid, H(password)i in a
password file.
I If attacker gets access to the password file, (s)he cannot retrieve any
password.
I Which security property H must satisfy? Preimage Resistance
2. Modification Detection Codes (MDC): To ensure that a message m is
not modified by unauthorized means, compute H(m) and protect
H(m) from unauthorized modification.
I Used in intrusion detection – Tripwire.
I Which security property H has to satisfy?
Second Preimage Resistance.
Applications (Contd.)
3. Digital Signature: Instead of signing a message m of arbitrary length,
a much shorter message digest H(m) is usually signed.
I Which security property H must satisfy?
Applications (Contd.)
3. Digital Signature: Instead of signing a message m of arbitrary length,
a much shorter message digest H(m) is usually signed.
I Which security property H must satisfy?
I Collision Resistance: Suppose the signer can choose two distinct
messages m1 and m2 , such that H(m1 ) = H(m2 ). She signs m1 and
later claims that she signed m2 .
Applications (Contd.)
3. Digital Signature: Instead of signing a message m of arbitrary length,
a much shorter message digest H(m) is usually signed.
I Which security property H must satisfy?
I Collision Resistance: Suppose the signer can choose two distinct
messages m1 and m2 , such that H(m1 ) = H(m2 ). She signs m1 and
later claims that she signed m2 .
4. Message Authentication Codes (MAC): Provides data integrity and
data origin authentication.
5. Key Derivation Function (KDF): Generate a key from a shared secret.
6. Hash-Based Signature: Secure against quantum attackers.
Relationship among security notions
Collision resistance implies second preimage resistance.
Relationship among security notions
Collision resistance implies second preimage resistance.
I This fact can be established through contradiction.
I A ⇒ B ⇐⇒ B̄ ⇒ Ā.
I Suppose we have a collision resistant hash function H. We want to
establish that H is second preimage resistant.
Relationship among security notions
Collision resistance implies second preimage resistance.
I This fact can be established through contradiction.
I A ⇒ B ⇐⇒ B̄ ⇒ Ā.
I Suppose we have a collision resistant hash function H. We want to
establish that H is second preimage resistant.
I Assume that H is not second preimage resistant.
I Given H and an algorithm B (as a black box) that finds second
preimage for H, we construct an attacker A that breaks the collision
resistance of H.
I This contradicts our initial proposition that H is collision resistant.
Collision Resistance and Preimage Resistance
Collision resistance does not guarantee preimage resistance!
Collision Resistance and Preimage Resistance
Collision resistance does not guarantee preimage resistance!
I Suppose H : {0, 1}∗ → {0, 1}n is a collision resistant hash function
(CRHF).
I Define another function G : {0, 1}∗ → {0, 1}n+1 as follows:
0||x if x ∈ {0, 1}n
G (x) =
1||H(x) if x ∈ / {0, 1}n
I Claim: G is collision resistant.
Collision Resistance and Preimage Resistance
Collision resistance does not guarantee preimage resistance!
I Suppose H : {0, 1}∗ → {0, 1}n is a collision resistant hash function
(CRHF).
I Define another function G : {0, 1}∗ → {0, 1}n+1 as follows:
0||x if x ∈ {0, 1}n
G (x) =
1||H(x) if x ∈/ {0, 1}n
I Claim: G is collision resistant.
I G is not preimage resistant as one can easily find preimage for at
least 1/2 of all y ∈ {0, 1}n+1 .
Collision Resistance and Preimage Resistance II
Collision resistance does guarantee preimage resistance if H is “somewhat
uniform” and |D| ≥ 2|R|.
Somewhat uniform: All the hash digests have roughly the same number of
preimages.
Collision Resistance and Preimage Resistance II
Collision resistance does guarantee preimage resistance if H is “somewhat
uniform” and |D| ≥ 2|R|.
Somewhat uniform: All the hash digests have roughly the same number of
preimages.
I Suppose H is collision resistant but not preimage resistant and B is
an algorithm that finds a preimage with non-negligible probability.
I Construct an algorithm A that finds a collision in H with
non-negligible property.
Collision Resistance and Preimage Resistance II
Collision resistance does guarantee preimage resistance if H is “somewhat
uniform” and |D| ≥ 2|R|.
Somewhat uniform: All the hash digests have roughly the same number of
preimages.
I Suppose H is collision resistant but not preimage resistant and B is
an algorithm that finds a preimage with non-negligible probability.
I Construct an algorithm A that finds a collision in H with
non-negligible property.
1. Choose some x ∈R D, compute y = H(x) and give y to B.
2. B returns some x 0 as preimage of y .
3. If x 6= x 0 then return (x, x 0 ).
I Let the success probability of B be . What will be the probability of
success for A?
Security Notions
I For practical hash functions it is reasonable to assume:
Collision Resistance implies both Second Preimage Resistance and
Preimage Resistance.
I Collision Resistance is the hardest to achieve.
I Most hash function constructions (and attacks) target this property.
Generic Attacks
I A generic attack on a hash function H : {0, 1}∗ → {0, 1}n does not
take into account concrete properties of a particular hash function,
say SHA-256.
I Here we treat H as a random function.
I For each new x ∈ {0, 1}∗ , y = H(x) is “computed” by selecting y
uniformly at random from {0, 1}n .
Generic Attacks
I A generic attack on a hash function H : {0, 1}∗ → {0, 1}n does not
take into account concrete properties of a particular hash function,
say SHA-256.
I Here we treat H as a random function.
I For each new x ∈ {0, 1}∗ , y = H(x) is “computed” by selecting y
uniformly at random from {0, 1}n .
I From security point of view such a random function is an ideal hash
function.
I From implementation point of view such a function cannot be realized
efficiently. [Why?]
I Generic attacks provide an upper bound on security for any hash
function.
I A concrete hash function achieves ideal security if the best attack
against it matches the generic attack.
Generic Attack: Find-Preimage
H : {0, 1}∗ → {0, 1}n
I Given some y ∈ {0, 1}n , select arbitrary x ∈ {0, 1}∗ and compute
H(x).
I If H(x) = y then STOP and return x.
I Else, continue with a different value of x.
I Expected number of steps (hash function evaluation):
Generic Attack: Find-Preimage
H : {0, 1}∗ → {0, 1}n
I Given some y ∈ {0, 1}n , select arbitrary x ∈ {0, 1}∗ and compute
H(x).
I If H(x) = y then STOP and return x.
I Else, continue with a different value of x.
I Expected number of steps (hash function evaluation):
I 2n .
I Exercise: Devise a generic attack to find second preimage and analyse
its cost in terms of number of hash compuations.
Generic Attack: Collision
H : {0, 1}∗ → {0, 1}n
Goal: Output (x, x 0 ) s.t. H(x) = H(x 0 ) but x 6= x 0 .
Generic Attack: Collision
H : {0, 1}∗ → {0, 1}n
Goal: Output (x, x 0 ) s.t. H(x) = H(x 0 ) but x 6= x 0 .
I Select an arbitrary x ∈ {0, 1}∗ and store [H(x), x] in a table (sorted
by the first entry).
I Continue this process until a collision is found.
I What is the expected number of steps (hash function evaluation)?
Generic Attack: Collision
H : {0, 1}∗ → {0, 1}n
Goal: Output (x, x 0 ) s.t. H(x) = H(x 0 ) but x 6= x 0 .
I Select an arbitrary x ∈ {0, 1}∗ and store [H(x), x] in a table (sorted
by the first entry).
I Continue this process until a collision is found.
I What is the expected number of steps (hash function evaluation)?
I 2n/2 (approximately) [Birthday Paradox]
I What is the space requirement?
Generic Attack: Collision
H : {0, 1}∗ → {0, 1}n
Goal: Output (x, x 0 ) s.t. H(x) = H(x 0 ) but x 6= x 0 .
I Select an arbitrary x ∈ {0, 1}∗ and store [H(x), x] in a table (sorted
by the first entry).
I Continue this process until a collision is found.
I What is the expected number of steps (hash function evaluation)?
I 2n/2 (approximately) [Birthday Paradox]
I What is the space requirement? 2n/2 (approximately).
I An elegant method called VW Parallel Collision Search makes the
space requirement negligible (van Oorschot & Wiener 1993) without
increasing the time complexity.
Note: Finding a collision is easier than finding preimage or second
preimage.
Quantum Setting
What are the best possible generic attacks for finding
1. Preimage
2. Second Preimage
3. Collision
How to construct a hash function?
I Suppose we are given a function:
f : {0, 1}n × {0, 1}r → {0, 1}n
I f is called a Compression Function that maps (n + r )-bit inputs to
n-bit outputs.
How to construct a hash function?
I Suppose we are given a function:
f : {0, 1}n × {0, 1}r → {0, 1}n
I f is called a Compression Function that maps (n + r )-bit inputs to
n-bit outputs.
I Our goal is to construct another function H based on f that takes
arbitrary inputs and map them to n-bit outputs.
I We want the assurance that if f is collision resistant then H must also
be collision resistant.
I Achieved through an iterated hash function.
Merkle’s Meta Construction
I Components:
I A compression function: f : {0, 1}n+r → {0, 1}n .
I An initial vector IV ∈ {0, 1}n .
Merkle’s Meta Construction
I Components:
I A compression function: f : {0, 1}n+r → {0, 1}n .
I An initial vector IV ∈ {0, 1}n .
I Computing H(x) (bit-length of x is b < 2r )
Preprocessing:
1. Break-up x into r -bit blocks: x̄ = x1 , x2 , . . . , xt (pad the last block
with 0 bits if necessary).
2. Add the length-block xt+1 to hold the right-justified representation of
b.
Merkle’s Meta Construction
I Components:
I A compression function: f : {0, 1}n+r → {0, 1}n .
I An initial vector IV ∈ {0, 1}n .
I Computing H(x) (bit-length of x is b < 2r )
Preprocessing:
1. Break-up x into r -bit blocks: x̄ = x1 , x2 , . . . , xt (pad the last block
with 0 bits if necessary).
2. Add the length-block xt+1 to hold the right-justified representation of
b.
I Processing:
1. Set H0 = IV.
2. Compute chaining variables Hi = f (xi , Hi−1 ) for 1 ≤ i ≤ t + 1.
3. Output H(x) = Ht+1 .
Iterated Hash Function
Iterated Hash Function
I [Claim] If f is collision resistant then H is also collision resistant.
I The proof is by contradiction.
I A collision in H can be converted into a collision of f .
I Inclusion of the length-block is necessary for the proof.
I Problem of constructing a CRHF H is now reduced to constructing a
collision resistant compression function f .
Reference
1. Chapter 8 of Boneh-Shoup (includes discussion on MAC from Hash
Function)
2. Video lacture V3 of Alfred Menezes:
https://cryptography101.ca/crypto101-building-blocks/