Introduction To Cryptography (2025)
Alessandro Barenghi
Definition
• The study of techniques to allow secure communication and data storage in
the presence of attackers
Features provided
• Confidentiality: data can be accessed only by chosen entities
• Integrity/freshness: detect/prevent tampering or replays
• Authenticity: data and their origin are certified
• Non-repudiation: data creator cannot repudiate created data
• Advanced features: proofs of knowledge/computation
[Figure: Alice sends a "secret message" to Bob over an untrusted channel (Internet), where a threat agent is present]
A Brief History of Cryptography
● From Greek: kryptos, hidden, and graphein,
to write (i.e., “art of secret writing”)
● Ancient history: writing itself was already a
“secret technique”.
● Cryptography was born in ancient societies, when
writing became more common and
hidden writing became a need.
Cryptographic prehistory
Original approach
• A battle of wits between
• cryptographers: ideate a secret
method to obfuscate a text
• cryptanalysts: figure out the
method, break the “cipher”
• Bellaso (1553) [1] separates the
encryption method from the key
In this course
• Definitions of ciphers as components with functionalities
• How to obtain confidentiality, integrity, data/origin authentication
• An overview of protocols (combinations of ciphers)
• Goal: be able to use cryptographic components properly
https://xkcd.com/221/
A word on randomness
• Randomness (in this course) characterizes a generative process
• Stating: “00101 is a random string” actually makes little sense
Data
• Plaintext space $P$: set of possible messages $ptx \in P$
• Old times: words in some human-readable alphabet, modern times $\{0,1\}^{l}$
• Ciphertext space $C$: set of possible ciphertexts $ctx \in C$
• Usually $\{0,1\}^{l'}$, not necessarily $l = l'$ (ciphertexts may be larger)
• Key space $K$: set of possible keys
• $\{0,1\}^{\lambda}$, keys with special formats are derived from bitstrings
Functions
• Encryption function $E : P \times K \to C$
• Decryption function $D : C \times K \to P$
• Correctness: for all $ptx \in P$, we need $k, k' \in K$ s.t. $D(E(ptx, k), k') = ptx$
[Figure: plaintext → Encryption (with key) → ciphertext → Decryption (with key) → decrypted plaintext]
Goal
• Prevent anyone not authorized from being able to understand data
Symmetric Encryption
[Figure: Alice encrypts the plaintext "Hello Bob" with the symmetric encryption function E under a secret key; the ciphertext travels over the untrusted channel; Bob decrypts it with the symmetric decryption function D under the same secret key, which the two share over a trusted channel]
● The basic idea of encryption
○ Use key K to encrypt plaintext into ciphertext
○ Use the same key K to decrypt ciphertext into plaintext
● Synonyms: shared key encryption, secret
key encryption
● Issue: how do we agree on the key?
○ Cannot send the key on the same channel as the message!
○ Out-of-band transmission mechanism needed
● Issue: scalability (n parties need n(n-1)/2 pairwise keys)
● A symmetric algorithm is a cocktail...
First ingredient: substitution
Example - Diffusion
m = HALLO EVERYONE!
k = (3,5)
H A L L O
_ E V E R
Y O N E !
(_ marks the blank between the two words)
c = H YAEOLVNLEEOR!
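The write-by-rows, read-by-columns toy cipher above can be sketched in a few lines of Python. The function names and the blank padding character are illustrative assumptions, not part of the slides.

```python
def transpose_encrypt(msg: str, rows: int, cols: int, pad: str = " ") -> str:
    """Write msg by rows into a rows x cols grid, then read it by columns."""
    if len(msg) > rows * cols:
        raise ValueError("message does not fit in the grid")
    padded = msg.ljust(rows * cols, pad)  # fill unused cells (assumed padding)
    grid = [padded[r * cols:(r + 1) * cols] for r in range(rows)]
    return "".join(grid[r][c] for c in range(cols) for r in range(rows))


def transpose_decrypt(ctx: str, rows: int, cols: int) -> str:
    """Invert transpose_encrypt: the cell (r, c) sits at index c * rows + r."""
    if len(ctx) != rows * cols:
        raise ValueError("ciphertext length must equal rows * cols")
    return "".join(ctx[c * rows + r] for r in range(rows) for c in range(cols))


ciphertext = transpose_encrypt("HALLO EVERYONE!", 3, 5)
print(ciphertext)                           # H YAEOLVNLEEOR!
print(transpose_decrypt(ciphertext, 3, 5))  # HALLO EVERYONE!
```

With a grid much larger than the message, e.g. `transpose_encrypt("HALLO", 3, 5)`, the letters come out in their original order separated only by padding, which is the weakness shown in the second example.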
Example - Diffusion
m = HALLO
k = (3,5)
H A L L O
c = H  A  L  L  O  (the plaintext letters in their original order, separated only by padding)
R * C >> len(msg)
15 >> 5
Second ingredient: transposition
Transposition (or diffusion) means “swapping
the values of given bits”
Toy example (matrix):
H A L L O
_ E V E R
Y O N E !
○ Write by rows (plaintext), read by columns (ciphertext)
○ Key: K = (R, C) with R * C ~ len(msg)
Many issues (it’s a toy example!):
○ Keyspace still relatively small
But repetitions and structure gone
○ We now really need to test all possible structures
Perfectly secure cipher
Definition
• In a perfect cipher, for all $ptx \in P$ and $ctx \in C$,
$\Pr(ptx_{sent} = ptx) = \Pr(ptx_{sent} = ptx \mid ctx_{sent} = ctx)$
• In other words: seeing a ciphertext $c \in C$ gives us no information on what the
plaintext corresponding to $c$ could be
Question
• The definition is not constructive! Does a perfect cipher exist?
• If yes, what does it look like?
A concrete apparatus
• Gilbert Vernam actually patented a
telegraphic machine implementing
$ptx \oplus k$ on Baudot code in 1919
• Joseph Mauborgne suggested the use
of a random tape containing k
• Using Vernam’s encrypting machine
with Mauborgne’s suggestion
implements a perfect cipher
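A minimal sketch of the XOR-with-random-tape idea, on bytes rather than Baudot code. The helper names are hypothetical, and Python's `secrets` module stands in for the random tape. The last lines illustrate why the ciphertext alone reveals nothing: every plaintext of the same length is explained by some key.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def otp_keygen(length: int) -> bytes:
    """Fresh uniformly random key, as long as the message, used only once."""
    return secrets.token_bytes(length)


ptx = b"HELLO BOB"
key = otp_keygen(len(ptx))
ctx = xor_bytes(ptx, key)        # encryption: ptx XOR k
recovered = xor_bytes(ctx, key)  # decryption: the same operation
assert recovered == ptx

# For any other candidate plaintext there is a key mapping it to the same ctx,
# so observing ctx does not favour one plaintext over another.
other = b"ATTACK AT"
other_key = xor_bytes(ctx, other)
assert xor_bytes(ctx, other_key) == other
```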
Key storage/management
• storing key material and changing keys
is a nightmare
• perfect cipher broken in practice due
to key theft/reuse
• generating random keys was also an
issue (and caused breaks)
Photo courtesy of Cryptomuseum.com
PERFECT OR NOT ?
The Zip Example
Algorithm: C = K xor M
- K(hex) = AA BB CC DD .. .. .. .. .. .. .. (repeat the key)
- M(hex) = 50 4B 03 04 BA DA 55 55 .. .. .. (and so on)
- C(hex) = FA F0 CF D9 10 61 99 88 .. .. .. .. .. .. ..
- A ZIP file typically starts with the known bytes 50 4B 03 04 ("PK" 03 04)
- C = K xor M, hence K = M xor C
- K = 50 4B 03 04 xor FA F0 CF D9 = AA BB CC DD
- ATTACK? KNOWN PLAINTEXT ATTACK
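The attack can be replayed with the byte values from the slide. This is a sketch under the assumption that the attacker knows the key is 4 bytes long and that the file begins with the ZIP signature; the function name is hypothetical.

```python
from itertools import cycle


def xor_repeating_key(data: bytes, key: bytes) -> bytes:
    """XOR data with the key repeated as many times as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


ZIP_MAGIC = bytes.fromhex("504B0304")        # known start of the plaintext
key = bytes.fromhex("AABBCCDD")              # secret, repeated over the file
message = bytes.fromhex("504B0304BADA5555")
ciphertext = xor_repeating_key(message, key)
print(ciphertext.hex())                      # faf0cfd910619988

# Known-plaintext attack: K = M xor C on the first 4 bytes (assumed key length).
recovered_key = xor_repeating_key(ciphertext[:4], ZIP_MAGIC)
recovered_message = xor_repeating_key(ciphertext, recovered_key)
assert recovered_key == key
assert recovered_message == message
```

Reusing a short key breaks the perfect-cipher requirement that the key be as long as the message and used once.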
Computationally secure cryptography
Definition
A CSPRNG is a deterministic function $prng : \{0,1\}^{\lambda} \to \{0,1\}^{\lambda+l}$ whose output cannot
be distinguished from a uniform random sampling of $\{0,1\}^{\lambda+l}$ in $O(poly(\lambda))$. $l$ is the
CSPRNG stretch.
Existence
• In practice, we have only candidate CSPRNGs
• We have no proof that a function prng exists
• Proving that a CSPRNG exists implies directly $P \neq NP$
Practical constructions
• Building a CSPRNG “from scratch” is possible, but it is not the way they are
commonly built (not efficient)
• Practically built with another building block: PseudoRandom Permutations
(PRPs)
• defined starting from PseudoRandom Functions (PRFs)
Definition
• A function $prf_{seed} : \{0,1\}^{in} \to \{0,1\}^{out}$ taking an input and a
$\lambda$-bit seed.
• The entire $prf_{seed}$ is described by the value of the seed
• It cannot be told apart from a random
$f \in \{f : \{0,1\}^{in} \to \{0,1\}^{out}\}$ in $poly(\lambda)$
• That is, if they give you $a \in \{f : \{0,1\}^{in} \to \{0,1\}^{out}\}$, you
can’t tell which one of the following is true
• $a = prf_{seed}(\cdot)$ with $seed \stackrel{\$}{\leftarrow} \{0,1\}^{\lambda}$
• $a \stackrel{\$}{\leftarrow} F$, where $F = \{f : \{0,1\}^{in} \to \{0,1\}^{out}\}$
Operatively speaking
• A PRP acts on a block of bits and outputs another one of the same size
• the output “looks unrelated” to the input
• its action is fully identified by the seed
• Useful to think of the seed as a key
The issue
• No formally proven PRP exists, yet
• again, its existence would imply $P \neq NP$
Typical construction
1. Compute a small bijective Boolean function f of input and key
Basic Solution ?
Keyspace and Brute Forcing
Keyspace generally measured in bits
● Attack time is exponential in the number of
bits (i.e., 33 bits need twice the time of 32)
● Need to balance computational power vs. key
length.
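The doubling per extra key bit can be made concrete with a small calculation. The guessing rate below is an arbitrary assumption chosen only for illustration, and the figure computed is the worst case of trying every key.

```python
def brute_force_seconds(key_bits: int, keys_per_second: float) -> float:
    """Worst-case time to enumerate a keyspace of 2**key_bits keys."""
    return 2 ** key_bits / keys_per_second


RATE = 1e9  # assumed attacker speed: one billion keys per second
SECONDS_PER_YEAR = 365 * 24 * 3600

for bits in (32, 33, 64, 128):
    seconds = brute_force_seconds(bits, RATE)
    print(f"{bits:3d} bits: {seconds:.3e} s = {seconds / SECONDS_PER_YEAR:.3e} years")

# One more bit doubles the work, whatever the rate is.
assert brute_force_seconds(33, RATE) == 2 * brute_force_seconds(32, RATE)
```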
Keyspace vs. Time for Brute Forcing
Quantifying computational unfeasibility
[Figure: ptx0 → Enc (key k) → ctx0]
No deterministic encryption
• The CTR mode of operation with a fixed starting counter (no per-message nonce) is insecure against CPA
• The encryption is deterministic: same ptxs → same ctx
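The determinism problem can be illustrated with a toy counter-mode stream cipher. As an assumption made to stay within the standard library, HMAC-SHA256 stands in for the block cipher, and all names are hypothetical. With a fixed starting value two encryptions of the same message are identical; with a fresh nonce per message they are not.

```python
import hashlib
import hmac
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Counter-mode keystream: PRF(key, nonce || counter) for counter = 0, 1, ..."""
    out = b""
    counter = 0
    while len(out) < length:
        block_input = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block_input, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def ctr_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, nonce, len(data))))


key = secrets.token_bytes(32)
msg = b"attack at dawn"

# Fixed starting counter: an eavesdropper sees that the same message was resent.
fixed = bytes(16)
c1 = ctr_crypt(key, fixed, msg)
c2 = ctr_crypt(key, fixed, msg)
assert c1 == c2

# Fresh random nonce per message (sent along with the ciphertext).
n1, n2 = secrets.token_bytes(16), secrets.token_bytes(16)
c3 = ctr_crypt(key, n1, msg)
c4 = ctr_crypt(key, n2, msg)
assert c3 != c4
assert ctr_crypt(key, n1, c3) == msg
```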
Getting it right
• The construction takes the name from the mechanical component: it is not
possible to roll-back the procedure once you delete the value carried by green
arrows
Malleability
• Making changes to the ciphertext (not knowing the key) maps to predictable
changes in the plaintext
• Think about AES-CTR and AES-ECB
• Can be creatively abused to build decryption attacks
• Can be turned into a feature (homomorphic encryption)
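A sketch of malleability in a XOR-based stream cipher such as AES-CTR. A random byte string stands in for the keystream (an assumption, to avoid external libraries), and the message format and field offset are invented for the example. The attacker flips ciphertext bits without knowing the key and obtains a predictable change in the plaintext.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


ptx = b"PAY 0100 EUR TO BOB"
keystream = secrets.token_bytes(len(ptx))  # stand-in for the CTR keystream
ctx = xor_bytes(ptx, keystream)

# The attacker knows the format (amount at offset 4) but not the keystream.
delta = xor_bytes(b"0100", b"9100")        # old field XOR desired field
tampered = bytearray(ctx)
for i, d in enumerate(delta):
    tampered[4 + i] ^= d

decrypted = xor_bytes(bytes(tampered), keystream)
print(decrypted)                           # b'PAY 9100 EUR TO BOB'
assert decrypted == b"PAY 9100 EUR TO BOB"
```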
Confidentiality $\not\Rightarrow$ Integrity
• Up to now our encryption schemes provide confidentiality
• Changes in the ciphertext are undetected (at best)
Definition
• A MAC is constituted by a pair of functions:
• compute_tag(string, key): returns the tag for the input string
• verify_tag(string, tag, key): returns true or false
• Ideal attacker model:
• knows as many message-tag pairs as he wants
• cannot forge a valid tag for a message for which he does not know it already
• forgery also includes tag splicing from valid messages
• N.B. the tag creating entity and the verifying entity must both know the same
secret key
• The tag verifier is able to create a valid tag too
• ... and there goes the non-repudiation property
Browser cookies
• HTTP cookies are a “note to self” for the HTTP server [a]
• The note should not be tampered with between server reads
• Solution: server runs compute_tag(cookie, k) and stores the (cookie, tag) pair
[a] You can find a two-slide cookie refresher at slides 40-41 of
https://polimi365-my.sharepoint.com/:b:/r/personal/10032133_polimi_it/Documents/FCI/2-Livello_Applicativo_v2020.pdf
Testing integrity
• Testing the integrity of a file requires us to compare it bit by bit with an intact
copy or read it entirely to compute a MAC
• It would be fantastic to test only short, fixed length strings independently from
the file size, representing the file itself
• Major roadblock: there is a lower bound to the number of bits to encode a given
content without information loss
• Can we build something close to the ideal scenario?
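[Figure: the sender computes h = H("Hello Bob") and sends plaintext and hash over the untrusted channel; the receiver recomputes h' = H(received plaintext) and checks h = h': yes → OK, no → KO]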
What is a Hash Function
A function H() that maps an arbitrary-length input x
to a fixed-length output h
● Needs to be fast
● Collisions are unavoidable: codomain “smaller” than domain.
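A short illustration with Python's `hashlib`. SHA-256 is chosen here only as an example of H, and the inputs are arbitrary. Whatever the input size, the digest has the same length, and a one-character change yields a different digest.

```python
import hashlib


def H(x: bytes) -> str:
    """Hex-encoded SHA-256 digest of x (256 bits = 64 hex characters)."""
    return hashlib.sha256(x).hexdigest()


short_digest = H(b"Hello Bob")
long_digest = H(b"Hello Bob" * 100_000)  # roughly 900 kB of input
tampered_digest = H(b"Hello bob")        # one letter changed

assert len(short_digest) == len(long_digest) == 64
assert tampered_digest != short_digest
print(short_digest)
print(tampered_digest)
```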
What to use
• SHA-2 was privately designed (NSA), $d \in \{256, 384, 512\}$
• SHA-3 followed a public design contest (similar to AES), selected among $\approx 60$
candidates, $d \in \{256, 384, 512\}$
• Both currently unbroken and widely standardized (NIST, ISO)
Pseudonymized match
• Store/compare hashes instead of values (e.g., Signal contact discovery)
MACs
• Building MACs: generate tag hashing together the message and a secret string,
verify tag recomputing the same hash
• A field-proven way of combining message and secret is HMAC
• Standardized (RFC 2104, NIST FIPS 198)
• Uses a generic hash function as a plug-in, combination denoted as HMAC-<hash name>
• HMAC-SHA1 (!), HMAC-SHA2 and HMAC-SHA3 are ok
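The compute_tag / verify_tag pair can be sketched with HMAC from Python's standard library. HMAC-SHA256 is picked here as one instance of HMAC-SHA2, and the cookie contents are invented for the example.

```python
import hashlib
import hmac
import secrets


def compute_tag(message: bytes, key: bytes) -> bytes:
    """Return the HMAC-SHA256 tag of message under key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(message: bytes, tag: bytes, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(compute_tag(message, key), tag)


key = secrets.token_bytes(32)    # known only to the server
cookie = b"user=alice;role=user"
tag = compute_tag(cookie, key)   # stored together with the cookie

assert verify_tag(cookie, tag, key)
assert not verify_tag(b"user=alice;role=admin", tag, key)  # tampering detected
```

Since the verifier holds the same key, it could have produced the tag itself, which is why a MAC gives no non-repudiation.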
Forensic use
• Write down only the hash of the disk image you obtained in official documents
Goal
• Make two parties share secret value w/ only public messages
Attacker model
• Can eavesdrop anything, but not tamper
• The Computational Diffie-Hellman assumption should hold
CDH Assumption
• Let $(G, \cdot) \equiv \langle g \rangle$ be a finite cyclic group, and two numbers $a, b$ sampled unif. from
$\{0, \ldots, |G| - 1\}$ ($\lambda = len(a) \approx \log_2 |G|$)
• given $g^a$, $g^b$ finding $g^{ab}$ costs more than $poly(\log |G|)$
• Best current attack approach: find either b or a (discrete log problem)
[Figure: Alice and Bob exchange the values YA and YB over the untrusted channel; XA stays with Alice and XB stays with Bob]
How does D-H work (3)
[Figure: same exchange of YA and YB over the untrusted channel; an observer of the channel is left guessing the values of XA and XB]
How does D-H work (4) - Secret
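At this point, they can compute a secret K
● Since $(g^{X_A})^{X_B} = (g^{X_B})^{X_A} = g^{X_A X_B}$, with $Y_A = g^{X_A}$ and $Y_B = g^{X_B}$
● Alice computes $K = Y_B^{X_A}$
● Bob computes $K = Y_A^{X_B}$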
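A toy run of the exchange, assuming the multiplicative group modulo the prime 23 with generator 5. These parameters are far too small for any real use and are chosen only so the numbers are easy to follow; the function names are hypothetical.

```python
import secrets

P = 23  # toy prime modulus (assumption; real deployments use much larger groups)
G = 5   # generator of the multiplicative group modulo 23, which has order 22


def dh_keypair() -> tuple[int, int]:
    """Sample a private exponent X and compute the public value Y = G^X mod P."""
    x = secrets.randbelow(P - 1)  # uniform in {0, ..., |G| - 1}
    return x, pow(G, x, P)


def dh_shared(own_private: int, peer_public: int) -> int:
    """Raise the peer's public value to our private exponent."""
    return pow(peer_public, own_private, P)


xa, ya = dh_keypair()     # Alice keeps xa, sends ya
xb, yb = dh_keypair()     # Bob keeps xb, sends yb

k_alice = dh_shared(xa, yb)
k_bob = dh_shared(xb, ya)
assert k_alice == k_bob == pow(G, xa * xb, P)
```

An eavesdropper sees only `ya` and `yb`; in a group of realistic size, recovering the shared value from them is what the CDH assumption rules out.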
[Figure: Alice and Bob connected by an untrusted channel; each holds a key pair: Alice's private key SA and public key PA, Bob's private key SB and public key PB]
Public key encryption: Key Exchange
[Figure: Alice sends PA ("here is my public key") and Bob sends PB ("here is my public key") over the untrusted channel; each keeps the private key of its key pair (SA, SB)]
Trust assumption:
only Bob knows his private key
[Figure: Alice encrypts the plaintext "Hello Bob" with the asymmetric encryption function E under PB; the ciphertext travels over the untrusted channel; Bob decrypts it with the asymmetric decryption function D under SB]
Exercise: what is this instead?
Trust assumption:
Everybody knows Alice's public key
only Alice knows her private key
[Figure: the plaintext "Hello Bob" goes through the asymmetric encryption function E under SA; the result travels over the untrusted channel and goes through the asymmetric decryption function D under PA]
Public Key Encryption
Components
• Different keys are employed in encryption/decryption
• It is computationally hard to:
• Decrypt a ciphertext without the private key
• Compute the private key given only the public key
Computational hardness
• Up to now, enumeration of the secret parameter was the best possible attack
• This is ok for modern block ciphers → best attack: $O(2^{\lambda})$
• Asymmetric cryptosystems rely on hard problems for which bruteforcing the secret
parameter is not the best attack
• Factoring a $\lambda$-bit number takes $O\left(e^{k \lambda^{1/3} (\log \lambda)^{2/3}}\right)$
Assumption
• A public channel between Alice and Bob is available
• For the moment, the attacker model is “eavesdrop only”
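[Figure: hybrid encryption. A random symmetric key feeds the symmetric encryption, which outputs the ciphertext; the recipient public key feeds the asymmetric encryption of the random symmetric key, which outputs the encrypted random key]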
Motivations
• To build a secure hybrid encryption scheme we need to be sure that the public key
the sender uses is the one of the recipient
• We’d like to be able to verify the authenticity of a piece of data without a
pre-shared secret
Digital signatures
• Provide strong evidence that data is bound to a specific user
• No shared secret is needed to check (validate) the signature
• Proper signatures cannot be repudiated by the user
• They are asymmetric cryptographic algorithms
• It is formally proven that non-repudiation cannot be obtained otherwise
[Figure: key(pair) generation outputs a private key used by Sign (signature) and a public key used by Verify (verification)]
[Figure: the plaintext "Hello Bob" goes through the asymmetric encryption function E under SA; the ciphertext travels over the untrusted channel and goes through the asymmetric decryption function D under PA]
Digital signature: Authentication and Integrity
Trust assumption:
only Alice knows her private key; everybody knows Alice's public key
[Figure: Alice hashes the plaintext "Hello Bob" with H and encrypts the hash h with the asymmetric encryption function E under SA, producing the signature; signature + plaintext travel over the untrusted channel. The receiver decrypts the signature with D under PA to obtain h, recomputes h' = H(plaintext), and checks h = h': yes → Integrity: OK!, no → KO]
Widespread Signature schemes
Authenticating users
• Alternative to password-based login to a system
• The server has the user’s public verification key (e.g. deposited at account creation)
• The server asks the client to sign a long randomly generated bitstring (challenge)
• If the client returns a correctly signed challenge, it has proven its identity to the
server
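A sketch of the challenge-response login using hash-then-sign with textbook RSA. The key below (n = 61 · 53 = 3233, e = 17, d = 2753) is a tiny classroom example chosen as an assumption for readability; it is insecure, and real systems use a vetted signature scheme with proper padding and key sizes.

```python
import hashlib
import secrets

# Toy RSA key pair (assumption, insecure): n = 61 * 53, e * d = 1 mod lcm(60, 52)
N, E, D = 3233, 17, 2753


def digest_mod_n(data: bytes) -> int:
    """Hash the data and reduce it so that it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes, private_exp: int = D) -> int:
    """Client side: signature = H(data)^d mod n."""
    return pow(digest_mod_n(data), private_exp, N)


def verify(data: bytes, signature: int, public_exp: int = E) -> bool:
    """Server side: check that signature^e mod n equals H(data)."""
    return pow(signature, public_exp, N) == digest_mod_n(data)


challenge = secrets.token_bytes(32)  # server: long random bitstring
signature = sign(challenge)          # client: signs with the private key
assert verify(challenge, signature)  # server: checks with the public key

forged = (signature + 1) % N         # any other value fails verification
assert not verify(challenge, forged)
```

A fresh challenge per login matters: replaying an old signed challenge does not verify against a new one, except by chance with such a tiny modulus.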
Cautionary note
• Both in asymmetric encryption and digital signatures, the public key must be
bound to the correct user identity
• If public keys are not authentic:
• A MITM attack is possible on asymmetric encryption
• Anyone can produce a signature on behalf of anyone else
• The public key authenticity is guaranteed with... another signature
• We need someone to sign the public-key/identity pair
• We need a format to distribute signed pairs
Digital certificates
• They bind a public key to a given identity, which is:
• for humans: an ASCII string
• for machines: either the CNAME or IP address
• They specify the intended use for the public key contained
• Avoids ambiguities when a key format is ok for both an encryption and a signature
algorithm
• They contain a time interval in which they are valid
• Most widely deployed format is described in ITU X.509
[Figure: Bob, holding the key pair (PB, SB), presents his ID card + public key to the CA]
Bob's Digital Certificate (2)
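Trust assumption:
only the CA knows its private key
[Figure: the certificate holds the identity (DN) “Bob” and the public key PB; these fields are hashed with H and the hash is encrypted with the asymmetric encryption function E under the CA's private key SCA, producing the CA's digital signature]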
Retrieving Bob's Certificate
[Figure: Alice asks the CA "I need Bob's public key" and obtains PB; she encrypts the plaintext "Hello Bob" with the asymmetric encryption function E under PB, sends it over the untrusted channel, and Bob decrypts it with D under SB. Trust assumption: only Bob knows his private key]
Zoom in: Is the public key valid?
[Figure: Alice receives from the CA a certificate with identity (DN) “Bob”, public key PB, and the CA's digital signature. Valid?]
PKI
● A PKI uses a trusted third party called a
certification authority (CA)
● The CA digitally signs files called digital
certificates, which bind an identity to a
public key
○ Identity = “Distinguished Name (DN)”
○ As defined in the X.509 standard (most used one)
● Now we can recognize a number of
subjects...provided that we can obtain the
public key of the CA
[Figure: the same question (Valid?) repeats for each certificate in turn (identity (DN), public key, CA's digital signature), until a certificate held in trusted storage is reached]
[Figure: certificate hierarchy. The Root CA certificate has subject CA1.com, the subject public key, issuer CA1.com, and a signature made by CA1.com; the subsidiary certificates have subjects CA2.com, CA3.com, CA4.com]
An authority releases it
● the state
● a regulator
● the organization management
“Let me sign your cert”
PKCS#7 Envelope
to recipient
Les jeux sont faits: Lupin's Certificate
[Figure: Certification Authority actions: the signing (private) key of the CA signs the recipient (public) encryption key and identity, producing the recipient certificate. Sender actions: a random symmetric key encrypts the plaintext document (symmetric encryption → ciphertext); the recipient public key encrypts the random symmetric key (asymmetric encryption → encrypted random key)]
[Figure: information source → encoder → channel → decoder → information destination]
Basics
• A communication takes place between two endpoints
• sender: made of an information source and an encoder
• receiver: made of an information destination and a decoder
• Information is carried by a channel in the form of a sequence of symbols of a finite
alphabet
Desirable properties
• Non negative measure of uncertainty
• “combining uncertainties” should map to adding entropies
Definition
• Let $X$ be a discrete r.v. with $n$ outcomes in $\{x_0, \ldots, x_{n-1}\}$ with $\Pr(X = x_i) = p_i$
for all $0 \le i \le n-1$
• The entropy of $X$ is $H(X) = \sum_{i=0}^{n-1} -p_i \log_b(p_i)$
• The measurement unit of entropy depends on the base $b$ of the logarithm: typical
case for $b = 2$ is bits
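The entropy formula translates directly into code. This is a minimal sketch: the function name is hypothetical, and outcomes with zero probability are skipped, following the usual convention that they contribute nothing.

```python
import math


def entropy(probs: list[float], base: float = 2.0) -> float:
    """Shannon entropy: sum over i of -p_i * log_b(p_i)."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    return sum(-p * math.log(p, base) for p in probs if p > 0)


print(entropy([0.5, 0.5]))   # fair coin: 1 bit
print(entropy([0.25] * 4))   # four equally likely outcomes: 2 bits
print(entropy([0.9, 0.1]))   # biased coin: less than 1 bit
print(entropy([1.0]))        # certain outcome: no uncertainty
```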
https://xkcd.com/1210/
Statement (informal)
It is possible to encode the outcomes of n i.i.d. random variables, each one with
entropy H(X), into no less than nH(X) bits in total, i.e. H(X) bits per outcome. If < nH(X) bits are used,
some information will be lost.
Consequences
• Arbitrary compression of bitstrings is impossible without loss
• Cryptographic hashes must discard some information
• Guessing a piece of information (= one outcome of X) is at least as hard as
guessing a H(X) bit long bitstring
• overlooking for a moment the effort of decoding the guess
A practical mismatch
• It is possible to have distributions with the same entropies
John Nash. Personal communication to the US National Security Agency, Feb. 1955.
Claude E. Shannon. A mathematical theory of communication. Bell Syst. Tech. J., 27(3 and 4):379–423 and 623–656, 1948.
Claude E. Shannon. Communication theory of secrecy systems. Bell Syst. Tech. J., 28(4):656–715, 1949.
Marc Stevens. The hashclash project, 2009.
Marc Stevens, Elie Bursztein, Pierre Karpman, Ange Albertini, and Yarik Markov. The first collision for full SHA-1. In Jonathan Katz and Hovav Shacham, editors, Advances in Cryptology - CRYPTO 2017 - 37th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 20-24, 2017, Proceedings, Part I, volume 10401 of Lecture Notes in Computer Science, pages 570–596. Springer, 2017.
Marc Stevens, Alexander Sotirov, Jacob Appelbaum, Arjen K. Lenstra, David Molnar, Dag Arne Osvik, and Benne de Weger. Short chosen-prefix collisions for MD5 and the creation of a rogue CA certificate. In Shai Halevi, editor, Advances in Cryptology - CRYPTO 2009, 29th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 16-20, 2009. Proceedings, volume 5677 of Lecture Notes in Computer Science, pages 55–69. Springer, 2009.