ILLC Project Course in Information Theory

Crash course: 13 January – 17 January 2014, 12:00 to 14:00
Student presentations: 27 January – 31 January 2014, 12:00 to 14:00

Monday: Probability theory; Uncertainty and coding
Tuesday: The weak law of large numbers; The source coding theorem
Wednesday: Random processes; Arithmetic coding
Thursday: Divergence; Kelly Gambling
Friday: Kolmogorov Complexity; The limits of statistics

Location: ILLC, room F1.15, Science Park 107, Amsterdam
Materials: informationtheory.weebly.com
Contact: Mathias Winther Madsen, [email protected]
PLAN

Some combinatorial preliminaries
Turing machines
Kolmogorov complexity
The universality of Kolmogorov complexity
The equivalence of Kolmogorov complexity and coin flipping entropy
Monkeys with typewriters
PLAN

Some combinatorial preliminaries:
Factorials
Stirling's approximation
Binomial coefficients
The binary entropy approximation
There are 3 · 2 · 1 = 6 ways to sort three letters:
ABC, ACB, BAC, BCA, CAB, CBA

Notation:
n! = 1 · 2 · 3 ⋯ n,
read "n factorial".
The natural logarithm of a factorial can be
approximated by Stirling's approximation,

ln(n!) ≈ n ln(n) − n.

The error of this approximation grows
slower than linearly.
n     ln(n!)   Stir(n)
10     15.1     13.0
20     42.3     40.0
30     74.6     72.0
40    110.3    107.6
50    148.5    145.6
Proof sketch:
ln(n!) = ln(1) + ln(2) + ⋯ + ln(n) is approximately
the integral of ln(x) from 1 to n, and the
anti-derivative of ln(x) is x ln(x) − x.

William Feller: An Introduction to
Probability Theory and Its Applications (1950)
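A quick Python check (an addition to the slides) that reproduces the table above:

import math

# Compare ln(n!) with Stirling's approximation n ln(n) - n.
for n in (10, 20, 30, 40, 50):
    exact = math.lgamma(n + 1)       # ln(n!) = lgamma(n + 1)
    stirling = n * math.log(n) - n   # Stirling's approximation
    print(f"n={n:2d}  ln(n!)={exact:6.1f}  Stir(n)={stirling:6.1f}  error={exact - stirling:5.2f}")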
There are

4 · 3 = 4!/2! = 24/2 = 12

ways to put two objects into four boxes:

[Figure: the twelve arrangements of objects 1 and 2 in four boxes.]
If the objects are identical, the number of
options is a factor of 2! smaller:

4!/(2! · 2!) = 24/4 = 6

[Figure: the six arrangements of two identical objects in four boxes.]
In general, the number of ways to put k
identical objects into n distinct boxes is

C(n, k) = n! / ((n − k)! k!)

This is the binomial coefficient, or
"n choose k".

For a nice introduction, see the first chapter of
Victor Bryant: Aspects of Combinatorics (1993)
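A small Python sketch (an addition to the slides) that checks the formula by brute-force enumeration:

import math
from itertools import combinations

n, k = 4, 2
# Count the placements of k identical objects into n distinct boxes directly
# (each placement is a choice of k boxes out of n)...
placements = sum(1 for _ in combinations(range(n), k))
# ...and compare with the closed-form binomial coefficient.
formula = math.factorial(n) // (math.factorial(n - k) * math.factorial(k))
print(placements, formula, math.comb(n, k))  # all three print 6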
When applied to a binomial coefficient,
Stirling's approximation gives

ln C(n, k) ≈ n H2(k/n)

where H2 is the binary entropy function,
measured in nats (natural units; 1 nat ≈ 1.44 bits).
The error grows slower than linearly.

Example: n = 40, k = 20:

ln C(40, 20) = ln(137846528820) = 25.649
40 · H2(1/2) = ln(2^40) = ln(1099511627776) = 27.726
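The same numbers in Python (an illustrative addition):

import math

def H2(p):
    # Binary entropy in nats.
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

n, k = 40, 20
exact = math.log(math.comb(n, k))  # ln C(40, 20)
approx = n * H2(k / n)             # n * H2(k/n)
print(f"exact {exact:.3f}  approx {approx:.3f}")  # 25.649 vs 27.726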
Shortest description?
001001001001.
1100111100111111001111111100.
01011011101111011111.
0100011011000001010011100101110111.
101100010111001000010111111101.
The Kolmogorov Complexity of a finite string
is the length of the shortest program which will
print that string.
The Turing Machine
Theorem:
There are universal machines.

Proof sketch:
A universal machine can read a description of any
other Turing machine from its input and simulate
that machine step by step.

Consequence:
The Kolmogorov complexity of a string
on two different universal machines
differs only by the length of the
longest simulation program:

|K_M1(x) − K_M2(x)| ≤ c(M1, M2)

(And constants are sublinear.)
The Kolmogorov Complexity of a finite string is
the length of the shortest program which will
print that string.
001001001001001001001001001.
1100111100111111001111111100.
0101101110111101111111111111.
0100011011000001010011100101.
1011000101110010000101111101.
Theorem:
Most strings don't have any structure.

Proof:
There are 2^n strings of length n, and only

1 + 2 + 4 + 8 + 16 + ⋯ + 2^(n−1) = 2^n − 1

programs of length less than n. In particular, there
are fewer than 2^(n−c) programs of length less than
n − c, so at most a 2^(−c) fraction of the strings of
length n can be compressed by c or more bits.
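This counting bound in Python (an illustrative addition):

n = 20
strings = 2 ** n
for c in (1, 5, 10):
    programs = 2 ** (n - c) - 1  # programs shorter than n - c bits
    print(f"compressible by {c:2d} bits: at most {programs / strings:.6f} of all strings")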
PrintString(n, k, i):
    construct all strings of length n
    select the ones containing k 1s
    print the i-th of those strings
Example: n = 10, k = 3, i = 13.

In binary, with separators: 1010,11,1101

Made self-delimiting by doubling every bit
(1 → 11, 0 → 00) and writing each separator as 01:

11 00 11 00 01 11 11 01 11 11 00 11
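A runnable Python version of PrintString (a sketch added for this writeup; the lexicographic order of the strings is one arbitrary but fixed choice):

from itertools import product

def print_string(n, k, i):
    # Construct all strings of length n, in lexicographic order...
    all_strings = ("".join(bits) for bits in product("01", repeat=n))
    # ...select the ones containing exactly k 1s...
    selected = [s for s in all_strings if s.count("1") == k]
    # ...and print the i-th of those strings (counting from 1).
    print(selected[i - 1])

print_string(10, 3, 13)  # a length-10 string with three 1s

The triple (n, k, i) is a complete description of the string, and the
index i can be specified with about log C(n, k) bits, which is what
makes the following bound work.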
Recall that Stirling's approximation for a binomial
coefficient gives

ln C(n, k) ≈ n H2(k/n)

with an error that grows slower than linearly.
So for a string x of length n containing k 1s,

K(x) ≤ n H2(k/n) + φ(n),

where φ(n) is sublinear.
For coin flipping sequences, Kolmogorov
complexity is equal to Shannon entropy,
plus or minus a sublinear term.

For other sequences, Kolmogorov complexity
is smaller than the Shannon entropy of the
string if modeled as a coin flipping sequence.

Conclusion: Coin flipping is the worst case.
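As a crude illustration (an addition to the slides): an off-the-shelf compressor is not a universal machine, but it shows the same pattern, structured strings compress while coin flips do not:

import random
import zlib

structured = b"001" * 10000  # a periodic string, like 001001001...
coinflips = bytes(random.getrandbits(8) for _ in range(30000))  # "coin flips"

for name, s in [("structured", structured), ("coin flips", coinflips)]:
    print(f"{name}: {len(s)} bytes -> {len(zlib.compress(s, 9))} bytes")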
[Diagram: a random monkey feeds input to a
universal machine, which outputs a finite string
(or nothing).]
Ray J. Solomonoff: A formal theory of inductive
inference, Information and Control, 1964.
Again, it doesn't matter which universal
machine you use.

The universal probability of a string x is
close to 2^(−K(x)).

For long strings, finding the probability of
the shortest description is thus as good as
summing up the probabilities of all
descriptions.
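A toy calculation (an addition, assuming a hypothetical string with one description of each length L ≥ K(x), each contributing 2^(−L)): the total stays within a factor of two of the leading term 2^(−K(x)):

K = 20  # hypothetical length of the shortest description
shortest = 2.0 ** -K
total = sum(2.0 ** -length for length in range(K, K + 200))
print(shortest, total, total / shortest)  # the ratio stays below 2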