Sieves
June 24, 2017
1 The problem
Compute all the primes ≤ N .
1.1 A trivial solution
We can loop over all the numbers from 1 to N , and for each number, if it is
prime, print it.
How many steps does this method take? Let’s assume we do the simple test for
primes (running a loop from 2 to m − 1, and checking at each step whether the
number divides m, to check whether m is prime).
This is actually a little tricky to analyse exactly, since our loop to test for
primality may terminate early the moment we find a divisor; we instead go for
a loose upper bound.
We know that the loop to test whether m is a prime will take at most m − 1
iterations, which is O(m).
Adding the cost to test over each number from 1 to N, we have $O(1 + 2 + \cdots + N)$,
which simplifies to $O\left(\frac{N \times (N + 1)}{2}\right)$, which is $O(N^2)$ (why?).
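The trivial method can be sketched as below. This is only an illustration: the names is_prime_naive and primes_naive are made up for this sketch, and the primes are collected into a vector rather than printed so that the result is easy to inspect.

```cpp
#include <vector>

// Trial division as described above: try every candidate divisor
// from 2 to m - 1 and stop at the first one that divides m.
// (is_prime_naive is a hypothetical name chosen for this sketch.)
bool is_prime_naive(int m) {
    if (m < 2) {
        return false;  // 0 and 1 are not prime
    }
    for (int d = 2; d <= m - 1; ++d) {
        if (m % d == 0) {
            return false;  // found a non-trivial divisor, stop early
        }
    }
    return true;  // no divisor between 2 and m - 1
}

// Collects every prime <= n by testing each number in turn.
std::vector<int> primes_naive(int n) {
    std::vector<int> primes;
    for (int i = 1; i <= n; ++i) {
        if (is_prime_naive(i)) {
            primes.push_back(i);
        }
    }
    return primes;
}
```

Calling primes_naive(20) should give the eight primes 2, 3, 5, 7, 11, 13, 17, 19.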
1.2 A simple optimisation
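Do we really need to go all the way to m − 1 to know that m is a prime?
No. Say that $q > \sqrt{m}$ is a divisor of m; then $\frac{m}{q}$ is also a divisor of m, and $\frac{m}{q} < \frac{m}{\sqrt{m}} = \sqrt{m}$.
In other words, every divisor above $\sqrt{m}$ is paired with one below $\sqrt{m}$.
So we only need to check for divisors up to $\sqrt{m}$.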
This speeds up our solution to $O(\sqrt{1} + \sqrt{2} + \cdots + \sqrt{N})$, which is a little hard
to analyse, so let's replace each of those terms by $\sqrt{N}$ (so that we still have
an upper bound, however loose), and we have $O(N \times \sqrt{N})$, which is a good
improvement over $O(N^2)$.
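A minimal sketch of the optimised test follows. The name is_prime_sqrt is hypothetical, and comparing d * d against m (instead of calling a square-root function) is just one way to stop at $\sqrt{m}$ without floating-point arithmetic.

```cpp
// Trial division that stops once d exceeds sqrt(m).
// The condition d * d <= m is equivalent to d <= sqrt(m)
// and avoids any floating-point rounding issues.
// long long is used so that d * d cannot overflow for m up to
// roughly 9 * 10^18.
bool is_prime_sqrt(long long m) {
    if (m < 2) {
        return false;  // 0 and 1 are not prime
    }
    for (long long d = 2; d * d <= m; ++d) {
        if (m % d == 0) {
            return false;  // d is a divisor no larger than sqrt(m)
        }
    }
    return true;  // no divisor up to sqrt(m), so none above it either
}
```

For example, testing m = 97 only tries d = 2, 3, ..., 9 before concluding that 97 is prime.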
2 Need more speed
Can we do better though?
So far, we've optimised the idea:
for (int i = 1; i <= N; ++i) {
    if (is_prime(i)) {
        cout << i << '\n';
        /* remember, '\n' is an escape sequence
         * that prints a newline.
         */
    }
}
The other way to approach this problem is called a sieve: instead of looking for
factors of each number, we cross off multiples of each number.
The idea is to start off assuming that each natural number except 1 is prime,
because we haven’t found any non-trivial (different from 1 and the number itself)
divisors of it.
Then, for each number m > 1, we know that 2 × m, 3 × m, . . . are all definitely
not prime. So we can safely cross off these numbers. Let’s translate this idea
into code.
 1 const int N = 1000000;
 2 /* the const keyword is prefixed
 3  * to variables whose values do not change
 4  * during the execution of the program
 5  */
 6
 7 bool is_prime[N + 1];
 8 is_prime[0] = false;
 9 is_prime[1] = false;
10 for (int m = 2; m <= N; ++m) {
11     is_prime[m] = true;
12 }
13 /* We initially assume that every natural number other
14  * than 1 is prime.
15  */
16
17 for (long long m = 2; m <= N; ++m) {
18     if (is_prime[m]) {
19         cout << m << '\n';
20     }
21     for (long long j = 2 * m; j <= N; j += m) {
22         is_prime[j] = false;
23         /* j is of the form k * m, where k >= 2,
24          * so now we know for sure that j is
25          * not a prime number
26          */
27     }
28 }
See an animation of a sieve (slightly more complicated than the one above, but
the general idea is similar) here. Before proceeding, convince yourself that this
method is correct.
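One way to experiment with the method is to wrap it in a function. The sketch below follows the same two loops as the listing, but it makes a few assumptions of its own: the name simple_sieve is hypothetical, a std::vector<bool> replaces the fixed-size array so that the bound can be passed as an argument, n is assumed to be non-negative, and the primes are returned instead of printed.

```cpp
#include <vector>

// Returns all primes <= n using the simple sieve described above.
// Assumes n >= 0. (simple_sieve is a hypothetical name.)
std::vector<int> simple_sieve(int n) {
    // Start by assuming every number is prime, then rule out 0 and 1.
    std::vector<bool> is_prime(n + 1, true);
    is_prime[0] = false;
    if (n >= 1) {
        is_prime[1] = false;
    }

    std::vector<int> primes;
    for (long long m = 2; m <= n; ++m) {
        // By the time we reach m, every smaller number has already
        // crossed off its multiples, so the flag for m is final.
        if (is_prime[m]) {
            primes.push_back(static_cast<int>(m));
        }
        // Cross off 2m, 3m, 4m, ... regardless of whether m is prime.
        for (long long j = 2 * m; j <= n; j += m) {
            is_prime[j] = false;
        }
    }
    return primes;
}
```

Calling simple_sieve(30) should return 2, 3, 5, 7, 11, 13, 17, 19, 23, 29.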
Next, we analyse how many steps this takes to run.
The loop starting on line 10 clearly runs in O(N).
The outer loop on line 17 runs a total of N − 1 times, but the inner loop runs
at most $\frac{N}{1}$ times on its first run, $\frac{N}{2}$ times on its second run, $\frac{N}{3}$ times on the third, and
so on. So adding the overall work, we do at most
$$\frac{N}{1} + \frac{N}{2} + \cdots + \frac{N}{N}$$
steps here. Simplifying, we have
$$N \times \left(\frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{N}\right) = N \times \sum_{m=1}^{N} \frac{1}{m}$$
$\sum_{m=1}^{N} \frac{1}{m}$ is called the Nth harmonic number, and is actually O(log N). (We will
not prove this result here, but a reader familiar with limits or some calculus
may read this thread.)
This means our formula for the number of steps simplifies to O(N × log N ).
This is a significant improvement over even the optimised algorithm from earlier,
since log N grows much more slowly than $\sqrt{N}$; in fact, for $N = 10^6$, $N \times \log N$
is only about $6 \times 10^6$ (taking logarithms to base 10), whereas $N\sqrt{N}$ is $10^9$.
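The step count can also be checked empirically. The sketch below counts how often the body of the inner loop runs and compares it with $N \times (1 + \ln N)$, using the standard fact (not proved in this write-up) that the Nth harmonic number is at most $1 + \ln N$. Both function names are hypothetical.

```cpp
#include <cmath>

// Counts how many times the body of the sieve's inner loop executes
// for a given bound n, without storing the sieve itself.
long long inner_loop_steps(long long n) {
    long long steps = 0;
    for (long long m = 2; m <= n; ++m) {
        for (long long j = 2 * m; j <= n; j += m) {
            ++steps;  // one "cross off" operation
        }
    }
    return steps;
}

// Upper bound n * (1 + ln n), which follows from H_n <= 1 + ln n.
double harmonic_bound(long long n) {
    return static_cast<double>(n) * (1.0 + std::log(static_cast<double>(n)));
}
```

For n = 10 the inner loop body runs 8 times (4 times for m = 2, twice for m = 3, once each for m = 4 and m = 5), comfortably below the bound.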
2.1 Exercises for even more speed
(don’t try to copy and paste the code from this file, it will not work)
• Do we really need to check the multiples of all m in the inner loop, starting
on line 21? (Hint: use the fundamental theorem of arithmetic to argue that
every number greater than 1 has at least one prime factor, and use this to
speed up the algorithm.)
• When we are checking multiples of m, do we really need to start from
2 × m? Try starting with higher values, like 3 × m, 4 × m, and so on.
Finally, try starting with m × m.
Implementing both of these exercises together, you have what is known as the
Sieve of Eratosthenes, which actually runs in O(N log log N). (The proof is,
again, beyond the scope of this write-up, but the interested reader may read up
on the prime harmonic series.)
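For readers who have attempted the exercises, here is one possible way the two ideas could be combined. It is a sketch rather than a reference solution: the name eratosthenes is hypothetical, n is assumed to be non-negative, and stopping the outer loop once m × m exceeds n is an extra shortcut that follows from starting the inner loop at m × m (beyond that point the inner loop would never run).

```cpp
#include <vector>

// Returns all primes <= n. Assumes n >= 0.
// (eratosthenes is a hypothetical name chosen for this sketch.)
std::vector<int> eratosthenes(int n) {
    std::vector<bool> is_prime(n + 1, true);
    is_prime[0] = false;
    if (n >= 1) {
        is_prime[1] = false;
    }

    for (long long m = 2; m * m <= n; ++m) {
        if (!is_prime[m]) {
            continue;  // multiples of a composite were already crossed
                       // off by one of its prime factors
        }
        // Multiples below m * m have a factor smaller than m, so they
        // were crossed off earlier; start directly at m * m.
        for (long long j = m * m; j <= n; j += m) {
            is_prime[j] = false;
        }
    }

    // Whatever is still marked is prime.
    std::vector<int> primes;
    for (int i = 2; i <= n; ++i) {
        if (is_prime[i]) {
            primes.push_back(i);
        }
    }
    return primes;
}
```

The loop variables are long long so that m * m does not overflow when n is close to the largest int.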