|
| 1 | +<!--?title Montgomery Multiplication --> |
| 2 | +# Montgomery Multiplication |
| 3 | + |
| 4 | +Many algorithms in number theory, like prime testing or factorization, and in cryptography, like RSA, require lots of operations modulo a large number. |
| 5 | +A multiplication like $x y \bmod{n}$ is quite slow to compute with the typical algorithms, since it requires a division to know how many times $n$ has to be subtracted from the product. |
| 6 | +And division is a really expensive operation, especially with big numbers. |
| 7 | + |
| 8 | +The **Montgomery (modular) multiplication** is a method that allows computing such multiplications faster. |
| 9 | +Instead of dividing the product and subtracting $n$ multiple times, it adds multiples of $n$ to cancel out the lower bits and then just discards the lower bits. |
| 10 | + |
| 11 | +## Montgomery representation |
| 12 | + |
| 13 | +However the Montgomery multiplication doesn't come for free. |
| 14 | +The algorithm works only in the **Montgomery space**. |
| 15 | +And we need to transform our numbers into that space, before we can start multiplying. |
| 16 | + |
| 17 | +For the space we need a positive integer $r \ge n$ coprime to $n$, i.e. $\gcd(n, r) = 1$. |
| 18 | +In practice we always choose $r$ to be $2^m$ for a positive integer $m$, since multiplications, divisions and modulo $r$ operations can then be efficiently implemented using shifts and other bit operations. |
| 19 | +$n$ will be an odd number in pretty much all applications, since it is not hard to factorize an even number. |
| 20 | +So every power of $2$ will be coprime to $n$. |
| 21 | + |
| 22 | +The representative $\bar{x}$ of a number $x$ in the Montgomery space is defined as: $$\bar{x} := x \cdot r \bmod n$$ |
| 23 | + |
| 24 | +Notice that the transformation itself is exactly the kind of multiplication that we want to optimize. |
| 25 | +So this is still an expensive operation. |
| 26 | +However you only need to transform a number once into the space. |
| 27 | +As soon as you are in the Montgomery space, you can perform as many operations as you want efficiently. |
| 28 | +And at the end you transform the final result back. |
| 29 | +So as long as you are doing lots of operations modulo $n$, this will be no problem. |
| 30 | + |
| 31 | +Inside the Montgomery space you can still perform most operations as usual. |
| 32 | +You can add two elements ($x \cdot r + y \cdot r \equiv (x + y) \cdot r \bmod n$), subtract, check for equality, and even compute the greatest common divisor of a number with $n$ (since $\gcd(n, r) = 1$). |
| 33 | +All with the usual algorithms. |
| 34 | + |
| 35 | +However this is not the case for multiplication. |
| 36 | +We expect the result to be: |
| 37 | +$$\bar{x} * \bar{y} = \overline{x \cdot y} = (x \cdot y) \cdot r \bmod n.$$ |
| 38 | +But the normal multiplication will give us: |
| 39 | +$$\bar{x} \cdot \bar{y} = (x \cdot y) \cdot r \cdot r \bmod n.$$ |
| 40 | +Therefore the multiplication in the Montgomery space is defined as: |
| 41 | +$$\bar{x} * \bar{y} := \bar{x} \cdot \bar{y} \cdot r^{-1} \bmod n.$$ |
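The definitions above can be checked on a toy example. The following is only a sketch with naive `%` operations: the parameters $n = 17$, $r = 2^5$ and the helper names `to_mont`, `r_inverse` and `mont_mul_naive` are assumptions made for illustration and are not part of the implementation later in this article.

```cpp
#include <cstdint>

// Toy parameters, assumed for illustration only: n = 17, r = 2^5 = 32.
const uint64_t N_TOY = 17, R_TOY = 32;

// Representative of x in the Montgomery space, computed naively with %.
uint64_t to_mont(uint64_t x) {
    return x * R_TOY % N_TOY;
}

// Brute-force search for r^{-1} mod n; only sensible for a toy modulus.
uint64_t r_inverse() {
    for (uint64_t c = 1; c < N_TOY; c++)
        if (c * (R_TOY % N_TOY) % N_TOY == 1)
            return c;
    return 0;  // not reached when gcd(n, r) = 1
}

// Montgomery product straight from the definition: xb * yb * r^{-1} mod n.
uint64_t mont_mul_naive(uint64_t xb, uint64_t yb) {
    return xb * yb % N_TOY * r_inverse() % N_TOY;
}
```

With these parameters, $\bar{3} = 11$ and $\bar{5} = 7$. Their sum modulo $17$ is $1 = \bar{8}$, so addition works unchanged. The plain product $11 \cdot 7 \bmod 17 = 9$ differs from $\overline{15} = 4$, while `mont_mul_naive(11, 7)` returns $4$ as expected.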
| 42 | + |
| 43 | +## Montgomery reduction |
| 44 | + |
| 45 | +The multiplication of two numbers in the Montgomery space requires an efficient computation of $x \cdot r^{-1} \bmod n$. |
| 46 | +This operation is called the **Montgomery reduction**, and is also known as the algorithm **REDC**. |
| 47 | + |
| 48 | +Because $\gcd(n, r) = 1$, we know that there are two integers $r^{-1}$ and $n^{\prime}$, where $r^{-1}$ can be chosen with $0 < r^{-1} < n$ (which makes $n^{\prime}$ negative, but only its value modulo $r$ will matter), with |
| 49 | +$$r \cdot r^{-1} + n \cdot n^{\prime} = 1.$$ |
| 50 | +Both $r^{-1}$ and $n^{\prime}$ can be computed using the [Extended Euclidean algorithm](./algebra/extended-euclid-algorithm.html). |
| 51 | + |
| 52 | +Using this identity we can write $x \cdot r^{-1}$ as: |
| 53 | +$$\begin{aligned} |
| 54 | +x \cdot r^{-1} &= x \cdot r \cdot r^{-1} / r = x \cdot (-n \cdot n^{\prime} + 1) / r \\\\ |
| 55 | +&= (-x \cdot n \cdot n^{\prime} + x) / r \equiv (-x \cdot n \cdot n^{\prime} + l \cdot r \cdot n + x) / r \bmod n\\\\ |
| 56 | +&\equiv ((-x \cdot n^{\prime} + l \cdot r) \cdot n + x) / r \bmod n\\\\ |
| 57 | +\end{aligned}$$ |
| 58 | + |
| 59 | +The equivalences hold for any arbitrary integer $l$. |
| 60 | +This means that we can add or subtract an arbitrary multiple of $r$ to $x \cdot n^{\prime}$, or in other words, we can compute $q := x \cdot n^{\prime}$ modulo $r$. |
| 61 | + |
| 62 | +This gives us the following algorithm to compute $x \cdot r^{-1} \bmod n$: |
| 63 | + |
| 64 | +```text |
| 65 | +function reduce(x): |
| 66 | +    q = (x mod r) * n' mod r |
| 67 | +    a = (x - q * n) / r |
| 68 | +    if a < 0: |
| 69 | +        a += n |
| 70 | +    return a |
| 71 | +``` |
| 72 | + |
| 73 | +Since $x < n \cdot n < r \cdot n$ (even if $x$ is the product of a multiplication) and $q \cdot n < r \cdot n$ we know that $-n < (x - q \cdot n) / r < n$. |
| 74 | +Therefore the final modulo operation is implemented using a single check and one addition. |
| 75 | + |
| 76 | +As we see, we can perform the Montgomery reduction without any heavy modulo operations. |
| 77 | +If we choose $r$ as a power of $2$, the modulo operations and divisions in the algorithm can be computed using bitmasking and shifting. |
| 78 | + |
| 79 | +A second application of the Montgomery reduction is to transfer a number back from the Montgomery space into the normal space. |
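As a minimal sketch of the reduction for word-sized numbers, assume $r = 2^{32}$ and an odd modulus $n < 2^{32}$. The struct name `Montgomery32` and its members are hypothetical. The constructor computes $n^{\prime}$ with the doubling trick from the next section, and `to_mont` deliberately uses a slow `%`, since entering the space is a one-time cost.

```cpp
#include <cstdint>

// Sketch for r = 2^32 and an odd modulus n < 2^32 (names are illustrative).
struct Montgomery32 {
    uint32_t n;      // odd modulus
    uint32_t n_inv;  // n' = n^{-1} mod 2^32

    explicit Montgomery32(uint32_t mod) : n(mod), n_inv(1) {
        // Each step doubles the number of correct low bits: 1 -> 2 -> ... -> 32.
        for (int i = 0; i < 5; i++)
            n_inv *= 2 - n * n_inv;
    }

    // Returns x * r^{-1} mod n, assuming 0 <= x < n * 2^32.
    uint32_t reduce(uint64_t x) const {
        uint32_t q = uint32_t(x) * n_inv;  // q = (x mod r) * n' mod r
        uint64_t qn = uint64_t(q) * n;
        // x and q*n agree in their low 32 bits, so (x - q*n) / r is just
        // the difference of the two high halves.
        int64_t a = int64_t(x >> 32) - int64_t(qn >> 32);
        if (a < 0)
            a += n;
        return uint32_t(a);
    }

    // Slow transformation into the space: x * r mod n.
    uint32_t to_mont(uint32_t x) const {
        return uint32_t((uint64_t(x) << 32) % n);
    }

    // Product of two representatives, both assumed to be < n.
    uint32_t mul(uint32_t xb, uint32_t yb) const {
        return reduce(uint64_t(xb) * yb);
    }
};
```

Calling `reduce` on a single representative (rather than on a product) transfers it back into the normal space, which is the second application mentioned above.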
| 80 | + |
| 81 | +## Fast inverse trick |
| 82 | + |
| 83 | +For computing the inverse $n^{\prime} := n^{-1} \bmod r$ efficiently, we can use the following trick (which is inspired by Newton's method): |
| 84 | +$$a \cdot x \equiv 1 \bmod 2^k \Longrightarrow a \cdot x \cdot (2 - a \cdot x) \equiv 1 \bmod 2^{2k}$$ |
| 85 | +This can easily be proven. |
| 86 | +If we have $a \cdot x = 1 + m \cdot 2^k$, then we have: |
| 87 | +$$\begin{aligned} |
| 88 | +a \cdot x \cdot (2 - a \cdot x) &= 2 \cdot a \cdot x - (a \cdot x)^2 \\\\ |
| 89 | +&= 2 \cdot (1 + m \cdot 2^k) - (1 + m \cdot 2^k)^2 \\\\ |
| 90 | +&= 2 + 2 \cdot m \cdot 2^k - 1 - 2 \cdot m \cdot 2^k - m^2 \cdot 2^{2k} \\\\ |
| 91 | +&= 1 - m^2 \cdot 2^{2k} \\\\ |
| 92 | +&\equiv 1 \bmod 2^{2k}. |
| 93 | +\end{aligned}$$ |
| 94 | + |
| 95 | +This means we can start with $x = 1$ as the inverse of $a$ modulo $2^1$ (which is valid because $a$ is odd), repeatedly replace $x$ by $x \cdot (2 - a \cdot x)$, and in each iteration we double the number of correct bits of $x$. |
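The iteration can be sketched for $r = 2^{64}$ as follows. The function name `inverse_mod_2_64` is made up for this example, and the argument is assumed to be odd. Unsigned overflow wraps modulo $2^{64}$, so no explicit reduction modulo $r$ is needed.

```cpp
#include <cstdint>

// Inverse of an odd number a modulo 2^64 (hypothetical helper name).
uint64_t inverse_mod_2_64(uint64_t a) {
    uint64_t x = 1;               // correct modulo 2^1 because a is odd
    for (int i = 0; i < 6; i++)   // correct bits: 1 -> 2 -> 4 -> 8 -> 16 -> 32 -> 64
        x *= 2 - a * x;           // arithmetic on uint64_t wraps modulo 2^64
    return x;
}
```

Six iterations are enough here because $2^6 = 64$; for a 128-bit $r$ one more iteration is required.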
| 96 | + |
| 97 | +## Implementation |
| 98 | + |
| 99 | +Using the GCC compiler we can still compute $x \cdot y \bmod n$ efficiently when all three numbers are 64-bit integers, since the compiler supports 128-bit integers with the types `__int128` and `unsigned __int128` (also available as `__int128_t` and `__uint128_t`). |
| 100 | + |
| 101 | +```cpp |
| 102 | +long long result = (__int128)x * y % n; |
| 103 | +``` |
| 104 | + |
| 105 | +However, there is no type for 256-bit integers. |
| 106 | +Therefore we show here an implementation of the Montgomery multiplication for 128-bit numbers. |
| 107 | + |
| 108 | +```cpp |
| 109 | +using u64 = uint64_t;  // from <cstdint> |
| 110 | +using u128 = __uint128_t; |
| 111 | +using i128 = __int128_t; |
| 112 | + |
| 113 | +struct u256 { |
| 114 | +    u128 high, low; |
| 115 | + |
| 116 | +    static u256 mult(u128 x, u128 y) { |
| 117 | +        u64 a = x >> 64, b = x; |
| 118 | +        u64 c = y >> 64, d = y; |
| 119 | +        // (a*2^64 + b) * (c*2^64 + d) = |
| 120 | +        // (a*c) * 2^128 + (a*d + b*c)*2^64 + (b*d) |
| 121 | +        u128 ac = (u128)a * c; |
| 122 | +        u128 ad = (u128)a * d; |
| 123 | +        u128 bc = (u128)b * c; |
| 124 | +        u128 bd = (u128)b * d; |
| 125 | +        u128 carry = (u128)(u64)ad + (u128)(u64)bc + (bd >> 64u);  // overflow out of the low half |
| 126 | +        u128 high = ac + (ad >> 64u) + (bc >> 64u) + (carry >> 64u); |
| 127 | +        u128 low = (ad << 64u) + (bc << 64u) + bd;  // wraps modulo 2^128 |
| 128 | +        return {high, low}; |
| 129 | +    } |
| 130 | +}; |
| 131 | + |
| 132 | +struct Montgomery { |
| 133 | +    Montgomery(u128 n) : mod(n), inv(1) { |
| 134 | +        for (int i = 0; i < 7; i++)  // 2^7 = 128 correct bits |
| 135 | +            inv *= 2 - n * inv; |
| 136 | +    } |
| 137 | + |
| 138 | +    u128 init(u128 x) { |
| 139 | +        x %= mod; |
| 140 | +        for (int i = 0; i < 128; i++) {  // x * 2^128 mod n by repeated doubling |
| 141 | +            x <<= 1;  // does not overflow as long as mod <= 2^127 |
| 142 | +            if (x >= mod) |
| 143 | +                x -= mod; |
| 144 | +        } |
| 145 | +        return x; |
| 146 | +    } |
| 147 | + |
| 148 | +    u128 reduce(u256 x) { |
| 149 | +        u128 q = x.low * inv;  // q = x * n' mod r |
| 150 | +        i128 a = x.high - u256::mult(q, mod).high;  // (x - q*n) / r, fits if mod <= 2^127 |
| 151 | +        if (a < 0) |
| 152 | +            a += mod; |
| 153 | +        return a; |
| 154 | +    } |
| 155 | + |
| 156 | +    u128 mult(u128 a, u128 b) { |
| 157 | +        return reduce(u256::mult(a, b)); |
| 158 | +    } |
| 159 | + |
| 160 | +    u128 mod, inv; |
| 161 | +}; |
| 162 | +``` |
| 163 | + |
| 164 | +## Fast transformation |
| 165 | + |
| 166 | +The current method of transforming a number into Montgomery space is pretty slow. |
| 167 | +There are faster ways. |
| 168 | + |
| 169 | +You can notice the following relation: |
| 170 | +$$\bar{x} := x \cdot r \bmod n = x \cdot r^2 \cdot r^{-1} \bmod n = x * r^2$$ |
| 171 | +Transforming a number into the space is just a multiplication inside the space of the number with $r^2$. |
| 172 | +Therefore we can precompute $r^2 \bmod n$ and just perform a multiplication instead of shifting the number 128 times. |
| 173 | + |
| 174 | +In the following code we initialize `r2` with `-n % n`, which is equivalent to $r - n \equiv r \bmod n$, and double it 4 times modulo $n$ to get $r \cdot 2^4 \bmod n$. |
| 175 | +This number can be interpreted as $2^4$ in Montgomery space. |
| 176 | +If we square it $5$ times inside the space, we get $(2^4)^{2^5} = (2^4)^{32} = 2^{128} = r$ in Montgomery space, which is exactly $r^2 \bmod n$. |
| 177 | + |
| 178 | +```cpp |
| 179 | +struct Montgomery { |
| 180 | +    Montgomery(u128 n) : mod(n), inv(1), r2(-n % n) { |
| 181 | +        for (int i = 0; i < 7; i++) |
| 182 | +            inv *= 2 - n * inv; |
| 183 | + |
| 184 | +        for (int i = 0; i < 4; i++) {  // r2 = r * 2^4 mod n |
| 185 | +            r2 <<= 1; |
| 186 | +            if (r2 >= mod) |
| 187 | +                r2 -= mod; |
| 188 | +        } |
| 189 | +        for (int i = 0; i < 5; i++)  // five squarings inside the space |
| 190 | +            r2 = mult(r2, r2); |
| 191 | +    } |
| 192 | + |
| 193 | +    u128 init(u128 x) { |
| 194 | +        return mult(x, r2); |
| 195 | +    } |
| 196 | +    // reduce() and mult() are the same as in the previous listing |
| 197 | +    u128 mod, inv, r2; |
| 198 | +}; |
| 199 | +``` |
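To see how the pieces fit together, here is a self-contained sketch for a 64-bit odd modulus with $r = 2^{64}$. It is not the 128-bit implementation from above: the names `Montgomery64`, `get` and `pow_mod` are assumptions for this example, `__uint128_t` is a GCC/Clang extension, and $r^2 \bmod n$ is computed with one ordinary 128-bit division at setup instead of the shift-and-square approach.

```cpp
#include <cstdint>

using u64 = uint64_t;
using u128 = __uint128_t;  // GCC/Clang extension

// Sketch for an odd modulus n < 2^64 with r = 2^64 (names are illustrative).
struct Montgomery64 {
    u64 mod, inv, r2;

    explicit Montgomery64(u64 n) : mod(n), inv(1) {
        for (int i = 0; i < 6; i++)  // 2^6 = 64 correct bits of n^{-1} mod r
            inv *= 2 - n * inv;
        u64 r = -n % n;              // 2^64 mod n
        r2 = (u128)r * r % n;        // r^2 mod n, one slow division at setup
    }

    // Returns x * r^{-1} mod n, assuming x < n * 2^64.
    u64 reduce(u128 x) const {
        u64 q = (u64)x * inv;
        u64 hi = (u64)(x >> 64);
        u64 qn_hi = (u64)(((u128)q * mod) >> 64);
        // The difference lies in (-n, n); add n back if it is negative.
        return hi >= qn_hi ? hi - qn_hi : hi - qn_hi + mod;
    }

    u64 mult(u64 a, u64 b) const { return reduce((u128)a * b); }
    u64 init(u64 x) const { return mult(x % mod, r2); }  // into the space
    u64 get(u64 xb) const { return reduce(xb); }          // back out of it
};

// base^exp mod n for odd n: transform once, multiply often, transform back.
u64 pow_mod(u64 base, u64 exp, u64 n) {
    Montgomery64 m(n);
    u64 result = m.init(1), b = m.init(base);
    while (exp) {
        if (exp & 1)
            result = m.mult(result, b);
        b = m.mult(b, b);
        exp >>= 1;
    }
    return m.get(result);
}
```

Binary exponentiation is a typical use case, because the two transformations are paid once while the loop performs up to 128 multiplications inside the space.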