
Commit 12f8e4b

Add article about Montgomery Multiplication (#411)
1 parent e3e281f commit 12f8e4b

2 files changed

+200
-0
lines changed

Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
1+
<!--?title Montgomery Multiplication -->
2+
# Montgomery Multiplication
3+
4+
Many algorithms in number theory, like prime testing or factorization, and in cryptography, like RSA, require lots of operations modulo a large number.
5+
A multiplication like $x y \bmod{n}$ is quite slow to compute with the typical algorithms, since it requires a division to know how many times $n$ has to be subtracted from the product.
6+
And division is a really expensive operation, especially with big numbers.
7+
8+
The **Montgomery (modular) multiplication** is a method that allows computing such multiplications faster.
9+
Instead of dividing the product and subtracting $n$ multiple times, it adds multiples of $n$ to cancel out the lower bits and then just discards the lower bits.
10+
11+
## Montgomery representation
12+
13+
However the Montgomery multiplication doesn't come for free.
14+
The algorithm works only in the **Montgomery space**.
15+
And we need to transform our numbers into that space, before we can start multiplying.
16+
17+
For the space we need a positive integer $r \ge n$ coprime to $n$, i.e. $\gcd(n, r) = 1$.
18+
In practice we always choose $r$ to be $2^m$ for a positive integer $m$, since multiplications, divisions and modulo $r$ operations can then be efficiently implemented using shifts and other bit operations.
19+
$n$ will be an odd number in pretty much all applications, since it is not hard to factorize an even number.
20+
So every power of $2$ will be coprime to $n$.
21+
22+
The representative $\bar{x}$ of a number $x$ in the Montgomery space is defined as: $$\bar{x} := x \cdot r \bmod n$$
23+
24+
Notice that the transformation itself is exactly the kind of multiplication that we want to optimize.
25+
So this is still an expensive operation.
26+
However you only need to transform a number once into the space.
27+
As soon as you are in the Montgomery space, you can perform as many operations as you want efficiently.
28+
And at the end you transform the final result back.
29+
So as long as you are doing lots of operations modulo $n$, this will be no problem.
30+
31+
Inside the Montgomery space you can still perform most operations as usual.
32+
You can add two elements ($x \cdot r + y \cdot r \equiv (x + y) \cdot r \bmod n$), subtract, check for equality, and even compute the greatest common divisor of a number with $n$ (since $\gcd(n, r) = 1$).
33+
All with the usual algorithms.
34+
35+
However this is not the case for multiplication.
36+
We expect the result to be:
37+
$$\bar{x} * \bar{y} = \overline{x \cdot y} = (x \cdot y) \cdot r \bmod n.$$
38+
But the normal multiplication will give us:
39+
$$\bar{x} \cdot \bar{y} = (x \cdot y) \cdot r \cdot r \bmod n.$$
40+
Therefore the multiplication in the Montgomery space is defined as:
41+
$$\bar{x} * \bar{y} := \bar{x} \cdot \bar{y} \cdot r^{-1} \bmod n.$$
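The definition can be checked on toy numbers. The following is only a sketch with assumed parameters $n = 17$ and $r = 32$ (neither appears in the article); it uses ordinary slow modulo operations and a brute-force search for $r^{-1}$, so it illustrates the definition rather than the fast algorithm. The names `to_montgomery`, `r_inverse`, `montgomery_mult` and `check_all_pairs` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Toy parameters (assumptions for illustration): n = 17, r = 32 = 2^5.
constexpr uint32_t N = 17, R = 32;

// x -> x * r mod n, computed the slow way with an ordinary modulo.
uint32_t to_montgomery(uint32_t x) { return x % N * R % N; }

// Brute-force search for r^{-1} mod n; only sensible for a toy modulus.
uint32_t r_inverse() {
    for (uint32_t c = 1; c < N; c++)
        if (R % N * c % N == 1) return c;
    return 0;  // not reached while gcd(N, R) = 1
}

// Montgomery product taken literally from its definition:
// xbar * ybar * r^{-1} mod n.
uint32_t montgomery_mult(uint32_t xbar, uint32_t ybar) {
    return xbar * ybar % N * r_inverse() % N;
}

// Multiplying two representatives must give the representative of x * y.
void check_all_pairs() {
    for (uint32_t x = 0; x < N; x++)
        for (uint32_t y = 0; y < N; y++) {
            uint32_t lhs = montgomery_mult(to_montgomery(x), to_montgomery(y));
            assert(lhs == to_montgomery(x * y));
        }
}
```

For instance, with these parameters $\bar{3} = 11$ and $\bar{5} = 7$, and $11 \cdot 7 \cdot 8 \bmod 17 = 4 = \overline{15}$, where $8 = r^{-1} \bmod 17$.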
42+
43+
## Montgomery reduction
44+
45+
The multiplication of two numbers in the Montgomery space requires an efficient computation of $x \cdot r^{-1} \bmod n$.
46+
This operation is called the **Montgomery reduction**, and is also known as the algorithm **REDC**.
47+
48+
Because $\gcd(n, r) = 1$, Bézout's identity gives two integers $r^{-1}$ and $n^{\prime}$ (only their residues $r^{-1} \bmod n$ and $n^{\prime} \bmod r$ are needed below) with
49+
$$r \cdot r^{-1} + n \cdot n^{\prime} = 1.$$
50+
Both $r^{-1}$ and $n^{\prime}$ can be computed using the [Extended Euclidean algorithm](./algebra/extended-euclid-algorithm.html).
51+
52+
Using this identity we can write $x \cdot r^{-1}$ as:
53+
$$\begin{aligned}
x \cdot r^{-1} &= x \cdot r \cdot r^{-1} / r = x \cdot (-n \cdot n^{\prime} + 1) / r \\\\
&= (-x \cdot n \cdot n^{\prime} + x) / r \equiv (-x \cdot n \cdot n^{\prime} + l \cdot r \cdot n + x) / r \bmod n\\\\
&\equiv ((-x \cdot n^{\prime} + l \cdot r) \cdot n + x) / r \bmod n\\\\
\end{aligned}$$
58+
59+
The equivalences hold for any arbitrary integer $l$.
60+
This means, that we can add or subtract an arbitrary multiple of $r$ to $x \cdot n^{\prime}$, or in other words, we can compute $q := x \cdot n^{\prime}$ modulo $r$.
61+
62+
This gives us the following algorithm to compute $x \cdot r^{-1} \bmod n$:
63+
64+
```text
function reduce(x):
    q = (x mod r) * n' mod r
    a = (x - q * n) / r
    if a < 0:
        a += n
    return a
```
72+
73+
Since $x < n \cdot n < r \cdot n$ (even if $x$ is the product of a multiplication) and $q \cdot n < r \cdot n$ we know that $-n < (x - q \cdot n) / r < n$.
74+
Therefore the final modulo operation is implemented using a single check and one addition.
75+
76+
As we see, we can perform the Montgomery reduction without any heavy modulo operations.
77+
If we choose $r$ as a power of $2$, the modulo operations and divisions in the algorithm can be computed using bitmasking and shifting.
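As a sketch of how the reduction might look with machine words, the following assumes $r = 2^{32}$ and an odd modulus $n < 2^{31}$, so that `mod r` becomes a truncation to 32 bits and `/ r` a right shift. The helper `inverse_mod_2_32` (a hypothetical name) obtains $n^{\prime}$ with the doubling trick described in the next section; any other way of computing $n^{-1} \bmod r$ would do.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// n' = n^{-1} mod 2^32 for odd n: each step doubles the number of correct bits.
uint32_t inverse_mod_2_32(uint32_t n) {
    uint32_t inv = 1;
    for (int i = 0; i < 5; i++)   // 1 -> 2 -> 4 -> 8 -> 16 -> 32 bits
        inv *= 2 - n * inv;
    return inv;
}

// Computes x * r^{-1} mod n for r = 2^32, assuming x < r * n and n < 2^31.
uint32_t reduce(uint64_t x, uint32_t n, uint32_t n_inv) {
    uint32_t q = (uint32_t)x * n_inv;  // (x mod r) * n' mod r
    // x and q * n agree in their low 32 bits, so (x - q * n) / r is
    // simply the difference of the two high halves.
    int64_t a = (int64_t)(x >> 32) - (int64_t)(((uint64_t)q * n) >> 32);
    if (a < 0)
        a += n;
    return (uint32_t)a;
}

// Sanity check: reduce(x) * r must be congruent to x modulo n.
void check_reduce() {
    const uint32_t n = 1000000007;
    const uint32_t n_inv = inverse_mod_2_32(n);
    assert(n * n_inv == 1u);
    for (uint64_t x : {0ull, 1ull, 123456789ull, 999999999999999999ull}) {
        uint32_t a = reduce(x, n, n_inv);
        assert(a < n);
        assert(((uint64_t)a << 32) % n == x % n);
    }
}
```

The only `%` operators appear in the check, not in `reduce` itself.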
78+
79+
A second application of the Montgomery reduction is to transfer a number back from the Montgomery space into the normal space.
80+
81+
## Fast inverse trick
82+
83+
For computing the inverse $n^{\prime} := n^{-1} \bmod r$ efficiently, we can use the following trick (which is inspired by Newton's method):
84+
$$a \cdot x \equiv 1 \bmod 2^k \Longrightarrow a \cdot x \cdot (2 - a \cdot x) \equiv 1 \bmod 2^{2k}$$
85+
This can easily be proven.
86+
If we have $a \cdot x = 1 + m \cdot 2^k$, then we have:
87+
$$\begin{aligned}
a \cdot x \cdot (2 - a \cdot x) &= 2 \cdot a \cdot x - (a \cdot x)^2 \\\\
&= 2 \cdot (1 + m \cdot 2^k) - (1 + m \cdot 2^k)^2 \\\\
&= 2 + 2 \cdot m \cdot 2^k - 1 - 2 \cdot m \cdot 2^k - m^2 \cdot 2^{2k} \\\\
&= 1 - m^2 \cdot 2^{2k} \\\\
&\equiv 1 \bmod 2^{2k}.
\end{aligned}$$
94+
95+
This means we can start with $x = 1$ as the inverse of $a$ modulo $2^1$, apply the trick a few times and in each iteration we double the number of correct bits of $x$.
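A minimal sketch of this iteration for a 64-bit word, assuming an odd input; the function name `inverse_mod_2_64` is hypothetical. Unsigned overflow wraps modulo $2^{64}$, which is exactly the arithmetic the trick needs.

```cpp
#include <cassert>
#include <cstdint>

// Inverse of an odd a modulo 2^64 via the doubling trick.
uint64_t inverse_mod_2_64(uint64_t a) {
    assert(a % 2 == 1);            // even numbers have no inverse modulo 2^64
    uint64_t x = 1;                // a * 1 == 1 (mod 2^1) for every odd a
    for (int i = 0; i < 6; i++)    // 1 -> 2 -> 4 -> 8 -> 16 -> 32 -> 64 bits
        x *= 2 - a * x;
    return x;
}
```

Six iterations suffice because $2^6 = 64$; for a 128-bit word a seventh iteration is needed.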
96+
97+
## Implementation
98+
99+
Using the GCC compiler we can still compute $x \cdot y \bmod n$ efficiently when all three numbers are 64-bit integers, since the compiler supports 128-bit integers with the types `__int128` and `unsigned __int128` (also spelled `__int128_t` and `__uint128_t`).
100+
101+
```cpp
long long result = (__int128)x * y % n;
```
104+
105+
However there is no type for 256 bit integer.
106+
Therefore we show here an implementation of the multiplication modulo a 128-bit number.
107+
108+
```cpp
#include <cstdint>

using u64 = uint64_t;
using u128 = __uint128_t;
using i128 = __int128_t;

struct u256 {
    u128 high, low;

    // Full 256-bit product of two 128-bit numbers, built from 64-bit halves.
    static u256 mult(u128 x, u128 y) {
        u64 a = x >> 64, b = x;
        u64 c = y >> 64, d = y;
        // (a*2^64 + b) * (c*2^64 + d) =
        // (a*c) * 2^128 + (a*d + b*c)*2^64 + (b*d)
        u128 ac = (u128)a * c;
        u128 ad = (u128)a * d;
        u128 bc = (u128)b * c;
        u128 bd = (u128)b * d;
        // Carry out of the low 128 bits into the high half.
        u128 carry = (u128)(u64)ad + (u128)(u64)bc + (bd >> 64u);
        u128 high = ac + (ad >> 64u) + (bc >> 64u) + (carry >> 64u);
        u128 low = (ad << 64u) + (bc << 64u) + bd;
        return {high, low};
    }
};

// r = 2^128; the modulus must be odd and smaller than 2^127, so that
// the doubling in init() and the signed difference in reduce() cannot overflow.
struct Montgomery {
    // After 7 doublings of the number of correct bits, inv = n^{-1} mod 2^128.
    Montgomery(u128 n) : mod(n), inv(1) {
        for (int i = 0; i < 7; i++)
            inv *= 2 - n * inv;
    }

    // Transforms x into Montgomery space by doubling it 128 times modulo n.
    u128 init(u128 x) {
        x %= mod;
        for (int i = 0; i < 128; i++) {
            x <<= 1;
            if (x >= mod)
                x -= mod;
        }
        return x;
    }

    // Montgomery reduction: returns x * r^{-1} mod n for x < r * n.
    u128 reduce(u256 x) {
        u128 q = x.low * inv;
        // The low halves of x and q * mod coincide, so only the high halves differ.
        i128 a = x.high - u256::mult(q, mod).high;
        if (a < 0)
            a += mod;
        return a;
    }

    u128 mult(u128 a, u128 b) {
        return reduce(u256::mult(a, b));
    }

    u128 mod, inv;
};
```
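To show how such a structure might be used end to end, here is a sketch of modular exponentiation for the smaller case of an odd 64-bit modulus with $r = 2^{64}$, where `__uint128_t` holds the double-width products and no `u256` helper is needed. The names `Montgomery64`, `restore`, `power` and `naive_power` are assumptions made for this example and are not part of the article's code; `r2` is computed here with two ordinary slow modulo operations, which only happens once per modulus.

```cpp
#include <cassert>
#include <cstdint>

using u64 = uint64_t;
using u128 = __uint128_t;

// Sketch for an odd 64-bit modulus, with r = 2^64.
struct Montgomery64 {
    u64 mod, inv, r2;

    explicit Montgomery64(u64 n) : mod(n), inv(1) {
        assert(n % 2 == 1);
        for (int i = 0; i < 6; i++)          // inv = n^{-1} mod 2^64
            inv *= 2 - n * inv;
        r2 = (u64)(((u128)1 << 64) % n);     // r mod n
        r2 = (u64)((u128)r2 * r2 % n);       // r^2 mod n
    }

    // x * r^{-1} mod n, valid for x < r * n.
    u64 reduce(u128 x) const {
        u64 q = (u64)x * inv;
        u64 hi = (u64)(x >> 64);
        u64 sub = (u64)(((u128)q * mod) >> 64);
        return hi >= sub ? hi - sub : hi - sub + mod;
    }

    u64 mult(u64 a, u64 b) const { return reduce((u128)a * b); }
    u64 init(u64 x) const { return mult(x % mod, r2); }    // into the space
    u64 restore(u64 xbar) const { return reduce(xbar); }   // back out of it

    // Binary exponentiation carried out entirely inside Montgomery space.
    u64 power(u64 base, u64 exp) const {
        u64 result = init(1), b = init(base);
        while (exp) {
            if (exp & 1) result = mult(result, b);
            b = mult(b, b);
            exp >>= 1;
        }
        return restore(result);
    }
};

// Reference implementation using the 128-bit modulo operator.
u64 naive_power(u64 base, u64 exp, u64 n) {
    u64 result = 1 % n;
    base %= n;
    while (exp) {
        if (exp & 1) result = (u64)((u128)result * base % n);
        base = (u64)((u128)base * base % n);
        exp >>= 1;
    }
    return result;
}
```

The transformation cost is paid twice on the way in and once on the way out, while every multiplication in the loop avoids the `%` operator.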
163+
164+
## Fast transformation
165+
166+
The current method of transforming a number into Montgomery space is pretty slow.
167+
There are faster ways.
168+
169+
You can notice the following relation:
170+
$$\bar{x} := x \cdot r \bmod n = x \cdot r^2 \cdot r^{-1} \bmod n = x * r^2$$
171+
Transforming a number into the space is just a multiplication inside the space of the number with $r^2$.
172+
Therefore we can precompute $r^2 \bmod n$ and just perform a multiplication instead of shifting the number 128 times.
173+
174+
In the following code we initialize `r2` with `-n % n`, which is equivalent to $r - n \equiv r \bmod n$, and shift it 4 times to get $r \cdot 2^4 \bmod n$.
175+
This number can be interpreted as $2^4$ in Montgomery space.
176+
If we square it $5$ times, we get $(2^4)^{2^5} = (2^4)^{32} = 2^{128} = r$ in Montgomery space, which is exactly $r^2 \bmod n$.
177+
178+
```cpp
struct Montgomery {
    Montgomery(u128 n) : mod(n), inv(1), r2(-n % n) {
        for (int i = 0; i < 7; i++)
            inv *= 2 - n * inv;

        // r2 starts as r mod n; four doublings give r * 2^4 mod n.
        for (int i = 0; i < 4; i++) {
            r2 <<= 1;
            if (r2 >= mod)
                r2 -= mod;
        }
        // Five Montgomery squarings: (2^4)^(2^5) = 2^128 = r in Montgomery space.
        for (int i = 0; i < 5; i++)
            r2 = mult(r2, r2);
    }

    u128 init(u128 x) {
        return mult(x, r2);
    }

    // reduce() and mult() are unchanged from the previous listing.

    u128 mod, inv, r2;
};
```
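The shift-and-square computation can be cross-checked against a direct one. The sketch below assumes $r = 2^{64}$ and an odd modulus $n < 2^{63}$ instead of the 128-bit setting, so four shifts followed by four squarings are needed, since $(2^4)^{2^4} = 2^{64}$. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using u64 = uint64_t;
using u128 = __uint128_t;

// r^2 mod n for r = 2^64, obtained without any division after the first step.
u64 r2_by_squaring(u64 n) {
    u64 inv = 1;
    for (int i = 0; i < 6; i++)       // inv = n^{-1} mod 2^64
        inv *= 2 - n * inv;

    // Montgomery multiplication a * b * r^{-1} mod n for a, b < n.
    auto mult = [&](u64 a, u64 b) {
        u128 x = (u128)a * b;
        u64 q = (u64)x * inv;
        u64 hi = (u64)(x >> 64);
        u64 sub = (u64)(((u128)q * n) >> 64);
        return hi >= sub ? hi - sub : hi - sub + n;
    };

    u64 r2 = -n % n;                  // 2^64 - n, which is congruent to r mod n
    for (int i = 0; i < 4; i++) {     // r * 2^4 mod n, i.e. 2^4 in Montgomery space
        r2 <<= 1;
        if (r2 >= n)
            r2 -= n;
    }
    for (int i = 0; i < 4; i++)       // (2^4)^(2^4) = 2^64 = r in Montgomery space
        r2 = mult(r2, r2);
    return r2;
}

// The same value computed with ordinary 128-bit modulo operations.
u64 r2_direct(u64 n) {
    u64 r = (u64)(((u128)1 << 64) % n);
    return (u64)((u128)r * r % n);
}
```

Both functions should agree for every odd $n$ in the assumed range.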

src/index.md

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ and adding new articles to the collection.*
3131
- [Discrete Root](./algebra/discrete-root.html)
3232
- [Primitive Root](./algebra/primitive-root.html)
3333
- [Discrete Log](./algebra/discrete-log.html)
34+
- [Montgomery Multiplication](./algebra/montgomery_multiplication.html)
3435
- **Number systems**
3536
- [Balanced Ternary](./algebra/balanced-ternary.html)
3637
- [Gray code](./algebra/gray-code.html)
