@@ -1757,40 +1757,43 @@ k_mul(PyLongObject *a, PyLongObject *b)
1757 1757
1758 1758 /* (*) Why adding t3 can't "run out of room" above.
1759 1759
1760- We allocated space for asize + bsize result digits. We're adding t3 at an
1761- offset of shift digits, so there are asize + bsize - shift allocated digits
1762- remaining. Because degenerate shifts of "a" were weeded out, asize is at
1763- least shift + 1. If bsize is odd then bsize == 2*shift + 1, else bsize ==
1764- 2*shift. Therefore there are at least shift+1 + 2*shift - shift =
1765-
1766- 2*shift+1 allocated digits remaining when bsize is even, or at least
1767- 2*shift+2 allocated digits remaining when bsize is odd.
1768-
1769- Now in bh+bl, if bsize is even bh has at most shift digits, while if bsize
1770- is odd bh has at most shift+1 digits. The sum bh+bl has at most
1771-
1772- shift digits plus 1 bit when bsize is even
1773- shift+1 digits plus 1 bit when bsize is odd
1774-
1775- The same is true of ah+al, so (ah+al)(bh+bl) has at most
1776-
1777- 2*shift digits + 2 bits when bsize is even
1778- 2*shift+2 digits + 2 bits when bsize is odd
1779-
1780- If bsize is even, we have at most 2*shift digits + 2 bits to fit into at
1781- least 2*shift+1 digits. Since a digit has SHIFT bits, and SHIFT >= 2,
1782- there's always enough room to fit the 2 bits into the "spare" digit.
1783-
1784- If bsize is odd, we have at most 2*shift+2 digits + 2 bits to fit into at
1785- least 2*shift+2 digits, and there's not obviously enough room for the
1786- extra two bits. We need a sharper analysis in this case. The major
1787- laziness was in the "the same is true of ah+al" clause: ah+al can't actually
1788- have shift+1 digits + 1 bit unless bsize is odd and asize == bsize. In that
1789- case, we actually have (2*shift+1)*2 - shift = 3*shift+2 allocated digits
1790- remaining, and that's obviously plenty to hold 2*shift+2 digits + 2 bits.
1791- Else (bsize is odd and asize < bsize) ah and al each have at most shift digits,
1792- so ah+al has at most shift digits + 1 bit, and (ah+al)*(bh+bl) has at most
1793- 2*shift+1 digits + 2 bits, and again 2*shift+2 digits is enough to hold it.
1760+ Let f(x) mean the floor of x and c(x) mean the ceiling of x. Some facts
1761+ to start with:
1762+
1763+ 1. For any integer i, i = c(i/2) + f(i/2). In particular,
1764+ bsize = c(bsize/2) + f(bsize/2).
1765+ 2. shift = f(bsize/2)
1766+ 3. asize <= bsize
1767+ 4. Since we call k_lopsided_mul if asize*2 <= bsize, asize*2 > bsize in this
1768+ routine, so asize > bsize/2 >= f(bsize/2) in this routine.
1769+
1770+ We allocated asize + bsize result digits, and add t3 into them at an offset
1771+ of shift. This leaves asize+bsize-shift allocated digit positions for t3
1772+ to fit into, = (by #1 and #2) asize + f(bsize/2) + c(bsize/2) - f(bsize/2) =
1773+ asize + c(bsize/2) available digit positions.
1774+
1775+ bh has c(bsize/2) digits, and bl at most f(bsize/2) digits. So bh+bl has
1776+ at most c(bsize/2) digits + 1 bit.
1777+
1778+ If asize == bsize, ah has c(bsize/2) digits, else ah has at most f(bsize/2)
1779+ digits, and al has at most f(bsize/2) digits in any case. So ah+al has at
1780+ most (asize == bsize ? c(bsize/2) : f(bsize/2)) digits + 1 bit.
1781+
1782+ The product (ah+al)*(bh+bl) therefore has at most
1783+
1784+ c(bsize/2) + (asize == bsize ? c(bsize/2) : f(bsize/2)) digits + 2 bits
1785+
1786+ and we have asize + c(bsize/2) available digit positions. We need to show
1787+ this is always enough. An instance of c(bsize/2) cancels out in both, so
1788+ the question reduces to whether asize digits is enough to hold
1789+ (asize == bsize ? c(bsize/2) : f(bsize/2)) digits + 2 bits. If asize < bsize,
1790+ then we're asking whether asize digits >= f(bsize/2) digits + 2 bits. By #4,
1791+ asize is at least f(bsize/2)+1 digits, so this in turn reduces to whether 1
1792+ digit is enough to hold 2 bits. This is so since SHIFT=15 >= 2. If
1793+ asize == bsize, then we're asking whether bsize digits is enough to hold
1794+ c(bsize/2) digits + 2 bits, or equivalently (by #1) whether f(bsize/2) digits
1795+ is enough to hold 2 bits. This is so if bsize >= 2, which holds because
1796+ bsize >= KARATSUBA_CUTOFF >= 2.
1794 1797
1795 1798 Note that since there's always enough room for (ah+al)*(bh+bl), and that's
1796 1799 clearly >= each of ah*bh and al*bl, there's always enough room to subtract
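The room argument in the added comment can be spot-checked numerically. The following is a minimal Python sketch, not part of the commit: the helper names `max_t3_bits`, `room_bits`, and `check` are hypothetical, and it assumes 15-bit digits (the comment's SHIFT=15), shift = f(bsize/2), and sizes restricted to bsize/2 < asize <= bsize with bsize >= 2. It builds the largest possible operands (all digits set), splits them at `shift` digits the way the comment describes, and compares the bit length of (ah+al)*(bh+bl) with the asize+bsize-shift digit positions left for t3.

```python
SHIFT = 15  # bits per digit; assumed from the comment's "SHIFT=15"


def max_t3_bits(asize, bsize):
    """Bit length of (ah+al)*(bh+bl) for the largest asize- and bsize-digit inputs."""
    shift = bsize // 2                      # fact 2: shift = f(bsize/2)
    low_bits = shift * SHIFT
    mask = (1 << low_bits) - 1
    a = (1 << (asize * SHIFT)) - 1          # every digit of a at its maximum
    b = (1 << (bsize * SHIFT)) - 1          # every digit of b at its maximum
    ah, al = a >> low_bits, a & mask        # split a at `shift` digits
    bh, bl = b >> low_bits, b & mask        # split b at `shift` digits
    return ((ah + al) * (bh + bl)).bit_length()


def room_bits(asize, bsize):
    """Bits available for t3: asize + bsize - shift digit positions."""
    shift = bsize // 2
    return (asize + bsize - shift) * SHIFT


def check(max_size=40):
    """Return True if t3 always fits for 2 <= bsize <= max_size."""
    for bsize in range(2, max_size + 1):
        # facts 3 and 4: bsize/2 < asize <= bsize
        for asize in range(bsize // 2 + 1, bsize + 1):
            if max_t3_bits(asize, bsize) > room_bits(asize, bsize):
                return False
    return True


if __name__ == "__main__":
    print(check())
```

All-ones operands are the worst case because they maximize the high and low halves at once, so a passing check over a size range is consistent with the proof, though it does not replace it.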