k_mul() comments: In honor of Dijkstra, made the proof that "t3 fits"

tim-one · tim-one · commit ab86c2be2415 · 2002-08-15T20:06:00.000Z
rigorous instead of hoping for testing not to turn up counterexamples.
Call me heretical, but despite that I'm wholly confident in the proof,
and have done it two different ways now, I still put more faith in
testing ...
diff --git a/Objects/longobject.c b/Objects/longobject.c
@@ -1757,40 +1757,43 @@ k_mul(PyLongObject *a, PyLongObject *b)
 
 /* (*) Why adding t3 can't "run out of room" above.
 
-We allocated space for asize + bsize result digits.  We're adding t3 at an
-offset of shift digits, so there are asize + bsize - shift allocated digits
-remaining.  Because degenerate shifts of "a" were weeded out, asize is at
-least shift + 1.  If bsize is odd then bsize == 2*shift + 1, else bsize ==
-2*shift.  Therefore there are at least shift+1 + 2*shift - shift =
-
-2*shift+1 allocated digits remaining when bsize is even, or at least
-2*shift+2 allocated digits remaining when bsize is odd.
-
-Now in bh+bl, if bsize is even bh has at most shift digits, while if bsize
-is odd bh has at most shift+1 digits.  The sum bh+bl has at most
-
-shift   digits plus 1 bit when bsize is even
-shift+1 digits plus 1 bit when bsize is odd
-
-The same is true of ah+al, so (ah+al)(bh+bl) has at most
-
-2*shift   digits + 2 bits when bsize is even
-2*shift+2 digits + 2 bits when bsize is odd
-
-If bsize is even, we have at most 2*shift digits + 2 bits to fit into at
-least 2*shift+1 digits.  Since a digit has SHIFT bits, and SHIFT >= 2,
-there's always enough room to fit the 2 bits into the "spare" digit.
-
-If bsize is odd, we have at most 2*shift+2 digits + 2 bits to fit into at
-least 2*shift+2 digits, and there's not obviously enough room for the
-extra two bits.  We need a sharper analysis in this case.  The major
-laziness was in the "the same is true of ah+al" clause:  ah+al can't actually
-have shift+1 digits + 1 bit unless bsize is odd and asize == bsize.  In that
-case, we actually have (2*shift+1)*2 - shift = 3*shift+2 allocated digits
-remaining, and that's obviously plenty to hold 2*shift+2 digits + 2 bits.
-Else (bsize is odd and asize < bsize) ah and al each have at most shift digits,
-so ah+al has at most shift digits + 1 bit, and (ah+al)*(bh+bl) has at most
-2*shift+1 digits + 2 bits, and again 2*shift+2 digits is enough to hold it.
+Let f(x) mean the floor of x and c(x) mean the ceiling of x.  Some facts
+to start with:
+
+1. For any integer i, i = c(i/2) + f(i/2).  In particular,
+   bsize = c(bsize/2) + f(bsize/2).
+2. shift = f(bsize/2)
+3. asize <= bsize
+4. Since we call k_lopsided_mul if asize*2 <= bsize, asize*2 > bsize in this
+   routine, so asize > bsize/2 >= f(bsize/2) in this routine.
+
+We allocated asize + bsize result digits, and add t3 into them at an offset
+of shift.  This leaves asize+bsize-shift allocated digit positions for t3
+to fit into, = (by #1 and #2) asize + f(bsize/2) + c(bsize/2) - f(bsize/2) =
+asize + c(bsize/2) available digit positions.
+
+bh has c(bsize/2) digits, and bl at most f(size/2) digits.  So bh+hl has
+at most c(bsize/2) digits + 1 bit.
+
+If asize == bsize, ah has c(bsize/2) digits, else ah has at most f(bsize/2)
+digits, and al has at most f(bsize/2) digits in any case.  So ah+al has at
+most (asize == bsize ? c(bsize/2) : f(bsize/2)) digits + 1 bit.
+
+The product (ah+al)*(bh+bl) therefore has at most
+
+    c(bsize/2) + (asize == bsize ? c(bsize/2) : f(bsize/2)) digits + 2 bits
+
+and we have asize + c(bsize/2) available digit positions.  We need to show
+this is always enough.  An instance of c(bsize/2) cancels out in both, so
+the question reduces to whether asize digits is enough to hold
+(asize == bsize ? c(bsize/2) : f(bsize/2)) digits + 2 bits.  If asize < bsize,
+then we're asking whether asize digits >= f(bsize/2) digits + 2 bits.  By #4,
+asize is at least f(bsize/2)+1 digits, so this in turn reduces to whether 1
+digit is enough to hold 2 bits.  This is so since SHIFT=15 >= 2.  If
+asize == bsize, then we're asking whether bsize digits is enough to hold
+f(bsize/2) digits + 2 bits, or equivalently (by #1) whether c(bsize/2) digits
+is enough to hold 2 bits.  This is so if bsize >= 1, which holds because
+bsize >= KARATSUBA_CUTOFF >= 1.
 
 Note that since there's always enough room for (ah+al)*(bh+bl), and that's
 clearly >= each of ah*bh and al*bl, there's always enough room to subtract