Commit 44121a6

x_mul(): This failed to normalize its result.

k_mul(): This didn't allocate enough result space when one input had more than twice as many bits as the other. This was partly hidden by the fact that x_mul() didn't normalize its result.

The Karatsuba recurrence is pretty much hosed if the inputs aren't roughly the same size. If one has at least twice as many bits as the other, we get a degenerate case where the "high half" of the smaller input is 0. Added a special case for that, for speed, but although it helped, this can still be much slower than the "grade school" method. It seems to take a really wild imbalance to trigger that; e.g., a 2**22-bit input times a 1000-bit input on my box runs about twice as slow under k_mul as under x_mul. This still needs to be addressed.

I'm also not sure that allocating a->ob_size + b->ob_size digits is enough, given that this is computing k = (ah+al)*(bh+bl) instead of k = (ah-al)*(bl-bh); i.e., it's certainly enough for the final result, but it's vaguely possible that adding in the "artificially" large k may overflow that temporarily. If so, an assert will trigger in the debug build, but we'll probably compute the right result anyway(!).
1 parent 877a212 commit 44121a6
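
The k = (ah+al)*(bh+bl) form mentioned in the commit message can be illustrated with a single Karatsuba step on machine integers. This is only a sketch: `toy_karatsuba32` is a hypothetical name, it works on fixed 32-bit inputs split into 16-bit halves, and it is not the code in Objects/longobject.c. It does show why k is "artificially" large: the sums ah+al and bh+bl can each be one bit wider than a half, so k can need more bits than either ah*bh or al*bl.

```c
#include <assert.h>
#include <stdint.h>

/* One Karatsuba step: three multiplications instead of four.
 * Hypothetical helper for illustration only. */
static uint64_t toy_karatsuba32(uint32_t a, uint32_t b)
{
    uint64_t ah = a >> 16, al = a & 0xFFFFu;   /* split a at 16 bits */
    uint64_t bh = b >> 16, bl = b & 0xFFFFu;   /* split b at 16 bits */

    uint64_t ahbh = ah * bh;                   /* fits in 32 bits */
    uint64_t albl = al * bl;                   /* fits in 32 bits */
    uint64_t k = (ah + al) * (bh + bl);        /* can need up to 34 bits */

    /* k - ahbh - albl equals ah*bl + al*bh, the middle term. */
    uint64_t mid = k - ahbh - albl;

    return (ahbh << 32) + (mid << 16) + albl;
}
```

The final sum equals a*b, which always fits in 64 bits here; the question raised in the message is whether the intermediate additions stay within the a->ob_size + b->ob_size digits allocated for the result.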

1 file changed

Lines changed: 18 additions & 6 deletions

Objects/longobject.c
@@ -1556,7 +1556,7 @@ x_mul(PyLongObject *a, PyLongObject *b)
             carry >>= SHIFT;
         }
     }
-    return z;
+    return long_normalize(z);
 }

 /* A helper for Karatsuba multiplication (k_mul).
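
Why the missing normalization matters can be seen in a small grade-school multiplier. The sketch below assumes 15-bit digits stored in `uint16_t`; the `toy_` names and types are hypothetical and this is not CPython's x_mul. The result buffer gets na + nb digits, which is always enough, but the top digit is often 0, so the significant length has to be trimmed afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SHIFT 15
#define TOY_MASK  ((1u << TOY_SHIFT) - 1)

typedef uint16_t toy_digit;   /* each digit holds 15 bits */

/* Return the number of significant digits, dropping high-order zeros. */
static size_t toy_normalize(const toy_digit *d, size_t n)
{
    while (n > 0 && d[n - 1] == 0)
        --n;
    return n;
}

/* Grade-school multiply of a (na digits) by b (nb digits), least
 * significant digit first. z must have room for na + nb digits.
 * Returns the normalized digit count of the product. */
static size_t toy_x_mul(const toy_digit *a, size_t na,
                        const toy_digit *b, size_t nb,
                        toy_digit *z)
{
    assert(z != NULL);
    memset(z, 0, (na + nb) * sizeof(toy_digit));
    for (size_t i = 0; i < na; ++i) {
        uint32_t carry = 0;
        for (size_t j = 0; j < nb; ++j) {
            carry += (uint32_t)z[i + j] + (uint32_t)a[i] * b[j];
            z[i + j] = (toy_digit)(carry & TOY_MASK);
            carry >>= TOY_SHIFT;
        }
        z[i + nb] = (toy_digit)carry;   /* not written by earlier rows */
    }
    return toy_normalize(z, na + nb);
}
```

Multiplying the one-digit values 1 and 1 fills a two-digit buffer but yields a one-digit result; without the final trim, a caller that trusts the size field would see a spurious leading zero digit.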
@@ -1630,8 +1630,15 @@ k_mul(PyLongObject *a, PyLongObject *b)
     }

     /* Use gradeschool math when either number is too small. */
-    if (ABS(a->ob_size) <= KARATSUBA_CUTOFF)
-        return x_mul(a, b);
+    if (ABS(a->ob_size) <= KARATSUBA_CUTOFF) {
+        /* 0 is inevitable if one kmul arg has more than twice
+         * the digits of another, so it's worth special-casing.
+         */
+        if (a->ob_size == 0)
+            return _PyLong_New(0);
+        else
+            return x_mul(a, b);
+    }

     shift = ABS(b->ob_size) >> 1;
     if (kmul_split(a, shift, &ah, &al) < 0) goto fail;
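
The "0 is inevitable" comment in the hunk above follows from how the split point is chosen: shift is half the digit count of b, so if a has no more than shift digits, nothing is left for its high half. The body of kmul_split is not shown in this diff, so the sketch below rests on an assumption: the low part takes at most shift digits and the high part takes whatever remains. `toy_split` and `toy_split_sizes` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t hi;   /* digits in the high part */
    size_t lo;   /* digits in the low part */
} toy_split;

/* Sizes of the two parts when an n-digit number is split at `shift`
 * digits. Assumed rule: low takes min(n, shift), high takes the rest. */
static toy_split toy_split_sizes(size_t n, size_t shift)
{
    toy_split s;
    s.lo = n < shift ? n : shift;
    s.hi = n - s.lo;
    return s;
}
```

With a 3-digit a and a 10-digit b, shift is 5 and the high part of a has 0 digits, so the recursive call on the high halves receives a zero operand. Returning a zero-size result for that case avoids running x_mul on it.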
@@ -1641,16 +1648,21 @@ k_mul(PyLongObject *a, PyLongObject *b)
     assert(ahbh->ob_size >= 0);

     /* Allocate result space, and copy ahbh into the high digits. */
-    ret = _PyLong_New(ahbh->ob_size + 2*shift + 1);
+    ret = _PyLong_New(ABS(a->ob_size) + ABS(b->ob_size));
     if (ret == NULL) goto fail;
 #ifdef Py_DEBUG
     /* Fill with trash, to catch reference to uninitialized digits. */
     memset(ret->ob_digit, 0xDF, ret->ob_size * sizeof(digit));
 #endif
+    assert(2*shift + ahbh->ob_size <= ret->ob_size);
     memcpy(ret->ob_digit + 2*shift, ahbh->ob_digit,
            ahbh->ob_size * sizeof(digit));
-    /* That didn't copy into the most-significant (overflow) digit. */
-    ret->ob_digit[ret->ob_size - 1] = 0;
+
+    /* Zero-out the digits higher than the ahbh copy. */
+    i = ret->ob_size - 2*shift - ahbh->ob_size;
+    if (i)
+        memset(ret->ob_digit + 2*shift + ahbh->ob_size, 0,
+               i * sizeof(digit));

     /* Compute al*bl, and copy into the low digits. */
     if ((albl = k_mul(al, bl)) == NULL) goto fail;
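
The new layout step in the last hunk can be sketched in isolation. The old code sized the result from ahbh and cleared exactly one overflow digit; once the buffer is sized as the sum of the input lengths, there may be several digits above the ahbh copy (in particular when ahbh is 0), and all of them need clearing. `toy_place_high` and `toy_digit` below are hypothetical stand-ins, not CPython names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t toy_digit;

/* Copy hi[0..nhi) into ret starting at digit 2*shift, then zero every
 * digit of ret above the copy. Digits below 2*shift are left as-is,
 * because the low product is written there later. */
static void toy_place_high(toy_digit *ret, size_t nret,
                           const toy_digit *hi, size_t nhi,
                           size_t shift)
{
    size_t end = 2 * shift + nhi;

    assert(end <= nret);              /* same bound the diff asserts */
    if (nhi)
        memcpy(ret + 2 * shift, hi, nhi * sizeof(toy_digit));
    if (nret > end)
        memset(ret + end, 0, (nret - end) * sizeof(toy_digit));
}
```

For an 8-digit buffer, shift 2, and a 2-digit high product, digits 4 and 5 receive the copy, digits 6 and 7 are zeroed, and digits 0 through 3 keep whatever they held (the 0xDF fill in a debug build).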