roundupsize() and friends: fiddle over-allocation strategy for list

tim-one · tim-one · commit 65b8b8483988 · 2001-05-26T05:28:40.000Z
resizing.

Accurate timings are impossible on my Win98SE box, but this is obviously
faster even on this box for reasonable list.append() cases.  I give
credit for this not to the resizing strategy but to getting rid of integer
multiplication and divsion (in favor of shifting) when computing the
rounded-up size.

For unreasonable list.append() cases, Win98SE now displays linear behavior
for one-at-time appends up to a list with about 35 million elements.  Then
it dies with a MemoryError, due to fatally fragmented *address space*
(there's plenty of VM available, but by this point Win9X has broken user
space into many distinct heaps none of which has enough contiguous space
left to resize the list, and for whatever reason Win9x isn't coalescing
the dead heaps).  Before the patch it got a MemoryError for the same
reason, but once the list reached about 2 million elements.

Haven't yet tried on Win2K but have high hopes extreme list.append()
will be much better behaved now (NT &amp; Win2K didn't fragment address space,
but suffered obvious quadratic-time behavior before as lists got large).

For other systems I'm relying on common sense:  replacing integer * and /
by &lt;&lt; and &gt;&gt; can't plausibly hurt, the number of function calls hasn't
changed, and the total operation count for reasonably small lists is about
the same (while the operations are cheaper now).
diff --git a/Objects/listobject.c b/Objects/listobject.c
@@ -9,17 +9,40 @@
 #include <sys/types.h>		/* For size_t */
 #endif
 
-#define ROUNDUP(n, PyTryBlock) \
-	((((n)+(PyTryBlock)-1)/(PyTryBlock))*(PyTryBlock))
-
 static int
 roundupsize(int n)
 {
-	if (n < 500)
-		return ROUNDUP(n, 10);
-	else
-		return ROUNDUP(n, 100);
-}
+	unsigned int nbits = 0;
+	unsigned int n2 = (unsigned int)n >> 5;
+
+	/* Round up: 
+	 * If n <       256, to a multiple of        8.
+	 * If n <      2048, to a multiple of       64.
+	 * If n <     16384, to a multiple of      512.
+	 * If n <    131072, to a multiple of     4096.
+	 * If n <   1048576, to a multiple of    32768.
+	 * If n <   8388608, to a multiple of   262144.
+	 * If n <  67108864, to a multiple of  2097152.
+	 * If n < 536870912, to a multiple of 16777216.
+	 * ...
+	 * If n < 2**(5+3*i), to a multiple of 2**(3*i).
+	 *
+	 * This over-allocates proportional to the list size, making room
+	 * for additional growth.  The over-allocation is mild, but is
+	 * enough to give linear-time amortized behavior over a long
+	 * sequence of appends() in the presence of a poorly-performing
+	 * system realloc() (which is a reality, e.g., across all flavors
+	 * of Windows, with Win9x behavior being particularly bad -- and
+	 * we've still got address space fragmentation problems on Win9x
+	 * even with this scheme, although it requires much longer lists to
+	 * provoke them than it used to).
+	 */
+	do {
+		n2 >>= 3;
+		nbits += 3;
+	} while (n2);
+	return ((n >> nbits) + 1) << nbits;
+ }
 
 #define NRESIZE(var, type, nitems) PyMem_RESIZE(var, type, roundupsize(nitems))