

Commit bd79264

Issue #4258: Make it possible to use 30-bit digits for PyLongs:
 - new configure option --enable-big-digits
 - new structseq sys.int_info giving information about the internal format

By default, 30-bit digits are enabled on 64-bit machines but disabled on 32-bit machines.
1 parent e7f45b8 commit bd79264

15 files changed

Lines changed: 865 additions & 68 deletions


Doc/library/sys.rst

Lines changed: 17 additions & 0 deletions
@@ -413,6 +413,23 @@ always available.
    same information.
 
 
+.. data:: int_info
+
+   A struct sequence that holds information about Python's
+   internal representation of integers. The attributes are read only.
+
+   +-------------------------+----------------------------------------------+
+   | attribute               | explanation                                  |
+   +=========================+==============================================+
+   | :const:`bits_per_digit` | number of bits held in each digit. Python   |
+   |                         | integers are stored internally in base       |
+   |                         | ``2**int_info.bits_per_digit``               |
+   +-------------------------+----------------------------------------------+
+   | :const:`sizeof_digit`   | size in bytes of the C type used to          |
+   |                         | represent a digit                            |
+   +-------------------------+----------------------------------------------+
+
+
 .. function:: intern(string)
 
    Enter *string* in the table of "interned" strings and return the interned string
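
As an illustration of what "stored internally in base ``2**int_info.bits_per_digit``" means, here is a minimal Python sketch that splits an integer into such digits, least significant first. The helpers `to_digits` and `from_digits` are hypothetical names introduced for this example; they are not part of the commit or of the `sys` module, and they only model the layout rather than read the real internal digits.

```python
import sys


def to_digits(n, bits=sys.int_info.bits_per_digit):
    """Split a non-negative int into digits in base 2**bits, least significant first.

    Hypothetical helper: it models the digit layout, it does not inspect
    the interpreter's actual storage.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    mask = (1 << bits) - 1
    digits = []
    while n:
        digits.append(n & mask)  # low `bits` bits form the next digit
        n >>= bits
    return digits


def from_digits(digits, bits=sys.int_info.bits_per_digit):
    """Inverse of to_digits: rebuild the integer from its digits."""
    n = 0
    for d in reversed(digits):
        n = (n << bits) | d
    return n
```

Passing `bits=15` or `bits=30` explicitly lets the two layouts described in this commit be compared on any build.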

Doc/whatsnew/3.1.rst

Lines changed: 23 additions & 0 deletions
@@ -87,5 +87,28 @@ Some smaller changes made to the core Python language are:
 
   (Contributed by Fredrik Johansson and Victor Stinner; :issue:`3439`.)
 
+* Integers are now stored internally either in base 2**15 or in base
+  2**30, the base being determined at build time. Previously, they
+  were always stored in base 2**15. Using base 2**30 gives
+  significant performance improvements on 64-bit machines, but
+  benchmark results on 32-bit machines have been mixed. Therefore,
+  the default is to use base 2**30 on 64-bit machines and base 2**15
+  on 32-bit machines; on Unix, there's a new configure option
+  --enable-big-digits that can be used to override this default.
+
+  Apart from the performance improvements this change should be
+  invisible to end users, with one exception: for testing and
+  debugging purposes there's a new structseq ``sys.int_info`` that
+  provides information about the internal format, giving the number of
+  bits per digit and the size in bytes of the C type used to store
+  each digit::
+
+     >>> import sys
+     >>> sys.int_info
+     sys.int_info(bits_per_digit=30, sizeof_digit=4)
+
+
+  (Contributed by Mark Dickinson; :issue:`4258`.)
+
 
 .. ======================================================================
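
One directly observable consequence of the larger base is that the same value needs roughly half as many digits. The sketch below counts digits for a positive integer in either base; `digit_count` is a hypothetical helper written for this note, and it says nothing about actual timings.

```python
def digit_count(n, bits):
    """Number of base-2**bits digits needed for a positive integer n."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Ceiling division of the bit length by the digit width.
    return -(-n.bit_length() // bits)


value = 2**60 - 1                     # a 60-bit value
digits_15 = digit_count(value, 15)    # digits needed in base 2**15
digits_30 = digit_count(value, 30)    # digits needed in base 2**30
```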

Include/longintrepr.h

Lines changed: 53 additions & 15 deletions
@@ -7,24 +7,62 @@ extern "C" {
 
 /* This is published for the benefit of "friend" marshal.c only. */
 
-/* Parameters of the long integer representation.
-   These shouldn't have to be changed as C should guarantee that a short
-   contains at least 16 bits, but it's made changeable anyway.
-   Note: 'digit' should be able to hold 2*MASK+1, and 'twodigits'
-   should be able to hold the intermediate results in 'mul'
-   (at most (BASE-1)*(2*BASE+1) == MASK*(2*MASK+3)).
-   Also, x_sub assumes that 'digit' is an unsigned type, and overflow
-   is handled by taking the result mod 2**N for some N > SHIFT.
-   And, at some places it is assumed that MASK fits in an int, as well.
-   long_pow() requires that SHIFT be divisible by 5. */
+/* Parameters of the long integer representation. There are two different
+   sets of parameters: one set for 30-bit digits, stored in an unsigned 32-bit
+   integer type, and one set for 15-bit digits with each digit stored in an
+   unsigned short. The value of PYLONG_BITS_IN_DIGIT, defined either at
+   configure time or in pyport.h, is used to decide which digit size to use.
 
-typedef unsigned short digit;
-typedef short sdigit; /* signed variant of digit */
-#define BASE_TWODIGITS_TYPE long
-typedef unsigned BASE_TWODIGITS_TYPE twodigits;
-typedef BASE_TWODIGITS_TYPE stwodigits; /* signed variant of twodigits */
+   Type 'digit' should be able to hold 2*PyLong_BASE-1, and type 'twodigits'
+   should be an unsigned integer type able to hold all integers up to
+   PyLong_BASE*PyLong_BASE-1. x_sub assumes that 'digit' is an unsigned type,
+   and that overflow is handled by taking the result modulo 2**N for some N >
+   PyLong_SHIFT. The majority of the code doesn't care about the precise
+   value of PyLong_SHIFT, but there are some notable exceptions:
+
+   - long_pow() requires that PyLong_SHIFT be divisible by 5
+
+   - PyLong_{As,From}ByteArray require that PyLong_SHIFT be at least 8
+
+   - long_hash() requires that PyLong_SHIFT is *strictly* less than the number
+     of bits in an unsigned long, as do the PyLong <-> long (or unsigned long)
+     conversion functions
+
+   - the long <-> size_t/Py_ssize_t conversion functions expect that
+     PyLong_SHIFT is strictly less than the number of bits in a size_t
+
+   - the marshal code currently expects that PyLong_SHIFT is a multiple of 15
+
+   - NSMALLNEGINTS and NSMALLPOSINTS should be small enough to fit in a single
+     digit; with the current values this forces PyLong_SHIFT >= 9
 
+   The values 15 and 30 should fit all of the above requirements, on any
+   platform.
+*/
+
+#if HAVE_STDINT_H
+#include <stdint.h>
+#endif
+
+#if PYLONG_BITS_IN_DIGIT == 30
+#if !(defined HAVE_UINT64_T && defined HAVE_UINT32_T && \
+      defined HAVE_INT64_T && defined HAVE_INT32_T)
+#error "30-bit long digits requested, but the necessary types are not available on this platform"
+#endif
+typedef PY_UINT32_T digit;
+typedef PY_INT32_T sdigit; /* signed variant of digit */
+typedef PY_UINT64_T twodigits;
+typedef PY_INT64_T stwodigits; /* signed variant of twodigits */
+#define PyLong_SHIFT 30
+#elif PYLONG_BITS_IN_DIGIT == 15
+typedef unsigned short digit;
+typedef short sdigit; /* signed variant of digit */
+typedef unsigned long twodigits;
+typedef long stwodigits; /* signed variant of twodigits */
 #define PyLong_SHIFT 15
+#else
+#error "PYLONG_BITS_IN_DIGIT should be 15 or 30"
+#endif
 #define PyLong_BASE ((digit)1 << PyLong_SHIFT)
 #define PyLong_MASK ((digit)(PyLong_BASE - 1))
 
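
The requirements listed in the new header comment can be checked mechanically. The following Python sketch is one way to do that; the function name `violated_constraints` and its parameters are hypothetical, and the defaults of 32 bits for `unsigned long` and `size_t` are assumptions chosen for illustration, not values taken from the commit.

```python
def violated_constraints(shift, digit_bits, twodigits_bits,
                         ulong_bits=32, size_t_bits=32):
    """Return the header-comment requirements that a digit size would break.

    shift          -- candidate PyLong_SHIFT
    digit_bits     -- width of the unsigned type used for 'digit'
    twodigits_bits -- width of the unsigned type used for 'twodigits'
    ulong_bits, size_t_bits -- assumed platform widths (illustrative defaults)
    """
    base = 1 << shift
    problems = []
    if 2 * base - 1 >= 1 << digit_bits:
        problems.append("digit cannot hold 2*PyLong_BASE-1")
    if base * base - 1 >= 1 << twodigits_bits:
        problems.append("twodigits cannot hold PyLong_BASE*PyLong_BASE-1")
    if shift % 5 != 0:
        problems.append("long_pow() needs PyLong_SHIFT divisible by 5")
    if shift < 8:
        problems.append("PyLong_{As,From}ByteArray need PyLong_SHIFT >= 8")
    if shift >= ulong_bits:
        problems.append("PyLong_SHIFT must be < bits in unsigned long")
    if shift >= size_t_bits:
        problems.append("PyLong_SHIFT must be < bits in size_t")
    if shift % 15 != 0:
        problems.append("marshal expects PyLong_SHIFT to be a multiple of 15")
    if shift < 9:
        problems.append("small-int cache needs PyLong_SHIFT >= 9")
    return problems
```

With these assumptions, both configurations defined in the header (15-bit digits in a 16-bit type with a 32-bit `twodigits`, and 30-bit digits in a 32-bit type with a 64-bit `twodigits`) come back with an empty list.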

Include/longobject.h

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ PyAPI_FUNC(Py_ssize_t) PyLong_AsSsize_t(PyObject *);
 PyAPI_FUNC(size_t) PyLong_AsSize_t(PyObject *);
 PyAPI_FUNC(unsigned long) PyLong_AsUnsignedLong(PyObject *);
 PyAPI_FUNC(unsigned long) PyLong_AsUnsignedLongMask(PyObject *);
+PyAPI_FUNC(PyObject *) PyLong_GetInfo(void);
 
 /* It may be useful in the future. I've added it in the PyInt -> PyLong
    cleanup to keep the extra information. [CH] */

Include/pyport.h

Lines changed: 51 additions & 0 deletions
@@ -69,6 +69,57 @@ Used in: PY_LONG_LONG
 #endif
 #endif /* HAVE_LONG_LONG */
 
+/* a build with 30-bit digits for Python long integers needs an exact-width
+ * 32-bit unsigned integer type to store those digits. (We could just use
+ * type 'unsigned long', but that would be wasteful on a system where longs
+ * are 64-bits.) On Unix systems, the autoconf macro AC_TYPE_UINT32_T defines
+ * uint32_t to be such a type unless stdint.h or inttypes.h defines uint32_t.
+ * However, it doesn't set HAVE_UINT32_T, so we do that here.
+ */
+#if (defined UINT32_MAX || defined uint32_t)
+#ifndef PY_UINT32_T
+#define HAVE_UINT32_T 1
+#define PY_UINT32_T uint32_t
+#endif
+#endif
+
+/* Macros for a 64-bit unsigned integer type; used for type 'twodigits' in the
+ * long integer implementation, when 30-bit digits are enabled.
+ */
+#if (defined UINT64_MAX || defined uint64_t)
+#ifndef PY_UINT64_T
+#define HAVE_UINT64_T 1
+#define PY_UINT64_T uint64_t
+#endif
+#endif
+
+/* Signed variants of the above */
+#if (defined INT32_MAX || defined int32_t)
+#ifndef PY_INT32_T
+#define HAVE_INT32_T 1
+#define PY_INT32_T int32_t
+#endif
+#endif
+#if (defined INT64_MAX || defined int64_t)
+#ifndef PY_INT64_T
+#define HAVE_INT64_T 1
+#define PY_INT64_T int64_t
+#endif
+#endif
+
+/* If PYLONG_BITS_IN_DIGIT is not defined then we'll use 30-bit digits if all
+   the necessary integer types are available, and we're on a 64-bit platform
+   (as determined by SIZEOF_VOID_P); otherwise we use 15-bit digits. */
+
+#ifndef PYLONG_BITS_IN_DIGIT
+#if (defined HAVE_UINT64_T && defined HAVE_INT64_T && \
+      defined HAVE_UINT32_T && defined HAVE_INT32_T && SIZEOF_VOID_P >= 8)
+#define PYLONG_BITS_IN_DIGIT 30
+#else
+#define PYLONG_BITS_IN_DIGIT 15
+#endif
+#endif
+
 /* uintptr_t is the C9X name for an unsigned integral type such that a
  * legitimate void* can be cast to uintptr_t and then back to void* again
  * without loss of information. Similarly for intptr_t, wrt a signed
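
The selection logic spread across pyport.h and longintrepr.h can be summarised in a few lines. This Python sketch is a model of the preprocessor decisions only; `choose_bits_in_digit` and its parameters are hypothetical names, with `requested` standing in for a PYLONG_BITS_IN_DIGIT value supplied at configure time.

```python
def choose_bits_in_digit(sizeof_void_p, have_fixed_width_types=True,
                         requested=None):
    """Model of how the digit size is picked in this commit.

    sizeof_void_p          -- stands in for SIZEOF_VOID_P
    have_fixed_width_types -- True if the 32- and 64-bit integer types exist
    requested              -- explicit PYLONG_BITS_IN_DIGIT, or None for default
    """
    if requested is not None:
        if requested not in (15, 30):
            # Mirrors the '#error' in longintrepr.h
            raise ValueError("PYLONG_BITS_IN_DIGIT should be 15 or 30")
        if requested == 30 and not have_fixed_width_types:
            raise ValueError("30-bit long digits requested, but the "
                             "necessary types are not available")
        return requested
    # Default from pyport.h: 30 bits only on 64-bit platforms with the types
    if have_fixed_width_types and sizeof_void_p >= 8:
        return 30
    return 15
```

Note that an explicit request for 30-bit digits is honoured on a 32-bit platform as long as the fixed-width types are available; only the default depends on the pointer size.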

Lib/test/test_long.py

Lines changed: 30 additions & 1 deletion
@@ -15,7 +15,7 @@ def __str__(self):
         return self.format % self.args
 
 # SHIFT should match the value in longintrepr.h for best testing.
-SHIFT = 15
+SHIFT = sys.int_info.bits_per_digit
 BASE = 2 ** SHIFT
 MASK = BASE - 1
 KARATSUBA_CUTOFF = 70 # from longobject.c
@@ -120,6 +120,35 @@ def test_division(self):
                 y = self.getran(leny) or 1
                 self.check_division(x, y)
 
+        # specific numbers chosen to exercise corner cases of the
+        # current long division implementation
+
+        # 30-bit cases involving a quotient digit estimate of BASE+1
+        self.check_division(1231948412290879395966702881,
+                            1147341367131428698)
+        self.check_division(815427756481275430342312021515587883,
+                            707270836069027745)
+        self.check_division(627976073697012820849443363563599041,
+                            643588798496057020)
+        self.check_division(1115141373653752303710932756325578065,
+                            1038556335171453937726882627)
+        # 30-bit cases that require the post-subtraction correction step
+        self.check_division(922498905405436751940989320930368494,
+                            949985870686786135626943396)
+        self.check_division(768235853328091167204009652174031844,
+                            1091555541180371554426545266)
+
+        # 15-bit cases involving a quotient digit estimate of BASE+1
+        self.check_division(20172188947443, 615611397)
+        self.check_division(1020908530270155025, 950795710)
+        self.check_division(128589565723112408, 736393718)
+        self.check_division(609919780285761575, 18613274546784)
+        # 15-bit cases that require the post-subtraction correction step
+        self.check_division(710031681576388032, 26769404391308)
+        self.check_division(1933622614268221, 30212853348836)
+
+
+
     def test_karatsuba(self):
         digits = list(range(1, 5)) + list(range(KARATSUBA_CUTOFF,
                                                 KARATSUBA_CUTOFF + 10))
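
The body of `check_division` is not part of this diff. As a rough idea of what such a check involves, the standalone sketch below verifies the usual floor-division invariants for a pair of integers; it is an assumption about the kind of property being tested, not a copy of the method in test_long.py.

```python
def check_division(x, y):
    """Verify divmod invariants for integers x and y (y != 0).

    Standalone sketch; the real test method may check more than this.
    """
    q, r = divmod(x, y)
    # divmod must agree with the separate operators
    assert q == x // y and r == x % y
    # the defining identity of division with remainder
    assert x == q * y + r
    # the remainder takes the sign of the divisor and is smaller in magnitude
    if y > 0:
        assert 0 <= r < y
    else:
        assert y < r <= 0
    return q, r


# A few of the corner cases added in this commit
CASES = [
    (1231948412290879395966702881, 1147341367131428698),
    (922498905405436751940989320930368494, 949985870686786135626943396),
    (20172188947443, 615611397),
    (710031681576388032, 26769404391308),
]
for x, y in CASES:
    check_division(x, y)
```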

Lib/test/test_sys.py

Lines changed: 14 additions & 9 deletions
@@ -333,6 +333,9 @@ def test_attributes(self):
         self.assert_(isinstance(sys.executable, str))
         self.assertEqual(len(sys.float_info), 11)
         self.assertEqual(sys.float_info.radix, 2)
+        self.assertEqual(len(sys.int_info), 2)
+        self.assert_(sys.int_info.bits_per_digit % 5 == 0)
+        self.assert_(sys.int_info.sizeof_digit >= 1)
         self.assert_(isinstance(sys.hexversion, int))
         self.assert_(isinstance(sys.maxsize, int))
         self.assert_(isinstance(sys.maxunicode, int))
@@ -437,6 +440,7 @@ def setUp(self):
         if hasattr(sys, "gettotalrefcount"):
             self.header += '2P'
             self.vheader += '2P'
+        self.longdigit = sys.int_info.sizeof_digit
         import _testcapi
         self.gc_headsize = _testcapi.SIZEOF_PYGC_HEAD
         self.file = open(test.support.TESTFN, 'wb')
@@ -471,16 +475,16 @@ def test_gc_head_size(self):
         size = self.calcsize
         gc_header_size = self.gc_headsize
         # bool objects are not gc tracked
-        self.assertEqual(sys.getsizeof(True), size(vh) + self.H)
+        self.assertEqual(sys.getsizeof(True), size(vh) + self.longdigit)
         # but lists are
         self.assertEqual(sys.getsizeof([]), size(vh + 'PP') + gc_header_size)
 
     def test_default(self):
         h = self.header
         vh = self.vheader
         size = self.calcsize
-        self.assertEqual(sys.getsizeof(True), size(vh) + self.H)
-        self.assertEqual(sys.getsizeof(True, -1), size(vh) + self.H)
+        self.assertEqual(sys.getsizeof(True), size(vh) + self.longdigit)
+        self.assertEqual(sys.getsizeof(True, -1), size(vh) + self.longdigit)
 
     def test_objecttypes(self):
         # check all types defined in Objects/
@@ -489,7 +493,7 @@ def test_objecttypes(self):
         size = self.calcsize
         check = self.check_sizeof
         # bool
-        check(True, size(vh) + self.H)
+        check(True, size(vh) + self.longdigit)
         # buffer
         # XXX
         # builtin_function_or_method
@@ -607,11 +611,12 @@ def get_gen(): yield 1
         check(reversed([]), size(h + 'lP'))
         # long
         check(0, size(vh))
-        check(1, size(vh) + self.H)
-        check(-1, size(vh) + self.H)
-        check(32768, size(vh) + 2*self.H)
-        check(32768*32768-1, size(vh) + 2*self.H)
-        check(32768*32768, size(vh) + 3*self.H)
+        check(1, size(vh) + self.longdigit)
+        check(-1, size(vh) + self.longdigit)
+        PyLong_BASE = 2**sys.int_info.bits_per_digit
+        check(PyLong_BASE, size(vh) + 2*self.longdigit)
+        check(PyLong_BASE**2-1, size(vh) + 2*self.longdigit)
+        check(PyLong_BASE**2, size(vh) + 3*self.longdigit)
         # memory
         check(memoryview(b''), size(h + 'P PP2P2i7P'))
         # module
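
The size checks above rely on an integer growing by exactly one digit each time its value crosses a power of the base. A small sketch of the same idea, independent of the test harness, is shown below; `size_step` is a hypothetical helper, and the behaviour is specific to CPython, so other implementations may report different sizes.

```python
import sys

DIGIT_SIZE = sys.int_info.sizeof_digit
BASE = 2 ** sys.int_info.bits_per_digit


def size_step(n):
    """Bytes gained by the int object when going from n - 1 to n."""
    return sys.getsizeof(n) - sys.getsizeof(n - 1)


# Crossing a power of the base adds one digit; staying below it adds nothing.
grows_at_base = size_step(BASE) == DIGIT_SIZE
grows_at_base_squared = size_step(BASE ** 2) == DIGIT_SIZE
flat_below_base_squared = size_step(BASE ** 2 - 1) == 0
```

Comparing differences rather than absolute sizes keeps the sketch independent of the object header layout, which the test suite instead derives from `struct` format strings.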

Misc/NEWS

Lines changed: 7 additions & 0 deletions
@@ -12,6 +12,13 @@ What's New in Python 3.1 alpha 2?
 Core and Builtins
 -----------------
 
+- Issue #4258: Make it possible to use base 2**30 instead of base
+  2**15 for the internal representation of integers, for performance
+  reasons. Base 2**30 is enabled by default on 64-bit machines. Add
+  --enable-big-digits option to configure, which overrides the
+  default. Add sys.int_info structseq to provide information about
+  the internal format.
+
 - Issue #4474: PyUnicode_FromWideChar now converts characters outside
   the BMP to surrogate pairs, on systems with sizeof(wchar_t) == 4
   and sizeof(Py_UNICODE) == 2.
