Commit 2f228e7

Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask".
The comment following used to say:

    /* We use ~hash instead of hash, as degenerate hash functions, such
       as for ints <sigh>, can have lots of leading zeros. It's not
       really a performance risk, but better safe than sorry.
       12-Dec-00 tim: so ~hash produces lots of leading ones instead --
       what's the gain? */

That is, there was never a good reason for doing it. And to the contrary, as explained on Python-Dev last December, it tended to make the *sum* (i + incr) & mask (which is the first table index examined in case of collision) the same "too often" across distinct hashes.

Changing to the simpler "i = hash & mask" reduced the number of string-dict collisions (== number of times we go around the lookup for-loop) from about 6 million to 5 million during a full run of the test suite (these are approximate because the test suite does some random stuff from run to run). The number of collisions in non-string dicts also decreased, but not as dramatically.

Note that this may, for a given dict, change the order (wrt previous releases) of entries exposed by .keys(), .values() and .items(). A number of std tests suffered bogus failures as a result. For dicts keyed by small ints, or (less so) by characters, the order is much more likely to be in increasing order of key now; e.g.,

    >>> d = {}
    >>> for i in range(10):
    ...     d[i] = i
    ...
    >>> d
    {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
    >>>

Unfortunately, people may latch on to that in small examples and draw a bogus conclusion.

test_support.py
    Moved test_extcall's sortdict() into test_support, made it stronger, and imported sortdict into other std tests that needed it.

test_unicode.py
    Excluded cp875 from the "roundtrip over range(128)" test, because cp875 doesn't have a well-defined inverse for unicode("?", "cp875"). See Python-Dev for excruciating details.

Cookie.py
    Changed various output functions to sort dicts before building strings from them.

test_extcall
    Fiddled the expected-result file. This remains sensitive to native dict ordering, because, e.g., if there are multiple errors in a keyword-arg dict (and test_extcall sets up many cases like that), the specific error Python complains about first depends on native dict ordering.
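To make the small-int ordering effect concrete, here is a minimal Python 3 sketch (not the C code this commit touches) that compares only the first slot probed under the old and new formulas. The table size of 16 and the helper names `first_slot_old` and `first_slot_new` are assumptions chosen for illustration. It relies on `hash(i) == i` for small non-negative ints in CPython and ignores the collision-resolution increment entirely.

```python
# Illustrative only: compares the first probed slot, not the full lookup loop.
TABLE_SIZE = 16          # assumed power-of-two table size
MASK = TABLE_SIZE - 1


def first_slot_old(key):
    """First slot under the old formula, i = (~hash) & mask."""
    return (~hash(key)) & MASK


def first_slot_new(key):
    """First slot under the new formula, i = hash & mask."""
    return hash(key) & MASK


if __name__ == "__main__":
    for key in range(10):
        print(key, first_slot_old(key), first_slot_new(key))
```

Under these assumptions the new formula places keys 0 through 9 in slots 0 through 9, while the old one places them in slots 15 down to 6. A table walked from slot 0 upward would therefore tend to yield small-int keys in increasing order, which matches the behaviour the message warns people not to rely on.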
1 parent 0194ad5 commit 2f228e7

11 files changed

Lines changed: 64 additions & 46 deletions

Lib/Cookie.py

Lines changed: 14 additions & 6 deletions
@@ -70,8 +70,8 @@
    >>> C["fig"] = "newton"
    >>> C["sugar"] = "wafer"
    >>> print C
-   Set-Cookie: sugar=wafer;
    Set-Cookie: fig=newton;
+   Set-Cookie: sugar=wafer;

 Notice that the printable representation of a Cookie is the
 appropriate format for a Set-Cookie: header. This is the
@@ -93,8 +93,8 @@
    >>> C = Cookie.SmartCookie()
    >>> C.load("chips=ahoy; vienna=finger")
    >>> print C
-   Set-Cookie: vienna=finger;
    Set-Cookie: chips=ahoy;
+   Set-Cookie: vienna=finger;

 The load() method is darn-tootin smart about identifying cookies
 within a string. Escaped quotation marks, nested semicolons, and other
@@ -493,7 +493,9 @@ def OutputString(self, attrs=None):
         # Now add any defined attributes
         if attrs is None:
             attrs = self._reserved_keys
-        for K,V in self.items():
+        items = self.items()
+        items.sort()
+        for K,V in items:
             if V == "": continue
             if K not in attrs: continue
             if K == "expires" and type(V) == type(1):
@@ -586,7 +588,9 @@ def __setitem__(self, key, value):
     def output(self, attrs=None, header="Set-Cookie:", sep="\n"):
         """Return a string suitable for HTTP."""
         result = []
-        for K,V in self.items():
+        items = self.items()
+        items.sort()
+        for K,V in items:
             result.append( V.output(attrs, header) )
         return string.join(result, sep)
     # end output
@@ -595,14 +599,18 @@ def output(self, attrs=None, header="Set-Cookie:", sep="\n"):

     def __repr__(self):
         L = []
-        for K,V in self.items():
+        items = self.items()
+        items.sort()
+        for K,V in items:
             L.append( '%s=%s' % (K,repr(V.value) ) )
         return '<%s: %s>' % (self.__class__.__name__, string.join(L))

     def js_output(self, attrs=None):
         """Return a string suitable for JavaScript."""
         result = []
-        for K,V in self.items():
+        items = self.items()
+        items.sort()
+        for K,V in items:
             result.append( V.js_output(attrs) )
         return string.join(result, "")
     # end js_output
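The Cookie.py hunks all apply one pattern: take the dict's items, sort them, and only then build the output string. A minimal Python 3 sketch of that pattern follows. The function `render_cookies` and its plain `name=value;` formatting are hypothetical stand-ins, not the module's actual API.

```python
def render_cookies(cookies, header="Set-Cookie:", sep="\n"):
    """Join one header line per cookie, ordered by cookie name.

    `cookies` is assumed to be a plain dict mapping names to string values.
    """
    lines = []
    # sorted() fixes the order regardless of how the dict stores its entries.
    for name, value in sorted(cookies.items()):
        lines.append("%s %s=%s;" % (header, name, value))
    return sep.join(lines)


if __name__ == "__main__":
    print(render_cookies({"vienna": "finger", "chips": "ahoy"}))
```

Sorting at output time keeps the printed form stable even when the dict's internal ordering changes, which is what lets the doctest-style examples in the docstring stay valid.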

Lib/test/output/test_cookie

Lines changed: 3 additions & 3 deletions
@@ -1,11 +1,11 @@
 test_cookie
-<SimpleCookie: vienna='finger' chips='ahoy'>
-Set-Cookie: vienna=finger;
+<SimpleCookie: chips='ahoy' vienna='finger'>
 Set-Cookie: chips=ahoy;
-  vienna 'finger' 'finger'
 Set-Cookie: vienna=finger;
   chips 'ahoy' 'ahoy'
 Set-Cookie: chips=ahoy;
+  vienna 'finger' 'finger'
+Set-Cookie: vienna=finger;
 <SimpleCookie: keebler='E=mc2; L="Loves"; fudge=\n;'>
 Set-Cookie: keebler="E=mc2; L=\"Loves\"; fudge=\012;";
   keebler 'E=mc2; L="Loves"; fudge=\n;' 'E=mc2; L="Loves"; fudge=\n;'

Lib/test/output/test_extcall

Lines changed: 10 additions & 10 deletions
@@ -40,7 +40,7 @@ za () {} -> za() takes exactly 1 argument (0 given)
 za () {'a': 'aa'} -> ok za aa B D E V a
 za () {'d': 'dd'} -> za() got an unexpected keyword argument 'd'
 za () {'a': 'aa', 'd': 'dd'} -> za() got an unexpected keyword argument 'd'
-za () {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> za() got an unexpected keyword argument 'd'
+za () {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> za() got an unexpected keyword argument 'b'
 za (1, 2) {} -> za() takes exactly 1 argument (2 given)
 za (1, 2) {'a': 'aa'} -> za() takes exactly 1 non-keyword argument (2 given)
 za (1, 2) {'d': 'dd'} -> za() takes exactly 1 non-keyword argument (2 given)
@@ -59,8 +59,8 @@ zade () {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> zade() got an unexpected
 zade (1, 2) {} -> ok zade 1 B 2 e V e
 zade (1, 2) {'a': 'aa'} -> zade() got multiple values for keyword argument 'a'
 zade (1, 2) {'d': 'dd'} -> zade() got multiple values for keyword argument 'd'
-zade (1, 2) {'a': 'aa', 'd': 'dd'} -> zade() got multiple values for keyword argument 'd'
-zade (1, 2) {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> zade() got multiple values for keyword argument 'd'
+zade (1, 2) {'a': 'aa', 'd': 'dd'} -> zade() got multiple values for keyword argument 'a'
+zade (1, 2) {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> zade() got multiple values for keyword argument 'a'
 zade (1, 2, 3, 4, 5) {} -> zade() takes at most 3 arguments (5 given)
 zade (1, 2, 3, 4, 5) {'a': 'aa'} -> zade() takes at most 3 non-keyword arguments (5 given)
 zade (1, 2, 3, 4, 5) {'d': 'dd'} -> zade() takes at most 3 non-keyword arguments (5 given)
@@ -75,7 +75,7 @@ zabk (1, 2) {} -> ok zabk 1 2 D E V {}
 zabk (1, 2) {'a': 'aa'} -> zabk() got multiple values for keyword argument 'a'
 zabk (1, 2) {'d': 'dd'} -> ok zabk 1 2 D E V {'d': 'dd'}
 zabk (1, 2) {'a': 'aa', 'd': 'dd'} -> zabk() got multiple values for keyword argument 'a'
-zabk (1, 2) {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> zabk() got multiple values for keyword argument 'b'
+zabk (1, 2) {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> zabk() got multiple values for keyword argument 'a'
 zabk (1, 2, 3, 4, 5) {} -> zabk() takes exactly 2 arguments (5 given)
 zabk (1, 2, 3, 4, 5) {'a': 'aa'} -> zabk() takes exactly 2 non-keyword arguments (5 given)
 zabk (1, 2, 3, 4, 5) {'d': 'dd'} -> zabk() takes exactly 2 non-keyword arguments (5 given)
@@ -90,12 +90,12 @@ zabdv (1, 2) {} -> ok zabdv 1 2 d E () e
 zabdv (1, 2) {'a': 'aa'} -> zabdv() got multiple values for keyword argument 'a'
 zabdv (1, 2) {'d': 'dd'} -> ok zabdv 1 2 dd E () d
 zabdv (1, 2) {'a': 'aa', 'd': 'dd'} -> zabdv() got multiple values for keyword argument 'a'
-zabdv (1, 2) {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> zabdv() got an unexpected keyword argument 'e'
+zabdv (1, 2) {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> zabdv() got multiple values for keyword argument 'a'
 zabdv (1, 2, 3, 4, 5) {} -> ok zabdv 1 2 3 E (4, 5) e
 zabdv (1, 2, 3, 4, 5) {'a': 'aa'} -> zabdv() got multiple values for keyword argument 'a'
 zabdv (1, 2, 3, 4, 5) {'d': 'dd'} -> zabdv() got multiple values for keyword argument 'd'
-zabdv (1, 2, 3, 4, 5) {'a': 'aa', 'd': 'dd'} -> zabdv() got multiple values for keyword argument 'd'
-zabdv (1, 2, 3, 4, 5) {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> zabdv() got multiple values for keyword argument 'd'
+zabdv (1, 2, 3, 4, 5) {'a': 'aa', 'd': 'dd'} -> zabdv() got multiple values for keyword argument 'a'
+zabdv (1, 2, 3, 4, 5) {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> zabdv() got multiple values for keyword argument 'a'
 zabdevk () {} -> zabdevk() takes at least 2 arguments (0 given)
 zabdevk () {'a': 'aa'} -> zabdevk() takes at least 2 non-keyword arguments (1 given)
 zabdevk () {'d': 'dd'} -> zabdevk() takes at least 2 non-keyword arguments (0 given)
@@ -105,9 +105,9 @@ zabdevk (1, 2) {} -> ok zabdevk 1 2 d e () {}
 zabdevk (1, 2) {'a': 'aa'} -> zabdevk() got multiple values for keyword argument 'a'
 zabdevk (1, 2) {'d': 'dd'} -> ok zabdevk 1 2 dd e () {}
 zabdevk (1, 2) {'a': 'aa', 'd': 'dd'} -> zabdevk() got multiple values for keyword argument 'a'
-zabdevk (1, 2) {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> zabdevk() got multiple values for keyword argument 'b'
+zabdevk (1, 2) {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> zabdevk() got multiple values for keyword argument 'a'
 zabdevk (1, 2, 3, 4, 5) {} -> ok zabdevk 1 2 3 4 (5,) {}
 zabdevk (1, 2, 3, 4, 5) {'a': 'aa'} -> zabdevk() got multiple values for keyword argument 'a'
 zabdevk (1, 2, 3, 4, 5) {'d': 'dd'} -> zabdevk() got multiple values for keyword argument 'd'
-zabdevk (1, 2, 3, 4, 5) {'a': 'aa', 'd': 'dd'} -> zabdevk() got multiple values for keyword argument 'd'
-zabdevk (1, 2, 3, 4, 5) {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> zabdevk() got multiple values for keyword argument 'd'
+zabdevk (1, 2, 3, 4, 5) {'a': 'aa', 'd': 'dd'} -> zabdevk() got multiple values for keyword argument 'a'
+zabdevk (1, 2, 3, 4, 5) {'a': 'aa', 'b': 'bb', 'd': 'dd', 'e': 'ee'} -> zabdevk() got multiple values for keyword argument 'a'

Lib/test/test_cookie.py

Lines changed: 3 additions & 1 deletion
@@ -20,7 +20,9 @@
     C = Cookie.SimpleCookie() ; C.load(data)
     print repr(C)
     print str(C)
-    for k, v in dict.items():
+    items = dict.items()
+    items.sort()
+    for k, v in items:
         print ' ', k, repr( C[k].value ), repr(v)
         verify(C[k].value == v)
         print C[k]

Lib/test/test_extcall.py

Lines changed: 4 additions & 11 deletions
@@ -1,14 +1,6 @@
-from test_support import verify, verbose, TestFailed
+from test_support import verify, verbose, TestFailed, sortdict
 from UserList import UserList

-def sortdict(d):
-    keys = d.keys()
-    keys.sort()
-    lst = []
-    for k in keys:
-        lst.append("%r: %r" % (k, d[k]))
-    return "{%s}" % ", ".join(lst)
-
 def f(*a, **k):
     print a, sortdict(k)

@@ -228,8 +220,9 @@ def method(self, arg1, arg2):
         lambda x: '%s="%s"' % (x, x), defargs)
     if vararg: arglist.append('*' + vararg)
     if kwarg: arglist.append('**' + kwarg)
-    decl = 'def %s(%s): print "ok %s", a, b, d, e, v, k' % (
-        name, ', '.join(arglist), name)
+    decl = (('def %s(%s): print "ok %s", a, b, d, e, v, ' +
+             'type(k) is type ("") and k or sortdict(k)')
+             % (name, ', '.join(arglist), name))
     exec(decl)
     func = eval(name)
     funcs.append(func)
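The new `decl` string uses the pre-conditional-expression idiom `cond and a or b` to print `k` unchanged when it is a string and to pass it through `sortdict` otherwise. The Python 3 sketch below contrasts that idiom with a conditional expression. The names `pick_and_or` and `pick_conditional` and the generic `fallback` parameter are hypothetical, introduced only to show where the two forms differ.

```python
def pick_and_or(k, fallback):
    """The older idiom: `cond and a or b`."""
    return type(k) is type("") and k or fallback(k)


def pick_conditional(k, fallback):
    """The same choice written as a conditional expression."""
    return k if isinstance(k, str) else fallback(k)


if __name__ == "__main__":
    as_sorted_items = lambda d: sorted(d.items())
    print(pick_and_or("k", as_sorted_items))
    print(pick_and_or({"b": 1, "a": 2}, as_sorted_items))
    # The idiom falls through to the fallback when k is a falsy string.
    print(repr(pick_and_or("", len)), repr(pick_conditional("", len)))
```

The two forms agree for any non-empty string and for any dict. They differ only for the empty string, where the and/or idiom evaluates the fallback. Judging from the expected-output file, the values printed in that position in test_extcall appear to be non-empty, so the idiom seems safe there.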

Lib/test/test_pyexpat.py

Lines changed: 3 additions & 1 deletion
@@ -5,9 +5,11 @@

 from xml.parsers import expat

+from test_support import sortdict
+
 class Outputter:
     def StartElementHandler(self, name, attrs):
-        print 'Start element:\n\t', repr(name), attrs
+        print 'Start element:\n\t', repr(name), sortdict(attrs)

     def EndElementHandler(self, name):
         print 'End element:\n\t', repr(name)

Lib/test/test_regex.py

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-from test_support import verbose
+from test_support import verbose, sortdict
 import warnings
 warnings.filterwarnings("ignore", "the regex module is deprecated",
                         DeprecationWarning, __name__)
@@ -40,7 +40,7 @@
 print cre.group(1, 2)
 print cre.group('one', 'two')
 print 'realpat:', cre.realpat
-print 'groupindex:', cre.groupindex
+print 'groupindex:', sortdict(cre.groupindex)

 re = 'world'
 cre = regex.compile(re)

Lib/test/test_support.py

Lines changed: 8 additions & 0 deletions
@@ -90,6 +90,14 @@ def verify(condition, reason='test failed'):
     if not condition:
         raise TestFailed(reason)

+def sortdict(dict):
+    "Like repr(dict), but in sorted order."
+    items = dict.items()
+    items.sort()
+    reprpairs = ["%r: %r" % pair for pair in items]
+    withcommas = ", ".join(reprpairs)
+    return "{%s}" % withcommas
+
 def check_syntax(statement):
     try:
         compile(statement, '<string>', 'exec')
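For readers on a current interpreter, here is a rough Python 3 equivalent of the new `sortdict` helper. It is a sketch, not the code in test_support.py: the parameter is renamed to `d` to avoid shadowing the built-in, `sorted()` replaces the in-place `items.sort()` (which no longer exists on the view returned by `items()`), and the keys are assumed to be mutually comparable.

```python
def sortdict(d):
    """Like repr(d), but with the items in sorted order."""
    # Keys are unique, so sorting the (key, value) pairs orders by key.
    items = sorted(d.items())
    reprpairs = ["%r: %r" % pair for pair in items]
    return "{%s}" % ", ".join(reprpairs)


if __name__ == "__main__":
    print(sortdict({"b": 1, "a": 2}))
```

A test that writes `sortdict(d)` to an expected-results file, rather than `repr(d)`, no longer depends on the order in which the dict happens to store its entries.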

Lib/test/test_unicode.py

Lines changed: 5 additions & 2 deletions
@@ -5,7 +5,7 @@
 (c) Copyright CNRI, All Rights Reserved. NO WARRANTY.

 """#"
-from test_support import verify, verbose
+from test_support import verify, verbose, TestFailed
 import sys

 def test(method, input, output, *args):
@@ -493,11 +493,14 @@ def __str__(self):
     'cp856', 'cp857', 'cp864', 'cp869', 'cp874',

     'mac_greek', 'mac_iceland','mac_roman', 'mac_turkish',
-    'cp1006', 'cp875', 'iso8859_8',
+    'cp1006', 'iso8859_8',

     ### These have undefined mappings:
     #'cp424',

+    ### These fail the round-trip:
+    #'cp875'
+
     ):
     try:
         verify(unicode(s,encoding).encode(encoding) == s)
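The round-trip property that this test checks can be sketched in Python 3 as follows. The helper name `roundtrips` is hypothetical, and the sketch makes no claim about how cp875 behaves on a current interpreter. It only shows the shape of the check: decode the bytes with a codec, encode the result with the same codec, and compare with the input.

```python
def roundtrips(data, encoding):
    """Return True if decoding then re-encoding reproduces `data` exactly."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        # Bytes the codec cannot decode (or characters it cannot encode)
        # count as a failed round trip.
        return False


if __name__ == "__main__":
    sample = bytes(range(128))
    for name in ("ascii", "latin_1"):
        print(name, roundtrips(sample, name))
```

A codec fails this check when two different bytes decode to the same character, or when a decoded character has no single well-defined encoding, which is the kind of problem the commit message describes for cp875.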

Misc/NEWS

Lines changed: 10 additions & 0 deletions
@@ -23,6 +23,16 @@ Core
   usually for the better, but may also cause numerically unstable
   algorithms to break.

+- The implementation of dicts suffers fewer collisions, which has speed
+  benefits. However, the order in which dict entries appear in dict.keys(),
+  dict.values() and dict.items() may differ from previous releases for a
+  given dict. Nothing is defined about this order, so no program should
+  rely on it. Nevertheless, it's easy to write test cases that rely on the
+  order by accident, typically because of printing the str() or repr() of a
+  dict to an "expected results" file. See Lib/test/test_support.py's new
+  sortdict(dict) function for a simple way to display a dict in sorted
+  order.
+
 - Dictionary objects now support the "in" operator: "x in dict" means
   the same as dict.has_key(x).
