@@ -28,13 +28,25 @@ Uniquification
2828 Dictionaries of any size. Bulk of work is in creation.
2929 Repeated writes to a smaller set of keys.
3030 Single read of each key.
31+ Some use cases have two consecutive accesses to the same key.
3132
3233 * Removing duplicates from a sequence.
3334 dict.fromkeys(seqn).keys()
35+
3436 * Counting elements in a sequence.
35- for e in seqn: d[e]=d.get(e,0) + 1
36- * Accumulating items in a dictionary of lists.
37- for k, v in itemseqn: d.setdefault(k, []).append(v)
37+ for e in seqn:
38+ d[e] = d.get(e,0) + 1
39+
40+ * Accumulating references in a dictionary of lists:
41+
42+ for pagenumber, page in enumerate(pages):
43+ for word in page:
44+ d.setdefault(word, []).append(pagenumber)
45+
46+ Note, the second example is a use case characterized by a get and set
47+ to the same key. There are similar use cases with a __contains__
48+ followed by a get, set, or del to the same key. Part of the
49+ justification for d.setdefault is combining the two lookups into one.
3850
3951Membership Testing
4052 Dictionaries of any size. Created once and then rarely changes.
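For reference, the idioms in the hunk above can be exercised as a small
self-contained script. The `pages` data here is a made-up example; each
page is just a list of words:

```python
# Counting elements in a sequence: one get plus one set per element.
seqn = ["a", "b", "a", "c", "a"]
d = {}
for e in seqn:
    d[e] = d.get(e, 0) + 1
assert d == {"a": 3, "b": 1, "c": 1}

# Accumulating references in a dictionary of lists with d.setdefault,
# which combines the get and the set into a single lookup.
pages = [["the", "cat"], ["the", "dog"]]  # hypothetical page data
index = {}
for pagenumber, page in enumerate(pages):
    for word in page:
        index.setdefault(word, []).append(pagenumber)
assert index == {"the": [0, 1], "cat": [0], "dog": [1]}

# Removing duplicates from a sequence (insertion order is preserved).
unique = list(dict.fromkeys(seqn).keys())
assert unique == ["a", "b", "c"]
```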
@@ -44,7 +56,7 @@ Membership Testing
4456 such as with the % formatting operator.
4557
4658Dynamic Mappings
47- Characterized by deletions interspersed with adds and replacments .
59+ Characterized by deletions interspersed with adds and replacements.
4860 Performance benefits greatly from the re-use of dummy entries.
4961
5062
@@ -141,6 +153,9 @@ distribution), then there will be more benefit for large dictionaries
141153because any given key is no more likely than another to already be
142154in cache.
143155
156+ * In use cases with paired accesses to the same key, the second access
157+ is always in cache and gets no benefit from efforts to further improve
158+ cache locality.
144159
145160Optimizing the Search of Small Dictionaries
146161-------------------------------------------
@@ -184,7 +199,7 @@ sizes and access patterns, the user may be able to provide useful hints.
184199 more quickly because the first half of the keys will be inserted into
185200 a more sparse environment than before. The preconditions for this
186201 strategy arise whenever a dictionary is created from a key or item
187- sequence of known length .
202+ sequence and the number of unique keys is known.
188203
1892043) If the key space is large and the access pattern is known to be random,
190205 then search strategies exploiting cache locality can be fruitful.
@@ -218,3 +233,13 @@ spend in the collision resolution loop).
218233An additional possibility is to insert links into the empty spaces
219234so that dictionary iteration can proceed in len(d) steps instead of
220235(mp->mask + 1) steps.
236+
237+
238+ Caching Lookups
239+ ---------------
240+ The idea is to exploit key access patterns by anticipating future lookups
241+ based on previous lookups.
242+
243+ The simplest incarnation is to save the most recently accessed entry.
244+ This gives optimal performance for use cases where every get is followed
245+ by a set or del to the same key.
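The simplest incarnation described above can be sketched in pure Python.
`LastLookupDict` is a hypothetical illustration only; an actual
implementation would live at the C level inside the lookup routine:

```python
class LastLookupDict(dict):
    """Sketch: cache the most recently accessed (key, value) pair."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._last = None  # (key, value) of the most recent access

    def __getitem__(self, key):
        # A repeated access to the same key skips the hash-table probe.
        if self._last is not None and self._last[0] == key:
            return self._last[1]
        value = super().__getitem__(key)
        self._last = (key, value)
        return value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._last = (key, value)  # a set refreshes the cache too

    def __delitem__(self, key):
        super().__delitem__(key)
        if self._last is not None and self._last[0] == key:
            self._last = None  # invalidate the stale cached pair
```

With this, a get followed by a get, set, or del of the same key finds the
cached entry and avoids a second full lookup.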