
Speedup ChainMap #98766


Closed
zrothberg opened this issue Oct 27, 2022 · 9 comments
Assignees: rhettinger
Labels: performance (Performance or resource usage), stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)

Comments

@zrothberg

zrothberg commented Oct 27, 2022

Feature or enhancement

Make ChainMap's __iter__, copy, and parents methods lazier.

Pitch

Use itertools to reduce the number of intermediate objects created, clawing back the performance lost when __iter__ was switched to an order-preserving implementation.

While this method is roughly 3 times faster than the current behavior for iter, it is still ~5 times slower than the original set-based implementation, based on testing high-collision chainmaps. To get back to the original performance, there would need to be either an order-preserving set object or a new method added to dict that can copy hashes without copying or accessing the underlying items. The latter seems much easier to do, but the former has more general use cases. I am unsure whether any of the other custom dict objects also use suboptimal structures that an ordered set would resolve; I believe most of the time the suggestion is to just use a dict in place of an ordered set.
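Concretely, the core idea (a sketch consistent with the proposal described later in this thread, not the PR verbatim) is to feed one chained iterable of keys straight into dict.fromkeys instead of building a dict per map:

from itertools import chain

def __iter__(self):
    # One pass over all keys; dict.fromkeys dedupes while preserving
    # the insertion order that reversed(self.maps) establishes.
    return iter(dict.fromkeys(chain.from_iterable(reversed(self.maps))))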

Previous discussion

https://bugs.python.org/issue32792
The solution I went with was based on the rejected solution from the above discussion.

#86653

The above change inadvertently introduced a major performance regression: it creates multiple new dicts, which causes __hash__ to be called on every key. That removed any benefit of using update over a single iterable-based constructor.
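For reference, the __iter__ introduced by that change builds a throwaway dict.fromkeys(...) dict for every map before merging it in:

def __iter__(self):
    d = {}
    for mapping in reversed(self.maps):
        d.update(dict.fromkeys(mapping))    # reuses stored hash values if possible
    return iter(d)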

I cannot find any previous discussion of the list-slicing change; it was an inadvertent discovery while working on the __iter__ method.

@zrothberg zrothberg added the type-feature label Oct 27, 2022
@AlexWaygood AlexWaygood added the performance and stdlib labels Oct 27, 2022
@AlexWaygood AlexWaygood linked a pull request Oct 27, 2022 that will close this issue
@zrothberg
Author

Previous discussion on OrderedSet from the mailing list. Given that, I would guess there would be a heavy preference for adding a method to dict that allows you to reuse hashes without accessing the values directly or calling __getitem__. It seems like a fairly straightforward addition, but I know dict is a core class. Having that alone would make it trivial to add an OrderedSet to the collections module and remove the need for any other workarounds.
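For scale, here is a minimal dict-backed OrderedSet sketch (hypothetical code, not something proposed in this issue); the hash-copying dict method described above would mostly accelerate its bulk-construction path:

class OrderedSet:
    """Insertion-ordered set backed by a dict (sketch only)."""

    def __init__(self, iterable=()):
        # The bulk path a hash-copying dict method would accelerate.
        self._d = dict.fromkeys(iterable)

    def add(self, item):
        self._d[item] = None

    def discard(self, item):
        self._d.pop(item, None)

    def __contains__(self, item):
        return item in self._d

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)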

@rhettinger rhettinger self-assigned this Oct 27, 2022
@serhiy-storchaka
Member

Could you please provide microbenchmarks which show the speedup? I would be surprised if tuple(islice(maps, 1, None)) turns out to be faster than maps[1:].
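The kind of microbenchmark being requested might look like this (the 8-map shape is an assumption, not a measurement from the thread):

from itertools import islice
from timeit import timeit

maps = [dict.fromkeys(range(5)) for _ in range(8)]   # assumed shape

t_slice = timeit(lambda: maps[1:], number=1_000_000)
t_islice = timeit(lambda: tuple(islice(maps, 1, None)), number=1_000_000)
print(f"maps[1:]           {t_slice:.3f}s")
print(f"tuple(islice(...)) {t_islice:.3f}s")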

@zrothberg
Author

Whoops, I don't think I benchmarked that one properly when I was coding. It is faster when you have 2 items and slower in every other case, so I shall remove that from the PR. The speedup listed above was just for iter, not the islice. For iter, that was on a ChainMap roughly 8 maps deep, with each dict holding 5 keys.

@rhettinger
Contributor

rhettinger commented Oct 27, 2022

I'll look at this more later but my first impression is that the analysis and approach are fundamentally unsound.

The whole effort to be "lazy" seems misguided. The payoff for lazy evaluation comes from deferring work until a later time or from possibly not doing all of the work. Neither applies in this case. The __iter__ method builds a complete, flattened dictionary, and that involves fully consuming the input mappings. Adding "laziness" doesn't help at all and itself adds a cost.

ISTM that the benchmarks are only measuring the pure Python loop overhead. That really stands out with an 8-level chainmap with only 5 keys. Also, if you're using string keys, which already cache their hash values, the benchmark is not giving the current code credit for reusing hashes rather than recomputing them (a Decimal object would not be so fortunate).

Both dict.update() and dict.fromkeys() have highly optimized C fast paths that are hard to beat for high-volume calls. The overhead of calling those methods is what is really at issue. We could mitigate the calling overhead by switching to the new |= operator instead of a normal Python method call, and the small for-loop overhead could possibly be reduced by using reduce(). For larger dicts and shallower depths, neither would be noticeable.
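A sketch of that reduce() variant (not code from this comment): dict's |= is operator.ior, so the loop folds into a single call:

from functools import reduce
from operator import ior

def __iter__(self):
    # dict "|=" is operator.ior; reduce folds it across the per-map
    # key dicts, replacing the explicit Python-level for-loop.
    return iter(reduce(ior, map(dict.fromkeys, reversed(self.maps)), {}))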

One other thought: the ChainMap class was designed to be mostly simple, pure Python code. It wasn't intended to be a high-performance class. Users who care more about performance are almost always better off creating a single flat dictionary (a sketch follows), and that would still be true even if we rewrote all of ChainMap in C. It's fine to make optimization tweaks if they are readable and not disruptive, but we don't want to "go gonzo" and introduce new dict methods, an ordered set type, and whatnot. It isn't worth it.
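As a concrete illustration of that advice, a flatten-once sketch (the example data is assumed):

from collections import ChainMap

cm = ChainMap({'a': 1}, {'a': 2, 'b': 3})   # assumed example data

flat = {}
for mapping in reversed(cm.maps):
    flat.update(mapping)    # earlier maps overwrite later ones, matching ChainMap

assert flat == {'a': 1, 'b': 3}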

@rhettinger
Contributor

rhettinger commented Oct 27, 2022

Try this to see what effect it has on your benchmarks:

def __iter__(self):
    d = {}
    for mapping in map(dict.fromkeys, reversed(self.maps)):
        d |= mapping
    return iter(d)

Also, we should add tests for both dict.fromkeys() and ChainMap.__iter__() to verify that they reuse stored hash values. We really don't want to lose that — it helps a lot in cases where calling __hash__ is expensive.
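A sketch of what such a hash-reuse test could check (CountingKey is a hypothetical helper, not an existing fixture):

from collections import ChainMap

class CountingKey:
    """Key that counts how often __hash__ is invoked."""
    hash_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value

maps = [{CountingKey(i): None for i in range(5)} for _ in range(3)]
CountingKey.hash_calls = 0
list(ChainMap(*maps))
# With full hash reuse this stays at 0; any rehashing shows up here.
print("__hash__ calls during __iter__:", CountingKey.hash_calls)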

@zrothberg
Author

I am going to put together a fuller test suite to cover all the bases for iter. I have some code right now for checking hash counts and can write tests to ensure that hash reuse happens. Are there any key classes you would suggest testing with besides Decimal and str? I was also going to add test cases for different dict-like objects, and I can add cases for large and small dictionaries.

From what I was testing and have tested, the majority of the overhead seems to come from the creation of intermediate dicts that get consumed right away; that heavily outweighed the extra hash calls. The second cause of the performance regression from update to the current method is that fromkeys does not appear to reuse the hash in cases where update will. I am not sure why that is. The behavior is consistent from 3.9 to now.
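A quick self-contained demo of that fromkeys-vs-update difference (SubDict and Key are hypothetical helpers; exact counts may vary across CPython versions):

class SubDict(dict):
    pass

class Key:
    hash_calls = 0

    def __init__(self, v):
        self.v = v

    def __hash__(self):
        Key.hash_calls += 1
        return hash(self.v)

    def __eq__(self, other):
        return isinstance(other, Key) and self.v == other.v

src = SubDict({Key(i): None for i in range(100)})

Key.hash_calls = 0
d = {}
d.update(src)                 # dict-to-dict merge: stored hashes are copied
print("update:  ", Key.hash_calls)

Key.hash_calls = 0
dict.fromkeys(src)            # generic path for subclasses: rehashes every key
print("fromkeys:", Key.hash_calls)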

In your opinion, are larger dicts (in number of keys) more common, or smaller ones? I am not sure which gives the most representative performance.

@serhiy-storchaka
Member

Tuples are commonly used as keys, and they do not cache their hash. You can also just test with a custom Python object that defines __hash__.

Test also the alternatives proposed in #76973 (comment).

@zrothberg
Author

So this is the most efficient version I was able to construct. Whenever our list of maps contains only plain dict objects, it falls back to using update. Currently fromkeys does not reuse the hash for subclasses of dict. This produces exactly the same number of hash calls as the current solution. Whenever the maps are made up of types that cannot reuse the hash, it either bypasses the intermediate dicts by zipping the keys or just replaces the result dict.

This has my original proposal, dict.fromkeys((iterable of all keys)), as one of its paths, along with the original dict.update as the other, for the cases where maps is all non-dicts or all dicts respectively. Whenever it is made up of mixed groups, the runtime is always less than the current setup. It does have a worst case when the first map parsed is a dict and every map above it is not, because zip becomes more expensive than never having built the first dict; handling that correctly would make the code complex. In every other case it performs better than my original proposal.

The lambda is here just for clarity; it is normally a named function, because in my testing lambdas run slower than normal functions when used with itertools.

def __iter__(self):
    d = {}
    for k, g in _groupby(reversed(self.maps), lambda x: type(x) is dict):
        if k:
            # Without using update, only plain dict reuses the hash.
            # If dict subclasses can safely call update, change this to isinstance.
            for mapping in g:
                d.update(mapping)
        else:
            if d:
                # Faster than building an intermediate dict.
                d.update(zip(_chain(*g), _repeat(None)))
            else:
                # fromkeys is faster when replacing the dict outright.
                d = dict.fromkeys(_chain(*g))
    return iter(d)

Testing below was done on 3.10; I am having trouble with my local build of the current version but will add those numbers tomorrow. (A sketch of a comparable harness follows the numbers below.)

Testing using a dict of tuples of decimals for hash speed effects
For 4 dicts of 100k keys it is roughly twice as fast as the current method
For 4 dict subclasses of 100k keys it is ~50% faster
For 4 dicts of 20 keys it is ~66% faster
For 4 dict subclasses of 20 keys it is ~20% faster

Testing using a dict with string keys
For 4 dicts of 100k keys it is ~75% faster
For 4 dict subclasses of 100k keys it is ~72% faster
For 4 dicts of 20 keys it is ~40% faster
For 4 dict subclasses of 20 keys it is ~20% faster
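For reproducibility, a sketch of a comparable harness (the names, sizes, and structure are reconstructions of the setup described above, not the author's actual script):

from collections import ChainMap
from decimal import Decimal
from timeit import timeit

class SubDict(dict):
    pass

def make_maps(n_maps=4, n_keys=100_000, cls=dict):
    # Keys are tuples of Decimals, so hashing is genuinely expensive.
    return [cls({(Decimal(i), Decimal(m)): None for i in range(n_keys)})
            for m in range(n_maps)]

cm = ChainMap(*make_maps())
print("plain dicts:    ", timeit(lambda: list(cm), number=10))

cm_sub = ChainMap(*make_maps(cls=SubDict))
print("dict subclasses:", timeit(lambda: list(cm_sub), number=10))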

@rhettinger
Contributor

rhettinger commented Nov 1, 2022

I appreciate the heroic effort but just want to do the simpler edit mentioned in the issue tracker.

The code above looks like it is "killing a mosquito with a cannon" by breaking down the various subcases and using a different approach for each. It seems to try every trick in the book to squeeze out a few clock cycles. As a result, the code is much harder to understand and will be more difficult to maintain and test.

The timing results are dubious. Adding lambdas, groupby, argument unpacking, and multiple conditionals is likely to slow the common cases. Also, the interpreter performance changed quite a bit in 3.11, so the relative performance of the components will be different. The code above has the hallmarks of overfitting to a particular Python version, a particular build, and a particular set of benchmarks.

Thank you again, but I really don't want to go down this path. If the performance of ChainMap() ever becomes an issue, I will likely just make a C extension but leave the Python code in a relatively clean form.

@rhettinger rhettinger closed this as not planned Nov 1, 2022
rhettinger added a commit to rhettinger/cpython that referenced this issue Nov 1, 2022