
Speedup ChainMap #98766


Closed
zrothberg opened this issue Oct 27, 2022 · 9 comments
Assignees: rhettinger
Labels: performance (Performance or resource usage), stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)

Comments

@zrothberg

zrothberg commented Oct 27, 2022

Feature or enhancement

Make ChainMap's __iter__, copy, and parents methods lazier.

Pitch

Use itertools to reduce the number of intermediate objects created, clawing back the performance lost when __iter__ was switched to an order-preserving implementation.

While this method is roughly 3 times faster than the current behavior for iter, it is still ~5 times slower than the original set-based implementation, based on testing high-collision chainmaps. To get back to the original performance, there would need to be either an order-preserving set object or a new method added to dict that can copy hashes without copying or accessing the underlying items. The latter seems much easier to do, but the former has more general use cases. I am unsure whether any of the other custom dict objects also use suboptimal structures that an ordered set would resolve; I believe most of the time the suggestion is to just use a dict in place of an ordered set.
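Concretely, the core idea (a sketch consistent with the proposal described later in this thread, not the PR verbatim) is to feed one chained iterable of keys straight into dict.fromkeys instead of building a dict per map:

from itertools import chain

def __iter__(self):
    # One pass over all keys; dict.fromkeys dedupes while preserving
    # the insertion order that reversed(self.maps) establishes.
    return iter(dict.fromkeys(chain.from_iterable(reversed(self.maps))))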

Previous discussion

https://bugs.python.org/issue32792
The solution I went with was based on the rejected solution from the above discussion.

#86653

The above change inadvertently introduced a major performance regression: it creates multiple new dicts, which causes __hash__ to be called on every key. That removed any benefit of using update over a single iterable-based constructor.
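For reference, the __iter__ introduced by that change builds a throwaway dict.fromkeys(...) dict for every map before merging it in:

def __iter__(self):
    d = {}
    for mapping in reversed(self.maps):
        d.update(dict.fromkeys(mapping))    # reuses stored hash values if possible
    return iter(d)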

I cannot find any previous discussion of the list-slicing change; it was an inadvertent discovery while working on the __iter__ method.

@zrothberg zrothberg added the type-feature label Oct 27, 2022
@AlexWaygood AlexWaygood added the performance and stdlib labels Oct 27, 2022
@AlexWaygood AlexWaygood linked a pull request Oct 27, 2022 that will close this issue
@zrothberg
Author

Previous discussion on OrderedSet from the mailing list. Given that, I would guess there would be a heavy preference for adding a method to dict that allows you to reuse hashes without accessing the values directly or calling __getitem__. It seems like a fairly straightforward addition, but I know dict is a core class. Having that alone would make it trivial to add an OrderedSet to the collections module and remove the need for any other workarounds.
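For scale, here is a minimal dict-backed OrderedSet sketch (hypothetical code, not something proposed in this issue); the hash-copying dict method described above would mostly accelerate its bulk-construction path:

class OrderedSet:
    """Insertion-ordered set backed by a dict (sketch only)."""

    def __init__(self, iterable=()):
        # The bulk path a hash-copying dict method would accelerate.
        self._d = dict.fromkeys(iterable)

    def add(self, item):
        self._d[item] = None

    def discard(self, item):
        self._d.pop(item, None)

    def __contains__(self, item):
        return item in self._d

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)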

@rhettinger rhettinger self-assigned this Oct 27, 2022
@serhiy-storchaka
Member

Could you please provide microbenchmarks which show the speedup? I would be surprised if tuple(islice(maps, 1, None)) turns out to be faster than maps[1:].
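The kind of microbenchmark being requested might look like this (the 8-map shape is an assumption, not a measurement from the thread):

from itertools import islice
from timeit import timeit

maps = [dict.fromkeys(range(5)) for _ in range(8)]   # assumed shape

t_slice = timeit(lambda: maps[1:], number=1_000_000)
t_islice = timeit(lambda: tuple(islice(maps, 1, None)), number=1_000_000)
print(f"maps[1:]           {t_slice:.3f}s")
print(f"tuple(islice(...)) {t_islice:.3f}s")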

@zrothberg
Author

Whoops, I don't think I benchmarked that one properly when I was coding. It is faster when you have 2 items and slower in every other case, so I shall remove that from the PR. The speedup listed above was just for iter, not the islice. For iter, that was on a ChainMap roughly 8 maps deep, with each dict holding 5 keys.

@rhettinger
Contributor

rhettinger commented Oct 27, 2022

I'll look at this more later but my first impression is that the analysis and approach are fundamentally unsound.

The whole effort to be "lazy" seems misguided. The payoff for lazy evaluation comes from deferring work until a later time or from possibly not doing all of the work. Neither applies in this case. The __iter__ method builds a complete, flattened dictionary, and that involves fully consuming the input mappings. Adding "laziness" doesn't help at all and itself adds a cost.

ISTM that the benchmarks are only measuring the pure Python loop overhead. That really stands out with an 8-level chainmap with only 5 keys. Also, if you're using string keys, which already cache their hash values, the benchmark is not giving the current code credit for reusing hashes rather than recomputing them (a Decimal object would not be so fortunate).

Both dict.update() and dict.fromkeys() have highly optimized C fast paths that are hard to beat for high-volume calls. The overhead of calling those methods is what is really at issue. We could mitigate the calling overhead by switching to the new |= operator instead of a normal Python method call, and the small for-loop overhead could possibly be reduced by using reduce(). For larger dicts and shallower depths, neither would be noticeable.
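A sketch of that reduce() variant (not code from this comment): dict's |= is operator.ior, so the loop folds into a single call:

from functools import reduce
from operator import ior

def __iter__(self):
    # dict "|=" is operator.ior; reduce folds it across the per-map
    # key dicts, replacing the explicit Python-level for-loop.
    return iter(reduce(ior, map(dict.fromkeys, reversed(self.maps)), {}))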

One other thought: the ChainMap class was designed to be mostly simple, pure Python code. It wasn't intended to be a high-performance class. Users who care more about performance are almost always better off creating a single flat dictionary (a sketch follows), and that would still be true even if we rewrote all of ChainMap in C. It's fine to make optimization tweaks if they are readable and not disruptive, but we don't want to "go gonzo" and introduce new dict methods, an ordered set type, and whatnot. It isn't worth it.
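As a concrete illustration of that advice, a flatten-once sketch (the example data is assumed):

from collections import ChainMap

cm = ChainMap({'a': 1}, {'a': 2, 'b': 3})   # assumed example data

flat = {}
for mapping in reversed(cm.maps):
    flat.update(mapping)    # earlier maps overwrite later ones, matching ChainMap

assert flat == {'a': 1, 'b': 3}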

@rhettinger
Contributor

rhettinger commented Oct 27, 2022

Try this to see what effect it has on your benchmarks:

def __iter__(self):
    d = {}
    for mapping in map(dict.fromkeys, reversed(self.maps)):
        d |= mapping
    return iter(d)

Also, we should add tests for both dict.fromkeys() and ChainMap.__iter__() to verify that they reuse stored hash values. We really don't want to lose that — it helps a lot in cases where calling __hash__ is expensive.
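A sketch of what such a hash-reuse test could check (CountingKey is a hypothetical helper, not an existing fixture):

from collections import ChainMap

class CountingKey:
    """Key that counts how often __hash__ is invoked."""
    hash_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value

maps = [{CountingKey(i): None for i in range(5)} for _ in range(3)]
CountingKey.hash_calls = 0
list(ChainMap(*maps))
# With full hash reuse this stays at 0; any rehashing shows up here.
print("__hash__ calls during __iter__:", CountingKey.hash_calls)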

@zrothberg
Author

I am going to put together a fuller test suite to cover all the bases for iter. I have some code right now for checking hash counts and can write tests to ensure that hash reuse happens. Are there any key classes you would suggest testing with besides Decimal and str? I was also going to add test cases for different dict-like objects, and I can add cases for large and small dictionaries.

From what I was testing and have tested, the majority of the overhead seems to come from the creation of intermediate dicts that get consumed right away; that heavily outweighed the extra hash calls. The second cause of the performance regression from update to the current method is that fromkeys does not appear to reuse the hash in cases where update will. I am not sure why that is. The behavior is consistent from 3.9 to now.
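A quick self-contained demo of that fromkeys-vs-update difference (SubDict and Key are hypothetical helpers; exact counts may vary across CPython versions):

class SubDict(dict):
    pass

class Key:
    hash_calls = 0

    def __init__(self, v):
        self.v = v

    def __hash__(self):
        Key.hash_calls += 1
        return hash(self.v)

    def __eq__(self, other):
        return isinstance(other, Key) and self.v == other.v

src = SubDict({Key(i): None for i in range(100)})

Key.hash_calls = 0
d = {}
d.update(src)                 # dict-to-dict merge: stored hashes are copied
print("update:  ", Key.hash_calls)

Key.hash_calls = 0
dict.fromkeys(src)            # generic path for subclasses: rehashes every key
print("fromkeys:", Key.hash_calls)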

In your opinion, are larger dicts (in number of keys) more common, or smaller ones? I am not sure which gives the most representative performance.

@serhiy-storchaka
Member

Tuples are commonly used as keys, and they do not cache their hash. You can also just test with a custom Python object that defines __hash__.

Test also the alternatives proposed in #76973 (comment).

@zrothberg
Author

So this is the most efficient version I was able to construct. Whenever our list of maps contains only plain dict objects, it falls back to using update. Currently fromkeys does not reuse the hash for subclasses of dict. This produces exactly the same number of hash calls as the current solution. Whenever the maps are made up of types that cannot reuse the hash, it either bypasses the intermediate dicts by zipping the keys or just replaces the result dict.

This has my original proposal, dict.fromkeys((iterable of all keys)), as one of its paths, along with the original dict.update as the other, for the cases where maps is all non-dicts or all dicts respectively. Whenever it is made up of mixed groups, the runtime is always less than the current setup. It does have a worst case when the first map parsed is a dict and every map above it is not, because zip becomes more expensive than never having built the first dict; handling that correctly would make the code complex. In every other case it performs better than my original proposal.

The lambda is here just for clarity; it is normally a named function, because in my testing lambdas run slower than normal functions when used with itertools.

def __iter__(self):
    d = {}
    for k, g in _groupby(reversed(self.maps), lambda x: type(x) is dict):
        if k:
            # Without using update, only plain dict reuses the hash.
            # If dict subclasses can safely call update, change this to isinstance.
            for mapping in g:
                d.update(mapping)
        else:
            if d:
                # Faster than building an intermediate dict.
                d.update(zip(_chain(*g), _repeat(None)))
            else:
                # fromkeys is faster when replacing the dict outright.
                d = dict.fromkeys(_chain(*g))
    return iter(d)

Testing below was done on 3.10; I am having trouble with my local build of the current version but will add those numbers tomorrow. (A sketch of a comparable harness follows the numbers below.)

Testing using a dict of tuples of decimals for hash speed effects
For 4 dicts of 100k keys it is roughly twice as fast as the current method
For 4 dict subclasses of 100k keys it is ~50% faster
For 4 dicts of 20 keys it is ~66% faster
For 4 dict subclasses of 20 keys it is ~20% faster

Testing using a dict with string keys
For 4 dicts of 100k keys it is ~75% faster
For 4 dict subclasses of 100k keys it is ~72% faster
For 4 dicts of 20 keys it is ~40% faster
For 4 dict subclasses of 20 keys it is ~20% faster
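For reproducibility, a sketch of a comparable harness (the names, sizes, and structure are reconstructions of the setup described above, not the author's actual script):

from collections import ChainMap
from decimal import Decimal
from timeit import timeit

class SubDict(dict):
    pass

def make_maps(n_maps=4, n_keys=100_000, cls=dict):
    # Keys are tuples of Decimals, so hashing is genuinely expensive.
    return [cls({(Decimal(i), Decimal(m)): None for i in range(n_keys)})
            for m in range(n_maps)]

cm = ChainMap(*make_maps())
print("plain dicts:    ", timeit(lambda: list(cm), number=10))

cm_sub = ChainMap(*make_maps(cls=SubDict))
print("dict subclasses:", timeit(lambda: list(cm_sub), number=10))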

@rhettinger
Contributor

rhettinger commented Nov 1, 2022

I appreciate the heroic effort but just want to do the simpler edit mentioned in the issue tracker.

The code above looks like it is "killing a mosquito with a cannon" by breaking down the various subcases and using a different approach for each. It seems to try every trick in the book to squeeze out a few clock cycles. As a result, the code is much harder to understand and will be more difficult to maintain and test.

The timing results are dubious. Adding lambdas, groupby, argument unpacking, and multiple conditionals is likely to slow the common cases. Also, the interpreter performance changed quite a bit in 3.11, so the relative performance of the components will be different. The code above has the hallmarks of overfitting to a particular Python version, a particular build, and a particular set of benchmarks.

Thank you again, but I really don't want to go down this path. If the performance of ChainMap() ever becomes an issue, I will likely just make a C extension but leave the Python code in a relatively clean form.

@rhettinger rhettinger closed this as not planned Nov 1, 2022
rhettinger added a commit to rhettinger/cpython that referenced this issue Nov 1, 2022