Improve SortedSet performance #1955
cc: @ellismg, @FiveTimesTheFun
Aside from the naming of the local, the change looks good to me.
💡 this should probably have been part of a different commit.
No major problems stood out to me. I did comment on some minor items and had a question. I do wish this was separated into one commit (or perhaps even one PR) per topic, but I say this as a reminder for the future and not as a reason to reject this PR.
I take that back... I'm concerned about the current |
You no longer need to set the version?
This is the constructor. Setting _version to 0 here is redundant because the CLR initializes all fields to their default values.
- SortedSet's ctor has a loop that's O(D * N), where N is the number of elements being added to the set and D is the number of duplicates in those elements. As the number of duplicates D grows, the time it takes to construct a sorted set approaches O(N^2), which is particularly bad given that the purpose of a set is to help remove duplicates. The core cause is a loop over the elements to determine whether element n is equal to element n-1; if it is, element n is removed, but it's removed via a call to List<T>.RemoveAt, which shifts down all of the elements above it in the list. A better implementation can make this O(N) instead; the ctor overall is then still O(N log N), due to the sort, but that's better than O(N^2).
- To sort the elements, the ctor builds a List<T> of the elements but then uses ToArray to get an array used to actually build the set; this incurs an unnecessary (and potentially large) array allocation and copy. We can avoid that entirely by just building up the array manually using the same logic that List<T> does. That same logic is actually duplicated in multiple places, including in Stack<T>, so I've extracted it as a helper to use in both places (there are a few other places in corefx that use the same logic, hence I put the helper into a Common file, but I did not condense those other uses as part of this commit). This then further helps with Stack<T> perf in a few corner cases, such as when a Stack<T> is constructed with an empty enumerable. (It's not clear to me why Queue<T> doesn't have the same ICollection<T>-related logic; probably just an oversight, but I didn't want to change its behavior with the additional ICollection<T> check.)
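As a rough illustration of the two points above, here is a minimal sketch, not the code from this PR. The names `BufferToArray` and `SortAndRemoveDuplicates` are hypothetical, and the growth policy (start at 4, then double) is an assumption modeled on how List<T> is commonly described. It buffers an enumerable straight into an array, then sorts and compacts duplicates in a single O(N) pass instead of calling List<T>.RemoveAt per duplicate.

```csharp
using System;
using System.Collections.Generic;

static class SortedSetCtorSketch
{
    // Hypothetical helper: copies `source` into an array without an intermediate List<T>.
    // `length` is the number of valid elements; the array may be longer than that.
    public static T[] BufferToArray<T>(IEnumerable<T> source, out int length)
    {
        if (source is ICollection<T> collection)
        {
            // Count is known up front, so allocate exactly once (or not at all when empty).
            length = collection.Count;
            if (length == 0) return Array.Empty<T>();
            var result = new T[length];
            collection.CopyTo(result, 0);
            return result;
        }

        // Unknown count: grow geometrically (assumed policy: 4, then double).
        T[] buffer = Array.Empty<T>();
        length = 0;
        foreach (T item in source)
        {
            if (length == buffer.Length)
            {
                Array.Resize(ref buffer, buffer.Length == 0 ? 4 : buffer.Length * 2);
            }
            buffer[length++] = item;
        }
        return buffer;
    }

    // Hypothetical helper: sorts items[0..count) and compacts duplicates in place.
    // Returns the number of distinct elements, which end up in items[0..result).
    public static int SortAndRemoveDuplicates<T>(T[] items, int count, IComparer<T> comparer)
    {
        if (count == 0) return 0;

        Array.Sort(items, 0, count, comparer); // O(N log N)

        int last = 0; // index of the most recently kept distinct element
        for (int i = 1; i < count; i++)
        {
            // Each element is moved at most once, so this pass is O(N).
            if (comparer.Compare(items[i], items[last]) != 0)
            {
                items[++last] = items[i];
            }
        }
        return last + 1;
    }

    static IEnumerable<int> Sample()
    {
        yield return 3; yield return 1; yield return 3;
        yield return 2; yield return 1; yield return 1;
    }

    static void Main()
    {
        int[] data = BufferToArray(Sample(), out int length);
        int distinct = SortAndRemoveDuplicates(data, length, Comparer<int>.Default);
        Console.WriteLine(string.Join(", ", new ArraySegment<int>(data, 0, distinct)));
    }
}
```

The design choice is that a duplicate is skipped rather than removed, so no element above it ever has to be shifted down.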
SortedSet's BreadthFirstTreeWalk (which is used by several members, such as RemoveWhere) is currently using a List<T> as a queue, doing a RemoveAt(0) to dequeue, which incurs the cost of shifting all remaining elements down on every dequeue. Using a Queue<T> is both cleaner and better performing.
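A minimal sketch of a queue-based breadth-first walk is shown here for comparison. The `Node<T>` type and the `BreadthFirstWalk` signature are hypothetical stand-ins; SortedSet<T>'s real node type is internal and its walk may differ in detail.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal binary tree node, for illustration only.
sealed class Node<T>
{
    public T Item;
    public Node<T> Left, Right;
    public Node(T item) { Item = item; }
}

static class BreadthFirstSketch
{
    // Visits nodes level by level; stops early and returns false if `action` returns false.
    public static bool BreadthFirstWalk<T>(Node<T> root, Predicate<Node<T>> action)
    {
        if (root == null) return true;

        var queue = new Queue<Node<T>>();
        queue.Enqueue(root);
        while (queue.Count != 0)
        {
            // Dequeue does not shift the remaining elements, unlike List<T>.RemoveAt(0).
            Node<T> current = queue.Dequeue();
            if (!action(current)) return false;
            if (current.Left != null) queue.Enqueue(current.Left);
            if (current.Right != null) queue.Enqueue(current.Right);
        }
        return true;
    }

    static void Main()
    {
        var root = new Node<int>(2) { Left = new Node<int>(1), Right = new Node<int>(3) };
        var visited = new List<int>();
        BreadthFirstWalk(root, n => { visited.Add(n.Item); return true; });
        Console.WriteLine(string.Join(" ", visited)); // prints "2 1 3"
    }
}
```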
As I was touching Stack<T>, I noticed that it and Queue<T> had an empty array static field, which was just being initialized to Array.Empty<T>(). The field is unnecessary, and the call site can just use Array.Empty<T>() directly. This commit just cleans that up, along with some unnecessary field setting in the ctors.
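The cleanup can be pictured with a small sketch. `TinyStack<T>` is a hypothetical type written for illustration, not the real Stack<T>: the constructor uses Array.Empty<T>() directly instead of a private static empty-array field, and it leaves `_size` and `_version` unassigned because fields start at their default values.

```csharp
using System;

// Hypothetical, heavily simplified stack; not the corefx implementation.
sealed class TinyStack<T>
{
    private T[] _array;
    private int _size;    // defaults to 0, no explicit assignment needed
    private int _version; // defaults to 0, no explicit assignment needed

    public TinyStack()
    {
        // Array.Empty<T>() returns a cached instance, so a separate static field adds nothing.
        _array = Array.Empty<T>();
    }

    public int Count => _size;
    public int Version => _version;

    public void Push(T item)
    {
        if (_size == _array.Length)
        {
            Array.Resize(ref _array, _array.Length == 0 ? 4 : _array.Length * 2);
        }
        _array[_size++] = item;
        _version++;
    }

    public T Pop()
    {
        if (_size == 0) throw new InvalidOperationException("Stack is empty.");
        T item = _array[--_size];
        _array[_size] = default(T); // release the reference held by the popped slot
        _version++;
        return item;
    }
}

static class TinyStackDemo
{
    static void Main()
    {
        var stack = new TinyStack<string>();
        Console.WriteLine(stack.Count); // prints "0"
        stack.Push("a");
        stack.Push("b");
        Console.WriteLine(stack.Pop()); // prints "b"
    }
}
```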
2f42b97 to 6337e28
Thanks. Updated.
@dotnet-bot test this please
Improve SortedSet performance Commit migrated from dotnet/corefx@c8155b6
Primary changes:
Fixes #1953.