Implement IAlternateEqualityComparer support for Dictionary, HashSet, ConcurrentDictionary #102907
… ConcurrentDictionary This adds `IAlternateEqualityComparer<TAlternate, T>`, implements `IAlternateEqualityComparer<ReadOnlySpan<char>, string>` on all of the `StringComparer` singletons, and adds associated `AlternateLookup` types to `Dictionary<TKey, TValue>`, `HashSet<T>`, and `ConcurrentDictionary<TKey, TValue>`.
Tagging subscribers to this area: @dotnet/area-system-runtime
Out of interest, could this extend to having the comparers implement `IAlternateEqualityComparer<ReadOnlySpan<byte>, string>` (or some new UTF-8 ref struct wrapper) as well, to allow UTF-8 insertion within the same dictionary, as long as the hashes can be made consistent?
Conceptually, yes. And you could certainly write your own comparer with any number of `IAlternateEqualityComparer` implementations for different `TAlternate` types (
Agreed, it's not always obvious, but I'm glad it's flexible enough to support multiple implementations! It might only make sense with a validated UTF-8 ref struct type. Thanks for the reply.
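To make the exchange above concrete, here is a minimal sketch of a single comparer that exposes two alternate key types for `string` keys. It assumes the interface shape of the released .NET 9 API (`Equals`, `GetHashCode`, `Create`) and the instance-method form of `GetAlternateLookup`, which may differ from the shape in this PR. `AsciiKeyComparer` is a hypothetical name, the FNV-1a hash is a stand-in rather than the runtime's string hashing, and the byte path assumes ASCII-only input so that bytes and chars hash identically.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

var ids = new Dictionary<string, int>(new AsciiKeyComparer()) { ["alpha"] = 1 };

// One dictionary, two alternate key types served by the same comparer.
var byChars = ids.GetAlternateLookup<ReadOnlySpan<char>>();
var byBytes = ids.GetAlternateLookup<ReadOnlySpan<byte>>();

Console.WriteLine(byChars.TryGetValue("alpha".AsSpan(), out int a) ? a : -1); // prints 1
Console.WriteLine(byBytes.TryGetValue("alpha"u8, out int b) ? b : -1);        // prints 1

// Hypothetical comparer: ordinal string equality plus char-span and
// ASCII byte-span alternates. All three hash paths must agree.
sealed class AsciiKeyComparer :
    IEqualityComparer<string>,
    IAlternateEqualityComparer<ReadOnlySpan<char>, string>,
    IAlternateEqualityComparer<ReadOnlySpan<byte>, string>
{
    private const uint FnvOffset = 2166136261;
    private const uint FnvPrime = 16777619;

    public bool Equals(string? x, string? y) => string.Equals(x, y, StringComparison.Ordinal);
    public int GetHashCode(string obj) => GetHashCode(obj.AsSpan());

    public bool Equals(ReadOnlySpan<char> alternate, string other) =>
        alternate.SequenceEqual(other.AsSpan());
    public int GetHashCode(ReadOnlySpan<char> alternate)
    {
        uint hash = FnvOffset;
        foreach (char c in alternate) hash = unchecked((hash ^ c) * FnvPrime);
        return unchecked((int)hash);
    }
    public string Create(ReadOnlySpan<char> alternate) => alternate.ToString();

    public bool Equals(ReadOnlySpan<byte> alternate, string other)
    {
        if (alternate.Length != other.Length) return false;
        for (int i = 0; i < alternate.Length; i++)
            if (alternate[i] != other[i]) return false; // ASCII: byte value equals char value
        return true;
    }
    public int GetHashCode(ReadOnlySpan<byte> alternate)
    {
        uint hash = FnvOffset;
        foreach (byte value in alternate) hash = unchecked((hash ^ value) * FnvPrime);
        return unchecked((int)hash);
    }
    public string Create(ReadOnlySpan<byte> alternate) => Encoding.ASCII.GetString(alternate);
}
```

Supporting real UTF-8 would require decoding the bytes before hashing and comparing, which is where the hash-consistency concern raised above comes in.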
Unfortunately, I couldn't quite get rid of the regression here for strings when using/updating a shared implementation, so I created a duplicate implementation of the two functions for span that returns the same hash value as for string but doesn't rely on the null terminator. I kept the slower non-ASCII fallback for both, but improved it to (a) use a larger stack-allocated buffer and (b) not rehash all of the ASCII values leading up to the first non-ASCII value. Should be good to go now.
Where can I read more about `IAlternateEqualityComparer` and its uses?
This adds `IAlternateEqualityComparer<TAlternate, T>`, implements `IAlternateEqualityComparer<ReadOnlySpan<char>, string>` on all of the `StringComparer` singletons, and adds associated `AlternateLookup` types to `Dictionary<TKey, TValue>`, `HashSet<T>`, and `ConcurrentDictionary<TKey, TValue>`.
Contributes to #27229. This doesn't completely close that issue, as this PR doesn't add support to `FrozenDictionary<TKey, TValue>` or `FrozenSet<T>`; those will require some more thought because of how they're implemented: the core of the search is provided by a virtual method that would likely need a corresponding generic virtual method in this scheme, and that has performance implications.
This also doesn't yet implement `IAlternateEqualityComparer<TAlternate, T>` on `EqualityComparer<string>.Default`. I'd prototyped doing so, but have a few open issues and have separated it out into #102906. The user-facing ramification if we don't get to that in .NET 9 is just that, to be able to use the new lookups, the dictionary/set will need to have a `StringComparer` (or equivalent) comparer explicitly passed into the collection's constructor.
I still need to do some perf validation. In particular, I tweaked the string hash code routines to accommodate span inputs (the existing logic took advantage of the `\0` guarantee at the end of a string), and I need to see how much this impacts things and whether it'll instead be worth duplicating the functions to have one for string and one for span.
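As a rough illustration of the lookups described above, the following sketch reads and updates a `Dictionary<string, int>` and a `HashSet<string>` using `ReadOnlySpan<char>` slices, so no string is allocated for a key that already exists. It assumes the instance-method form of `GetAlternateLookup` from the released .NET 9 API (this PR exposes it differently for `Dictionary` and `HashSet`), and it passes a `StringComparer` explicitly, as the description says is required.

```csharp
using System;
using System.Collections.Generic;

// The comparer is passed explicitly: at the time of this PR, the default
// string comparer did not implement the alternate-comparer interface.
var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
{
    ["alpha"] = 1,
};

// Assumption: instance-method shape from the released .NET 9 API.
var lookup = counts.GetAlternateLookup<ReadOnlySpan<char>>();

ReadOnlySpan<char> line = "ALPHA,beta";
int comma = line.IndexOf(',');
ReadOnlySpan<char> first = line[..comma];
ReadOnlySpan<char> second = line[(comma + 1)..];

// Existing key: found and updated through the span, no string allocated.
if (lookup.TryGetValue(first, out int existing))
    lookup[first] = existing + 1;

// New key: the comparer materializes a string from the span only on insert.
lookup.TryAdd(second, 1);

Console.WriteLine(counts["alpha"]); // prints 2
Console.WriteLine(counts["beta"]);  // prints 1

var words = new HashSet<string>(StringComparer.Ordinal) { "alpha" };
var wordLookup = words.GetAlternateLookup<ReadOnlySpan<char>>();
Console.WriteLine(wordLookup.Contains("alpha".AsSpan())); // prints True
```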
I also added a few APIs that weren't explicitly approved in the API review meeting, but I believe that was an oversight:
There's more code duplication than I'd like between the routines on the main class and the routines on the lookups; we may want to revisit that.
Our choice around exposing the GetAlternateLookup methods as extensions for Dictionary/HashSet does make them more awkward to use, as you need to provide all of the generic parameters, not just the TAlternateKey one. This isn't the case for ConcurrentDictionary, where we did it as an instance method, and thus you don't need to provide TKey/TValue again. This is a visible inconsistency as to how they're consumed.
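The call-site difference described above can be sketched as follows. `GetLookupSketch` is a hypothetical extension written only for this illustration; it wraps the instance method from the released .NET 9 API (an assumption about the final shape, not the code in this PR) to show why an extension-method form forces every type argument to be spelled out: C# cannot infer only some of a method's generic arguments.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

var dictionary = new Dictionary<string, int>(StringComparer.Ordinal) { ["a"] = 1 };
var concurrent = new ConcurrentDictionary<string, int>(StringComparer.Ordinal);
concurrent["a"] = 1;

// Extension-method shape: TKey and TValue must be repeated next to TAlternateKey.
var viaExtension = dictionary.GetLookupSketch<string, int, ReadOnlySpan<char>>();

// Instance-method shape: TKey and TValue come from the receiver's type.
var viaInstance = concurrent.GetAlternateLookup<ReadOnlySpan<char>>();

Console.WriteLine(viaExtension.ContainsKey("a".AsSpan())); // prints True
Console.WriteLine(viaInstance.ContainsKey("a".AsSpan()));  // prints True

// Hypothetical extension, for illustration only.
static class SketchExtensions
{
    public static Dictionary<TKey, TValue>.AlternateLookup<TAlternateKey>
        GetLookupSketch<TKey, TValue, TAlternateKey>(this Dictionary<TKey, TValue> dictionary)
        where TKey : notnull
        where TAlternateKey : notnull, allows ref struct
        => dictionary.GetAlternateLookup<TAlternateKey>();
}
```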