Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

stephentoub
Copy link
Member

This adds IAlternateEqualityComparer<TAlternate, T>, implements IAlternateEqualityComparer<ReadOnlySpan<char>, string> on all of the StringComparer singletons, and adds associated AlternateLookup types to Dictionary<TKey, TValue>, HashSet<T>, and ConcurrentDictionary<TKey, TValue>.

Contributes to #27229. This doesn't completely close that issue as this PR doesn't add support to FrozenDictionary<TKey, TValue> or FrozenSet<T>; those will require some more thought due to how they're implemented, where the core of the search is provided by a virtual method that would likely need a corresponding generic virtual method in this scheme, and that has performance implications.

This also doesn't yet implement IAlternateEqualityComparer<TAlternate, T> on EqualityComparer<string>.Default. I'd prototyped doing so, but have a few open issues and have separated it out into #102906. The user-facing ramification if we don't get to that in .NET 9 is just that to be able to use the new lookups the dictionary/set will need to have a StringComparer (or equivalent) comparer explicitly passed into the collection's constructor.

I still need to do some perf validation. In particular, I tweaked the string hash code routines to accomodate span inputs (the existing logic took advantage of the \0 guarantee at the end of a string), and I need to see how much this impacts things and whether it'll instead be worth duplicating the functions to have one for string and one for span.

I also added a few APIs that weren't explicitly approved in the API review meeting, but I believe that was an oversight:

  • I added a Dictionary property to the lookups for {Concurrent}Dictionary and a Set property to the lookup for HashSet. This is so that you can just store the lookup struct and access the original object through it. If you look back at the history of the issue and API review discussions, we'd planned to add these, but then I neglected to copy/paste them over for the final review. We could still debate the actual name used.
  • We neglected to add indexers to the dictionary lookups, but they're important, in particular the setter, in order to support update behavior, which otherwise isn't exposed.
  • For the {Try}Remove overloads on the dictionaries that took an out TValue, I added an out TKey actualKey parameter.
  • (We might also want to consider GetOrAdd/AddOrUpdate methods for ConcurrentDictionary, but I didn't add those.)

There's more code duplication than I'd like between the routines on the main class and the routines on the lookups; we may want to revisit that.

Our choice around exposing the GetAlternateLookup methods as extensions for Dictionary/HashSet does make them more awkward to use, as you need to provide all of the generic parameters, not just the TAlternateKey one. This isn't the case for ConcurrentDictionary, where we did it as an instance method, and thus you don't need to provide TKey/TValue again. This is a visible inconsistency as to how they're consumed.

… ConcurrentDictionary

This adds `IAlternateEqualityComparer<TAlternate, T>`, implements `IAlternateEqualityComparer<ReadOnlySpan<char>, string>` on all of the `StringComparer` singletons, and adds associated `AlternateLookup` types to `Dictionary<TKey, TValue>`, `HashSet<T>`, and `ConcurrentDictionary<TKey, TValue>`.
@ghost
Copy link

ghost commented May 31, 2024

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

@bencyoung-Fignum
Copy link

Out of interest, could this extend to having the comparers implementing IAlternateEqualityComparer<ReadOnlySpan, string> (or some new UTF8 ref struct wrapper) as well to allow for UTF-8 insertion within the same dictionary - as long as the hashes can be made consistent?

@stephentoub
Copy link
Member Author

Out of interest, could this extend to having the comparers implementing IAlternateEqualityComparer `<ReadOnlySpan, string> (or some new UTF8 ref struct wrapper) as well to allow for UTF-8 insertion within the same dictionary - as long as the hashes can be made consistent?

Conceptually, yes. And you could certainly write your own comparer with any number of IAlternateEqualityComparer implementations for different TAlternate types (ReadOnlySpan<char>, ReadOnlySpan<byte>, etc.) We're not planning to do so on the StringComparers this release, and it's an open question whether we'd do so in the future. There are corner-cases such as what string is new byte[] { 0xFF } equal to? Is it equal to Encoding.UTF8.GetString(new byte[] { 0xFF })? And if so, is it also equal to Encoding.UTF8.GetString(new byte[] { 0xFE }), since those produce the same strings? This brings policy into these core types, so while technically we can do it, it's a more complicated discussion.

@bencyoung-Fignum
Copy link

Out of interest, could this extend to having the comparers implementing IAlternateEqualityComparer `<ReadOnlySpan, string> (or some new UTF8 ref struct wrapper) as well to allow for UTF-8 insertion within the same dictionary - as long as the hashes can be made consistent?

Conceptually, yes. And you could certainly write your own comparer with any number of IAlternateEqualityComparer implementations for different TAlternate types (ReadOnlySpan<char>, ReadOnlySpan<byte>, etc.) We're not planning to do so on the StringComparers this release, and it's an open question whether we'd do so in the future. There are corner-cases such as what string is new byte[] { 0xFF } equal to? Is it equal to Encoding.UTF8.GetString(new byte[] { 0xFF })? And if so, is it also equal to Encoding.UTF8.GetString(new byte[] { 0xFE }), since those produce the same strings? This brings policy into these core types, so while technically we can do it, it's a more complicated discussion.

Agree yes it's not always obvious, but I'm glad it's flexible to support multiple implementations! Might only make sense with a validated UTF8 ref struct type. Thanks for the reply

@stephentoub stephentoub force-pushed the alternateequalitycomparer branch from ede4ca4 to 9558a8d Compare June 5, 2024 14:47
@stephentoub
Copy link
Member Author

I still need to do some perf validation. In particular, I tweaked the string hash code routines to accomodate span inputs (the existing logic took advantage of the \0 guarantee at the end of a string), and I need to see how much this impacts things and whether it'll instead be worth duplicating the functions to have one for string and one for span.

Unfortunately I couldn't quite get rid of the regression here for strings when using/updating a shared implementation, so I created a duplicate implementation of the two functions for span that returns the same hash value as for string but doesn't rely on the null terminator. I kept the slower non-ASCII fallback for both, but improved it to a) use a larger stack allocated buffer and b) not rehash all of the ASCII values leading up to the first non-ASCII value.

Should be good to go now.

@stephentoub stephentoub merged commit 5c403bb into dotnet:main Jun 5, 2024
@stephentoub stephentoub deleted the alternateequalitycomparer branch June 5, 2024 21:57
@TonyValenti
Copy link

Where can I read more about IAlternateEqualityComparer and its uses?

@github-actions github-actions bot locked and limited conversation to collaborators Jul 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants