Closed
Labels
api-approved, area-System.Text.RegularExpressions, in-pr
Description
Background and motivation
In .NET 7, we added the EnumerateMatches methods to enable amortized allocation-free matching. However, the Regex.Split method is handy for finding the gaps between matches, and using EnumerateMatches to achieve that is non-trivial, so developers fall back to the more expensive Split.
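To make the "non-trivial" part concrete, here is a minimal sketch of recovering the gaps with EnumerateMatches by tracking where the previous match ended. The sample text and pattern are made up for illustration, and the sketch deliberately ignores RightToLeft and any count limit. The substrings are materialized only so the result can be printed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample data, chosen only for illustration.
string text = "one, two,three";
var regex = new Regex(@",\s*");

// Collect the gaps between matches by remembering where the previous match ended.
var gaps = new List<string>();
int previousEnd = 0;
foreach (ValueMatch match in regex.EnumerateMatches(text))
{
    gaps.Add(text[previousEnd..match.Index]);
    previousEnd = match.Index + match.Length;
}

// The text after the final match is a split too, and is easy to forget.
gaps.Add(text[previousEnd..]);

Console.WriteLine(string.Join("|", gaps)); // one|two|three
```

Every caller has to repeat this bookkeeping, including the trailing segment, which is the gap the proposed API is meant to close.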
API Proposal
namespace System.Text.RegularExpressions;

public class Regex
{
    public static ValueSplitEnumerator EnumerateSplits(ReadOnlySpan<char> input, [StringSyntax(StringSyntaxAttribute.Regex)] string pattern);
    public static ValueSplitEnumerator EnumerateSplits(ReadOnlySpan<char> input, [StringSyntax(StringSyntaxAttribute.Regex, nameof(options))] string pattern, RegexOptions options);
    public static ValueSplitEnumerator EnumerateSplits(ReadOnlySpan<char> input, [StringSyntax(StringSyntaxAttribute.Regex, nameof(options))] string pattern, RegexOptions options, TimeSpan matchTimeout);

    public ValueSplitEnumerator EnumerateSplits(ReadOnlySpan<char> input);
    public ValueSplitEnumerator EnumerateSplits(ReadOnlySpan<char> input, int count);
    public ValueSplitEnumerator EnumerateSplits(ReadOnlySpan<char> input, int count, int startat);

    public ref struct ValueSplitEnumerator
    {
        public readonly ValueSplitEnumerator GetEnumerator();
        public bool MoveNext();
        public readonly Range Current { get; }
    }
}
API Usage
Regex regex = ...;
ReadOnlySpan<char> input = ...;
foreach (Range range in regex.EnumerateSplits(input))
{
    ReadOnlySpan<char> word = input[range];
    ...
}
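A filled-in version of the usage above might look like the following sketch. The input string and the whitespace pattern are assumptions chosen for illustration, and the code presumes a runtime that ships the proposed EnumerateSplits overloads.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical input and pattern, for illustration only.
ReadOnlySpan<char> input = "alpha  beta gamma";
var regex = new Regex(@"\s+");

var words = new List<string>();
foreach (Range range in regex.EnumerateSplits(input))
{
    // Slicing the span allocates nothing; ToString() is only here so the result can be stored.
    words.Add(input[range].ToString());
}

Console.WriteLine(string.Join("|", words)); // alpha|beta|gamma
```

In allocation-sensitive code the slice would be consumed directly as a span rather than converted to a string.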
Alternative Designs
- Whereas EnumerateMatches yields custom ValueMatch instances, this yields Ranges. We designed ValueMatch to accommodate a future where it could expose capture information (it doesn't today), but for splits there's no such additional info, just the range between matches. Using Range also matches the new span.Split methods added in .NET 8.
- The overloads exactly match what's exposed for EnumerateMatches, just returning a ValueSplitEnumerator instead of a ValueMatchEnumerator.
- There are two behavioral differences from Split. 1) For some reason, Split not only includes the splits between matches, but it also includes any capture groups from the matches; that is both unintuitive and adds a lot of overhead and complication for a span/enumerator-based API that's amortized allocation-free, so it's not included in EnumerateSplits. 2) If RightToLeft is specified, Split reverses the array so that the results are still left-to-right, but as EnumerateSplits yields the splits as they're found, its results are right-to-left with that option.
- The overloads accepting int count are less important with EnumerateSplits, as a caller can always choose to stop iterating. However, they're included for two reasons: 1) to keep the overload shape the same as Split, so that someone calling the (input, count) overload and switching to EnumerateSplits doesn't implicitly start calling an (input, startat) overload, and 2) to keep the behavior the same for the last split, which, when the count is smaller than the actual number of splits, ends up including all of the remainder of the input.
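The capture-group and count behaviors described in the bullets above can be seen in a short sketch. The input "a-b-c" and the patterns are made up for illustration; the EnumerateSplits call assumes a runtime that ships the proposed overloads and reflects the behavior as described in this proposal.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Split includes the text captured by groups in the pattern.
string[] withCaptures = Regex.Split("a-b-c", "(-)");
Console.WriteLine(string.Join("|", withCaptures)); // a|-|b|-|c

// With a count, the last element of Split holds the entire remainder of the input.
string[] limited = new Regex("-").Split("a-b-c", 2);
Console.WriteLine(string.Join("|", limited)); // a|b-c

// EnumerateSplits, per the proposal: captures are not yielded, and the count
// behaves as it does for Split, so the last range covers the remainder.
ReadOnlySpan<char> input = "a-b-c";
var parts = new List<string>();
foreach (Range range in new Regex("(-)").EnumerateSplits(input, 2))
{
    parts.Add(input[range].ToString());
}
Console.WriteLine(string.Join("|", parts)); // a|b-c
```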
Risks
No response