[API Proposal]: Regex.EnumerateSplits #100369

@stephentoub

Background and motivation

In .NET 7, we added the EnumerateMatches methods to enable amortized allocation-free support for matching. However, the Regex.Split method is handy for finding the gaps between matches, and using EnumerateMatches to achieve that is non-trivial, so developers end up using the more expensive Split.
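To illustrate the workaround, here is a minimal sketch of deriving the gaps from EnumerateMatches by hand. `SplitRanges` is a hypothetical helper written for this example, not part of the proposal; it assumes left-to-right matching (no RegexOptions.RightToLeft), ignores count and startat, and collects into a List<Range>, which allocates.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the proposal): collects the gaps between
// matches as Ranges. Assumes left-to-right matching and no count/startat.
static List<Range> SplitRanges(Regex regex, ReadOnlySpan<char> input)
{
    var ranges = new List<Range>();
    int previousEnd = 0;
    foreach (ValueMatch match in regex.EnumerateMatches(input))
    {
        // The gap runs from the end of the previous match to the start of this one.
        ranges.Add(previousEnd..match.Index);
        previousEnd = match.Index + match.Length;
    }
    // Whatever follows the last match is the final split.
    ranges.Add(previousEnd..input.Length);
    return ranges;
}

ReadOnlySpan<char> text = "one, two,three";
foreach (Range range in SplitRanges(new Regex(@",\s*"), text))
{
    Console.WriteLine(text[range].ToString()); // one, two, three (one per line)
}
```

Each caller has to get the bookkeeping for the previous match end and the trailing remainder right, which is the boilerplate a built-in enumerator would remove.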

API Proposal

namespace System.Text.RegularExpressions;

public class Regex
{
    public static ValueSplitEnumerator EnumerateSplits(ReadOnlySpan<char> input, [StringSyntax(StringSyntaxAttribute.Regex)] string pattern);
    public static ValueSplitEnumerator EnumerateSplits(ReadOnlySpan<char> input, [StringSyntax(StringSyntaxAttribute.Regex, nameof(options))] string pattern, RegexOptions options);
    public static ValueSplitEnumerator EnumerateSplits(ReadOnlySpan<char> input, [StringSyntax(StringSyntaxAttribute.Regex, nameof(options))] string pattern, RegexOptions options, TimeSpan matchTimeout);

    public ValueSplitEnumerator EnumerateSplits(ReadOnlySpan<char> input);
    public ValueSplitEnumerator EnumerateSplits(ReadOnlySpan<char> input, int count);
    public ValueSplitEnumerator EnumerateSplits(ReadOnlySpan<char> input, int count, int startat);
    
    public ref struct ValueSplitEnumerator
    {
        public readonly ValueSplitEnumerator GetEnumerator();
        public bool MoveNext();
        public readonly Range Current { get; }
    }
}

API Usage

Regex regex = ...;
ReadOnlySpan<char> input = ...;
foreach (Range range in regex.EnumerateSplits(input))
{
    ReadOnlySpan<char> word = input[range];
    ...
}

Alternative Designs

  • Whereas EnumerateMatches yields custom ValueMatch instances, this yields Ranges. We designed ValueMatch to accommodate a future where it could expose capture information (it doesn't today), but for splits there's no such additional info, just the range between matches. Using Range also matches the new span.Split methods added in .NET 8.
  • The overloads exactly match what's exposed for EnumerateMatches, just returning a ValueSplitEnumerator instead of a ValueMatchEnumerator.
  • There are two behavioral differences from Split. 1) For some reason, Split not only includes the splits between matches, but it also includes any capture groups from the matches; that is both unintuitive and adds a lot of overhead and complication for a span/enumerator-based API that's meant to be amortized allocation-free, so it's not included in EnumerateSplits. And 2) if RightToLeft is specified, Split reverses the array so that the results are still left-to-right, but as EnumerateSplits yields the splits as they're found, its results are in right-to-left order with that option.
  • The overloads accepting int count are less important with EnumerateSplits, as a caller can always choose to stop iterating. However, they're included for two reasons: 1) to keep the overload shape the same as Split's, so that someone calling the input, count overload and switching to EnumerateSplits doesn't implicitly start calling an input, startat overload, and 2) to keep the behavior the same for the last split, which, when the count is smaller than the actual number of splits, will end up including all of the remainder of the input.
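The existing Split behaviors referred to above can be seen in a short sketch. It uses only the current Regex.Split API; the inputs and patterns are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Capture groups leak into Split's output: the digits captured by (\d)
// appear between the actual splits.
string[] withCaptures = Regex.Split("a1b2c", @"(\d)");
Console.WriteLine(string.Join("|", withCaptures)); // a|1|b|2|c

// With a count smaller than the number of possible splits, the last
// element carries the whole remainder of the input.
string[] limited = new Regex(",").Split("a,b,c,d", 2);
Console.WriteLine(string.Join("|", limited)); // a|b,c,d
```

Per the proposal, EnumerateSplits would skip the captured digits in the first case and yield only the ranges for a, b, and c, while keeping the remainder-in-last-split behavior of the second case.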

Risks

No response

Labels

  • api-approved: API was approved in API review, it can be implemented
  • area-System.Text.RegularExpressions
  • in-pr: There is an active PR which will close this issue when it is merged
