Contributor

@TrayanZapryanov TrayanZapryanov commented Sep 1, 2022

Replace loop with Span.StartsWith.

On my machine, the following benchmark:

 private char[] data = " DOCTYPE abc".ToCharArray();

[Benchmark(Baseline = true)]
public bool Before() => StrEqual(data, 1, 7, "DOCTYPE");

[Benchmark]
public bool After() => StrEqual2(data, 1, 7, "DOCTYPE");

internal static bool StrEqual(char[] chars, int strPos1, int strLen1, string str2)
{
    if (strLen1 != str2.Length)
    {
        return false;
    }

    Debug.Assert(chars != null);

    int i = 0;
    while (i < strLen1 && chars[strPos1 + i] == str2[i])
    {
        i++;
    }

    return i == strLen1;
}

internal static bool StrEqual2(char[] chars, int strPos1, int strLen1, string str2)
{
    if (strLen1 != str2.Length)
    {
        return false;
    }

    Debug.Assert(chars != null);

    return chars.AsSpan(strPos1, strLen1).StartsWith(str2);
}

produces the following results:
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19043.1889/21H1/May2021Update)
11th Gen Intel Core i9-11900K 3.50GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100-preview.2.22153.17
[Host] : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
DefaultJob : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2

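| Method | Mean     | Ratio | Code Size | Allocated | Alloc Ratio |
|--------|----------|-------|-----------|-----------|-------------|
| Before | 3.599 ns | 1.00  | 74 B      | -         | NA          |
| After  | 2.046 ns | 0.57  | 391 B     | -         | NA          |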

@ghost added the area-System.Xml and community-contribution labels Sep 1, 2022
@ghost

ghost commented Sep 1, 2022

Tagging subscribers to this area: @dotnet/area-system-xml
See info in area-owners.md if you want to be subscribed.

Issue Details

Replace loop with Span.SequenceEqual.

On my machine, the following benchmark:

 private char[] data = " DOCTYPE abc".ToCharArray();

[Benchmark(Baseline = true)]
public bool Before() => StrEqual(data, 1, 7, "DOCTYPE");

[Benchmark]
public bool After() => StrEqual2(data, 1, 7, "DOCTYPE");

internal static bool StrEqual(char[] chars, int strPos1, int strLen1, string str2)
{
    if (strLen1 != str2.Length)
    {
        return false;
    }

    Debug.Assert(chars != null);

    int i = 0;
    while (i < strLen1 && chars[strPos1 + i] == str2[i])
    {
        i++;
    }

    return i == strLen1;
}

internal static bool StrEqual2(char[] chars, int strPos1, int strLen1, string str2)
{
    if (strLen1 != str2.Length)
    {
        return false;
    }

    Debug.Assert(chars != null);

    return chars.AsSpan(strPos1, strLen1).SequenceEqual(str2.AsSpan());
}

produces the following results:
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19043.1889/21H1/May2021Update)
11th Gen Intel Core i9-11900K 3.50GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100-preview.2.22153.17
[Host] : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
DefaultJob : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2

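| Method | Mean     | Ratio | Code Size | Allocated | Alloc Ratio |
|--------|----------|-------|-----------|-----------|-------------|
| Before | 4.057 ns | 1.00  | 80 B      | -         | NA          |
| After  | 2.385 ns | 0.59  | 395 B     | -         | NA          |
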
Author: TrayanZapryanov
Assignees: -
Labels:

area-System.Xml

Milestone: -

Replace with StartsWith as suggested.
@TrayanZapryanov changed the title from "Use Span<char>.SequenceEqual instead if manual loop in XmlConvert.StrEqual" to "Remove XmlConvert.StrEqual and use Span<char>.StartsWith() instead." Sep 1, 2022
@TrayanZapryanov
Contributor Author

I think the changes are ready for review. The failing tests are not related to the changes.

Member

@stephentoub stephentoub left a comment

It looks good. Do we have tests that appropriately exercise these code paths? e.g. If you do a code coverage run, are all of these hit?

@TrayanZapryanov
Contributor Author

> It looks good. Do we have tests that appropriately exercise these code paths? e.g. If you do a code coverage run, are all of these hit?

All I know is that several tests failed on the first commit.
When I try the VS "Analyze Code Coverage" menu from the tests, I cannot see code coverage for System.Xml.dll.

If somebody tells me how to get the results, I can check.

@TrayanZapryanov
Contributor Author

@stephentoub May I ask something not related to this PR?
I saw that there is a NameTable class that behaves like a standard HashSet but can search by string or char[].
My attempt was to replace it with HashSet<Node> and play with Node's equality methods.
It works, but when I benchmark it, it shows a regression of ~30%.
The question is: do you know any other trick that can squeeze out a few more nanoseconds? :)

Here is how my table looks:

public class ImprovedNameTable
	{
		private readonly struct Node : IEquatable<Node>
		{
			public Node(string data)
			{
				Data = data;
				_dataAsArray = null;
				_start = 0;
				_len = 0;
			}

			//Search ctor only
			public Node(char[] dataAsArray, int start, int len)
			{
				Data = null;
				_dataAsArray = dataAsArray;
				_start = start;
				_len = len;
			}

			public string Data { get; }

			private readonly char[] _dataAsArray;
			private readonly int _start;
			private readonly int _len;

			public bool Equals(Node other)
			{
				return other.Data?.Equals(Data) ?? other._dataAsArray.AsSpan(other._start, other._len).SequenceEqual(Data.AsSpan());
			}

			public override bool Equals(object obj)
			{
				return obj is Node n && Equals(n);
			}

			public override int GetHashCode()
			{
				return string.GetHashCode(Data != null ? Data.AsSpan() : _dataAsArray.AsSpan(_start, _len));
			}
		}

		private readonly HashSet<Node> _items = new HashSet<Node>();

		public string Add(string key)
		{
			ArgumentNullException.ThrowIfNull(key);

			int len = key.Length;
			if (len == 0)
			{
				return string.Empty;
			}

			var checkItem = new Node(key);
			if (_items.TryGetValue(checkItem, out var actualValue))
				return actualValue.Data;

			_items.Add(checkItem);

			return checkItem.Data;
		}

		public string Add(char[] key, int start, int len)
		{
			if (len == 0)
			{
				return string.Empty;
			}

			// Compatibility check to ensure same exception as previous versions
			// independently of any exceptions thrown by the hashing function.
			// note that NullReferenceException is the first one if key is null.
			if (start >= key.Length || start < 0 || (long)start + len > (long)key.Length)
			{
				throw new IndexOutOfRangeException();
			}

			// Compatibility check for len < 0, just throw the same exception as new string(key, start, len)
			if (len < 0)
			{
				throw new ArgumentOutOfRangeException(nameof(len));
			}

			var checkItem = new Node(key, start, len);
			if (_items.TryGetValue(checkItem, out var actualValue))
				return actualValue.Data;

			checkItem = new Node(new string(key, start, len));
			_items.Add(checkItem);

			return checkItem.Data;
		}
	}

@stephentoub
Member

> When I try the VS "Analyze Code Coverage" menu from the tests, I cannot see code coverage for System.Xml.dll

dotnet build /t:test /p:Coverage=true

inside of the relevant test directory. Unfortunately the System.Private.Xml tests are a bit of a mess and are spread out over many test projects (I'd really like to see us consolidate them all into a single project, but that's not your problem 😄). @krwq, which is the right test project for this functionality?

> The question is: do you know any other trick that can squeeze out a few more nanoseconds? :)

I'd need to profile it to see where the time is being spent :) But for starters in the case of it needing to add to the table, it's doing two lookups, one in TryGetValue and one in Add; I'm not sure what your test is actually exercising, but if it's the add path, I'd start there and find a way to reduce that to a single lookup rather than two.
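
As a rough illustration of the "single lookup" idea, here is a minimal sketch for the string path only. It swaps the HashSet<Node> for a Dictionary<string, string> and uses CollectionsMarshal.GetValueRefOrAddDefault (available since .NET 6) so that finding and inserting share one hash lookup. The class name SingleLookupNameTable is hypothetical, this is not how System.Xml's NameTable is implemented, and the char[] overload would still need its own approach.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical type for illustration; not the NameTable from System.Xml.
public sealed class SingleLookupNameTable
{
    private readonly Dictionary<string, string> _items =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public string Add(string key)
    {
        ArgumentNullException.ThrowIfNull(key);

        if (key.Length == 0)
        {
            return string.Empty;
        }

        // One hash lookup: returns a reference to the existing slot,
        // or to a newly added slot holding the default value (null).
        ref string? slot = ref CollectionsMarshal.GetValueRefOrAddDefault(_items, key, out bool exists);
        if (!exists)
        {
            slot = key; // first time we see this name: store the instance itself
        }

        return slot!;
    }
}
```

Calling Add twice with equal but distinct string instances should return the first instance both times, which is the interning behavior the table is after. Whether this actually beats the existing NameTable would still need to be measured.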

@TrayanZapryanov
Contributor Author

> which is the right test project for this functionality?

I've checked code coverage with System.Private.Xml\tests\XmlSerializer and System.Private.Xml\tests\XmlDocument and found most of the code paths covered, but, for example, the changes in XmlTextReaderImpl.IncrementalRead are not executed there.

@stephentoub
Member

> but, for example, the changes in XmlTextReaderImpl.IncrementalRead are not executed there

Thanks for checking. Is it possible to add tests that ensure they are covered?

@TrayanZapryanov
Contributor Author

> dotnet build /t:test /p:Coverage=true

I've tried to throw an exception in

and then ran all the tests in System.Private.Xml, but none of them failed. It looks like this is not covered, or maybe some other module is using it. Unfortunately, I am not deep enough in the code to understand its use case. If somebody can give me a hint of a sample XML, I will create tests for it. I suspect that this is connected with binary data inside XML, or that the DataContract serializer is using it.

@TrayanZapryanov
Contributor Author

@stephentoub First of all, thanks for consolidating the XML tests.
I've run them and verified that IncrementalRead() is not covered.

private int IncrementalRead(Array array, int index, int count)

I will be glad to add tests; I just need somebody to give me a hint about when this method is expected to be used.

Member

@eiriktsarpalis eiriktsarpalis left a comment

Changes LGTM. Thank you for your contribution! Couple of remarks:

  • Would it be possible to re-run your benchmark, just to validate that perf gains are preserved after review feedback changes?
  • Not very familiar with the codebase to suggest improvements to code coverage, perhaps @krwq can help. But I think we can merge this change without that.

@TrayanZapryanov
Contributor Author

I've run all benchmarks with Xml in their names from the performance repo, and here is the result:

Statistics

Total: 69
Same: 71.01 %
Slower: 1.45 %
Faster: 18.84 %
Noise: 8.70 %
Unknown: 0.00 %

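Statistics per Architecture

| Architecture | Same    | Slower | Faster  | Noise  | Unknown |
|--------------|---------|--------|---------|--------|---------|
| X64          | 71.01 % | 1.45 % | 18.84 % | 8.70 % | 0.00 %  |

Statistics per Operating System

| Operating System | Same    | Slower | Faster  | Noise  | Unknown |
|------------------|---------|--------|---------|--------|---------|
| Windows 10       | 71.01 % | 1.45 % | 18.84 % | 8.70 % | 0.00 %  |

Statistics per Namespace

| Namespace                              | Same     | Slower | Faster  | Noise   | Unknown |
|----------------------------------------|----------|--------|---------|---------|---------|
| MicroBenchmarks.Serializers            | 70.59 %  | 0.00 % | 29.41 % | 0.00 %  | 0.00 %  |
| Microsoft.Extensions.Configuration.Xml | 50.00 %  | 0.00 % | 50.00 % | 0.00 %  | 0.00 %  |
| System.Xml.Linq                        | 68.42 %  | 5.26 % | 5.26 %  | 21.05 % | 0.00 %  |
| System.Xml.Tests                       | 100.00 % | 0.00 % | 0.00 %  | 0.00 %  | 0.00 %  |
| XmlDocumentTests.XmlDocumentTests      | 100.00 % | 0.00 % | 0.00 %  | 0.00 %  | 0.00 %  |
| XmlDocumentTests.XmlNodeListTests      | 50.00 %  | 0.00 % | 0.00 %  | 50.00 % | 0.00 %  |
| XmlDocumentTests.XmlNodeTests          | 50.00 %  | 0.00 % | 0.00 %  | 50.00 % | 0.00 %  |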

Per-benchmark results (all on Windows 10, X64, 11th Gen Intel Core i9-11900K 3.50GHz):

| Benchmark | Result | Base | Diff | Ratio | Alloc Delta | Modality |
|-----------|--------|------|------|-------|-------------|----------|
| System.Xml.Linq.Perf_XElement.GetValue | Slower | 33.48 | 36.97 | 0.91 | +0 | |
| MicroBenchmarks.Serializers.Xml_FromStream.DataContractSerializer_ | Same | 1155.46 | 1169.64 | 0.99 | +0 | |
| MicroBenchmarks.Serializers.Xml_FromStream.XmlSerializer_ | Same | 1525.68 | 1552.64 | 0.98 | +0 | |
| MicroBenchmarks.Serializers.Xml_FromStream.DataContractSerializer_ | Same | 6778.21 | 6722.03 | 1.01 | +0 | |
| MicroBenchmarks.Serializers.Xml_ToStream.DataContractSerializer_ | Faster | 1113.15 | 1079.30 | 1.03 | +0 | |
| MicroBenchmarks.Serializers.Xml_FromStream.XmlSerializer_ | Faster | 2457.71 | 2361.26 | 1.04 | +0 | |
| MicroBenchmarks.Serializers.Xml_FromStream.XmlSerializer_ | Faster | 1806.91 | 1735.98 | 1.04 | +0 | |
| MicroBenchmarks.Serializers.Xml_ToStream.DataContractSerializer_ | Faster | 37360.91 | 35667.79 | 1.05 | +0 | |
| Microsoft.Extensions.Configuration.Xml.XmlConfigurationProviderBenchmarks.Load(FileName: "names.xml") | Faster | 83758.74 | 78715.64 | 1.06 | +0 | |
| MicroBenchmarks.Serializers.Xml_FromStream.XmlSerializer_ | Faster | 348408.30 | 326648.50 | 1.07 | +0 | |
| MicroBenchmarks.Serializers.Xml_ToStream.DataContractSerializer_ | Faster | 423996.80 | 397174.84 | 1.07 | +0 | |
| System.Xml.Linq.Perf_XElementList.Enumerator | Faster | 142.68 | 133.26 | 1.07 | +0 | |
| MicroBenchmarks.Serializers.Xml_ToStream.DataContractSerializer_ | Faster | 528.73 | 493.30 | 1.07 | +0 | several? |
| MicroBenchmarks.Serializers.Xml_FromStream.XmlSerializer_ | Faster | 5321.29 | 4937.07 | 1.08 | +0 | |
| MicroBenchmarks.Serializers.Xml_FromStream.XmlSerializer_ | Faster | 50184.23 | 46527.30 | 1.08 | -624 | |
| Microsoft.Extensions.Configuration.Xml.XmlConfigurationProviderBenchmarks.Load(FileName: "repeated.xml") | Faster | 65864.41 | 60819.09 | 1.08 | +0 | |
| MicroBenchmarks.Serializers.Xml_FromStream.XmlSerializer_ | Faster | 502496.12 | 460189.70 | 1.09 | +0 | |

@eiriktsarpalis
Member

Thanks, I think this looks good. @stephentoub any objection to merging without added coverage?

@stephentoub
Member

> any objection to merging without added coverage?

How do we know it didn't break something?

@stephentoub
Member

stephentoub commented Sep 21, 2022

> I just need somebody to give me a hint about when this method is expected to be used

If you start from https://source.dot.net/#q=incrementalread and click on IncrementalRead in the left list, that will bring you to the source for that method. Then click on the method name in the code pane on the right, and the pane on the left will show you all the call sites. You can then click on one of those to navigate the right pane to that usage, and continue that process to look for all the ways to get to this function.

@TrayanZapryanov
Contributor Author

@eiriktsarpalis, @stephentoub
The XmlTextReader.IncrementalRead method is used by XmlTextReader.ReadChars.
I've added one test that uses a sample XML found in the solution, which contains a lot of different cases, and validates the output.
Now all changes are covered, and we have better coverage.
Before (based on the test-consolidation commit):
+--------------------+--------+--------+--------+
| Module | Line | Branch | Method |
+--------------------+--------+--------+--------+
| System.Private.Xml | 52.68% | 43.27% | 60.92% |
+--------------------+--------+--------+--------+

Now:
+--------------------+--------+--------+--------+
| Module | Line | Branch | Method |
+--------------------+--------+--------+--------+
| System.Private.Xml | 52.84% | 43.44% | 61.01% |
+--------------------+--------+--------+--------+
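
For readers unfamiliar with this code path, the following is a minimal sketch of how XmlTextReader.ReadChars is typically driven, which is the public entry point the comment above identifies as reaching IncrementalRead. It is not the test added in this PR; the helper name ReadElementText and the small buffer size are assumptions chosen to force several successive reads.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper for illustration; not the test from this PR.
public static class ReadCharsSketch
{
    public static string ReadElementText(string xml, int bufferSize)
    {
        using var reader = new XmlTextReader(new StringReader(xml));
        reader.MoveToContent(); // ReadChars expects the reader to be positioned on an element

        var buffer = new char[bufferSize];
        var result = new StringBuilder();

        // ReadChars fills the buffer chunk by chunk and returns 0 when the
        // element's content has been consumed.
        int read;
        while ((read = reader.ReadChars(buffer, 0, buffer.Length)) > 0)
        {
            result.Append(buffer, 0, read);
        }

        return result.ToString();
    }
}
```

A call such as ReadCharsSketch.ReadElementText("<Item>hello world</Item>", 4) reads the element content in several small chunks, so the incremental path is exercised more than once per element.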

}

private static string GenerateTestXml(out string expected)
{
Member

Consider just using a const and possibly string.Replace for the line endings.

Contributor Author

I found this way of generating test XML here:

public static void CreateGenericTestFile(string strFileName)

and decided that this is the preferred way in this repo.
If you like, I can change it.

Member

This code base is old and ugly; let's at least make new code prettier, since we don't have time to improve the entire code base.

// ParseQName can flush the buffer, so we need to update the startPos, pos and chars after calling it
int endPos = ParseQName(true, 1, out _);
- if (XmlConvert.StrEqual(_ps.chars, _ps.charPos + 1, endPos - _ps.charPos - 1, _curNode.localName) &&
+ if (endPos - _ps.charPos - 1 == _curNode.localName.Length && _ps.chars.AsSpan(_ps.charPos + 1).StartsWith(_curNode.localName) &&
Member

Why not slice the exact length (endPos - _ps.charPos - 1) and check with equals rather than doing the length check separately?
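
To make the two alternatives concrete, here is a small sketch comparing them outside the reader. The method and parameter names are hypothetical stand-ins for _ps.chars, _ps.charPos + 1, endPos - _ps.charPos - 1 and _curNode.localName; both forms are intended to return the same result for valid ranges.

```csharp
using System;

// Hypothetical helpers for illustration; names do not come from System.Private.Xml.
public static class NameMatchSketch
{
    // Form used in the diff: separate length check, then a prefix test on an open-ended slice.
    public static bool MatchWithStartsWith(char[] chars, int start, int length, string name) =>
        length == name.Length && chars.AsSpan(start).StartsWith(name.AsSpan());

    // Form suggested in the review: slice exactly `length` chars and compare for equality.
    // SequenceEqual already returns false when the lengths differ.
    public static bool MatchWithExactSlice(char[] chars, int start, int length, string name) =>
        chars.AsSpan(start, length).SequenceEqual(name.AsSpan());
}
```

One difference worth keeping in mind: the exact-slice form throws ArgumentOutOfRangeException if start and length fall outside the array, while the first form only slices from start, so the two are interchangeable only when the caller guarantees a valid range.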

Member

@krwq krwq left a comment

LGTM

@eiriktsarpalis eiriktsarpalis merged commit abffaf8 into dotnet:main Oct 7, 2022
@eiriktsarpalis
Member

Thanks @TrayanZapryanov!

@ghost ghost locked as resolved and limited conversation to collaborators Nov 6, 2022