KEMBAR78
Add tests and improve binary XmlDictionaryReader performance by Daniel-Svensson · Pull Request #73332 · dotnet/runtime · GitHub
Skip to content

Conversation

@Daniel-Svensson
Copy link
Contributor

@Daniel-Svensson Daniel-Svensson commented Aug 3, 2022

Summary:

  • Add tests for Binary XmlDictionaryReader
  • Replace unsafe pointer code with Span
  • Performance improvements for arrays (there looks to be small improvements to larger primitives but the difference seems to only be consistent when reading from an array and not a stream)
  • Read primitives via an helper similar to to writer
  • Fixes Reading of Decimal broken for binary XmlDictionaryReader #73934
    • Don't forget to call Advance
    • Read bytes in correct order

This is a follow up on #71478 with the corresponding changes to the reader side.
The main motivation is to add tests and keep the reader and writer somewhat similar which gave some nice array performance improvements.

Note: The introduction of XmlBinaryNodeType.cs will lead to merge conflicts with #71478 which I can fix if either of these are merged. I think the approach in this PR with a separate file with hardcoded values might be better. I updated the other PR with identical changes as here so it might be possible to merge them without conflict.

Benchmarks

In short reader is initialized and then a large number of elements <a>VALUE</a> are read (a a large part of the test is spend reading start/end elements and not just the content)

Source on github

PR

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 5 5600U with Radeon Graphics, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.5.22307.18
  [Host] : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
  span   : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT

Job=span  MaxRelativeError=0.01  BuildConfiguration=span  
IterationTime=250.0000 ms  
Method Source Mean Error StdDev
ReadInt64 MemoryStream 75.00 ns 0.695 ns 0.683 ns
ReadInt16 MemoryStream 75.33 ns 0.659 ns 0.617 ns
ReadDoubleArray MemoryStream 379.81 ns 3.769 ns 5.977 ns
ReadGuidArray MemoryStream 1,001.54 ns 9.922 ns 11.426 ns
ReadInt64 Bytes 38.70 ns 0.347 ns 0.371 ns
ReadInt16 Bytes 39.02 ns 0.377 ns 0.564 ns
ReadDoubleArray Bytes 215.22 ns 2.124 ns 3.937 ns
ReadGuidArray Bytes 678.15 ns 6.610 ns 7.073 ns

Main

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 5 5600U with Radeon Graphics, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.5.22307.18
  [Host] : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
  Before : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
  main   : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT

MaxRelativeError=0.01  IterationTime=250.0000 ms  
Method Source Mean Error StdDev
ReadInt64 MemoryStream 76.12 ns 0.530 ns 0.521 ns
ReadInt16 MemoryStream 73.53 ns 0.690 ns 0.678 ns
ReadDoubleArray MemoryStream 5,086.60 ns 41.306 ns 34.493 ns
ReadGuidArray MemoryStream 22,419.31 ns 224.082 ns 230.116 ns
ReadInt64 Bytes 39.75 ns 0.375 ns 0.432 ns
ReadInt16 Bytes 41.84 ns 0.369 ns 0.308 ns
ReadDoubleArray Bytes 3,909.31 ns 8.160 ns 6.814 ns
ReadGuidArray Bytes 15,829.02 ns 31.039 ns 24.233 ns

@ghost ghost added area-Serialization community-contribution Indicates that the PR has been added by a community member labels Aug 3, 2022
{
const int chunk = 256;
while (length >= chunk)
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the chunked read of at most 256bytes at a time for the stream version since I belive that the approach was partly due to reduce risk of DOS attacks where large array sizes are specified, however GetBuffer (actually TryEnsureBytes) now contains a fix with mitigations against that.

If you think it is better to keep the chunked read I can add it back, but then maybe the size can be increased to something like 4k or even 8k (to almost read a jumbo frame) ?

int actual = Math.Min(count, _arrayCount);
for (int i = 0; i < actual; i++)
// Try to read in whole array, but don't fail if not possible
BufferReader.GetBuffer(actual * ValueHandleLength.DateTime, out _, out _);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Se comment on UnsafeReadarray/ReadRawArrayBytes about chunked reads, if it should be added back then these GetBuffer calls should probably be removed
  2. Is it better to use overload with a single out parameter that throws an exception directly if the file ends prematurely ? The current approach keeps the old behaviour where these "special" types will partially fill arrays in case of premature end-of-file

@Daniel-Svensson Daniel-Svensson marked this pull request as ready for review August 4, 2022 07:07
@danmoseley
Copy link
Member

@HongGit who is the right reviewer here also?

BinaryPrimitives.ReadInt32LittleEndian(bytes.Slice(8)),
BinaryPrimitives.ReadInt32LittleEndian(bytes.Slice(12))
};

Copy link
Contributor Author

@Daniel-Svensson Daniel-Svensson Aug 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to parsing was tested against unit tests before adding back fast path for LittleEndian case.

Without fixing this the method will throw exception and abort when deserializing decimal from binary xml.

Note to self:

Copy link
Contributor Author

@Daniel-Svensson Daniel-Svensson Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the decimal constructor can throw a different exception for invalid decimal than what happens in net framework code.

If it is important to add checking for invalid decimal then msybe it is best to do something similar to net framework here instead (and do it before decimal ctor) https://referencesource.microsoft.com/#System.Runtime.Serialization/System/Xml/XmlBufferReader.cs,469

I can follow up with such a commit if important,.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StephenMolloy I did a separate commit to add the decimal flag validation from net48 but it was not pushed. If you find that check is important and worth the time I can open up a PR. (It is not important to me)

Copy link
Member

@StephenMolloy StephenMolloy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@StephenMolloy StephenMolloy merged commit 5dbdde9 into dotnet:main Aug 15, 2022
@uweigand
Copy link
Contributor

This seems to have added tests that are failing on s390x (a big-endian platform):

      <test name="System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.BinaryXml_Array_RoundTrip" type="System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests" method="BinaryXml_Array_RoundTrip" time="0.0221026" result="Fail">
        <failure exception-type="Xunit.Sdk.XunitException">
          <message><![CDATA[Showing first 10 differences\n  Position 1: Expected: 16909060, Actual: 67305985\n  Position 2: Expected: 287454020, Actual: 1144201745\nTotal number of differences: 2 out of 4]]></message>
          <stack-trace><![CDATA[   at System.AssertExtensions.SequenceEqual[Int32](ReadOnlySpan`1 expected, ReadOnlySpan`1 actual) in /home/uweigand/runtime/src/libraries/Common/tests/TestUtilities/System/AssertExtensions.cs:line 476
   at System.AssertExtensions.SequenceEqual[Int32](Span`1 expected, Span`1 actual) in /home/uweigand/runtime/src/libraries/Common/tests/TestUtilities/System/AssertExtensions.cs:line 491
   at System.AssertExtensions.SequenceEqual[Int32](Int32[] expected, Int32[] actual) in /home/uweigand/runtime/src/libraries/Common/tests/TestUtilities/System/AssertExtensions.cs:line 493
   at System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.BinaryXml_Array_RoundTrip() in /home/uweigand/runtime/src/libraries/System.Runtime.Serialization.Xml/tests/XmlDictionaryReaderTests.cs:line 241
   at System.Reflection.MethodInvoker.InterpretedInvoke(Object obj, Span`1 args, BindingFlags invokeAttr) in /home/uweigand/runtime/src/mono/System.Private.CoreLib/src/System/Reflection/MethodInvoker.Mono.cs:line 33]]></stack-trace>
        </failure>
      </test>

and

      <test name="System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.BinaryXml_ReadPrimitiveTypes" type="System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests" method="BinaryXml_ReadPrimitiveTypes" time="0.0561528" result="Fail">
        <failure exception-type="Xunit.Sdk.EqualException">
          <message><![CDATA[Assert.Equal() Failure\nExpected: 1.23456788\nActual:   1.44545137E+11]]></message>
          <stack-trace><![CDATA[   at System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.AssertReadContentFromBinary[Single](Single expected, XmlBinaryNodeType nodeType, ReadOnlySpan`1 bytes) in /home/uweigand/runtime/src/libraries/System.Runtime.Serialization.Xml/tests/XmlDictionaryReaderTests.cs:line 262
   at System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.BinaryXml_ReadPrimitiveTypes() in /home/uweigand/runtime/src/libraries/System.Runtime.Serialization.Xml/tests/XmlDictionaryReaderTests.cs:line 192
   at System.Reflection.MethodInvoker.InterpretedInvoke(Object obj, Span`1 args, BindingFlags invokeAttr) in /home/uweigand/runtime/src/mono/System.Private.CoreLib/src/System/Reflection/MethodInvoker.Mono.cs:line 33]]></stack-trace>
        </failure>
      </test>

Both of these fail the same way in System.Runtime.Serialization.Xml.Tests and System.Runtime.Serialization.Xml.ReflectionOnly.Tests.

@Daniel-Svensson
Copy link
Contributor Author

Daniel-Svensson commented Aug 17, 2022

This seems to have added tests that are failing on s390x (a big-endian platform):

      <test name="System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.BinaryXml_Array_RoundTrip" type="System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests" method="BinaryXml_Array_RoundTrip" time="0.0221026" result="Fail">
        <failure exception-type="Xunit.Sdk.XunitException">
          <message><![CDATA[Showing first 10 differences\n  Position 1: Expected: 16909060, Actual: 67305985\n  Position 2: Expected: 287454020, Actual: 1144201745\nTotal number of differences: 2 out of 4]]></message>
          <stack-trace><![CDATA[   at System.AssertExtensions.SequenceEqual[Int32](ReadOnlySpan`1 expected, ReadOnlySpan`1 actual) in /home/uweigand/runtime/src/libraries/Common/tests/TestUtilities/System/AssertExtensions.cs:line 476
   at System.AssertExtensions.SequenceEqual[Int32](Span`1 expected, Span`1 actual) in /home/uweigand/runtime/src/libraries/Common/tests/TestUtilities/System/AssertExtensions.cs:line 491
   at System.AssertExtensions.SequenceEqual[Int32](Int32[] expected, Int32[] actual) in /home/uweigand/runtime/src/libraries/Common/tests/TestUtilities/System/AssertExtensions.cs:line 493
   at System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.BinaryXml_Array_RoundTrip() in /home/uweigand/runtime/src/libraries/System.Runtime.Serialization.Xml/tests/XmlDictionaryReaderTests.cs:line 241
   at System.Reflection.MethodInvoker.InterpretedInvoke(Object obj, Span`1 args, BindingFlags invokeAttr) in /home/uweigand/runtime/src/mono/System.Private.CoreLib/src/System/Reflection/MethodInvoker.Mono.cs:line 33]]></stack-trace>
        </failure>
      </test>

and

      <test name="System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.BinaryXml_ReadPrimitiveTypes" type="System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests" method="BinaryXml_ReadPrimitiveTypes" time="0.0561528" result="Fail">
        <failure exception-type="Xunit.Sdk.EqualException">
          <message><![CDATA[Assert.Equal() Failure\nExpected: 1.23456788\nActual:   1.44545137E+11]]></message>
          <stack-trace><![CDATA[   at System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.AssertReadContentFromBinary[Single](Single expected, XmlBinaryNodeType nodeType, ReadOnlySpan`1 bytes) in /home/uweigand/runtime/src/libraries/System.Runtime.Serialization.Xml/tests/XmlDictionaryReaderTests.cs:line 262
   at System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.BinaryXml_ReadPrimitiveTypes() in /home/uweigand/runtime/src/libraries/System.Runtime.Serialization.Xml/tests/XmlDictionaryReaderTests.cs:line 192
   at System.Reflection.MethodInvoker.InterpretedInvoke(Object obj, Span`1 args, BindingFlags invokeAttr) in /home/uweigand/runtime/src/mono/System.Private.CoreLib/src/System/Reflection/MethodInvoker.Mono.cs:line 33]]></stack-trace>
        </failure>
      </test>

Both of these fail the same way in System.Runtime.Serialization.Xml.Tests and System.Runtime.Serialization.Xml.ReflectionOnly.Tests.

The test can be reverted to succeed by partially reverting commit 51b4957 or skip tests on big endian. it might actually be almost as easy to actually fix the implementation for big endian platforms. I think the serialization team must make the decision since it would be a breaking change to fix the big endian implementation.

If it should not be fixed then the decimal reading code might need to be changed so it just calls ReadRawBytes and remove the new code for bigendian

@uweigand
Copy link
Contributor

Hmm. For the ReadPrimitiveTypes failure the underlying cause is that XmlBufferReader.ReadSingle and ReadDouble do not byte-swap their input (i.e. expect the binary format to be in native byte order). However, ReadInt16 etc. do byte-swap their input (i.e. expect the binary format to be always in little-endian byte order). This seems to have been the same before this PR, so it's not a regression (but probably still a bug).

I think the main question is how the binary serialization format is defined. Should this use native (platform) byte order, or always little-endian byte order? Most binary formats in .NET are always little-endian, to enable portability across platforms. This is probably preferable here as well, but I'm not completely sure what this format is used for ... In any case, mixing this up so that some types use native byte order while others always use little-endian seems a bug to me.

For the Array_RoundTrip failure, I think the problem is that reading the whole array via one large ReadRawArrayBytes doesn't work if the native byte order does not match the byte order in the buffer. You'll have to read and byte-swap each element separately. The current code does this for a few types (DateTime, Guid, TimeSpan), but it really should be done for all types. (This is also not a regression as the old code didn't do it either, but it still seems to be a bug anyway.)

@jkotas
Copy link
Member

jkotas commented Aug 17, 2022

I think the main question is how the binary serialization format is defined. Should this use native (platform) byte order, or always little-endian byte order?

It should be always little-endian byte order.

@Daniel-Svensson
Copy link
Contributor Author

@uweigand I've asked if it would be appropriate to update my old exising PR with big endian support, it would avoid merge conflicts but would give slughtly more work to reviewes since it is mostly already reviewed. Maybe you should see if there is a need for an issue to cover the big endian support for binaryxml.

@uweigand
Copy link
Contributor

@uweigand I've asked if it would be appropriate to update my old exising PR with big endian support, it would avoid merge conflicts but would give slughtly more work to reviewes since it is mostly already reviewed. Maybe you should see if there is a need for an issue to cover the big endian support for binaryxml.

@Daniel-Svensson I've opened an issue now covering big-endian support in both reader and writer. Thanks!

@StephenMolloy
Copy link
Member

I think the performance PR's should remain focused on performance. (We super-appreciate the contributions btw. Our small team just a little overwhelmed with all of our many responsibilities at the moment, so we're a little slow on some things.)

This big-endian issue should be addressed sooner rather than later. And it sounds like it's been a problem from the start, so we'll probably want to look at it separately so we can track it for servicing. We can use @uweigand's new issue (#74494) to track.

@ghost ghost locked as resolved and limited conversation to collaborators Sep 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-Serialization community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reading of Decimal broken for binary XmlDictionaryReader

5 participants