Add tests and improve binary XmlDictionaryReader performance #73332

Daniel-Svensson · 2022-08-03T20:53:51Z

Summary:

Add tests for Binary XmlDictionaryReader
Replace unsafe pointer code with Span
Performance improvements for arrays (there appear to be small improvements for larger primitives, but the difference is only consistent when reading from an array, not from a stream)
Read primitives via a helper similar to the writer's
Fixes Reading of Decimal broken for binary XmlDictionaryReader #73934
- Don't forget to call Advance
- Read bytes in correct order
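
To illustrate the "read primitives via a helper" and "don't forget to call Advance" points, here is a minimal sketch of a span-based read helper. `SpanReaderSketch`, `ReadRaw` and `Advance` are hypothetical names chosen for this example and are not the actual `XmlBufferReader` members; the sketch reads in native byte order and ignores streaming.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical reader for illustration only; not the XmlBufferReader type from the PR.
public sealed class SpanReaderSketch
{
    private readonly byte[] _buffer;
    private int _offset;

    public SpanReaderSketch(byte[] buffer) => _buffer = buffer;

    public int Offset => _offset;

    // Reads one unmanaged value in native byte order and moves past it.
    public T ReadRaw<T>() where T : unmanaged
    {
        int size = Unsafe.SizeOf<T>();
        // AsSpan throws ArgumentOutOfRangeException if fewer than 'size' bytes remain.
        ReadOnlySpan<byte> bytes = _buffer.AsSpan(_offset, size);
        T value = MemoryMarshal.Read<T>(bytes);
        Advance(size); // without this, the next read would return the same bytes again
        return value;
    }

    private void Advance(int count) => _offset += count;
}
```

Keeping the offset update inside the helper means individual `ReadXxx` methods cannot forget it, which is the class of bug the decimal fix addresses.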

This is a follow up on #71478 with the corresponding changes to the reader side.
The main motivation is to add tests and keep the reader and writer somewhat similar, which gave some nice array performance improvements.

Note: The introduction of XmlBinaryNodeType.cs will lead to merge conflicts with #71478, which I can fix once either of these is merged. I think the approach in this PR, with a separate file with hardcoded values, might be better. I updated the other PR with changes identical to the ones here, so it might be possible to merge them without conflict.

Benchmarks

In short, the reader is initialized and then a large number of <a>VALUE</a> elements are read (a large part of each test is spent reading start/end elements and not just the content).
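
For readers unfamiliar with the API, the shape of such a read loop can be sketched roughly as follows. This is not the benchmark source; the `root` wrapper element and the `BinaryXmlRoundTripSketch` type are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Xml;

// Illustrative only: writes <root><a>VALUE</a>...</root> as binary XML and reads it back.
public static class BinaryXmlRoundTripSketch
{
    public static byte[] Write(long[] values)
    {
        using var stream = new MemoryStream();
        using (XmlDictionaryWriter writer =
            XmlDictionaryWriter.CreateBinaryWriter(stream, null, null, ownsStream: false))
        {
            writer.WriteStartElement("root");
            foreach (long value in values)
            {
                writer.WriteStartElement("a");
                writer.WriteValue(value);
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        } // disposing the writer flushes it; the stream stays open

        return stream.ToArray();
    }

    public static long Sum(byte[] binaryXml)
    {
        using XmlDictionaryReader reader =
            XmlDictionaryReader.CreateBinaryReader(binaryXml, XmlDictionaryReaderQuotas.Max);

        long sum = 0;
        reader.ReadStartElement("root");
        while (reader.IsStartElement("a"))
        {
            // Reads the start element, its content and the end element.
            sum += reader.ReadElementContentAsLong();
        }
        reader.ReadEndElement();
        return sum;
    }
}
```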

Source on github

PR

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 5 5600U with Radeon Graphics, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.5.22307.18
  [Host] : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
  span   : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT

Job=span  MaxRelativeError=0.01  BuildConfiguration=span  
IterationTime=250.0000 ms

Method	Source	Mean	Error	StdDev
ReadInt64	MemoryStream	75.00 ns	0.695 ns	0.683 ns
ReadInt16	MemoryStream	75.33 ns	0.659 ns	0.617 ns
ReadDoubleArray	MemoryStream	379.81 ns	3.769 ns	5.977 ns
ReadGuidArray	MemoryStream	1,001.54 ns	9.922 ns	11.426 ns
ReadInt64	Bytes	38.70 ns	0.347 ns	0.371 ns
ReadInt16	Bytes	39.02 ns	0.377 ns	0.564 ns
ReadDoubleArray	Bytes	215.22 ns	2.124 ns	3.937 ns
ReadGuidArray	Bytes	678.15 ns	6.610 ns	7.073 ns

Main

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 5 5600U with Radeon Graphics, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.5.22307.18
  [Host] : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
  Before : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
  main   : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT

MaxRelativeError=0.01  IterationTime=250.0000 ms

Method	Source	Mean	Error	StdDev
ReadInt64	MemoryStream	76.12 ns	0.530 ns	0.521 ns
ReadInt16	MemoryStream	73.53 ns	0.690 ns	0.678 ns
ReadDoubleArray	MemoryStream	5,086.60 ns	41.306 ns	34.493 ns
ReadGuidArray	MemoryStream	22,419.31 ns	224.082 ns	230.116 ns
ReadInt64	Bytes	39.75 ns	0.375 ns	0.432 ns
ReadInt16	Bytes	41.84 ns	0.369 ns	0.308 ns
ReadDoubleArray	Bytes	3,909.31 ns	8.160 ns	6.814 ns
ReadGuidArray	Bytes	15,829.02 ns	31.039 ns	24.233 ns

Daniel-Svensson · 2022-08-03T21:00:06Z

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBufferReader.cs

            {
-                const int chunk = 256;
-                while (length >= chunk)
-                {


I removed the chunked read (at most 256 bytes at a time) for the stream version, since I believe that approach was partly meant to reduce the risk of DoS attacks where large array sizes are specified. However, GetBuffer (actually TryEnsureBytes) now contains mitigations against that.

If you think it is better to keep the chunked read I can add it back, but then maybe the size could be increased to something like 4k or even 8k (to almost read a jumbo frame)?
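
For context, the general idea behind a chunked read can be sketched like this: memory use grows with the bytes actually received rather than with the length claimed by the input. `ChunkedReadSketch` and its 4096-byte default are assumptions for illustration; this is neither the removed code nor the TryEnsureBytes mitigation.

```csharp
using System;
using System.IO;

// Illustrative only: reads 'length' bytes in bounded chunks so that a large
// declared length does not force a large allocation before data arrives.
public static class ChunkedReadSketch
{
    public static byte[] Read(Stream stream, int length, int chunkSize = 4096)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        using var result = new MemoryStream();
        byte[] chunk = new byte[chunkSize];
        int remaining = length;
        while (remaining > 0)
        {
            int read = stream.Read(chunk, 0, Math.Min(remaining, chunk.Length));
            if (read == 0)
            {
                // The stream ended before the declared length was reached.
                throw new EndOfStreamException();
            }
            result.Write(chunk, 0, read);
            remaining -= read;
        }
        return result.ToArray();
    }
}
```

The trade-off is an extra copy per chunk, which is why a larger chunk size (or a single bounded read) can be faster for big arrays.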

Daniel-Svensson · 2022-08-03T21:05:23Z

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBinaryReader.cs

            int actual = Math.Min(count, _arrayCount);
-            for (int i = 0; i < actual; i++)
+            // Try to read in whole array, but don't fail if not possible
+            BufferReader.GetBuffer(actual * ValueHandleLength.DateTime, out _, out _);


See the comment on UnsafeReadArray/ReadRawArrayBytes about chunked reads. If chunked reads are added back, these GetBuffer calls should probably be removed.

Is it better to use the overload with a single out parameter, which throws an exception directly if the file ends prematurely? The current approach keeps the old behaviour, where these "special" types partially fill arrays in case of a premature end-of-file.

danmoseley · 2022-08-10T21:33:05Z

@HongGit who is the right reviewer here also?

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBufferReader.cs

* Add missing Advance to prevent corruption of reader
* Correctly read decimal values

Daniel-Svensson · 2022-08-14T21:31:39Z

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBufferReader.cs

-                BinaryPrimitives.ReadInt32LittleEndian(bytes.Slice(8)),
-                BinaryPrimitives.ReadInt32LittleEndian(bytes.Slice(12))
-            };
-


The parsing changes were tested against the unit tests before adding back the fast path for the LittleEndian case.

Without this fix the method throws an exception and aborts when deserializing a decimal from binary XML.

Note to self:

File a bug that decimal reading got broken; found while looking into the case where the method without an offset is called. Filed as: Reading of Decimal broken for binary XmlDictionaryReader #73934

Note: the decimal constructor can throw a different exception for an invalid decimal than the .NET Framework code does.

If it is important to check for invalid decimals, then maybe it is best to do something similar to .NET Framework here instead (and do it before the decimal ctor): https://referencesource.microsoft.com/#System.Runtime.Serialization/System/Xml/XmlBufferReader.cs,469

I can follow up with such a commit if it is important.

@StephenMolloy I did a separate commit to add the decimal flag validation from net48 but it was not pushed. If you find that check is important and worth the time I can open up a PR. (It is not important to me)
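
As a rough illustration of "read bytes in correct order", the sketch below decodes a decimal from 16 bytes in an endian-independent way. It assumes the wire layout is four little-endian 32-bit fields in the order flags, hi, lo, mid; that order, and the `DecimalWireSketch` type, are assumptions for this example and should be checked against the actual reader and writer.

```csharp
using System;
using System.Buffers.Binary;

// Illustrative only. Assumed wire layout: flags, hi, lo, mid (each a little-endian Int32).
public static class DecimalWireSketch
{
    public static decimal Read(ReadOnlySpan<byte> bytes)
    {
        if (bytes.Length < 16) throw new ArgumentException("Need 16 bytes.", nameof(bytes));

        int flags = BinaryPrimitives.ReadInt32LittleEndian(bytes);
        int hi = BinaryPrimitives.ReadInt32LittleEndian(bytes.Slice(4));
        int lo = BinaryPrimitives.ReadInt32LittleEndian(bytes.Slice(8));
        int mid = BinaryPrimitives.ReadInt32LittleEndian(bytes.Slice(12));

        // decimal(int[]) expects lo, mid, hi, flags and throws
        // ArgumentException if the flags are not a valid decimal representation.
        return new decimal(new[] { lo, mid, hi, flags });
    }

    public static void Write(decimal value, Span<byte> destination)
    {
        if (destination.Length < 16) throw new ArgumentException("Need 16 bytes.", nameof(destination));

        int[] bits = decimal.GetBits(value); // lo, mid, hi, flags
        BinaryPrimitives.WriteInt32LittleEndian(destination, bits[3]);
        BinaryPrimitives.WriteInt32LittleEndian(destination.Slice(4), bits[2]);
        BinaryPrimitives.WriteInt32LittleEndian(destination.Slice(8), bits[0]);
        BinaryPrimitives.WriteInt32LittleEndian(destination.Slice(12), bits[1]);
    }
}
```

Because the constructor validates the flags, invalid input surfaces as an `ArgumentException` here, which is the behavioural difference from .NET Framework mentioned above.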

StephenMolloy

lgtm

uweigand · 2022-08-17T13:52:01Z

This seems to have added tests that are failing on s390x (a big-endian platform):

      <test name="System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.BinaryXml_Array_RoundTrip" type="System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests" method="BinaryXml_Array_RoundTrip" time="0.0221026" result="Fail">
        <failure exception-type="Xunit.Sdk.XunitException">
          <message><![CDATA[Showing first 10 differences\n  Position 1: Expected: 16909060, Actual: 67305985\n  Position 2: Expected: 287454020, Actual: 1144201745\nTotal number of differences: 2 out of 4]]></message>
          <stack-trace><![CDATA[   at System.AssertExtensions.SequenceEqual[Int32](ReadOnlySpan`1 expected, ReadOnlySpan`1 actual) in /home/uweigand/runtime/src/libraries/Common/tests/TestUtilities/System/AssertExtensions.cs:line 476
   at System.AssertExtensions.SequenceEqual[Int32](Span`1 expected, Span`1 actual) in /home/uweigand/runtime/src/libraries/Common/tests/TestUtilities/System/AssertExtensions.cs:line 491
   at System.AssertExtensions.SequenceEqual[Int32](Int32[] expected, Int32[] actual) in /home/uweigand/runtime/src/libraries/Common/tests/TestUtilities/System/AssertExtensions.cs:line 493
   at System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.BinaryXml_Array_RoundTrip() in /home/uweigand/runtime/src/libraries/System.Runtime.Serialization.Xml/tests/XmlDictionaryReaderTests.cs:line 241
   at System.Reflection.MethodInvoker.InterpretedInvoke(Object obj, Span`1 args, BindingFlags invokeAttr) in /home/uweigand/runtime/src/mono/System.Private.CoreLib/src/System/Reflection/MethodInvoker.Mono.cs:line 33]]></stack-trace>
        </failure>
      </test>

and

      <test name="System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.BinaryXml_ReadPrimitiveTypes" type="System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests" method="BinaryXml_ReadPrimitiveTypes" time="0.0561528" result="Fail">
        <failure exception-type="Xunit.Sdk.EqualException">
          <message><![CDATA[Assert.Equal() Failure\nExpected: 1.23456788\nActual:   1.44545137E+11]]></message>
          <stack-trace><![CDATA[   at System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.AssertReadContentFromBinary[Single](Single expected, XmlBinaryNodeType nodeType, ReadOnlySpan`1 bytes) in /home/uweigand/runtime/src/libraries/System.Runtime.Serialization.Xml/tests/XmlDictionaryReaderTests.cs:line 262
   at System.Runtime.Serialization.Xml.Tests.XmlDictionaryReaderTests.BinaryXml_ReadPrimitiveTypes() in /home/uweigand/runtime/src/libraries/System.Runtime.Serialization.Xml/tests/XmlDictionaryReaderTests.cs:line 192
   at System.Reflection.MethodInvoker.InterpretedInvoke(Object obj, Span`1 args, BindingFlags invokeAttr) in /home/uweigand/runtime/src/mono/System.Private.CoreLib/src/System/Reflection/MethodInvoker.Mono.cs:line 33]]></stack-trace>
        </failure>
      </test>

Both of these fail the same way in System.Runtime.Serialization.Xml.Tests and System.Runtime.Serialization.Xml.ReflectionOnly.Tests.

Daniel-Svensson · 2022-08-17T14:12:49Z

The tests can be made to succeed again by partially reverting commit 51b4957, or by skipping the tests on big-endian. It might actually be almost as easy to fix the implementation for big-endian platforms. I think the serialization team must make the decision, since fixing the big-endian implementation would be a breaking change.

If it should not be fixed, then the decimal reading code might need to be changed so that it just calls ReadRawBytes, and the new code for big-endian removed.

uweigand · 2022-08-17T14:52:09Z

Hmm. For the ReadPrimitiveTypes failure the underlying cause is that XmlBufferReader.ReadSingle and ReadDouble do not byte-swap their input (i.e. expect the binary format to be in native byte order). However, ReadInt16 etc. do byte-swap their input (i.e. expect the binary format to be always in little-endian byte order). This seems to have been the same before this PR, so it's not a regression (but probably still a bug).

I think the main question is how the binary serialization format is defined. Should this use native (platform) byte order, or always little-endian byte order? Most binary formats in .NET are always little-endian, to enable portability across platforms. This is probably preferable here as well, but I'm not completely sure what this format is used for ... In any case, mixing this up so that some types use native byte order while others always use little-endian seems a bug to me.

For the Array_RoundTrip failure, I think the problem is that reading the whole array via one large ReadRawArrayBytes doesn't work if the native byte order does not match the byte order in the buffer. You'll have to read and byte-swap each element separately. The current code does this for a few types (DateTime, Guid, TimeSpan), but it really should be done for all types. (This is also not a regression as the old code didn't do it either, but it still seems to be a bug anyway.)
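
A minimal sketch of the per-element approach described above, for Int32 only: a bulk copy when the platform is little-endian and an element-wise byte swap otherwise. `LittleEndianArraySketch` is a hypothetical helper for illustration, not code from the PR.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

// Illustrative only: fills 'destination' from a buffer that is always little-endian.
public static class LittleEndianArraySketch
{
    public static void ReadInt32Array(ReadOnlySpan<byte> source, Span<int> destination)
    {
        int byteCount = destination.Length * sizeof(int);
        if (source.Length < byteCount) throw new ArgumentException("Source too short.", nameof(source));

        if (BitConverter.IsLittleEndian)
        {
            // Native order matches the wire order: one bulk copy is enough.
            source.Slice(0, byteCount).CopyTo(MemoryMarshal.AsBytes(destination));
        }
        else
        {
            // Big-endian platform: each element must be byte-swapped individually.
            for (int i = 0; i < destination.Length; i++)
            {
                destination[i] = BinaryPrimitives.ReadInt32LittleEndian(source.Slice(i * sizeof(int)));
            }
        }
    }
}
```

With the bytes `04 03 02 01` this yields 16909060 (0x01020304) on either kind of platform, whereas a raw copy on a big-endian machine gives 67305985 (0x04030201), which matches the mismatch shown in the failing test.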

jkotas · 2022-08-17T14:58:39Z

> I think the main question is how the binary serialization format is defined. Should this use native (platform) byte order, or always little-endian byte order?

It should be always little-endian byte order.
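
For floating-point primitives, an endian-independent read could look roughly like the sketch below, using the `BinaryPrimitives` helpers available since .NET 5. `LittleEndianPrimitiveSketch` is a hypothetical wrapper for illustration and is not how `XmlBufferReader.ReadSingle`/`ReadDouble` are necessarily implemented.

```csharp
using System;
using System.Buffers.Binary;

// Illustrative only: always interprets the input as little-endian,
// regardless of the byte order of the platform running the code.
public static class LittleEndianPrimitiveSketch
{
    public static float ReadSingle(ReadOnlySpan<byte> bytes) =>
        BinaryPrimitives.ReadSingleLittleEndian(bytes);

    public static double ReadDouble(ReadOnlySpan<byte> bytes) =>
        BinaryPrimitives.ReadDoubleLittleEndian(bytes);
}
```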

Daniel-Svensson · 2022-08-24T10:20:50Z

@uweigand I've asked whether it would be appropriate to update my old existing PR with big-endian support. It would avoid merge conflicts but give slightly more work to reviewers, since it is mostly already reviewed. Maybe you should see if there is a need for an issue to cover big-endian support for binary XML.

uweigand · 2022-08-24T10:53:38Z

@Daniel-Svensson I've opened an issue now covering big-endian support in both reader and writer. Thanks!

StephenMolloy · 2022-08-24T23:27:41Z

I think the performance PRs should remain focused on performance. (We super-appreciate the contributions btw. Our small team is just a little overwhelmed with all of our many responsibilities at the moment, so we're a little slow on some things.)

This big-endian issue should be addressed sooner rather than later. And it sounds like it's been a problem from the start, so we'll probably want to look at it separately so we can track it for servicing. We can use @uweigand's new issue (#74494) to track.

Daniel-Svensson added 15 commits July 26, 2022 08:38

Read primitive types more efficient

472e327

Allow stream to read more than minimum number of required bytes

64ee4ba

Remove pinning and pointer code from array reads

b002d7c

Remove bounds checks from remaing array reads

3b6ece8

remove extra unchecked

3b8e2e8

cleanup code

53d26a7

merge upstream/main

068da74

Start adding tests

110346e

Add tests for binary XmlDictionaryReader

62467b7

cleanup

fad0457

revert back to old byte read

d5ce3a7

add test for arrays using "ref" enumeration

5aee35c

add guid test

929ca23

Try to read while arrays from stream before processing

c689762

Write guid arrays as memory on LittleEndian platforms

bfea922

ghost added the area-Serialization and community-contribution labels Aug 3, 2022

Daniel-Svensson commented Aug 3, 2022

Daniel-Svensson marked this pull request as ready for review August 4, 2022 07:07

StephenMolloy requested review from jkotas and stephentoub August 11, 2022 22:15

danmoseley reviewed Aug 12, 2022

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBufferReader.cs

Daniel-Svensson added 3 commits August 14, 2022 22:04

Merge remote-tracking branch 'upstream/main' into binaryxmlreader

98b53cc

Update tests to be independent of byte order

51b4957

FIx bugs introduced in dotnet#71752

9a2f8bd

* Add missing Advance to prevent corruption of reader
* Correctly read decimal values

Daniel-Svensson commented Aug 14, 2022

This was referenced Aug 15, 2022

Reading of Decimal broken for binary XmlDictionaryReader #73934

Closed

add binary xml version to xml benchmarks dotnet/performance#2557

Merged

StephenMolloy self-assigned this Aug 15, 2022

StephenMolloy approved these changes Aug 15, 2022

StephenMolloy merged commit 5dbdde9 into dotnet:main Aug 15, 2022

Daniel-Svensson mentioned this pull request Aug 24, 2022

Improve Binary Xml (XmlDictionaryWriter) performance #71478

Merged

uweigand mentioned this pull request Aug 24, 2022

Binary Xml format broken on big-endian machines #74494

Closed

ghost locked as resolved and limited conversation to collaborators Sep 24, 2022
