-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Add tests and improve binary XmlDictionaryReader performance #73332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| { | ||
| const int chunk = 256; | ||
| while (length >= chunk) | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the chunked read of at most 256bytes at a time for the stream version since I belive that the approach was partly due to reduce risk of DOS attacks where large array sizes are specified, however GetBuffer (actually TryEnsureBytes) now contains a fix with mitigations against that.
If you think it is better to keep the chunked read I can add it back, but then maybe the size can be increased to something like 4k or even 8k (to almost read a jumbo frame) ?
| int actual = Math.Min(count, _arrayCount); | ||
| for (int i = 0; i < actual; i++) | ||
| // Try to read in whole array, but don't fail if not possible | ||
| BufferReader.GetBuffer(actual * ValueHandleLength.DateTime, out _, out _); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Se comment on UnsafeReadarray/ReadRawArrayBytes about chunked reads, if it should be added back then these GetBuffer calls should probably be removed
- Is it better to use overload with a single out parameter that throws an exception directly if the file ends prematurely ? The current approach keeps the old behaviour where these "special" types will partially fill arrays in case of premature end-of-file
|
@HongGit who is the right reviewer here also? |
src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBufferReader.cs
Show resolved
Hide resolved
* Add missing Advance to prevent corruption of reader * Correctly read decimal values
| BinaryPrimitives.ReadInt32LittleEndian(bytes.Slice(8)), | ||
| BinaryPrimitives.ReadInt32LittleEndian(bytes.Slice(12)) | ||
| }; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to parsing was tested against unit tests before adding back fast path for LittleEndian case.
Without fixing this the method will throw exception and abort when deserializing decimal from binary xml.
Note to self:
- file bug with that the decimal reading got broken when looking up case for when method without offset is called. Reading of Decimal broken for binary XmlDictionaryReader #73934
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: the decimal constructor can throw a different exception for invalid decimal than what happens in net framework code.
If it is important to add checking for invalid decimal then msybe it is best to do something similar to net framework here instead (and do it before decimal ctor) https://referencesource.microsoft.com/#System.Runtime.Serialization/System/Xml/XmlBufferReader.cs,469
I can follow up with such a commit if important,.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@StephenMolloy I did a separate commit to add the decimal flag validation from net48 but it was not pushed. If you find that check is important and worth the time I can open up a PR. (It is not important to me)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
This seems to have added tests that are failing on s390x (a big-endian platform): and Both of these fail the same way in |
The test can be reverted to succeed by partially reverting commit 51b4957 or skip tests on big endian. it might actually be almost as easy to actually fix the implementation for big endian platforms. I think the serialization team must make the decision since it would be a breaking change to fix the big endian implementation. If it should not be fixed then the decimal reading code might need to be changed so it just calls ReadRawBytes and remove the new code for bigendian |
|
Hmm. For the I think the main question is how the binary serialization format is defined. Should this use native (platform) byte order, or always little-endian byte order? Most binary formats in .NET are always little-endian, to enable portability across platforms. This is probably preferable here as well, but I'm not completely sure what this format is used for ... In any case, mixing this up so that some types use native byte order while others always use little-endian seems a bug to me. For the |
It should be always little-endian byte order. |
|
@uweigand I've asked if it would be appropriate to update my old exising PR with big endian support, it would avoid merge conflicts but would give slughtly more work to reviewes since it is mostly already reviewed. Maybe you should see if there is a need for an issue to cover the big endian support for binaryxml. |
@Daniel-Svensson I've opened an issue now covering big-endian support in both reader and writer. Thanks! |
|
I think the performance PR's should remain focused on performance. (We super-appreciate the contributions btw. Our small team just a little overwhelmed with all of our many responsibilities at the moment, so we're a little slow on some things.) This big-endian issue should be addressed sooner rather than later. And it sounds like it's been a problem from the start, so we'll probably want to look at it separately so we can track it for servicing. We can use @uweigand's new issue (#74494) to track. |
Summary:
This is a follow up on #71478 with the corresponding changes to the reader side.
The main motivation is to add tests and keep the reader and writer somewhat similar which gave some nice array performance improvements.
Note: The introduction ofI updated the other PR with identical changes as here so it might be possible to merge them without conflict.XmlBinaryNodeType.cswill lead to merge conflicts with #71478 which I can fix if either of these are merged. I think the approach in this PR with a separate file with hardcoded values might be better.Benchmarks
In short reader is initialized and then a large number of elements
<a>VALUE</a>are read (a a large part of the test is spend reading start/end elements and not just the content)Source on github
PR
Main