-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Update the JIT to support rewriting more complex intrinsics as user calls #102702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the JIT to support rewriting more complex intrinsics as user calls #102702
Conversation
86e1b14
to
39e424c
Compare
CC. @dotnet/jit-contrib, @EgorBo |
Can we run some outerloops? namely outerloop, jitstress and I would also run some R2R job |
/azp run runtime-coreclr outerloop, runtime-coreclr jitstress |
Azure Pipelines successfully started running 2 pipeline(s). |
GenTree* gtInlineOperands[2]; | ||
union | ||
{ | ||
GenTree* gtInlineOperands[2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I follow what gtInlineOperands is (and the logic inside SetMethodHandle does, can you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gtInlineOperands
is an existing field that allows GenTreeMultiOp
to avoid allocating for the common case where there is 2 or less operands and instead allows us to carry the fields within the main allocation instead.
In order to facilitate rewriting the hwintrinsic back to a call, however, we need to be able to track the CORINFO_METHOD_HANDLE
and we don't have enough free space to do that. Adding the field directly ends up making the node larger than TREE_NODE_SZ_LARGE
as well, so to avoid pessimizing the rest of the JIT I made this into a union. SetMethodHandle
will then force an allocation if gtInlineOperands
was being used and set a flag that indicates that gtMethodHandle
is set instead.
It's worth noting the actual gtInlineOperands
field isn't ever read directly either, it's address is just handed down to GenTreeMultiOp
as part of construction or ResetOperandArray
call. So this ends up working fairly well and ensures that the allocation is "pay for play" and only used when actually necessary (we have an intrinsic with 1 or 2 operands that requires a constant and needs to be carried through to a later phase).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-- Notably the reason we don't have enough free space is primarily because of padding bytes caused by inheritance. There's a few places where nodes are wasting 4-7 bytes of space to maintain 8-byte alignment and that repeats several times.
We could avoid tricks like the one being employed here if we had a better mechanism for avoiding such wasted padding for derived node kinds. But that's a much more complex and independent work item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I didn't notice it was a pre-existing field - I though you added it in this PR 🙂
@jakobbotsch PTAL the rationalization changes |
One thing to be careful about - what if some Morph applies some transformation and changes node to be a different intrinsic now? (via e.g. ChangeOper) - will it leave the node in a bad state? |
The same concern would be true for GT_INTRINSIC which has done this same thing since .NET Framework In this case the only transform we can really do is if the input is constant changing it to the actual node produced by |
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…alls (dotnet#102702) * Update the JIT to support rewriting more complex intrinsics as user calls * Updating the shuffle hwintrinsic to always be imported as an intrinsic * Ensure multi-reg returns are initialized for rewritten hwintrinsics * Apply suggestions from code review Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com> * Adding function headers to SetMethodHandle and SetEntryPoint * Apply formatting patch --------- Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
This unblocks #11062 which should in turn unblock #102275
It essentially just fixes rationalization to support non-primitive arguments and return types by doing the necessary handling as part of the rewrite. This allows us to carry intrinsics that require a constant input all the way through the JIT, allowing VN, CSE, Constant Propagation, and other optimizations to still kick in. This in turn means that we can identify many more types of constants and thus allows us to avoid pessimizing as many cases to being calls.
This does change
Shuffle
to be carried through as an intrinsic node, but doesn't yet add the support to transform it to a proper shuffle intrinsic as I plan on doing that in a separate PR if this one is accepted. Once APIs likeShuffle
which expect a constant are handled as an intrinsic we can resolve #11062 and that will unblock our ability to move a large chunk of the xplat implementation (particularly for System.Numerics vector types currently handled via simdashwintrinsic) out of the JIT and into managed code instead.