-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Optimise long multiply + add/sub/neg on arm64. #91886
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsThis PR optimises an extending multiply of two integers out to a long followed by an addition, subtraction or negation operation down to a single The Minor regressions seen here are due to additional
|
|
In the diffs, I see |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems adding GenTreeTernOp can be avoided. In the past, we directly recognized the pattern in codegen e.g. madd, msub. Did you try that approach and faced any problem?
|
@dotnet/jit-contrib @TIHan |
I guess that would be possible, but it feels like it should be done during lowering (I have no overly strong opinion though). My thought for |
|
Rather than We can just introduce an |
|
@tannergooding Thank you for the comment, I have reworked the patch to utilise Updated spmidiff results are here (re-run on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to resolve merge conflict and also fix the test failure:
Assert failure(PID 275 [0x00000113], Thread: 275 [0x0113]): Assertion failed 'compiler->compIsaSupportedDebugOnly(HWIntrinsicInfo::lookupIsa(intrin.id))' in 'System.Collections.Generic.Dictionary`2[System.__Canon,System.__Canon]:TryInsert(System.__Canon,System.__Canon,ubyte):ubyte:this' during 'Generate code' (IL size 668; hash 0x26ab038f; FullOpts)
File: /__w/1/s/src/coreclr/jit/hwintrinsiccodegenarm64.cpp Line: 210
Image: /root/helix/work/correlation/corerun
This patch optimises an extending multiply of two integers out to a long followed by an addition, subtraction or negation operation down to a single [s/u]maddl, [s/u]subl or smnegl instruction, represented as a GT_HWINTRINSIC.
Change-Id: Id0befd1a643620a1f4e0dcb0298f70cee296445d
Change-Id: Iffcd54d6f2a3a25474e31f3a3cbf4072a8efcff8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks correct to me. It should get sign off from someone from @dotnet/jit-contrib before being merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR optimises an extending multiply of two integers out to a long followed by an addition, subtraction or negation operation down to a single
[s/u]maddl,[s/u]sublorsmneglinstruction. It also adds a generic ternary operator which can be used in the future to support other three argument nodes.The
spmidiffresults for this patch can be found at the below gist (performed onwin-arm64sincelinux-arm64is currently broken as per #91257):https://gist.github.com/c272/944721d2d083660e0ba7ec9fddb24de4
Minor regressions seen here are due to additional
movinstructions changing the register allocation in cases where the add value is containable, preventing some previously performedldpoptimisations. Eliminating all cases of containable adds also removes many valid optimisations, so I have currently left it as is.Contributes to #68028