Optimise long multiply + add/sub/neg on arm64. #91886

c272 · 2023-09-11T15:37:47Z

This PR optimises an extending multiply of two integers out to a long followed by an addition, subtraction or negation operation down to a single [s/u]maddl, [s/u]msubl or smnegl instruction. It also adds a generic ternary operator which can be used in the future to support other three-argument nodes.

The spmidiff results for this patch can be found in the gist below (run on win-arm64, since linux-arm64 is currently broken as per #91257):
https://gist.github.com/c272/944721d2d083660e0ba7ec9fddb24de4

Minor regressions seen here are due to additional mov instructions changing the register allocation in cases where the add value is containable, preventing some previously performed ldp optimisations. Eliminating all cases of containable adds also removes many valid optimisations, so I have currently left it as is.

Contributes to #68028
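
For readers unfamiliar with the instructions, the arithmetic shapes being targeted can be sketched in C++ as below. This is only an illustration of the semantics: the function names are hypothetical (chosen to mirror the arm64 mnemonics), the JIT change itself operates on its own IR rather than on C++, and the add/subtract variants assume the final result fits in 64 bits.

```cpp
#include <cstdint>

// Signed widening multiply-add, the value smaddl produces: c + a * b,
// with a and b sign-extended from 32 to 64 bits before the multiply.
int64_t smaddl_shape(int32_t a, int32_t b, int64_t c)
{
    return static_cast<int64_t>(a) * static_cast<int64_t>(b) + c;
}

// Unsigned widening multiply-add, the value umaddl produces: operands are
// zero-extended instead of sign-extended.
uint64_t umaddl_shape(uint32_t a, uint32_t b, uint64_t c)
{
    return static_cast<uint64_t>(a) * static_cast<uint64_t>(b) + c;
}

// Signed widening multiply-subtract, the value smsubl produces: c - a * b.
int64_t smsubl_shape(int32_t a, int32_t b, int64_t c)
{
    return c - static_cast<int64_t>(a) * static_cast<int64_t>(b);
}

// Signed widening multiply-negate, the value smnegl produces: -(a * b).
int64_t smnegl_shape(int32_t a, int32_t b)
{
    return -(static_cast<int64_t>(a) * static_cast<int64_t>(b));
}
```

In each case the widening casts happen before the multiply, which is what distinguishes these shapes from a 32-bit multiply whose result is widened afterwards.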

ghost · 2023-09-11T15:37:59Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	c272
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `community-contribution`
Milestone:	-

kunalspathak · 2023-09-12T12:27:37Z

In the diffs, I see smull+add is combined into umaddl instead of smaddl. Also, can you please fix the build errors?
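
To illustrate why the signedness matters here, the following is a minimal sketch that models the arithmetic of the two instructions. The helper names are hypothetical and the code is not taken from the JIT; it only shows that the signed and unsigned forms disagree as soon as an operand is negative.

```cpp
#include <cstdint>

// Models smaddl: both 32-bit operands are sign-extended to 64 bits.
int64_t model_smaddl(int32_t wn, int32_t wm, int64_t xa)
{
    return xa + static_cast<int64_t>(wn) * static_cast<int64_t>(wm);
}

// Models umaddl on the same register bits: both 32-bit operands are
// zero-extended to 64 bits, so a negative input becomes a large positive one.
int64_t model_umaddl(int32_t wn, int32_t wm, int64_t xa)
{
    uint64_t n = static_cast<uint32_t>(wn);
    uint64_t m = static_cast<uint32_t>(wm);
    return static_cast<int64_t>(static_cast<uint64_t>(xa) + n * m);
}
```

With `wn = -2`, `wm = 3`, `xa = 10`, the signed model gives 4 while the unsigned model gives 12884901892, so substituting umaddl for an smull + add pair changes the result.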

kunalspathak

It seems adding GenTreeTernOp can be avoided. In the past, we directly recognized the pattern in codegen, e.g. madd, msub. Did you try that approach and run into any problems?

src/coreclr/jit/codegenarm64.cpp

kunalspathak · 2023-09-12T13:06:07Z

@dotnet/jit-contrib @TIHan

a74nh · 2023-09-12T13:54:35Z

> It seems adding GenTreeTernOp can be avoided. In the past, we directly recognized the pattern in codegen, e.g. madd, msub. Did you try that approach and run into any problems?

I guess that would be possible, but it feels like it should be done during lowering (I have no overly strong opinion though).

My thought for GenTreeTernOp is that a follow-on patch could move SELECT and CMPXCHG over to use it, which should simplify things a little.

tannergooding · 2023-09-12T15:39:33Z

Rather than GenTreeTernOp, we already have GenTreeHWIntrinsic which supports multi-op and which is really oriented around these types of specialized instructions for both SIMD and Scalar code.

We can just introduce an ArmBase_LongMultiplyAdd or similar for internal use only and have lowering generate that node. The existing metadata table will make this very simple to support and likely require less overall code changes.
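
As a rough illustration of what "have lowering generate that node" means, here is a toy sketch of a lowering-time rewrite that folds a widening multiply plus add into one three-operand node. Everything in it is hypothetical: `Node`, `Op`, `MulAddLong`, `make` and `lowerAdd` are made-up names for a miniature tree and are not the JIT's GenTree, GenTreeHWIntrinsic or lowering APIs. For brevity it only matches the multiply as the left operand of the add.

```cpp
#include <memory>
#include <utility>

// Toy expression tree; illustrative only, not the JIT's IR.
enum class Op { Leaf32, Leaf64, SignExtend, Mul, Add, MulAddLong };

struct Node
{
    Op op;
    std::unique_ptr<Node> a, b, c; // operands; unused ones stay null
};

using NodePtr = std::unique_ptr<Node>;

NodePtr make(Op op, NodePtr a = nullptr, NodePtr b = nullptr, NodePtr c = nullptr)
{
    auto n = std::make_unique<Node>();
    n->op = op;
    n->a  = std::move(a);
    n->b  = std::move(b);
    n->c  = std::move(c);
    return n;
}

// True when n is Mul(SignExtend(x), SignExtend(y)).
bool isSignedWideningMul(const Node* n)
{
    return n != nullptr && n->op == Op::Mul && n->a && n->b &&
           n->a->op == Op::SignExtend && n->b->op == Op::SignExtend;
}

// Rewrites Add(Mul(SignExtend(x), SignExtend(y)), z) into a single
// MulAddLong(x, y, z) node. Any other tree is returned unchanged.
NodePtr lowerAdd(NodePtr add)
{
    if (add->op != Op::Add || !isSignedWideningMul(add->a.get()))
    {
        return add;
    }

    NodePtr mul = std::move(add->a);
    NodePtr x   = std::move(mul->a->a); // operand under the first extend
    NodePtr y   = std::move(mul->b->a); // operand under the second extend
    return make(Op::MulAddLong, std::move(x), std::move(y), std::move(add->b));
}
```

The point of the sketch is only that the pattern is recognised once, during lowering, so code generation sees a single node carrying all three operands.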

c272 · 2023-09-13T15:04:06Z

@tannergooding Thank you for the comment, I have reworked the patch to utilise GT_HWINTRINSIC instead of creating a new ternary node. The updated patch should also resolve the testing failures.

Updated spmidiff results are here (re-run on win-arm64):
https://gist.github.com/c272/5f7275fdff6b829cdc6d002b29f5bb9e

src/coreclr/jit/hwintrinsiccodegenarm64.cpp

kunalspathak

Need to resolve merge conflict and also fix the test failure:

Assert failure(PID 275 [0x00000113], Thread: 275 [0x0113]): Assertion failed 'compiler->compIsaSupportedDebugOnly(HWIntrinsicInfo::lookupIsa(intrin.id))' in 'System.Collections.Generic.Dictionary`2[System.__Canon,System.__Canon]:TryInsert(System.__Canon,System.__Canon,ubyte):ubyte:this' during 'Generate code' (IL size 668; hash 0x26ab038f; FullOpts)

    File: /__w/1/s/src/coreclr/jit/hwintrinsiccodegenarm64.cpp Line: 210
    Image: /root/helix/work/correlation/corerun

src/coreclr/jit/hwintrinsiccodegenarm64.cpp

src/coreclr/jit/lowerarmarch.cpp

tannergooding

This looks correct to me. It should get sign-off from someone from @dotnet/jit-contrib before being merged.

kunalspathak

LGTM

ghost added the community-contribution and area-CodeGen-coreclr labels Sep 11, 2023

kunalspathak reviewed Sep 12, 2023

src/coreclr/jit/codegenarm64.cpp (outdated, resolved)

c272 force-pushed the longmul branch from 0957016 to 547f104 Compare September 13, 2023 15:02

kunalspathak reviewed Sep 17, 2023

src/coreclr/jit/hwintrinsiccodegenarm64.cpp (outdated, resolved)

kunalspathak suggested changes Sep 17, 2023

src/coreclr/jit/hwintrinsiccodegenarm64.cpp (outdated, resolved)

src/coreclr/jit/lowerarmarch.cpp (outdated, resolved)

src/coreclr/jit/lowerarmarch.cpp (outdated, resolved)

src/coreclr/jit/lowerarmarch.cpp (outdated, resolved)

ghost added the needs-author-action label Sep 17, 2023

SingleAccretion reviewed Sep 17, 2023

src/coreclr/jit/lowerarmarch.cpp (outdated, resolved)

Optimise long multiply + add/sub/neg on arm64.

febd345

This patch optimises an extending multiply of two integers out to a long followed by an addition, subtraction or negation operation down to a single [s/u]maddl, [s/u]msubl or smnegl instruction, represented as a GT_HWINTRINSIC.

c272 force-pushed the longmul branch from 1da7e2e to febd345 Compare September 19, 2023 12:14

ghost removed the needs-author-action label Sep 19, 2023

c272 added 2 commits September 19, 2023 13:35

Formatting fixup.

c6bbcf1

Change-Id: Id0befd1a643620a1f4e0dcb0298f70cee296445d

Disable generation on HW intrinsics disabled.

97ffd5b

Change-Id: Iffcd54d6f2a3a25474e31f3a3cbf4072a8efcff8

This was referenced Sep 19, 2023

NuGet failing with Response status code does not indicate success: 503 (Service Unavailable) dotnet/arcade#11723

Open

Tracking issue for CI build timeouts #76454

Closed

tannergooding approved these changes Sep 19, 2023

c272 requested a review from kunalspathak September 20, 2023 12:20

kunalspathak approved these changes Sep 20, 2023

JulieLeeMSFT merged commit 736dabe into dotnet:main Sep 20, 2023

kunalspathak mentioned this pull request Sep 21, 2023

Review the multi-op instruction usage for Arm64 #68028

Closed

29 tasks

c272 deleted the longmul branch September 21, 2023 08:05

jakobbotsch mentioned this pull request Sep 23, 2023

JIT: Invalid smnegl transformation #92537

Closed

BruceForstall mentioned this pull request Sep 25, 2023

Test failure: JIT/opt/Multiply/MultiplyLongOps/MultiplyLongOps.cmd #92571

Closed

EgorBo mentioned this pull request Sep 28, 2023

[Perf] Windows/arm64: 3 Improvements on 9/20/2023 11:58:56 PM dotnet/perf-autofiling-issues#22357

Closed

EgorBo mentioned this pull request Sep 28, 2023

[Perf] Linux/arm64: 3 Improvements on 9/20/2023 11:58:56 PM dotnet/perf-autofiling-issues#22360

Closed

kunalspathak mentioned this pull request Oct 4, 2023

JIT: Fix invalid smnegl transformation #93003

Merged

jakobbotsch mentioned this pull request Oct 10, 2023

Assertion failed 'compiler->compIsaSupportedDebugOnly(HWIntrinsicInfo::lookupIsa(intrin.id))' in 'Runtime_92537:Mul #93299

Closed

ghost locked as resolved and limited conversation to collaborators Oct 21, 2023
