Local heap optimizations on Arm64 #64481
Tagging subscribers to this area: @JulieLeeMSFT
1. When not required to zero the allocated space for local heap (for sizes up to 64 bytes), do not zero.
2. For sizes less than one PAGE_SIZE, and when the size is an encodable offset, use ldp tmpReg, xzr, [sp], #-amount, which probes at [sp] and allocates the space at the same time.
3. Allow non-loop zeroing (i.e. an unrolled sequence) for sizes up to 128 bytes (i.e. up to LCLHEAP_UNROLL_LIMIT).
4. Do such zeroing in ascending order of effective address.
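As a rough illustration of points 1 and 3 in the commit message, the choice of zeroing strategy can be sketched as below. This is a minimal sketch, not the code in codegenarm64.cpp: the names ZeroingKind and chooseZeroing are hypothetical, and the only values taken from the PR text are the name LCLHEAP_UNROLL_LIMIT and its 128-byte value.

```cpp
#include <cassert>
#include <cstdint>

// Value and name taken from the PR text; everything else here is hypothetical.
constexpr int64_t LCLHEAP_UNROLL_LIMIT = 128;

enum class ZeroingKind { None, Unrolled, Loop };

// Decide how a constant-size local heap allocation is zeroed.
// Assumption: mustZero reflects whether the method requires zero-initialized memory.
ZeroingKind chooseZeroing(int64_t size, bool mustZero)
{
    assert(size > 0);
    if (!mustZero)
    {
        return ZeroingKind::None; // point 1: skip zeroing when it is not required
    }
    if (size <= LCLHEAP_UNROLL_LIMIT)
    {
        return ZeroingKind::Unrolled; // point 3: straight-line stores, no loop
    }
    return ZeroingKind::Loop; // larger sizes fall back to a zeroing loop
}
```

For example, chooseZeroing(96, true) selects the unrolled sequence, while chooseZeroing(256, true) falls back to the loop.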
ca29abf to b4ce794
@dotnet/jit-contrib PTAL
src/coreclr/jit/codegenarm64.cpp
Outdated
{
// The following probes the page and allocates the local heap.
// ldp tmpReg, xzr, [sp], #-amount
// Note that behaviour of ldp where two source registers are the same is unpredictable.
Not sure if I follow this comment.
Updated the comment. Is it better now?
coreclr_tests.pmi.Linux.arm64.checked.mch:
libraries_tests.pmi.Linux.arm64.checked.mch:
Did you get a chance to understand these regressions?
…) post-index range and fix an error in src/coreclr/jit/codegenarm64.cpp
Yes, these come from increasing
/azp run runtime-coreclr outerloop
Azure Pipelines successfully started running 1 pipeline(s).
@dotnet/jit-contrib ping
LGTM
Failures in runtime-coreclr outerloop (CoreCLR Pri1 Runtime Tests Run R2R_CG2 Linux arm64 checked) and runtime-coreclr outerloop (CoreCLR Pri1 Runtime Tests Run R2R_CG2 Linux_musl arm64 checked) are due to #64936. The failure in runtime-coreclr outerloop (CoreCLR Pri1 Runtime Tests Run R2R_CG2 windows x86 checked) is due to #61825.
@echesakovMSFT - The asmdiff was hitting an assert. Do you know if this is related to this PR or is it already present in main?
Ah, never mind. Seems to be #64936.
@BruceForstall - how easy is it to surface the "compilation failures" on the Extensions page where we show the diff summary?
We depend on jit-analyze to create diff_summary.md files for each individual asm diff. If it could also spit out a diff_errors.md file then the pipeline could collect those and add them to the overall summary. Seems like it wouldn't be too hard to do that; jit-analyze already parses the output (although I guess asserts maybe go to stderr instead of stdout?), and adding a new
For sizes less than one PAGE_SIZE, use ldr wzr, [sp], #-amount, which probes at [sp] and allocates the space at the same time, or use ldp tmpReg, xzr, [sp], #-amount (when the amount is not encodable by the post-index variant of ldr).
Allow non-loop zeroing (i.e. an unrolled sequence) for sizes up to 128 bytes (i.e. up to LCLHEAP_UNROLL_LIMIT). This has also freed up two internal integer registers for such cases. This will show up as a code size regression.
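 
The choice between the ldr and ldp post-index forms comes down to which immediate can encode the negative stack adjustment. The sketch below illustrates that check under stated assumptions: the helper names and the ProbeInstr enum are hypothetical, pageSize is passed in as a parameter, and the ranges are the architectural ones for LDR (immediate, post-index: signed 9-bit byte offset) and 64-bit LDP (post-index: signed 7-bit offset scaled by 8). It is not the JIT's actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, used only for this illustration.
enum class ProbeInstr { LdrPostIndex, LdpPostIndex, None };

// LDR (immediate, post-index) takes a signed 9-bit byte offset: -256..255.
bool fitsLdrPostIndex(int64_t imm)
{
    return imm >= -256 && imm <= 255;
}

// 64-bit LDP (post-index) takes a signed 7-bit offset scaled by 8: -512..504.
bool fitsLdpPostIndex64(int64_t imm)
{
    return imm >= -512 && imm <= 504 && imm % 8 == 0;
}

// Pick a single instruction that both probes [sp] and moves sp down by amount.
// Assumption: amount is already rounded up to the 16-byte stack alignment.
ProbeInstr chooseProbeInstr(int64_t amount, int64_t pageSize)
{
    assert(amount > 0 && amount % 16 == 0);
    if (amount >= pageSize)
    {
        return ProbeInstr::None; // a single probe at [sp] is not enough
    }
    if (fitsLdrPostIndex(-amount))
    {
        return ProbeInstr::LdrPostIndex; // ldr wzr, [sp], #-amount
    }
    if (fitsLdpPostIndex64(-amount))
    {
        return ProbeInstr::LdpPostIndex; // ldp tmpReg, xzr, [sp], #-amount
    }
    return ProbeInstr::None; // fall back to a separate probe and sub
}
```

With these ranges, an allocation of 256 bytes still fits the ldr form, 272 to 512 bytes need the ldp form, and anything larger needs a different sequence.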
In the last example, the zeroing is done in the order [initialSp-16], [initialSp-96], [initialSp-80], [initialSp-64], [initialSp-48], [initialSp-32]. The idea here is to allow a CPU to detect the sequential memset to 0 and switch into write streaming mode.
benchmarks.run.windows.arm64.checked.mch:
coreclr_tests.pmi.windows.arm64.checked.mch:
libraries.crossgen2.windows.arm64.checked.mch:
libraries.pmi.windows.arm64.checked.mch:
libraries_tests.pmi.windows.arm64.checked.mch:
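The store order quoted in the description for a 96-byte allocation (one store at [initialSp-16], then ascending stores from the new stack pointer) can be reproduced with a small sketch. The function name zeroingOrder is hypothetical, and generalizing the single 96-byte example to other sizes is an assumption, not something the PR states.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the offsets (relative to the initial sp) of each 16-byte zeroing
// store, in the order the stores would be issued.
// Assumption: size is a positive multiple of 16 and each store covers 16 bytes.
std::vector<int64_t> zeroingOrder(int64_t size)
{
    assert(size > 0 && size % 16 == 0);
    std::vector<int64_t> offsets;
    offsets.push_back(-16); // first store touches the slot just below the initial sp
    // Remaining stores walk upward from the lowest address to [initialSp-32].
    for (int64_t off = -size; off <= -32; off += 16)
    {
        offsets.push_back(off);
    }
    return offsets;
}
```

For size 96 this yields -16, -96, -80, -64, -48, -32, matching the order in the description: after the first store, every address is higher than the previous one, which is the sequential pattern the description says a CPU can detect as a memset.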