-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Adding EVEX encoding logic for RR/AM pathways #77419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsThis is a draft PR created for verifying some configs.
|
bf2025a to
7468f73
Compare
|
@tannergooding @BruceForstall @anthonycanino These are the next set of changes for enabling EVEX encoding paths. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some minor questions and suggestions
src/coreclr/jit/instrsxarch.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment for tt in the comment section at the top of the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/coreclr/jit/instr.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // This is used in determining factor N while calculated compressed displacement in EVEX encoding | |
| // This is used in determining factor N while calculating compressed displacement in EVEX encoding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/coreclr/jit/instr.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why the 0xF00 bits are not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops. Yeah that's a mistake. Fixed
src/coreclr/jit/instr.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document attr? (Best would be to add a new-style function header comment that documents all the arguments and the function behavior)
Is this newly required for AVX-512? Does it change any existing behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. It's required now because we divided INS_cvtsi2ss to INS_cvtsi2ss32 and INS_cvtsi2ss64. Similar for INS_cvtsi2sd. This is required because these instruction have some specific characteristic depending on if input is m32 or m64(W bit set or not)
src/coreclr/jit/instr.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be best to ifdef this function declaration so you don't touch the ARM implementation which doesn't use this argument and for which it might be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/coreclr/jit/codegenxarch.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other cases use emitTypeSize(dstType) but this uses srcType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah.. should be srcType . We are trying to differentiate between VCVTSI2SS xmm1, xmm2, m32 and VCVTSI2SS xmm1, xmm2, m64
src/coreclr/jit/emitxarch.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| return ((insTupleTypeInfos[ins] != INS_TT_NONE)); | |
| return (insTupleTypeInfos[ins] != INS_TT_NONE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/coreclr/jit/emitxarch.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| assert((insTupleTypeInfos[ins] != INS_TT_NONE)); | |
| assert(insTupleTypeInfos[ins] != INS_TT_NONE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/coreclr/jit/emitxarch.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits
| // Explore moving IsWEvexOpcodeExtension() logic inside TakesRexWPrefix(). Not doind so currently | |
| // since we cannot differentiate EVEX vs VEX without 'code' untill all paths have EVEX support. | |
| // Explore moving IsWEvexOpcodeExtension() logic inside TakesRexWPrefix(). Not doing so currently | |
| // since we cannot differentiate EVEX vs VEX without 'code' until all paths have EVEX support. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/coreclr/jit/emitxarch.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| // `true` if the instruction does embeddded broadcast. | |
| // `true` if the instruction does embedded broadcast. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
7876962 to
c24f908
Compare
c24f908 to
77aa41e
Compare
77aa41e to
0a1d2b9
Compare
|
No diffs, as expected. Throughput is slightly improved on x64 (about 0.09%), slightly regressed on x86 (minimal on full opts, 0.18% on MinOpts) |
| INS_TT_NONE = 0x00000, | ||
| INS_TT_FULL = 0x00001, | ||
| INS_TT_HALF = 0x00002, | ||
| INS_TT_IS_BROADCAST = 0x00003, | ||
| INS_TT_FULL_MEM = 0x00010, | ||
| INS_TT_TUPLE1_SCALAR = 0x00020, | ||
| INS_TT_TUPLE1_FIXED = 0x00040, | ||
| INS_TT_TUPLE2 = 0x00080, | ||
| INS_TT_TUPLE4 = 0x00100, | ||
| INS_TT_TUPLE8 = 0x00200, | ||
| INS_TT_HALF_MEM = 0x00400, | ||
| INS_TT_QUARTER_MEM = 0x00800, | ||
| INS_TT_EIGHTH_MEM = 0x01000, | ||
| INS_TT_MEM128 = 0x02000, | ||
| INS_TT_MOVDDUP = 0x04000, | ||
| INS_TT_IS_NON_BROADCAST = 0x7FFFC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor nits to consider for later:
- The high zero isn't needed, especially since there aren't any cases which set it. Remove it?
- Thus, the "7" in INS_TT_IS_NON_BROADCAST is not needed
- I was confused at first by INS_TT_IS_BROADCAST being 3 and not 4. Maybe rename INS_TT_IS_BROADCAST to INS_TT_BROADCAST_MASK and INS_TT_IS_NON_BROADCAST to INS_TT_NON_BROADCAST_MASK to make it clear they are bit masks? (I notice they currently aren't used)
This PR enables EVEX encoding for the following paths