Description
Currently, certain hardware intrinsics that take an imm8 argument are replaced by a function call (whose body is usually a large jump table) when the imm8 argument is not a JIT-time constant.
This fallback gives more stable runtime behavior than throwing an exception, but it can cause a significant performance regression, so we should avoid the fallback replacement wherever possible.
For example, the code below is not allowed in C++ (where the immediate operand of such an intrinsic must be a compile-time constant) but is legal in C#.
```csharp
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static sbyte Extract(Vector256<sbyte> value, byte index)
{
    index &= 0x1F;
    if (index > 15)
    {
        return Sse41.Extract(Avx.ExtractVector128(value, 1), (byte)(index - 16));
    }
    else
    {
        return Sse41.Extract(Avx.GetLowerHalf(value), index);
    }
}
```

In the first return statement, Sse41.Extract receives the expression (byte)(index - 16), which is not a constant when the method is viewed in isolation. However, once the function is called with a literal index argument and inlined at the call site, (byte)(index - 16) can become a JIT-time constant.
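To make the folding concrete, here is a minimal sketch that traces only the index arithmetic of Extract above for a literal argument. It is not JIT code and uses no intrinsics; the names ImmFoldingSketch and SplitIndex are hypothetical and exist only for this illustration.

```csharp
using System;

public static class ImmFoldingSketch
{
    // Hypothetical helper: mirrors the index arithmetic of Extract,
    // returning which 128-bit half is selected and the imm8 that
    // would be passed to Sse41.Extract.
    public static (bool upperHalf, byte imm8) SplitIndex(byte index)
    {
        index &= 0x1F;
        return index > 15
            ? (true, (byte)(index - 16)) // upper half: the cast expression
            : (false, index);            // lower half: index used directly
    }

    public static void Main()
    {
        // With a literal argument every step has a known value:
        // 20 & 0x1F == 20, 20 > 15 is true, (byte)(20 - 16) == 4.
        var (upperHalf, imm8) = SplitIndex(20);
        Console.WriteLine($"{upperHalf} {imm8}"); // prints "True 4"
    }
}
```

For a call such as Extract(value, 20), the same reasoning suggests that after inlining and constant folding the imm8 operand could be the constant 4, so in principle no jump-table fallback would be needed.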
The problem is that we currently check whether the imm8 argument is a constant in the importer, which is too early in some situations (e.g., when the argument is a cast expression).
In this example, (byte)(index - 16) is not a constant in the importer, but the expression can end up as a constant in the RyuJIT backend. If we attempted the expansion again after the mid-end optimizations (e.g., CSE, conditional constant propagation, integer-promotion elimination) instead of committing to the fallback in the importer, the code quality (CQ) of the imm intrinsics would be much better.
cc @CarolEidt @AndyAyersMS @mikedn @tannergooding
category:cq
theme:hardware-intrinsics
skill-level:expert
cost:medium