Description
I'm working on a comparison optimization for the F# compiler to implement a branchless compare (dotnet/fsharp#13098).
The following implementation improves average performance, using only `cgt` minus `clt`:
However, I noticed that the same `cmp ecx, edx` instruction is emitted twice. The second one is unnecessary, because `setg` and `movzx` don't modify the flags.
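As a rough illustration of the `cgt - clt` idea (a minimal sketch in C, not the F# code from this issue), a three-way compare can be written as the difference of two comparisons. The name `compare_branchless` is hypothetical and used only for this example.

```c
/* Hypothetical helper, for illustration only: a three-way compare that
 * returns -1, 0, or 1 without an explicit branch in the source. */
static int compare_branchless(int a, int b)
{
    /* In C each relational expression evaluates to 0 or 1, which mirrors
     * what the IL `cgt` and `clt` instructions push on the stack. */
    int greater = (a > b); /* analogous to cgt */
    int less    = (a < b); /* analogous to clt */
    return greater - less;
}
```

Both comparisons take the same two operands, so in principle a single compare instruction could feed both results.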
I also thought that the following version would be even shorter:
but it's actually longer. The result of each `cgt`/`clt` is zero-extended with `movzx` even though only the low byte is used for the subtraction, and the difference is then sign-extended to 32 bits.
The resulting code could be:
If someone is willing to guide me through this kind of JIT optimization, I'd happily implement it.
category:cq
theme:codegen
skill-level:intermediate
cost:medium
impact:medium




