🐛 Describe the bug
In vec_base.h, the fmsub function is defined to compute a * b - c. However, the specialization in vec128_float_neon.h calls vfmsq_f32(c, a, b), which, according to the Arm intrinsics reference, computes c - a * b, so the result has the opposite sign.
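For illustration, a minimal standalone NEON snippet (outside PyTorch; assumes an AArch64 toolchain with arm_neon.h available) showing the argument order and sign convention of vfmsq_f32:

#include <arm_neon.h>
#include <cstdio>

int main() {
    float32x4_t a = vdupq_n_f32(2.0f);
    float32x4_t b = vdupq_n_f32(3.0f);
    float32x4_t c = vdupq_n_f32(1.0f);

    // vfmsq_f32(c, a, b) computes c - a * b as a fused operation,
    // whereas fmsub in vec_base.h is expected to compute a * b - c.
    float32x4_t r = vfmsq_f32(c, a, b);

    float out[4];
    vst1q_f32(out, r);
    printf("%f\n", out[0]); // prints -5 (= 1 - 2 * 3); fmsub should give 5
    return 0;
}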
Below is the testing environment:
# lscpu
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: HiSilicon
BIOS Vendor ID: QEMU
Model name: Kunpeng-920
BIOS Model name: virt-rhel8.2.0 CPU @ 2.0GHz
BIOS CPU family: 1
Model: 0
Thread(s) per core: 1
Core(s) per socket: 16
Socket(s): 1
Stepping: 0x1
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
The output of running ./build/bin/vec_test_all_types_DEFAULT:
[ RUN ] BitwiseFloatsAdditional/0.Fmsub
/root/pytorch/aten/src/ATen/test/vec_test_all_types.h:880: Failure
Expected equality of these values:
nearlyEqual<UVT>(expArr[i], actArr[i], absErr)
Which is: false
true
24127.314453125!=-24127.314453125
Failure Details:
fmsub "/root/pytorch/aten/src/ATen/test/vec_test_all_types.cpp":940
Test Seed to reproduce: 1742184039034921432
Arguments:
# vec[-108.478317, 430.048676, 439.19342, 111.896461]
# vec[443.884338, 151.219467, -189.899826, -492.905579]
Expected:
# vec[24127.3145, 257714.359, -55222.8984, 15263.2383]
Actual:
# vec[-24127.3145, -257714.359, 55222.8945, -15263.2383]
First mismatch Index: 0
A fix is to modify the NEON implementation in vec128_float_neon.h as follows:
template <>
Vectorized<float> inline fmsub(const Vectorized<float>& a, const Vectorized<float>& b, const Vectorized<float>& c) {
  // vfmsq_f32(c, a, b) evaluates to c - a * b; negating the result yields a * b - c.
  return Vectorized<float>(vnegq_f32(vfmsq_f32(c, a, b)));
}
The output of running ./build/bin/vec_test_all_types_DEFAULT after the modification:
[----------] 6 tests from BitwiseFloatsAdditional/0, where TypeParam = at::vec::DEFAULT::Vectorized<float>
[ RUN ] BitwiseFloatsAdditional/0.ZeroMask
[ OK ] BitwiseFloatsAdditional/0.ZeroMask (0 ms)
[ RUN ] BitwiseFloatsAdditional/0.Convert
[ OK ] BitwiseFloatsAdditional/0.Convert (0 ms)
[ RUN ] BitwiseFloatsAdditional/0.Fmadd
[ OK ] BitwiseFloatsAdditional/0.Fmadd (78 ms)
[ RUN ] BitwiseFloatsAdditional/0.Fmsub
[ OK ] BitwiseFloatsAdditional/0.Fmsub (78 ms)
[ RUN ] BitwiseFloatsAdditional/0.FmaddVecN
[ OK ] BitwiseFloatsAdditional/0.FmaddVecN (79 ms)
[ RUN ] BitwiseFloatsAdditional/0.Blendv
[ OK ] BitwiseFloatsAdditional/0.Blendv (0 ms)
[----------] 6 tests from BitwiseFloatsAdditional/0 (236 ms total)
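For reference, a mathematically equivalent formulation (a sketch only, not necessarily the patch that should land upstream) negates the accumulator instead of the result, since a * b - c == (-c) + a * b:

template <>
Vectorized<float> inline fmsub(const Vectorized<float>& a, const Vectorized<float>& b, const Vectorized<float>& c) {
  // vfmaq_f32(x, a, b) computes x + a * b, so passing -c as the accumulator yields a * b - c.
  return Vectorized<float>(vfmaq_f32(vnegq_f32(c), a, b));
}

Both variants use one negation plus one fused multiply op; since negation is exact, they produce identical results.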
Versions
main
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @malfet @snadampal @milpuz01 @aditew01 @nikhil-arm @fadara01