Description
Name and Version
```
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA A100-PCIE-40GB, compute capability 8.0, VMM: yes
register_backend: registered backend CUDA (1 devices)
register_device: registered device CUDA0 (NVIDIA A100-PCIE-40GB)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz)
version: 4667 (d2fe216)
built with gcc (GCC) 12.2.0 for x86_64-pc-linux-gnu
```
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
Test code
Command line
`./bin/test-backend-ops`
Problem description & steps to reproduce
A test failure was encountered while running the MUL_MAT tests through `test-backend-ops`.
- The failing MUL_MAT configuration was identified as
`MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3])`
(the test case is created here).
- Failures seemed random: consecutive runs of `test-backend-ops` did not reproduce the error. Adding this MUL_MAT test case 1000 times in `test-backend-ops.cpp` reproduced the failure consistently (at least a few of the 1000 cases would fail).
```cpp
// Example: add the failing MUL_MAT case 1000 times to the list of test cases
for (int i = 0; i < 1000; i++) {
    test_cases.emplace_back(new test_mul_mat(GGML_TYPE_Q5_1, GGML_TYPE_F32, 16, 1, 256, {1, 1}, {1, 1}));
}
```
- The test fails because the NMSE exceeds the maximum error threshold.
- Example error output:
```
MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.000508874 > 0.000500000
0 0.948417 1.035245, diff = -0.086828
1 -2.924956 -2.844111, diff = -0.080845
2 -1.777758 -1.695090, diff = -0.082667
3 0.450649 0.537106, diff = -0.086457
4 -4.114096 -4.030904, diff = -0.083191
5 -0.682358 -0.596930, diff = -0.085428
6 -8.252451 -8.167437, diff = -0.085014
7 -0.692235 -0.606851, diff = -0.085384
8 -5.382234 -5.304606, diff = -0.077628
9 3.467584 3.552903, diff = -0.085320
10 -7.941753 -7.861615, diff = -0.080138
11 3.101702 3.186424, diff = -0.084722
12 0.954475 1.037351, diff = -0.082876
13 2.353770 2.437956, diff = -0.084186
14 -1.223359 -1.139174, diff = -0.084185
15 0.853322 0.939753, diff = -0.086431
```
- The NVIDIA (CUDA) backend appears to convert `src1` to `Q8_1` and then run the matrix multiplication with `Q5_1` and `Q8_1` inputs. Could this be the cause of the precision loss?
- The largest NMSE encountered over 20000 runs was 0.001409.
- Is this degree of precision loss expected? The maximum error threshold for the MUL_MAT tests is set to `5e-4`. Should it be modified?
First Bad Commit
Because the failure is sporadic, the first bad commit has not been identified: d2fe216 is simply the commit on which the failure was first encountered. The latest commit tested on which the error still reproduces is 4806498.
Relevant log output
Identical to the example error output above.