Use fused multiply add to improve logarithms of bigints · Issue #140443 · python/cpython

Use fused multiply add to improve logarithms of bigints #140443

@tim-one

Description


Feature or enhancement

Proposal:

This line from mathmodule.c's loghelper():

            result = func(x) + func(2.0) * e;

is begging to be rewritten to use a HW fma instruction. There isn't massive cancellation here, where fma can "do magic", just the ordinary benefit of suffering one rounding error instead of two. A quick Python prototype suggests it would do real good.

Here are summaries of two runs.

  • Testing log10() on "random" bigints of up to 100 million decimal digits.
  • Both runs use the same inputs.
  • The result of mpmath.log10(), with prec=100, is taken to be correct.
  • A line starting with, say, 0.4, records the number of cases where the computed result was <= 0.4 ULP too large, but greater than 0.3 ULP too large.
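For concreteness, a signed ULP error of the kind bucketed below can be computed like this (a sketch; the run above used `mpmath.log10` with `prec=100` as the reference, which isn't reproduced here):

```python
import math

def ulp_error(computed, exact):
    """Signed error of `computed`, in units of its own last place.

    `exact` should be a higher-precision reference value (the runs
    above used mpmath with prec=100); a float stands in here.
    """
    return (computed - exact) / math.ulp(computed)
```

A value that is one representable step above the reference scores exactly 1.0 ULP.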
Before (CPython 3.14):
-0.8546426777998732 ULP to 0.9945997134441882 ULP
-0.8    9  0.2% **
-0.7   52  1.0% *******
-0.6  118  2.4% **************
-0.5  233  4.7% ****************************
-0.4  261  5.2% *******************************
-0.3  299  6.0% ***********************************
-0.2  318  6.4% **************************************
-0.1  348  7.0% *****************************************
 0.0  480  9.6% *********************************************************
 0.1  514 10.3% ************************************************************
 0.2  506 10.1% ************************************************************
 0.3  419  8.4% *************************************************
 0.4  396  7.9% ***********************************************
 0.5  326  6.5% ***************************************
 0.6  267  5.3% ********************************
 0.7  171  3.4% ********************
 0.8  132  2.6% ****************
 0.9  118  2.4% **************
 1.0   33  0.7% ****

After:
-0.454879880773845 ULP to 0.5800814567479762 ULP
-0.4  234  4.7% **************************
-0.3  470  9.4% ***************************************************
-0.2  450  9.0% *************************************************
-0.1  466  9.3% ***************************************************
 0.0  513 10.3% ********************************************************
 0.1  514 10.3% ********************************************************
 0.2  515 10.3% ********************************************************
 0.3  471  9.4% ***************************************************
 0.4  514 10.3% ********************************************************
 0.5  559 11.2% ************************************************************
 0.6  294  5.9% ********************************

So max errors (in both directions) were cut, and the error distribution got significantly tighter.

Note that, either way, our results tend to be "too large". I believe that's because, on my Windows box, while log10(2) returns the best possible (representable) result, it's nevertheless larger than the infinitely precise value, and multiplying by e amplifies that.
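That claim is easy to check with the `decimal` module (a sketch; whether the difference comes out positive depends on the platform's `log10`, though per the observation above a correctly rounded one lands just above the true value):

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50  # plenty of guard digits

exact = Decimal(2).log10()            # log10(2) to 50 significant digits
as_double = Decimal(math.log10(2.0))  # the double, converted exactly

# Tiny either way; positive when the platform's log10 rounds correctly.
print(as_double - exact)
```

Either way the magnitude is bounded by about half an ULP of 0.301, i.e. well under 1e-16.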

This is a very easy one if someone wants to beat me to it, although, ya, testing can consume as much of your future life as you allow it to 😉.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response


Labels

easy
interpreter-core (Objects, Python, Grammar, and Parser dirs)
performance (Performance or resource usage)
type-refactor (Code refactoring, with no changes in behavior)
