KEMBAR78
gh-136006: fix `Py_NAN` expansion on Solaris systems by picnixz · Pull Request #136575 · python/cpython · GitHub
Skip to content

Conversation

@picnixz
Copy link
Member

@picnixz picnixz commented Jul 12, 2025

@vstinner do you have access to some Solaris machine to test this?

@picnixz picnixz force-pushed the fix/solaris/nan-expansion-136006 branch from 2fbf141 to 85cffe2 Compare July 12, 2025 11:48
@vstinner
Copy link
Member

@vstinner do you have access to some Solaris machine to test this?

No, I don't have access to Solaris.

@picnixz
Copy link
Member Author

picnixz commented Sep 9, 2025

@kulikjak I think you're the Solaris expert here but could you confirm that this is the correct way to do it on Solaris?

@kulikjak
Copy link
Contributor

kulikjak commented Sep 10, 2025

Uh, sorry - I missed this whole issue. Let me go through it.

What I can say right now is that I am not seeing this issue on our side, but we are not building Python 3 on Solaris 10, so it wouldn't surprise me if there were some build issues.

And also, there is a Solaris buildbot worker. I don't have a permission to start it, but you probably can (!buildbot Solaris apparently does the trick). The test suite isn't clean yet, but the build is, and that's better than nothing ;)

@picnixz
Copy link
Member Author

picnixz commented Sep 10, 2025

!buildbot Solaris

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @picnixz for commit 85cffe2 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136575%2Fmerge

The command will test the builders whose names match following regular expression: Solaris

The builders matched are:

  • SPARCv9 Oracle Solaris 11.4 PR

@kulikjak
Copy link
Contributor

So I tested the following on both the latest Solaris 10 (with gcc 7) and 11.4 (gcc 14):

#include <stdio.h>
#include <math.h>

#define Py_NAN_A ((double)__builtin_nanf(""))
#define Py_NAN_B ((double)NAN)

int main(int argc, char const *argv[]) {
	printf("%f\n", Py_NAN_A - 0.0);
	printf("%f\n", Py_NAN_B - 0.0);
}

and the result on both is the same. The code happily compiles, and both Py_NAN_A and Py_NAN_B are expanded to the same thing:

> /opt/gcc7/bin/gcc -E test.c
.....
# 7 "test.c"
int main(int argc, char const *argv[]) {
 printf("%f\n", ((double)__builtin_nanf("")) - 0.0);
 printf("%f\n", ((double)
# 9 "test.c" 3 4
               (__builtin_nanf(""))
# 9 "test.c"
               ) - 0.0);
}

So this PR is correct, but seems unnecessary in our case. I wonder if this changed during Solaris 10 lifetime, though that is IMHO unlikely. Maybe it's the even newer gcc (or some defines from Python itself that change the behavior)?

@picnixz
Copy link
Member Author

picnixz commented Sep 10, 2025

Thank you! Can you check that strtod("NAN", NULL) works as well? I don't remember where I read this, but this should be the equivalent expression for the built-in.

@kulikjak
Copy link
Contributor

Yes, I tested it and that works as expected on both 10 and 11 :)

$ cat test.c
#include <stdio.h>
#include <math.h>

void main(int argc, char const *argv[]) {
        fprintf(stderr, "%f\n", strtod("NAN", NULL));
}
$ /opt/gcc7/bin/gcc test.c
$ ./a.out
NaN

@kulikjak
Copy link
Contributor

And the buildbot seems as happy as currently possible.

@picnixz picnixz merged commit d54b109 into python:main Sep 10, 2025
41 of 42 checks passed
@picnixz picnixz deleted the fix/solaris/nan-expansion-136006 branch September 10, 2025 09:11
@picnixz picnixz added needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels Sep 10, 2025
@miss-islington-app
Copy link

Thanks @picnixz for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Thanks @picnixz for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry, @picnixz, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker d54b1091d43b9d8f0da0ba081565bccca3f138fd 3.13

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 10, 2025
…136575)

On Solaris, `Py_NAN` may expand as a function address instead of a floating-point number.
This amends commit 7a3b035.
(cherry picked from commit d54b109)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
@bedevere-app
Copy link

bedevere-app bot commented Sep 10, 2025

GH-138733 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Sep 10, 2025
picnixz added a commit to picnixz/cpython that referenced this pull request Sep 10, 2025
…thonGH-136575)

On Solaris, `Py_NAN` may expand as a function address instead of a floating-point number.
This amends commit 7a3b035.
(cherry picked from commit d54b109)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
@bedevere-app
Copy link

bedevere-app bot commented Sep 10, 2025

GH-138734 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Sep 10, 2025
@serhiy-storchaka
Copy link
Member

But was this change needed? Can we apply it only on Solaris 10? OpenIndiana also does not need this.

@picnixz, you need to run the tests on the Solaris buildbot without this change to check that they were needed.

@picnixz
Copy link
Member Author

picnixz commented Sep 10, 2025

But was this change needed? Can we apply it only on Solaris 10? OpenIndiana also does not need this.

Well, the issue is that it appears non-reproducible for Solaris 10 and 11 (the bot uses Solaris 11 but the bug seems to trigger only for Solaris 10 and not consistently), but the bug somehow exists (and probably due to gcc exact versions) as someone had it. It's documented in https://web.archive.org/web/20240806172752/https://www.gnu.org/software/gnulib/manual/html_node/math_002eh.html.

Now, if it's possible to check for exact solaris version I would be happy but I don't know how to do it :(

@vstinner
Copy link
Member

Now, if it's possible to check for exact solaris version I would be happy but I don't know how to do it :(

Python has the Py_SUNOS_VERSION macro. Example:

#if defined(_AIX) || (defined(__sun) && defined(__SVR4) && Py_SUNOS_VERSION <= 510)
/* issue #18259, sethostname is not declared in any useful header file on AIX
 * the same is true for Solaris 10 */

@picnixz
Copy link
Member Author

picnixz commented Sep 10, 2025

I'll take care of this tomorrow then so that we only configure this for Solaris 10.

@serhiy-storchaka
Copy link
Member

If we cannot reproduce this, we cannot check that the new definition works on platforms where the issue is reproducible. So, it is either unneeded or untested.

@kulikjak
Copy link
Contributor

The original reporter mentioned that they are on Solaris 10, but I tried it on it and I cannot reproduce it. It might be the gcc version difference, or maybe the behavior changed during Solaris 10 lifetime (though this seems like quite a big change, so I don't think that's it)?

I'd really like more info about that platform, because I cannot reproduce #136604 either.

@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot ARM Raspbian Linux Asan 3.x (no tier) has failed when building commit d54b109.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1811/builds/211) and take a look at the build logs.
  4. Check if the failure is related to this commit (d54b109) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1811/builds/211

Summary of the results of the build (if available):

Click to see traceback logs
remote: Enumerating objects: 9, done.        
remote: Counting objects:  16% (1/6)        
remote: Counting objects:  33% (2/6)        
remote: Counting objects:  50% (3/6)        
remote: Counting objects:  66% (4/6)        
remote: Counting objects:  83% (5/6)        
remote: Counting objects: 100% (6/6)        
remote: Counting objects: 100% (6/6), done.        
remote: Compressing objects:  25% (1/4)        
remote: Compressing objects:  50% (2/4)        
remote: Compressing objects:  75% (3/4)        
remote: Compressing objects: 100% (4/4)        
remote: Compressing objects: 100% (4/4), done.        
remote: Total 9 (delta 2), reused 2 (delta 2), pack-reused 3 (from 2)        
From https://github.com/python/cpython
 * branch                    main       -> FETCH_HEAD
Note: switching to 'd54b1091d43b9d8f0da0ba081565bccca3f138fd'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at d54b1091d43 gh-136006: fix `Py_NAN` expansion on Solaris systems (#136575)
Switched to and reset branch 'main'

configure: WARNING: no system libmpdec found; falling back to pure-Python version for the decimal module

In file included from ./Include/internal/pycore_dict.h:11,
                 from Objects/typeobject.c:7:
In function ‘Py_DECREF_MORTAL’,
    inlined from ‘PyStackRef_XCLOSE’ at ./Include/internal/pycore_stackref.h:730:9,
    inlined from ‘_PyThreadState_PopCStackRef’ at ./Include/internal/pycore_stackref.h:810:5,
    inlined from ‘vectorcall_maybe’ at Objects/typeobject.c:3108:9:
./Include/internal/pycore_object.h:481:8: warning: array subscript 0 is outside array bounds of ‘PyObject[0]’ {aka ‘struct _object[]’} [-Warray-bounds]
  481 |     if (--op->ob_refcnt == 0) {
      |        ^

Timeout (0:05:00)!
Thread 0x0000007f787ef100 [Thread-2] (most recent call first):
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/subprocess.py"Segmentation fault
make: *** [Makefile:2494: buildbottest] Error 139

Cannot open file '/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/test-results.xml' for upload

Yhg1s pushed a commit that referenced this pull request Sep 18, 2025
… (#138734)

On Solaris, `Py_NAN` may expand as a function address instead of a floating-point number.
This amends commit 7a3b035.
(cherry picked from commit d54b109)
@kulikjak
Copy link
Contributor

@TCH68k found out what the difference between our setup is (#136604 (comment)) - I have access to the latest Oracle Solaris 10 (U11), and they have the last Sun Solaris 10 (U8). And in case of shm_open/shm_unlink, there were indeed header changes between those two versions.

It's possible that there are similar differences in this case as well.

epbennetts added a commit to epbennetts/cpython that referenced this pull request Sep 22, 2025
hugovk pushed a commit that referenced this pull request Sep 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants