This bit of defensive programming is costly. Removing the volatile declaration reduces the running time of fsum() for a hundred floats from 2.26 usec per loop to 1.42 usec per loop.
I'm thinking the x87 issues have mostly faded. If we do need to keep this, can it be wrapped in an #ifdef so that we don't have everyone paying for a problem that almost nobody has?
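For illustration, here is a minimal sketch of what such a guard could look like. It is not the actual fsum() code: the macro name FSUM_VOLATILE and the helper two_sum() are hypothetical, and using the C99 FLT_EVAL_METHOD macro from <float.h> as the test is an assumption (a configure-time check for x87 double rounding would work the same way). The idea is that volatile is kept only where intermediates might be held in extended precision.

```c
#include <float.h>

/* Hypothetical macro name, used here for illustration only.
 * Assumption: FLT_EVAL_METHOD == 0 means double expressions are evaluated
 * in double precision, so no x87-style extended intermediates can occur. */
#if defined(FLT_EVAL_METHOD) && FLT_EVAL_METHOD == 0
#  define FSUM_VOLATILE            /* no extended precision: no cost paid */
#else
#  define FSUM_VOLATILE volatile   /* possible extended precision: force stores */
#endif

/* Hypothetical helper: one error-free addition step of the kind used in
 * exact summation. Returns hi = fl(x + y) and stores the rounding error
 * in *lo. Requires |x| >= |y|. */
static double two_sum(double x, double y, double *lo)
{
    FSUM_VOLATILE double hi, yr;

    hi = x + y;      /* rounded sum */
    yr = hi - x;     /* the part of y that made it into hi */
    *lo = y - yr;    /* the part of y that was rounded away */
    return hi;
}
```

On builds where the first branch is taken, the locals are plain doubles and the compiler can keep them in registers. Builds that may use extended precision keep the current behavior.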