Decouple Atomic from host on MSVC #43
Conversation
Looks good overall pending the first review comment; please ping me when you un-WIP this.
```cpp
    return __cxx_atomic_alignment_unwrap(detail::__atomic_load_n_cuda(&__a->__a_value, __order, detail::__scope_tag<_Sco>()));
#else
    return __cxx_atomic_alignment_unwrap(::std::atomic_load_explicit(&__a->__a_value, (::std::memory_order)__order));
    alignas(_Tp) unsigned char __buf[sizeof(_Tp)];
```
@griwes it looks like I tore off another alignment unwrap here.
Are these char* output buffers a code smell? They were added to fix initialization warnings.
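For context, here is a minimal sketch of the pattern in question (the names and plumbing are illustrative, not the actual libcudacxx internals): the generic `__atomic_load` builtin writes its result into caller-provided storage, so an aligned byte buffer sidesteps default-initializing a `_Tp` just to overwrite it, which is where the initialization warnings came from.

```cpp
// Hypothetical standalone version of the aligned-buffer load; the real
// code sits inside the __cxx_atomic_* layer.
template <class _Tp>
_Tp __load_via_buffer(_Tp* __ptr, int __order) {
    // Uninitialized, suitably aligned byte storage: no _Tp default
    // constructor runs, so no "may be used uninitialized" warning.
    alignas(_Tp) unsigned char __buf[sizeof(_Tp)];
    auto* __dest = reinterpret_cast<_Tp*>(__buf);
    // The generic builtin stores the loaded value into *__dest.
    __atomic_load(__ptr, __dest, __order);
    return *__dest;
}
```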
Why not use __atomic_load_n here instead?
__atomic_load_n does not work for non-integral non-pointer types: https://gcc.godbolt.org/z/osrcs1
Maybe I'm mistaken in some way about its usage?
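A minimal illustration of the limitation (hypothetical types, mirroring the godbolt link): the `_n` variants of the GCC atomic builtins accept only integral and pointer types, while the generic `__atomic_load` takes an output pointer and works for any trivially copyable type of a supported size.

```cpp
struct Pair { int first; int second; };  // trivially copyable, 8 bytes

Pair load_pair(Pair* p) {
    // Does not compile: __atomic_load_n requires an integral or pointer type.
    // Pair v = __atomic_load_n(p, __ATOMIC_SEQ_CST);

    // The generic form works: the result is written through the second argument.
    Pair v;
    __atomic_load(p, &v, __ATOMIC_SEQ_CST);
    return v;
}
```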
Force-pushed 149fcee to 264a9f2.
Testing on a GV100 looks good: no failures, with 1 unsupported test. I'll kick off a CI run now. I've replaced the […]; I'm unsure whether that is the correct thing to do, however.
```cpp
    return detail::__atomic_fetch_add_cuda(&__a->__a_value, __delta, __order, detail::__scope_tag<_Sco>());
#else
    return ::std::atomic_fetch_add_explicit(&__a->__a_value, __delta, (::std::memory_order)__order);
    return __atomic_fetch_add(&__a->__a_value, __delta * __skip_amt<_Tp*>::value, __order);
```
It doesn't seem right to me that we should need the skip amount in this layer. The layer below should be doing that.
Basically, every deviation between the CUDA_ARCH side and this side looks like a bug to me.
> It doesn't seem right to me that we should need the skip amount in this layer. The layer below should be doing that.

I'd agree, but there is no layer below at the compiler-intrinsic level for GCC; at that point, incrementing by sizeof(_Tp) is necessary (see the sketch below). https://github.com/NVIDIA/libcudacxx/blob/main/libcxx/include/atomic#L846

> Basically, every deviation between the CUDA_ARCH side and this side looks like a bug to me.

@griwes, @jrhemstad, and I had a meeting today about how we could resolve some of this with a better platform-layering framework. There are some neat ideas on the table for making this nesting-doll arrangement a bit cleaner.

It would help to know ahead of time which things are being done wrong.
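To make the skip-amount point concrete, here is a sketch under the assumption above (illustrative names, not the library's code): GCC's raw `__atomic_fetch_add` on a pointer advances it by a number of bytes, unlike `std::atomic<T*>::fetch_add`, which advances by elements, so the byte-level layer has to scale by `sizeof(T)` itself.

```cpp
#include <cstddef>

// Hypothetical wrapper showing the scaling that __skip_amt encodes.
template <class T>
T* fetch_add_elements(T** slot, std::ptrdiff_t delta) {
    // std::atomic<T*>::fetch_add(delta) would move delta *elements*;
    // the raw builtin moves delta *bytes*, hence the sizeof(T) factor.
    return __atomic_fetch_add(slot, delta * static_cast<std::ptrdiff_t>(sizeof(T)),
                              __ATOMIC_SEQ_CST);
}
```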
I think this looks good.
…/sub; make msvc sfinae a tad less repetitive
…clude to <atomic>
Force-pushed 264a9f2 to 7d802a6.
Atomic decouple builds cleanly on CI. SC: 29322243.2