CUDA official GCC conflicts #25054
conflicts('%gcc@11:', when='+cuda ^cuda@:11.1.0 target=x86_64:')
conflicts('%gcc@:4', when='+cuda ^cuda@11.0.0: target=x86_64:')
conflicts('%gcc@:5', when='+cuda ^cuda@11.4.0: target=x86_64:')
conflicts('%gcc@10:', when='+cuda ^cuda@11.0.0: target=x86_64:')
This would also mean that newer versions of cuda would conflict with newer gcc versions, so it's better to have a lower bound on gcc + an upper bound on cuda or the other way around.
We should just check more carefully that, whenever a new CUDA minor version is released, we actually bump the upper bound for CUDA in the conflict rule.
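To make the difference between the two rule shapes concrete, here is a minimal sketch in plain Python. It does not use Spack itself: `rule_matches` is a hypothetical helper, and its prefix-based handling of the upper bound is a simplified model of how a range such as `@:11.4` is assumed to include `11.4.0`.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    """Turn '11.4.0' into (11, 4, 0)."""
    return tuple(int(part) for part in version.split("."))


def rule_matches(gcc_min: int, cuda_min: Optional[str], cuda_max: Optional[str],
                 gcc: int, cuda: str) -> bool:
    """Simplified model of: conflicts('%gcc@<gcc_min>:', when='^cuda@<cuda_min>:<cuda_max>').

    Pass full three-part CUDA versions (e.g. '11.6.0') for `cuda`.
    """
    if gcc < gcc_min:
        return False
    current = parse(cuda)
    if cuda_min is not None and current < parse(cuda_min):
        return False
    if cuda_max is not None:
        upper = parse(cuda_max)
        # Prefix comparison, so that an upper bound of 11.4 still covers 11.4.0.
        if current[:len(upper)] > upper:
            return False
    return True


# Open-ended CUDA range: GCC 11 is rejected for every later CUDA release too.
open_ended = rule_matches(10, "11.0.0", None, gcc=11, cuda="11.6.0")

# Upper bound on CUDA: the rule stops applying once CUDA moves past 11.4.
bounded = rule_matches(11, None, "11.4", gcc=11, cuda="11.6.0")
```

With the open-ended rule, a newer CUDA release keeps conflicting until someone edits the rule; with the bounded rule, the conflict simply no longer applies to it.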
I ran this script on x86_64:
$ get_headers.sh
#!/bin/bash -e
cat <<EOF |
8.0-devel-ubuntu16.04
9.0-devel-ubuntu16.04
9.1-devel-ubuntu16.04
9.2-devel-ubuntu16.04
10.0-devel-ubuntu18.04
10.1-devel-ubuntu18.04
10.2-devel-ubuntu18.04
11.0.3-devel-ubuntu18.04
11.1.1-devel-ubuntu18.04
11.2.0-devel-ubuntu18.04
11.2.1-devel-ubuntu18.04
11.2.2-devel-ubuntu18.04
11.3.0-devel-ubuntu18.04
11.3.1-devel-ubuntu18.04
11.4.0-devel-ubuntu18.04
11.5.0-devel-ubuntu18.04
11.6.0-devel-ubuntu18.04
EOF
while read tag
do
mkdir -p "$tag"
echo "$tag"
docker run --rm "nvidia/cuda:$tag" bash -c 'cat /usr/local/cuda-*.*/targets/x86_64-linux/include/host_config.h /usr/local/cuda-*.*/targets/x86_64-linux/include/crt/host_config.h' > "$tag/host_config.h" || true
done
and grepping that header file I get:
So for GCC:
conflicts( '%gcc@6:', when='+cuda ^cuda@:8.0')
conflicts( '%gcc@7:', when='+cuda ^cuda@:9.1')
conflicts( '%gcc@8:', when='+cuda ^cuda@:10.0')
conflicts( '%gcc@9:', when='+cuda ^cuda@:10.2')
conflicts('%gcc@10:', when='+cuda ^cuda@:11.0')
conflicts('%gcc@11:', when='+cuda ^cuda@:11.4')
conflicts('%gcc@12:', when='+cuda ^cuda@:11.6')
And clang:
conflicts( '%clang@9:', when='+cuda ^cuda@:10.2')
conflicts('%clang@10:', when='+cuda ^cuda@:11.0')
conflicts('%clang@11:', when='+cuda ^cuda@:11.1')
conflicts('%clang@12:', when='+cuda ^cuda@:11.4')
conflicts('%clang@13:', when='+cuda ^cuda@:11.5')
conflicts('%clang@14:', when='+cuda ^cuda@:11.6')
Should we just specify this on the minor versions only @ax3l, that would simplify life a bit...
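The GCC rules above can be regenerated from a small table rather than maintained by hand. The following is only a sketch: the `MAX_GCC` table is derived from the GCC rules in this comment (it is not read from the headers here), and `conflict_rules` is a hypothetical helper that assumes the maximum supported GCC major never decreases from one CUDA release to the next.

```python
# Highest GCC major accepted by each CUDA minor release, in release order.
# Derived from the GCC conflict rules listed in this comment.
MAX_GCC = {
    "8.0": 5,
    "9.0": 6, "9.1": 6,
    "9.2": 7, "10.0": 7,
    "10.1": 8, "10.2": 8,
    "11.0": 9,
    "11.1": 10, "11.2": 10, "11.3": 10, "11.4": 10,
    "11.5": 11, "11.6": 11,
}


def conflict_rules(max_supported, compiler="gcc"):
    """Emit one conflicts() line per compiler major, bounded above on CUDA."""
    last_cuda_for_major = {}
    for cuda, major in max_supported.items():
        # Later entries overwrite earlier ones, so we keep the newest CUDA
        # release that still tops out at this compiler major.
        last_cuda_for_major[major] = cuda
    return [
        "conflicts('%{0}@{1}:', when='+cuda ^cuda@:{2}')".format(compiler, major + 1, cuda)
        for major, cuda in sorted(last_cuda_for_major.items())
    ]


rules = conflict_rules(MAX_GCC)
```

Printing `rules` line by line gives the same seven GCC directives as above, without the alignment spaces.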
Nice job @haampie! I quickly checked the documentation of previous CUDA versions (<11) and, at least for GCC, the range of supported versions is not stated so explicitly. In particular, I'm not really sure about the minimum GCC requirement for CUDA<11 (it may be related to C++11), but it is definitely stated for CUDA 11 (GCC 5, and GCC 6 starting from 11.4). At the very least we should fix the allowed GCC range for CUDA 11. I don't know if you also want to touch the others (IMHO it is fine to fix them as per the output provided by @haampie, which means a range for Clang and an open range with an upper bound for GCC on CUDA<11). I'm waiting for feedback from others (@ax3l?) on how to proceed, and as soon as we agree, I'll update the code changes in this PR.
@albestro let's get a PR in that fixes the issue with CUDA 11.x and review the other versions in a separate thread. So these lower bounds for GCC:
and these upper bounds for GCC:
They hold for x86_64, ppc64le, arm64, since the host_config.h header is exactly the same on these versions.
My current script:
downloading
#!/usr/bin/env bash
set -e
cat <<-EOF |
nvidia/cuda-arm64:11.0.3-devel-ubuntu18.04
nvidia/cuda-arm64:11.1.1-devel-ubuntu18.04
nvidia/cuda-arm64:11.2.0-devel-ubuntu18.04
nvidia/cuda-arm64:11.2.1-devel-ubuntu18.04
nvidia/cuda-arm64:11.2.2-devel-ubuntu18.04
nvidia/cuda-arm64:11.3.0-devel-ubuntu18.04
nvidia/cuda-arm64:11.3.1-devel-ubuntu18.04
nvidia/cuda-arm64:11.4.0-devel-ubuntu18.04
nvidia/cuda-ppc64le:8.0-devel-ubuntu16.04
nvidia/cuda-ppc64le:9.0-devel-ubuntu16.04
nvidia/cuda-ppc64le:9.1-devel-ubuntu16.04
nvidia/cuda-ppc64le:9.2-devel-ubuntu16.04
nvidia/cuda-ppc64le:10.0-devel-ubuntu18.04
nvidia/cuda-ppc64le:10.1-devel-ubuntu18.04
nvidia/cuda-ppc64le:10.2-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.0.3-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.1.1-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.2.0-devel
nvidia/cuda-ppc64le:11.2.1-devel
nvidia/cuda-ppc64le:11.2.2-devel
nvidia/cuda-ppc64le:11.3.0-devel-centos8
nvidia/cuda-ppc64le:11.3.1-devel
nvidia/cuda-ppc64le:11.4.0-devel
nvidia/cuda:8.0-devel-ubuntu16.04
nvidia/cuda:9.0-devel-ubuntu16.04
nvidia/cuda:9.1-devel-ubuntu16.04
nvidia/cuda:9.2-devel-ubuntu16.04
nvidia/cuda:10.0-devel-ubuntu18.04
nvidia/cuda:10.1-devel-ubuntu18.04
nvidia/cuda:10.2-devel-ubuntu18.04
nvidia/cuda:11.0.3-devel-ubuntu18.04
nvidia/cuda:11.1.1-devel-ubuntu18.04
nvidia/cuda:11.2.0-devel-ubuntu18.04
nvidia/cuda:11.2.1-devel-ubuntu18.04
nvidia/cuda:11.2.2-devel-ubuntu18.04
nvidia/cuda:11.3.0-devel-ubuntu18.04
nvidia/cuda:11.3.1-devel-ubuntu18.04
nvidia/cuda:11.4.0-devel-ubuntu18.04
EOF
while read image
do
echo "$image"
mkdir -p "$image"
rootfs="/dev/shm/rootfs"
unshare -r rm -rf "$rootfs" && mkdir "$rootfs"
docker export $(docker create "$image") | tar -C "$rootfs" -xf -
cat "$rootfs"/usr/local/cuda-*.*/targets/*/include/host_config.h "$rootfs"/usr/local/cuda-*.*/targets/*/include/crt/host_config.h > "$image/host_config.h" || true
done
comparing header files
#!/usr/bin/env bash
set -e
cat <<-EOF |
nvidia/cuda-arm64:11.0.3-devel-ubuntu18.04 nvidia/cuda:11.0.3-devel-ubuntu18.04
nvidia/cuda-arm64:11.1.1-devel-ubuntu18.04 nvidia/cuda:11.1.1-devel-ubuntu18.04
nvidia/cuda-arm64:11.2.0-devel-ubuntu18.04 nvidia/cuda:11.2.0-devel-ubuntu18.04
nvidia/cuda-arm64:11.2.1-devel-ubuntu18.04 nvidia/cuda:11.2.1-devel-ubuntu18.04
nvidia/cuda-arm64:11.2.2-devel-ubuntu18.04 nvidia/cuda:11.2.2-devel-ubuntu18.04
nvidia/cuda-arm64:11.3.0-devel-ubuntu18.04 nvidia/cuda:11.3.0-devel-ubuntu18.04
nvidia/cuda-arm64:11.3.1-devel-ubuntu18.04 nvidia/cuda:11.3.1-devel-ubuntu18.04
nvidia/cuda-arm64:11.4.0-devel-ubuntu18.04 nvidia/cuda:11.4.0-devel-ubuntu18.04
nvidia/cuda-ppc64le:8.0-devel-ubuntu16.04 nvidia/cuda:8.0-devel-ubuntu16.04
nvidia/cuda-ppc64le:9.0-devel-ubuntu16.04 nvidia/cuda:9.0-devel-ubuntu16.04
nvidia/cuda-ppc64le:9.1-devel-ubuntu16.04 nvidia/cuda:9.1-devel-ubuntu16.04
nvidia/cuda-ppc64le:9.2-devel-ubuntu16.04 nvidia/cuda:9.2-devel-ubuntu16.04
nvidia/cuda-ppc64le:10.0-devel-ubuntu18.04 nvidia/cuda:10.0-devel-ubuntu18.04
nvidia/cuda-ppc64le:10.1-devel-ubuntu18.04 nvidia/cuda:10.1-devel-ubuntu18.04
nvidia/cuda-ppc64le:10.2-devel-ubuntu18.04 nvidia/cuda:10.2-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.0.3-devel-ubuntu18.04 nvidia/cuda:11.0.3-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.1.1-devel-ubuntu18.04 nvidia/cuda:11.1.1-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.2.0-devel nvidia/cuda:11.2.0-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.2.1-devel nvidia/cuda:11.2.1-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.2.2-devel nvidia/cuda:11.2.2-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.3.0-devel-centos8 nvidia/cuda:11.3.0-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.3.1-devel nvidia/cuda:11.3.1-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.4.0-devel nvidia/cuda:11.4.0-devel-ubuntu18.04
EOF
while read line
do
read -a arr <<< $line
diff ${arr[0]}/host_config.h ${arr[1]}/host_config.h
done
Running the latter I don't get any diff, so all header files are the same across archs.
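The same cross-architecture check can also be done by hashing, which avoids listing every pair by hand. This is a sketch only: `group_identical` is a hypothetical helper, and it assumes the headers have already been extracted to local files as in the download script.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_identical(paths):
    """Group files whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        groups[digest].append(str(path))
    # Sort for a stable result regardless of hashing order.
    return sorted(sorted(group) for group in groups.values())
```

If the three architecture copies of a given CUDA version end up in a single group, the header is the same across them, which matches the empty diff reported above.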
d37128b to 16977d1
I've removed the existing duplication for CUDA 11 across the available platforms (x86_64 and ppc64le) and added a small section on top. In the end, this PR updates the CUDA 11 compatibility with GCC: the lower bound is raised to GCC 6 for CUDA>=11.4, and for the upper bound, GCC 10.x compatibility has been extended to CUDA 11.4. I also put a note there about updating the latter. After the discussions we had, it looks to me like the same decision was probably taken before, i.e. keeping an upper bound so as not to constrain newer CUDA versions, but then it was not updated after new CUDA versions were released. Is there any suggestion on how/where to put an alert for this, i.e. "do this and that when the CUDA version is updated"?
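One possible shape for such a reminder, purely as a sketch and not something that exists in Spack: a small check that lists the known CUDA versions lying beyond the newest upper bound used in the conflict rules. The function name and both input lists are hypothetical.

```python
def versions_beyond_rules(rule_upper_bounds, known_cuda_versions):
    """Return the known CUDA versions newer than every rule's upper bound."""
    def key(version):
        return tuple(int(part) for part in version.split("."))

    newest_bound = max(key(bound) for bound in rule_upper_bounds)
    return [
        version for version in known_cuda_versions
        # Prefix comparison, so 11.4.0 still counts as covered by a bound of 11.4.
        if key(version)[:len(newest_bound)] > newest_bound
    ]


uncovered = versions_beyond_rules(
    ["8.0", "9.1", "10.0", "10.2", "11.0", "11.4"],
    ["11.4.0", "11.5.0", "11.6.0"],
)
```

A non-empty result would be the cue to review the upper bounds; where such a check should live is left open here.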
Thanks a lot, and I also appreciate the great scripting.
I would also mark GCC 10 as a conflict for CUDA <11.4.1 due to an incompatibility in a stdlib, even if Nvidia advertised it otherwise:
https://gist.github.com/ax3l/9489132#gistcomment-3860114
Let's get this in? cc @haampie
This probably needs a rebase. Sorry for being so busy.
16977d1 to 8ccae20
Thx! :)
Looking at the CUDA conflicts declaration I realized that there is a mismatch between CUDA versions and officially supported GCC.
In particular, targeting CUDA 11 on generic x86_64, the official documentation for the various minor versions (11.0, 11.1.0, 11.2.0, 11.3.0, 11.4.0) always reports GCC 9.x as the supported version.
From this, together with the following notes excerpted from the official documentation
I would say that:
As additional information, I quickly checked crt/host_config.h in the CUDA version I have right now (11.0), which contains the following snippet,
which looks quite strict in not supporting newer versions.
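The kind of guard discussed here can be checked mechanically. The following is a minimal sketch, assuming the header rejects unsupported compilers with a preprocessor test of the form `#if __GNUC__ > N`. The `SAMPLE` text is a made-up stand-in for illustration, not the contents of any real host_config.h, and `max_gcc_major` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches a guard such as "#if __GNUC__ > 9" (assumed form, see lead-in).
_GUARD = re.compile(r"#\s*if\s+__GNUC__\s*>\s*(\d+)")


def max_gcc_major(header_text: str) -> Optional[int]:
    """Return the highest GCC major the guard lets through, or None if no guard is found."""
    match = _GUARD.search(header_text)
    return int(match.group(1)) if match else None


# Illustrative stand-in only, NOT copied from a real CUDA header.
SAMPLE = """\
#if defined(__GNUC__)
#if __GNUC__ > 9
#error unsupported GNU version
#endif
#endif
"""

supported = max_gcc_major(SAMPLE)
```

For the stand-in text, `supported` is 9; older headers may use a more elaborate condition (for example one that also tests the minor version), which this simple pattern does not interpret.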
As a last note, I looked at https://gist.github.com/ax3l/9489132, which is referenced just above the declaration of the CUDA conflicts in Spack, and it says
which may be (at least partially) in contrast with the previous crt/host_config.h. Moreover, there is also a section that tries to report the compatibility of CUDA with the different compilers in a table, but it looks incomplete and not fully correct (e.g. it reports 11.1.0 NVCC:11.1.74 as compatible with GCC (5-)6-10.0, but AFAIK that is incorrect).
The content of the gist may be useful, and it may be worth putting it somewhere where it can be easily updated/fixed (thanks @haampie for the suggestion).