CUDA official GCC conflicts by albestro · Pull Request #25054 · spack/spack · GitHub

albestro
Contributor

Looking at the CUDA conflicts declaration I realized that there is a mismatch between CUDA versions and officially supported GCC.

In particular, for CUDA 11 on generic x86_64, the official documentation for the various minor versions (11.0, 11.1.0, 11.2.0, 11.3.0, 11.4.0) all report GCC 9.x as the supported version.

From this, together with the following notes excerpted from the official documentation

(2) Note that starting with CUDA 11.0, the minimum recommended GCC compiler is at least GCC 5 [ed GCC 6 for Cuda 11.4.0] due to C++11 requirements in CUDA libraries e.g. cuFFT and CUB

(3) Minor versions of the following compilers listed: of GCC, ICC, PGI and XLC, as host compilers for nvcc are supported.

I would say that:

  • CUDA 11 (at the time of writing) works with GCC up to version 9 (all minor versions included);
  • CUDA [11.0, 11.4) requires GCC 5 as minimum version
  • CUDA 11.4 requires GCC 6 as minimum version

As additional information, I quickly checked crt/host_config.h in the CUDA version I have right now (11.0), which contains the following snippet

#if __GNUC__ > 9
#error -- unsupported GNU version! gcc versions later than 9 are not supported!
#endif /* __GNUC__ > 9 */

which looks quite strict in not supporting newer versions.
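
As an aside, the limit encoded in that guard can be read out mechanically. This is only a minimal sketch: the regular expression is based on the message text quoted above, and the function name `max_supported_gcc` is made up for illustration.

```python
import re

# Matches the GCC guard message as it is quoted in this comment.
GCC_GUARD = re.compile(r"gcc versions later than (\d+) are not supported")


def max_supported_gcc(header_text):
    """Return the highest GCC major version the header accepts, or None."""
    match = GCC_GUARD.search(header_text)
    return int(match.group(1)) if match else None


sample = """
#if __GNUC__ > 9
#error -- unsupported GNU version! gcc versions later than 9 are not supported!
#endif /* __GNUC__ > 9 */
"""

print(max_supported_gcc(sample))
```

For a header without such a guard the function returns `None`, so a caller can tell "no limit found" apart from a real limit.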

As a last note, I looked at https://gist.github.com/ax3l/9489132 that is reported just above the declaration of cuda conflicts in spack, and it says

[...] Sometimes it is possible to hack the requirements there to get some newer versions working, too :)

which may be (at least partially) at odds with the crt/host_config.h check above. Moreover, there is also a section that tries to tabulate the compatibility of CUDA with the different compilers, but it looks incomplete and not fully correct (e.g. it reports 11.1.0 NVCC:11.1.74 as compatible with GCC (5-)6-10.0, but AFAIK that is incorrect).

The content of the gist may be useful, and it may be worth putting it somewhere it can be easily updated/fixed (thanks @haampie for the suggestion).

conflicts('%gcc@11:', when='+cuda ^cuda@:11.1.0 target=x86_64:')
conflicts('%gcc@:4', when='+cuda ^cuda@11.0.0: target=x86_64:')
conflicts('%gcc@:5', when='+cuda ^cuda@11.4.0: target=x86_64:')
conflicts('%gcc@10:', when='+cuda ^cuda@11.0.0: target=x86_64:')
Member

@haampie haampie Jul 23, 2021


This would also mean that newer versions of cuda would conflict with newer gcc versions, so it's better to have a lower bound on gcc + an upper bound on cuda or the other way around.

We should just be more careful to actually bump the upper bound for cuda on the conflict rule whenever a new cuda minor version is released.
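
To make the point concrete, here is a rough sketch that models the two ways of writing the rule as plain Python predicates. It is not Spack's real version matching: `parse`, `open_ended_rule` and `bounded_rule` are hypothetical helpers, and `@:11.0` is approximated by comparing only major.minor. The CUDA 11.1.1 / GCC 10 case is taken from the header output further down in this thread.

```python
def parse(version):
    """Turn '11.1.1' into (11, 1, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def open_ended_rule(gcc, cuda):
    # Models: conflicts('%gcc@10:', when='+cuda ^cuda@11.0.0:')
    # The CUDA range has no upper bound, so every later release is hit too.
    return parse(gcc)[0] >= 10 and parse(cuda) >= (11, 0, 0)


def bounded_rule(gcc, cuda):
    # Models: conflicts('%gcc@10:', when='+cuda ^cuda@:11.0')
    # Only CUDA releases up to 11.0.x are affected.
    return parse(gcc)[0] >= 10 and parse(cuda)[:2] <= (11, 0)


# CUDA 11.1.1 accepts GCC 10 according to its host_config.h.
print(open_ended_rule("10.2.0", "11.1.1"))  # flags a conflict anyway
print(bounded_rule("10.2.0", "11.1.1"))     # leaves the newer CUDA alone
print(bounded_rule("10.2.0", "11.0.3"))     # still flags CUDA 11.0.x
```

The bounded form errs on the permissive side for CUDA releases that nobody has checked yet, which is why the upper bound has to be bumped by hand.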

@haampie
Member

haampie commented Jul 23, 2021

I ran this script on x86_64:

$ get_headers.sh
#!/bin/bash -e

cat <<EOF |
8.0-devel-ubuntu16.04
9.0-devel-ubuntu16.04
9.1-devel-ubuntu16.04
9.2-devel-ubuntu16.04
10.0-devel-ubuntu18.04
10.1-devel-ubuntu18.04
10.2-devel-ubuntu18.04
11.0.3-devel-ubuntu18.04
11.1.1-devel-ubuntu18.04
11.2.0-devel-ubuntu18.04
11.2.1-devel-ubuntu18.04
11.2.2-devel-ubuntu18.04
11.3.0-devel-ubuntu18.04
11.3.1-devel-ubuntu18.04
11.4.0-devel-ubuntu18.04
11.5.0-devel-ubuntu18.04
11.6.0-devel-ubuntu18.04
EOF

while read tag
do
    mkdir -p "$tag"
    echo "$tag"
    docker run --rm "nvidia/cuda:$tag" bash -c 'cat /usr/local/cuda-*.*/targets/x86_64-linux/include/host_config.h /usr/local/cuda-*.*/targets/x86_64-linux/include/crt/host_config.h' > "$tag/host_config.h" || true
done

and grepping that header file I get:

$ grep unsupported */host_config.h | grep -E '(gcc|clang)' | sort -h
8.0     #error -- unsupported GNU version! gcc versions later than 5 are not supported!
9.0     #error -- unsupported GNU version! gcc versions later than 6 are not supported!
9.1     #error -- unsupported GNU version! gcc versions later than 6 are not supported!
9.2     #error -- unsupported GNU version! gcc versions later than 7 are not supported!
10.0    #error -- unsupported GNU version! gcc versions later than 7 are not supported!
10.1    #error -- unsupported clang version! clang version must be less than 9 and greater than 3.2
10.1    #error -- unsupported GNU version! gcc versions later than 8 are not supported!
10.2    #error -- unsupported clang version! clang version must be less than 9 and greater than 3.2
10.2    #error -- unsupported GNU version! gcc versions later than 8 are not supported!
11.0.3  #error -- unsupported clang version! clang version must be less than 10 and greater than 3.2
11.0.3  #error -- unsupported GNU version! gcc versions later than 9 are not supported!
11.1.1  #error -- unsupported clang version! clang version must be less than 11 and greater than 3.2
11.1.1  #error -- unsupported GNU version! gcc versions later than 10 are not supported!
11.2.0  #error -- unsupported clang version! clang version must be less than 12 and greater than 3.2
11.2.0  #error -- unsupported GNU version! gcc versions later than 10 are not supported!
11.2.1  #error -- unsupported clang version! clang version must be less than 12 and greater than 3.2
11.2.1  #error -- unsupported GNU version! gcc versions later than 10 are not supported!
11.2.2  #error -- unsupported clang version! clang version must be less than 12 and greater than 3.2
11.2.2  #error -- unsupported GNU version! gcc versions later than 10 are not supported!
11.3.0  #error -- unsupported clang version! clang version must be less than 12 and greater than 3.2
11.3.0  #error -- unsupported GNU version! gcc versions later than 10 are not supported!
11.3.1  #error -- unsupported clang version! clang version must be less than 12 and greater than 3.2
11.3.1  #error -- unsupported GNU version! gcc versions later than 10 are not supported!
11.4.0  #error -- unsupported clang version! clang version must be less than 12 and greater than 3.2
11.4.0  #error -- unsupported GNU version! gcc versions later than 10 are not supported!
11.5.0  #error -- unsupported clang version! clang version must be less than 13 and greater than 3.2
11.5.0  #error -- unsupported GNU version! gcc versions later than 11 are not supported!
11.6.0  #error -- unsupported clang version! clang version must be less than 14 and greater than 3.2
11.6.0  #error -- unsupported GNU version! gcc versions later than 11 are not supported!

So for GCC:

    conflicts( '%gcc@6:', when='+cuda ^cuda@:8.0')
    conflicts( '%gcc@7:', when='+cuda ^cuda@:9.1')
    conflicts( '%gcc@8:', when='+cuda ^cuda@:10.0')
    conflicts( '%gcc@9:', when='+cuda ^cuda@:10.2')
    conflicts('%gcc@10:', when='+cuda ^cuda@:11.0')
    conflicts('%gcc@11:', when='+cuda ^cuda@:11.4')
    conflicts('%gcc@12:', when='+cuda ^cuda@:11.6')

And clang:

    conflicts( '%clang@9:', when='+cuda ^cuda@:10.2')
    conflicts('%clang@10:', when='+cuda ^cuda@:11.0')
    conflicts('%clang@11:', when='+cuda ^cuda@:11.1')
    conflicts('%clang@12:', when='+cuda ^cuda@:11.4')
    conflicts('%clang@13:', when='+cuda ^cuda@:11.5')
    conflicts('%clang@14:', when='+cuda ^cuda@:11.6')
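
The step from the grep output to these rules can be sketched in a few lines of Python. This is an illustration only, not what was actually run: the table is copied by hand from the GCC lines of the output above, and `minor` and `upper_bound_conflicts` are hypothetical helper names.

```python
from itertools import groupby

# (CUDA version, highest supported GCC major), in ascending CUDA order,
# copied from the "gcc versions later than N" lines above.
MAX_GCC = [
    ("8.0", 5), ("9.0", 6), ("9.1", 6), ("9.2", 7), ("10.0", 7),
    ("10.1", 8), ("10.2", 8), ("11.0.3", 9), ("11.1.1", 10),
    ("11.2.0", 10), ("11.2.1", 10), ("11.2.2", 10), ("11.3.0", 10),
    ("11.3.1", 10), ("11.4.0", 10), ("11.5.0", 11), ("11.6.0", 11),
]


def minor(version):
    """Keep only major.minor, e.g. '11.0.3' -> '11.0'."""
    return ".".join(version.split(".")[:2])


def upper_bound_conflicts(compiler, table):
    """Emit one rule per run of consecutive CUDA versions sharing a limit."""
    rules = []
    for limit, rows in groupby(table, key=lambda row: row[1]):
        last_cuda = list(rows)[-1][0]  # newest CUDA that still has this limit
        rules.append(
            f"conflicts('%{compiler}@{limit + 1}:', "
            f"when='+cuda ^cuda@:{minor(last_cuda)}')"
        )
    return rules


for rule in upper_bound_conflicts("gcc", MAX_GCC):
    print(rule)
```

Each rule forbids the first unsupported compiler major (`limit + 1`) up to the newest CUDA minor that shares that limit, which gives the same seven GCC rules as listed above, apart from the alignment spaces.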

Should we just specify this on the minor versions only, @ax3l? That would simplify life a bit...

@albestro
Contributor Author

$ grep unsupported */host_config.h | grep -E '(gcc|clang)' | sort -h
8.0     #error -- unsupported GNU version! gcc versions later than 5 are not supported!
9.0     #error -- unsupported GNU version! gcc versions later than 6 are not supported!
9.1     #error -- unsupported GNU version! gcc versions later than 6 are not supported!
9.2     #error -- unsupported GNU version! gcc versions later than 7 are not supported!
10.0    #error -- unsupported GNU version! gcc versions later than 7 are not supported!
10.1    #error -- unsupported clang version! clang version must be less than 9 and greater than 3.2
10.1    #error -- unsupported GNU version! gcc versions later than 8 are not supported!
10.2    #error -- unsupported clang version! clang version must be less than 9 and greater than 3.2
10.2    #error -- unsupported GNU version! gcc versions later than 8 are not supported!
11.0.3  #error -- unsupported clang version! clang version must be less than 10 and greater than 3.2
11.0.3  #error -- unsupported GNU version! gcc versions later than 9 are not supported!
11.1.1  #error -- unsupported clang version! clang version must be less than 11 and greater than 3.2
11.1.1  #error -- unsupported GNU version! gcc versions later than 10 are not supported!
11.2.0  #error -- unsupported clang version! clang version must be less than 12 and greater than 3.2
11.2.0  #error -- unsupported GNU version! gcc versions later than 10 are not supported!
11.2.1  #error -- unsupported clang version! clang version must be less than 12 and greater than 3.2
11.2.1  #error -- unsupported GNU version! gcc versions later than 10 are not supported!
11.2.2  #error -- unsupported clang version! clang version must be less than 12 and greater than 3.2
11.2.2  #error -- unsupported GNU version! gcc versions later than 10 are not supported!
11.3.0  #error -- unsupported clang version! clang version must be less than 12 and greater than 3.2
11.3.0  #error -- unsupported GNU version! gcc versions later than 10 are not supported!
11.3.1  #error -- unsupported clang version! clang version must be less than 12 and greater than 3.2
11.3.1  #error -- unsupported GNU version! gcc versions later than 10 are not supported!
11.4.0  #error -- unsupported clang version! clang version must be less than 12 and greater than 3.2
11.4.0  #error -- unsupported GNU version! gcc versions later than 10 are not supported!

Nice job @haampie!

I quickly checked the documentation for previous CUDA versions (<11) and, at least for GCC, the range of supported versions is not stated as explicitly. In particular, I'm not really sure about the minimum GCC requirement for CUDA<11 (it may be related to C++11), but it is clearly stated for CUDA 11 (GCC 5, and GCC 6 starting from 11.4).

At the very least we should fix the allowed GCC range for CUDA 11. I don't know if you also want to touch the others (IMHO it is OK to fix them as per the output provided by @haampie, which means a range for Clang and an open range with an upper bound for GCC on CUDA<11).

Waiting for feedback from others (@ax3l?) on how to proceed; as soon as we agree, I'll update the code changes in this PR.

@alalazo alalazo requested a review from ax3l July 26, 2021 08:19
@haampie
Member

haampie commented Jul 26, 2021

@albestro let's get a PR in that fixes the issue with CUDA 11.x and review the other versions in a separate thread.

So these lower bounds for GCC:

    conflicts('%gcc@:4', when='+cuda ^cuda@11.0:')
    conflicts('%gcc@:5', when='+cuda ^cuda@11.4:')

and these upper bounds for GCC:

    conflicts('%gcc@10:', when='+cuda ^cuda@:11.0')
    conflicts('%gcc@11:', when='+cuda ^cuda@:11.4')

They hold for x86_64, ppc64le and arm64, since for each CUDA version the host_config.h header is exactly the same across these architectures.
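
As a quick sanity check of how the four rules combine, here is a small sketch that evaluates them for a given GCC major and CUDA version. It deliberately simplifies Spack's matching: CUDA versions are `(major, minor)` tuples, and `RULES` and `first_conflict` are hypothetical names used only for this example.

```python
# Each entry: (rule as written above, predicate on (gcc_major, cuda)).
# cuda is a (major, minor) tuple, e.g. (11, 4) for CUDA 11.4.x.
RULES = [
    ("%gcc@:4 with cuda@11.0:", lambda gcc, cuda: gcc <= 4 and cuda >= (11, 0)),
    ("%gcc@:5 with cuda@11.4:", lambda gcc, cuda: gcc <= 5 and cuda >= (11, 4)),
    ("%gcc@10: with cuda@:11.0", lambda gcc, cuda: gcc >= 10 and cuda <= (11, 0)),
    ("%gcc@11: with cuda@:11.4", lambda gcc, cuda: gcc >= 11 and cuda <= (11, 4)),
]


def first_conflict(gcc_major, cuda):
    """Return the first rule that rejects the pair, or None if it is allowed."""
    for text, applies in RULES:
        if applies(gcc_major, cuda):
            return text
    return None


for gcc_major, cuda in [(9, (11, 0)), (10, (11, 0)), (10, (11, 4)), (5, (11, 4))]:
    print(gcc_major, cuda, first_conflict(gcc_major, cuda))
```

Under this model GCC 9 with CUDA 11.0 and GCC 10 with CUDA 11.4 pass, while GCC 10 with CUDA 11.0 and GCC 5 with CUDA 11.4 are rejected.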

my current script

downloading

#!/usr/bin/env bash

set -e

cat <<-EOF |
nvidia/cuda-arm64:11.0.3-devel-ubuntu18.04
nvidia/cuda-arm64:11.1.1-devel-ubuntu18.04
nvidia/cuda-arm64:11.2.0-devel-ubuntu18.04
nvidia/cuda-arm64:11.2.1-devel-ubuntu18.04
nvidia/cuda-arm64:11.2.2-devel-ubuntu18.04
nvidia/cuda-arm64:11.3.0-devel-ubuntu18.04
nvidia/cuda-arm64:11.3.1-devel-ubuntu18.04
nvidia/cuda-arm64:11.4.0-devel-ubuntu18.04
nvidia/cuda-ppc64le:8.0-devel-ubuntu16.04
nvidia/cuda-ppc64le:9.0-devel-ubuntu16.04
nvidia/cuda-ppc64le:9.1-devel-ubuntu16.04
nvidia/cuda-ppc64le:9.2-devel-ubuntu16.04
nvidia/cuda-ppc64le:10.0-devel-ubuntu18.04
nvidia/cuda-ppc64le:10.1-devel-ubuntu18.04
nvidia/cuda-ppc64le:10.2-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.0.3-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.1.1-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.2.0-devel
nvidia/cuda-ppc64le:11.2.1-devel
nvidia/cuda-ppc64le:11.2.2-devel
nvidia/cuda-ppc64le:11.3.0-devel-centos8
nvidia/cuda-ppc64le:11.3.1-devel
nvidia/cuda-ppc64le:11.4.0-devel
nvidia/cuda:8.0-devel-ubuntu16.04
nvidia/cuda:9.0-devel-ubuntu16.04
nvidia/cuda:9.1-devel-ubuntu16.04
nvidia/cuda:9.2-devel-ubuntu16.04
nvidia/cuda:10.0-devel-ubuntu18.04
nvidia/cuda:10.1-devel-ubuntu18.04
nvidia/cuda:10.2-devel-ubuntu18.04
nvidia/cuda:11.0.3-devel-ubuntu18.04
nvidia/cuda:11.1.1-devel-ubuntu18.04
nvidia/cuda:11.2.0-devel-ubuntu18.04
nvidia/cuda:11.2.1-devel-ubuntu18.04
nvidia/cuda:11.2.2-devel-ubuntu18.04
nvidia/cuda:11.3.0-devel-ubuntu18.04
nvidia/cuda:11.3.1-devel-ubuntu18.04
nvidia/cuda:11.4.0-devel-ubuntu18.04
EOF

while read image
do
	echo "$image"
	mkdir -p "$image"
	rootfs="/dev/shm/rootfs"
	unshare -r rm -rf "$rootfs" && mkdir "$rootfs"
	docker export $(docker create "$image") | tar -C "$rootfs" -xf -
	cat "$rootfs"/usr/local/cuda-*.*/targets/*/include/host_config.h "$rootfs"/usr/local/cuda-*.*/targets/*/include/crt/host_config.h > "$image/host_config.h" || true
done

comparing header files

#!/usr/bin/env bash

set -e

cat <<-EOF |
nvidia/cuda-arm64:11.0.3-devel-ubuntu18.04   nvidia/cuda:11.0.3-devel-ubuntu18.04
nvidia/cuda-arm64:11.1.1-devel-ubuntu18.04   nvidia/cuda:11.1.1-devel-ubuntu18.04
nvidia/cuda-arm64:11.2.0-devel-ubuntu18.04   nvidia/cuda:11.2.0-devel-ubuntu18.04
nvidia/cuda-arm64:11.2.1-devel-ubuntu18.04   nvidia/cuda:11.2.1-devel-ubuntu18.04
nvidia/cuda-arm64:11.2.2-devel-ubuntu18.04   nvidia/cuda:11.2.2-devel-ubuntu18.04
nvidia/cuda-arm64:11.3.0-devel-ubuntu18.04   nvidia/cuda:11.3.0-devel-ubuntu18.04
nvidia/cuda-arm64:11.3.1-devel-ubuntu18.04   nvidia/cuda:11.3.1-devel-ubuntu18.04
nvidia/cuda-arm64:11.4.0-devel-ubuntu18.04   nvidia/cuda:11.4.0-devel-ubuntu18.04
nvidia/cuda-ppc64le:8.0-devel-ubuntu16.04    nvidia/cuda:8.0-devel-ubuntu16.04
nvidia/cuda-ppc64le:9.0-devel-ubuntu16.04    nvidia/cuda:9.0-devel-ubuntu16.04
nvidia/cuda-ppc64le:9.1-devel-ubuntu16.04    nvidia/cuda:9.1-devel-ubuntu16.04
nvidia/cuda-ppc64le:9.2-devel-ubuntu16.04    nvidia/cuda:9.2-devel-ubuntu16.04
nvidia/cuda-ppc64le:10.0-devel-ubuntu18.04   nvidia/cuda:10.0-devel-ubuntu18.04
nvidia/cuda-ppc64le:10.1-devel-ubuntu18.04   nvidia/cuda:10.1-devel-ubuntu18.04
nvidia/cuda-ppc64le:10.2-devel-ubuntu18.04   nvidia/cuda:10.2-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.0.3-devel-ubuntu18.04 nvidia/cuda:11.0.3-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.1.1-devel-ubuntu18.04 nvidia/cuda:11.1.1-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.2.0-devel             nvidia/cuda:11.2.0-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.2.1-devel             nvidia/cuda:11.2.1-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.2.2-devel             nvidia/cuda:11.2.2-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.3.0-devel-centos8     nvidia/cuda:11.3.0-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.3.1-devel             nvidia/cuda:11.3.1-devel-ubuntu18.04
nvidia/cuda-ppc64le:11.4.0-devel             nvidia/cuda:11.4.0-devel-ubuntu18.04
EOF

while read -r line
do
    # Split each line into the two image names to compare.
    read -r -a arr <<< "$line"
    diff "${arr[0]}/host_config.h" "${arr[1]}/host_config.h"
done

Running the latter, I don't get any diff, so all header files are the same across archs.

@albestro albestro force-pushed the alby/cuda_gcc_conflicts_mapping branch from d37128b to 16977d1 Compare July 26, 2021 12:30
@albestro
Contributor Author

I've removed the existing duplication for CUDA 11 for the available platforms (x86_64 and ppc64_le) and I added a small section on top.

In the end, this PR updates the CUDA 11 compatibility with GCC: the lower bound is raised to GCC 6 for CUDA>=11.4, and the upper bound is relaxed so that GCC 10.x is accepted up to CUDA 11.4.

I also put a note there about keeping the latter up to date. After the discussions we had, it looks to me that the same decision had probably been taken before, i.e. keeping an upper bound so as not to constrain newer CUDA versions, but then it was not updated after new CUDA versions were released.

Is there any suggestion on how/where to put an alert for this, i.e. "do this and that when cuda version is updated"?

@haampie @ax3l

@albestro albestro requested a review from haampie July 26, 2021 16:29
ax3l
ax3l previously approved these changes Sep 4, 2021
Member

@ax3l ax3l left a comment


Thanks a lot, and I also appreciate the great scripting.

I would also mark GCC 10 as a conflict for CUDA <11.4.1 due to an incompatibility in a stdlib, even if Nvidia advertised it otherwise:
https://gist.github.com/ax3l/9489132#gistcomment-3860114

Let's get this in? cc @haampie

@ax3l
Member

ax3l commented Sep 4, 2021

This probably needs a rebase. Sorry for being so busy.

@albestro
Contributor Author

albestro commented Sep 8, 2021

@haampie @ax3l Just rebased. Please check that I did it as expected.

I also partially rephrased the note in the comment to try to make it clearer. Please check that comment too.

Member

@ax3l ax3l left a comment


Thx! :)

@ax3l ax3l merged commit 59d8031 into spack:develop Sep 9, 2021
@albestro albestro deleted the alby/cuda_gcc_conflicts_mapping branch September 10, 2021 04:48
@haampie haampie mentioned this pull request Jan 18, 2022