Provide cuda::static_for
#4855
🟨 CI finished in 2h 12m: Pass: 97%/187 | Total: 1d 08h | Avg: 10m 21s | Max: 59m 18s | Hits: 96%/279594
| | Project |
|---|---|
| | CCCL Infrastructure |
| | CCCL Packaging |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | CCCL Packaging |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | stdpar |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 187)
| # | Runner |
|---|---|
| 129 | linux-amd64-cpu16 |
| 15 | windows-amd64-cpu16 |
| 12 | linux-arm64-cpu16 |
| 12 | linux-amd64-gpu-rtxa6000-latest-1 |
| 11 | linux-amd64-gpu-rtx2080-latest-1 |
| 5 | linux-amd64-gpu-h100-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
template <auto Size, typename Operator, typename... TArgs>
__host__ __device__
constexpr void static_for(Operator op, TArgs&&... args) noexcept
Do we really want a universal reference for the arguments? The arguments will be passed to op multiple times. When an rvalue is passed, it will be forwarded to the first op call, and all of the other invocations will receive a value that has already been moved from.
My mistake was related to std::forward (fixed by Michael). Using a universal reference should not be a problem.
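The concern raised in this thread can be illustrated with a small host-only sketch. It is not the PR's code: `consume`, `repeat_forwarding`, and `repeat_as_lvalue` are hypothetical names invented for this example. It assumes the callee takes its argument by value, so an rvalue is moved into it. Forwarding the same argument on every iteration lets the first call move from it, while passing the named parameter as an lvalue makes each call copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callee: takes its argument by value, so an rvalue argument is moved in.
inline void consume(std::vector<std::string>& sink, std::string s)
{
  sink.push_back(std::move(s));
}

// Problematic variant: forwards the same argument on every iteration.
// When the caller passes an rvalue, the first call moves from `arg`,
// and later iterations see a valid but unspecified (moved-from) value.
template <std::size_t N, typename T>
void repeat_forwarding(std::vector<std::string>& sink, T&& arg)
{
  for (std::size_t i = 0; i < N; ++i)
  {
    consume(sink, std::forward<T>(arg));
  }
}

// Safer variant: `arg` is a named variable and therefore an lvalue,
// so every call copies from it and the value stays intact.
template <std::size_t N, typename T>
void repeat_as_lvalue(std::vector<std::string>& sink, T&& arg)
{
  for (std::size_t i = 0; i < N; ++i)
  {
    consume(sink, arg);
  }
}
```

With `repeat_as_lvalue<3>(sink, std::string("payload"))`, all three stored strings equal `"payload"`. With `repeat_forwarding`, only the first element is guaranteed to. This is consistent with the reply above: the universal reference itself is harmless as long as the arguments are not forwarded inside the repeated calls.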
Co-authored-by: David Bayer <48736217+davebayer@users.noreply.github.com>
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
🟨 CI finished in 2h 57m: Pass: 97%/183 | Total: 1d 09h | Avg: 11m 03s | Max: 2h 28m | Hits: 97%/281468
🏃 Runner counts (total jobs: 183)
| # | Runner |
|---|---|
| 125 | linux-amd64-cpu16 |
| 15 | windows-amd64-cpu16 |
| 12 | linux-arm64-cpu16 |
| 12 | linux-amd64-gpu-rtxa6000-latest-1 |
| 11 | linux-amd64-gpu-rtx2080-latest-1 |
| 5 | linux-amd64-gpu-h100-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
🟩 CI finished in 1h 26m: Pass: 100%/183 | Total: 1d 07h | Avg: 10m 12s | Max: 43m 26s | Hits: 97%/294386
Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>
🟩 CI finished in 2h 37m: Pass: 100%/183 | Total: 1d 07h | Avg: 10m 24s | Max: 43m 01s | Hits: 97%/294386
Description
A compile-time for loop (`static_for`) is a common utility in CUDA, especially when we don't want to rely on the compiler for loop unrolling.

The PR provides:
- `static_for<Size>(op, args...)`
- `static_for<Start, End, Step>(op, args...)` (looking for feedback)
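To make the two forms listed above concrete, here is a minimal host-only C++17 sketch of how such a utility could be written. It is an illustration under stated assumptions, not the implementation in this PR: the `sketch` namespace, `static_for_impl`, and the two usage functions are hypothetical, the `__host__ __device__` and `noexcept` qualifiers are omitted so it compiles with a plain C++ compiler, and the ranged form assumes a positive `Step` with `Start <= End`. Each iteration receives its index as a `std::integral_constant`, so the index is usable in constant expressions inside `op`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

namespace sketch // hypothetical namespace, not cuda::
{
// Expands op(Start + I * Step, args...) for every I in the index pack.
// args are passed as lvalues on purpose, so no call can move from them.
template <typename T, T Start, T Step, typename Operator, T... Is, typename... TArgs>
constexpr void static_for_impl(Operator op, std::integer_sequence<T, Is...>, TArgs&&... args)
{
  (op(std::integral_constant<T, Start + Is * Step>{}, args...), ...);
}

// static_for<Size>(op, args...): indices 0, 1, ..., Size - 1.
template <auto Size, typename Operator, typename... TArgs>
constexpr void static_for(Operator op, TArgs&&... args)
{
  using T = decltype(Size);
  static_for_impl<T, T{0}, T{1}>(op, std::make_integer_sequence<T, Size>{}, args...);
}

// static_for<Start, End, Step>(op, args...): indices Start, Start + Step, ... below End.
// Assumption for this sketch: Step > 0 and Start <= End.
template <auto Start, decltype(Start) End, decltype(Start) Step = 1, typename Operator, typename... TArgs>
constexpr void static_for(Operator op, TArgs&&... args)
{
  using T = decltype(Start);
  static_assert(Step > 0, "this sketch only handles positive steps");
  static_assert(Start <= End, "this sketch only handles Start <= End");
  constexpr T count = (End - Start + Step - 1) / Step; // number of iterations, rounded up
  static_for_impl<T, Start, Step>(op, std::make_integer_sequence<T, count>{}, args...);
}

// Usage: sums the indices 0, 1, 2, 3.
constexpr int sum_of_indices()
{
  int sum = 0;
  static_for<4>([](auto i, int& acc) { acc += decltype(i)::value; }, sum);
  return sum;
}

// Usage: sums the indices 1, 4, 7.
constexpr int sum_of_strided_indices()
{
  int sum = 0;
  static_for<1, 8, 3>([](auto i, int& acc) { acc += decltype(i)::value; }, sum);
  return sum;
}
} // namespace sketch
```

Because the expansion is a fold expression over a compile-time index pack, every call to `op` is emitted explicitly rather than left to the compiler's loop unroller, which is the motivation given in the description.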