-
Notifications
You must be signed in to change notification settings - Fork 282
Drop deprecated CUB macros #3821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5c959be to
fa676c1
Compare
fa676c1 to
33e712d
Compare
| #include <limits> | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move towards ::cuda::std::numeric_limits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought so too, but in the example it wasn't necessary, and maybe <limits> is more familiar. I really don't mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine for me
33e712d to
7608f32
Compare
7608f32 to
8881a1d
Compare
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
🟨 CI finished in 1h 51m: Pass: 94%/93 | Total: 2d 00h | Avg: 30m 59s | Max: 1h 14m | Hits: 84%/127854
|
| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| +/- | CUB |
| Thrust | |
| CUDA Experimental | |
| python | |
| +/- | CCCL C Parallel Library |
| Catch2Helper |
Modifications in project or dependencies?
| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| +/- | CUB |
| +/- | Thrust |
| CUDA Experimental | |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 93)
| # | Runner |
|---|---|
| 66 | linux-amd64-cpu16 |
| 9 | windows-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 4 | linux-arm64-cpu16 |
| 3 | linux-amd64-gpu-h100-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
8e1a5bb to
9bf7bbc
Compare
9bf7bbc to
a1ef5b8
Compare
c/parallel/src/reduce.cu
Outdated
| // Implement part of MemBoundScaling | ||
| items_per_thread = CUB_MAX(1, CUB_MIN(items_per_thread * 4 / accumulator_type.size, items_per_thread * 2)); | ||
| block_size = CUB_MIN(block_size, (((1024 * 48) / (accumulator_type.size * items_per_thread)) + 31) / 32 * 32); | ||
| items_per_thread = cuda::std::clamp(items_per_thread * 4 / accumulator_type.size, 1, items_per_thread * 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to be consistent with the namespace usage _CUDA_VSTD vs. cuda::std::. Also, the global namespace is missing ::cuda...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
b635217 to
d01fe85
Compare
d01fe85 to
5d5215b
Compare
db00811 to
2dda2c3
Compare
2dda2c3 to
b60fe52
Compare
🟨 CI finished in 6h 04m: Pass: 98%/93 | Total: 3d 02h | Avg: 48m 10s | Max: 5h 59m | Hits: 41%/133724
|
| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| +/- | CUB |
| Thrust | |
| CUDA Experimental | |
| python | |
| CCCL C Parallel Library | |
| Catch2Helper |
Modifications in project or dependencies?
| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| +/- | CUB |
| +/- | Thrust |
| CUDA Experimental | |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 93)
| # | Runner |
|---|---|
| 66 | linux-amd64-cpu16 |
| 9 | windows-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 4 | linux-arm64-cpu16 |
| 3 | linux-amd64-gpu-h100-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
🟩 CI finished in 6h 26m: Pass: 100%/93 | Total: 2d 20h | Avg: 44m 26s | Max: 1h 24m | Hits: 41%/133878
|
| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| +/- | CUB |
| Thrust | |
| CUDA Experimental | |
| python | |
| CCCL C Parallel Library | |
| Catch2Helper |
Modifications in project or dependencies?
| Project | |
|---|---|
| CCCL Infrastructure | |
| libcu++ | |
| +/- | CUB |
| +/- | Thrust |
| CUDA Experimental | |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 93)
| # | Runner |
|---|---|
| 66 | linux-amd64-cpu16 |
| 9 | windows-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 4 | linux-arm64-cpu16 |
| 3 | linux-amd64-gpu-h100-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
Split out of #3748, since it causes SASS changes in at least
cub.bench.copy.memcpy.base. Several split-off PRs proposed sub-parts, accompanied by SASS diffs and benchmarks. The remainder of this PR is now only the removal of the unused macros (and the addition of a few missing includes).