Add multi-dimensional support to block_radix_sort routines. by tpn · Pull Request #4035 · NVIDIA/cccl


@tpn (Contributor) commented Mar 6, 2025

Adds multi-dimensional thread-block support to the block_radix_sort routines. Tests have been updated, and the _block_radix_sort.py module has had some general spring cleaning.

N.B. I branched this off #4028 and will rebase once that is merged. Please review only the third commit onward.

@tpn requested a review from a team as a code owner March 6, 2025 04:35
@tpn requested a review from leofang March 6, 2025 04:35
github-project-automation bot moved this to Todo in CCCL Mar 6, 2025
cccl-authenticator-app bot moved this from Todo to In Review in CCCL Mar 6, 2025
@tpn requested a review from brycelelbach March 6, 2025 04:36
@tpn self-assigned this Mar 6, 2025
@tpn added the 3.0 (Targeted for 3.0 release) label Mar 6, 2025
@tpn linked an issue Mar 6, 2025 that may be closed by this pull request
github-actions bot commented Mar 6, 2025

🟩 CI finished in 1h 02m: Pass: 100%/1 | Total: 1h 02m | Avg: 1h 02m | Max: 1h 02m
  • 🟩 python: Pass: 100%/1 | Total: 1h 02m | Avg: 1h 02m | Max: 1h 02m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 1)

# Runner
1 linux-amd64-gpu-rtx2080-latest-1

@shwina (Contributor) left a comment

LGTM, pending one naming question

@tpn force-pushed the 4032-multidimensional-block_radix_sort branch from 2ca462b to 45bc3f3 on March 13, 2025 18:44

github-actions bot commented:
🟥 CI finished in 9m 06s: Pass: 0%/1 | Total: 9m 06s | Avg: 9m 06s | Max: 9m 06s
  • 🟥 python: Pass: 0%/1 | Total: 9m 06s | Avg: 9m 06s | Max: 9m 06s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s
    🟥 ctk
      🟥 12.8               Pass:   0%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s
    🟥 cudacxx
      🟥 nvcc12.8           Pass:   0%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s
    🟥 gpu
      🟥 rtx2080            Pass:   0%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s
    


@tpn force-pushed the 4032-multidimensional-block_radix_sort branch from 45bc3f3 to a79a4ee on March 13, 2025 19:02

github-actions bot commented:
🟩 CI finished in 1h 02m: Pass: 100%/1 | Total: 1h 02m | Avg: 1h 02m | Max: 1h 02m
  • 🟩 python: Pass: 100%/1 | Total: 1h 02m | Avg: 1h 02m | Max: 1h 02m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    


@tpn merged commit aa3190f into NVIDIA:main on Mar 13, 2025
17 of 18 checks passed
github-project-automation bot moved this from In Review to Done in CCCL Mar 13, 2025
davebayer pushed a commit to davebayer/cccl that referenced this pull request Apr 7, 2025
)

* Implement CudaSharedMemConfig enum with supporting tests.

* Relocate CUB_BLOCK_SCAN_ALOGS to _common.py.

We will be using it from _block_radix_sort.py imminently.

* Add multi-dimensional support to block_radix_sort routines.
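Why the first two commits are prerequisites is, presumably, a matter of template-parameter order: in CUB at the time of this PR, cub::BlockRadixSort takes BLOCK_DIM_Y and BLOCK_DIM_Z after INNER_SCAN_ALGORITHM (a BlockScanAlgorithm) and SMEM_CONFIG (a cudaSharedMemConfig), so a 2D or 3D specialization has to spell those earlier arguments out. CUB then ranks threads row-major, with x varying fastest. The sketch below illustrates both points under those assumptions. normalize_dim, linear_tid and block_radix_sort_specialization are hypothetical helpers written for illustration only; they are not the cuda.cooperative API or the code in this PR.

```python
# Hypothetical helpers (NOT part of cuda.cooperative) sketching how a 1D/2D/3D
# thread-block shape could map onto cub::BlockRadixSort's template parameters.
# Assumes the CUB parameter order: KeyT, BLOCK_DIM_X, ITEMS_PER_THREAD, ValueT,
# RADIX_BITS, MEMOIZE_OUTER_SCAN, INNER_SCAN_ALGORITHM, SMEM_CONFIG,
# BLOCK_DIM_Y, BLOCK_DIM_Z.


def normalize_dim(threads_per_block):
    """Return an (x, y, z) tuple from an int or a 1-, 2- or 3-element sequence."""
    if isinstance(threads_per_block, int):
        dims = (threads_per_block,)
    else:
        dims = tuple(threads_per_block)
    if not 1 <= len(dims) <= 3:
        raise ValueError("threads_per_block must have 1 to 3 dimensions")
    if any(not isinstance(d, int) or d < 1 for d in dims):
        raise ValueError("each dimension must be a positive int")
    # Missing trailing dimensions default to 1, as with CUDA's dim3.
    return dims + (1,) * (3 - len(dims))


def linear_tid(tid, dims):
    """Row-major linear thread rank (x varies fastest)."""
    x, y, z = tid
    dim_x, dim_y, _ = dims
    return (z * dim_y + y) * dim_x + x


def block_radix_sort_specialization(
    key_type,
    threads_per_block,
    items_per_thread,
    value_type="cub::NullType",
    radix_bits=4,
    memoize_outer_scan=True,
    scan_algorithm="cub::BLOCK_SCAN_WARP_SCANS",
    smem_config="cudaSharedMemBankSizeFourByte",
):
    """Build a cub::BlockRadixSort<...> type string for a given block shape."""
    dim_x, dim_y, dim_z = normalize_dim(threads_per_block)
    args = [
        key_type,                                   # KeyT
        str(dim_x),                                 # BLOCK_DIM_X
        str(items_per_thread),                      # ITEMS_PER_THREAD
        value_type,                                 # ValueT
        str(radix_bits),                            # RADIX_BITS
        "true" if memoize_outer_scan else "false",  # MEMOIZE_OUTER_SCAN
        scan_algorithm,                             # INNER_SCAN_ALGORITHM
        smem_config,                                # SMEM_CONFIG
        str(dim_y),                                 # BLOCK_DIM_Y
        str(dim_z),                                 # BLOCK_DIM_Z
    ]
    return f"cub::BlockRadixSort<{', '.join(args)}>"


# A 16x8 block sorting 4 int keys per thread:
print(block_radix_sort_specialization("int", (16, 8), 4))
# cub::BlockRadixSort<int, 16, 4, cub::NullType, 4, true, cub::BLOCK_SCAN_WARP_SCANS, cudaSharedMemBankSizeFourByte, 8, 1>
```

The design choice being illustrated is that a plain int keeps working as before (it normalizes to (n, 1, 1)), so 1D callers need no changes while 2D and 3D shapes simply fill in the trailing BLOCK_DIM_Y and BLOCK_DIM_Z arguments.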

Labels

3.0 Targeted for 3.0 release

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Support multidimensional thread blocks in block_radix_sort

2 participants