Drop deprecated CUB iterators by bernhardmgruber · Pull Request #3831 · NVIDIA/cccl · GitHub

@bernhardmgruber

No description provided.

#include <cub/util_allocator.cuh>

#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>

This include was missing before: the file already used thrust::transform_iterator.

@miscco miscco left a comment

🔥

@github-actions

🟩 CI finished in 1h 03m: Pass: 100%/93 | Total: 18h 29m | Avg: 11m 55s | Max: 36m 30s | Hits: 95%/133473
  • 🟩 cub: Pass: 100%/45 | Total: 11h 16m | Avg: 15m 02s | Max: 36m 30s | Hits: 92%/53041

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 10h 55m | Avg: 15m 13s | Max: 36m 30s | Hits:  92%/50631 
      🟩 arm64              Pass: 100%/2   | Total: 21m 41s | Avg: 10m 50s | Max: 11m 55s | Hits:  99%/2410  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 21m | Avg: 16m 16s | Max: 36m 01s | Hits:  84%/5859  
      🟩 12.5               Pass: 100%/2   | Total: 45m 14s | Avg: 22m 37s | Max: 23m 01s | Hits:  98%/2228  
      🟩 12.8               Pass: 100%/38  | Total:  9h 10m | Avg: 14m 28s | Max: 36m 30s | Hits:  93%/44954 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 14m 38s | Avg:  7m 19s | Max:  7m 25s | Hits:  99%/2082  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 21m | Avg: 16m 16s | Max: 36m 01s | Hits:  84%/5859  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 45m 14s | Avg: 22m 37s | Max: 23m 01s | Hits:  98%/2228  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  8h 55m | Avg: 14m 52s | Max: 36m 30s | Hits:  93%/42872 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 14m 38s | Avg:  7m 19s | Max:  7m 25s | Hits:  99%/2082  
      🟩 nvcc               Pass: 100%/43  | Total: 11h 02m | Avg: 15m 23s | Max: 36m 30s | Hits:  92%/50959 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 42m 36s | Avg: 10m 39s | Max: 11m 05s | Hits:  99%/4828  
      🟩 Clang15            Pass: 100%/2   | Total: 19m 12s | Avg:  9m 36s | Max:  9m 44s | Hits:  99%/2410  
      🟩 Clang16            Pass: 100%/2   | Total: 20m 35s | Avg: 10m 17s | Max: 10m 31s | Hits:  99%/2410  
      🟩 Clang17            Pass: 100%/2   | Total: 19m 34s | Avg:  9m 47s | Max: 10m 12s | Hits:  99%/2410  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 25m | Avg: 12m 13s | Max: 23m 45s | Hits:  99%/8107  
      🟩 GCC7               Pass: 100%/2   | Total: 22m 17s | Avg: 11m 08s | Max: 11m 14s | Hits:  99%/2414  
      🟩 GCC8               Pass: 100%/1   | Total: 11m 19s | Avg: 11m 19s | Max: 11m 19s | Hits:  99%/1207  
      🟩 GCC9               Pass: 100%/2   | Total: 22m 47s | Avg: 11m 23s | Max: 12m 28s | Hits:  99%/2414  
      🟩 GCC10              Pass: 100%/2   | Total: 20m 21s | Avg: 10m 10s | Max: 10m 34s | Hits:  99%/2414  
      🟩 GCC11              Pass: 100%/2   | Total: 21m 22s | Avg: 10m 41s | Max: 11m 00s | Hits:  99%/2410  
      🟩 GCC12              Pass: 100%/2   | Total: 21m 43s | Avg: 10m 51s | Max: 10m 59s | Hits:  99%/2410  
      🟩 GCC13              Pass: 100%/11  | Total:  3h 00m | Avg: 16m 22s | Max: 24m 07s | Hits:  99%/13255 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 12m | Avg: 36m 15s | Max: 36m 30s | Hits:  16%/2062  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  1h 11m | Avg: 35m 41s | Max: 35m 50s | Hits:  16%/2062  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 45m 14s | Avg: 22m 37s | Max: 23m 01s | Hits:  98%/2228  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  3h 07m | Avg: 11m 02s | Max: 23m 45s | Hits:  99%/20165 
      🟩 GCC                Pass: 100%/22  | Total:  4h 59m | Avg: 13m 38s | Max: 24m 07s | Hits:  99%/26524 
      🟩 MSVC               Pass: 100%/4   | Total:  2h 23m | Avg: 35m 58s | Max: 36m 30s | Hits:  16%/4124  
      🟩 NVHPC              Pass: 100%/2   | Total: 45m 14s | Avg: 22m 37s | Max: 23m 01s | Hits:  98%/2228  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total: 51m 59s | Avg: 17m 19s | Max: 23m 57s | Hits:  99%/3615  
      🟩 rtx2080            Pass: 100%/34  | Total:  7h 58m | Avg: 14m 04s | Max: 36m 30s | Hits:  90%/39786 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 26m | Avg: 18m 17s | Max: 24m 07s | Hits:  99%/9640  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  8h 25m | Avg: 13m 39s | Max: 36m 30s | Hits:  91%/43401 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 51s | Avg: 20m 51s | Max: 20m 51s | Hits:  99%/1205  
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 50s | Avg: 15m 50s | Max: 15m 50s | Hits:  99%/1205  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 11m | Avg: 23m 56s | Max: 24m 07s | Hits:  99%/3615  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 02m | Avg: 20m 51s | Max: 22m 05s | Hits:  99%/3615  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 51m 59s | Avg: 17m 19s | Max: 23m 57s | Hits:  99%/3615  
      🟩 90;90a;100         Pass: 100%/1   | Total: 10m 44s | Avg: 10m 44s | Max: 10m 44s | Hits:  99%/1205  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  4h 58m | Avg: 14m 56s | Max: 36m 30s | Hits:  88%/23339 
      🟩 20                 Pass: 100%/25  | Total:  6h 17m | Avg: 15m 07s | Max: 35m 33s | Hits:  96%/29702 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 6h 30m | Avg: 8m 40s | Max: 32m 54s | Hits: 96%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 49s | Avg:  8m 24s | Max: 11m 02s | Hits:  99%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  6h 21m | Avg:  8m 51s | Max: 32m 54s | Hits:  96%/76573 
      🟩 arm64              Pass: 100%/2   | Total:  9m 30s | Avg:  4m 45s | Max:  5m 03s | Hits:  99%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 41m 33s | Avg:  8m 18s | Max: 22m 15s | Hits:  94%/8901  
      🟩 12.5               Pass: 100%/2   | Total: 28m 47s | Avg: 14m 23s | Max: 14m 45s | Hits:  99%/3562  
      🟩 12.8               Pass: 100%/38  | Total:  5h 20m | Avg:  8m 25s | Max: 32m 54s | Hits:  96%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 22s | Avg:  5m 11s | Max:  5m 21s | Hits: 100%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 41m 33s | Avg:  8m 18s | Max: 22m 15s | Hits:  94%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 28m 47s | Avg: 14m 23s | Max: 14m 45s | Hits:  99%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  5h 09m | Avg:  8m 36s | Max: 32m 54s | Hits:  96%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 22s | Avg:  5m 11s | Max:  5m 21s | Hits: 100%/3562  
      🟩 nvcc               Pass: 100%/43  | Total:  6h 20m | Avg:  8m 50s | Max: 32m 54s | Hits:  96%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 21s | Avg:  5m 05s | Max:  5m 45s | Hits: 100%/7124  
      🟩 Clang15            Pass: 100%/2   | Total: 10m 51s | Avg:  5m 25s | Max:  5m 45s | Hits: 100%/3562  
      🟩 Clang16            Pass: 100%/2   | Total: 10m 55s | Avg:  5m 27s | Max:  5m 40s | Hits: 100%/3562  
      🟩 Clang17            Pass: 100%/2   | Total: 11m 07s | Avg:  5m 33s | Max:  5m 41s | Hits: 100%/3562  
      🟩 Clang18            Pass: 100%/7   | Total: 43m 20s | Avg:  6m 11s | Max: 10m 10s | Hits: 100%/12467 
      🟩 GCC7               Pass: 100%/2   | Total: 10m 30s | Avg:  5m 15s | Max:  5m 44s | Hits:  99%/3564  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 39s | Avg:  5m 39s | Max:  5m 39s | Hits:  99%/1782  
      🟩 GCC9               Pass: 100%/2   | Total: 10m 50s | Avg:  5m 25s | Max:  5m 37s | Hits:  99%/3564  
      🟩 GCC10              Pass: 100%/2   | Total: 11m 17s | Avg:  5m 38s | Max:  5m 48s | Hits:  99%/3564  
      🟩 GCC11              Pass: 100%/2   | Total: 10m 48s | Avg:  5m 24s | Max:  5m 25s | Hits:  99%/3564  
      🟩 GCC12              Pass: 100%/2   | Total: 11m 53s | Avg:  5m 56s | Max:  6m 00s | Hits:  99%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 15m | Avg:  7m 34s | Max: 11m 42s | Hits:  99%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 47m 46s | Avg: 23m 53s | Max: 25m 31s | Hits:  70%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 20m | Avg: 26m 57s | Max: 32m 54s | Hits:  70%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 28m 47s | Avg: 14m 23s | Max: 14m 45s | Hits:  99%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 36m | Avg:  5m 40s | Max: 10m 10s | Hits: 100%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  2h 16m | Avg:  6m 30s | Max: 11m 42s | Hits:  99%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 08m | Avg: 25m 43s | Max: 32m 54s | Hits:  70%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total: 28m 47s | Avg: 14m 23s | Max: 14m 45s | Hits:  99%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 16m 15s | Avg:  8m 07s | Max: 11m 42s | Hits:  99%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 10m | Avg:  7m 35s | Max: 25m 31s | Hits:  97%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 03m | Avg: 12m 21s | Max: 32m 54s | Hits:  94%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  4h 58m | Avg:  7m 51s | Max: 25m 35s | Hits:  96%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 48m 11s | Avg: 16m 03s | Max: 32m 54s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total: 44m 12s | Avg: 11m 03s | Max: 11m 42s | Hits:  99%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 16m 15s | Avg:  8m 07s | Max: 11m 42s | Hits:  99%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 19s | Avg:  6m 19s | Max:  6m 19s | Hits:  99%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  2h 52m | Avg:  8m 36s | Max: 25m 31s | Hits:  95%/35611 
      🟩 20                 Pass: 100%/23  | Total:  3h 21m | Avg:  8m 46s | Max: 32m 54s | Hits:  97%/40961 
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 12m 34s | Avg: 6m 17s | Max: 10m 16s | Hits: 98%/296

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 12m 34s | Avg:  6m 17s | Max: 10m 16s | Hits:  98%/296   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 12m 34s | Avg:  6m 17s | Max: 10m 16s | Hits:  98%/296   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 12m 34s | Avg:  6m 17s | Max: 10m 16s | Hits:  98%/296   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 12m 34s | Avg:  6m 17s | Max: 10m 16s | Hits:  98%/296   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 12m 34s | Avg:  6m 17s | Max: 10m 16s | Hits:  98%/296   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 12m 34s | Avg:  6m 17s | Max: 10m 16s | Hits:  98%/296   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 12m 34s | Avg:  6m 17s | Max: 10m 16s | Hits:  98%/296   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 18s | Avg:  2m 18s | Max:  2m 18s | Hits:  98%/148   
      🟩 Test               Pass: 100%/1   | Total: 10m 16s | Avg: 10m 16s | Max: 10m 16s | Hits:  98%/148   
    
  • 🟩 python: Pass: 100%/1 | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 29m 38s | Avg: 29m 38s | Max: 29m 38s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

@elstehle elstehle left a comment

Looks good. We probably want to help - or at least inform - our RAFT friends to switch to thrust iterators.

@bernhardmgruber bernhardmgruber merged commit b8f4d77 into NVIDIA:main Feb 17, 2025
109 of 112 checks passed
@bernhardmgruber bernhardmgruber deleted the drop_cub_iterators branch February 17, 2025 13:55
davebayer pushed a commit to davebayer/cccl that referenced this pull request Feb 20, 2025
davebayer pushed a commit to davebayer/cccl that referenced this pull request Apr 7, 2025
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Jun 28, 2025
…sage in ATen (#153373)

The CCCL 3.0.0 major release will introduce some backward-compatibility-breaking (BC-breaking) changes. Namely, iterators such as TransformInputIterator and ConstantInputIterator were moved from CUB to Thrust, and some operators such as Max and Sum were moved to libcudacxx.

For more information on the changes, see: https://nvidia.github.io/cccl/cccl/3.0_migration_guide.html

This is a follow up to PR #147493. A description from the original PR:
> Several cub iterators have been deprecated and removed in the latest CCCL (cub) development NVIDIA/cccl#3831. This PR replaced the usage of those cub iterators with thrust iterators.
>
> Some cub thread operators were also deprecated and removed in NVIDIA/cccl#3918. This PR replaced those operators with libcudacxx ops.
>
> This might also affect ROCM usability a bit.
>
> This patch is tested to work with CCCL commit at NVIDIA/cccl@82befb0
>
> Tracking of CCCL/CUB deprecations in the most recent development NVIDIA/cccl#101

Pull Request resolved: #153373
Approved by: https://github.com/cyyever, https://github.com/atalman