Backport to 3.0: Fix grid dependency sync in cub::DeviceMergeSort (#5456) by bernhardmgruber · Pull Request #5461 · NVIDIA/cccl · GitHub

@bernhardmgruber
Contributor

The grid dependency sync was placed too late and did not guard the loads from merge_partitions, leading to a data race.
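For context, the branch name (backport_pdl_fix_30) indicates that the sync in question belongs to Programmatic Dependent Launch (PDL). With PDL, a dependent kernel may start before the kernel that produces its input has finished, so every read of that input must come after the device-side grid dependency sync. The sketch below illustrates the ordering rule only. It is not CUB's merge sort code: compute_partitions, merge_step and run_pdl_demo are hypothetical names, and the example assumes a CUDA 12 toolkit. The kernels actually overlap only on sm_90 and newer. On older architectures the device calls are compiled out by the __CUDA_ARCH__ guard.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical producer, standing in for the kernel that writes merge partitions.
__global__ void compute_partitions(int* partitions, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) { partitions[i] = 2 * i; }
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
  // Lets the dependent grid launch early. It does not make the writes above
  // visible to the dependent grid; only the dependent's sync guarantees that.
  cudaTriggerProgrammaticLaunchCompletion();
#endif
}

// Hypothetical consumer, standing in for the merge kernel.
__global__ void merge_step(const int* partitions, int* out, int n)
{
  // Prologue that does not touch the producer's output may run before the sync.
  int i = blockIdx.x * blockDim.x + threadIdx.x;
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
  // Must come BEFORE the first read of partitions. Issuing the load first
  // (the kind of bug this PR fixes) lets it race with the producer's writes.
  cudaGridDependencySynchronize();
#endif
  if (i < n) { out[i] = partitions[i] + 1; }
}

// Launches the producer normally and the consumer with PDL enabled on the same
// stream. Returns true if every output element has the expected value.
bool run_pdl_demo(int n)
{
  int* partitions = nullptr;
  int* out = nullptr;
  if (cudaMalloc(&partitions, n * sizeof(int)) != cudaSuccess) return false;
  if (cudaMalloc(&out, n * sizeof(int)) != cudaSuccess) { cudaFree(partitions); return false; }
  cudaStream_t stream;
  cudaError_t err = cudaStreamCreate(&stream);
  if (err != cudaSuccess) { cudaFree(out); cudaFree(partitions); return false; }

  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  compute_partitions<<<blocks, threads, 0, stream>>>(partitions, n);

  // Opt the second launch into programmatic stream serialization (PDL).
  cudaLaunchAttribute attr{};
  attr.id = cudaLaunchAttributeProgrammaticStreamSerialization;
  attr.val.programmaticStreamSerializationAllowed = 1;
  cudaLaunchConfig_t cfg{};
  cfg.gridDim = dim3(blocks);
  cfg.blockDim = dim3(threads);
  cfg.dynamicSmemBytes = 0;
  cfg.stream = stream;
  cfg.attrs = &attr;
  cfg.numAttrs = 1;
  err = cudaLaunchKernelEx(&cfg, merge_step, partitions, out, n);

  std::vector<int> host(n);
  if (err == cudaSuccess)
    err = cudaMemcpyAsync(host.data(), out, n * sizeof(int), cudaMemcpyDeviceToHost, stream);
  if (err == cudaSuccess) err = cudaStreamSynchronize(stream);

  bool ok = (err == cudaSuccess);
  for (int i = 0; ok && i < n; ++i) { ok = (host[i] == 2 * i + 1); }

  cudaStreamDestroy(stream);
  cudaFree(out);
  cudaFree(partitions);
  return ok;
}
```

Calling run_pdl_demo(1 << 16) should return true on a CUDA-capable machine. If the sync were moved below the load in merge_step, the result could become nondeterministic on sm_90 and newer. This is the hazard the PR describes.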

@bernhardmgruber bernhardmgruber requested a review from a team as a code owner August 7, 2025 18:47
@bernhardmgruber bernhardmgruber requested a review from fbusato August 7, 2025 18:47
@github-project-automation github-project-automation bot moved this to Todo in CCCL Aug 7, 2025
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Aug 7, 2025
@bernhardmgruber bernhardmgruber enabled auto-merge (squash) August 7, 2025 19:17
@github-actions
Contributor

github-actions bot commented Aug 7, 2025

🟨 CI finished in 4h 44m: Pass: 98%/103 | Total: 3d 06h | Avg: 45m 30s | Max: 1h 28m | Hits: 76%/141876
  • 🟨 cub: Pass: 97%/48 | Total: 2d 02h | Avg: 1h 02m | Max: 1h 28m | Hits: 66%/55944

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  97%/46  | Total:  1d 23h | Avg:  1h 02m | Max:  1h 28m | Hits:  66%/53496 
      🟩 arm64              Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 09m | Hits:  55%/2448  
    🔍 ctk: 12.8 🔍
      🟩 12.0               Pass: 100%/5   | Total:  5h 48m | Avg:  1h 09m | Max:  1h 12m | Hits:  66%/5949  
      🟩 12.6               Pass: 100%/2   | Total:  2h 47m | Avg:  1h 23m | Max:  1h 28m | Hits:  65%/2254  
      🔍 12.8               Pass:  97%/41  | Total:  1d 17h | Avg:  1h 00m | Max:  1h 28m | Hits:  66%/47741 
    🔍 cudacxx: nvcc12.8 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 11m | Hits:  67%/2110  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 48m | Avg:  1h 09m | Max:  1h 12m | Hits:  66%/5949  
      🟩 nvcc12.6           Pass: 100%/2   | Total:  2h 47m | Avg:  1h 23m | Max:  1h 28m | Hits:  65%/2254  
      🔍 nvcc12.8           Pass:  97%/39  | Total:  1d 15h | Avg:  1h 00m | Max:  1h 28m | Hits:  66%/45631 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 11m | Hits:  67%/2110  
      🔍 nvcc               Pass:  97%/46  | Total:  1d 23h | Avg:  1h 02m | Max:  1h 28m | Hits:  66%/53834 
    🔍 cxx: Clang18 🔍
      🟩 Clang14            Pass: 100%/4   | Total:  4h 28m | Avg:  1h 07m | Max:  1h 09m | Hits:  60%/4904  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 05m | Hits:  55%/2448  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 04m | Hits:  55%/2448  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 06m | Hits:  55%/2448  
      🔍 Clang18            Pass:  85%/7   | Total:  6h 37m | Avg: 56m 49s | Max:  1h 12m | Hits:  68%/7006  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 09m | Hits:  63%/2452  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 07m | Avg:  1h 07m | Max:  1h 07m | Hits:  55%/1226  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 13m | Hits:  60%/2452  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 11m | Hits:  55%/2452  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 10m | Hits:  55%/2448  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 05m | Hits:  55%/2448  
      🟩 GCC13              Pass: 100%/12  | Total:  9h 42m | Avg: 48m 31s | Max:  1h 12m | Hits:  77%/14688 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 40m | Avg:  1h 20m | Max:  1h 28m | Hits:  71%/2090  
      🟩 MSVC14.42          Pass: 100%/4   | Total:  5h 03m | Avg:  1h 15m | Max:  1h 25m | Hits:  71%/4180  
      🟩 NVHPC25.1          Pass: 100%/2   | Total:  2h 47m | Avg:  1h 23m | Max:  1h 28m | Hits:  65%/2254  
    🔍 cxx_family: Clang 🔍
      🔍 Clang              Pass:  94%/17  | Total: 17h 29m | Avg:  1h 01m | Max:  1h 12m | Hits:  61%/19254 
      🟩 GCC                Pass: 100%/23  | Total: 22h 09m | Avg: 57m 48s | Max:  1h 13m | Hits:  68%/28166 
      🟩 MSVC               Pass: 100%/6   | Total:  7h 43m | Avg:  1h 17m | Max:  1h 28m | Hits:  71%/6270  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 47m | Avg:  1h 23m | Max:  1h 28m | Hits:  65%/2254  
    🔍 gpu: rtxa6000 🔍
      🟩 h100               Pass: 100%/3   | Total:  1h 58m | Avg: 39m 24s | Max: 57m 12s | Hits:  77%/3672  
      🟩 rtx2080            Pass: 100%/37  | Total:  1d 18h | Avg:  1h 09m | Max:  1h 28m | Hits:  60%/43704 
      🔍 rtxa6000           Pass:  87%/8   | Total:  5h 28m | Avg: 41m 03s | Max:  1h 10m | Hits:  88%/8568  
    🔍 jobs: TestGPU 🔍
      🟩 Build              Pass: 100%/40  | Total:  1d 21h | Avg:  1h 08m | Max:  1h 28m | Hits:  60%/47376 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 29m 14s | Avg: 29m 14s | Max: 29m 14s | Hits:  99%/1224  
      🟩 GraphCapture       Pass: 100%/1   | Total: 32m 30s | Avg: 32m 30s | Max: 32m 30s | Hits:  99%/1224  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 43m | Avg: 34m 22s | Max: 40m 29s | Hits:  99%/3672  
      🔍 TestGPU            Pass:  66%/3   | Total:  1h 52m | Avg: 37m 33s | Max: 57m 12s | Hits:  84%/2448  
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total: 23h 45m | Avg:  1h 11m | Max:  1h 28m | Hits:  60%/23693 
      🔍 20                 Pass:  96%/28  | Total:  1d 02h | Avg: 56m 35s | Max:  1h 22m | Hits:  70%/32251 
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 58m | Avg: 39m 24s | Max: 57m 12s | Hits:  77%/3672  
      🟩 90;90a             Pass: 100%/2   | Total:  1h 58m | Avg: 59m 01s | Max:  1h 08m | Hits:  63%/2269  
      🟩 100;120            Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 07m | Hits:  68%/2269  
    
  • 🟥 python: Pass: 0%/1 | Total: 6m 24s | Avg: 6m 24s | Max: 6m 24s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total:  6m 24s | Avg:  6m 24s | Max:  6m 24s
    🟥 ctk
      🟥 12.8               Pass:   0%/1   | Total:  6m 24s | Avg:  6m 24s | Max:  6m 24s
    🟥 cudacxx
      🟥 nvcc12.8           Pass:   0%/1   | Total:  6m 24s | Avg:  6m 24s | Max:  6m 24s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total:  6m 24s | Avg:  6m 24s | Max:  6m 24s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total:  6m 24s | Avg:  6m 24s | Max:  6m 24s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total:  6m 24s | Avg:  6m 24s | Max:  6m 24s
    🟥 gpu
      🟥 rtx2080            Pass:   0%/1   | Total:  6m 24s | Avg:  6m 24s | Max:  6m 24s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total:  6m 24s | Avg:  6m 24s | Max:  6m 24s
    
  • 🟩 thrust: Pass: 100%/48 | Total: 1d 02h | Avg: 33m 26s | Max: 59m 40s | Hits: 84%/85612

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 40m 06s | Avg: 20m 03s | Max: 26m 04s | Hits:  91%/3570  
    🟩 cpu
      🟩 amd64              Pass: 100%/46  | Total:  1d 01h | Avg: 33m 34s | Max: 59m 40s | Hits:  84%/82043 
      🟩 arm64              Pass: 100%/2   | Total:  1h 01m | Avg: 30m 41s | Max: 32m 35s | Hits:  82%/3569  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 08m | Avg: 37m 42s | Max: 57m 34s | Hits:  82%/8916  
      🟩 12.6               Pass: 100%/2   | Total:  1h 54m | Avg: 57m 17s | Max: 57m 53s | Hits:  80%/3568  
      🟩 12.8               Pass: 100%/41  | Total: 21h 42m | Avg: 31m 46s | Max: 59m 40s | Hits:  84%/73128 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 51m 29s | Avg: 25m 44s | Max: 26m 40s | Hits:  82%/3568  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 08m | Avg: 37m 42s | Max: 57m 34s | Hits:  82%/8916  
      🟩 nvcc12.6           Pass: 100%/2   | Total:  1h 54m | Avg: 57m 17s | Max: 57m 53s | Hits:  80%/3568  
      🟩 nvcc12.8           Pass: 100%/39  | Total: 20h 50m | Avg: 32m 04s | Max: 59m 40s | Hits:  84%/69560 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 51m 29s | Avg: 25m 44s | Max: 26m 40s | Hits:  82%/3568  
      🟩 nvcc               Pass: 100%/46  | Total:  1d 01h | Avg: 33m 47s | Max: 59m 40s | Hits:  84%/82044 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 04m | Avg: 31m 11s | Max: 32m 20s | Hits:  82%/7136  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 08m | Avg: 34m 00s | Max: 35m 41s | Hits:  82%/3568  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 42s | Max: 34m 53s | Hits:  82%/3568  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 09m | Avg: 34m 30s | Max: 34m 54s | Hits:  82%/3568  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 44m | Avg: 23m 28s | Max: 33m 28s | Hits:  87%/12488 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 09m | Avg: 34m 56s | Max: 35m 28s | Hits:  82%/3570  
      🟩 GCC8               Pass: 100%/1   | Total: 32m 10s | Avg: 32m 10s | Max: 32m 10s | Hits:  82%/1785  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 11m | Avg: 35m 55s | Max: 36m 41s | Hits:  82%/3570  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 00s | Max: 35m 03s | Hits:  82%/3570  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 02m | Avg: 31m 18s | Max: 31m 26s | Hits:  82%/3570  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 11m | Avg: 35m 40s | Max: 36m 34s | Hits:  82%/3570  
      🟩 GCC13              Pass: 100%/11  | Total:  4h 28m | Avg: 24m 22s | Max: 41m 33s | Hits:  85%/19635 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 55m | Avg: 57m 41s | Max: 57m 48s | Hits:  80%/3556  
      🟩 MSVC14.42          Pass: 100%/5   | Total:  3h 58m | Avg: 47m 38s | Max: 59m 40s | Hits:  84%/8890  
      🟩 NVHPC25.1          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 17s | Max: 57m 53s | Hits:  80%/3568  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  8h 11m | Avg: 28m 54s | Max: 35m 41s | Hits:  84%/30328 
      🟩 GCC                Pass: 100%/22  | Total: 10h 46m | Avg: 29m 21s | Max: 41m 33s | Hits:  84%/39270 
      🟩 MSVC               Pass: 100%/7   | Total:  5h 53m | Avg: 50m 30s | Max: 59m 40s | Hits:  83%/12446 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 54m | Avg: 57m 17s | Max: 57m 53s | Hits:  80%/3568  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 35m 40s | Avg: 17m 50s | Max: 20m 42s | Hits:  91%/3570  
      🟩 rtx2080            Pass: 100%/36  | Total: 22h 09m | Avg: 36m 56s | Max: 57m 53s | Hits:  81%/64209 
      🟩 rtx4090            Pass: 100%/10  | Total:  4h 00m | Avg: 24m 00s | Max: 59m 40s | Hits:  92%/17833 
    🟩 jobs
      🟩 Build              Pass: 100%/41  | Total:  1d 01h | Avg: 36m 38s | Max: 59m 40s | Hits:  81%/73126 
      🟩 TestCPU            Pass: 100%/3   | Total: 48m 26s | Avg: 16m 08s | Max: 31m 58s | Hits:  99%/5347  
      🟩 TestGPU            Pass: 100%/4   | Total: 54m 41s | Avg: 13m 40s | Max: 14m 58s | Hits:  99%/7139  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 35m 40s | Avg: 17m 50s | Max: 20m 42s | Hits:  91%/3570  
      🟩 90;90a             Pass: 100%/2   | Total:  1h 14m | Avg: 37m 08s | Max: 43m 47s | Hits:  81%/3563  
      🟩 100;120            Pass: 100%/2   | Total:  1h 17m | Avg: 38m 52s | Max: 47m 48s | Hits:  81%/3563  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 52m | Avg: 38m 36s | Max: 57m 48s | Hits:  80%/35671 
      🟩 20                 Pass: 100%/26  | Total: 13h 13m | Avg: 30m 30s | Max: 59m 40s | Hits:  86%/46371 
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 44m 11s | Avg: 11m 02s | Max: 13m 57s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 24m 14s | Avg: 12m 07s | Max: 13m 57s
      🟩 arm64              Pass: 100%/2   | Total: 19m 57s | Avg:  9m 58s | Max: 12m 46s
    🟩 ctk
      🟩 12.6               Pass: 100%/4   | Total: 44m 11s | Avg: 11m 02s | Max: 13m 57s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/4   | Total: 44m 11s | Avg: 11m 02s | Max: 13m 57s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 44m 11s | Avg: 11m 02s | Max: 13m 57s
    🟩 cxx
      🟩 NVHPC25.1          Pass: 100%/4   | Total: 44m 11s | Avg: 11m 02s | Max: 13m 57s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 44m 11s | Avg: 11m 02s | Max: 13m 57s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 44m 11s | Avg: 11m 02s | Max: 13m 57s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 44m 11s | Avg: 11m 02s | Max: 13m 57s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total: 26m 43s | Avg: 13m 21s | Max: 13m 57s
      🟩 20                 Pass: 100%/2   | Total: 17m 28s | Avg:  8m 44s | Max: 10m 17s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 21m 56s | Avg: 10m 58s | Max: 17m 54s | Hits: 65%/320

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 21m 56s | Avg: 10m 58s | Max: 17m 54s | Hits:  65%/320   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 21m 56s | Avg: 10m 58s | Max: 17m 54s | Hits:  65%/320   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 21m 56s | Avg: 10m 58s | Max: 17m 54s | Hits:  65%/320   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 21m 56s | Avg: 10m 58s | Max: 17m 54s | Hits:  65%/320   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 21m 56s | Avg: 10m 58s | Max: 17m 54s | Hits:  65%/320   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 21m 56s | Avg: 10m 58s | Max: 17m 54s | Hits:  65%/320   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 21m 56s | Avg: 10m 58s | Max: 17m 54s | Hits:  65%/320   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  4m 02s | Avg:  4m 02s | Max:  4m 02s | Hits:  31%/160   
      🟩 Test               Pass: 100%/1   | Total: 17m 54s | Avg: 17m 54s | Max: 17m 54s | Hits:  98%/160   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- stdpar
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 103)

# Runner
70 linux-amd64-cpu16
13 windows-amd64-cpu16
6 linux-arm64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

@bernhardmgruber
Contributor Author

/ok to test e975e16

@github-actions
Contributor

github-actions bot commented Aug 8, 2025

🟩 CI finished in 2h 44m: Pass: 100%/103 | Total: 22h 41m | Avg: 13m 12s | Max: 2h 06m | Hits: 99%/143100
  • 🟩 cub: Pass: 100%/48 | Total: 12h 45m | Avg: 15m 56s | Max: 2h 06m | Hits: 98%/57168

    🟩 cpu
      🟩 amd64              Pass: 100%/46  | Total: 12h 32m | Avg: 16m 21s | Max:  2h 06m | Hits:  98%/54720 
      🟩 arm64              Pass: 100%/2   | Total: 12m 25s | Avg:  6m 12s | Max:  6m 40s | Hits:  99%/2448  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 45m | Avg: 21m 02s | Max:  1h 04m | Hits:  93%/5949  
      🟩 12.6               Pass: 100%/2   | Total: 24m 49s | Avg: 12m 24s | Max: 12m 46s | Hits:  98%/2254  
      🟩 12.8               Pass: 100%/41  | Total: 10h 34m | Avg: 15m 29s | Max:  2h 06m | Hits:  98%/48965 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 21s | Avg:  5m 10s | Max:  5m 18s | Hits: 100%/2110  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 45m | Avg: 21m 02s | Max:  1h 04m | Hits:  93%/5949  
      🟩 nvcc12.6           Pass: 100%/2   | Total: 24m 49s | Avg: 12m 24s | Max: 12m 46s | Hits:  98%/2254  
      🟩 nvcc12.8           Pass: 100%/39  | Total: 10h 24m | Avg: 16m 00s | Max:  2h 06m | Hits:  98%/46855 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 21s | Avg:  5m 10s | Max:  5m 18s | Hits: 100%/2110  
      🟩 nvcc               Pass: 100%/46  | Total: 12h 34m | Avg: 16m 24s | Max:  2h 06m | Hits:  98%/55058 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 27m 29s | Avg:  6m 52s | Max:  7m 54s | Hits:  99%/4904  
      🟩 Clang15            Pass: 100%/2   | Total: 13m 39s | Avg:  6m 49s | Max:  7m 04s | Hits: 100%/2448  
      🟩 Clang16            Pass: 100%/2   | Total: 14m 06s | Avg:  7m 03s | Max:  7m 10s | Hits: 100%/2448  
      🟩 Clang17            Pass: 100%/2   | Total: 13m 10s | Avg:  6m 35s | Max:  6m 36s | Hits: 100%/2448  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 29m | Avg: 12m 46s | Max: 30m 30s | Hits: 100%/8230  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 11m | Avg: 35m 39s | Max:  1h 04m | Hits:  84%/2452  
      🟩 GCC8               Pass: 100%/1   | Total:  6m 56s | Avg:  6m 56s | Max:  6m 56s | Hits:  99%/1226  
      🟩 GCC9               Pass: 100%/2   | Total: 13m 55s | Avg:  6m 57s | Max:  7m 35s | Hits:  99%/2452  
      🟩 GCC10              Pass: 100%/2   | Total: 13m 50s | Avg:  6m 55s | Max:  6m 58s | Hits:  99%/2452  
      🟩 GCC11              Pass: 100%/2   | Total: 14m 20s | Avg:  7m 10s | Max:  7m 20s | Hits:  99%/2448  
      🟩 GCC12              Pass: 100%/2   | Total: 14m 47s | Avg:  7m 23s | Max:  7m 28s | Hits:  99%/2448  
      🟩 GCC13              Pass: 100%/12  | Total:  5h 17m | Avg: 26m 27s | Max:  2h 06m | Hits:  97%/14688 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 43m 03s | Avg: 21m 31s | Max: 22m 26s | Hits:  99%/2090  
      🟩 MSVC14.42          Pass: 100%/4   | Total:  1h 26m | Avg: 21m 41s | Max: 22m 25s | Hits:  99%/4180  
      🟩 NVHPC25.1          Pass: 100%/2   | Total: 24m 49s | Avg: 12m 24s | Max: 12m 46s | Hits:  98%/2254  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 37m | Avg:  9m 16s | Max: 30m 30s | Hits:  99%/20478 
      🟩 GCC                Pass: 100%/23  | Total:  7h 32m | Avg: 19m 40s | Max:  2h 06m | Hits:  97%/28166 
      🟩 MSVC               Pass: 100%/6   | Total:  2h 09m | Avg: 21m 38s | Max: 22m 26s | Hits:  99%/6270  
      🟩 NVHPC              Pass: 100%/2   | Total: 24m 49s | Avg: 12m 24s | Max: 12m 46s | Hits:  98%/2254  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 01m | Avg: 20m 31s | Max: 29m 13s | Hits:  99%/3672  
      🟩 rtx2080            Pass: 100%/37  | Total:  6h 48m | Avg: 11m 01s | Max:  1h 04m | Hits:  98%/43704 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 55m | Avg: 36m 56s | Max:  2h 06m | Hits:  95%/9792  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  7h 08m | Avg: 10m 42s | Max:  1h 04m | Hits:  98%/47376 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 28m 48s | Avg: 28m 48s | Max: 28m 48s | Hits:  99%/1224  
      🟩 GraphCapture       Pass: 100%/1   | Total:  2h 06m | Avg:  2h 06m | Max:  2h 06m | Hits:  69%/1224  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 29m | Avg: 29m 42s | Max: 32m 06s | Hits:  99%/3672  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 32m | Avg: 30m 53s | Max: 33m 37s | Hits:  99%/3672  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 01m | Avg: 20m 31s | Max: 29m 13s | Hits:  99%/3672  
      🟩 90;90a             Pass: 100%/2   | Total: 26m 58s | Avg: 13m 29s | Max: 20m 43s | Hits:  99%/2269  
      🟩 100;120            Pass: 100%/2   | Total: 27m 51s | Avg: 13m 55s | Max: 21m 16s | Hits:  99%/2269  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  4h 03m | Avg: 12m 09s | Max:  1h 04m | Hits:  98%/23693 
      🟩 20                 Pass: 100%/28  | Total:  8h 41m | Avg: 18m 37s | Max:  2h 06m | Hits:  98%/33475 
    
  • 🟩 thrust: Pass: 100%/48 | Total: 7h 59m | Avg: 9m 59s | Max: 29m 09s | Hits: 99%/85612

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 20m 30s | Avg: 10m 15s | Max: 13m 26s | Hits:  99%/3570  
    🟩 cpu
      🟩 amd64              Pass: 100%/46  | Total:  7h 48m | Avg: 10m 11s | Max: 29m 09s | Hits:  99%/82043 
      🟩 arm64              Pass: 100%/2   | Total: 10m 26s | Avg:  5m 13s | Max:  5m 42s | Hits:  99%/3569  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 44m 45s | Avg:  8m 57s | Max: 23m 33s | Hits:  99%/8916  
      🟩 12.6               Pass: 100%/2   | Total: 32m 49s | Avg: 16m 24s | Max: 16m 27s | Hits:  99%/3568  
      🟩 12.8               Pass: 100%/41  | Total:  6h 41m | Avg:  9m 47s | Max: 29m 09s | Hits:  99%/73128 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 11m 09s | Avg:  5m 34s | Max:  5m 47s | Hits: 100%/3568  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 44m 45s | Avg:  8m 57s | Max: 23m 33s | Hits:  99%/8916  
      🟩 nvcc12.6           Pass: 100%/2   | Total: 32m 49s | Avg: 16m 24s | Max: 16m 27s | Hits:  99%/3568  
      🟩 nvcc12.8           Pass: 100%/39  | Total:  6h 30m | Avg: 10m 00s | Max: 29m 09s | Hits:  99%/69560 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 11m 09s | Avg:  5m 34s | Max:  5m 47s | Hits: 100%/3568  
      🟩 nvcc               Pass: 100%/46  | Total:  7h 48m | Avg: 10m 10s | Max: 29m 09s | Hits:  99%/82044 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 24s | Avg:  5m 21s | Max:  5m 44s | Hits: 100%/7136  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 59s | Avg:  5m 59s | Max:  6m 09s | Hits: 100%/3568  
      🟩 Clang16            Pass: 100%/2   | Total: 12m 21s | Avg:  6m 10s | Max:  6m 13s | Hits: 100%/3568  
      🟩 Clang17            Pass: 100%/2   | Total: 11m 27s | Avg:  5m 43s | Max:  5m 47s | Hits: 100%/3568  
      🟩 Clang18            Pass: 100%/7   | Total: 46m 44s | Avg:  6m 40s | Max: 11m 27s | Hits: 100%/12488 
      🟩 GCC7               Pass: 100%/2   | Total: 11m 16s | Avg:  5m 38s | Max:  5m 48s | Hits:  99%/3570  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 38s | Avg:  5m 38s | Max:  5m 38s | Hits:  99%/1785  
      🟩 GCC9               Pass: 100%/2   | Total: 11m 38s | Avg:  5m 49s | Max:  5m 59s | Hits:  99%/3570  
      🟩 GCC10              Pass: 100%/2   | Total: 11m 56s | Avg:  5m 58s | Max:  6m 03s | Hits:  99%/3570  
      🟩 GCC11              Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 29s | Hits:  99%/3570  
      🟩 GCC12              Pass: 100%/2   | Total: 12m 48s | Avg:  6m 24s | Max:  6m 34s | Hits:  99%/3570  
      🟩 GCC13              Pass: 100%/11  | Total:  1h 45m | Avg:  9m 36s | Max: 18m 07s | Hits:  99%/19635 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 46m 14s | Avg: 23m 07s | Max: 23m 33s | Hits:  99%/3556  
      🟩 MSVC14.42          Pass: 100%/5   | Total:  2h 04m | Avg: 24m 53s | Max: 29m 09s | Hits:  99%/8890  
      🟩 NVHPC25.1          Pass: 100%/2   | Total: 32m 49s | Avg: 16m 24s | Max: 16m 27s | Hits:  99%/3568  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 43m | Avg:  6m 06s | Max: 11m 27s | Hits: 100%/30328 
      🟩 GCC                Pass: 100%/22  | Total:  2h 51m | Avg:  7m 48s | Max: 18m 07s | Hits:  99%/39270 
      🟩 MSVC               Pass: 100%/7   | Total:  2h 50m | Avg: 24m 22s | Max: 29m 09s | Hits:  99%/12446 
      🟩 NVHPC              Pass: 100%/2   | Total: 32m 49s | Avg: 16m 24s | Max: 16m 27s | Hits:  99%/3568  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 20m 26s | Avg: 10m 13s | Max: 14m 46s | Hits:  99%/3570  
      🟩 rtx2080            Pass: 100%/36  | Total:  5h 29m | Avg:  9m 09s | Max: 24m 03s | Hits:  99%/64209 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 08m | Avg: 12m 53s | Max: 29m 09s | Hits:  99%/17833 
    🟩 jobs
      🟩 Build              Pass: 100%/41  | Total:  6h 21m | Avg:  9m 17s | Max: 26m 14s | Hits:  99%/73126 
      🟩 TestCPU            Pass: 100%/3   | Total: 45m 26s | Avg: 15m 08s | Max: 29m 09s | Hits:  99%/5347  
      🟩 TestGPU            Pass: 100%/4   | Total: 52m 42s | Avg: 13m 10s | Max: 14m 46s | Hits:  99%/7139  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 20m 26s | Avg: 10m 13s | Max: 14m 46s | Hits:  99%/3570  
      🟩 90;90a             Pass: 100%/2   | Total: 28m 10s | Avg: 14m 05s | Max: 21m 57s | Hits:  99%/3563  
      🟩 100;120            Pass: 100%/2   | Total: 30m 30s | Avg: 15m 15s | Max: 24m 03s | Hits:  99%/3563  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 11m | Avg:  9m 34s | Max: 23m 33s | Hits:  99%/35671 
      🟩 20                 Pass: 100%/26  | Total:  4h 27m | Avg: 10m 16s | Max: 29m 09s | Hits:  99%/46371 
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 19m 24s | Avg: 4m 51s | Max: 5m 51s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 11s | Avg:  5m 35s | Max:  5m 51s
      🟩 arm64              Pass: 100%/2   | Total:  8m 13s | Avg:  4m 06s | Max:  4m 12s
    🟩 ctk
      🟩 12.6               Pass: 100%/4   | Total: 19m 24s | Avg:  4m 51s | Max:  5m 51s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/4   | Total: 19m 24s | Avg:  4m 51s | Max:  5m 51s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 19m 24s | Avg:  4m 51s | Max:  5m 51s
    🟩 cxx
      🟩 NVHPC25.1          Pass: 100%/4   | Total: 19m 24s | Avg:  4m 51s | Max:  5m 51s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 19m 24s | Avg:  4m 51s | Max:  5m 51s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 19m 24s | Avg:  4m 51s | Max:  5m 51s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 19m 24s | Avg:  4m 51s | Max:  5m 51s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total:  9m 21s | Avg:  4m 40s | Max:  5m 20s
      🟩 20                 Pass: 100%/2   | Total: 10m 03s | Avg:  5m 01s | Max:  5m 51s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 22m 25s | Avg: 11m 12s | Max: 20m 02s | Hits: 98%/320

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 22m 25s | Avg: 11m 12s | Max: 20m 02s | Hits:  98%/320   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 22m 25s | Avg: 11m 12s | Max: 20m 02s | Hits:  98%/320   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 22m 25s | Avg: 11m 12s | Max: 20m 02s | Hits:  98%/320   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 22m 25s | Avg: 11m 12s | Max: 20m 02s | Hits:  98%/320   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 22m 25s | Avg: 11m 12s | Max: 20m 02s | Hits:  98%/320   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 22m 25s | Avg: 11m 12s | Max: 20m 02s | Hits:  98%/320   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 22m 25s | Avg: 11m 12s | Max: 20m 02s | Hits:  98%/320   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 23s | Avg:  2m 23s | Max:  2m 23s | Hits:  98%/160   
      🟩 Test               Pass: 100%/1   | Total: 20m 02s | Avg: 20m 02s | Max: 20m 02s | Hits:  98%/160   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 15m | Avg: 1h 15m | Max: 1h 15m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 15m | Avg:  1h 15m | Max:  1h 15m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 15m | Avg:  1h 15m | Max:  1h 15m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 15m | Avg:  1h 15m | Max:  1h 15m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 15m | Avg:  1h 15m | Max:  1h 15m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 15m | Avg:  1h 15m | Max:  1h 15m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 15m | Avg:  1h 15m | Max:  1h 15m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 15m | Avg:  1h 15m | Max:  1h 15m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 15m | Avg:  1h 15m | Max:  1h 15m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- stdpar
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 103)

# Runner
70 linux-amd64-cpu16
13 windows-amd64-cpu16
6 linux-arm64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

@bernhardmgruber bernhardmgruber merged commit a7a0fd5 into NVIDIA:branch/3.0.x Aug 8, 2025
117 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Aug 8, 2025
@bernhardmgruber bernhardmgruber deleted the backport_pdl_fix_30 branch August 9, 2025 00:21

Labels

None yet

Projects

Archived in project

3 participants