fix: [https://nvbugspro.nvidia.com/bug/5349343] Fix mPtrExpertCounts allocation in MoE TRT-LLM backend (nvfp4) by ChristinaZ · Pull Request #5519 · NVIDIA/TensorRT-LLM · GitHub

Conversation

@ChristinaZ
Collaborator

Fix mPtrExpertCounts allocation in MoE TRT-LLM backend (nvfp4)

Description

  1. Fix mPtrExpertCounts allocation in MoE TRT-LLM backend (nvfp4)
    @zhhuang-nv found that for DeepSeek Lite, the MoE TRT-LLM backend with nvfp4 reports the following error:
[TensorRT-LLM][ERROR] CUDA runtime error in cudaMemsetAsync( data.mPtrExpertCounts, 0, static_cast<size_t>(2 * NumThreads) * sizeof(int32_t), (cudaStream_t) stream): invalid argument (/home/scratch.zhhuang_sw/project/TensorRT-LLM/cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingKernel.cu:1254).

Through our analysis, we identified the root cause:
The size zeroed by the memset (cudaMemsetAsync( data.mPtrExpertCounts, 0, static_cast<size_t>(2 * NumThreads) * sizeof(int32_t)...) exceeds the size actually allocated for the tensor (at::Tensor expert_count_histogram = at::detail::empty_cuda({((num_experts * 2 + 255) / 256) * 256},...).

The fix enlarges the allocation so it always covers the memset:

    int const size_of_expert_count_histogram = max(num_experts * 2, 256 * 2);
    at::Tensor expert_count_histogram = at::detail::empty_cuda({size_of_expert_count_histogram},
       ...
  2. Some routing kernels require a compute capability of 9.0 or higher, which the A30 does not support, so certain tests must be skipped for now. This PR also revises the unit tests to use a cleaner approach for skipping them on unsupported architectures.

@ChristinaZ ChristinaZ force-pushed the fix_routing_expertCount_malloc branch 2 times, most recently from b0f5c2e to 702b664 Compare June 26, 2025 12:50
@ChristinaZ
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #10036 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10036 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #7406 completed with status: 'FAILURE'

@ChristinaZ ChristinaZ force-pushed the fix_routing_expertCount_malloc branch from 702b664 to 991b9ed Compare June 26, 2025 14:43
@ChristinaZ
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #10046 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10046 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #7414 completed with status: 'FAILURE'

@ChristinaZ ChristinaZ force-pushed the fix_routing_expertCount_malloc branch from 991b9ed to 44c6b42 Compare June 27, 2025 06:43
…ipping routing tests for unsupported GPU architectures

Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
@ChristinaZ ChristinaZ force-pushed the fix_routing_expertCount_malloc branch from 44c6b42 to b67efcb Compare June 27, 2025 06:54
@ChristinaZ
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #10120 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10120 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7469 completed with status: 'FAILURE'

@ChristinaZ
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #10133 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10133 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7480 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@byshiue byshiue merged commit a608b00 into NVIDIA:main Jun 27, 2025
3 checks passed
@ChristinaZ ChristinaZ changed the title Fix mPtrExpertCounts allocation in MoE TRT-LLM backend (nvfp4) fix: [https://nvbugspro.nvidia.com/bug/5349343] Fix mPtrExpertCounts allocation in MoE TRT-LLM backend (nvfp4) Jun 30, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025
…A#5519)

Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
