[fix] WAR to fix the illegal memory access issue in moe gemm on SM120 by peaceh-nv · Pull Request #5636 · NVIDIA/TensorRT-LLM · GitHub

Conversation

@peaceh-nv
Collaborator

@peaceh-nv peaceh-nv commented Jul 1, 2025

Description

Avoid using `void` as ElementC to work around the illegal memory access in the MoE GEMM on SM120 that appeared after the CUTLASS upgrade, and add related unit tests for SM120.
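
The idea of avoiding a `void` C-element type can be sketched as below. This is a minimal illustration only, not the code from this PR: `NonVoidElementC` and `EpilogueConfig` are hypothetical names, and the choice of the output type as the fallback is an assumption made for the example.

```cpp
#include <type_traits>

// Hypothetical helper (not from the PR): if the requested C-element type is
// `void`, substitute a concrete fallback type; otherwise keep it unchanged.
template <typename ElementC, typename Fallback>
using NonVoidElementC =
    typename std::conditional<std::is_void<ElementC>::value, Fallback, ElementC>::type;

// Hypothetical stand-in for a kernel configuration. The real MoE GEMM
// launcher has many more parameters; only the element types matter here.
template <typename RequestedElementC, typename ElementD_>
struct EpilogueConfig
{
    using ElementD = ElementD_;
    // Assumption for this sketch: fall back to the output type D.
    using ElementC = NonVoidElementC<RequestedElementC, ElementD_>;
};

// A `void` request resolves to the concrete output type.
static_assert(std::is_same<EpilogueConfig<void, float>::ElementC, float>::value,
    "void ElementC should be replaced by the fallback type");

// A non-void request is passed through untouched.
static_assert(std::is_same<EpilogueConfig<double, float>::ElementC, double>::value,
    "non-void ElementC should be kept as-is");
```

With a mapping like this, code downstream of the configuration never sees a `void` element type, at the cost of carrying a concrete C type even when no source tensor is read.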

@peaceh-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #10450 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10450 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #7732 completed with status: 'FAILURE'

@peaceh-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #10496 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10496 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7771 completed with status: 'FAILURE'

@pamelap-nvidia
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #10530 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10530 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7794 completed with status: 'FAILURE'

@peaceh-nv peaceh-nv changed the title from "[fix] Downgrade cutlass to fix the illegal memory access issue in moe gemm on SM120" to "[fix] WAR to fix the illegal memory access issue in moe gemm on SM120" Jul 8, 2025
@peaceh-nv peaceh-nv requested a review from djns99 July 8, 2025 09:25
@peaceh-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11424 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11424 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8448 completed with status: 'FAILURE'

peaceh-nv added 3 commits July 9, 2025 06:34
… gemm on SM120

Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
@peaceh-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11437 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11437 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8459 completed with status: 'SUCCESS'

@kaiyux kaiyux merged commit 76c3a12 into NVIDIA:main Jul 10, 2025
3 checks passed
zhou-yuxin pushed a commit to zhou-yuxin/TensorRT-LLM that referenced this pull request Jul 15, 2025
…NVIDIA#5636)

Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Yuxin <yuxinz@nvidia.com>

5 participants