[5321981] fix: Fix the Llama3.1 405B hanging issue. by hyukn · Pull Request #5698 · NVIDIA/TensorRT-LLM · GitHub

Conversation

@hyukn
Copy link
Collaborator

@hyukn hyukn commented Jul 3, 2025

The output shapes of the fusedLayerNorm plugin for nvFP4 are mismatched. The resulting out-of-range memory writes pollute the barrier buffer of the one-shot allreduce kernel, which causes the hang.
Because other data buffers are corrupted by the same out-of-range writes, this may also be the root cause of the accuracy issue that @zihaok recently reported.
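To illustrate the failure mode, here is a minimal sketch (not the plugin's actual memory layout; buffer sizes, the flag protocol, and the adjacency of data and barrier buffers are all illustrative assumptions): when a kernel writes more elements than its declared output shape, the overflow can land in an adjacent barrier buffer, so a rank spin-waiting on its barrier flags never sees the expected value and the one-shot allreduce hangs.

```python
import numpy as np

# One workspace allocation: 16 data slots followed by 4 barrier flags
# (illustrative layout, not TensorRT-LLM's actual workspace layout).
workspace = np.zeros(16 + 4, dtype=np.int32)
data = workspace[:16]     # view: the plugin's output region
barrier = workspace[16:]  # view: per-rank arrival flags for the allreduce

# All ranks have "arrived": each flag is set to 1.
barrier[:] = 1

# Bug: the kernel's actual write length exceeds its declared output shape,
# so it writes 18 elements into a 16-element output region.
actual_write_len = 18
workspace[:actual_write_len] = 7  # clobbers barrier[0] and barrier[1]

# The allreduce's spin-wait condition (all flags == 1) can now never hold.
would_hang = not all(flag == 1 for flag in barrier)
print(would_hang)  # → True
```

With the output shape corrected, the write stays inside the data region, the barrier flags remain intact, and the spin-wait completes.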

@hyukn hyukn requested review from liji-nv and zihaok July 3, 2025 06:25
@hyukn hyukn requested a review from a team as a code owner July 3, 2025 06:25
@hyukn hyukn force-pushed the fix/5321981 branch 2 times, most recently from 6de8fa2 to d694659 Compare July 3, 2025 06:39
@hyukn
Copy link
Collaborator Author

hyukn commented Jul 3, 2025

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd
Copy link
Collaborator

PR_Github #10770 [ run ] triggered by Bot

@hyukn hyukn requested a review from litaotju July 3, 2025 06:58
@hyukn hyukn changed the title [5321981] fix: Fix the Llama-405B hanging issue. [5321981] fix: Fix the Llama 3.1-405B hanging issue. Jul 3, 2025
@hyukn hyukn changed the title [5321981] fix: Fix the Llama 3.1-405B hanging issue. [5321981] fix: Fix the Llama3.1 405B hanging issue. Jul 3, 2025
@tensorrt-cicd
Copy link
Collaborator

PR_Github #10770 [ run ] completed with state SUCCESS
/LLM/release-0.21/L0_MergeRequest_PR pipeline #143 completed with status: 'FAILURE'

@hyukn
Copy link
Collaborator Author

hyukn commented Jul 3, 2025

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd
Copy link
Collaborator

PR_Github #10827 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #10827 [ run ] completed with state SUCCESS
/LLM/release-0.21/L0_MergeRequest_PR pipeline #148 completed with status: 'SUCCESS'
Pipeline passed with automatically retried tests. Check the rerun report for details.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
@hyukn
Copy link
Collaborator Author

hyukn commented Jul 4, 2025

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #10887 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #10887 [ run ] completed with state SUCCESS
/LLM/release-0.21/L0_MergeRequest_PR pipeline #157 completed with status: 'SUCCESS'

@hyukn hyukn merged commit b0354ef into NVIDIA:release/0.21 Jul 4, 2025
3 checks passed
dc3671 pushed a commit to dc3671/TensorRT-LLM that referenced this pull request Jul 10, 2025
Correct the output shape of the fusedLayerNormPlugin.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
dc3671 pushed a commit to dc3671/TensorRT-LLM that referenced this pull request Jul 10, 2025
Correct the output shape of the fusedLayerNormPlugin.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
nvzhihanj pushed a commit to nvzhihanj/TensorRT-LLM that referenced this pull request Jul 10, 2025
Correct the output shape of the fusedLayerNormPlugin.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
hyukn added a commit that referenced this pull request Jul 10, 2025
…#5698) (#5925)

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
nvzhihanj added a commit that referenced this pull request Jul 11, 2025
…#5698) (#5925)

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
zhou-yuxin pushed a commit to zhou-yuxin/TensorRT-LLM that referenced this pull request Jul 15, 2025
…NVIDIA#5698) (NVIDIA#5925)

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Yuxin <yuxinz@nvidia.com>
