[None][fix] Cherry-pick 6850: Complete the last missing allreduce op in Llama3/4. #7420
Conversation
Force-pushed from e5f1825 to 570930f
/bot run --disable-fail-fast
📝 Walkthrough

Version bumped to 1.1.0rc2.post1 across version.py, the README badge, and the examples constraints file. In `Llama4DecoderLayer.forward`, the post-MOE/post-MLP fusion handling was modified: pure all_reduce paths were added for the case where there is no `next_layer_layernorm`, and the scale and `fusion_op` selection was adjusted for the case where `next_layer_layernorm` exists, preserving `cutlass_min_latency_mode` behavior.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Input as Hidden State
    participant Layer as Llama4DecoderLayer
    participant AR as TensorParallel AllReduce
    participant LN as Next LayerNorm (optional)
    Note over Layer: POST_MOE_FUSION / POST_MLP_FUSION branch
    Input->>Layer: forward(...)
    alt next_layer_layernorm is None
        Layer->>AR: all_reduce(fusion_op=None, norm_weight=None, scale=None)
        AR-->>Layer: reduced hidden
        Layer-->>Input: return reduced hidden
    else next_layer_layernorm exists
        Note over Layer: Determine scale<br/>- if NVFP4/FP8 and next_attn: use qkv_proj.input_scale<br/>- else: scale=None
        Layer->>AR: all_reduce(fusion_op=post_*_fusion_op, norm_weight=LN.weight, eps, scale)
        AR-->>Layer: reduced+fused output
        Layer->>LN: (implicit in fusion_op behavior)
        LN-->>Layer: output (if applicable)
        Layer-->>Input: return output
    end
    opt cutlass_min_latency_mode and MOE
        Note over Layer: Use moe_allreduce instead of generic all_reduce
    end
```
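The branch logic in the diagram can be sketched in plain Python. This is an illustrative, framework-free sketch of the parameter-selection decision only; the class names (`AllReduceParams`, `FusionOp`), the `eps` value, and the string stand-ins for tensors are hypothetical placeholders, not the actual TensorRT-LLM API.

```python
# Hypothetical sketch of the post-MOE/post-MLP all_reduce parameter selection
# described above. Names and values are illustrative, not the real API.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class FusionOp(Enum):
    NONE = auto()                      # plain all_reduce, no fusion
    RESIDUAL_RMS_NORM = auto()         # fuse residual add + RMSNorm
    RESIDUAL_RMS_NORM_QUANT = auto()   # additionally fuse quantization


@dataclass
class AllReduceParams:
    fusion_op: FusionOp
    norm_weight: Optional[str] = None
    eps: Optional[float] = None
    scale: Optional[str] = None


def select_allreduce_params(next_layer_layernorm, is_nvfp4_or_fp8, next_attn):
    """Choose all_reduce parameters for the POST_MOE/POST_MLP fusion branch."""
    if next_layer_layernorm is None:
        # Last decoder layer: fall back to a pure all_reduce with no
        # fused norm, weight, or scale.
        return AllReduceParams(fusion_op=FusionOp.NONE)
    if is_nvfp4_or_fp8 and next_attn is not None:
        # NVFP4/FP8 with a following attention block: fuse norm + quant,
        # scaling by the next attention's qkv_proj input scale.
        return AllReduceParams(
            fusion_op=FusionOp.RESIDUAL_RMS_NORM_QUANT,
            norm_weight="next_layer_layernorm.weight",
            eps=1e-5,
            scale="next_attn.qkv_proj.input_scale",
        )
    # Otherwise fuse the norm only, with scale left as None.
    return AllReduceParams(
        fusion_op=FusionOp.RESIDUAL_RMS_NORM,
        norm_weight="next_layer_layernorm.weight",
        eps=1e-5,
    )
```

The missing piece the PR fixes is the first branch: without it, the last layer had no all_reduce at all when `next_layer_layernorm` was `None`.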
PR_Github #17160 [ run ] triggered by Bot
PR_Github #17160 [ run ] completed with state
Force-pushed from 570930f to ae8e7eb
/bot run --disable-fail-fast
PR_Github #17308 [ run ] triggered by Bot
PR_Github #17308 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #17468 [ run ] triggered by Bot
PR_Github #17468 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #17584 [ run ] triggered by Bot
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Force-pushed from ae8e7eb to 04279b0
/bot run
/bot kill
PR_Github #17607 [ kill ] triggered by Bot
PR_Github #17584 [ run ] completed with state
PR_Github #17607 [ kill ] completed with state
/bot run
PR_Github #17610 [ run ] triggered by Bot
PR_Github #17610 [ run ] completed with state
Cherry-pick commits in #6850.
Summary by CodeRabbit
- Bug Fixes
- Documentation
- Chores