[None][chore] Refine qwen3-next implementation. #8064
Conversation
📝 Walkthrough

Updates adjust attention module logging and QKV projection sizing when attn_output_gate is enabled, simplify the RoPE application flow, refine RoPE fusion enablement in QK-Norm attention, and change a config utility to explicitly check attribute presence before comparison.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant C as Config
    participant A as Attention.__init__
    participant Q as QKNormRoPEAttention.__init__
    Note over A: Initialization decisions
    A->>A: if attn_output_gate\n set info log
    A->>A: compute qkv_proj size\n(adjust when gate=true)
    A->>A: apply_rope path simplified\n(unconditional split when not fused)
    Note over Q: RoPE fusion gating
    Q->>Q: rope_fusion = fuse_qk_norm_rope && !skip_rope && !attn_output_gate && !use_gemma_rms_norm
```
```mermaid
sequenceDiagram
    autonumber
    participant C as Config
    participant U as is_qwen3_next
    C->>U: query linear_key_head_dim
    alt attribute exists
        U->>U: return linear_key_head_dim > 0
    else attribute missing
        U->>U: return False
    end
```
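The walkthrough's sizing note can be illustrated with a minimal standalone sketch (hypothetical sizes, not the module's actual code): with the output gate enabled, the Q slice is doubled (presumably to carry the gate), so the fused projection emits 2*q_size + 2*kv_size features instead of q_size + 2*kv_size.

```python
def qkv_out_features(q_size: int, kv_size: int, attn_output_gate: bool) -> int:
    """Sketch of the fused QKV projection width described in the walkthrough.

    With attn_output_gate enabled, the Q slice is doubled, giving
    2*q_size + 2*kv_size output features in total.
    """
    q_out = 2 * q_size if attn_output_gate else q_size
    return q_out + 2 * kv_size


# Hypothetical per-rank sizes, for illustration only.
print(qkv_out_features(q_size=4096, kv_size=1024, attn_output_gate=False))  # 6144
print(qkv_out_features(q_size=4096, kv_size=1024, attn_output_gate=True))   # 10240
```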
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches: ❌ Failed checks (3 warnings)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/modules/attention.py (1)
226-239: Critical: LoRA tensor size mismatch when attn_output_gate=True.

With gating on, `qkv_proj` emits `2*q_size + 2*kv_size` features, but both `splitted_qkv_lora` and `fused_qkv_lora` remain initialized to `q_size + 2*kv_size`, leading to a runtime shape mismatch. Apply:

```diff
-        self.splitted_qkv_lora = LoraLayer([
-            LoraModuleType.ATTENTION_Q, LoraModuleType.ATTENTION_K,
-            LoraModuleType.ATTENTION_V
-        ], [self.q_size, self.kv_size, self.kv_size])
-        self.fused_qkv_lora = LoraLayer([LoraModuleType.ATTENTION_QKV],
-                                        [self.q_size + 2 * self.kv_size])
+        q_out_size_for_lora = (2 * self.q_size) if self.attn_output_gate else self.q_size
+        self.splitted_qkv_lora = LoraLayer(
+            [LoraModuleType.ATTENTION_Q, LoraModuleType.ATTENTION_K, LoraModuleType.ATTENTION_V],
+            [q_out_size_for_lora, self.kv_size, self.kv_size],
+        )
+        self.fused_qkv_lora = LoraLayer(
+            [LoraModuleType.ATTENTION_QKV],
+            [q_out_size_for_lora + 2 * self.kv_size],
+        )
```

Also verify that `LoraLayer` supports these fused sizes (or pad the extra gated-Q slice) to avoid runtime errors.
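To make the reported mismatch concrete, here is a small self-contained sketch using plain tensors and hypothetical sizes rather than the real `LoraLayer` (the sizes and the way the LoRA delta is combined are assumptions for illustration only):

```python
import torch

# Hypothetical per-rank sizes, for illustration only.
q_size, kv_size = 4096, 1024
attn_output_gate = True

# Width actually produced by qkv_proj when the gate doubles the Q slice.
qkv_width = (2 * q_size if attn_output_gate else q_size) + 2 * kv_size  # 10240

# Width the LoRA layers expect if they keep the ungated formula.
lora_width = q_size + 2 * kv_size  # 6144

qkv = torch.randn(2, qkv_width)          # stand-in for the qkv_proj output
lora_delta = torch.randn(2, lora_width)  # stand-in for the LoRA contribution

# Combining the two mismatched widths fails at runtime.
try:
    _ = qkv + lora_delta
except RuntimeError as err:
    print("shape mismatch:", err)
```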
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/config_utils.py (1)
17-18: Make the check robust to non-numeric/None values to avoid TypeError.

If `linear_key_head_dim` exists but is `None` or non-numeric, `> 0` will raise. Recommend guarding the type before comparing. Apply:

```diff
-def is_qwen3_next(config):
-    return hasattr(config,
-                   'linear_key_head_dim') and config.linear_key_head_dim > 0
+def is_qwen3_next(config):
+    val = getattr(config, 'linear_key_head_dim', None)
+    return isinstance(val, int) and val > 0
```
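A quick standalone check of why the guard matters, using `types.SimpleNamespace` as a stand-in config: the suggested variant returns False for missing or None attributes, while the presence-only check raises on None.

```python
from types import SimpleNamespace


def is_qwen3_next_presence_only(config):
    # Current form: presence check, then compare directly.
    return hasattr(config, 'linear_key_head_dim') and config.linear_key_head_dim > 0


def is_qwen3_next_guarded(config):
    # Suggested form: also guard the value's type before comparing.
    val = getattr(config, 'linear_key_head_dim', None)
    return isinstance(val, int) and val > 0


print(is_qwen3_next_guarded(SimpleNamespace()))                          # False (missing)
print(is_qwen3_next_guarded(SimpleNamespace(linear_key_head_dim=None)))  # False
print(is_qwen3_next_guarded(SimpleNamespace(linear_key_head_dim=128)))   # True

try:
    is_qwen3_next_presence_only(SimpleNamespace(linear_key_head_dim=None))
except TypeError as err:
    print("TypeError:", err)  # None > 0 is not supported
```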
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- tensorrt_llm/_torch/modules/attention.py (3 hunks)
- tensorrt_llm/_torch/modules/qk_norm_attention.py (1 hunks)
- tensorrt_llm/_torch/pyexecutor/config_utils.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
- tensorrt_llm/_torch/modules/attention.py
- tensorrt_llm/_torch/pyexecutor/config_utils.py
- tensorrt_llm/_torch/modules/qk_norm_attention.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
- tensorrt_llm/_torch/modules/attention.py
- tensorrt_llm/_torch/pyexecutor/config_utils.py
- tensorrt_llm/_torch/modules/qk_norm_attention.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
- tensorrt_llm/_torch/modules/attention.py
- tensorrt_llm/_torch/pyexecutor/config_utils.py
- tensorrt_llm/_torch/modules/qk_norm_attention.py
🧬 Code graph analysis (2)
tensorrt_llm/_torch/modules/attention.py (2)
- tensorrt_llm/logger.py (1): info_once (141-142)
- tensorrt_llm/_torch/distributed/communicator.py (1): tp_size (53-54)

tensorrt_llm/_torch/pyexecutor/config_utils.py (1)
- tensorrt_llm/_torch/models/modeling_utils.py (1): config (519-520)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tensorrt_llm/_torch/modules/attention.py (2)
180-181: Log level downgrade to info_once is fine. Reduces unnecessary warnings while still surfacing a one-time signal.
584-588: Unconditional split_qkv in the unfused RoPE path looks good; confirm no extra work in already-split cases.

`split_qkv` is a no-op when `k`/`v` are provided, so this simplifies the flow without changing behavior. Please sanity-check a gated configuration with unfused RoPE to ensure shapes pass through: enable `attn_output_gate=True`, set `rope_fusion=False`, and run a short forward to confirm no extra allocations or regressions.
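A hedged sketch of the no-op behavior the comment relies on (hypothetical helper and sizes; not the module's actual `split_qkv`): splitting first is harmless when `k`/`v` already arrive separately.

```python
import torch


def split_qkv_sketch(q, k=None, v=None, q_size=8, kv_size=4):
    """Stand-in for a split helper: splits a fused tensor, no-op when k/v are given."""
    if k is None and v is None:
        q, k, v = q.split([q_size, kv_size, kv_size], dim=-1)
    return q, k, v


# Fused input: one tensor carrying Q, K and V.
q, k, v = split_qkv_sketch(torch.randn(2, 16))
print(q.shape, k.shape, v.shape)  # [2, 8] [2, 4] [2, 4]

# Already-split input: the tensors pass straight through unchanged.
q2, k2, v2 = split_qkv_sketch(torch.randn(2, 8), torch.randn(2, 4), torch.randn(2, 4))
print(q2.shape, k2.shape, v2.shape)
```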
tensorrt_llm/_torch/modules/qk_norm_attention.py (1)

169-172: Correctly disable fused RoPE when gating or Gemma RMSNorm is active.

The consolidated condition aligns fused-path constraints with the upstream kernels. Double-check that when `attn_output_gate=True`, `Attention.apply_rope` handles the unfused path with split-then-norm-then-RoPE as intended in this subclass.
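A minimal sketch of the consolidated gating condition (flag names taken from the walkthrough diagram; this is an illustration, not the class's actual code):

```python
def rope_fusion_enabled(fuse_qk_norm_rope: bool,
                        skip_rope: bool,
                        attn_output_gate: bool,
                        use_gemma_rms_norm: bool) -> bool:
    """Fused QK-norm + RoPE is used only when nothing on the path forbids it."""
    return (fuse_qk_norm_rope
            and not skip_rope
            and not attn_output_gate
            and not use_gemma_rms_norm)


# With the output gate (or Gemma RMSNorm) active, the module falls back to the
# unfused split -> norm -> RoPE path.
print(rope_fusion_enabled(True, False, True, False))   # False
print(rope_fusion_enabled(True, False, False, False))  # True
```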
@nv-guomingz Please also address the following comments: #7892 (comment), #7892 (comment), #7892 (comment), #7892 (comment), thanks~
550d815 to a62e5bf (Compare)
a62e5bf to 939ecdd (Compare)
All resolved.
939ecdd to 8b06ade (Compare)
/bot run

PR_Github #20346 [ run ] triggered by Bot

PR_Github #20346 [ run ] completed with state

/bot run

PR_Github #20370 [ run ] triggered by Bot

PR_Github #20370 [ run ] completed with state
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
8b06ade to 429bda9 (Compare)
/bot run

PR_Github #20395 [ run ] triggered by Bot

PR_Github #20395 [ run ] completed with state
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Faradawn Yang <faradawny@gmail.com>
This PR addresses the comments from @yuxianq on #7892.
Summary by CodeRabbit