[None][chore] Refine qwen3-next implementation. #8064
Conversation
📝 Walkthrough

Updates adjust attention module logging and QKV projection sizing when attn_output_gate is enabled, simplify the RoPE application flow, refine RoPE fusion enablement in QK-Norm attention, and change a config utility to explicitly check attribute presence before comparison.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant C as Config
    participant A as Attention.__init__
    participant Q as QKNormRoPEAttention.__init__
    Note over A: Initialization decisions
    A->>A: if attn_output_gate\n set info log
    A->>A: compute qkv_proj size\n(adjust when gate=true)
    A->>A: apply_rope path simplified\n(unconditional split when not fused)
    Note over Q: RoPE fusion gating
    Q->>Q: rope_fusion = fuse_qk_norm_rope && !skip_rope && !attn_output_gate && !use_gemma_rms_norm
```
```mermaid
sequenceDiagram
    autonumber
    participant C as Config
    participant U as is_qwen3_next
    C->>U: query linear_key_head_dim
    alt attribute exists
        U->>U: return linear_key_head_dim > 0
    else attribute missing
        U->>U: return False
    end
```
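The walkthrough's sizing note can be illustrated with a minimal standalone sketch (hypothetical sizes, not the module's actual code): with the output gate enabled, the Q slice is doubled (presumably to carry the gate), so the fused projection emits 2*q_size + 2*kv_size features instead of q_size + 2*kv_size.

```python
def qkv_out_features(q_size: int, kv_size: int, attn_output_gate: bool) -> int:
    """Sketch of the fused QKV projection width described in the walkthrough.

    With attn_output_gate enabled, the Q slice is doubled, giving
    2*q_size + 2*kv_size output features in total.
    """
    q_out = 2 * q_size if attn_output_gate else q_size
    return q_out + 2 * kv_size


# Hypothetical per-rank sizes, for illustration only.
print(qkv_out_features(q_size=4096, kv_size=1024, attn_output_gate=False))  # 6144
print(qkv_out_features(q_size=4096, kv_size=1024, attn_output_gate=True))   # 10240
```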
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches: ❌ Failed checks (3 warnings)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/modules/attention.py (1)
226-239: Critical: LoRA tensor size mismatch when attn_output_gate=True.

With gating on, `qkv_proj` emits `2*q_size + 2*kv_size` features, but both `splitted_qkv_lora` and `fused_qkv_lora` remain initialized to `q_size + 2*kv_size`, leading to a runtime shape mismatch. Apply:

```diff
-        self.splitted_qkv_lora = LoraLayer([
-            LoraModuleType.ATTENTION_Q, LoraModuleType.ATTENTION_K,
-            LoraModuleType.ATTENTION_V
-        ], [self.q_size, self.kv_size, self.kv_size])
-        self.fused_qkv_lora = LoraLayer([LoraModuleType.ATTENTION_QKV],
-                                        [self.q_size + 2 * self.kv_size])
+        q_out_size_for_lora = (2 * self.q_size) if self.attn_output_gate else self.q_size
+        self.splitted_qkv_lora = LoraLayer(
+            [LoraModuleType.ATTENTION_Q, LoraModuleType.ATTENTION_K, LoraModuleType.ATTENTION_V],
+            [q_out_size_for_lora, self.kv_size, self.kv_size],
+        )
+        self.fused_qkv_lora = LoraLayer(
+            [LoraModuleType.ATTENTION_QKV],
+            [q_out_size_for_lora + 2 * self.kv_size],
+        )
```

Also verify that `LoraLayer` supports these fused sizes (or pad the extra gated-Q slice) to avoid runtime errors.
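To make the reported mismatch concrete, here is a small self-contained sketch using plain tensors and hypothetical sizes rather than the real `LoraLayer` (the sizes and the way the LoRA delta is combined are assumptions for illustration only):

```python
import torch

# Hypothetical per-rank sizes, for illustration only.
q_size, kv_size = 4096, 1024
attn_output_gate = True

# Width actually produced by qkv_proj when the gate doubles the Q slice.
qkv_width = (2 * q_size if attn_output_gate else q_size) + 2 * kv_size  # 10240

# Width the LoRA layers expect if they keep the ungated formula.
lora_width = q_size + 2 * kv_size  # 6144

qkv = torch.randn(2, qkv_width)          # stand-in for the qkv_proj output
lora_delta = torch.randn(2, lora_width)  # stand-in for the LoRA contribution

# Combining the two mismatched widths fails at runtime.
try:
    _ = qkv + lora_delta
except RuntimeError as err:
    print("shape mismatch:", err)
```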
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/config_utils.py (1)
17-18: Make the check robust to non-numeric/None values to avoid TypeError.

If `linear_key_head_dim` exists but is `None` or non-numeric, `> 0` will raise. Recommend guarding the type before comparing. Apply:

```diff
-def is_qwen3_next(config):
-    return hasattr(config,
-                   'linear_key_head_dim') and config.linear_key_head_dim > 0
+def is_qwen3_next(config):
+    val = getattr(config, 'linear_key_head_dim', None)
+    return isinstance(val, int) and val > 0
```
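A quick standalone check of why the guard matters, using `types.SimpleNamespace` as a stand-in config: the suggested variant returns False for missing or None attributes, while the presence-only check raises on None.

```python
from types import SimpleNamespace


def is_qwen3_next_presence_only(config):
    # Current form: presence check, then compare directly.
    return hasattr(config, 'linear_key_head_dim') and config.linear_key_head_dim > 0


def is_qwen3_next_guarded(config):
    # Suggested form: also guard the value's type before comparing.
    val = getattr(config, 'linear_key_head_dim', None)
    return isinstance(val, int) and val > 0


print(is_qwen3_next_guarded(SimpleNamespace()))                          # False (missing)
print(is_qwen3_next_guarded(SimpleNamespace(linear_key_head_dim=None)))  # False
print(is_qwen3_next_guarded(SimpleNamespace(linear_key_head_dim=128)))   # True

try:
    is_qwen3_next_presence_only(SimpleNamespace(linear_key_head_dim=None))
except TypeError as err:
    print("TypeError:", err)  # None > 0 is not supported
```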
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- tensorrt_llm/_torch/modules/attention.py (3 hunks)
- tensorrt_llm/_torch/modules/qk_norm_attention.py (1 hunks)
- tensorrt_llm/_torch/pyexecutor/config_utils.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
- tensorrt_llm/_torch/modules/attention.py
- tensorrt_llm/_torch/pyexecutor/config_utils.py
- tensorrt_llm/_torch/modules/qk_norm_attention.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
- tensorrt_llm/_torch/modules/attention.py
- tensorrt_llm/_torch/pyexecutor/config_utils.py
- tensorrt_llm/_torch/modules/qk_norm_attention.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
- tensorrt_llm/_torch/modules/attention.py
- tensorrt_llm/_torch/pyexecutor/config_utils.py
- tensorrt_llm/_torch/modules/qk_norm_attention.py
🧬 Code graph analysis (2)
tensorrt_llm/_torch/modules/attention.py (2)
- tensorrt_llm/logger.py (1): info_once (141-142)
- tensorrt_llm/_torch/distributed/communicator.py (1): tp_size (53-54)

tensorrt_llm/_torch/pyexecutor/config_utils.py (1)
- tensorrt_llm/_torch/models/modeling_utils.py (1): config (519-520)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tensorrt_llm/_torch/modules/attention.py (2)
180-181: Log level downgrade to info_once is fine. Reduces unnecessary warnings while still surfacing a one-time signal.
584-588: Unconditional split_qkv in the unfused RoPE path looks good; confirm no extra work in already-split cases.

`split_qkv` is a no-op when `k`/`v` are provided, so this simplifies the flow without changing behavior. Please sanity-check a gated configuration with unfused RoPE to ensure shapes pass through: enable `attn_output_gate=True`, set `rope_fusion=False`, and run a short forward to confirm no extra allocations or regressions.
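A hedged sketch of the no-op behavior the comment relies on (hypothetical helper and sizes; not the module's actual `split_qkv`): splitting first is harmless when `k`/`v` already arrive separately.

```python
import torch


def split_qkv_sketch(q, k=None, v=None, q_size=8, kv_size=4):
    """Stand-in for a split helper: splits a fused tensor, no-op when k/v are given."""
    if k is None and v is None:
        q, k, v = q.split([q_size, kv_size, kv_size], dim=-1)
    return q, k, v


# Fused input: one tensor carrying Q, K and V.
q, k, v = split_qkv_sketch(torch.randn(2, 16))
print(q.shape, k.shape, v.shape)  # [2, 8] [2, 4] [2, 4]

# Already-split input: the tensors pass straight through unchanged.
q2, k2, v2 = split_qkv_sketch(torch.randn(2, 8), torch.randn(2, 4), torch.randn(2, 4))
print(q2.shape, k2.shape, v2.shape)
```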
tensorrt_llm/_torch/modules/qk_norm_attention.py (1)

169-172: Correctly disable fused RoPE when gating or Gemma RMSNorm is active.

The consolidated condition aligns fused-path constraints with the upstream kernels. Double-check that when `attn_output_gate=True`, `Attention.apply_rope` handles the unfused path with split-then-norm-then-RoPE as intended in this subclass.
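A minimal sketch of the consolidated gating condition (flag names taken from the walkthrough diagram; this is an illustration, not the class's actual code):

```python
def rope_fusion_enabled(fuse_qk_norm_rope: bool,
                        skip_rope: bool,
                        attn_output_gate: bool,
                        use_gemma_rms_norm: bool) -> bool:
    """Fused QK-norm + RoPE is used only when nothing on the path forbids it."""
    return (fuse_qk_norm_rope
            and not skip_rope
            and not attn_output_gate
            and not use_gemma_rms_norm)


# With the output gate (or Gemma RMSNorm) active, the module falls back to the
# unfused split -> norm -> RoPE path.
print(rope_fusion_enabled(True, False, True, False))   # False
print(rope_fusion_enabled(True, False, False, False))  # True
```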
@nv-guomingz Please also address the following comments: #7892 (comment), #7892 (comment), #7892 (comment), #7892 (comment), thanks~
550d815 to a62e5bf (Compare)
a62e5bf to 939ecdd (Compare)
All resolved.
939ecdd to 8b06ade (Compare)
/bot run

PR_Github #20346 [ run ] triggered by Bot

PR_Github #20346 [ run ] completed with state

/bot run

PR_Github #20370 [ run ] triggered by Bot

PR_Github #20370 [ run ] completed with state
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
8b06ade to 429bda9 (Compare)
/bot run

PR_Github #20395 [ run ] triggered by Bot

PR_Github #20395 [ run ] completed with state
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Faradawn Yang <faradawny@gmail.com>
This PR addresses the comments from @yuxianq on #7892.
Summary by CodeRabbit