[TRTLLM-4501][feat] Add input tensor pre-hook function API for the tuning process. #6924
Conversation
📝 Walkthrough
Adds an optional inputs_pre_hook to TuningConfig and invokes it at the start of AutoTuner._profile_runners to preprocess input tensors before profiling. Unit tests are updated to use a new test runner with tune_max_num_tokens, an inputs_pre_hook, and dict-based tactic handling.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Test as Test Runner
    participant AutoTuner as AutoTuner
    participant Hook as inputs_pre_hook
    participant Runner as GEMM Runner
    Test->>AutoTuner: start autotune(config with inputs_pre_hook)
    AutoTuner->>Hook: inputs_pre_hook(input_tensors)
    Hook-->>AutoTuner: modified_input_tensors
    AutoTuner->>Runner: profile(modified_input_tensors, tactic=dict)
    Runner-->>AutoTuner: profiling results
    AutoTuner-->>Test: return best tactic
```
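For orientation, here is a minimal sketch of how an op author might attach the new hook. It relies only on the TuningConfig fields introduced in this PR (inputs_pre_hook, tune_max_num_tokens); the import path and the two-tensor (x, w) GEMM-style input layout are assumptions for illustration, not the repository's exact API.

```python
from typing import List

import torch

# Assumed import path for the dataclass extended in this PR.
from tensorrt_llm._torch.autotuner import TuningConfig


def inputs_pre_hook(inputs: List[torch.Tensor]) -> List[torch.Tensor]:
    # Rewrite the synthetic activation so a shape-associated value is present,
    # e.g. embed the current token count in the tensor the kernel will see.
    x, w = inputs
    x_hooked = torch.zeros_like(x)
    x_hooked[-1, 0] = x.shape[0]
    return [x_hooked, w]


# The tuner calls the hook on the prepared input tensors right before profiling.
tuning_config = TuningConfig(
    tune_max_num_tokens=32,
    inputs_pre_hook=inputs_pre_hook,
)
```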
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 2
🧹 Nitpick comments (7)
tensorrt_llm/_torch/autotuner.py (4)
89-91: Fix inputs_pre_hook docs to match implementation and wrap lines (ruff E501).
The docstring currently says the hook takes (profile, inputs) and is called "before the input tensors are prepared," but the implementation calls it after _prepare_input_tensors and passes only inputs. Please align the docs with the actual behavior and keep lines ≤ 120 chars.
Apply:
```diff
-        inputs_pre_hook (Callable): A function that takes a profile and a list of input tensors, and returns a list of input tensors.
-            The function is called before the input tensors are prepared for the tuning process to match the real data distribution.
+        inputs_pre_hook (Callable | None): Optional callable invoked after synthetic
+            input tensors are prepared for tuning, but before profiling. It should accept
+            either (inputs) or (inputs, profile) and return a list of input tensors with
+            the same shapes, dtypes, and devices.
```
95-95: Type inputs_pre_hook precisely and make it Optional.
Using a bare Callable loses intent and None is not captured. Recommend Optional with a variadic signature to support both hook forms.
```diff
-    inputs_pre_hook: Callable = None
+    inputs_pre_hook: Optional[Callable[..., List[torch.Tensor]]] = None
```
Additionally, add the missing import at the top of the file:
```python
from typing import Optional
```
453-456: Support both hook signatures and validate outputs to avoid hard-to-debug failures.
Right now we always call hook(inputs) and don't validate the result. Add support for both (inputs) and (inputs, profile) forms and verify that shapes/dtypes/devices remain unchanged.
```diff
-        # If the inputs_pre_hook is provided, it will be called before profiling.
-        if tuning_config.inputs_pre_hook is not None:
-            input_tensors = tuning_config.inputs_pre_hook(input_tensors)
+        # If the inputs_pre_hook is provided, it will be called before profiling.
+        # Support both signatures: hook(inputs) and hook(inputs, profile).
+        if tuning_config.inputs_pre_hook is not None:
+            hook = tuning_config.inputs_pre_hook
+            try:
+                sig = inspect.signature(hook)
+                if len(sig.parameters) >= 2:
+                    new_inputs = hook(input_tensors, profile)
+                else:
+                    new_inputs = hook(input_tensors)
+            except Exception as e:
+                logger.debug(f"[Autotuner] inputs_pre_hook invocation failed: {e}")
+                raise
+            # Validate hook output
+            if not isinstance(new_inputs, list):
+                raise TypeError("inputs_pre_hook must return a List[torch.Tensor]")
+            if len(new_inputs) != len(input_tensors):
+                raise ValueError("inputs_pre_hook must return the same number of tensors")
+            for old, new in zip(input_tensors, new_inputs):
+                if not isinstance(new, torch.Tensor):
+                    raise TypeError("inputs_pre_hook must return only torch.Tensor elements")
+                if old.size() != new.size() or old.dtype != new.dtype or old.device != new.device:
+                    raise ValueError(
+                        "inputs_pre_hook must preserve shapes, dtype and device for all inputs"
+                    )
+            input_tensors = new_inputs
```
1-1: Missing NVIDIA copyright header.
Per coding guidelines, prepend the current-year NVIDIA copyright header.
```python
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
```
tests/unittest/_torch/test_autotuner.py (3)
341-349: Avoid mutable default for tactic dict.
Using {} as a default is a Python footgun. Use None and normalize inside the function; keep -1 behavior intact.
```diff
-        tactic: dict = {},
+        tactic: Dict[str, int] = None,
     ) -> torch.Tensor:
         # Notice that in fallback case tactic is -1
-        if tactic == -1:
+        if tactic in (-1, None):
             # assign default configs for fallback case
             block_size, tactic_id = 128, -1
         else:
             block_size, tactic_id = tactic["block_size"], tactic["tactic_id"]
```
352-359: Clarify pre-hook and avoid implicit dtype cast.
Minor cleanup: add a return type, and set the sentinel value with the correct dtype/device explicitly to avoid an implicit int→float cast.
```diff
     @staticmethod
-    def inputs_pre_hook(inputs: List[torch.Tensor]):
-        # always set the first element to bo iota in x
+    def inputs_pre_hook(inputs: List[torch.Tensor]) -> List[torch.Tensor]:
+        # Set the last row/first column of x to the current token count
         x, w = inputs
         x_hooked = torch.zeros_like(x)
-        x_hooked[-1, 0] = x.shape[0]
+        x_hooked[-1, 0] = torch.tensor(x.shape[0], dtype=x.dtype, device=x.device)
         return [x_hooked, w]
```
1-1: Missing NVIDIA copyright header.
Per coding guidelines, prepend the current-year NVIDIA copyright header.
```python
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- tensorrt_llm/_torch/autotuner.py (2 hunks)
- tests/unittest/_torch/test_autotuner.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in __init__
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
- tensorrt_llm/_torch/autotuner.py
- tests/unittest/_torch/test_autotuner.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
- tensorrt_llm/_torch/autotuner.py
- tests/unittest/_torch/test_autotuner.py
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py
89-89: Line too long (133 > 120)
(E501)
90-90: Line too long (132 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (1)
tests/unittest/_torch/test_autotuner.py (1)
361-375: LGTM on wiring the new tuning knobs in tests.
Good end-to-end coverage of dynamic buckets, tune_max_num_tokens, and the pre-hook. Once the tune-max enforcement is fixed in autotuner.py, this scenario should be robust.
Force-pushed from faa2424 to f335dd0
Actionable comments posted: 1
♻️ Duplicate comments (2)
tensorrt_llm/_torch/autotuner.py (1)
89-90: Docstring/API mismatch and >120-char lines (Ruff E501): clarify signature and wrap lines.
Docstring states the hook takes (profile, inputs), but the implementation currently passes only inputs. Also, both lines exceed 120 chars.
Apply this diff to (a) document the intended signature clearly and (b) fix line length:
```diff
-        inputs_pre_hook (Callable): A function that takes a profile and a list of input tensors, and returns a list of input tensors.
-            The function is called before the input tensors are prepared for the tuning process to match the real data distribution.
+        inputs_pre_hook (Callable): A function that takes (profile, inputs) and returns a new list
+            of input tensors. It is invoked before preparing the tuning tensors to better match
+            real data distributions.
```
tests/unittest/_torch/test_autotuner.py (1)
323-329: This assertion will fail until _optimization_profiles is gated by tune_max_num_tokens (duplicate of prior review).
_optimization_profiles currently unconditionally adds the mapped current input size to opt_shapes, which introduces 64 when tune_max_num_tokens=32. That will make inputs[0].shape[0] == 64 for some profiles, tripping this assertion.
Proposed fix in tensorrt_llm/_torch/autotuner.py (context ~Lines 605–611), gate the addition:
```python
# Add the current input value as one of the opt values, but don't exceed the cap.
opt_shapes = set(opt_shapes)
mapped_val = spec.map_to_tuning_buckets(
    base_profile.shapes[spec.input_idx][spec.dim_idx].val
)
if tuning_config.tune_max_num_tokens is None or mapped_val <= tuning_config.tune_max_num_tokens:
    opt_shapes.add(mapped_val)
opt_shapes = sorted(list(opt_shapes))
```
Once applied, this test assertion should pass consistently.
🧹 Nitpick comments (4)
tensorrt_llm/_torch/autotuner.py (1)
95-95: Type the new field as optional and narrow the callable's contract.
Typing this as Optional and clarifying the callable's shape improves readability and tooling. If you keep supporting only (inputs) -> inputs for now, please reflect that in the types.
Suggested change (requires adding Optional to typing imports at Line 8):
```diff
-    inputs_pre_hook: Callable = None
+    # Accepts either (profile, inputs) -> inputs or (inputs) -> inputs
+    inputs_pre_hook: Optional[Callable] = None
```
Optionally, narrow further to Callable[..., List[torch.Tensor]] to reflect the return type.
tests/unittest/_torch/test_autotuner.py (3)
341-349: Avoid mutable default argument for tactic; accept None and handle fallback cleanly.
tactic: dict = {} uses a mutable default and can lead to subtle bugs. Use None and treat None like fallback (-1). Optionally widen the type to allow either -1 or a dict.
Apply this diff (add Optional to imports at the top):
```diff
     def forward(
         self,
         /,
         inputs: List[torch.Tensor],
         *,
-        tactic: dict = {},
+        tactic: dict = None,
     ) -> torch.Tensor:
-        # Notice that in fallback case tactic is -1
-        if tactic == -1:
+        # Notice that in fallback case tactic is -1 or tactic is None
+        if tactic in (-1, None):
             # assign default configs for fallback case
             block_size, tactic_id = 128, -1
         else:
             block_size, tactic_id = tactic["block_size"], tactic["tactic_id"]
```
If you prefer stricter typing:
- from typing import Optional, Union
- tactic: Union[int, Dict[str, int]] = -1
- if tactic == -1: ...
352-359: Align test pre-hook signature with documented API (profile, inputs).
To match the TuningConfig doc and the suggested change in autotuner, accept profile as the first arg. This keeps tests representative of intended usage.
Apply this diff:
```diff
     @staticmethod
-    def inputs_pre_hook(inputs: List[torch.Tensor]):
-        # always set the first element to bo iota in x
-        x, w = inputs
+    def inputs_pre_hook(profile: OptimizationProfile, inputs: List[torch.Tensor]):
+        # always set the first element to an identifying value in x
+        x, w = inputs
         x_hooked = torch.zeros_like(x)
         x_hooked[-1, 0] = x.shape[0]
         return [x_hooked, w]
```
354-355: Clarify comment wording ("bo iota").
"bo iota" is unclear and looks like a typo. Consider clarifying the intent (e.g., "set a sentinel value based on sequence length").
Suggested:
- “Set a sentinel value in x based on the sequence length to verify the pre-hook ran.”
/bot run --disable-fail-fast
PR_Github #15443 [ run ] triggered by Bot
Force-pushed from f335dd0 to ed46d85
/bot run --disable-fail-fast
PR_Github #15446 [ run ] triggered by Bot
PR_Github #15443 [ run ] completed with state
Actionable comments posted: 0
♻️ Duplicate comments (2)
tensorrt_llm/_torch/autotuner.py (1)
453-456: Pass profile into inputs_pre_hook (per API/intent) and validate return; optionally support both signatures.
Currently the hook is called with only inputs, diverging from the documented API and limiting usefulness. Pass the profile too and validate the return value. For backward compatibility you can fall back to the 1-arg call.
Apply this diff:
```diff
-        # If the inputs_pre_hook is provided, it will be called before profiling.
-        if tuning_config.inputs_pre_hook is not None:
-            input_tensors = tuning_config.inputs_pre_hook(input_tensors)
+        # If provided, preprocess inputs before profiling to better match real data distributions.
+        if tuning_config.inputs_pre_hook is not None:
+            hook = tuning_config.inputs_pre_hook
+            # Prefer the documented signature (profile, inputs); fall back to (inputs) for compat.
+            try:
+                input_tensors = hook(profile, input_tensors)
+            except TypeError:
+                input_tensors = hook(input_tensors)
+            # Basic validation: ensure we got back a list with the same arity.
+            if not isinstance(input_tensors, list) or len(input_tensors) != len(profile.shapes):
+                raise ValueError(
+                    "inputs_pre_hook must return a list of tensors with the same arity as inputs"
+                )
```
tests/unittest/_torch/test_autotuner.py (1)
322-329: Fix: don't add the base (current-input) opt value when it exceeds tuning_config.tune_max_num_tokens.
The autotuner unconditionally adds the mapped current input size into opt_shapes (tensorrt_llm/_torch/autotuner.py, ~lines 605–610). When tuning_config.tune_max_num_tokens is set (e.g. 32), that can introduce opt values > tune_max_num_tokens (e.g. 64) and cause the test assertion in tests/unittest/_torch/test_autotuner.py (lines ~322–329) to fail.
The assertion should pass after applying the suggested fix in autotuner.py to gate the addition of the base shape by tune_max_num_tokens.
```bash
#!/bin/bash
# Description: Verify that _optimization_profiles properly respects tune_max_num_tokens
# Expected: The mapped current input size should only be added if it's <= tune_max_num_tokens

# Search for the problematic code in _optimization_profiles
rg -n -A 5 -B 5 'opt_shapes\.add' tensorrt_llm/_torch/autotuner.py
```
🧹 Nitpick comments (1)
tensorrt_llm/_torch/autotuner.py (1)
89-90: Fix line length violations.
Lines 89-90 exceed the 120-character limit. Consider breaking these lines appropriately.
Apply this diff to fix the line length issues:
```diff
-        inputs_pre_hook (Callable): A function that takes a list of input tensors, returns a list of modified input tensors.
-            It is called before the input tensors are prepared for the tuning process to match the real data distribution.
+        inputs_pre_hook (Callable): A function that takes a list of input tensors,
+            returns a list of modified input tensors. It is called before the input
+            tensors are prepared for the tuning process to match the real data distribution.
```
🔇 Additional comments (5)
tensorrt_llm/_torch/autotuner.py (1)
95-95: LGTM!
The addition of the inputs_pre_hook field to the TuningConfig dataclass is well-integrated and follows the existing pattern.
tests/unittest/_torch/test_autotuner.py (4)
313-315: LGTM!
The class rename and addition of the tune_max_num_tokens attribute provide clearer semantics for the complex tuning configuration test.
352-358: LGTM!
The inputs_pre_hook static method provides a clear test implementation that modifies the input tensor as documented. The implementation correctly sets the last row, first column to the shape[0] value for validation purposes.
361-379: LGTM!
The updated test function properly exercises the new inputs_pre_hook and tune_max_num_tokens features. The test configuration correctly uses both the hook function and the token limit from the test runner class.
336-350: LGTM!
The updated forward method correctly handles both dict-based tactics and the fallback case. The logic properly extracts block_size and tactic_id from the dictionary while maintaining backward compatibility with the integer fallback case.
PR_Github #15446 [ run ] completed with state
Force-pushed from ed46d85 to d6abe21
/bot run --disable-fail-fast
PR_Github #15550 [ run ] triggered by Bot
PR_Github #15550 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #15595 [ run ] triggered by Bot
PR_Github #15595 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #15702 [ run ] triggered by Bot
PR_Github #15702 [ run ] completed with state
Force-pushed from 8f7db4b to 9d4c156
PR_Github #16120 [ run ] triggered by Bot
PR_Github #16120 [ run ] completed with state
Force-pushed from b45df95 to 3ae0f8b
/bot run --disable-fail-fast
PR_Github #16339 [ run ] triggered by Bot
PR_Github #16339 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #16385 [ run ] triggered by Bot
PR_Github #16385 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #16392 [ run ] triggered by Bot
Force-pushed from 3ae0f8b to d0110f4
/bot run --disable-fail-fast
PR_Github #16406 [ run ] triggered by Bot
PR_Github #16392 [ run ] completed with state
PR_Github #16406 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #16493 [ run ] triggered by Bot
PR_Github #16493 [ run ] completed with state
…ning process. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Force-pushed from d0110f4 to d3c945e
/bot run --disable-fail-fast
PR_Github #21425 [ run ] triggered by Bot
PR_Github #21425 [ run ] completed with state
…ning process. (NVIDIA#6924) Some tunable ops require a more realistic data distribution, for instance, a shape-associated tensor. Thus, a customizable pre-hook function can be declared in the tuning config to modify the input tensor before the tuning process. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
…ning process. (NVIDIA#6924) Some tunable ops require a more realistic data distribution, for instance, a shape-associated tensor. Thus, a customizable pre-hook function can be declared in the tuning config to modify the input tensor before the tuning process. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Some tunable ops require a more realistic data distribution, for instance, a shape-associated tensor. Thus, a customizable pre-hook function can be declared in the tuning config to modify the input tensor before the tuning process.
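To make the intended control flow concrete, the sketch below mirrors the merged behavior in a standalone helper; the function name and surrounding scaffolding are hypothetical and heavily simplified compared to the real AutoTuner._profile_runners.

```python
from typing import Callable, List, Optional

import torch


def apply_inputs_pre_hook(
    input_tensors: List[torch.Tensor],
    inputs_pre_hook: Optional[Callable[[List[torch.Tensor]], List[torch.Tensor]]],
) -> List[torch.Tensor]:
    # No hook configured: profile the synthetic tensors as-is.
    if inputs_pre_hook is None:
        return input_tensors
    # The hook may rewrite tensor contents to match the real data distribution,
    # but it is expected to return the same number of tensors.
    hooked = inputs_pre_hook(input_tensors)
    assert len(hooked) == len(input_tensors), "pre-hook must preserve input arity"
    return hooked
```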
Summary by CodeRabbit
New Features
- Added an optional inputs_pre_hook to the tuning configuration so input tensors can be preprocessed before profiling.
Tests
- Updated autotuner unit tests to cover the pre-hook, tune_max_num_tokens, and dict-based tactic handling.