[TRTLLM-4501][feat] Add input tensor pre-hook function API for the tuning process. by hyukn · Pull Request #6924 · NVIDIA/TensorRT-LLM · GitHub

Conversation

@hyukn (Collaborator) commented Aug 15, 2025

Some tunable ops require a more realistic data distribution during tuning, for instance a tensor whose contents are associated with its shape. To support this, a customizable pre-hook function can be declared in the tuning config to modify the input tensors before the tuning process.
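
As an illustration, here is a minimal sketch of the intended usage, modeled on the unit test in this PR (the hook body and tune_max_num_tokens value mirror the test; other TuningConfig fields such as the dynamic tensor specs are elided, and the import path assumes the in-tree module):

from typing import List

import torch

from tensorrt_llm._torch.autotuner import TuningConfig


def inputs_pre_hook(inputs: List[torch.Tensor]) -> List[torch.Tensor]:
    # Encode the current token count into the activation so the op sees a
    # shape-associated data distribution while being profiled.
    x, w = inputs
    x_hooked = torch.zeros_like(x)
    x_hooked[-1, 0] = x.shape[0]
    return [x_hooked, w]


tuning_config = TuningConfig(
    inputs_pre_hook=inputs_pre_hook,
    tune_max_num_tokens=32,
)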

Summary by CodeRabbit

  • New Features

    • Added an optional input pre-processing hook to modify inputs before autotuning.
    • Added a configurable token cap to limit tokens used during tuning.
    • Expanded tactic selection to accept richer tactic configuration objects for profiling.
  • Tests

    • Updated tests to cover input pre-processing, token caps, and the enhanced tactic configuration and profiling flow.

@hyukn hyukn requested review from limin2021 and litaotju August 15, 2025 03:11
@hyukn hyukn requested a review from a team as a code owner August 15, 2025 03:11
@coderabbitai bot (Contributor) commented Aug 15, 2025

📝 Walkthrough

Adds an optional inputs_pre_hook to TuningConfig and invokes it at the start of AutoTuner._profile_runners to preprocess input tensors before profiling. Unit tests updated to use a new test runner with tune_max_num_tokens, an inputs_pre_hook, and dict-based tactic handling.
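
For reference, the hook invocation inside AutoTuner._profile_runners, as quoted in the review diffs below, reduces to:

# If the inputs_pre_hook is provided, it will be called before profiling.
if tuning_config.inputs_pre_hook is not None:
    input_tensors = tuning_config.inputs_pre_hook(input_tensors)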

Changes

Cohort / File(s): Autotuner core — tensorrt_llm/_torch/autotuner.py
Added TuningConfig.inputs_pre_hook: Callable = None. AutoTuner._profile_runners now calls inputs_pre_hook(input_tensors) if provided and replaces input_tensors with the result before profiling. Docstring for the hook updated to describe preprocessing behavior.

Cohort / File(s): Unit tests for autotuner — tests/unittest/_torch/test_autotuner.py
Renamed test runner GemmRunnerWithTacticConfigs → GemmRunnerComplexTuningConfigs; added tune_max_num_tokens = 32 and a static inputs_pre_hook that mutates inputs. forward signature changed to forward(self, /, inputs: List[torch.Tensor], *, tactic: dict = {}). get_valid_tactics now returns List[Dict[str, int]] and includes runtime checks against tune_max_num_tokens and input content. Test updated to construct TuningConfig with tune_max_num_tokens and inputs_pre_hook, and test name changed to test_autotuner_tuning_configs; see the sketch below.
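
To make the dict-based tactic flow concrete, here is a hedged sketch of a get_valid_tactics that returns tactic configuration dicts (the class is an illustrative stand-in for the test's GemmRunnerComplexTuningConfigs; the tactic values and exact assertions are assumptions, not the test's verbatim code):

from typing import Dict, List

import torch


class RunnerSketch:
    """Illustrative stand-in for the test's GemmRunnerComplexTuningConfigs."""

    tune_max_num_tokens = 32

    def get_valid_tactics(self, inputs: List[torch.Tensor],
                          profile) -> List[Dict[str, int]]:
        x, _w = inputs
        # Runtime checks described above: the profiled token count respects
        # the cap, and the pre-hook's sentinel is visible in the input.
        assert x.shape[0] <= self.tune_max_num_tokens
        assert x[-1, 0] == x.shape[0]
        return [{"block_size": bs, "tactic_id": tid}
                for bs in (64, 128)
                for tid in (0, 1)]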

Sequence Diagram(s)

sequenceDiagram
  participant Test as Test Runner
  participant AutoTuner as AutoTuner
  participant Hook as inputs_pre_hook
  participant Runner as GEMM Runner

  Test->>AutoTuner: start autotune(config with inputs_pre_hook)
  AutoTuner->>Hook: inputs_pre_hook(input_tensors)
  Hook-->>AutoTuner: modified_input_tensors
  AutoTuner->>Runner: profile(modified_input_tensors, tactic=dict)
  Runner-->>AutoTuner: profiling results
  AutoTuner-->>Test: return best tactic

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • lfr-0531
  • suyoggupta
  • limin2021
  • liji-nv


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (7)
tensorrt_llm/_torch/autotuner.py (4)

89-91: Fix inputs_pre_hook docs to match implementation and wrap lines (ruff E501).

The docstring currently says the hook takes (profile, inputs) and is called “before the input tensors are prepared,” but the implementation calls it after _prepare_input_tensors and passes only inputs. Please align the docs with the actual behavior and keep lines ≤120 chars.

Apply:

-        inputs_pre_hook (Callable): A function that takes a profile and a list of input tensors, and returns a list of input tensors.
-            The function is called before the input tensors are prepared for the tuning process to match the real data distribution.
+        inputs_pre_hook (Callable | None): Optional callable invoked after synthetic
+            input tensors are prepared for tuning, but before profiling. It should accept
+            either (inputs) or (inputs, profile) and return a list of input tensors with
+            the same shapes, dtypes, and devices.

95-95: Type inputs_pre_hook precisely and make it Optional.

Using a bare Callable loses intent, and the None default is not reflected in the type. Recommend Optional with a variadic signature to support both hook forms.

-    inputs_pre_hook: Callable = None
+    inputs_pre_hook: Optional[Callable[..., List[torch.Tensor]]] = None

Additionally, add the missing import at the top of the file:

from typing import Optional

453-456: Support both hook signatures and validate outputs to avoid hard-to-debug failures.

Right now we always call hook(inputs) and don’t validate the result. Add support for both (inputs) and (inputs, profile) forms and verify that shapes/dtypes/devices remain unchanged. (Note the suggested diff below uses inspect, so import inspect must also be added at the top of the file.)

-        # If the inputs_pre_hook is provided, it will be called before profiling.
-        if tuning_config.inputs_pre_hook is not None:
-            input_tensors = tuning_config.inputs_pre_hook(input_tensors)
+        # If the inputs_pre_hook is provided, it will be called before profiling.
+        # Support both signatures: hook(inputs) and hook(inputs, profile).
+        if tuning_config.inputs_pre_hook is not None:
+            hook = tuning_config.inputs_pre_hook
+            try:
+                sig = inspect.signature(hook)
+                if len(sig.parameters) >= 2:
+                    new_inputs = hook(input_tensors, profile)
+                else:
+                    new_inputs = hook(input_tensors)
+            except Exception as e:
+                logger.debug(f"[Autotuner] inputs_pre_hook invocation failed: {e}")
+                raise
+            # Validate hook output
+            if not isinstance(new_inputs, list):
+                raise TypeError("inputs_pre_hook must return a List[torch.Tensor]")
+            if len(new_inputs) != len(input_tensors):
+                raise ValueError("inputs_pre_hook must return the same number of tensors")
+            for old, new in zip(input_tensors, new_inputs):
+                if not isinstance(new, torch.Tensor):
+                    raise TypeError("inputs_pre_hook must return only torch.Tensor elements")
+                if old.size() != new.size() or old.dtype != new.dtype or old.device != new.device:
+                    raise ValueError(
+                        "inputs_pre_hook must preserve shapes, dtype and device for all inputs"
+                    )
+            input_tensors = new_inputs

1-1: Missing NVIDIA copyright header.

Per coding guidelines, prepend the current-year NVIDIA copyright header.

# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
tests/unittest/_torch/test_autotuner.py (3)

341-349: Avoid mutable default for tactic dict.

Using {} as a default is a Python footgun. Use None and normalize inside the function; keep -1 behavior intact.

-        tactic: dict = {},
+        tactic: Dict[str, int] = None,
     ) -> torch.Tensor:
         # Notice that in fallback case tactic is -1
-        if tactic == -1:
+        if tactic in (-1, None):
             # assign default configs for fallback case
             block_size, tactic_id = 128, -1
         else:
             block_size, tactic_id = tactic["block_size"], tactic["tactic_id"]
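
For readers unfamiliar with the footgun: the default object is created once at function definition time and shared across calls. A generic illustration, unrelated to this PR's code:

def append_token(token, bucket: list = []):
    # The same list object is reused by every call that omits `bucket`.
    bucket.append(token)
    return bucket

print(append_token(1))  # [1]
print(append_token(2))  # [1, 2] -- state leaked from the previous call

A dict default behaves the same way, hence the None-plus-normalization pattern suggested above.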

352-359: Clarify pre-hook and avoid implicit dtype cast.

Minor cleanup: add return type, and set the sentinel value with the correct dtype/device explicitly to avoid implicit int→float cast.

     @staticmethod
-    def inputs_pre_hook(inputs: List[torch.Tensor]):
-        # always set the first element to bo iota in x
+    def inputs_pre_hook(inputs: List[torch.Tensor]) -> List[torch.Tensor]:
+        # Set the last row/first column of x to the current token count
         x, w = inputs
         x_hooked = torch.zeros_like(x)
-        x_hooked[-1, 0] = x.shape[0]
+        x_hooked[-1, 0] = torch.tensor(x.shape[0], dtype=x.dtype, device=x.device)
         return [x_hooked, w]

1-1: Missing NVIDIA copyright header.

Per coding guidelines, prepend the current-year NVIDIA copyright header.

# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 9a133e9 and faa2424.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/autotuner.py (2 hunks)
  • tests/unittest/_torch/test_autotuner.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else

Files:

  • tensorrt_llm/_torch/autotuner.py
  • tests/unittest/_torch/test_autotuner.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

  • tensorrt_llm/_torch/autotuner.py
  • tests/unittest/_torch/test_autotuner.py
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py

89-89: Line too long (133 > 120)

(E501)


90-90: Line too long (132 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (1)
tests/unittest/_torch/test_autotuner.py (1)

361-375: LGTM on wiring the new tuning knobs in tests.

Good end-to-end coverage of dynamic buckets, tune_max_num_tokens, and the pre-hook. Once the tune-max enforcement is fixed in autotuner.py, this scenario should be robust.

@hyukn hyukn force-pushed the feat/tuning_pre_hook branch from faa2424 to f335dd0 Compare August 15, 2025 05:52
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
tensorrt_llm/_torch/autotuner.py (1)

89-90: Docstring/API mismatch and >120-char lines (Ruff E501) — clarify signature and wrap lines

Docstring states the hook takes (profile, inputs), but implementation currently passes only inputs. Also both lines exceed 120 chars.

Apply this diff to (a) document the intended signature clearly and (b) fix line length:

-        inputs_pre_hook (Callable): A function that takes a profile and a list of input tensors, and returns a list of input tensors.
-            The function is called before the input tensors are prepared for the tuning process to match the real data distribution.
+        inputs_pre_hook (Callable): A function that takes (profile, inputs) and returns a new list
+            of input tensors. It is invoked before preparing the tuning tensors to better match
+            real data distributions.
tests/unittest/_torch/test_autotuner.py (1)

323-329: This assertion will fail until _optimization_profiles is gated by tune_max_num_tokens (duplicate of prior review)

_optimization_profiles currently unconditionally adds the mapped current input size to opt_shapes, which introduces 64 when tune_max_num_tokens=32. That will make inputs[0].shape[0] == 64 for some profiles, tripping this assertion.

Proposed fix in tensorrt_llm/_torch/autotuner.py (context ~Lines 605–611), gate the addition:

# Add the current input value as one of the opt values, but don't exceed the cap.
opt_shapes = set(opt_shapes)
mapped_val = spec.map_to_tuning_buckets(
    base_profile.shapes[spec.input_idx][spec.dim_idx].val
)
if tuning_config.tune_max_num_tokens is None or mapped_val <= tuning_config.tune_max_num_tokens:
    opt_shapes.add(mapped_val)
opt_shapes = sorted(list(opt_shapes))

Once applied, this test assertion should pass consistently.

🧹 Nitpick comments (4)
tensorrt_llm/_torch/autotuner.py (1)

95-95: Type the new field as optional and narrow the callable’s contract

Typing this as Optional and clarifying the callable’s shape improves readability and tooling. If you keep supporting only (inputs)->inputs for now, please reflect that in types.

Suggested change (requires adding Optional to typing imports at Line 8):

-    inputs_pre_hook: Callable = None
+    # Accepts either (profile, inputs) -> inputs or (inputs) -> inputs
+    inputs_pre_hook: Optional[Callable] = None

Optionally, narrow further to Callable[..., List[torch.Tensor]] to reflect return type.

tests/unittest/_torch/test_autotuner.py (3)

341-349: Avoid mutable default argument for tactic; accept None and handle fallback cleanly

tactic: dict = {} uses a mutable default and can lead to subtle bugs. Use None and treat None like fallback (-1). Optionally widen the type to allow either -1 or a dict.

Apply this diff (add Optional to imports at the top):

-    def forward(
+    def forward(
         self,
         /,
         inputs: List[torch.Tensor],
         *,
-        tactic: dict = {},
+        tactic: dict = None,
     ) -> torch.Tensor:
-        # Notice that in fallback case tactic is -1
-        if tactic == -1:
+        # Notice that in fallback case tactic is -1 or tactic is None
+        if tactic in (-1, None):
             # assign default configs for fallback case
             block_size, tactic_id = 128, -1
         else:
             block_size, tactic_id = tactic["block_size"], tactic["tactic_id"]

If you prefer stricter typing:

  • from typing import Optional, Union
  • tactic: Union[int, Dict[str, int]] = -1
  • if tactic == -1: ...

352-359: Align test pre-hook signature with documented API (profile, inputs)

To match the TuningConfig doc and the suggested change in autotuner, accept profile as the first arg. This keeps tests representative of intended usage.

Apply this diff:

     @staticmethod
-    def inputs_pre_hook(inputs: List[torch.Tensor]):
-        # always set the first element to bo iota in x
-        x, w = inputs
+    def inputs_pre_hook(profile: OptimizationProfile, inputs: List[torch.Tensor]):
+        # always set the first element to an identifying value in x
+        x, w = inputs
         x_hooked = torch.zeros_like(x)
         x_hooked[-1, 0] = x.shape[0]
         return [x_hooked, w]

354-355: Clarify comment wording (“bo iota”)

“bo iota” is unclear and looks like a typo. Consider clarifying the intent (e.g., “set a sentinel value based on sequence length”).

Suggested:

  • “Set a sentinel value in x based on the sequence length to verify the pre-hook ran.”
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between faa2424 and f335dd0.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/autotuner.py (2 hunks)
  • tests/unittest/_torch/test_autotuner.py (2 hunks)
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py

89-89: Line too long (133 > 120)

(E501)


90-90: Line too long (132 > 120)

(E501)


@hyukn (Collaborator Author) commented Aug 15, 2025

/bot run --disable-fail-fast

@tensorrt-cicd: PR_Github #15443 [ run ] triggered by Bot

@hyukn hyukn force-pushed the feat/tuning_pre_hook branch from f335dd0 to ed46d85 Compare August 15, 2025 13:26
@hyukn (Collaborator Author) commented Aug 15, 2025

/bot run --disable-fail-fast

@tensorrt-cicd: PR_Github #15446 [ run ] triggered by Bot

@tensorrt-cicd: PR_Github #15443 [ run ] completed with state ABORTED

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
tensorrt_llm/_torch/autotuner.py (1)

453-456: Pass profile into inputs_pre_hook (per API/intent) and validate return; optionally support both signatures

Currently the hook is called with only inputs, diverging from the documented API and limiting usefulness. Pass the profile too and validate the return value. For backward compatibility you can fall back to the 1-arg call.

Apply this diff:

-        # If the inputs_pre_hook is provided, it will be called before profiling.
-        if tuning_config.inputs_pre_hook is not None:
-            input_tensors = tuning_config.inputs_pre_hook(input_tensors)
+        # If provided, preprocess inputs before profiling to better match real data distributions.
+        if tuning_config.inputs_pre_hook is not None:
+            hook = tuning_config.inputs_pre_hook
+            # Prefer the documented signature (profile, inputs); fall back to (inputs) for compat.
+            try:
+                input_tensors = hook(profile, input_tensors)
+            except TypeError:
+                input_tensors = hook(input_tensors)
+            # Basic validation: ensure we got back a list with the same arity.
+            if not isinstance(input_tensors, list) or len(input_tensors) != len(profile.shapes):
+                raise ValueError(
+                    "inputs_pre_hook must return a list of tensors with the same arity as inputs"
+                )
tests/unittest/_torch/test_autotuner.py (1)

322-329: Fix: don't add base (current-input) opt value when it exceeds tuning_config.tune_max_num_tokens

The autotuner unconditionally adds the mapped current input size into opt_shapes (tensorrt_llm/_torch/autotuner.py ~lines 605–610). When tuning_config.tune_max_num_tokens is set (e.g. 32) that can introduce opt values > tune_max_num_tokens (e.g. 64) and cause the test assertion in tests/unittest/_torch/test_autotuner.py (lines ~322–329) to fail.

The assertion should pass after applying the suggested fix in autotuner.py to gate the addition of the base shape by tune_max_num_tokens.

#!/bin/bash
# Description: Verify that _optimization_profiles properly respects tune_max_num_tokens
# Expected: The mapped current input size should only be added if it's <= tune_max_num_tokens

# Search for the problematic code in _optimization_profiles
rg -n -A 5 -B 5 'opt_shapes\.add' tensorrt_llm/_torch/autotuner.py
🧹 Nitpick comments (1)
tensorrt_llm/_torch/autotuner.py (1)

89-90: Fix line length violations.

Lines 89-90 exceed the 120-character limit. Consider breaking these lines appropriately.

Apply this diff to fix the line length issues:

-        inputs_pre_hook (Callable): A function that takes a list of input tensors, returns a list of modified input tensors.
-            It is called before the input tensors are prepared for the tuning process to match the real data distribution.
+        inputs_pre_hook (Callable): A function that takes a list of input tensors, 
+            returns a list of modified input tensors. It is called before the input
+            tensors are prepared for the tuning process to match the real data distribution.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between f335dd0 and ed46d85.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/autotuner.py (2 hunks)
  • tests/unittest/_torch/test_autotuner.py (2 hunks)
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py

89-89: Line too long (124 > 120)

(E501)


90-90: Line too long (122 > 120)

(E501)

🔇 Additional comments (5)
tensorrt_llm/_torch/autotuner.py (1)

95-95: LGTM!

The addition of the inputs_pre_hook field to the TuningConfig dataclass is well-integrated and follows the existing pattern.

tests/unittest/_torch/test_autotuner.py (4)

313-315: LGTM!

The class rename and addition of tune_max_num_tokens attribute provide clearer semantics for the complex tuning configuration test.


352-358: LGTM!

The inputs_pre_hook static method provides a clear test implementation that modifies the input tensor as documented. The implementation correctly sets the last-row, first-column element to the shape[0] value for validation purposes.


361-379: LGTM!

The updated test function properly exercises the new inputs_pre_hook and tune_max_num_tokens features. The test configuration correctly uses both the hook function and the token limit from the test runner class.


336-350: LGTM!

The updated forward method correctly handles both dict-based tactics and the fallback case. The logic properly extracts block_size and tactic_id from the dictionary while maintaining backward compatibility with the integer fallback case.

@tensorrt-cicd: PR_Github #15446 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11639 completed with status: 'FAILURE'

@hyukn hyukn force-pushed the feat/tuning_pre_hook branch from ed46d85 to d6abe21 Compare August 18, 2025 00:52
@hyukn (Collaborator Author) commented Aug 18, 2025

/bot run --disable-fail-fast

@tensorrt-cicd: PR_Github #15550 [ run ] triggered by Bot

@tensorrt-cicd: PR_Github #15550 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11712 completed with status: 'FAILURE'

@hyukn (Collaborator Author) commented Aug 18, 2025

/bot run --disable-fail-fast

@tensorrt-cicd: PR_Github #15595 [ run ] triggered by Bot

@tensorrt-cicd: PR_Github #15595 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11741 completed with status: 'FAILURE'

@hyukn (Collaborator Author) commented Aug 19, 2025

/bot run --disable-fail-fast

@tensorrt-cicd: PR_Github #15702 [ run ] triggered by Bot

@tensorrt-cicd: PR_Github #15702 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11809 completed with status: 'SUCCESS'

@hyukn hyukn force-pushed the feat/tuning_pre_hook branch 4 times, most recently from 8f7db4b to 9d4c156 Compare August 21, 2025 02:33
@tensorrt-cicd: PR_Github #16120 [ run ] triggered by Bot

@tensorrt-cicd: PR_Github #16120 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #12125 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@hyukn hyukn force-pushed the feat/tuning_pre_hook branch from b45df95 to 3ae0f8b Compare August 25, 2025 01:23
@hyukn (Collaborator Author) commented Aug 25, 2025

/bot run --disable-fail-fast

@tensorrt-cicd: PR_Github #16339 [ run ] triggered by Bot

@tensorrt-cicd: PR_Github #16339 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #12281 completed with status: 'FAILURE'

@hyukn (Collaborator Author) commented Aug 25, 2025

/bot run --disable-fail-fast

@tensorrt-cicd: PR_Github #16385 [ run ] triggered by Bot

@tensorrt-cicd: PR_Github #16385 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #12312 completed with status: 'ABORTED'

@hyukn (Collaborator Author) commented Aug 25, 2025

/bot run --disable-fail-fast

@tensorrt-cicd: PR_Github #16392 [ run ] triggered by Bot

@hyukn hyukn force-pushed the feat/tuning_pre_hook branch from 3ae0f8b to d0110f4 Compare August 25, 2025 07:11
@hyukn (Collaborator Author) commented Aug 25, 2025

/bot run --disable-fail-fast

@tensorrt-cicd: PR_Github #16406 [ run ] triggered by Bot

@tensorrt-cicd: PR_Github #16392 [ run ] completed with state ABORTED

@tensorrt-cicd: PR_Github #16406 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #12330 completed with status: 'FAILURE'

@hyukn (Collaborator Author) commented Aug 26, 2025

/bot run --disable-fail-fast

@tensorrt-cicd: PR_Github #16493 [ run ] triggered by Bot

@tensorrt-cicd: PR_Github #16493 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #12388 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

…ning process.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
@hyukn hyukn force-pushed the feat/tuning_pre_hook branch from d0110f4 to d3c945e Compare October 15, 2025 03:22
@hyukn (Collaborator Author) commented Oct 15, 2025

/bot run --disable-fail-fast

@tensorrt-cicd: PR_Github #21425 [ run ] triggered by Bot

@tensorrt-cicd: PR_Github #21425 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #16180 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@hyukn hyukn merged commit 56c2066 into NVIDIA:main Oct 15, 2025
5 checks passed
govind-ramnarayan pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Oct 21, 2025
…ning process. (NVIDIA#6924)

Some tunable ops require a more realistic data distribution, for instance, a shape-associated tensor. Thus, a customizable pre-hook function can be declared in the tuning config to modify the input tensor before the tuning process.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Oct 24, 2025
…ning process. (NVIDIA#6924)

Some tunable ops require a more realistic data distribution, for instance, a shape-associated tensor. Thus, a customizable pre-hook function can be declared in the tuning config to modify the input tensor before the tuning process.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
