[TRTLLM-4501][feat] Add input tensor pre-hook function API for the tuning process. #6924
Conversation
📝 Walkthrough
Adds an optional inputs_pre_hook to TuningConfig and invokes it at the start of AutoTuner._profile_runners to preprocess input tensors before profiling. Unit tests are updated to use a new test runner with tune_max_num_tokens, an inputs_pre_hook, and dict-based tactic handling.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Test as Test Runner
    participant AutoTuner as AutoTuner
    participant Hook as inputs_pre_hook
    participant Runner as GEMM Runner
    Test->>AutoTuner: start autotune(config with inputs_pre_hook)
    AutoTuner->>Hook: inputs_pre_hook(input_tensors)
    Hook-->>AutoTuner: modified_input_tensors
    AutoTuner->>Runner: profile(modified_input_tensors, tactic=dict)
    Runner-->>AutoTuner: profiling results
    AutoTuner-->>Test: return best tactic
```
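For orientation, here is a minimal sketch of how an op author might attach the new hook. It relies only on the TuningConfig fields introduced in this PR (inputs_pre_hook, tune_max_num_tokens); the import path and the two-tensor (x, w) GEMM-style input layout are assumptions for illustration, not the repository's exact API.

```python
from typing import List

import torch

# Assumed import path for the dataclass extended in this PR.
from tensorrt_llm._torch.autotuner import TuningConfig


def inputs_pre_hook(inputs: List[torch.Tensor]) -> List[torch.Tensor]:
    # Rewrite the synthetic activation so a shape-associated value is present,
    # e.g. embed the current token count in the tensor the kernel will see.
    x, w = inputs
    x_hooked = torch.zeros_like(x)
    x_hooked[-1, 0] = x.shape[0]
    return [x_hooked, w]


# The tuner calls the hook on the prepared input tensors right before profiling.
tuning_config = TuningConfig(
    tune_max_num_tokens=32,
    inputs_pre_hook=inputs_pre_hook,
)
```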
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 2
🧹 Nitpick comments (7)
tensorrt_llm/_torch/autotuner.py (4)
89-91: Fix inputs_pre_hook docs to match implementation and wrap lines (ruff E501).
The docstring currently says the hook takes (profile, inputs) and is called "before the input tensors are prepared," but the implementation calls it after _prepare_input_tensors and passes only inputs. Please align the docs with the actual behavior and keep lines ≤ 120 chars.
Apply:
```diff
-        inputs_pre_hook (Callable): A function that takes a profile and a list of input tensors, and returns a list of input tensors.
-            The function is called before the input tensors are prepared for the tuning process to match the real data distribution.
+        inputs_pre_hook (Callable | None): Optional callable invoked after synthetic
+            input tensors are prepared for tuning, but before profiling. It should accept
+            either (inputs) or (inputs, profile) and return a list of input tensors with
+            the same shapes, dtypes, and devices.
```
95-95: Type inputs_pre_hook precisely and make it Optional.
Using a bare Callable loses intent and None is not captured. Recommend Optional with a variadic signature to support both hook forms.
```diff
-    inputs_pre_hook: Callable = None
+    inputs_pre_hook: Optional[Callable[..., List[torch.Tensor]]] = None
```
Additionally, add the missing import at the top of the file:
```python
from typing import Optional
```
453-456: Support both hook signatures and validate outputs to avoid hard-to-debug failures.
Right now we always call hook(inputs) and don't validate the result. Add support for both (inputs) and (inputs, profile) forms and verify that shapes/dtypes/devices remain unchanged.
```diff
-        # If the inputs_pre_hook is provided, it will be called before profiling.
-        if tuning_config.inputs_pre_hook is not None:
-            input_tensors = tuning_config.inputs_pre_hook(input_tensors)
+        # If the inputs_pre_hook is provided, it will be called before profiling.
+        # Support both signatures: hook(inputs) and hook(inputs, profile).
+        if tuning_config.inputs_pre_hook is not None:
+            hook = tuning_config.inputs_pre_hook
+            try:
+                sig = inspect.signature(hook)
+                if len(sig.parameters) >= 2:
+                    new_inputs = hook(input_tensors, profile)
+                else:
+                    new_inputs = hook(input_tensors)
+            except Exception as e:
+                logger.debug(f"[Autotuner] inputs_pre_hook invocation failed: {e}")
+                raise
+            # Validate hook output
+            if not isinstance(new_inputs, list):
+                raise TypeError("inputs_pre_hook must return a List[torch.Tensor]")
+            if len(new_inputs) != len(input_tensors):
+                raise ValueError("inputs_pre_hook must return the same number of tensors")
+            for old, new in zip(input_tensors, new_inputs):
+                if not isinstance(new, torch.Tensor):
+                    raise TypeError("inputs_pre_hook must return only torch.Tensor elements")
+                if old.size() != new.size() or old.dtype != new.dtype or old.device != new.device:
+                    raise ValueError(
+                        "inputs_pre_hook must preserve shapes, dtype and device for all inputs"
+                    )
+            input_tensors = new_inputs
```
1-1: Missing NVIDIA copyright header.
Per coding guidelines, prepend the current-year NVIDIA copyright header.
```python
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
```
tests/unittest/_torch/test_autotuner.py (3)
341-349: Avoid mutable default for tactic dict.
Using {} as a default is a Python footgun. Use None and normalize inside the function; keep -1 behavior intact.
```diff
-        tactic: dict = {},
+        tactic: Dict[str, int] = None,
     ) -> torch.Tensor:
         # Notice that in fallback case tactic is -1
-        if tactic == -1:
+        if tactic in (-1, None):
             # assign default configs for fallback case
             block_size, tactic_id = 128, -1
         else:
             block_size, tactic_id = tactic["block_size"], tactic["tactic_id"]
```
352-359: Clarify pre-hook and avoid implicit dtype cast.
Minor cleanup: add a return type, and set the sentinel value with the correct dtype/device explicitly to avoid an implicit int→float cast.
```diff
     @staticmethod
-    def inputs_pre_hook(inputs: List[torch.Tensor]):
-        # always set the first element to bo iota in x
+    def inputs_pre_hook(inputs: List[torch.Tensor]) -> List[torch.Tensor]:
+        # Set the last row/first column of x to the current token count
         x, w = inputs
         x_hooked = torch.zeros_like(x)
-        x_hooked[-1, 0] = x.shape[0]
+        x_hooked[-1, 0] = torch.tensor(x.shape[0], dtype=x.dtype, device=x.device)
         return [x_hooked, w]
```
1-1: Missing NVIDIA copyright header.
Per coding guidelines, prepend the current-year NVIDIA copyright header.
```python
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- tensorrt_llm/_torch/autotuner.py (2 hunks)
- tests/unittest/_torch/test_autotuner.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in __init__
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
- tensorrt_llm/_torch/autotuner.py
- tests/unittest/_torch/test_autotuner.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
- tensorrt_llm/_torch/autotuner.py
- tests/unittest/_torch/test_autotuner.py
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py
89-89: Line too long (133 > 120)
(E501)
90-90: Line too long (132 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (1)
tests/unittest/_torch/test_autotuner.py (1)
361-375: LGTM on wiring the new tuning knobs in tests.
Good end-to-end coverage of dynamic buckets, tune_max_num_tokens, and the pre-hook. Once the tune-max enforcement is fixed in autotuner.py, this scenario should be robust.
Force-pushed from faa2424 to f335dd0
Actionable comments posted: 1
♻️ Duplicate comments (2)
tensorrt_llm/_torch/autotuner.py (1)
89-90: Docstring/API mismatch and >120-char lines (Ruff E501): clarify signature and wrap lines.
Docstring states the hook takes (profile, inputs), but the implementation currently passes only inputs. Also, both lines exceed 120 chars.
Apply this diff to (a) document the intended signature clearly and (b) fix line length:
```diff
-        inputs_pre_hook (Callable): A function that takes a profile and a list of input tensors, and returns a list of input tensors.
-            The function is called before the input tensors are prepared for the tuning process to match the real data distribution.
+        inputs_pre_hook (Callable): A function that takes (profile, inputs) and returns a new list
+            of input tensors. It is invoked before preparing the tuning tensors to better match
+            real data distributions.
```
tests/unittest/_torch/test_autotuner.py (1)
323-329: This assertion will fail until _optimization_profiles is gated by tune_max_num_tokens (duplicate of prior review).
_optimization_profiles currently unconditionally adds the mapped current input size to opt_shapes, which introduces 64 when tune_max_num_tokens=32. That will make inputs[0].shape[0] == 64 for some profiles, tripping this assertion.
Proposed fix in tensorrt_llm/_torch/autotuner.py (context ~Lines 605–611), gate the addition:
```python
# Add the current input value as one of the opt values, but don't exceed the cap.
opt_shapes = set(opt_shapes)
mapped_val = spec.map_to_tuning_buckets(
    base_profile.shapes[spec.input_idx][spec.dim_idx].val
)
if tuning_config.tune_max_num_tokens is None or mapped_val <= tuning_config.tune_max_num_tokens:
    opt_shapes.add(mapped_val)
opt_shapes = sorted(list(opt_shapes))
```
Once applied, this test assertion should pass consistently.
🧹 Nitpick comments (4)
tensorrt_llm/_torch/autotuner.py (1)
95-95: Type the new field as optional and narrow the callable's contract.
Typing this as Optional and clarifying the callable's shape improves readability and tooling. If you keep supporting only (inputs) -> inputs for now, please reflect that in the types.
Suggested change (requires adding Optional to typing imports at Line 8):
```diff
-    inputs_pre_hook: Callable = None
+    # Accepts either (profile, inputs) -> inputs or (inputs) -> inputs
+    inputs_pre_hook: Optional[Callable] = None
```
Optionally, narrow further to Callable[..., List[torch.Tensor]] to reflect the return type.
tests/unittest/_torch/test_autotuner.py (3)
341-349: Avoid mutable default argument for tactic; accept None and handle fallback cleanly.
tactic: dict = {} uses a mutable default and can lead to subtle bugs. Use None and treat None like fallback (-1). Optionally widen the type to allow either -1 or a dict.
Apply this diff (add Optional to imports at the top):
```diff
     def forward(
         self,
         /,
         inputs: List[torch.Tensor],
         *,
-        tactic: dict = {},
+        tactic: dict = None,
     ) -> torch.Tensor:
-        # Notice that in fallback case tactic is -1
-        if tactic == -1:
+        # Notice that in fallback case tactic is -1 or tactic is None
+        if tactic in (-1, None):
             # assign default configs for fallback case
             block_size, tactic_id = 128, -1
         else:
             block_size, tactic_id = tactic["block_size"], tactic["tactic_id"]
```
If you prefer stricter typing:
- from typing import Optional, Union
- tactic: Union[int, Dict[str, int]] = -1
- if tactic == -1: ...
352-359: Align test pre-hook signature with documented API (profile, inputs).
To match the TuningConfig doc and the suggested change in autotuner, accept profile as the first arg. This keeps tests representative of intended usage.
Apply this diff:
```diff
     @staticmethod
-    def inputs_pre_hook(inputs: List[torch.Tensor]):
-        # always set the first element to bo iota in x
-        x, w = inputs
+    def inputs_pre_hook(profile: OptimizationProfile, inputs: List[torch.Tensor]):
+        # always set the first element to an identifying value in x
+        x, w = inputs
         x_hooked = torch.zeros_like(x)
         x_hooked[-1, 0] = x.shape[0]
         return [x_hooked, w]
```
354-355: Clarify comment wording ("bo iota").
"bo iota" is unclear and looks like a typo. Consider clarifying the intent (e.g., "set a sentinel value based on sequence length").
Suggested:
- “Set a sentinel value in x based on the sequence length to verify the pre-hook ran.”
/bot run --disable-fail-fast
PR_Github #15443 [ run ] triggered by Bot
Force-pushed from f335dd0 to ed46d85
/bot run --disable-fail-fast
PR_Github #15446 [ run ] triggered by Bot
PR_Github #15443 [ run ] completed with state
Actionable comments posted: 0
♻️ Duplicate comments (2)
tensorrt_llm/_torch/autotuner.py (1)
453-456: Pass profile into inputs_pre_hook (per API/intent) and validate return; optionally support both signatures.
Currently the hook is called with only inputs, diverging from the documented API and limiting usefulness. Pass the profile too and validate the return value. For backward compatibility you can fall back to the 1-arg call.
Apply this diff:
```diff
-        # If the inputs_pre_hook is provided, it will be called before profiling.
-        if tuning_config.inputs_pre_hook is not None:
-            input_tensors = tuning_config.inputs_pre_hook(input_tensors)
+        # If provided, preprocess inputs before profiling to better match real data distributions.
+        if tuning_config.inputs_pre_hook is not None:
+            hook = tuning_config.inputs_pre_hook
+            # Prefer the documented signature (profile, inputs); fall back to (inputs) for compat.
+            try:
+                input_tensors = hook(profile, input_tensors)
+            except TypeError:
+                input_tensors = hook(input_tensors)
+            # Basic validation: ensure we got back a list with the same arity.
+            if not isinstance(input_tensors, list) or len(input_tensors) != len(profile.shapes):
+                raise ValueError(
+                    "inputs_pre_hook must return a list of tensors with the same arity as inputs"
+                )
```
tests/unittest/_torch/test_autotuner.py (1)
322-329: Fix: don't add the base (current-input) opt value when it exceeds tuning_config.tune_max_num_tokens.
The autotuner unconditionally adds the mapped current input size into opt_shapes (tensorrt_llm/_torch/autotuner.py, ~lines 605–610). When tuning_config.tune_max_num_tokens is set (e.g. 32), that can introduce opt values > tune_max_num_tokens (e.g. 64) and cause the test assertion in tests/unittest/_torch/test_autotuner.py (lines ~322–329) to fail.
The assertion should pass after applying the suggested fix in autotuner.py to gate the addition of the base shape by tune_max_num_tokens.
```bash
#!/bin/bash
# Description: Verify that _optimization_profiles properly respects tune_max_num_tokens
# Expected: The mapped current input size should only be added if it's <= tune_max_num_tokens

# Search for the problematic code in _optimization_profiles
rg -n -A 5 -B 5 'opt_shapes\.add' tensorrt_llm/_torch/autotuner.py
```
🧹 Nitpick comments (1)
tensorrt_llm/_torch/autotuner.py (1)
89-90: Fix line length violations.
Lines 89-90 exceed the 120-character limit. Consider breaking these lines appropriately.
Apply this diff to fix the line length issues:
```diff
-        inputs_pre_hook (Callable): A function that takes a list of input tensors, returns a list of modified input tensors.
-            It is called before the input tensors are prepared for the tuning process to match the real data distribution.
+        inputs_pre_hook (Callable): A function that takes a list of input tensors,
+            returns a list of modified input tensors. It is called before the input
+            tensors are prepared for the tuning process to match the real data distribution.
```
🔇 Additional comments (5)
tensorrt_llm/_torch/autotuner.py (1)
95-95: LGTM!
The addition of the inputs_pre_hook field to the TuningConfig dataclass is well-integrated and follows the existing pattern.
tests/unittest/_torch/test_autotuner.py (4)
313-315: LGTM!
The class rename and addition of the tune_max_num_tokens attribute provide clearer semantics for the complex tuning configuration test.
352-358: LGTM!
The inputs_pre_hook static method provides a clear test implementation that modifies the input tensor as documented. The implementation correctly sets the last row, first column to the shape[0] value for validation purposes.
361-379: LGTM!
The updated test function properly exercises the new inputs_pre_hook and tune_max_num_tokens features. The test configuration correctly uses both the hook function and the token limit from the test runner class.
336-350: LGTM!
The updated forward method correctly handles both dict-based tactics and the fallback case. The logic properly extracts block_size and tactic_id from the dictionary while maintaining backward compatibility with the integer fallback case.
PR_Github #15446 [ run ] completed with state
Force-pushed from ed46d85 to d6abe21
/bot run --disable-fail-fast
PR_Github #15550 [ run ] triggered by Bot
PR_Github #15550 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #15595 [ run ] triggered by Bot
PR_Github #15595 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #15702 [ run ] triggered by Bot
PR_Github #15702 [ run ] completed with state
Force-pushed from 8f7db4b to 9d4c156
PR_Github #16120 [ run ] triggered by Bot
PR_Github #16120 [ run ] completed with state
Force-pushed from b45df95 to 3ae0f8b
/bot run --disable-fail-fast
PR_Github #16339 [ run ] triggered by Bot
PR_Github #16339 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #16385 [ run ] triggered by Bot
PR_Github #16385 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #16392 [ run ] triggered by Bot
Force-pushed from 3ae0f8b to d0110f4
/bot run --disable-fail-fast
PR_Github #16406 [ run ] triggered by Bot
PR_Github #16392 [ run ] completed with state
PR_Github #16406 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #16493 [ run ] triggered by Bot
PR_Github #16493 [ run ] completed with state
…ning process. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Force-pushed from d0110f4 to d3c945e
/bot run --disable-fail-fast
PR_Github #21425 [ run ] triggered by Bot
PR_Github #21425 [ run ] completed with state
…ning process. (NVIDIA#6924) Some tunable ops require a more realistic data distribution, for instance, a shape-associated tensor. Thus, a customizable pre-hook function can be declared in the tuning config to modify the input tensor before the tuning process. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
…ning process. (NVIDIA#6924) Some tunable ops require a more realistic data distribution, for instance, a shape-associated tensor. Thus, a customizable pre-hook function can be declared in the tuning config to modify the input tensor before the tuning process. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Some tunable ops require a more realistic data distribution, for instance, a shape-associated tensor. Thus, a customizable pre-hook function can be declared in the tuning config to modify the input tensor before the tuning process.
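To make the intended control flow concrete, the sketch below mirrors the merged behavior in a standalone helper; the function name and surrounding scaffolding are hypothetical and heavily simplified compared to the real AutoTuner._profile_runners.

```python
from typing import Callable, List, Optional

import torch


def apply_inputs_pre_hook(
    input_tensors: List[torch.Tensor],
    inputs_pre_hook: Optional[Callable[[List[torch.Tensor]], List[torch.Tensor]]],
) -> List[torch.Tensor]:
    # No hook configured: profile the synthetic tensors as-is.
    if inputs_pre_hook is None:
        return input_tensors
    # The hook may rewrite tensor contents to match the real data distribution,
    # but it is expected to return the same number of tensors.
    hooked = inputs_pre_hook(input_tensors)
    assert len(hooked) == len(input_tensors), "pre-hook must preserve input arity"
    return hooked
```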
Summary by CodeRabbit
New Features
- Added an optional inputs_pre_hook to the tuning configuration so input tensors can be preprocessed before profiling.
Tests
- Updated autotuner unit tests to cover the pre-hook, tune_max_num_tokens, and dict-based tactic handling.