[TRTLLM-4501][feat] AutoTuner tuning config refactor and valid tactic generalization. #6545
Conversation
📝 Walkthrough
The changes expand the AutoTuner system to support multi-dimensional tuning configurations, new tuning parameters, and more granular cache management. Runner and tuning config interfaces are updated to accept arbitrary keyword arguments and new parameters. Related tests are updated and extended for the new configuration mechanisms, and the MoERunner integration is refactored to use the new tuning approach.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant User
    participant fused_moe
    participant AutoTuner
    participant MoERunner
    User->>fused_moe: Call fused_moe(..., tune_max_num_tokens=...)
    fused_moe->>AutoTuner: choose_one(..., tune_max_num_tokens=...)
    AutoTuner->>MoERunner: get_valid_tactics(..., tune_max_num_tokens=...)
    AutoTuner->>MoERunner: forward(..., tactic, tune_max_num_tokens=...)
    AutoTuner-->>fused_moe: Return best runner, tactic, config
    fused_moe-->>User: Return result
```
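To make the diagram concrete, below is a small, self-contained toy of the same call flow. The class and method names mirror the diagram but are simplified stand-ins, not the real `AutoTuner`/`MoERunner` implementations; in particular, the `choose_one` signature here is an assumption for illustration only.

```python
# Toy sketch of the call flow in the diagram above; not the real tensorrt_llm classes.
class MoERunnerStub:
    def get_valid_tactics(self, inputs, profile=None, **kwargs):
        # Extra tuning parameters such as tune_max_num_tokens arrive via kwargs.
        return [0, 1]

    def forward(self, inputs, tactic=-1, **kwargs):
        return f"tactic={tactic}, kwargs={kwargs}"


class AutoTunerStub:
    def choose_one(self, name, runners, inputs, **kwargs):
        best = None
        for runner in runners:
            # Forward the extra keyword arguments to the runner while tuning.
            for tactic in runner.get_valid_tactics(inputs, **kwargs):
                runner.forward(inputs, tactic=tactic, **kwargs)  # would be timed here
                best = (runner, tactic)
        return best


runner, tactic = AutoTunerStub().choose_one(
    "fused_moe", [MoERunnerStub()], inputs=None, tune_max_num_tokens=8192)
print(runner.forward(None, tactic=tactic, tune_max_num_tokens=8192))
```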
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
Actionable comments posted: 0
🧹 Nitpick comments (2)
tensorrt_llm/_torch/autotuner.py (2)
413-413: Consider breaking this long line for better readability. Line 413 exceeds the 120-character limit. Consider splitting the log message across multiple lines.
```diff
-    logger.debug(
-        f"[Autotuner] Profiling runner={runners[best_runner_id]}, tactic={best_tactic} for cache_key={cache_key}."
-    )
+    logger.debug(
+        f"[Autotuner] Profiling runner={runners[best_runner_id]}, "
+        f"tactic={best_tactic} for cache_key={cache_key}."
+    )
```
477-477: Break long lines for improved readability. Lines 477 and 489 exceed the 120-character limit. Consider reformatting for better readability.
```diff
-    logger.warning(
-        f"[Autotuner] Failed when profiling runner={runner}, tactic={tac}, shapes={shapes}. Set TLLM_LOG_LEVEL=DEBUG for more details."
-    )
+    logger.warning(
+        f"[Autotuner] Failed when profiling runner={runner}, tactic={tac}, "
+        f"shapes={shapes}. Set TLLM_LOG_LEVEL=DEBUG for more details."
+    )
```
Also applies to: 489-489
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- tensorrt_llm/_torch/autotuner.py (20 hunks)
- tensorrt_llm/_torch/custom_ops/torch_custom_ops.py (3 hunks)
- tests/unittest/_torch/test_autotuner.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
- tests/unittest/_torch/test_autotuner.py
- tensorrt_llm/_torch/custom_ops/torch_custom_ops.py
- tensorrt_llm/_torch/autotuner.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. The block should be prepended to the top of all files, including .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
- tests/unittest/_torch/test_autotuner.py
- tensorrt_llm/_torch/custom_ops/torch_custom_ops.py
- tensorrt_llm/_torch/autotuner.py
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py
413-413: Line too long (130 > 120)
(E501)
477-477: Line too long (155 > 120)
(E501)
489-489: Line too long (160 > 120)
(E501)
550-550: Line too long (137 > 120)
(E501)
🔇 Additional comments (15)
tensorrt_llm/_torch/custom_ops/torch_custom_ops.py (2)
26-31: LGTM! Clean integration of tune_max_num_tokens into TuningConfig. The direct inclusion of `tune_max_num_tokens=8192` in the `TuningConfig` initialization simplifies the configuration and aligns with the enhanced autotuner interface.
183-184: LGTM! Correct parameter passing to autotuner. The `tune_max_num_tokens` parameter is properly passed from the function arguments to both `choose_one` calls, maintaining consistency with the new autotuner interface.
Also applies to: 195-196
tests/unittest/_torch/test_autotuner.py (5)
22-23: LGTM! Simplified DynamicTensorSpec instantiation. The removal of lambda functions reflects the updated default behavior where `gen_tuning_buckets` and `map_to_tuning_buckets` now have default values in the autotuner.
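As a rough illustration of the simplification described here, the following self-contained toy mirrors the defaults the review mentions (empty bucket tuple, identity mapping); it is not the real `DynamicTensorSpec`, whose field names and types may differ.

```python
# Toy mirror of the described defaults, not the real tensorrt_llm DynamicTensorSpec.
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DynamicTensorSpecToy:
    input_idx: int
    dim_idx: int
    # Defaults described in the review: empty bucket tuple and identity mapping,
    # so callers no longer need to pass lambdas explicitly.
    gen_tuning_buckets: Tuple[int, ...] = ()
    map_to_tuning_buckets: Callable[[int], int] = lambda x: x


spec = DynamicTensorSpecToy(input_idx=0, dim_idx=0)  # bucket functions omitted
print(spec.map_to_tuning_buckets(37))  # identity by default -> 37
```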
30-32: Good clarification on profile count calculation. The updated comment clearly explains why there are 27 profiles, making the test logic more understandable.
95-95: LGTM! Interface compatibility with enhanced TunableRunner. The addition of `**kwargs` to both methods ensures compatibility with the updated `TunableRunner` interface that supports flexible parameter passing.
Also applies to: 103-104
313-329: Well-designed test class for multi-parameter tuning. The `GemmRunnerWithTacticConfigs` class effectively demonstrates the handling of additional tuning parameters through the new autotuner interface, with proper parameter passing in both `get_valid_tactics` and `forward` methods.
331-343: Comprehensive test for tactic configuration feature. The test effectively validates the end-to-end flow of multi-parameter tuning, from defining configurations to passing them through the runner's forward method. Good coverage of the new autotuner capabilities.
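The pattern exercised by that test can be sketched with a self-contained toy runner whose tactic set and kernel choice depend on an extra tuning parameter. The class name and the `tile_size` keyword below are hypothetical stand-ins, not the actual `GemmRunnerWithTacticConfigs` from the test file.

```python
# Illustrative stand-in for a runner driven by an extra tuning parameter.
import torch


class TileConfigGemmRunner:
    """Toy TunableRunner-style class whose tactics depend on a tuning parameter."""

    def get_valid_tactics(self, inputs, profile=None, *, tile_size=64, **kwargs):
        # The extra keyword (here a hypothetical tile_size) would arrive via the
        # AutoTuner's config search grid and can restrict the tactic set.
        m = inputs[0].shape[0]
        return [t for t in (0, 1, 2) if m >= t * tile_size or t == 0]

    def forward(self, inputs, tactic=-1, *, tile_size=64, **kwargs):
        a, b = inputs
        # A real runner would dispatch to a kernel chosen by (tactic, tile_size);
        # the toy just performs the matmul.
        return a @ b


runner = TileConfigGemmRunner()
a, b = torch.randn(128, 64), torch.randn(64, 32)
tactics = runner.get_valid_tactics([a, b], tile_size=32)
out = runner.forward([a, b], tactic=tactics[0], tile_size=32)
```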
tensorrt_llm/_torch/autotuner.py (8)
28-29: LGTM! Sensible defaults for DynamicTensorSpec. The default values eliminate the need for users to specify `gen_tuning_buckets` and `map_to_tuning_buckets` when not needed, simplifying the API.
46-46: Well-designed extension of TuningConfig. The addition of `kw_only=True` enforces clearer API usage, while the `configs` dict and `tune_max_num_tokens` field provide flexible support for multi-dimensional tuning configurations.
Also applies to: 87-88
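A keyword-only config dataclass of the kind described here might look roughly like the toy below. Field names other than `configs` and `tune_max_num_tokens` are assumptions; the real `TuningConfig` in autotuner.py may differ.

```python
# Toy mirror of the extended TuningConfig described above (requires Python 3.10+
# for kw_only); not the real tensorrt_llm dataclass.
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass(kw_only=True)  # mirrors the kw_only=True change noted in the review
class TuningConfigToy:
    dynamic_tensor_specs: Tuple[Any, ...] = ()
    # Candidate values per named tuning parameter; the tuner explores their product.
    configs: Dict[str, Tuple[Any, ...]] = field(default_factory=dict)
    tune_max_num_tokens: Optional[int] = None


cfg = TuningConfigToy(
    configs={"tile_size": (32, 64, 128), "stages": (2, 3)},
    tune_max_num_tokens=8192,
)
```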
144-144: Clean interface extension for flexible parameter passing. The addition of `**kwargs` to both methods enables runners to handle arbitrary tuning parameters while maintaining backward compatibility.
Also applies to: 172-173
303-303: Proper cache structure for configuration storage. The enhanced return type correctly includes the configuration dictionary, with a sensible default empty dict for backward compatibility.
Also applies to: 321-321
323-331: Well-implemented backward-compatible API extension. The method properly handles the new `tune_max_num_tokens` parameter and conditionally returns configuration dictionaries only when configs are defined, maintaining backward compatibility.
Also applies to: 355-356, 372-375
588-605: Excellent handling of tune_max_num_tokens in profile generation. The logic properly uses `tune_max_num_tokens` when available while ensuring the current input shape is always included in the optimization profiles. This provides good coverage for both tuning and inference scenarios.
640-646: Well-implemented profile matching with tune_max_num_tokens. The method correctly incorporates `tune_max_num_tokens` to cap profile dimensions during nearest profile search, ensuring consistency between tuning and inference.
Also applies to: 666-670
735-741: Clean implementation of configuration generation. The method efficiently generates all configuration combinations using `itertools.product` while properly handling the case of no configurations for backward compatibility.
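A minimal, self-contained sketch of that cartesian-product generation, including the empty-config fallback, is shown below; the function name is illustrative, not the actual method in autotuner.py.

```python
# Minimal sketch of cartesian-product config grid generation.
import itertools
from typing import Any, Dict, List, Tuple


def generate_config_grid(configs: Dict[str, Tuple[Any, ...]]) -> List[Dict[str, Any]]:
    # No configs -> a single empty dict, so callers without extra tuning
    # parameters keep the old single-pass behavior.
    if not configs:
        return [{}]
    names = list(configs)
    return [dict(zip(names, combo)) for combo in itertools.product(*configs.values())]


print(generate_config_grid({"tile_size": (32, 64), "stages": (2, 3)}))
# -> [{'tile_size': 32, 'stages': 2}, {'tile_size': 32, 'stages': 3},
#     {'tile_size': 64, 'stages': 2}, {'tile_size': 64, 'stages': 3}]
```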
Force-pushed 062541c to 5c70291 (Compare)
/bot run --disable-fail-fast
Actionable comments posted: 0
🧹 Nitpick comments (1)
tensorrt_llm/_torch/autotuner.py (1)
413-413: Consider wrapping long lines for better readability. Several lines exceed the 120-character limit. While not critical, wrapping these lines would improve code readability.
Also applies to: 477-477, 489-489, 550-550
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- tensorrt_llm/_torch/autotuner.py (20 hunks)
- tensorrt_llm/_torch/custom_ops/torch_custom_ops.py (3 hunks)
- tests/unittest/_torch/test_autotuner.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tensorrt_llm/_torch/custom_ops/torch_custom_ops.py
- tests/unittest/_torch/test_autotuner.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tensorrt_llm/_torch/autotuner.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. The block should be prepended to the top of all files, including .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tensorrt_llm/_torch/autotuner.py
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py
413-413: Line too long (130 > 120)
(E501)
477-477: Line too long (155 > 120)
(E501)
489-489: Line too long (160 > 120)
(E501)
550-550: Line too long (137 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (12)
tensorrt_llm/_torch/autotuner.py (12)
28-29: LGTM! Sensible defaults for optional tuning functions. The default values make the tuning configuration more user-friendly by allowing users to omit these functions when not needed, which aligns with the PR objectives.
46-46: LGTM! Well-structured dataclass enhancements. The keyword-only constraint improves API clarity, and the new fields properly support multi-dimensional tuning configurations with appropriate defaults.
Also applies to: 87-88
144-144: LGTM! Backward-compatible signature enhancement. Adding `**kwargs` provides flexibility for passing additional parameters to runners without breaking existing implementations.
172-192: LGTM! Well-documented forward method enhancements. The `do_preparation` parameter is a useful addition for setup operations, and the comprehensive docstring clearly explains the behavior of all parameters.
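The role of such a preparation flag can be illustrated with the toy below: an untimed setup pass runs once before the timed profiling passes. This is a sketch of the idea only; the real `TunableRunner.forward` semantics may differ in detail.

```python
# Toy illustration of a do_preparation flag on a runner's forward().
class PreparedRunner:
    def __init__(self):
        self._workspace = None

    def forward(self, inputs, tactic=-1, do_preparation=False, **kwargs):
        if do_preparation:
            # One-time setup (e.g. allocating workspace) performed before the
            # timed profiling passes, so it is excluded from the measurement.
            self._workspace = [0.0] * len(inputs)
            return None
        return sum(inputs)


r = PreparedRunner()
r.forward([1, 2, 3], do_preparation=True)   # untimed setup pass
print(r.forward([1, 2, 3], tactic=0))       # timed / production pass -> 6
```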
283-283: Clarify the purpose of `registered_tuning_configs`. This field is initialized but doesn't appear to be used anywhere in the file. Is this intended for future functionality or can it be removed?
303-321: LGTM! Enhanced cache mechanism with configuration storage. The updated return values properly support storing and retrieving best configurations from the cache, aligning with the PR objectives.
323-442: LGTM! Comprehensive implementation of multi-configuration tuning. The enhanced `choose_one` method properly implements the new tuning approach with configuration search grid generation and best configuration caching. The conditional return based on config presence maintains backward compatibility.
444-497: LGTM! Robust implementation of configuration-aware profiling. The rewritten method properly implements the search grid approach with good error handling and detailed logging. The preparation phase handling is particularly well done.
511-550: LGTM! Consistent configuration propagation. The method properly handles the configuration parameter throughout the profiling process, including in the debug logging.
588-605: LGTM! Flexible profile generation with tune_max_num_tokens support. The implementation properly handles both cases (with and without `tune_max_num_tokens`) and ensures the current input value is always included in the optimization profiles.
640-675: LGTM! Clean implementation of tune_max_num_tokens handling. The method properly caps dimension values when `tune_max_num_tokens` is provided, and the simplified zero-check improves readability.
735-741: LGTM! Clean implementation of configuration grid generation. The method efficiently generates all configuration combinations using cartesian product, with proper handling of the empty configuration case.
PR_Github #13778 [ run ] triggered by Bot
PR_Github #13778 [ run ] completed with state
Force-pushed 144fd17 to 7d9a561 (Compare)
/bot run --disable-fail-fast
Actionable comments posted: 1
🧹 Nitpick comments (4)
tensorrt_llm/_torch/autotuner.py (4)
313-313: Update docstring to reflect new return value. The docstring should be updated to document the new `best_config` return value.
```diff
 Returns:
     A tuple containing:
-    [is_cache_hit, runner_id, tactic, best_config, stored_profile]
+    [is_cache_hit, runner_id, tactic, best_config, stored_profile]
+    - best_config: Dictionary containing the best configuration found
```
372-375: Consider refactoring return value handling. The conditional return based on `tuning_config.configs` creates different return types from the same method. Consider using a consistent return type or separate methods.
One approach could be to always return the config (empty dict when not used):
```diff
-        if tuning_config.configs:
-            return (best_runner, best_tactic, best_configs)
-        else:
-            return (best_runner, best_tactic)
+        return (best_runner, best_tactic, best_configs)
```
439-442: Same return value consistency issue. The same conditional return pattern exists here. Consider the refactoring suggestion above for consistency.
550-550: Fix line length violation. The line exceeds the 120 character limit. Consider breaking it for better readability.
```diff
-    f"[Autotuner] Profiled runner={runner}, tactic={tactic}, shapes={shapes}{f', {config}' if config else ''}: {avg_time:.6f}ms."
+    f"[Autotuner] Profiled runner={runner}, tactic={tactic}, shapes={shapes}"
+    f"{f', {config}' if config else ''}: {avg_time:.6f}ms."
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- tensorrt_llm/_torch/autotuner.py (21 hunks)
- tensorrt_llm/_torch/custom_ops/torch_custom_ops.py (3 hunks)
- tests/unittest/_torch/test_autotuner.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tensorrt_llm/_torch/custom_ops/torch_custom_ops.py
- tests/unittest/_torch/test_autotuner.py
🧰 Additional context used
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py
413-413: Line too long (130 > 120)
(E501)
477-477: Line too long (155 > 120)
(E501)
489-489: Line too long (160 > 120)
(E501)
550-550: Line too long (137 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (20)
tensorrt_llm/_torch/autotuner.py (20)
28-29: LGTM! Good default values for simplified tuning setup. The default values align with the PR objective of allowing users to omit specifying tuning bucket functions. Empty tuple and identity lambda are sensible defaults.
46-88: LGTM! Well-designed configuration enhancements. The new fields properly support the multi-dimensional tuning approach:
- `kw_only=True` improves API clarity by requiring named parameters
- `configs` field enables tuning parameter combinations as described in PR objectives
- `tune_max_num_tokens` provides useful input size constraints
- Comprehensive docstring with clear examples
144-144: LGTM! Good interface extension for flexible parameter passing. Adding `**kwargs` to `get_valid_tactics` enables the flexible parameter passing mentioned in the PR objectives.
172-173: LGTM! Well-documented parameter additions. The `do_preparation` parameter and `**kwargs` are properly documented and serve clear purposes for the enhanced tuning functionality.
283-283: LGTM! Appropriate field addition for configuration management. The `registered_tuning_configs` dictionary supports the enhanced configuration management capabilities.
303-303: LGTM! Updated return type supports configuration-aware caching. The return type now properly includes the best configuration dictionary as mentioned in PR objectives.
321-321: LGTM! Appropriate default values for cache miss. Returning empty dict as default best_config is consistent with the new configuration approach.
323-331: LGTM! Well-designed method signature expansion. The new parameters properly support the multi-dimensional tuning approach described in PR objectives. The `**kwargs` enables flexible parameter passing as intended.
444-497: LGTM! Well-structured profiling logic extraction. The method properly implements the core profiling logic for multi-dimensional tuning with good error handling and statistics tracking. The inspect usage for checking method signatures is a nice touch.
However, please address the line length violations flagged by static analysis tools for better code readability.
511-511: LGTM! Proper configuration parameter integration. Adding the `config` parameter enables the multi-dimensional tuning functionality.
530-530: LGTM! Consistent config parameter usage. The config is properly passed through to the runner calls.
542-542: LGTM! Consistent config parameter usage in profiling loop.
588-604: LGTM! Well-implemented tune_max_num_tokens integration. The logic properly handles both function-based and tuple-based bucket generation, and correctly adds the current input value to the optimization shapes.
630-632: LGTM! Good constraint handling for optional inputs. The check for zero-dimension tensors prevents issues with optional or scalar inputs.
666-670: LGTM! Proper tune_max_num_tokens application in nearest profile. The logic correctly applies the maximum token limit when specified.
640-646: LGTM! Consistent parameter addition. The `tune_max_num_tokens` parameter addition maintains consistency with the overall refactor.
672-675: LGTM! Improved constraint handling. The zero-dimension check prevents issues with optional inputs in constraint specifications.
688-691: LGTM! Essential cache key update for correctness. Including `tune_max_num_tokens` in the cache key is necessary to ensure cache correctness with the new tuning parameter.
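The reasoning behind that cache-key change can be sketched as follows; this is a toy key layout, not the actual cache-key format used by the AutoTuner.

```python
# Sketch of why tune_max_num_tokens belongs in the cache key: two tuning runs
# that differ only in this cap must not reuse each other's cached tactic.
def make_cache_key(op_name, runner_name, shape_profile, tune_max_num_tokens=None):
    return (op_name, runner_name, shape_profile, tune_max_num_tokens)


k1 = make_cache_key("fused_moe", "MoERunner", ((1, 8192),), tune_max_num_tokens=8192)
k2 = make_cache_key("fused_moe", "MoERunner", ((1, 8192),), tune_max_num_tokens=4096)
assert k1 != k2  # distinct entries, hence distinct best tactics/configs
```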
735-741: LGTM! Clean implementation of config combination generation. The method properly implements the cartesian product of tuning parameters as described in PR objectives. The empty config handling is appropriate.
757-760: LGTM! Consistent debug logging updates. The logging properly reflects the new cache structure and includes helpful config information for debugging.
PR_Github #13899 [ run ] triggered by Bot
PR_Github #13899 [ run ] completed with state
Force-pushed 7d9a561 to 7da3568 (Compare)
/bot run --disable-fail-fast
PR_Github #14558 [ run ] triggered by Bot
PR_Github #14558 [ run ] completed with state
Force-pushed 7da3568 to 2474469 (Compare)
/bot run --disable-fail-fast
PR_Github #14771 [ run ] triggered by Bot
PR_Github #14771 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #14790 [ run ] triggered by Bot
PR_Github #14790 [ run ] completed with state
Force-pushed 2474469 to d908bd4 (Compare)
/bot run --disable-fail-fast
PR_Github #14969 [ run ] triggered by Bot
PR_Github #14969 [ run ] completed with state
Force-pushed d908bd4 to 35dbd62 (Compare)
…or kernel configs. Adding a config entry in the tuning config to define the valid candidates for each part of the config.
* AutoTuner will loop over a search grid generated from the config combinations.
* Each config will be tuned along with the specific input profile.
* The best config will be recorded in the cache value (instead of the cache key). And it will be recovered and used in the tunable runner forward.
* Allow the user not to specify the `gen_tuning_buckets` or the `map_to_tuning_buckets` function.
* Other code refactoring.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
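The commit message above outlines the new tuning scheme end to end. The self-contained toy below shows the shape of that loop under illustrative names (not the real AutoTuner code): iterate the (tactic × config) search grid, time each candidate, and store the winning config in the cache value for later reuse.

```python
# End-to-end toy of the described tuning scheme; names and structure are
# illustrative only, not the actual AutoTuner implementation.
import itertools
import time


def tune(runner, inputs, configs, cache, cache_key):
    grid = [dict(zip(configs, c)) for c in itertools.product(*configs.values())] or [{}]
    best = (float("inf"), None, None)  # (time, tactic, config)
    for config in grid:
        for tactic in runner.get_valid_tactics(inputs, **config):
            start = time.perf_counter()
            runner.forward(inputs, tactic=tactic, **config)
            elapsed = time.perf_counter() - start
            if elapsed < best[0]:
                best = (elapsed, tactic, config)
    cache[cache_key] = (best[1], best[2])  # best config lives in the cache value
    return cache[cache_key]


class ToyRunner:
    def get_valid_tactics(self, inputs, **kwargs):
        return [0, 1]

    def forward(self, inputs, tactic=-1, **kwargs):
        return sum(inputs)


cache = {}
tactic, config = tune(ToyRunner(), [1, 2, 3], {"tile_size": (32, 64)}, cache, "gemm/m=3")
print(tactic, config)
```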
Force-pushed 35dbd62 to 053be2e (Compare)
/bot run --disable-fail-fast
PR_Github #15037 [ run ] triggered by Bot
PR_Github #15037 [ run ] completed with state
Adding a config entry in the tuning config to define the valid candidates for each part of the config.
Allow the user not to specify the `gen_tuning_buckets` or the `map_to_tuning_buckets` function.
Note: This PR does not include the "decorator refactoring" feature in #5236.
Summary by CodeRabbit
New Features
Bug Fixes
Tests