[TRTLLM-6771][feat] Support MMMU for multimodal models #6828
📝 Walkthrough

Adds MMMU as a public evaluator and CLI subcommand; implements multimodal evaluation support (MultimodalLmEvalWrapper, is_multimodal flag, LM_EVAL_DEFAULT_IMAGE_PLACEHOLDER); extends image utilities to accept PIL images and handle RGBA→RGB conversion; tracks per-message multimodal placeholder counts during chat parsing.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI as trtllm eval
    participant MMMU as MMMU.command
    participant Harness as command_harness
    participant Eval as LmEvalEvaluator
    participant Wrapper as MultimodalLmEvalWrapper
    participant LLM as LLM
    User->>CLI: trtllm eval mmmu [options]
    CLI->>MMMU: invoke subcommand
    MMMU->>Harness: build evaluator_kwargs (is_multimodal=True, params)
    Harness->>Eval: instantiate evaluator(**kwargs)
    Eval->>Eval: choose MultimodalLmEvalWrapper if MULTIMODAL
    Eval->>Wrapper: construct wrapper with llm, sampling_params
    Wrapper->>LLM: generate_async(prompt + media)
    LLM-->>Wrapper: text outputs
    Wrapper-->>Eval: per-sample results
    Eval-->>MMMU: evaluation metrics/results
    MMMU-->>User: print/report
```
```mermaid
sequenceDiagram
    participant Client
    participant Chat as parse_chat_messages_coroutines
    participant Utils as add_multimodal_placeholders
    Client->>Chat: messages with media
    loop for each message
        Chat->>Chat: load/parse media, compute mm_placeholder_count
        Chat->>Utils: update parsed_msg.content with per-message placeholders
        Chat->>Chat: append count to mm_placeholder_counts
    end
    Chat-->>Client: conversation, coroutines, mm_placeholder_counts
```
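For illustration, a minimal usage sketch of the first flow driven from Python rather than the CLI. It assumes MMMU exposes the same evaluate(llm) entry point as the other evaluators in tensorrt_llm.evaluate; the model path and constructor arguments are placeholders, not taken from this PR.

```python
from tensorrt_llm import LLM
from tensorrt_llm.evaluate import MMMU

# Placeholder checkpoint path; any multimodal model supported by the LLM API.
llm = LLM(model="<path-to-a-multimodal-checkpoint>")

# MMMU runs in multimodal mode; whether extra kwargs are needed here is an assumption.
evaluator = MMMU(apply_chat_template=True)

# Internally this is expected to build a MultimodalLmEvalWrapper and call
# llm.generate_async(prompt + media), as shown in the diagram above.
score = evaluator.evaluate(llm)
print(score)
```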
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~35 minutes
Actionable comments posted: 6
🧹 Nitpick comments (3)
tensorrt_llm/inputs/utils.py (1)
456-462: Handle potential empty placeholder counts gracefully.

The current implementation assumes all mm_placeholder_count entries have at least one key when creating image placeholders. If a message has no multimodal content, this could lead to unexpected behavior. Consider adding a check for empty placeholder counts:

```diff
 def handle_placeholder_exceptions(model_type: str,
                                   conversation: list[ConversationMessage],
                                   mm_placeholder_counts: list[dict[str, int]]):
     if model_type == "llava_next":
         # we need to convert the flattened content back to conversation format
         for conv, mm_placeholder_count in zip(conversation, mm_placeholder_counts):
             conv["content"] = [{"type": "text", "text": conv["content"]}, \
-                *[{"type": "image"} for _ in mm_placeholder_count]]
+                *[{"type": "image"} for _ in mm_placeholder_count if mm_placeholder_count]]
```

tensorrt_llm/evaluate/lm_eval.py (2)
161-161: Revise docstring to accurately reflect TRT-LLM's unlimited image capability

The comment about "currently unlimited in TRT-LLM" contradicts the parameter's purpose and the default value of 999 from lm_eval. Consider clarifying the actual limitation.

```diff
- max_images: Maximum number of images per prompt (currently unlimited in TRT-LLM), set to 999 from lm_eval's default value.
+ max_images: Maximum number of images per prompt (default: 999 from lm_eval)
```
321-344: Simplify nested function and improve readability

The _adjust_config function is defined inside __init__ but could be a class method or module-level function for better organization. Also, the recursive structure could be clearer. Consider extracting this as a private class method:

```diff
-        # Adopted from https://github.com/EleutherAI/lm-evaluation-harness/blob/7f04db12d2f8e7a99a0830d99eb78130e1ba2122/lm_eval/evaluator.py#L290
-        def _adjust_config(task_dict, random_seed):
-            adjusted_task_dict = {}
-            for task_name, task_obj in task_dict.items():
-                if isinstance(task_obj, dict):
-                    adjusted_task_dict = {
-                        **adjusted_task_dict,
-                        **{
-                            task_name: _adjust_config(task_obj, random_seed)
-                        },
-                    }
-                else:
-                    # NOTE: Few-shot random seed
-                    task_obj.set_config(key="num_fewshot", value=0)
-                    task_obj.set_fewshot_seed(seed=random_seed)
-                    adjusted_task_dict[task_name] = task_obj
-
-                    # NOTE: Shuffle dataset
-                    data = adjusted_task_dict[task_name].dataset
-                    for split in data.keys():
-                        data[split] = data[split].shuffle(random_seed)
-
-            return adjusted_task_dict
-
-        self.task_dict = _adjust_config(self.task_dict, random_seed)
+        self.task_dict = self._adjust_config(self.task_dict, random_seed)
```

And add as a class method:

```python
def _adjust_config(self, task_dict, random_seed):
    """Adjust task configuration for evaluation.

    Adopted from:
    https://github.com/EleutherAI/lm-evaluation-harness/blob/7f04db12d2f8e7a99a0830d99eb78130e1ba2122/lm_eval/evaluator.py#L290
    """
    adjusted_task_dict = {}
    for task_name, task_obj in task_dict.items():
        if isinstance(task_obj, dict):
            # Recursively adjust nested task dictionaries
            adjusted_task_dict[task_name] = self._adjust_config(task_obj, random_seed)
        else:
            # Configure individual task
            task_obj.set_config(key="num_fewshot", value=0)
            task_obj.set_fewshot_seed(seed=random_seed)
            adjusted_task_dict[task_name] = task_obj

            # Shuffle dataset splits
            data = task_obj.dataset
            for split in data.keys():
                data[split] = data[split].shuffle(random_seed)

    return adjusted_task_dict
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- tensorrt_llm/commands/eval.py (2 hunks)
- tensorrt_llm/evaluate/__init__.py (1 hunks)
- tensorrt_llm/evaluate/lm_eval.py (10 hunks)
- tensorrt_llm/inputs/__init__.py (2 hunks)
- tensorrt_llm/inputs/utils.py (5 hunks)
- tensorrt_llm/serve/chat_utils.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in __init__
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
- tensorrt_llm/evaluate/__init__.py
- tensorrt_llm/inputs/__init__.py
- tensorrt_llm/serve/chat_utils.py
- tensorrt_llm/commands/eval.py
- tensorrt_llm/inputs/utils.py
- tensorrt_llm/evaluate/lm_eval.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
- tensorrt_llm/evaluate/__init__.py
- tensorrt_llm/inputs/__init__.py
- tensorrt_llm/serve/chat_utils.py
- tensorrt_llm/commands/eval.py
- tensorrt_llm/inputs/utils.py
- tensorrt_llm/evaluate/lm_eval.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tensorrt_llm/commands/eval.py
🪛 Ruff (0.12.2)
tensorrt_llm/evaluate/lm_eval.py
136-137: Module level import not at top of file
(E402)
158-158: Line too long (134 > 120)
(E501)
🔇 Additional comments (12)
tensorrt_llm/serve/chat_utils.py (2)
182-196: LGTM! Per-message multimodal placeholder tracking is well-implemented.

The implementation correctly:
- Initializes the mm_placeholder_counts list to track per-message counts
- Computes placeholder counts for each message individually using the tracker
- Updates message content with placeholders when counts exist
- Accumulates all per-message counts for return
This approach enables proper multimodal content handling on a per-message basis rather than globally.
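To make the per-message bookkeeping concrete, here is a small self-contained toy of the counting step; the helper and message layout are illustrative stand-ins, not the PR's actual MultimodalDataTracker/add_multimodal_placeholders code.

```python
from typing import Any, Dict, List

def count_placeholders(message: Dict[str, Any]) -> Dict[str, int]:
    """Count media items per modality in a single chat message (toy version)."""
    counts: Dict[str, int] = {}
    for part in message.get("content", []):
        if isinstance(part, dict) and part.get("type") in ("image", "video"):
            counts[part["type"]] = counts.get(part["type"], 0) + 1
    return counts

messages: List[Dict[str, Any]] = [
    {"role": "user", "content": [{"type": "text", "text": "Describe these."},
                                 {"type": "image"}, {"type": "image"}]},
    {"role": "user", "content": [{"type": "text", "text": "Thanks!"}]},
]

# One dict per message, accumulated in order and returned alongside the
# parsed conversation and the media-loading coroutines.
mm_placeholder_counts = [count_placeholders(m) for m in messages]
print(mm_placeholder_counts)  # [{'image': 2}, {}]
```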
175-199: No further action required: callers already handle the new 3-tuple return

The only call site (in tensorrt_llm/serve/openai_server.py) destructures three values as expected:

```python
conversation, mm_coroutines, mm_placeholder_counts = parse_chat_messages_coroutines(...)
```

No other callers were found. All consumers have been updated to match the new signature.
tensorrt_llm/inputs/utils.py (5)
27-43: LGTM! Well-documented RGBA to RGB conversion utility.

The function properly validates input mode, uses the alpha channel as a mask for proper transparency handling, and includes clear documentation about the background color choice. The default white background is well-justified.
45-53: LGTM! Generic image mode converter with proper RGBA handling.

The function correctly delegates RGBA→RGB conversion to the specialized rgba_to_rgb function and falls back to the standard convert method for other conversions. Early return for same-mode cases is efficient.
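As a rough sketch of the conversion strategy these two comments describe (white background, alpha channel used as the paste mask); this mirrors the described behavior rather than reproducing the PR's exact code.

```python
from PIL import Image

def rgba_to_rgb(image: Image.Image, background=(255, 255, 255)) -> Image.Image:
    """Flatten an RGBA image onto a white background, using its alpha channel as the mask."""
    if image.mode != "RGBA":
        raise ValueError(f"Expected an RGBA image, got mode {image.mode!r}")
    canvas = Image.new("RGB", image.size, background)
    canvas.paste(image, mask=image.split()[3])  # band 3 is the alpha channel
    return canvas

def convert_image_mode(image: Image.Image, mode: str) -> Image.Image:
    """Convert to the requested mode, special-casing RGBA -> RGB."""
    if image.mode == mode:
        return image  # early return when no conversion is needed
    if image.mode == "RGBA" and mode == "RGB":
        return rgba_to_rgb(image)
    return image.convert(mode)

semi_transparent = Image.new("RGBA", (4, 4), (255, 0, 0, 128))
print(convert_image_mode(semi_transparent, "RGB").mode)  # RGB
```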
74-81: LGTM! Enhanced load_image function supports PIL Image inputs.

The addition of direct PIL Image support with early return for Image.Image instances is well-implemented. The RGB conversion ensures consistent output format.
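A tiny illustration of the early-return dispatch described here; the string/URL branch is deliberately elided and the function name is reused only for readability, so this is a sketch rather than the utility's real implementation.

```python
from PIL import Image

def load_image(image) -> Image.Image:
    """Toy version: already-decoded PIL images pass straight through as RGB."""
    if isinstance(image, Image.Image):
        return image.convert("RGB")
    # In the real utility, str inputs (paths, URLs, base64) would be decoded here.
    raise TypeError(f"Unsupported input type in this sketch: {type(image)!r}")

print(load_image(Image.new("RGBA", (2, 2))).mode)  # RGB
```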
98-106: LGTM! Consistent PIL Image support in async variant.

The async version maintains the same early-return pattern for PIL Image instances as the synchronous version, ensuring API consistency.
643-646: LGTM! Proper modality normalization for consistent handling.

The normalization of "multiple_image" to "image" ensures consistent modality handling throughout the multimodal pipeline. This prevents potential issues with placeholder generation and data tracking.

tensorrt_llm/inputs/__init__.py (1)
8-11: LGTM! Proper public API extension for new utilities.

The new utilities apply_chat_template and convert_image_mode are correctly exported and added to __all__, making them available as part of the public API at tensorrt_llm.inputs.*. This aligns with the multimodal enhancements in the PR.

Also applies to: 32-34
tensorrt_llm/commands/eval.py (2)
23-24: LGTM! MMMU import added for multimodal evaluation support.

The import correctly adds MMMU alongside existing evaluation classes, enabling multimodal evaluation capabilities through the CLI interface.
155-155: LGTM! MMMU command properly registered.

The MMMU command is correctly added to the CLI group, making it available as a subcommand for multimodal evaluation tasks.
tensorrt_llm/evaluate/__init__.py (1)
18-18: LGTM! MMMU properly exported from evaluate package.

MMMU is correctly imported from lm_eval and added to the public API via __all__, making it available for import by other modules including the CLI interface.

Also applies to: 23-23
tensorrt_llm/evaluate/lm_eval.py (1)
253-253: prompt_inputs supports string inputs as expected

The prompt_inputs function has a branch for isinstance(inputs, str) that wraps the string in a TextPrompt. Passing a string here is handled correctly, so no changes are needed.
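For reference, a self-contained toy of the dispatch pattern being described; TextPrompt and prompt_inputs here are simplified stand-ins, not the real tensorrt_llm.inputs implementations.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class TextPrompt:
    """Minimal stand-in for a structured text prompt."""
    prompt: str

def prompt_inputs(inputs: Union[str, TextPrompt]) -> TextPrompt:
    if isinstance(inputs, str):
        return TextPrompt(prompt=inputs)  # plain strings are wrapped
    return inputs                         # structured prompts pass through unchanged

print(prompt_inputs("What is shown in the image?"))
# TextPrompt(prompt='What is shown in the image?')
```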
Hi @yechank-nvidia,
Thanks for adding this new task to trtllm-eval!
How about adding a test in tests/integration/defs/accuracy/test_llm_api_pytorch.py? It should evaluate a multi-modal model using the new MMMU task.
Actionable comments posted: 0
♻️ Duplicate comments (6)
tensorrt_llm/evaluate/lm_eval.py (6)
651-658: Consider making default values more flexible

Both is_multimodal and apply_chat_template default to True, which creates redundancy since multimodal mode automatically enables chat templates (line 303). Consider keeping is_multimodal as True for MMMU but making apply_chat_template default to False to avoid confusion.

```diff
 @click.option("--apply_chat_template",
               is_flag=True,
-              default=True,
+              default=False,
-              help="Whether to apply chat template.")
+              help="Whether to apply chat template (automatically enabled for multimodal).")
```
134-139: Move imports to the top of the file

These imports should be moved to the top of the file to follow Python's import organization conventions (PEP 8). Currently they're placed after class definitions, which violates standard practice.

```diff
-from tensorrt_llm.inputs import (ConversationMessage, MultimodalDataTracker,
-                                 add_multimodal_placeholders,
-                                 convert_image_mode)
-from tensorrt_llm.inputs.utils import \
-    apply_chat_template as trtllm_apply_chat_template
```

Add these imports at the top of the file after line 39:

```python
from tensorrt_llm.inputs import (ConversationMessage, MultimodalDataTracker,
                                 add_multimodal_placeholders, convert_image_mode)
from tensorrt_llm.inputs.utils import apply_chat_template as trtllm_apply_chat_template
```
141-180: Consider improving model type extraction robustness

The _get_model_type method accesses a private attribute _hf_model_dir and has a catch-all exception handler. Consider checking for both private and public attribute variants, and catch only expected exceptions.

```diff
 def _get_model_type(self, llm: Union[LLM, PyTorchLLM]) -> str:
     """Extract model type from the model configuration."""
     try:
-        config_path = os.path.join(llm._hf_model_dir, 'config.json')
+        # Try to get the model directory path
+        if hasattr(llm, '_hf_model_dir'):
+            model_dir = llm._hf_model_dir
+        elif hasattr(llm, 'hf_model_dir'):
+            model_dir = llm.hf_model_dir
+        else:
+            logger.warning("Could not find model directory attribute")
+            return "unknown"
+
+        config_path = os.path.join(model_dir, 'config.json')
         with open(config_path, 'r') as f:
             config = json.load(f)
-        return config['model_type']
-    except (KeyError, FileNotFoundError, json.JSONDecodeError) as e:
+        return config.get('model_type', 'unknown')
+    except (FileNotFoundError, json.JSONDecodeError) as e:
         logger.warning(f"Could not determine model type: {e}")
         return "unknown"
```
200-204: Remove incomplete interleave implementation

The interleaved text and image logic is incomplete with just a split operation and ellipsis. This dead code should be removed or properly implemented.

```diff
-        if self.interleave:
-            # TODO: Implement interleaved text and image.
-            text.split(LM_EVAL_DEFAULT_IMAGE_PLACEHOLDER)
-            ...
-        else:
-            text = text.replace(LM_EVAL_DEFAULT_IMAGE_PLACEHOLDER, "")
+        # TODO: Implement interleaved text and image support
+        text = text.replace(LM_EVAL_DEFAULT_IMAGE_PLACEHOLDER, "")
```
256-259: Add error handling for missing or invalid image data

The code assumes media_data["visual"] exists and contains valid images. Add validation to handle cases where the visual data might be missing or malformed.

```diff
-        images = [
-            convert_image_mode(img, "RGB") for img in media_data["visual"]
-        ]
+        images = []
+        if media_data and "visual" in media_data:
+            try:
+                images = [
+                    convert_image_mode(img, "RGB") for img in media_data["visual"]
+                ]
+            except Exception as e:
+                logger.error(f"Failed to process visual data: {e}")
+                raise ValueError(f"Invalid visual data in request: {e}")
```
620-621: Pass is_multimodal=True explicitly for MMMU evaluator

The MMMU evaluator should explicitly pass is_multimodal=True to the parent class to ensure multimodal support is enabled.

```diff
 def __init__(self, **kwargs):
+    kwargs['is_multimodal'] = True
     super().__init__("mmmu_val", **kwargs)
```
🧹 Nitpick comments (2)
tensorrt_llm/inputs/utils.py (2)
74-81: Consider using convert_image_mode for consistency

The function currently uses image.convert('RGB') directly, while the rest of the codebase uses the new convert_image_mode helper. Consider using it here for consistency.

```diff
 if isinstance(image, Image.Image):
-    return image.convert('RGB')
+    return convert_image_mode(image, 'RGB')
```
98-106: Consider using convert_image_mode for consistency

Similar to the synchronous version, consider using convert_image_mode for consistency.

```diff
 if isinstance(image, Image.Image):
-    return image.convert('RGB')
+    return convert_image_mode(image, 'RGB')
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tensorrt_llm/evaluate/lm_eval.py (10 hunks)
- tensorrt_llm/inputs/utils.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in __init__
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
- tensorrt_llm/inputs/utils.py
- tensorrt_llm/evaluate/lm_eval.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
- tensorrt_llm/inputs/utils.py
- tensorrt_llm/evaluate/lm_eval.py
🪛 Ruff (0.12.2)
tensorrt_llm/evaluate/lm_eval.py
136-137: Module level import not at top of file
(E402)
158-158: Line too long (134 > 120)
(E501)
🔇 Additional comments (7)
tensorrt_llm/inputs/utils.py (4)
27-42: LGTM!

The rgba_to_rgb function is well-implemented with proper validation, error handling, and a clear explanation for using white as the default background color.
45-53: LGTM!

The convert_image_mode function provides a clean abstraction for image mode conversion with proper handling of the RGBA to RGB case.
456-465: LGTM!

The updated function correctly handles per-message placeholder counts for the llava_next model by zipping conversations with their corresponding placeholder counts.
643-646: LGTM!

The normalization of "multiple_image" to "image" ensures consistent handling of image data across different modality identifiers.
tensorrt_llm/evaluate/lm_eval.py (3)
301-306: LGTM!

Proper handling of multimodal mode that automatically enables chat templates with appropriate logging.
321-345: LGTM!

The _adjust_config function properly handles dataset shuffling and few-shot configuration with the provided random seed.
380-383: LGTM!

Clean implementation that selects the appropriate wrapper class based on the multimodal flag.
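Taken together, the two behaviors noted above amount to roughly the following pattern. This is a sketch only: the wrapper classes are stubbed and the surrounding evaluator code is omitted.

```python
import logging

logger = logging.getLogger(__name__)

class LmEvalWrapper: ...            # stand-in for the text-only wrapper
class MultimodalLmEvalWrapper: ...  # stand-in for the multimodal wrapper

def configure(is_multimodal: bool, apply_chat_template: bool):
    """Auto-enable the chat template for multimodal runs and pick a wrapper class."""
    if is_multimodal and not apply_chat_template:
        logger.info("Multimodal evaluation requires a chat template; enabling it.")
        apply_chat_template = True
    wrapper_cls = MultimodalLmEvalWrapper if is_multimodal else LmEvalWrapper
    return wrapper_cls, apply_chat_template

print(configure(is_multimodal=True, apply_chat_template=False))
```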
Actionable comments posted: 0
♻️ Duplicate comments (5)
tensorrt_llm/evaluate/lm_eval.py (5)
36-38: Fix import ordering: Place all imports at the top of the file

These imports violate PEP 8 by appearing after class definitions. Move them to the top of the file with the other imports.

Remove lines 36-38 and add them after line 26:

```diff
 import tensorrt_llm.profiler as profiler
+from tensorrt_llm.inputs import (ConversationMessage, MultimodalDataTracker,
+                                 add_multimodal_placeholders, convert_image_mode,
+                                 prompt_inputs)
+from tensorrt_llm.inputs.utils import apply_chat_template as trtllm_apply_chat_template
-from tensorrt_llm.inputs import prompt_inputs
```

Also remove the duplicate import on line 26.
167-176: Add robust error handling for model type extraction

The method accesses a private attribute _hf_model_dir which could break with internal changes. Additionally, the generic exception handling might mask important errors.

```diff
 def _get_model_type(self, llm: Union[LLM, PyTorchLLM]) -> str:
     """Extract model type from the model configuration."""
     try:
-        config_path = os.path.join(llm._hf_model_dir, 'config.json')
+        # Try to get the model directory path
+        if hasattr(llm, 'hf_model_dir'):
+            model_dir = llm.hf_model_dir
+        elif hasattr(llm, '_hf_model_dir'):
+            model_dir = llm._hf_model_dir
+        else:
+            logger.warning("Could not find model directory attribute")
+            return "unknown"
+
+        config_path = os.path.join(model_dir, 'config.json')
         with open(config_path, 'r') as f:
             config = json.load(f)
-        return config['model_type']
-    except (KeyError, FileNotFoundError, json.JSONDecodeError) as e:
+        return config.get('model_type', 'unknown')
+    except (FileNotFoundError, json.JSONDecodeError) as e:
         logger.warning(f"Could not determine model type: {e}")
         return "unknown"
```
196-201: Remove incomplete interleave implementation

The interleaved text and image logic contains dead code with just a split operation and ellipsis. This should be removed or properly implemented.

```diff
-        if self.interleave:
-            # TODO: Implement interleaved text and image.
-            text.split(LM_EVAL_DEFAULT_IMAGE_PLACEHOLDER)
-            ...
-        else:
-            text = text.replace(LM_EVAL_DEFAULT_IMAGE_PLACEHOLDER, "")
+        # TODO: Implement interleaved text and image support
+        text = text.replace(LM_EVAL_DEFAULT_IMAGE_PLACEHOLDER, "")
```
253-255: Add validation for visual data in multimodal requests

The code assumes media_data["visual"] exists and contains valid images without any validation.

```diff
-        images = [
-            convert_image_mode(img, "RGB") for img in media_data["visual"]
-        ]
+        images = []
+        if media_data and "visual" in media_data:
+            try:
+                images = [
+                    convert_image_mode(img, "RGB") for img in media_data["visual"]
+                ]
+            except Exception as e:
+                logger.error(f"Failed to process visual data: {e}")
+                raise ValueError(f"Invalid visual data in request: {e}")
+        else:
+            logger.warning("No visual data found in multimodal request")
```
606-607: MMMU should explicitly set multimodal flag

The MMMU evaluator should explicitly pass is_multimodal=True to ensure multimodal support is always enabled.

```diff
 def __init__(self, **kwargs):
+    kwargs['is_multimodal'] = True
     super().__init__("mmmu_val", **kwargs)
```
🧹 Nitpick comments (1)
tensorrt_llm/evaluate/lm_eval.py (1)
149-158: Fix line length issue in docstring

Line 157 exceeds the 120 character limit.

```diff
-            max_images: Maximum number of images per prompt (currently unlimited in TRT-LLM), set to 999 from lm_eval's default value.
+            max_images: Maximum number of images per prompt (currently unlimited in TRT-LLM),
+                set to 999 from lm_eval's default value.
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- tensorrt_llm/evaluate/lm_eval.py (10 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in __init__
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
tensorrt_llm/evaluate/lm_eval.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tensorrt_llm/evaluate/lm_eval.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tensorrt_llm/evaluate/lm_eval.py
🪛 Ruff (0.12.2)
tensorrt_llm/evaluate/lm_eval.py
154-154: Line too long (134 > 120)
(E501)
🔇 Additional comments (3)
tensorrt_llm/evaluate/lm_eval.py (3)
44-46: LGTM! Clear documentation of the image placeholder constant

Good addition of the LM_EVAL_DEFAULT_IMAGE_PLACEHOLDER constant with proper documentation referencing the source.
317-341: Elegant implementation of configuration adjustment

The nested function _adjust_config provides a clean recursive approach for adjusting task configurations with proper handling of nested task dictionaries.
640-643: Good implementation ensuring MMMU multimodal configuration

The command method correctly forces multimodal and chat template flags for MMMU, ensuring the evaluator always runs in the correct mode.
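As a rough sketch of what forcing those flags in the CLI command might look like; command_harness and the option set are placeholders inferred from the sequence diagram, not the PR's actual code.

```python
import click

@click.command("mmmu")
@click.option("--max_images", type=int, default=999,
              help="Maximum number of images per prompt.")
def mmmu_command(max_images):
    """Hypothetical CLI entry point that always runs MMMU in multimodal mode."""
    evaluator_kwargs = {
        "is_multimodal": True,        # always forced for MMMU
        "apply_chat_template": True,  # multimodal evaluation needs the chat template
        "max_images": max_images,
    }
    # command_harness(MMMU, **evaluator_kwargs)   # placeholder for the real helper
    click.echo(f"Would run MMMU with {evaluator_kwargs}")

if __name__ == "__main__":
    mmmu_command()
```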
Added some minor comments. But overall, LGTM.
@syuoni, added a single test to verify it is working. I will add more tests for other models in follow-up PRs.
LGTM, thanks! @yechank-nvidia
Summary by CodeRabbit
New Features
Bug Fixes