[None][fix] Fix mm_placeholder_counts extraction issue. by hyukn · Pull Request #7118 · NVIDIA/TensorRT-LLM

Conversation

@hyukn (Collaborator) commented on Aug 21, 2025

Summary by CodeRabbit

  • Bug Fixes
    • Corrected handling of multimodal placeholders in chat templates for LLaVA-Next–style models, ensuring placeholders align with each message as intended.
    • Resolves misalignment that could cause formatting issues or errors during conversation processing; no API changes required.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
@coderabbitai bot (Contributor) commented on Aug 21, 2025

📝 Walkthrough

Adjusts apply_chat_template in tensorrt_llm/inputs/utils.py to pass a list containing mm_placeholder_counts to handle_placeholder_exceptions for proper per-message placeholder handling in models within PLACEHOLDER_EXCEPTIONS, specifically affecting the llava_next path. No public APIs changed.

Changes

Placeholder handling in chat templates (tensorrt_llm/inputs/utils.py):
Updated apply_chat_template to call handle_placeholder_exceptions with [mm_placeholder_counts] (a list[dict]) instead of mm_placeholder_counts (a bare dict), matching the expected per-message processing for PLACEHOLDER_EXCEPTIONS models (e.g., llava_next).
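A minimal sketch of the call site after this change (hedged: the surrounding control flow is paraphrased from this walkthrough, not quoted from the diff):

    # Inside apply_chat_template, for models needing special placeholder handling.
    if model_type in PLACEHOLDER_EXCEPTIONS:
        # mm_placeholder_counts is a single dict, e.g. {"<image>": 2}; wrapping
        # it in a list lets handle_placeholder_exceptions zip it one-to-one
        # with the messages in `conversation` (which it rewrites in place).
        handle_placeholder_exceptions(model_type, conversation,
                                      [mm_placeholder_counts])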

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels: Community want to contribute



@hyukn (Collaborator, Author) commented on Aug 21, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #16026 [ run ] triggered by Bot

@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/inputs/utils.py (1)

1-2: Add SPDX-style NVIDIA header to Python source

  • File: tensorrt_llm/inputs/utils.py
  • Prepend the canonical two-line SPDX header (as used by other Python modules such as tensorrt_llm/commands/eval.py) at the very top, before any imports:
+ # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # SPDX-License-Identifier: Apache-2.0
  import asyncio
  import base64

This aligns with the repository’s coding guidelines and the headers already present in Python files under tensorrt_llm/commands/ and tensorrt_llm/_torch/modules/.

🧹 Nitpick comments (2)
tensorrt_llm/inputs/utils.py (2)

393-405: Add unit tests for llava_next placeholder reconstruction (0/1/N images).

Recommend tests that assert:

  • Content shape is [{"type":"text",...}, {"type":"image"}, ...] with correct image count.
  • No KeyError when there are zero images.
  • Works when placeholder is <image> or <|image|> (parametrize via registry stub/mocking).

I can draft pytest cases targeting handle_placeholder_exceptions and the default_multimodal_input_loader → apply_chat_template path; a sketch of one such case follows below. Want me to open a test PR?
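A hedged sketch of one such parametrized case (the import path and the positional handle_placeholder_exceptions(model_type, conversation, mm_placeholder_counts) signature are inferred from this review, and the zero-image case assumes the .get fallback suggested in the next comment is applied):

    import pytest

    # Assumed import path; adjust to wherever the helper actually lives.
    from tensorrt_llm.inputs.utils import handle_placeholder_exceptions


    @pytest.mark.parametrize("image_count", [0, 1, 3])
    def test_llava_next_placeholder_reconstruction(image_count):
        conversation = [{"role": "user", "content": "describe the images"}]
        # Per-message counts; the empty dict exercises the zero-image path.
        counts = [{"<image>": image_count} if image_count else {}]

        handle_placeholder_exceptions("llava_next", conversation, counts)

        content = conversation[0]["content"]
        # The first element keeps the original text; the rest are image stubs.
        assert content[0] == {"type": "text", "text": "describe the images"}
        assert content[1:] == [{"type": "image"}] * image_count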


396-404: Use the registry-provided placeholder and safe .get access

To avoid hard-coding “<image>” (and risking a KeyError if it ever changes or is missing), pull the placeholder string from the multimodal registry and fall back to 0 when absent. Also replace the backslash continuation with parentheses:

-    if model_type == "llava_next":
-        # we need to convert the flattened content back to conversation format
-        for conv, mm_placeholder_count in zip(conversation,
-                                              mm_placeholder_counts):
-            conv["content"] = [{"type": "text", "text": conv["content"]}, \
-                *[{"type": "image"} for _ in range(mm_placeholder_count['<image>'])]]
+    if model_type == "llava_next":
+        # convert the flattened content back to conversation format
+        # look up the exact image placeholder (e.g. "<image>") once, outside the loop
+        placeholder = MULTIMODAL_PLACEHOLDER_REGISTRY.get_placeholder(model_type, "image")
+        for conv, mm_placeholder_count in zip(conversation, mm_placeholder_counts):
+            # fall back to 0 so an empty counts dict never raises KeyError
+            image_count = mm_placeholder_count.get(placeholder, 0)
+            conv["content"] = (
+                [{"type": "text", "text": conv["content"]}]
+                + [{"type": "image"} for _ in range(image_count)]
+            )

This keeps the code in sync with whatever placeholder the registry defines for llava_next and never raises a KeyError when counts are empty.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 2d40e87 and f8b69b9.

📒 Files selected for processing (1)
  • tensorrt_llm/inputs/utils.py (1 hunks)
🔇 Additional comments (2)
tensorrt_llm/inputs/utils.py (2)

433-437: Correct arg shape fix for placeholder exceptions (llava_next) — good catch.

Wrapping mm_placeholder_counts in a list aligns with handle_placeholder_exceptions(..., mm_placeholder_counts: list[dict[str, int]]) and prevents zipping a dict (iterating keys) with conversation. This unblocks correct per-message handling for the llava_next path.
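To see concretely why the dict shape was wrong: zip over a dict iterates its keys, so each message was being paired with a placeholder string rather than its counts dict. A self-contained illustration (not code from the PR):

    conversation = [{"role": "user", "content": "hi"}]
    counts = {"<image>": 2}

    list(zip(conversation, counts))
    # [({'role': 'user', 'content': 'hi'}, '<image>')]      <- paired with a key string
    list(zip(conversation, [counts]))
    # [({'role': 'user', 'content': 'hi'}, {'<image>': 2})] <- paired with the counts dict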


433-437: It looks like we’re unable to locate and inspect parse_chat_messages_coroutines in tensorrt_llm/serve/chat_utils.py. Without confirming how mm_placeholder_counts is structured (dict vs list) after parsing, we can’t be certain our padding/length-guard logic will cover all scenarios or might even break existing behavior. Please verify manually that:

  • parse_chat_messages_coroutines indeed returns mm_placeholder_counts as a single dict (not a list) for PLACEHOLDER_EXCEPTIONS models.
  • Multi-message conversations are never passed to apply_chat_template when model_type is in PLACEHOLDER_EXCEPTIONS (currently only "llava_next").
  • If lists are possible, adapt the guard accordingly to avoid wrapping an already-list in another list.

Once confirmed, you can safely apply the suggested non-breaking padding guard.
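For reference, a hedged sketch of the non-breaking guard discussed above (purely illustrative; whether a list can reach this point is exactly what needs verifying first):

    # Hypothetical guard at the apply_chat_template call site: only wrap a
    # bare dict, so an already-list value is never double-wrapped.
    if isinstance(mm_placeholder_counts, dict):
        mm_placeholder_counts = [mm_placeholder_counts]
    handle_placeholder_exceptions(model_type, conversation,
                                  mm_placeholder_counts)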

@tensorrt-cicd (Collaborator)

PR_Github #16026 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #12048 completed with status: 'FAILURE'

@hyukn (Collaborator, Author) commented on Aug 22, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #16089 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #16089 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #12100 completed with status: 'SUCCESS'

@hyukn hyukn merged commit 983dd7e into NVIDIA:main Aug 22, 2025
5 checks passed
