[None][feat] Extend VLM factory and add Mistral3 factory by 2ez4bz · Pull Request #7583 · NVIDIA/TensorRT-LLM · GitHub

Conversation


@2ez4bz 2ez4bz commented Sep 6, 2025

Summary by CodeRabbit

  • New Features

    • Added Mistral3 vision-language model support with dynamic image_sizes handling.
    • Improved checkpoint loading via parameter-name remapping for HF models.
    • Configurable example image dimensions for image-to-text models.
    • Generation now also stops on the model’s end_id.
    • Config validation ensures model factories exist.
    • Exposed the Mistral3 model in the public API.
  • Compatibility

    • Added export patches enabling torch.export/GM for Mistral3 and Pixtral vision paths.
  • Tests

    • Expanded test coverage for Mistral3 models and factory validation.

Description

This commit:

  • extends existing factory interfaces to enable Mistral3 in AutoDeploy.
  • adds a Mistral3 VLM factory.
  • adds various model patches for pixtral (the vision model) and mistral3 to make the VLM compliant with torch.export.
  • adjusts checkpoint loading code to take possible parameter name conversions into account.
  • fixes a sampling bug (the end_id needs to be taken into account when sampling, but it is not included in the stop words' token IDs).

Test Coverage

  • adds tests for the added patches
  • adds tests for the new factory

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

@2ez4bz 2ez4bz requested a review from a team as a code owner September 6, 2025 04:04
@2ez4bz 2ez4bz requested review from Fridah-nv and lucaslie and removed request for Fridah-nv September 6, 2025 04:04
@2ez4bz 2ez4bz changed the title [None][auto_deploy] Extend VLM factory and add Mistral3 factory [None][feat] Extend VLM factory and add Mistral3 factory Sep 6, 2025

coderabbitai bot commented Sep 6, 2025

📝 Walkthrough

Walkthrough

Adds registry validation for model_factory, exposes registry entries, expands HF factory with checkpoint key remapping and configurable example image dims, introduces a new Mistral3 VLM factory and export patches (Mistral3 and Pixtral) with custom ops for torch.export, updates shim to include end_id as a stop token, and adds corresponding tests.

Changes

Cohort / File(s) Summary
Config validation & registry
tensorrt_llm/_torch/auto_deploy/llm_args.py, tensorrt_llm/_torch/auto_deploy/models/factory.py
Adds AutoDeployConfig after-field validator to ensure model_factory exists in ModelFactoryRegistry; exposes ModelFactoryRegistry.entries() returning list of registered names.
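The registry validation described above can be sketched roughly as follows. This is an illustrative, self-contained sketch of the pattern, not the actual TensorRT-LLM implementation; the `has`/`entries` names mirror those mentioned in the summary, but the real registry's interface may differ:

```python
class ModelFactoryRegistry:
    """Toy name -> factory registry mirroring the has()/entries() API above."""

    _registry: dict = {}

    @classmethod
    def register(cls, name):
        def decorator(factory_cls):
            cls._registry[name] = factory_cls
            return factory_cls
        return decorator

    @classmethod
    def has(cls, name) -> bool:
        return name in cls._registry

    @classmethod
    def entries(cls):
        return list(cls._registry.keys())


def validate_model_factory(value: str) -> str:
    """Illustrative body of the after-field validator: reject unknown names."""
    if not ModelFactoryRegistry.has(value):
        raise ValueError(
            f"'{value}' does not exist in the model factory registry. "
            f"Available values: {ModelFactoryRegistry.entries()}."
        )
    return value
```

In the real config this would run as a Pydantic after-field validator on `model_factory`, so a typo fails at config construction rather than deep inside model building.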
Public API exposure
tensorrt_llm/_torch/auto_deploy/models/__init__.py
Adds mistral3 submodule to public imports.
HF model factory enhancements
tensorrt_llm/_torch/auto_deploy/models/hf.py
Adds checkpoint param-name remapping via pre/post load hooks, optional mapping propagation, regex-based key conversion; introduces _example_image_dims and uses dynamic image dims for example inputs; adjusts image-text example input handling.
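The regex-based key conversion above can be sketched as a small helper; this is a hedged approximation of the pre-load hook's behavior (the actual hook signature and mapping keys in the factory may differ):

```python
import re


def remap_state_dict_keys(state_dict, conversion_mapping):
    """Rename checkpoint keys in place using regex pattern -> replacement rules.

    Illustrative sketch of the pre-load-hook behavior described above.
    """
    for key in list(state_dict.keys()):
        new_key = key
        for pattern, replacement in conversion_mapping.items():
            new_key = re.sub(pattern, replacement, new_key)
        if new_key != key:
            # Pop under the old name, re-insert under the converted name.
            state_dict[new_key] = state_dict.pop(key)
    return state_dict
```

A matching post-load hook would apply the reverse mapping so that saving the model reproduces the original HF key layout.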
New Mistral3 VLM factory
tensorrt_llm/_torch/auto_deploy/models/mistral3.py
Adds Mistral3VLM factory registered in ModelFactoryRegistry; defines extra input image_sizes derived from pixel_values batch; overrides example image dims to (64, 64); provides simple forward adapter.
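The `image_sizes` derivation above can be sketched in plain Python. The real factory returns a torch tensor whose batch dimension is tied to `pixel_values`' dynamic batch dim; this hypothetical helper just shows the shape relationship:

```python
def derive_image_sizes(pixel_values_shape):
    """Given a (batch, channels, height, width) shape, build the per-image
    image_sizes that Mistral3 expects: one (height, width) pair per image.
    Plain-Python sketch; names and return type are illustrative.
    """
    batch, _channels, height, width = pixel_values_shape
    return [(height, width) for _ in range(batch)]
```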
Export patches: Mistral3 & Pixtral
tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py, tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py
Adds export patches: conditional forward path for Mistral3Model and flat image features; PixtralVisionModel forward re-route with custom ops (pixtral_process_patch_embeds, pixtral_unfold_to_2d_grid), attention mask generation, dtype guard for RMSNorm; reversible via ExportPatchRegistry.
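The reversible-patch mechanism these files rely on can be sketched minimally. The real `BaseExportPatch`/`ExportPatchRegistry` machinery adds registration and context-manager plumbing on top of this idea; the class below is a simplified stand-in, not the actual API:

```python
class ReversibleForwardPatch:
    """Minimal sketch of a reversible export patch: swap a class's forward
    for an export-friendly one, and restore the original on revert.
    """

    def __init__(self, target_cls, patched_forward):
        self.target_cls = target_cls
        self.patched_forward = patched_forward
        self._original = None

    def apply(self):
        # Save the original so the patch is fully reversible.
        self._original = self.target_cls.forward
        self.target_cls.forward = self.patched_forward

    def revert(self):
        self.target_cls.forward = self._original
```

In this PR, apply/revert would wrap the torch.export call so Mistral3/Pixtral run their export-friendly forwards only during graph capture.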
Shim behavior
tensorrt_llm/_torch/auto_deploy/shim/demollm.py
Appends sampling_params.end_id to stop_tokens in generate_tokens_batched.
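The stop-token change can be sketched as follows, folding in the reviewer's None/duplicate guard below. Function and parameter names here are illustrative, not the shim's actual signatures:

```python
def build_stop_tokens(stop_words_token_ids, end_id):
    """Append the end-of-sequence id as its own single-token stop sequence
    so generation halts on it even when it is absent from the stop words.
    Guards against a missing end_id and against duplicating an existing entry.
    """
    stop_tokens = [list(seq) for seq in stop_words_token_ids]
    if end_id is not None and [end_id] not in stop_tokens:
        stop_tokens.append([end_id])
    return stop_tokens
```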
Tests: Mistral3 and config
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py, tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3.py, tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py, tests/unittest/_torch/auto_deploy/unit/singlegpu/shim/test_llm_config.py, tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py
Adds small model config for Mistral-Small-3.1-24B; tests extra inputs include image_sizes; tests export patches and GM parity; validates model_factory registry enforcement; adds AD build param for Mistral-Small.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant LlmArgs as AutoDeployConfig/LlmArgs
  participant Registry as ModelFactoryRegistry
  User->>LlmArgs: initialize(model_factory=...)
  LlmArgs->>Registry: entries()
  alt name in entries
    LlmArgs-->>User: initialization continues
  else name not found
    LlmArgs-->>User: raise ValidationError(ValueError msg)
  end
sequenceDiagram
  autonumber
  participant Factory as AutoModelForCausalLMFactory
  participant HF as HF Model
  participant Loader as Checkpoint Loader
  Note over Factory: Build/Load with key remapping
  Factory->>HF: install pre-load hook (remap keys)
  Factory->>HF: install post-load hook (reverse-map keys)
  Factory->>Loader: load state_dict into model
  Loader-->>HF: invokes hooks during load
  HF-->>Factory: loaded model
  Factory->>HF: remove hooks (finally)
sequenceDiagram
  autonumber
  participant Model as Mistral3Model (patched)
  participant Vision as VisionTower
  participant Export as torch.export/GM
  Note over Model: Conditional vision branch via torch.cond
  Model->>Model: has_image = image inputs present?
  alt has_image
    Model->>Vision: get_image_features(pixel_values, image_sizes)
    Vision-->>Model: image features (flat)
    Model->>Model: inject features into embeddings
  else no_image
    Model-->>Model: use text embeddings only
  end
  Model->>Export: compatible forward graph

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

Community want to contribute, AutoDeploy

Suggested reviewers

  • syuoni
  • Fridah-nv
  • galagam
  • nvchenghaoz



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 19

🧹 Nitpick comments (14)
tensorrt_llm/_torch/auto_deploy/shim/demollm.py (2)

124-129: Include end_id safely (handle None and duplicates).

Guard against None and avoid duplicating the single-token stop sequence.

-        stop_tokens.append([sampling_params.end_id])
+        # Add EOS as a stop sequence if provided and not already present.
+        if sampling_params.end_id is not None and [sampling_params.end_id] not in stop_tokens:
+            stop_tokens.append([int(sampling_params.end_id)])

1-1: Missing NVIDIA Apache-2.0 header (2025).

Per repo guidelines, prepend the license header.

+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py (2)

56-57: Typo in meta signature: max_widht.

Use max_width to mirror the op’s schema.

-    max_widht: int,
+    max_width: int,

1-7: Missing NVIDIA Apache-2.0 header (2025).

Add the license header at the top.

+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
tensorrt_llm/_torch/auto_deploy/models/factory.py (1)

1-1: Missing NVIDIA Apache-2.0 header (2025).

Per repo standard, add the header.

+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
tensorrt_llm/_torch/auto_deploy/models/__init__.py (1)

1-2: Avoid wildcard export and add license header.

  • Using from .factory import * makes the public surface implicit and brittle.
  • Per repo guidelines, prepend the NVIDIA Apache-2.0 header.

Apply header addition (example):

+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from . import hf, mistral3, patches
-from .factory import *
+# TODO: list explicit symbols instead of wildcard
+from .factory import *
tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3.py (1)

14-15: Avoid relying on dict for Dim equivalence; compare stable representations.

__dict__ is brittle (e.g., __slots__ or internal changes). Compare repr instead, or specific attributes if available.

Apply:

-    # Unfortunately, direct object comparisons do not work.
-    assert pixel_values_dynamic_shape[0].__dict__ == image_sizes_dynamic_shape[0].__dict__
+    # Direct object comparisons may not work; compare stable representations.
+    assert repr(pixel_values_dynamic_shape[0]) == repr(image_sizes_dynamic_shape[0])
tensorrt_llm/_torch/auto_deploy/llm_args.py (1)

217-221: Sort registry entries for deterministic error message.

Stable ordering aids testing and readability.

Apply:

-        if not ModelFactoryRegistry.has(value):
-            raise ValueError(
-                f"'{value}' does not exist in the model factory registry. Available values: "
-                f"{ModelFactoryRegistry.entries()}."
-            )
+        if not ModelFactoryRegistry.has(value):
+            available = ", ".join(sorted(ModelFactoryRegistry.entries()))
+            raise ValueError(
+                f"'{value}' does not exist in the model factory registry. "
+                f"Available values: [{available}]."
+            )
tensorrt_llm/_torch/auto_deploy/models/mistral3.py (1)

24-29: Guard against missing pixel_values entry

If a child overrides get_extra_inputs and renames/removes pixel_values, accessing it here will KeyError. Add a defensive check.

-        batch_dim = extra_inputs["pixel_values"][1]()[0]
+        assert "pixel_values" in extra_inputs, "pixel_values extra input is required"
+        batch_dim = extra_inputs["pixel_values"][1]()[0]
tensorrt_llm/_torch/auto_deploy/models/hf.py (3)

376-393: Regex remap: precompile patterns and guard overlapping substitutions.

Repeated re.sub per-key per-pattern can be expensive and order-dependent. Precompile once and skip if no match.

Apply:

-        conversion_mapping = self._checkpoint_conversion_mapping
+        conversion_mapping = self._checkpoint_conversion_mapping
         if conversion_mapping:
+            compiled = [(re.compile(p), r) for p, r in conversion_mapping.items()]
             keys_to_process = list(state_dict.keys())
             for key in keys_to_process:
-                new_key = key
-                for pattern, replacement in conversion_mapping.items():
-                    new_key = re.sub(pattern, replacement, new_key)
+                new_key = key
+                for pat, replacement in compiled:
+                    new_key_new, n = pat.subn(replacement, new_key)
+                    if n:
+                        new_key = new_key_new
                if new_key != key:
                    state_dict[new_key] = state_dict.pop(key)

395-432: Reverse-mapping is lossy; add sanity checks and safer reversal.

lstrip("^") and stripping (...) can mis-reverse patterns; enforce bijection and only rewrite when exactly one rule matches.

Apply:

-        reverse_key_mapping = {v: k for k, v in conversion_mapping.items()}
-        self._mapping = reverse_key_mapping
+        reverse_key_mapping = {v: k for k, v in conversion_mapping.items()}
+        # Only keep unique reversals
+        self._mapping = {k: v for k, v in reverse_key_mapping.items() if list(reverse_key_mapping.values()).count(v) == 1}

And in __call__:

-                for pattern, replacement in self._mapping.items():
-                    replacement = replacement.lstrip("^")
-                    replacement = re.sub(r"\(.*\)", "", replacement)
-                    new_key, n_replace = re.subn(pattern, replacement, key)
+                matches = []
+                for pattern, replacement in self._mapping.items():
+                    new_key_trial, n_replace = re.subn(pattern, replacement, key)
+                    if n_replace:
+                        matches.append((new_key_trial, n_replace))
+                if len(matches) == 1:
+                    new_key = matches[0][0]

563-565: Dynamic shape min(32) vs example image dims default(16).

The dynamic shape min is 32 while _example_image_dims defaults to (16, 16). Align to avoid shape-constraint contradictions in export.

Apply:

-        none_pixel_values = torch.zeros(0, 3, 336, 336)
+        none_pixel_values = torch.zeros(0, 3, 336, 336)
         return {"pixel_values": (none_pixel_values, _get_dynamic_shape)}

And in _get_dynamic_shape:

-                2: Dim("img_height", min=32, max=2048),
-                3: Dim("img_width", min=32, max=2048),
+                2: Dim("img_height", min=self._example_image_dims[0], max=2048),
+                3: Dim("img_width", min=self._example_image_dims[1], max=2048),
tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py (2)

77-87: Validity check is correct but non-obvious; consider readability.

Optional: replace XOR with equality check for clarity.

Apply:

-    if (input_ids is None) ^ (inputs_embeds is not None):
+    if (input_ids is None) == (inputs_embeds is None):
         raise ValueError("You must specify exactly one of input_ids or inputs_embeds")

103-118: Pixel dtype cast: prefer model/vision dtype over hardcoded bfloat16.

Unconditional bfloat16 can break on hardware or models not using bf16. Use the vision tower param dtype.

Apply:

-        pixel_values = pixel_values.to(torch.bfloat16)
+        target_dtype = next(self.vision_tower.parameters()).dtype if hasattr(self.vision_tower, "parameters") else pixel_values.dtype
+        pixel_values = pixel_values.to(target_dtype)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 74105a4 and 72d2793.

📒 Files selected for processing (13)
  • tensorrt_llm/_torch/auto_deploy/llm_args.py (1 hunks)
  • tensorrt_llm/_torch/auto_deploy/models/__init__.py (1 hunks)
  • tensorrt_llm/_torch/auto_deploy/models/factory.py (2 hunks)
  • tensorrt_llm/_torch/auto_deploy/models/hf.py (8 hunks)
  • tensorrt_llm/_torch/auto_deploy/models/mistral3.py (1 hunks)
  • tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py (1 hunks)
  • tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py (1 hunks)
  • tensorrt_llm/_torch/auto_deploy/shim/demollm.py (1 hunks)
  • tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1 hunks)
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3.py (1 hunks)
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py (1 hunks)
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/shim/test_llm_config.py (2 hunks)
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tensorrt_llm/_torch/auto_deploy/llm_args.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3.py
  • tensorrt_llm/_torch/auto_deploy/models/__init__.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py
  • tensorrt_llm/_torch/auto_deploy/shim/demollm.py
  • tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/shim/test_llm_config.py
  • tensorrt_llm/_torch/auto_deploy/models/factory.py
  • tensorrt_llm/_torch/auto_deploy/models/mistral3.py
  • tensorrt_llm/_torch/auto_deploy/models/hf.py
  • tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py
  • tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tensorrt_llm/_torch/auto_deploy/llm_args.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3.py
  • tensorrt_llm/_torch/auto_deploy/models/__init__.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py
  • tensorrt_llm/_torch/auto_deploy/shim/demollm.py
  • tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/shim/test_llm_config.py
  • tensorrt_llm/_torch/auto_deploy/models/factory.py
  • tensorrt_llm/_torch/auto_deploy/models/mistral3.py
  • tensorrt_llm/_torch/auto_deploy/models/hf.py
  • tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py
  • tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tensorrt_llm/_torch/auto_deploy/llm_args.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3.py
  • tensorrt_llm/_torch/auto_deploy/models/__init__.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py
  • tensorrt_llm/_torch/auto_deploy/shim/demollm.py
  • tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/shim/test_llm_config.py
  • tensorrt_llm/_torch/auto_deploy/models/factory.py
  • tensorrt_llm/_torch/auto_deploy/models/mistral3.py
  • tensorrt_llm/_torch/auto_deploy/models/hf.py
  • tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py
  • tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py
🧠 Learnings (3)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py
📚 Learning: 2025-08-18T08:42:02.640Z
Learnt from: samuellees
PR: NVIDIA/TensorRT-LLM#6974
File: tensorrt_llm/serve/scripts/benchmark_dataset.py:558-566
Timestamp: 2025-08-18T08:42:02.640Z
Learning: In TensorRT-LLM's RandomDataset (tensorrt_llm/serve/scripts/benchmark_dataset.py), when using --random-token-ids option, sequence length accuracy is prioritized over semantic correctness for benchmarking purposes. The encode/decode operations should use skip_special_tokens=True and add_special_tokens=False to ensure exact target token lengths.

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/shim/demollm.py
📚 Learning: 2025-08-06T03:47:16.802Z
Learnt from: venkywonka
PR: NVIDIA/TensorRT-LLM#6650
File: tests/integration/test_lists/qa/llm_perf_cluster.yml:33-37
Timestamp: 2025-08-06T03:47:16.802Z
Learning: Ministral is a valid model name from Mistral AI, distinct from the regular Mistral models. In TensorRT-LLM test configurations, "ministral_8b" and "ministral_8b_fp8" are correct model identifiers and should not be changed to "mistral_8b".

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/models/mistral3.py
🧬 Code graph analysis (9)
tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
tensorrt_llm/_torch/auto_deploy/models/factory.py (3)
  • ModelFactoryRegistry (266-288)
  • has (283-284)
  • entries (287-288)
tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3.py (3)
tensorrt_llm/_torch/auto_deploy/models/mistral3.py (2)
  • Mistral3VLM (12-56)
  • get_extra_inputs (13-29)
tensorrt_llm/_torch/auto_deploy/models/factory.py (2)
  • model (54-56)
  • get_extra_inputs (248-263)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
  • get_extra_inputs (545-564)
tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py (8)
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1)
  • get_small_model_config (449-487)
examples/auto_deploy/build_and_run_ad.py (1)
  • ExperimentConfig (124-236)
tensorrt_llm/_torch/auto_deploy/llm_args.py (2)
  • LlmArgs (248-362)
  • create_factory (226-237)
tensorrt_llm/_torch/auto_deploy/export/interface.py (1)
  • apply_export_patches (217-260)
tensorrt_llm/_torch/auto_deploy/export/export.py (1)
  • torch_export_to_gm (198-273)
tensorrt_llm/_torch/auto_deploy/transformations/_graph.py (1)
  • move_to_device (132-139)
tensorrt_llm/_torch/auto_deploy/models/factory.py (3)
  • model (54-56)
  • build_model (63-102)
  • get_example_inputs (236-246)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
  • get_example_inputs (493-543)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py (1)
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1)
  • get_small_model_config (449-487)
tests/unittest/_torch/auto_deploy/unit/singlegpu/shim/test_llm_config.py (1)
tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
  • LlmArgs (248-362)
tensorrt_llm/_torch/auto_deploy/models/mistral3.py (2)
tensorrt_llm/_torch/auto_deploy/models/factory.py (4)
  • ModelFactoryRegistry (266-288)
  • register (270-275)
  • get_extra_inputs (248-263)
  • model (54-56)
tensorrt_llm/_torch/auto_deploy/models/hf.py (5)
  • AutoModelForImageTextToTextFactory (435-571)
  • get_extra_inputs (545-564)
  • _simple_forward (118-125)
  • _simple_forward (474-491)
  • _example_image_dims (567-571)
tensorrt_llm/_torch/auto_deploy/models/hf.py (2)
examples/models/core/enc_dec/convert_checkpoint.py (1)
  • state_dict (1629-1630)
tensorrt_llm/_torch/auto_deploy/models/mistral3.py (1)
  • _example_image_dims (52-56)
tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py (3)
tensorrt_llm/_torch/auto_deploy/export/interface.py (2)
  • BaseExportPatch (38-130)
  • ExportPatchRegistry (166-213)
tensorrt_llm/functional.py (1)
  • masked_scatter (2539-2574)
tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py (2)
  • _apply_patch (217-225)
  • _revert_patch (227-231)
tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py (4)
tensorrt_llm/_torch/models/modeling_mistral.py (1)
  • Mistral3PatchMerger (508-557)
tensorrt_llm/_torch/models/modeling_pixtral.py (1)
  • PixtralVisionModel (170-256)
tensorrt_llm/_torch/auto_deploy/export/interface.py (2)
  • BaseExportPatch (38-130)
  • ExportPatchRegistry (166-213)
tensorrt_llm/functional.py (3)
  • sum (3253-3275)
  • repeat_interleave (6312-6336)
  • arange (1498-1569)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (19)
tensorrt_llm/_torch/auto_deploy/models/factory.py (1)

286-289: Expose registry entries API — LGTM.

Simple, typed, and useful for validation.

tensorrt_llm/_torch/auto_deploy/models/__init__.py (1)

1-1: Expose mistral3 via package init — LGTM.

This ensures factory side-effects (registration) occur on import.

tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py (1)

87-95: New Mistral-Small config parameterization — LGTM.

Good coverage extension with sane backends for demollm path.

tensorrt_llm/_torch/auto_deploy/llm_args.py (1)

214-224: Registry validation on model_factory — LGTM.

Clear error with available entries improves UX and prevents silent misconfig.
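The validation pattern described here can be sketched with a pydantic v2 field validator; the registry contents and field names below are placeholders, not the actual `llm_args.py` code:

```python
from pydantic import BaseModel, field_validator

# Stand-in for the real registry's entries() output.
_REGISTERED_FACTORIES = {"AutoModelForCausalLM", "AutoModelForImageTextToText"}


class LlmArgs(BaseModel):
    model_factory: str = "AutoModelForCausalLM"

    @field_validator("model_factory")
    @classmethod
    def _check_registered(cls, value: str) -> str:
        # Listing the available entries in the error message is what makes
        # a typo immediately diagnosable rather than a silent misconfig.
        if value not in _REGISTERED_FACTORIES:
            raise ValueError(
                f"Invalid model_factory={value!r}; "
                f"available: {sorted(_REGISTERED_FACTORIES)}"
            )
        return value
```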

tests/unittest/_torch/auto_deploy/unit/singlegpu/shim/test_llm_config.py (2)

3-3: pydantic import for ValidationError — LGTM.


151-164: Negative test for non-registered factories — LGTM.

Covers both unknown and typo cases; message match is precise.

tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1)

437-441: Confirm compile_backend value is supported in CI presets

"torch-simple" looks fine, but please confirm it’s recognized by the AD build/test harness for VLMs.

tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py (1)

1-3: Verify import paths resolve under pytest

Direct imports from _model_test_utils and from build_and_run_ad assume test path layout is on sys.path. If CI doesn’t set this up via conftest, prefer absolute package imports.

If needed:

-from _model_test_utils import get_small_model_config
-from build_and_run_ad import ExperimentConfig
+from tests.unittest._torch.auto_deploy._utils_test._model_test_utils import get_small_model_config
+from examples.auto_deploy.build_and_run_ad import ExperimentConfig
tensorrt_llm/_torch/auto_deploy/models/hf.py (8)

4-4: Import for regex remapping is appropriate.


103-107: Good: plumbs per-model checkpoint key remapping.

Clear comment and nullable mapping make this safe when unsupported by a given model.
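A hedged sketch of regex-based checkpoint key remapping with a nullable mapping, in the spirit of the change described here; the patterns and helper name are made up for illustration:

```python
import re
from typing import Any, Dict, Optional


def remap_state_dict_keys(
    state_dict: Dict[str, Any],
    mapping: Optional[Dict[str, str]],
) -> Dict[str, Any]:
    """Rename checkpoint keys via regex substitutions.

    A None/empty mapping means the model needs no conversion, so the
    state dict is returned unchanged (safe for unsupported models).
    """
    if not mapping:
        return state_dict
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in mapping.items():
            new_key = re.sub(pattern, replacement, new_key)
        remapped[new_key] = value
    return remapped
```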


177-177: Good: propagates mapping from model into factory.


348-360: Robust lifecycle around hooks and Accelerate load.

The try/finally ensures removal even on failure; using full_state_dict=False avoids interfering with state-dict synchronization. Good.


508-523: Example inputs: image dims override is clean and testable.

Using _example_image_dims and two-image batches to avoid specialization is solid.
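The "two-image batches to avoid specialization" point follows from torch.export specializing dimensions whose example value is 0 or 1 into constants; a batch of two keeps the image-count dimension dynamic. A tiny illustrative sketch (shapes are placeholders, not the factory's actual defaults):

```python
import torch


def make_example_pixel_values(height: int = 32, width: int = 32) -> torch.Tensor:
    # Two images in the example batch so torch.export does not specialize
    # the leading (image count) dimension to the constant 1.
    return torch.rand(2, 3, height, width)
```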


535-538: Dropping attention_mask may break models relying on it.

Since this is only for example inputs, it’s probably fine, but please ensure all targeted processors/models regenerate masks internally.

Run unit tests for all affected processors to confirm no regressions in export/tracing when attention_mask is absent.


566-572: Good: overridable example image dims with docstring.


337-346: Confirm PyTorch version before adding fallback
register_state_dict_post_hook exists in PyTorch 2.3+; only implement the legacy register_state_dict_hook branch if you must support older releases—otherwise drop it. Avoid poking at private attrs (_load_state_dict_pre_hooks/_state_dict_hooks); prefer the public hook-priority API.

tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py (3)

167-174: Patch application/restoration looks correct.


175-180: Revert logic is clean and symmetric.


96-120: No change needed for masked_scatter usage
Tensor.masked_scatter(mask, source) is a supported out-of-place method in PyTorch (there is no top-level torch.masked_scatter(input, mask, source)).

Likely an incorrect or invalid review comment.
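A quick check of the out-of-place method behavior underpinning this resolution: source values fill the `True` positions of the mask in row-major order, and the original tensor is left untouched.

```python
import torch

x = torch.zeros(2, 3)
mask = torch.tensor([[True, False, True], [False, True, False]])
src = torch.tensor([1.0, 2.0, 3.0])

# Out-of-place: returns a new tensor; x itself is not modified.
out = x.masked_scatter(mask, src)
# out == [[1., 0., 2.], [0., 3., 0.]]
```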

@2ez4bz
Copy link
Collaborator Author

2ez4bz commented Sep 6, 2025

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #17849 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #17849 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13362 completed with status: 'FAILURE'

Copy link
Member

@lucaslie lucaslie left a comment


Looks great. Thank you for bringing this to the finish line 🙌

This commit:

* extends existing factory interfaces to enable Mistral3 in AutoDeploy.
* adds a Mistral3 VLM factory.
* adds various model patches for pixtral (the vision model) and mistral3
  to make the VLM export compliant.
* adjusts checkpoint loading code to take possible parameter name
  conversions into account.
* fixes a sampling bug (the `end_id` needs to be taken into account when
  sampling, but it is not included in the stop words' token IDs).

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
@2ez4bz 2ez4bz force-pushed the dev-mistral3-autodeploy-factory branch from 72d2793 to 797df55 Compare September 8, 2025 04:09
@2ez4bz 2ez4bz enabled auto-merge (squash) September 8, 2025 04:10
@2ez4bz
Copy link
Collaborator Author

2ez4bz commented Sep 8, 2025

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #18069 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #18069 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13542 completed with status: 'FAILURE'

@2ez4bz
Copy link
Collaborator Author

2ez4bz commented Sep 8, 2025

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #18104 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #18104 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13568 completed with status: 'SUCCESS'
Pipeline passed with automatically retried tests. Check the rerun report for details.

@2ez4bz 2ez4bz merged commit c53d181 into NVIDIA:main Sep 9, 2025
5 checks passed
gergely-magyar pushed a commit to gergely-magyar/TensorRT-LLM that referenced this pull request Sep 9, 2025
Wong4j pushed a commit to Wong4j/TensorRT-LLM that referenced this pull request Sep 20, 2025
