[None][feat] Extend VLM factory and add Mistral3 factory #7583
Conversation
📝 Walkthrough

Adds registry validation for model_factory, exposes registry entries, expands the HF factory with checkpoint key remapping and configurable example image dims, introduces a new Mistral3 VLM factory and export patches (Mistral3 and Pixtral) with custom ops for torch.export, updates the shim to include end_id as a stop token, and adds corresponding tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor User
participant LlmArgs as AutoDeployConfig/LlmArgs
participant Registry as ModelFactoryRegistry
User->>LlmArgs: initialize(model_factory=...)
LlmArgs->>Registry: entries()
alt name in entries
LlmArgs-->>User: initialization continues
else name not found
LlmArgs-->>User: raise ValidationError(ValueError msg)
end
```

```mermaid
sequenceDiagram
autonumber
participant Factory as AutoModelForCausalLMFactory
participant HF as HF Model
participant Loader as Checkpoint Loader
Note over Factory: Build/Load with key remapping
Factory->>HF: install pre-load hook (remap keys)
Factory->>HF: install post-load hook (reverse-map keys)
Factory->>Loader: load state_dict into model
Loader-->>HF: invokes hooks during load
HF-->>Factory: loaded model
Factory->>HF: remove hooks (finally)
```
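The remapping flow in this diagram can be sketched with PyTorch's hook API; the following is an illustrative sketch only (the `KEY_MAP` rule and `Wrapper` module are hypothetical, and it assumes a recent PyTorch where `register_load_state_dict_pre_hook` is public):

```python
import re

import torch.nn as nn

# Hypothetical remap rule: old checkpoint keys -> current module keys.
KEY_MAP = {r"^lin\.": "inner."}

class Wrapper(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.inner = nn.Linear(4, 4)

def remap(module, state_dict, prefix, local_metadata, strict,
          missing_keys, unexpected_keys, error_msgs):
    # Rewrite keys in place before the load machinery consumes them.
    for key in list(state_dict.keys()):
        new_key = key
        for pattern, repl in KEY_MAP.items():
            new_key = re.sub(pattern, repl, new_key)
        if new_key != key:
            state_dict[new_key] = state_dict.pop(key)

model = Wrapper()
handle = model.register_load_state_dict_pre_hook(remap)
try:
    # A checkpoint saved under the old "lin.*" naming now loads cleanly.
    old_ckpt = {"lin.weight": model.inner.weight.detach().clone(),
                "lin.bias": model.inner.bias.detach().clone()}
    model.load_state_dict(old_ckpt)
finally:
    handle.remove()  # mirrors the "remove hooks (finally)" step above
```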
```mermaid
sequenceDiagram
autonumber
participant Model as Mistral3Model (patched)
participant Vision as VisionTower
participant Export as torch.export/GM
Note over Model: Conditional vision branch via torch.cond
Model->>Model: has_image = image inputs present?
alt has_image
Model->>Vision: get_image_features(pixel_values, image_sizes)
Vision-->>Model: image features (flat)
Model->>Model: inject features into embeddings
else no_image
Model-->>Model: use text embeddings only
end
Model->>Export: compatible forward graph
```
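As a rough illustration of the conditional vision branch above, `torch.cond` keeps both paths in the exported graph; this is a minimal sketch assuming a recent PyTorch (2.4+) with `torch.cond` and `torch.export`, not the actual patched Mistral3 forward:

```python
import torch

def with_image(embeds: torch.Tensor) -> torch.Tensor:
    # Stand-in for injecting vision features into the embeddings.
    return embeds + 1.0

def without_image(embeds: torch.Tensor) -> torch.Tensor:
    # Text-only path; must match the other branch's output shape/dtype.
    return embeds.clone()

class Toy(torch.nn.Module):
    def forward(self, embeds: torch.Tensor, has_image: torch.Tensor) -> torch.Tensor:
        # Both branches are captured; the predicate is resolved at runtime.
        return torch.cond(has_image, with_image, without_image, (embeds,))

ep = torch.export.export(Toy(), (torch.zeros(2, 8), torch.tensor(True)))
print(ep.module()(torch.zeros(2, 8), torch.tensor(False)))
```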
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 19
🧹 Nitpick comments (14)
tensorrt_llm/_torch/auto_deploy/shim/demollm.py (2)
124-129: Include end_id safely (handle None and duplicates).

Guard against None and avoid duplicating the single-token stop sequence.

```diff
-    stop_tokens.append([sampling_params.end_id])
+    # Add EOS as a stop sequence if provided and not already present.
+    if sampling_params.end_id is not None and [sampling_params.end_id] not in stop_tokens:
+        stop_tokens.append([int(sampling_params.end_id)])
```
1-1: Missing NVIDIA Apache-2.0 header (2025).

Per repo guidelines, prepend the license header.

```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+# http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
```

tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py (2)
56-57: Typo in meta signature: `max_widht`.

Use `max_width` to mirror the op's schema.

```diff
-    max_widht: int,
+    max_width: int,
```
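For context, the custom-op pattern this patch relies on pairs a real kernel with a fake (meta) kernel whose signature must mirror the op schema exactly, which is why the `max_widht`/`max_width` mismatch matters. A minimal sketch, assuming PyTorch 2.4+ `torch.library` (the op name and body are illustrative, not the actual Pixtral op):

```python
import torch

@torch.library.custom_op("autodeploy_demo::pad_image", mutates_args=())
def pad_image(img: torch.Tensor, max_height: int, max_width: int) -> torch.Tensor:
    # Real kernel: pad the last two dims up to (max_height, max_width).
    return torch.nn.functional.pad(
        img, (0, max_width - img.shape[-1], 0, max_height - img.shape[-2])
    )

@pad_image.register_fake
def _(img: torch.Tensor, max_height: int, max_width: int) -> torch.Tensor:
    # Fake kernel: only computes output metadata for tracing/export.
    return img.new_empty(*img.shape[:-2], max_height, max_width)

out = pad_image(torch.zeros(3, 16, 16), 32, 32)
assert out.shape == (3, 32, 32)
```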
1-7: Missing NVIDIA Apache-2.0 header (2025).

Add the license header at the top.

```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
```

tensorrt_llm/_torch/auto_deploy/models/factory.py (1)
1-1: Missing NVIDIA Apache-2.0 header (2025).

Per repo standard, add the header.

```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+# http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
```

tensorrt_llm/_torch/auto_deploy/models/__init__.py (1)
1-2: Avoid wildcard export and add license header.

- Using `from .factory import *` makes the public surface implicit and brittle.
- Per repo guidelines, prepend the NVIDIA Apache-2.0 header.

Apply header addition (example):

```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from . import hf, mistral3, patches
-from .factory import *
+from .factory import *  # TODO: list explicit symbols instead of wildcard
```

tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3.py (1)
14-15: Avoid relying on `__dict__` for Dim equivalence; compare stable representations.

`__dict__` is brittle (e.g., `__slots__` or internal changes). Compare `repr` instead, or specific attributes if available.

Apply:

```diff
-    # Unfortunately, direct object comparisons do not work.
-    assert pixel_values_dynamic_shape[0].__dict__ == image_sizes_dynamic_shape[0].__dict__
+    # Direct object comparisons may not work; compare stable representations.
+    assert repr(pixel_values_dynamic_shape[0]) == repr(image_sizes_dynamic_shape[0])
```

tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
217-221: Sort registry entries for deterministic error message.

Stable ordering aids testing and readability.

Apply:

```diff
-        if not ModelFactoryRegistry.has(value):
-            raise ValueError(
-                f"'{value}' does not exist in the model factory registry. Available values: "
-                f"{ModelFactoryRegistry.entries()}."
-            )
+        if not ModelFactoryRegistry.has(value):
+            available = ", ".join(sorted(ModelFactoryRegistry.entries()))
+            raise ValueError(
+                f"'{value}' does not exist in the model factory registry. "
+                f"Available values: [{available}]."
+            )
```

tensorrt_llm/_torch/auto_deploy/models/mistral3.py (1)
24-29: Guard against missing pixel_values entry.

If a child overrides get_extra_inputs and renames/removes pixel_values, accessing it here will raise a KeyError. Add a defensive check.

```diff
-        batch_dim = extra_inputs["pixel_values"][1]()[0]
+        assert "pixel_values" in extra_inputs, "pixel_values extra input is required"
+        batch_dim = extra_inputs["pixel_values"][1]()[0]
```

tensorrt_llm/_torch/auto_deploy/models/hf.py (3)
376-393: Regex remap: precompile patterns and guard overlapping substitutions.

Repeated `re.sub` per-key per-pattern can be expensive and order-dependent. Precompile once and skip if no match.

Apply:

```diff
-        conversion_mapping = self._checkpoint_conversion_mapping
+        conversion_mapping = self._checkpoint_conversion_mapping
         if conversion_mapping:
+            compiled = [(re.compile(p), r) for p, r in conversion_mapping.items()]
             keys_to_process = list(state_dict.keys())
             for key in keys_to_process:
-                new_key = key
-                for pattern, replacement in conversion_mapping.items():
-                    new_key = re.sub(pattern, replacement, new_key)
+                new_key = key
+                for pat, replacement in compiled:
+                    new_key_new, n = pat.subn(replacement, new_key)
+                    if n:
+                        new_key = new_key_new
                 if new_key != key:
                     state_dict[new_key] = state_dict.pop(key)
```
395-432: Reverse-mapping is lossy; add sanity checks and safer reversal.

`lstrip("^")` and stripping `(...)` can mis-reverse patterns; enforce bijection and only rewrite when exactly one rule matches.

Apply:

```diff
-        reverse_key_mapping = {v: k for k, v in conversion_mapping.items()}
-        self._mapping = reverse_key_mapping
+        reverse_key_mapping = {v: k for k, v in conversion_mapping.items()}
+        # Only keep unique reversals
+        self._mapping = {
+            k: v
+            for k, v in reverse_key_mapping.items()
+            if list(reverse_key_mapping.values()).count(v) == 1
+        }
```

And in `__call__`:

```diff
-        for pattern, replacement in self._mapping.items():
-            replacement = replacement.lstrip("^")
-            replacement = re.sub(r"\(.*\)", "", replacement)
-            new_key, n_replace = re.subn(pattern, replacement, key)
+        matches = []
+        for pattern, replacement in self._mapping.items():
+            new_key_trial, n_replace = re.subn(pattern, replacement, key)
+            if n_replace:
+                matches.append((new_key_trial, n_replace))
+        if len(matches) == 1:
+            new_key = matches[0][0]
```
563-565: Dynamic shape min (32) vs example image dims default (16).

The dynamic shape min is 32 while `_example_image_dims` defaults to (16, 16). Align to avoid shape-constraint contradictions in export.

Apply:

```diff
-        none_pixel_values = torch.zeros(0, 3, 336, 336)
+        none_pixel_values = torch.zeros(0, 3, 336, 336)
         return {"pixel_values": (none_pixel_values, _get_dynamic_shape)}
```

And in `_get_dynamic_shape`:

```diff
-            2: Dim("img_height", min=32, max=2048),
-            3: Dim("img_width", min=32, max=2048),
+            2: Dim("img_height", min=self._example_image_dims[0], max=2048),
+            3: Dim("img_width", min=self._example_image_dims[1], max=2048),
```
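To make the interplay concrete, here is a minimal, self-contained sketch of `torch.export` dynamic shapes with `Dim` bounds, assuming a recent PyTorch (the model and bounds are illustrative): the example input must satisfy every `Dim`'s min/max, which is exactly why a (16, 16) example clashes with a min of 32:

```python
import torch
from torch.export import Dim, export

class Net(torch.nn.Module):
    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        return pixel_values.mean(dim=(2, 3))

example = torch.zeros(2, 3, 64, 64)  # must lie within the Dim bounds below
dynamic_shapes = {
    "pixel_values": {
        0: Dim("batch", min=1, max=8),
        2: Dim("img_height", min=16, max=2048),
        3: Dim("img_width", min=16, max=2048),
    }
}
ep = export(Net(), (example,), dynamic_shapes=dynamic_shapes)
```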
tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py (2)

77-87: Validity check is correct but non-obvious; consider readability.

Optional: replace XOR with equality check for clarity.

Apply:

```diff
-        if (input_ids is None) ^ (inputs_embeds is not None):
+        if (input_ids is None) == (inputs_embeds is None):
             raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
```
103-118: Pixel dtype cast: prefer model/vision dtype over hardcoded bfloat16.

Unconditional `bfloat16` can break on hardware or models not using bf16. Use the vision tower param dtype.

Apply:

```diff
-        pixel_values = pixel_values.to(torch.bfloat16)
+        target_dtype = (
+            next(self.vision_tower.parameters()).dtype
+            if hasattr(self.vision_tower, "parameters")
+            else pixel_values.dtype
+        )
+        pixel_values = pixel_values.to(target_dtype)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (13)
- tensorrt_llm/_torch/auto_deploy/llm_args.py (1 hunks)
- tensorrt_llm/_torch/auto_deploy/models/__init__.py (1 hunks)
- tensorrt_llm/_torch/auto_deploy/models/factory.py (2 hunks)
- tensorrt_llm/_torch/auto_deploy/models/hf.py (8 hunks)
- tensorrt_llm/_torch/auto_deploy/models/mistral3.py (1 hunks)
- tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py (1 hunks)
- tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py (1 hunks)
- tensorrt_llm/_torch/auto_deploy/shim/demollm.py (1 hunks)
- tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1 hunks)
- tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3.py (1 hunks)
- tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py (1 hunks)
- tests/unittest/_torch/auto_deploy/unit/singlegpu/shim/test_llm_config.py (2 hunks)
- tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
- tensorrt_llm/_torch/auto_deploy/llm_args.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3.py
- tensorrt_llm/_torch/auto_deploy/models/__init__.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py
- tensorrt_llm/_torch/auto_deploy/shim/demollm.py
- tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/shim/test_llm_config.py
- tensorrt_llm/_torch/auto_deploy/models/factory.py
- tensorrt_llm/_torch/auto_deploy/models/mistral3.py
- tensorrt_llm/_torch/auto_deploy/models/hf.py
- tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py
- tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
- tensorrt_llm/_torch/auto_deploy/llm_args.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3.py
- tensorrt_llm/_torch/auto_deploy/models/__init__.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py
- tensorrt_llm/_torch/auto_deploy/shim/demollm.py
- tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/shim/test_llm_config.py
- tensorrt_llm/_torch/auto_deploy/models/factory.py
- tensorrt_llm/_torch/auto_deploy/models/mistral3.py
- tensorrt_llm/_torch/auto_deploy/models/hf.py
- tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py
- tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
- tensorrt_llm/_torch/auto_deploy/llm_args.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3.py
- tensorrt_llm/_torch/auto_deploy/models/__init__.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py
- tensorrt_llm/_torch/auto_deploy/shim/demollm.py
- tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/shim/test_llm_config.py
- tensorrt_llm/_torch/auto_deploy/models/factory.py
- tensorrt_llm/_torch/auto_deploy/models/mistral3.py
- tensorrt_llm/_torch/auto_deploy/models/hf.py
- tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py
- tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py
🧠 Learnings (3)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py
📚 Learning: 2025-08-18T08:42:02.640Z
Learnt from: samuellees
PR: NVIDIA/TensorRT-LLM#6974
File: tensorrt_llm/serve/scripts/benchmark_dataset.py:558-566
Timestamp: 2025-08-18T08:42:02.640Z
Learning: In TensorRT-LLM's RandomDataset (tensorrt_llm/serve/scripts/benchmark_dataset.py), when using --random-token-ids option, sequence length accuracy is prioritized over semantic correctness for benchmarking purposes. The encode/decode operations should use skip_special_tokens=True and add_special_tokens=False to ensure exact target token lengths.
Applied to files:
tensorrt_llm/_torch/auto_deploy/shim/demollm.py
📚 Learning: 2025-08-06T03:47:16.802Z
Learnt from: venkywonka
PR: NVIDIA/TensorRT-LLM#6650
File: tests/integration/test_lists/qa/llm_perf_cluster.yml:33-37
Timestamp: 2025-08-06T03:47:16.802Z
Learning: Ministral is a valid model name from Mistral AI, distinct from the regular Mistral models. In TensorRT-LLM test configurations, "ministral_8b" and "ministral_8b_fp8" are correct model identifiers and should not be changed to "mistral_8b".
Applied to files:
tensorrt_llm/_torch/auto_deploy/models/mistral3.py
🧬 Code graph analysis (9)
tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
tensorrt_llm/_torch/auto_deploy/models/factory.py (3)
- ModelFactoryRegistry (266-288)
- has (283-284)
- entries (287-288)
tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3.py (3)
tensorrt_llm/_torch/auto_deploy/models/mistral3.py (2)
- Mistral3VLM (12-56)
- get_extra_inputs (13-29)

tensorrt_llm/_torch/auto_deploy/models/factory.py (2)

- model (54-56)
- get_extra_inputs (248-263)

tensorrt_llm/_torch/auto_deploy/models/hf.py (1)

- get_extra_inputs (545-564)
tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py (8)
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1)
- get_small_model_config (449-487)

examples/auto_deploy/build_and_run_ad.py (1)

- ExperimentConfig (124-236)

tensorrt_llm/_torch/auto_deploy/llm_args.py (2)

- LlmArgs (248-362)
- create_factory (226-237)

tensorrt_llm/_torch/auto_deploy/export/interface.py (1)

- apply_export_patches (217-260)

tensorrt_llm/_torch/auto_deploy/export/export.py (1)

- torch_export_to_gm (198-273)

tensorrt_llm/_torch/auto_deploy/transformations/_graph.py (1)

- move_to_device (132-139)

tensorrt_llm/_torch/auto_deploy/models/factory.py (3)

- model (54-56)
- build_model (63-102)
- get_example_inputs (236-246)

tensorrt_llm/_torch/auto_deploy/models/hf.py (1)

- get_example_inputs (493-543)
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py (1)
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1)
get_small_model_config(449-487)
tests/unittest/_torch/auto_deploy/unit/singlegpu/shim/test_llm_config.py (1)
tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
LlmArgs(248-362)
tensorrt_llm/_torch/auto_deploy/models/mistral3.py (2)
tensorrt_llm/_torch/auto_deploy/models/factory.py (4)
- ModelFactoryRegistry (266-288)
- register (270-275)
- get_extra_inputs (248-263)
- model (54-56)

tensorrt_llm/_torch/auto_deploy/models/hf.py (5)

- AutoModelForImageTextToTextFactory (435-571)
- get_extra_inputs (545-564)
- _simple_forward (118-125)
- _simple_forward (474-491)
- _example_image_dims (567-571)
tensorrt_llm/_torch/auto_deploy/models/hf.py (2)
examples/models/core/enc_dec/convert_checkpoint.py (1)
- state_dict (1629-1630)

tensorrt_llm/_torch/auto_deploy/models/mistral3.py (1)

- _example_image_dims (52-56)
tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py (3)
tensorrt_llm/_torch/auto_deploy/export/interface.py (2)
- BaseExportPatch (38-130)
- ExportPatchRegistry (166-213)

tensorrt_llm/functional.py (1)

- masked_scatter (2539-2574)

tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py (2)

- _apply_patch (217-225)
- _revert_patch (227-231)
tensorrt_llm/_torch/auto_deploy/models/patches/pixtral.py (4)
tensorrt_llm/_torch/models/modeling_mistral.py (1)
- Mistral3PatchMerger (508-557)

tensorrt_llm/_torch/models/modeling_pixtral.py (1)

- PixtralVisionModel (170-256)

tensorrt_llm/_torch/auto_deploy/export/interface.py (2)

- BaseExportPatch (38-130)
- ExportPatchRegistry (166-213)

tensorrt_llm/functional.py (3)

- sum (3253-3275)
- repeat_interleave (6312-6336)
- arange (1498-1569)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (19)
tensorrt_llm/_torch/auto_deploy/models/factory.py (1)
286-289: Expose registry entries API — LGTM.

Simple, typed, and useful for validation.
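For context, the general shape of such a registry is roughly the following; an illustrative sketch, not the actual ModelFactoryRegistry implementation:

```python
from typing import Callable, Dict, List, Type

class FactoryRegistry:
    """Name-to-class registry exposing has()/entries() lookups."""

    _registry: Dict[str, Type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type], Type]:
        def decorator(klass: Type) -> Type:
            cls._registry[name] = klass
            return klass
        return decorator

    @classmethod
    def has(cls, name: str) -> bool:
        return name in cls._registry

    @classmethod
    def entries(cls) -> List[str]:
        return list(cls._registry.keys())

@FactoryRegistry.register("AutoModelForCausalLM")
class DummyFactory:
    pass

assert FactoryRegistry.has("AutoModelForCausalLM")
assert "AutoModelForCausalLM" in FactoryRegistry.entries()
```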
tensorrt_llm/_torch/auto_deploy/models/__init__.py (1)
1-1: Expose mistral3 via package init — LGTM.

This ensures factory side-effects (registration) occur on import.
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py (1)
87-95: New Mistral-Small config parameterization — LGTM.

Good coverage extension with sane backends for demollm path.
tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
214-224: Registry validation on model_factory — LGTM.

Clear error with available entries improves UX and prevents silent misconfiguration.
tests/unittest/_torch/auto_deploy/unit/singlegpu/shim/test_llm_config.py (2)
3-3: pydantic import for ValidationError — LGTM.
151-164: Negative test for non-registered factories — LGTM.

Covers both unknown and typo cases; message match is precise.
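The pattern being exercised looks roughly like this; a minimal sketch assuming pydantic v2 semantics, with a hard-coded stand-in for the real registry:

```python
import pydantic
from pydantic import BaseModel, field_validator

# Stand-in for ModelFactoryRegistry.entries(); illustrative values only.
KNOWN_FACTORIES = {"AutoModelForCausalLM", "AutoModelForImageTextToText"}

class Args(BaseModel):
    # Allow a field named "model_factory" despite pydantic's "model_" namespace.
    model_config = pydantic.ConfigDict(protected_namespaces=())

    model_factory: str

    @field_validator("model_factory")
    @classmethod
    def _check_factory(cls, value: str) -> str:
        if value not in KNOWN_FACTORIES:
            raise ValueError(
                f"'{value}' does not exist in the model factory registry. "
                f"Available values: {sorted(KNOWN_FACTORIES)}."
            )
        return value

Args(model_factory="AutoModelForCausalLM")  # accepted
try:
    Args(model_factory="AutoModelForCausalLM_typo")
except pydantic.ValidationError as err:
    print(err)  # the ValueError message surfaces inside ValidationError
```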
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py (1)
437-441: Confirm compile_backend value is supported in CI presets.

`"torch-simple"` looks fine, but please confirm it's recognized by the AD build/test harness for VLMs.

tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_mistral3_patches.py (1)
1-3: Verify import paths resolve under pytest.

Direct imports `from _model_test_utils` and `from build_and_run_ad` assume the test path layout is on sys.path. If CI doesn't set this up via conftest, prefer absolute package imports.

If needed:

```diff
-from _model_test_utils import get_small_model_config
-from build_and_run_ad import ExperimentConfig
+from tests.unittest._torch.auto_deploy._utils_test._model_test_utils import get_small_model_config
+from examples.auto_deploy.build_and_run_ad import ExperimentConfig
```

tensorrt_llm/_torch/auto_deploy/models/hf.py (8)
4-4: Import for regex remapping is appropriate.
103-107: Good: plumbs per-model checkpoint key remapping.

Clear comment and nullable mapping make this safe when unsupported by a given model.
177-177: Good: propagates mapping from model into factory.
348-360: Robust lifecycle around hooks and Accelerate load.

The try/finally ensures removal even on failure; using `full_state_dict=False` avoids interfering sync. Good.
508-523: Example inputs: image dims override is clean and testable.

Using `_example_image_dims` and two-image batches to avoid specialization is solid.
535-538: Dropping attention_mask may break models relying on it.

Since this is only for example inputs, it's probably fine, but please ensure all targeted processors/models regenerate masks internally.
Run unit tests for all affected processors to confirm no regressions in export/tracing when attention_mask is absent.
566-572: Good: overridable example image dims with docstring.
337-346: Confirm PyTorch version before adding fallback.

register_state_dict_post_hook exists in PyTorch 2.3+; only implement the legacy register_state_dict_hook branch if you must support older releases; otherwise drop it. Avoid poking at private attrs (_load_state_dict_pre_hooks/_state_dict_hooks); prefer the public hook-priority API.

tensorrt_llm/_torch/auto_deploy/models/patches/mistral3.py (3)
167-174: Patch application/restoration looks correct.
175-180: Revert logic is clean and symmetric.
96-120: No change needed for masked_scatter usage.

`Tensor.masked_scatter(mask, source)` is a supported out-of-place method in PyTorch (there is no top-level `torch.masked_scatter(input, mask, source)`).

Likely an incorrect or invalid review comment.
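For reference, the out-of-place method fills True positions of the mask with values taken from source in row-major order:

```python
import torch

x = torch.zeros(2, 3)
mask = torch.tensor([[True, False, True], [False, True, False]])
src = torch.tensor([1.0, 2.0, 3.0])

y = x.masked_scatter(mask, src)  # out-of-place; x itself is unchanged
print(y)  # tensor([[1., 0., 2.], [0., 3., 0.]])
```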
/bot run

PR_Github #17849 [ run ] triggered by Bot

PR_Github #17849 [ run ] completed with state
Looks great. Thank you for bringing this to the finish line 🙌
This commit:

* extends existing factory interfaces to enable Mistral3 in AutoDeploy.
* adds a Mistral3 VLM factory.
* adds various model patches for pixtral (the vision model) and mistral3 to make the VLM export compliant.
* adjusts checkpoint loading code to take possible parameter name conversions into account.
* fixes a sampling bug (the `end_id` needs to be taken into account when sampling, but it is not included in the stop words' token IDs).

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
72d2793 to 797df55
/bot run

PR_Github #18069 [ run ] triggered by Bot

PR_Github #18069 [ run ] completed with state

/bot run

PR_Github #18104 [ run ] triggered by Bot

PR_Github #18104 [ run ] completed with state
Description
This commit:

* extends existing factory interfaces to enable Mistral3 in AutoDeploy.
* adds a Mistral3 VLM factory.
* adds various model patches for pixtral (the vision model) and mistral3 to make the VLM export compliant.
* adjusts checkpoint loading code to take possible parameter name conversions into account.
* fixes a sampling bug (the `end_id` needs to be taken into account when sampling, but it is not included in the stop words' token IDs).

Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`

Provide a user friendly way for developers to interact with a Jenkins server.
Run `/bot [-h|--help]` to print this help message.

See details below for each supported subcommand.
run

`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]`

Launch build/test pipelines. All previously running jobs will be killed.
- `--reuse-test (optional)pipeline-id` (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL) : Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md` and the `scripts/test_to_stage_mapping.py` helper.

kill

`kill`

Kill all running builds associated with pull request.
skip
`skip --comment COMMENT`

Skip testing for latest commit on pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

`reuse-pipeline`

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.