NVIDIA-Nemotron-Nano-12B-v2 support by lucaslie · Pull Request #147 · nv-auto-deploy/TensorRT-LLM · GitHub

Conversation

@lucaslie
Collaborator

@lucaslie lucaslie commented Sep 27, 2025

see title

config.yaml

# model: ibm-ai-platform/Bamba-9B-v2
model: nvidia/NVIDIA-Nemotron-Nano-12B-v2
args:
  world_size: 1
  runtime: trtllm
  compile_backend: torch-opt
  attn_backend: flashinfer
  model_factory: AutoModelForCausalLM
  skip_loading_weights: false
  model_kwargs:
    # num_hidden_layers: 10 # min number of layers without errors
    torch_dtype: bfloat16

• works with torch-opt + trtllm
• NO manual patch required --> just use the stock checkpoint (see the loading sketch below)
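
For reference, a minimal sketch of what the model_factory: AutoModelForCausalLM setting plus model_kwargs imply when loading the stock checkpoint with Hugging Face transformers; this is illustrative usage, not code from this PR.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/NVIDIA-Nemotron-Nano-12B-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # mirrors model_kwargs.torch_dtype in the config above
    # trust_remote_code=True,    # assumption: may be needed on transformers versions without native Nemotron-H support
)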

2ez4bz and others added 4 commits September 26, 2025 17:00
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
@Copilot Copilot AI review requested due to automatic review settings September 27, 2025 00:24
@lucaslie lucaslie self-assigned this Sep 27, 2025

Copilot AI left a comment


Pull Request Overview

This PR adds support for the NVIDIA-Nemotron-Nano-12B-v2 model to the TensorRT-LLM auto deploy system.

  • Introduces model-specific patches for the Nemotron-H architecture, including an RMS norm implementation and forward-pass modifications (see the sketch after this list)
  • Adds configuration and test support for the new model in testing infrastructure
  • Improves attention pattern matching to handle additional grouped attention patterns
  • Fixes scale validation logic in attention custom ops to properly handle None values
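
For illustration, the snippet below is a generic PyTorch RMS norm in its usual form; it sketches the kind of module such a patch swaps in, not the literal implementation from nemotron_h.py.

import torch
from torch import nn

class RMSNorm(nn.Module):
    """Generic RMS norm: scale the input by the reciprocal RMS of its last dimension."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compute the variance in fp32 for numerical stability, then cast back.
        variance = x.float().pow(2).mean(-1, keepdim=True)
        x_normed = x.float() * torch.rsqrt(variance + self.eps)
        return (self.weight * x_normed).to(x.dtype)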

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Per-file summary:
• tensorrt_llm/_torch/auto_deploy/models/patches/nemotron_h.py: new patch file implementing the Nemotron-H specific model modifications and the RMS norm replacement
• tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py: adds a test configuration for the NVIDIA-Nemotron-Nano-12B-v2 model
• tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py: integrates the new model into the build tests, with a skip for transformers mode
• tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_hybrid_patches.py: parameterizes the hybrid patch tests to include the Nemotron model
• tensorrt_llm/_torch/auto_deploy/transform/library/attention.py: adds a new grouped attention pattern match for additional attention configurations
• tensorrt_llm/_torch/auto_deploy/custom_ops/*.py: fixes scale validation to properly handle None values in attention operations (see the sketch below)
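
A minimal sketch of the scale-validation fix described in the last row; the function name and signature here are hypothetical and only illustrate the None-handling pattern.

import math
from typing import Optional

def resolve_scale(scale: Optional[float], head_dim: int) -> float:
    # An explicit None check avoids treating a legitimate 0.0 (or other falsy
    # value) as "no scale provided"; only None falls back to 1/sqrt(head_dim).
    if scale is None:
        return 1.0 / math.sqrt(head_dim)
    return float(scale)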


@lucaslie lucaslie merged commit 7bbc0ed into feat/ad_linear_attention Sep 27, 2025
3 of 7 checks passed
lucaslie added a commit that referenced this pull request Sep 29, 2025
* [None][feat] Add patches for NemotronH

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

* [None][test] unittest for nemotron_h

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

* nemotron-h support finished

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

* added anticipated path for new models on the llm_models trt-llm CI

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

---------

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
nvchenghaoz pushed a commit that referenced this pull request Oct 1, 2025
nvchenghaoz pushed a commit that referenced this pull request Oct 3, 2025

3 participants