NVIDIA-Nemotron-Nano-12B-v2 support by lucaslie · Pull Request #147 · nv-auto-deploy/TensorRT-LLM · GitHub

Conversation

@lucaslie
Collaborator

@lucaslie lucaslie commented Sep 27, 2025

see title

config.yaml

# model: ibm-ai-platform/Bamba-9B-v2
model: nvidia/NVIDIA-Nemotron-Nano-12B-v2
args:
  world_size: 1
  runtime: trtllm
  compile_backend: torch-opt
  attn_backend: flashinfer
  model_factory: AutoModelForCausalLM
  skip_loading_weights: false
  model_kwargs:
    # num_hidden_layers: 10 # min number of layers without errors
    torch_dtype: bfloat16

• works with torch-opt + trtllm
• NO manual patch required --> just use the stock checkpoint (see the loading sketch below)
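
For reference, a minimal sketch of what the model_factory: AutoModelForCausalLM setting plus model_kwargs imply when loading the stock checkpoint with Hugging Face transformers; this is illustrative usage, not code from this PR.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/NVIDIA-Nemotron-Nano-12B-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # mirrors model_kwargs.torch_dtype in the config above
    # trust_remote_code=True,    # assumption: may be needed on transformers versions without native Nemotron-H support
)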

2ez4bz and others added 4 commits September 26, 2025 17:00
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
@Copilot Copilot AI review requested due to automatic review settings September 27, 2025 00:24
@lucaslie lucaslie self-assigned this Sep 27, 2025

Copilot AI left a comment


Pull Request Overview

This PR adds support for the NVIDIA-Nemotron-Nano-12B-v2 model to the TensorRT-LLM auto deploy system.

  • Introduces model-specific patches for the Nemotron-H architecture, including an RMS norm implementation and forward-pass modifications (see the sketch after this list)
  • Adds configuration and test support for the new model in testing infrastructure
  • Improves attention pattern matching to handle additional grouped attention patterns
  • Fixes scale validation logic in attention custom ops to properly handle None values
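
For illustration, the snippet below is a generic PyTorch RMS norm in its usual form; it sketches the kind of module such a patch swaps in, not the literal implementation from nemotron_h.py.

import torch
from torch import nn

class RMSNorm(nn.Module):
    """Generic RMS norm: scale the input by the reciprocal RMS of its last dimension."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compute the variance in fp32 for numerical stability, then cast back.
        variance = x.float().pow(2).mean(-1, keepdim=True)
        x_normed = x.float() * torch.rsqrt(variance + self.eps)
        return (self.weight * x_normed).to(x.dtype)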

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Per-file summary:
• tensorrt_llm/_torch/auto_deploy/models/patches/nemotron_h.py: new patch file implementing the Nemotron-H specific model modifications and the RMS norm replacement
• tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py: adds a test configuration for the NVIDIA-Nemotron-Nano-12B-v2 model
• tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py: integrates the new model into the build tests, with a skip for transformers mode
• tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_hybrid_patches.py: parameterizes the hybrid patch tests to include the Nemotron model
• tensorrt_llm/_torch/auto_deploy/transform/library/attention.py: adds a new grouped attention pattern match for additional attention configurations
• tensorrt_llm/_torch/auto_deploy/custom_ops/*.py: fixes scale validation to properly handle None values in attention operations (see the sketch below)
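
A minimal sketch of the scale-validation fix described in the last row; the function name and signature here are hypothetical and only illustrate the None-handling pattern.

import math
from typing import Optional

def resolve_scale(scale: Optional[float], head_dim: int) -> float:
    # An explicit None check avoids treating a legitimate 0.0 (or other falsy
    # value) as "no scale provided"; only None falls back to 1/sqrt(head_dim).
    if scale is None:
        return 1.0 / math.sqrt(head_dim)
    return float(scale)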


@lucaslie lucaslie merged commit 7bbc0ed into feat/ad_linear_attention Sep 27, 2025
3 of 7 checks passed
lucaslie added a commit that referenced this pull request Sep 29, 2025
* [None][feat] Add patches for NemotronH

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

* [None][test] unittest for nemotron_h

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

* nemotron-h support finished

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

* added anticipated path for new models on the llm_models trt-llm CI

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

---------

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
nvchenghaoz pushed a commit that referenced this pull request Oct 1, 2025
nvchenghaoz pushed a commit that referenced this pull request Oct 3, 2025

3 participants