NVIDIA-Nemotron-Nano-12B-v2 support #147
Pull Request Overview
This PR adds support for the NVIDIA-Nemotron-Nano-12B-v2 model to the TensorRT-LLM auto deploy system.
- Introduces model-specific patches for Nemotron-H architecture including RMS norm implementations and forward pass modifications
- Adds configuration and test support for the new model in testing infrastructure
- Improves attention pattern matching to handle additional grouped attention patterns
- Fixes scale validation logic in attention custom ops to properly handle None values
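The last bullet can be illustrated with a small sketch. The PR text does not show the actual validation code, so everything below is an assumption: `resolve_attention_scale` is a hypothetical helper, and the fallback of `1/sqrt(head_dim)` is the conventional default for scaled dot-product attention rather than something this PR states. It only shows how a check can accept `None` instead of rejecting it as a non-float.

```python
import math
from typing import Optional


def resolve_attention_scale(scale: Optional[float], head_dim: int) -> float:
    """Hypothetical helper: validate an optional softmax scale.

    A check such as `isinstance(scale, float)` alone would reject None,
    even though None is a legitimate "use the default" value.
    """
    if head_dim <= 0:
        raise ValueError(f"head_dim must be positive, got {head_dim}")
    if scale is None:
        # Assumed default: the usual 1/sqrt(head_dim) attention scaling.
        return 1.0 / math.sqrt(head_dim)
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(scale, bool) or not isinstance(scale, (int, float)):
        raise TypeError(f"scale must be a float or None, got {type(scale).__name__}")
    return float(scale)


if __name__ == "__main__":
    print(resolve_attention_scale(None, 64))
    print(resolve_attention_scale(0.5, 64))
```

Handling `None` before the type check keeps the "no explicit scale" case separate from a genuinely invalid argument.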
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
tensorrt_llm/_torch/auto_deploy/models/patches/nemotron_h.py | New patch file implementing Nemotron-H specific model modifications and RMS norm replacement |
tests/unittest/_torch/auto_deploy/_utils_test/_model_test_utils.py | Adds test configuration for NVIDIA-Nemotron-Nano-12B-v2 model |
tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_build_small_single.py | Integrates new model into build tests with transformers mode skip |
tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_hybrid_patches.py | Parameterizes hybrid patch tests to include Nemotron model |
tensorrt_llm/_torch/auto_deploy/transform/library/attention.py | Adds new grouped attention pattern matching for additional attention configurations |
tensorrt_llm/_torch/auto_deploy/custom_ops/*.py | Fixes scale validation to properly handle None values in attention operations |
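For readers unfamiliar with the "RMS norm replacement" mentioned for `nemotron_h.py`, here is a minimal reference sketch of plain RMS normalization in dependency-free Python. It is not the code from the patch file: the function name `rms_norm`, the `eps` default, and the list-based signature are assumptions made for illustration, and any gating or grouping that the model's own norm layers may apply is not covered.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Reference RMS norm over one vector: weight * x / sqrt(mean(x^2) + eps)."""
    if len(x) == 0:
        raise ValueError("x must be non-empty")
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # Mean of squares over the hidden dimension (no mean subtraction, unlike LayerNorm).
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Elementwise learned scale applied after normalization.
    return [w * v * inv_rms for v, w in zip(x, weight)]


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
```

With unit weights and a negligible `eps`, the output has a root mean square of approximately 1, which is a convenient sanity check when comparing a replacement implementation against the original module.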
* [None][feat] Add patches for NemotronH
  Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
* [None][test] unittest for nemotron_h
  Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
* nemotron-h support finished
  Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* added anticipated path for new models on llm_models trt-llm CI
  Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

---------

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
see title
config.yaml
torch-opt
+trtllm