[TRTLLM-6541][test] Add NIM perf test cases #7924
Conversation
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
/bot run

/bot run
📝 Walkthrough

Adds new GPU-aware perf test lists, with entries grouped by system GPU count.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant CI as CI Runner
    participant Env as Test Env (GPU Nodes)
    participant Selector as Test Selector (YAML)
    participant PyTest as pytest
    CI->>Env: Detect system_gpu_count
    Env-->>Selector: system_gpu_count (1/2/4/8+)
    Selector->>PyTest: Expand tests from matching YAML block
    PyTest->>PyTest: Run perf/test_perf.py::test_perf with params
    PyTest-->>CI: Report results (incl. timeouts where set)
```
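For orientation, here is a hedged sketch of what one of these GPU-count-gated blocks might look like. The key names (condition, ranges, system_gpu_count, tests) are assumed from the references in this review and from similar TensorRT-LLM test lists, not copied from this PR's diff; the test entry is the gpus:8 variant suggested in the review comments below.

```yaml
# Hypothetical excerpt in the shape of llm_perf_cluster_nim.yml; the layout is an assumption.
llm_perf_cluster_nim:
- condition:
    ranges:
      system_gpu_count:
        gte: 8                      # block expanded only on nodes exposing 8+ GPUs
  tests:
  # Entry segments encode model, mode, backend, dtype, and run parameters;
  # TIMEOUT(120) sets the per-test time limit applied by the harness.
  - perf/test_perf.py::test_perf[deepseek_r1_0528_fp4-bench-pytorch-float4-kv_frac:0.85-input_output_len:1000,2000-reqs:3000-ep:8-tp:8-gpus:8] TIMEOUT(120)
```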
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
🧹 Nitpick comments (3)
tests/integration/test_lists/qa/llm_perf_cluster_nim.yml (2)
59-59: Verify model alias: starcoder_15b vs starcoder_15.5b. Elsewhere we use starcoder_15.5b. Please confirm the correct identifier to avoid mismatches.
123-123: gpus:4 under 8+ block — confirm intent. This is the only entry in the 8+ section allocating 4 GPUs. If unintentional, switch to gpus:8; if intentional (resource shaping), ignore.

Potential fix:

```diff
- - perf/test_perf.py::test_perf[deepseek_r1_0528_fp4-bench-pytorch-float4-kv_frac:0.85-input_output_len:1000,2000-reqs:3000-ep:8-tp:8-gpus:4] TIMEOUT(120)
+ - perf/test_perf.py::test_perf[deepseek_r1_0528_fp4-bench-pytorch-float4-kv_frac:0.85-input_output_len:1000,2000-reqs:3000-ep:8-tp:8-gpus:8] TIMEOUT(120)
```

tests/integration/test_lists/qa/llm_perf_nim.yml (1)
200-201: Don’t gate a non‑FP8 test inside an FP8‑only block. This PyTorch bf16 Mistral test will be skipped on devices without supports_fp8, reducing coverage. Move it to the 2‑GPU (non‑FP8) section or add FP8 quant if that was intended.

Options:
- Move to the “2 gpus test” block (gte:2) and add gpus:2 or tp:2 if needed.
- Or, if it’s meant as FP8, change to bench-pytorch-float8 and add -quant:fp8.

Example move (if keeping bf16):

```diff
- # torch backend
- - perf/test_perf.py::test_perf[mistral_7b_v0.1-bench-pytorch-float16-input_output_len:128,128]
+ # torch backend (bf16) — not FP8, relocate under “2 gpus test”
```

And add under the 2‑GPU block:

```diff
+ # torch backend
+ - perf/test_perf.py::test_perf[mistral_7b_v0.1-bench-pytorch-float16-input_output_len:128,128-gpus:2]
```
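For reference, a hedged sketch of how the FP8-gated 2-GPU section might be expressed, assuming the same condition/ranges/terms layout used elsewhere in these test lists. Only gte:2 and supports_fp8 come from the comment above; the key names, the placement of -quant:fp8, and the rest of the entry are illustrative.

```yaml
# Hypothetical FP8-only block; key names and exact layout are assumptions.
- condition:
    ranges:
      system_gpu_count:
        gte: 2                   # the "2 gpus test" block referenced above
    terms:
      supports_fp8: true         # gate: entries here are skipped on GPUs without FP8 support
  tests:
  # FP8 variant of the Mistral perf case (illustrative; segment order of -quant:fp8 assumed)
  - perf/test_perf.py::test_perf[mistral_7b_v0.1-bench-pytorch-float8-quant:fp8-input_output_len:128,128-gpus:2]
```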
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tests/integration/test_lists/qa/llm_perf_cluster_nim.yml (1 hunks)
- tests/integration/test_lists/qa/llm_perf_nim.yml (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: fredricz-20070104
PR: NVIDIA/TensorRT-LLM#7645
File: tests/integration/test_lists/qa/llm_function_core.txt:648-648
Timestamp: 2025-09-09T09:40:45.658Z
Learning: In TensorRT-LLM test lists, it's common and intentional for the same test to appear in multiple test list files when they serve different purposes (e.g., llm_function_core.txt for comprehensive core functionality testing and llm_function_core_sanity.txt for quick sanity checks). This duplication allows tests to be run in different testing contexts.
📚 Learning: 2025-08-26T09:49:04.956Z
Learnt from: pengbowang-nv
PR: NVIDIA/TensorRT-LLM#7192
File: tests/integration/test_lists/test-db/l0_dgx_b200.yml:56-72
Timestamp: 2025-08-26T09:49:04.956Z
Learning: In TensorRT-LLM test configuration files, the test scheduling system handles wildcard matching with special rules that prevent duplicate test execution even when the same tests appear in multiple yaml files with overlapping GPU wildcards (e.g., "*b200*" and "*gb200*").
Applied to files:
- tests/integration/test_lists/qa/llm_perf_cluster_nim.yml
- tests/integration/test_lists/qa/llm_perf_nim.yml
📚 Learning: 2025-09-09T09:40:45.658Z
Learnt from: fredricz-20070104
PR: NVIDIA/TensorRT-LLM#7645
File: tests/integration/test_lists/qa/llm_function_core.txt:648-648
Timestamp: 2025-09-09T09:40:45.658Z
Learning: In TensorRT-LLM test lists, it's common and intentional for the same test to appear in multiple test list files when they serve different purposes (e.g., llm_function_core.txt for comprehensive core functionality testing and llm_function_core_sanity.txt for quick sanity checks). This duplication allows tests to be run in different testing contexts.
Applied to files:
- tests/integration/test_lists/qa/llm_perf_cluster_nim.yml
- tests/integration/test_lists/qa/llm_perf_nim.yml
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tests/integration/test_lists/qa/llm_perf_nim.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
PR_Github #19680 [ run ] triggered by Bot

PR_Github #19681 [ run ] triggered by Bot

PR_Github #19680 [ run ] completed with state

/bot run

PR_Github #19682 [ run ] triggered by Bot

PR_Github #19681 [ run ] completed with state

PR_Github #19682 [ run ] completed with state
Summary by CodeRabbit
Add NIM perf test cases, as suggested by Joyjit Daw.
Add two perf test case files: one for single node and one for cluster.