[TRTLLM-6991][chore] add DeepSeek-R1 FP8 accuracy tests on Blackwell #6710

lfr-0531 · 2025-08-07T11:58:09Z

Description

Add DeepSeek-R1 FP8 accuracy tests on Blackwell.

Test Coverage

accuracy/test_llm_api_pytorch.py::TestDeepSeekR1::test_fp8_blockscale[throughput]

Summary by CodeRabbit

Tests
- Added accuracy and throughput tests for DeepSeek‑R1 with FP8_BLOCK_SCALES + FP8 KV-cache; test lists and timeout entries updated.
- Test configuration (MoE backend and KV-cache memory fraction) now depends on the GPU SM version.
Documentation
- Added accuracy reference entries reporting results for the new quantization configurations.
Chores
- Expanded CI multi‑node PyTorch test matrix with an additional GB200 multi‑node stage.

coderabbitai · 2025-08-07T11:58:17Z

📝 Walkthrough

The test now initializes the MoE and KV-cache configuration in `TestDeepSeekR1.test_fp8_blockscale` based on the SM version. The accuracy YAMLs gain FP8 `kv_cache` entries for DeepSeek-R1, the test is added to the QA list and to the GB200 multi-node timeout list, and the number of Jenkins multi-node splits increases by one.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Conditional Test Config Update: `tests/integration/defs/accuracy/test_llm_api_pytorch.py` | Made `moe_config` and `kv_cache_config` conditional on `get_sm_version()` inside `TestDeepSeekR1.test_fp8_blockscale`, and added `moe_config` into `pytorch_config`. |
| Accuracy Reference Additions: `tests/integration/defs/accuracy/references/gsm8k.yaml`, `tests/integration/defs/accuracy/references/mmlu.yaml` | Added `kv_cache_quant_algo: FP8` entries for `deepseek-ai/DeepSeek-R1` under `FP8_BLOCK_SCALES` (retained existing entries); added a new Qwen3 quant entry in MMLU. |
| Test List Registration: `tests/integration/test_lists/qa/llm_function_full.txt` | Inserted `accuracy/test_llm_api_pytorch.py::TestDeepSeekR1::test_fp8_blockscale[throughput]` into the QA LLM function list. |
| Test Timeout Update: `tests/integration/test_lists/test-db/l0_gb200_multi_nodes.yml` | Added the new test to the 180s timeout list for GB200 8-GPU multi-node PyTorch post-merge tests. |
| Jenkins Multi-node Stage: `jenkins/L0_Test.groovy` | Added a new `GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-7` entry and incremented split_count to 7 for GB200 multi-node PyTorch post-merge splits. |
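For readers unfamiliar with the accuracy reference files, the following is a rough sketch of how an entry might be matched on both `quant_algo` and `kv_cache_quant_algo`. It is an illustration only: the `REFERENCES` dict, the `find_reference` helper, and the accuracy numbers are hypothetical placeholders, not the contents of `gsm8k.yaml` or `mmlu.yaml` and not the harness's actual lookup code.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory mirror of a reference file. The accuracy values are
# placeholders for illustration, NOT the numbers added by this PR.
REFERENCES: Dict[str, List[dict]] = {
    "deepseek-ai/DeepSeek-R1": [
        # Pre-existing style of entry: no KV-cache quantization recorded.
        {"quant_algo": "FP8_BLOCK_SCALES", "accuracy": 1.0},
        # New style of entry: same quant_algo plus an FP8 KV cache.
        {"quant_algo": "FP8_BLOCK_SCALES", "kv_cache_quant_algo": "FP8", "accuracy": 2.0},
    ],
}


def find_reference(model: str, quant_algo: str,
                   kv_cache_quant_algo: Optional[str] = None) -> float:
    """Return the accuracy of the entry whose two quantization keys both match."""
    for entry in REFERENCES.get(model, []):
        same_quant = entry.get("quant_algo") == quant_algo
        # A missing kv_cache_quant_algo key is treated as None.
        same_kv = entry.get("kv_cache_quant_algo") == kv_cache_quant_algo
        if same_quant and same_kv:
            return entry["accuracy"]
    raise KeyError(f"no reference for {model} / {quant_algo} / {kv_cache_quant_algo}")
```

Under this assumed matching rule, retaining the existing entry matters: a run without an FP8 KV cache still finds its own reference, while a run with one picks up the newly added entry.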

Sequence Diagram(s)

sequenceDiagram
    participant Runner
    participant TestDeepSeekR1
    participant SMChecker

    Runner->>TestDeepSeekR1: run test_fp8_blockscale
    TestDeepSeekR1->>SMChecker: get_sm_version()
    SMChecker-->>TestDeepSeekR1: sm_version
    alt sm_version == 100
        TestDeepSeekR1->>TestDeepSeekR1: set moe_config(backend="DEEPGEMM", max_num_tokens=16384)\nset kv_cache_config(free_gpu_memory_fraction=0.6)
    else
        TestDeepSeekR1->>TestDeepSeekR1: set moe_config(default)\nset kv_cache_config(free_gpu_memory_fraction=0.9)
    end
    TestDeepSeekR1->>Runner: proceed with selected configs
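The branching in the diagram can be sketched in Python roughly as follows. This is a minimal illustration, not the test's actual code: `MoeSettings`, `KvCacheSettings`, and `select_configs` are hypothetical stand-ins for the real `MoeConfig`, `KvCacheConfig`, and the inline logic in `test_fp8_blockscale`, and the SM version is passed in as an argument rather than read from `get_sm_version()`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BLACKWELL_SM = 100  # SM version the diagram branches on


@dataclass
class MoeSettings:
    """Hypothetical stand-in for the MoE configuration object."""
    backend: Optional[str] = None          # None means "use the library default"
    max_num_tokens: Optional[int] = None   # None means "use the library default"


@dataclass
class KvCacheSettings:
    """Hypothetical stand-in for the KV-cache configuration object."""
    free_gpu_memory_fraction: float = 0.9


def select_configs(sm_version: int) -> Tuple[MoeSettings, KvCacheSettings]:
    """Pick MoE and KV-cache settings the way the diagram describes."""
    if sm_version == BLACKWELL_SM:
        # SM 100 branch: DEEPGEMM backend and a smaller KV-cache memory share.
        return (MoeSettings(backend="DEEPGEMM", max_num_tokens=16384),
                KvCacheSettings(free_gpu_memory_fraction=0.6))
    # Every other SM version: default MoE settings, larger KV-cache share.
    return MoeSettings(), KvCacheSettings(free_gpu_memory_fraction=0.9)
```

Calling `select_configs(100)` yields the DEEPGEMM settings, while any other value falls through to the defaults.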

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Possibly related PRs

fix: Fix DeepSeek R1 CI #6129 — modifies TestDeepSeekR1 MOE and kv_cache-related configuration handling.
test: add accuracy reference #6479 — updates accuracy references and FP8 KV-cache accuracy/test entries.

Suggested labels

Community want to contribute

Suggested reviewers

litaotju
yizhang-nv
brb-nv
yuxianq
syuoni

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b9781e and 16f1394.

📒 Files selected for processing (2)

tests/integration/defs/accuracy/test_llm_api_pytorch.py (1 hunks)
tests/integration/test_lists/test-db/l0_dgx_b200.yml (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

tests/integration/defs/accuracy/test_llm_api_pytorch.py

**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

tests/integration/defs/accuracy/test_llm_api_pytorch.py

🧠 Learnings (3)

📓 Common learnings

Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...

Applied to files:

tests/integration/test_lists/test-db/l0_dgx_b200.yml
tests/integration/defs/accuracy/test_llm_api_pytorch.py

📚 Learning: in tensorrt-llm, test files (files under tests/ directories) do not require nvidia copyright headers...

Applied to files:

tests/integration/test_lists/test-db/l0_dgx_b200.yml

🧬 Code Graph Analysis (1)

tests/integration/defs/accuracy/test_llm_api_pytorch.py (3)

tensorrt_llm/_utils.py (1)

get_sm_version (681-683)

tests/integration/defs/conftest.py (1)

get_sm_version (1857-1860)

tensorrt_llm/llmapi/llm_args.py (1)

MoeConfig (166-188)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (1)

tests/integration/defs/accuracy/test_llm_api_pytorch.py (1)

1634-1642: LGTM: Appropriate hardware-specific MoE backend selection.

The conditional logic correctly selects the DEEPGEMM backend for Blackwell B200 GPUs (SM version 100) and falls back to the default MoeConfig on other hardware. This aligns with the PR objective of adding DeepSeek-R1 FP8 accuracy tests for the Blackwell platform.

lfr-0531 · 2025-08-07T12:35:29Z

/bot run --post-merge

tensorrt-cicd · 2025-08-07T12:40:38Z

PR_Github #14470 [ run ] triggered by Bot

lfr-0531 · 2025-08-07T13:15:19Z

/bot kill

tensorrt-cicd · 2025-08-07T13:20:29Z

PR_Github #14475 [ kill ] triggered by Bot

tensorrt-cicd · 2025-08-07T13:20:30Z

PR_Github #14470 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-08-07T13:21:01Z

PR_Github #14475 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit da19e6e

lfr-0531 · 2025-08-07T15:34:34Z

/bot run --post-merge

tensorrt-cicd · 2025-08-07T15:40:30Z

PR_Github #14490 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-07T22:03:14Z

PR_Github #14490 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #10945 completed with status: 'FAILURE'

lfr-0531 · 2025-08-08T03:08:44Z

/bot run --post-merge --disable-fail-fast

tensorrt-cicd · 2025-08-08T03:13:54Z

PR_Github #14545 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-08T09:41:31Z

PR_Github #14545 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10989 completed with status: 'FAILURE'

lfr-0531 · 2025-08-10T02:55:41Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-08-10T03:01:17Z

PR_Github #14688 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-10T03:24:16Z

PR_Github #14688 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11086 completed with status: 'SUCCESS'

lfr-0531 · 2025-08-10T13:56:16Z

/bot run --disable-fail-fast --add-multi-gpu-test

tensorrt-cicd · 2025-08-10T14:01:47Z

PR_Github #14709 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-10T16:47:31Z

PR_Github #14709 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11101 completed with status: 'FAILURE'

lfr-0531 · 2025-08-11T03:40:54Z

/bot run --disable-fail-fast --post-merge --only-multi-gpu-test

tensorrt-cicd · 2025-08-11T03:45:53Z

PR_Github #14740 [ run ] triggered by Bot

lfr-0531 · 2025-08-11T07:33:12Z

/bot kill

tensorrt-cicd · 2025-08-11T07:38:14Z

PR_Github #14770 [ kill ] triggered by Bot

tensorrt-cicd · 2025-08-15T02:35:53Z

PR_Github #15375 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-15T07:32:00Z

PR_Github #15375 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #11591 completed with status: 'FAILURE'

lfr-0531 · 2025-08-15T15:11:39Z

/bot run

tensorrt-cicd · 2025-08-15T15:17:12Z

PR_Github #15453 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-15T16:44:19Z

PR_Github #15453 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11643 completed with status: 'FAILURE'

lfr-0531 · 2025-08-16T02:02:51Z

/bot run

tensorrt-cicd · 2025-08-16T02:08:33Z

PR_Github #15501 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-16T02:50:55Z

PR_Github #15501 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11670 completed with status: 'FAILURE'

lfr-0531 · 2025-08-16T07:39:45Z

/bot run

tensorrt-cicd · 2025-08-16T07:45:12Z

PR_Github #15507 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-16T15:06:08Z

PR_Github #15507 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11675 completed with status: 'FAILURE'

lfr-0531 · 2025-08-17T10:07:45Z

/bot run

tensorrt-cicd · 2025-08-17T10:13:50Z

PR_Github #15533 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-17T16:26:43Z

PR_Github #15533 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11698 completed with status: 'FAILURE'

lfr-0531 · 2025-08-18T02:51:59Z

/bot run

tensorrt-cicd · 2025-08-18T02:57:41Z

PR_Github #15572 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-18T06:24:36Z

PR_Github #15572 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11727 completed with status: 'FAILURE'

lfr-0531 · 2025-08-18T06:37:22Z

/bot run

tensorrt-cicd · 2025-08-18T06:42:51Z

PR_Github #15603 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-18T12:17:12Z

PR_Github #15603 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11746 completed with status: 'FAILURE'

lfr-0531 · 2025-08-18T15:39:20Z

/bot run

tensorrt-cicd · 2025-08-18T15:44:49Z

PR_Github #15630 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-19T04:03:01Z

PR_Github #15630 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11767 completed with status: 'SUCCESS'

lfr-0531 requested review from Barry-Delaney and litaotju August 7, 2025 11:58

coderabbitai bot reviewed Aug 7, 2025

tests/integration/test_lists/test-db/l0_dgx_b200.yml (outdated, resolved)

litaotju reviewed Aug 7, 2025

tests/integration/test_lists/test-db/l0_dgx_b200.yml (outdated, resolved)

lfr-0531 force-pushed the user/fanrongl/add_r1_fp8_blackwell_acc_test branch from cda210a to 3563e05 Compare August 7, 2025 15:33

litaotju changed the title ~~[None][chore] add DeepSeek-R1 FP8 accuracy tests on Blackwell~~ [TRTLLM-6991][chore] add DeepSeek-R1 FP8 accuracy tests on Blackwell Aug 7, 2025

litaotju approved these changes Aug 7, 2025

Merge branch 'main' into user/fanrongl/add_r1_fp8_blackwell_acc_test

97cf84a

Merge branch 'main' into user/fanrongl/add_r1_fp8_blackwell_acc_test

12c0373

lfr-0531 merged commit 816a120 into NVIDIA:main Aug 19, 2025
4 checks passed

This was referenced Aug 20, 2025

[None][infra] Prepare for single GPU GB200 test pipeline #7073

Merged

[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss #7192

Merged

coderabbitai bot mentioned this pull request Aug 29, 2025

[TRTLLM-7279][test] add accuracy test for deepseek-r1 with chunked_prefill #7365

Merged

1 task

lfr-0531 deleted the user/fanrongl/add_r1_fp8_blackwell_acc_test branch September 22, 2025 07:14
