[None][docs] refine docs for accuracy evaluation of gpt-oss models #7252
📝 Walkthrough
Adds documentation to the GPT‑OSS on TRT‑LLM quick-start guide explaining required TRT‑LLM server flags and gpt‑oss evaluation options, provides a mapping table of reasoning‑effort to parallel config and limits, and includes an example eval command. Changes are documentation-only and inserted in two locations on the page.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 1
🧹 Nitpick comments (2)
docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md (2)
237-237: Capitalize product names and tighten the instruction sentence
Use consistent branding and flag notation. Also add the missing serial comma.
-You need to set `enable_attention_dp`, `tp_size`, `ep_size`, `max_batch_size` and `max_num_tokens` when launching the trtllm server and set `reasoning-effort` when launching evaluation in gpt-oss. Below are some reference configurations for accuracy evaluation on B200.
+Set `enable_attention_dp`, `tp_size`, `ep_size`, `max_batch_size`, and `max_num_tokens` when launching the TRT-LLM server, and set `--reasoning-effort` when launching evaluations in GPT-OSS. Below are reference configurations for accuracy evaluation on B200.
239-245: Fix table grammar/formatting and clarify what DP/TP/EP mean
- Grammar/formatting: header-casing, spacing around operators, thousands separators, and clearer alternatives (“or …”) improve readability and address the grammar hints.
- Add a short note to disambiguate DP vs TP/EP and to remind readers to align server flags with the table values.
-| **reasoning-effort** | **parallel configuration** | **max_batch_size** | **max_num_tokens** |
-|:--------------------:|:--------------------------:|:------------------:|:------------------:|
-| low/medium | DP+TP8+EP8 / DP+TP4+EP4 | 128 | 32768 |
-| high | DP+TP8+EP8 / DP+TP4+EP4 | 2 | 133120 |
-| low/medium | TP8 / TP4 | 1024 | 32768 |
-| high | TP8 / TP4 | 16 | 133120 |
+| Reasoning effort | Parallel config | Max batch size | Max num tokens |
+|:----------------:|:---------------------------------|---------------:|---------------:|
+| low, medium | DP + TP8 + EP8 (or DP + TP4 + EP4) | 128 | 33,120 |
+| high | DP + TP8 + EP8 (or DP + TP4 + EP4) | 2 | 133,120 |
+| low, medium | TP8 (or TP4) | 1,024 | 32,768 |
+| high | TP8 (or TP4) | 16 | 133,120 |
+
+Note: DP here refers to attention data parallel. Enable it via `enable_attention_dp: true` in the YAML and launch across multiple GPUs; TP maps to `--tp_size`, EP maps to `--ep_size`. Ensure your server `--max_num_tokens` is set to at least the value shown in the table for the chosen configuration.

Note: If 33,120 in the first row was intended to be 32,768 (32K), please correct it accordingly.
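To make the table easier to apply, here is a minimal Python sketch that looks up the server settings for a given reasoning effort. It uses the values from the PR's original table (32768 for the low/medium rows). The function name `server_settings`, the `num_gpus` parameter, and the choice of `ep_size = 1` for the TP-only rows are illustrative assumptions, not something the PR or TRT-LLM defines.

```python
from typing import Dict, Tuple, Union

# (reasoning effort, attention DP enabled) -> (max_batch_size, max_num_tokens),
# copied from the B200 reference table in the PR.
_LIMITS: Dict[Tuple[str, bool], Tuple[int, int]] = {
    ("low", True): (128, 32768),
    ("medium", True): (128, 32768),
    ("high", True): (2, 133120),
    ("low", False): (1024, 32768),
    ("medium", False): (1024, 32768),
    ("high", False): (16, 133120),
}


def server_settings(
    effort: str, attention_dp: bool, num_gpus: int = 8
) -> Dict[str, Union[bool, int]]:
    """Return the reference server settings for one row of the table."""
    if num_gpus not in (4, 8):
        raise ValueError("the table only covers 4- and 8-GPU configurations")
    if (effort, attention_dp) not in _LIMITS:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    max_batch_size, max_num_tokens = _LIMITS[(effort, attention_dp)]
    return {
        "enable_attention_dp": attention_dp,
        "tp_size": num_gpus,
        # Assumption: the TP-only rows imply no expert parallelism.
        "ep_size": num_gpus if attention_dp else 1,
        "max_batch_size": max_batch_size,
        "max_num_tokens": max_num_tokens,
    }
```

For example, `server_settings("high", attention_dp=True, num_gpus=8)` yields the settings for the "high / DP+TP8+EP8" row; how those values are passed to the server (YAML versus command-line flags) is left to the guide itself.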
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
🪛 LanguageTool
docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
[grammar] ~239-~239: There might be a mistake here.
Context: ...max_batch_size* | max_num_tokens | |:--------------------:|:---------------...
(QB_NEW_EN)
[grammar] ~240-~240: There might be a mistake here.
Context: ...-----------------:|:------------------:| | low/medium | DP+TP8+EP8 / DP...
(QB_NEW_EN)
[grammar] ~241-~241: There might be a mistake here.
Context: ...28 | 32768 | | high | DP+TP8+EP8 / DP...
(QB_NEW_EN)
[grammar] ~242-~242: There might be a mistake here.
Context: ... | 133120 | | low/medium | TP8 / TP4 ...
(QB_NEW_EN)
[grammar] ~243-~243: There might be a mistake here.
Context: ...024 | 32768 | | high | TP8 / TP4 ...
(QB_NEW_EN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
@binghanc , thanks for the contribution~
Actionable comments posted: 1
♻️ Duplicate comments (1)
docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md (1)
246-255: Point evals to your local server; fix dataset/model naming and hyphenation
- Prepend env vars so `gpt_oss.evals` hits your local TRT‑LLM endpoint (otherwise it may call OpenAI by default).
- In prose: “reasoning effort” and “AIME 2025”.
- Use the served model id consistently: `openai/gpt-oss-120b` (server and Basic Test use this).
- Keep the dataset flag `--eval gpqa,aime25` but align the prose to “AIME 2025”.

-Below is an example command for evaluating the accuracy of gpt-oss-120b with low and medium reasoning-effort on GPQA and AIME2025.
+Below is an example command for evaluating the accuracy of gpt-oss-120b with low and medium reasoning effort on GPQA and AIME 2025.
 ```shell
-# execute this command in gpt-oss
-python -m gpt_oss.evals \
+# In the gpt-oss repository, target your local TRT-LLM server:
+export OPENAI_API_KEY="EMPTY" # Any non-empty value is accepted
+export OPENAI_BASE_URL="http://localhost:8000/v1"
+
+# Then run:
+python -m gpt_oss.evals \
   --sampler chat_completions \
   --eval gpqa,aime25 \
-  --model gpt-oss-120b \
+  --model openai/gpt-oss-120b \
   --reasoning-effort low,medium
 ```
+
+Reminder: Ensure the TRT-LLM server was launched with a `--max_num_tokens` value compatible with the table above.

📜 Review details
📥 Commits
Reviewing files that changed from the base of the PR and between 9bcc11b7a7ccc586705f98ff9cffd16cb3fd63fe and 2ac1ef6200253ec2ced877524ce1b9a79d04e5a6.
📒 Files selected for processing (1)
docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md (1 hunks)
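The suggestion above can also be scripted. The following Python sketch assembles the environment and argument list for the eval command from the review comment without running it. The helper name `build_eval_invocation` is hypothetical, and the assumption that `gpt_oss.evals` honors `OPENAI_BASE_URL` comes from the review comment rather than from verified behavior.

```python
import os
import sys
from typing import Dict, List, Sequence, Tuple


def build_eval_invocation(
    base_url: str = "http://localhost:8000/v1",
    model: str = "openai/gpt-oss-120b",
    evals: Sequence[str] = ("gpqa", "aime25"),
    efforts: Sequence[str] = ("low", "medium"),
) -> Tuple[Dict[str, str], List[str]]:
    """Build (env, argv) for the gpt-oss eval command quoted in the review."""
    env = dict(os.environ)
    # Per the review comment, any non-empty key is accepted by a local server.
    env["OPENAI_API_KEY"] = "EMPTY"
    # Assumption: the eval sampler reads its endpoint from this variable.
    env["OPENAI_BASE_URL"] = base_url
    argv = [
        sys.executable, "-m", "gpt_oss.evals",
        "--sampler", "chat_completions",
        "--eval", ",".join(evals),
        "--model", model,
        "--reasoning-effort", ",".join(efforts),
    ]
    return env, argv
```

From inside a gpt-oss checkout, the result could be passed to `subprocess.run(argv, env=env, check=True)`; keeping construction separate from execution makes the command easy to inspect or log first.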
2ac1ef6 to 0a257e8
/bot skip

GitHub Bot Help
Provide a user friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.
- run: Launch build/test pipelines. All previously running jobs will be killed.
- kill: Kill all running builds associated with pull request.
- skip: Skip testing for latest commit on pull request.
- reuse-pipeline: Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
/bot skip --comments "Modifying the document does not require code testing"
Signed-off-by: binghanc <binghanc@nvidia.com>
Signed-off-by: binghanc <binghanc@nvidia.com>
6a89fc4 to 0e7fdcd
/bot skip --comment "docs change only"
PR_Github #17959 [ skip ] triggered by Bot
PR_Github #17959 [ skip ] completed with state
…VIDIA#7252) Signed-off-by: 176802681+binghanc@users.noreply.github.com
add docs for accuracy evaluation of gpt-oss models.
Summary by CodeRabbit