KEMBAR78
[TRTLLM-7183][test] Feature fix model issue for disagg serving by fredricz-20070104 · Pull Request #7785 · NVIDIA/TensorRT-LLM · GitHub
Skip to content

Conversation

@fredricz-20070104
Copy link
Collaborator

@fredricz-20070104 fredricz-20070104 commented Sep 17, 2025

Summary by CodeRabbit

  • Tests
    • Added a readiness check in performance tests to wait for the server’s health endpoint before running client/benchmark steps, improving stability.
    • Increased server startup and readiness timeouts to accommodate larger models and reduce test flakiness.
    • Updated the perf sanity suite to use FP8-specific variants and adjusted model targets for better coverage and relevance.
    • Overall improves reliability and predictability of performance test runs without changing public interfaces.

a) Fix timeout issue for the test case.
b) Fix model name issue for the test case.
c) Add wait for endpoint ready function to make sure disagg_serve endpoint is in working status when start testing

Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
@fredricz-20070104 fredricz-20070104 changed the title [TRTLLM-7871][test]Feature fix model issue for disagg serving [TRTLLM-7183][test]Feature fix model issue for disagg serving Sep 17, 2025
@fredricz-20070104 fredricz-20070104 changed the title [TRTLLM-7183][test]Feature fix model issue for disagg serving [TRTLLM-7183][test] Feature fix model issue for disagg serving Sep 17, 2025
@fredricz-20070104
Copy link
Collaborator Author

/bot run

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 17, 2025

📝 Walkthrough

Walkthrough

Increases server start timeout from 1800s to 3600s in perf test command construction, adds an HTTP health readiness wait before running client/benchmark in disaggregated perf tests, and updates QA perf sanity YAML to swap specific FP8 and model variant test entries.

Changes

Cohort / File(s) Summary of changes
Perf test lifecycle adjustments
tests/integration/defs/perf/test_perf.py
Doubles server start timeout to 3600 seconds in disaggregated client command path. No other logic changes.
Disaggregated perf utils: readiness gate
tests/integration/defs/perf/utils.py
Adds wait_for_endpoint_ready(url: str, timeout: int = 600) polling HTTP 200 via requests; integrates a pre-run health check http://localhost:8000/health with 1800s timeout into run_cmd after server launch, before client/benchmark.
QA perf sanity list updates
tests/integration/test_lists/qa/llm_perf_sanity.yml
Replaces two test entries: remove deepseek_v3_lite-disagg_server and llama_v3.1_70b-disagg_server; add deepseek_v3_lite_fp8-disagg_server and llama_v3.1_8b-disagg_server in two sections.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor T as Test Runner
  participant P as PerfDisaggScriptTestCmds
  participant S as Disagg Server (proc)
  participant H as Health Endpoint (/health)
  participant C as Client
  participant B as Benchmark

  T->>P: run_cmd()
  P->>S: launch server process
  Note over P,S: New step: readiness gate
  P->>H: poll GET /health (<=1800s)
  alt 200 OK
    H-->>P: 200 Ready
    P->>C: execute client
    P->>B: execute benchmark
  else timeout / error
    H-->>P: non-200/timeout
    P->>T: report error/timeout
  end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description only contains a short @coderabbitai summary with three brief points and does not follow the repository template: it is missing the detailed "Description" explaining the issue and fix, a "Test Coverage" section naming the relevant tests, and a completed "PR Checklist" as required by the template. Please expand the PR description to match the template by adding a clear Description with implementation details and rationale (mention the increased server timeout and the new wait_for_endpoint_ready behavior), a Test Coverage section listing affected tests/files (e.g., tests/integration/defs/perf/test_perf.py, tests/integration/defs/perf/utils.py, and the updated YAML entries), and a completed PR Checklist including any manual verification steps or risks.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title "[TRTLLM-7871][test]Feature fix model issue for disagg serving" is related to the PR's main changes—fixing model-name issues, addressing a timeout, and adding an endpoint-readiness wait—so it conveys the primary intent, but its phrasing ("Feature fix") is awkward and not fully aligned with the repository's conventional style.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/integration/defs/perf/utils.py (1)

334-369: Guard process cleanup to avoid UnboundLocalError on early failures.

If any context in the multi-with fails to enter, server_proc/ctx_workers_proc/gen_workers_proc may be undefined; the finally block will then raise, hiding the root cause.

-        try:
+        ctx_workers_proc = None
+        gen_workers_proc = None
+        server_proc = None
+        try:
             with (  # Start ctx workers
                 open('output_ctx.log', 'w') as output_ctx,
                 popen(self.ctx_cmd,
                       stdout=output_ctx,
                       stderr=subprocess.STDOUT,
                       env=venv._new_env,
                       shell=True) as ctx_workers_proc,
                 # Start gen workers
                 open('output_gen.log', 'w') as output_gen,
                 popen(self.gen_cmd,
                       stdout=output_gen,
                       stderr=subprocess.STDOUT,
                       env=venv._new_env,
                       shell=True) as gen_workers_proc,
                 # Start server
                 open('output_server.log', 'w') as output_server,
                 popen(self.server_cmd,
                       stdout=output_server,
                       stderr=subprocess.STDOUT,
                       env=venv._new_env,
                       shell=True) as server_proc):
                 ...
         finally:
-            server_proc.terminate()
-            ctx_workers_proc.terminate()
-            gen_workers_proc.terminate()
-            server_proc.wait()
-            ctx_workers_proc.wait()
-            gen_workers_proc.wait()
+            for proc in (server_proc, ctx_workers_proc, gen_workers_proc):
+                if proc is not None:
+                    proc.terminate()
+                    proc.wait()
🧹 Nitpick comments (2)
tests/integration/defs/perf/utils.py (2)

358-361: Remove unnecessary f-string; consider sourcing host/port from server_config.

Fixes F541 and avoids drift between config and hard-coded URL. Optional: derive host/port from the generated server_config.yaml.

-                self.wait_for_endpoint_ready(
-                    f"http://localhost:8000/health",
-                    timeout=1800)  # 30 minutes for large models
+                self.wait_for_endpoint_ready(
+                    "http://localhost:8000/health",
+                    timeout=1800)  # 30 minutes for large models

1-2: Update header year to 2025 to match guidelines.

Keep the header range current across all source files.

-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6983e8a and 3b4d2a9.

📒 Files selected for processing (3)
  • tests/integration/defs/perf/test_perf.py (1 hunks)
  • tests/integration/defs/perf/utils.py (3 hunks)
  • tests/integration/test_lists/qa/llm_perf_sanity.yml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tests/integration/defs/perf/test_perf.py
  • tests/integration/defs/perf/utils.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tests/integration/defs/perf/test_perf.py
  • tests/integration/defs/perf/utils.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tests/integration/defs/perf/test_perf.py
  • tests/integration/defs/perf/utils.py
🧬 Code graph analysis (1)
tests/integration/defs/perf/utils.py (1)
tests/integration/defs/stress_test/stress_test.py (1)
  • url (79-81)
🪛 Ruff (0.12.2)
tests/integration/defs/perf/utils.py

326-326: Probable use of requests call without timeout

(S113)


329-329: Do not catch blind exception: Exception

(BLE001)


359-359: f-string without any placeholders

Remove extraneous f prefix

(F541)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tests/integration/defs/perf/test_perf.py (1)

2009-2010: LGTM: extended server start timeout to 3600s.

Aligned with longer startup for large models; consistent with server -t/-r flags.

tests/integration/test_lists/qa/llm_perf_sanity.yml (1)

223-225: LGTM: swapped disagg tests to FP8/8B variants.

Entries map to existing MODEL_PATH_DICT keys and the disagg label format is valid.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #18904 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #18904 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #14170 completed with status: 'FAILURE'

@ruodil
Copy link
Collaborator

ruodil commented Sep 18, 2025

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #19072 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #19072 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #14308 completed with status: 'FAILURE'

Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
@fredricz-20070104
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #19181 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #19181 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #14395 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@ruodil ruodil merged commit fc4e6d3 into NVIDIA:main Sep 19, 2025
5 checks passed
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Sep 19, 2025
…A#7785)

Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Wong4j pushed a commit to Wong4j/TensorRT-LLM that referenced this pull request Sep 20, 2025
…A#7785)

Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
MrGeva pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Sep 21, 2025
…A#7785)

Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants