-
Notifications
You must be signed in to change notification settings - Fork 1.8k
[TRTLLM-7183][test] Feature fix model issue for disagg serving #7785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
|
/bot run |
📝 WalkthroughWalkthroughIncreases server start timeout from 1800s to 3600s in perf test command construction, adds an HTTP health readiness wait before running client/benchmark in disaggregated perf tests, and updates QA perf sanity YAML to swap specific FP8 and model variant test entries. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor T as Test Runner
participant P as PerfDisaggScriptTestCmds
participant S as Disagg Server (proc)
participant H as Health Endpoint (/health)
participant C as Client
participant B as Benchmark
T->>P: run_cmd()
P->>S: launch server process
Note over P,S: New step: readiness gate
P->>H: poll GET /health (<=1800s)
alt 200 OK
H-->>P: 200 Ready
P->>C: execute client
P->>B: execute benchmark
else timeout / error
H-->>P: non-200/timeout
P->>T: report error/timeout
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/integration/defs/perf/utils.py (1)
334-369: Guard process cleanup to avoid UnboundLocalError on early failures.If any context in the multi-with fails to enter, server_proc/ctx_workers_proc/gen_workers_proc may be undefined; the finally block will then raise, hiding the root cause.
- try: + ctx_workers_proc = None + gen_workers_proc = None + server_proc = None + try: with ( # Start ctx workers open('output_ctx.log', 'w') as output_ctx, popen(self.ctx_cmd, stdout=output_ctx, stderr=subprocess.STDOUT, env=venv._new_env, shell=True) as ctx_workers_proc, # Start gen workers open('output_gen.log', 'w') as output_gen, popen(self.gen_cmd, stdout=output_gen, stderr=subprocess.STDOUT, env=venv._new_env, shell=True) as gen_workers_proc, # Start server open('output_server.log', 'w') as output_server, popen(self.server_cmd, stdout=output_server, stderr=subprocess.STDOUT, env=venv._new_env, shell=True) as server_proc): ... finally: - server_proc.terminate() - ctx_workers_proc.terminate() - gen_workers_proc.terminate() - server_proc.wait() - ctx_workers_proc.wait() - gen_workers_proc.wait() + for proc in (server_proc, ctx_workers_proc, gen_workers_proc): + if proc is not None: + proc.terminate() + proc.wait()
🧹 Nitpick comments (2)
tests/integration/defs/perf/utils.py (2)
358-361: Remove unnecessary f-string; consider sourcing host/port from server_config.Fixes F541 and avoids drift between config and hard-coded URL. Optional: derive host/port from the generated server_config.yaml.
- self.wait_for_endpoint_ready( - f"http://localhost:8000/health", - timeout=1800) # 30 minutes for large models + self.wait_for_endpoint_ready( + "http://localhost:8000/health", + timeout=1800) # 30 minutes for large models
1-2: Update header year to 2025 to match guidelines.Keep the header range current across all source files.
-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
tests/integration/defs/perf/test_perf.py(1 hunks)tests/integration/defs/perf/utils.py(3 hunks)tests/integration/test_lists/qa/llm_perf_sanity.yml(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
tests/integration/defs/perf/test_perf.pytests/integration/defs/perf/utils.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
tests/integration/defs/perf/test_perf.pytests/integration/defs/perf/utils.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
tests/integration/defs/perf/test_perf.pytests/integration/defs/perf/utils.py
🧬 Code graph analysis (1)
tests/integration/defs/perf/utils.py (1)
tests/integration/defs/stress_test/stress_test.py (1)
url(79-81)
🪛 Ruff (0.12.2)
tests/integration/defs/perf/utils.py
326-326: Probable use of requests call without timeout
(S113)
329-329: Do not catch blind exception: Exception
(BLE001)
359-359: f-string without any placeholders
Remove extraneous f prefix
(F541)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tests/integration/defs/perf/test_perf.py (1)
2009-2010: LGTM: extended server start timeout to 3600s.Aligned with longer startup for large models; consistent with server -t/-r flags.
tests/integration/test_lists/qa/llm_perf_sanity.yml (1)
223-225: LGTM: swapped disagg tests to FP8/8B variants.Entries map to existing MODEL_PATH_DICT keys and the disagg label format is valid.
|
PR_Github #18904 [ run ] triggered by Bot |
|
PR_Github #18904 [ run ] completed with state |
|
/bot run |
|
PR_Github #19072 [ run ] triggered by Bot |
|
PR_Github #19072 [ run ] completed with state |
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
|
/bot run |
|
PR_Github #19181 [ run ] triggered by Bot |
|
PR_Github #19181 [ run ] completed with state |
…A#7785) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
…A#7785) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
…A#7785) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Summary by CodeRabbit
a) Fix timeout issue for the test case.
b) Fix model name issue for the test case.
c) Add wait for endpoint ready function to make sure disagg_serve endpoint is in working status when start testing