[TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. by bo-nv · Pull Request #7503 · NVIDIA/TensorRT-LLM · GitHub

Conversation

@bo-nv
Collaborator

@bo-nv bo-nv commented Sep 3, 2025

Summary by CodeRabbit

  • New Features

    • Added support for disaggregated-server performance runs with orchestration of context, generation, server, client and benchmark steps.
    • Introduced disaggregated metrics (median E2EL and median TTFT) with parsing and thresholds.
  • Tests

    • Extended perf pipeline to collect, parse, aggregate, and report disaggregated-server metrics.
    • Added configurable options for disagg runs (worker counts) and new disagg test entries in the sanity test list.
    • Result output masks detailed disaggregated command content for cleaner logs.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages that don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since careless use without validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since careless use without validation can break the top of tree.
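As a concrete illustration, a typical pre-merge invocation combining some of the flags documented above might look like this (illustrative only; the stage name is a placeholder):

```
/bot run --disable-fail-fast --extra-stage "H100_PCIe-TensorRT-Post-Merge-1"
```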


coderabbitai bot commented Sep 3, 2025

📝 Walkthrough

Walkthrough

Adds disaggregated-server support to the perf tests: new metrics (DISAGG_SERVER_E2EL, DISAGG_SERVER_TTFT) with parsing and thresholds; PerfTestConfig disagg runtime, config, and command generation; a PerfDisaggScriptTestCmds orchestration that launches ctx/gen/server processes plus the client/benchmark; and integration into metric collection and result output.

Changes

Cohort / File(s) Summary
Perf test definitions & disagg integration
tests/integration/defs/perf/test_perf.py
Adds DISAGG_SERVER_METRICS and DISAGG_SERVER_METRICS_LOG_QUERIES; extends PERF_METRIC_LOG_QUERIES and PERF_METRIC_THRESHOLD for disagg metrics; adds yaml import; extends PerfTestConfig with is_disagg_server, ctx_server_workers, gen_server_workers, disagg-specific loaders/serializers/validators and many private helpers for YAML/config generation and command/benchmark builders; get_commands now returns disagg command container and uses disagg metrics in collection.
Perf utilities, commands & metric types
tests/integration/defs/perf/utils.py
Adds PerfMetricType.DISAGG_SERVER_E2EL and DISAGG_SERVER_TTFT; introduces PerfDisaggScriptTestCmds NamedTuple with ctx_cmd, gen_cmd, server_cmd, client_cmd, benchmark_cmd; implements run_cmd to concurrently start ctx/gen/server, run client/benchmark, aggregate output, and return redacted command info; get_cmd_str returns placeholder for disagg commands.
Test lists
tests/integration/test_lists/qa/llm_perf_sanity.yml
Adds two new disagg-server test entries referencing the disagg runtime variants.
Minor imports / misc
tests/integration/defs/perf/test_perf.py, tests/integration/defs/perf/utils.py
Imports and public export adjustments (yaml, PerfDisaggScriptTestCmds); diagnostic print exposure for disagg path; result writing redacts serialized disagg commands.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CI as Test Runner
    participant Config as PerfTestConfig
    participant Cmds as PerfDisaggScriptTestCmds
    participant Ctx as Context Server (ctx)
    participant Gen as Generation Server (gen)
    participant Srv as Disagg Server
    participant Bench as Client / Benchmark

    CI->>Config: set_runtime_configs(disagg_server,...)
    CI->>Config: get_commands()
    Config-->>CI: PerfDisaggScriptTestCmds(ctx_cmd, gen_cmd, server_cmd, client_cmd, benchmark_cmd)

    CI->>Cmds: run_cmd(...)
    note over Cmds: concurrently start background processes
    Cmds->>Ctx: start ctx_cmd
    Cmds->>Gen: start gen_cmd
    Cmds->>Srv: start server_cmd
    Cmds->>Bench: run client_cmd -> benchmark_cmd
    Bench-->>Cmds: stdout (metrics)
    Cmds->>CI: aggregated output (metrics), redacted command
    Cmds->>Ctx: terminate
    Cmds->>Gen: terminate
    Cmds->>Srv: terminate
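The flow in the diagram can be sketched in plain Python. This is a minimal illustration only, assuming the commands are plain argument lists; the real orchestration lives in PerfDisaggScriptTestCmds.run_cmd and also handles venv activation, output aggregation, and command redaction.

```python
import subprocess

def run_disagg_benchmark(ctx_cmd, gen_cmd, server_cmd, benchmark_cmd):
    """Start ctx/gen workers and the disagg server, run the benchmark, then tear everything down."""
    ctx_proc = subprocess.Popen(ctx_cmd)
    gen_proc = subprocess.Popen(gen_cmd)
    server_proc = subprocess.Popen(server_cmd)
    try:
        # The benchmark's stdout carries the metrics (median E2EL / TTFT) to parse later.
        result = subprocess.run(benchmark_cmd, capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        # Tear down the background processes whether or not the benchmark succeeded.
        for proc in (server_proc, ctx_proc, gen_proc):
            proc.terminate()
            try:
                proc.wait(timeout=60)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
```

The try/finally mirrors the teardown concern raised in the review below: workers must be reaped even when the client or benchmark step fails.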

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • StanleySun639
  • LarryXFly
  • chzblych



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (7)
tests/integration/defs/perf/utils.py (4)

387-389: Docstring formatting inconsistency.

The one-line docstring should be on a single line as per Python conventions (PEP 257).

-    def run_disagg_server(self,
-                          full_test_name: str,
-                          venv: Optional[PythonVenvRunnerImpl],
-                          gpu_clock_lock: GPUClockLock,
-                          session_data_writer: SessionDataWriter,
-                          output_dir: str,
-                          outputs: Dict[int, str] = {},
-                          original_test_name: str = None,
-                          cmd_idx: int = 0,
-                          **kwargs) -> List[str]:
-        """
-        Run the disaggregated server and write the results to the output csv and/or yaml files.
-        """
+    def run_disagg_server(self,
+                          full_test_name: str,
+                          venv: Optional[PythonVenvRunnerImpl],
+                          gpu_clock_lock: GPUClockLock,
+                          session_data_writer: SessionDataWriter,
+                          output_dir: str,
+                          outputs: Dict[int, str] = {},
+                          original_test_name: str = None,
+                          cmd_idx: int = 0,
+                          **kwargs) -> List[str]:
+        """Run the disaggregated server and write the results to the output csv and/or yaml files."""

409-409: Unused variable tmpDir can be removed.

The variable tmpDir is assigned but never used in the function. The temp_wd context manager is likely managing the working directory change automatically.

-        tmpDir = temp_wd(self.get_working_dir())

474-479: Consider more graceful process termination.

The processes are terminated unconditionally in the finally block. Consider checking process status first and attempting a graceful shutdown before termination.

             finally:
-                server_proc.terminate()
-                ctx_workers_proc.terminate()
-                gen_workers_proc.terminate()
-                server_proc.wait()
-                ctx_workers_proc.wait()
-                gen_workers_proc.wait()
+                # Gracefully shutdown processes
+                for proc, name in [(server_proc, "server"), 
+                                  (ctx_workers_proc, "ctx_workers"),
+                                  (gen_workers_proc, "gen_workers")]:
+                    if proc.poll() is None:  # Process is still running
+                        print_info(f"Terminating {name} process...")
+                        proc.terminate()
+                        try:
+                            proc.wait(timeout=5)
+                        except subprocess.TimeoutExpired:
+                            print_warning(f"{name} process didn't terminate gracefully, killing...")
+                            proc.kill()
+                            proc.wait()

637-638: Consider making the placeholder string more descriptive.

The serialized command is replaced with a generic placeholder. Consider making it more informative about what's being hidden.

-        # serialized_cmd = self.get_commands().get_cmd_str(cmd_idx)
-        serialized_cmd = "placeholder for disaggregated server"
+        # serialized_cmd = self.get_commands().get_cmd_str(cmd_idx)  
+        serialized_cmd = "[REDACTED: disaggregated server multi-process command sequence]"
tests/integration/defs/perf/test_perf.py (3)

597-702: Large block of commented-out code should be removed or documented.

There's a substantial amount of commented-out code in the load_from_str method. If this code is no longer needed, it should be removed. If it's being kept for reference or future use, add a clear comment explaining why.

Would you like me to help clean up this commented code or convert it to proper documentation if it's meant to serve as a reference?


1823-2108: Consider breaking down the DisaggregatedServerPerfTest class.

The DisaggregatedServerPerfTest class is quite large (285 lines) with multiple responsibilities. Consider extracting configuration generation and command building into separate helper classes for better maintainability.

Consider using a builder pattern or factory pattern for creating the various configurations (ctx_config, gen_config, server_config) and commands. This would improve testability and make the code more modular.


1849-1901: Many commented configuration options in worker config generation.

The gen_worker_config method contains many commented-out configuration options. These should either be removed if not needed, or properly documented if they represent future configuration options.

     def gen_worker_config(self):
+        """Generate worker configurations for context and generation servers.
+        
+        Note: Additional configuration options are available but not currently used:
+        - max_batch_size, max_num_tokens, max_seq_len
+        - tensor_parallel_size, pipeline_parallel_size
+        - enable_attention_dp, kv_cache_config
+        See documentation for full configuration options.
+        """
         ctx_config = {
-            # 'max_batch_size': ctx_batch_size,
-            # 'max_num_tokens': ctx_max_num_tokens,
             # ... (remove other commented lines)
             'disable_overlap_scheduler': True,
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ae51368 and bbd3288.

📒 Files selected for processing (2)
  • tests/integration/defs/perf/test_perf.py (9 hunks)
  • tests/integration/defs/perf/utils.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Filenames compiled into a target must be case-insensitively unique

Files:

  • tests/integration/defs/perf/utils.py
  • tests/integration/defs/perf/test_perf.py
**/*.{h,hpp,hh,hxx,cc,cpp,cxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use spaces, not tabs; indent 4 spaces

Files:

  • tests/integration/defs/perf/utils.py
  • tests/integration/defs/perf/test_perf.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Code must target Python 3.8+
Indent with 4 spaces; do not use tabs (Python)
Maintain module namespace on import: prefer from package.subpackage import foo; use foo.Symbol()
Python filenames use snake_case
Python class names use PascalCase
Python functions and methods use snake_case
Python local variables use snake_case; if starting with a number concept, prefix with k (e.g., k_99th_percentile)
Python global variables use G_ prefix with UPPER_SNAKE_CASE
Python constants use UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes
Initialize all externally visible class members in init
For public interfaces, prefer docstrings over comments; comments should be for in-function or file-local interfaces
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes and variables inline with docstrings immediately after assignment
Avoid reflection when a non-reflective approach suffices
Limit except clauses to specific exceptions where possible
When using try/except for duck-typing, keep try body minimal and move logic to else

Files:

  • tests/integration/defs/perf/utils.py
  • tests/integration/defs/perf/test_perf.py
**/*.{cpp,cc,cxx,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

  • tests/integration/defs/perf/utils.py
  • tests/integration/defs/perf/test_perf.py
🧬 Code graph analysis (2)
tests/integration/defs/perf/utils.py (6)
tests/integration/defs/perf/misc.py (1)
  • check_output (42-52)
tests/integration/defs/trt_test_alternative.py (1)
  • popen (195-214)
tests/integration/defs/triton_server/local_venv.py (1)
  • PythonVenvRunnerImpl (14-113)
tests/integration/defs/perf/gpu_clock_lock.py (2)
  • GPUClockLock (67-360)
  • InvalidGPUMonitoringResultError (37-38)
tests/integration/defs/perf/session_data_writer.py (1)
  • SessionDataWriter (26-80)
tests/integration/defs/perf/test_perf.py (4)
  • get_commands (1396-1486)
  • get_commands (1993-1997)
  • get_perf_result (1488-1566)
  • get_perf_result (2084-2098)
tests/integration/defs/perf/test_perf.py (1)
tests/integration/defs/perf/utils.py (11)
  • PerfMetricType (85-104)
  • AbstractPerfScriptTestClass (312-720)
  • set_runtime_configs (327-331)
  • get_test_name (318-324)
  • get_commands (334-338)
  • run_disagg_server (387-498)
  • get_result_state (375-379)
  • get_perf_result (341-345)
  • get_threshold (348-352)
  • get_absolute_threshold (354-359)
  • get_metric_type (361-366)
🪛 Ruff (0.12.2)
tests/integration/defs/perf/utils.py

389-390: One-line docstring should fit on one line

Reformat to one line

(D200)


400-400: Local variable tmpDir is assigned to but never used

Remove assignment to unused variable tmpDir

(F841)

tests/integration/defs/perf/test_perf.py

2014-2015: One-line docstring should fit on one line

Reformat to one line

(D200)


2055-2055: Redefinition of unused set_runtime_configs from line 1818

(F811)


2090-2091: One-line docstring should fit on one line

Reformat to one line

(D200)


2101-2102: One-line docstring should fit on one line

Reformat to one line

(D200)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tests/integration/defs/perf/utils.py (2)

103-104: LGTM! New metric types added correctly.

The new disaggregated server metric types DISAGG_SERVER_E2EL and DISAGG_SERVER_TTFT are appropriately added to the PerfMetricType enum following the established naming pattern.


29-30: Verified correct popen import: defs.trt_test_alternative defines popen with proper session handling and cleanup logic, so the import is correct.

tests/integration/defs/perf/test_perf.py (3)

25-25: LGTM! YAML import added for config generation.

The yaml import is appropriately added to support the disaggregated server configuration file generation.


273-278: LGTM! Disaggregated server metrics properly integrated.

The new DISAGG_SERVER_METRICS_LOG_QUERIES dictionary and DISAGG_SERVER_METRICS list are correctly defined with appropriate regex patterns for parsing E2EL and TTFT metrics.
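As an illustration of the pattern, metric extraction of this kind usually pairs each metric name with a compiled regex and scans the benchmark log. The patterns below are hypothetical stand-ins for the ones in the PR, whose exact wording may differ:

```python
import re

# Hypothetical patterns mirroring benchmark output lines such as
# "Median E2EL (ms):    1234.56" -- the actual regexes in test_perf.py may differ.
DISAGG_SERVER_METRICS_LOG_QUERIES = {
    "DISAGG_SERVER_E2EL": re.compile(r"Median E2EL \(ms\):\s+([\d.]+)"),
    "DISAGG_SERVER_TTFT": re.compile(r"Median TTFT \(ms\):\s+([\d.]+)"),
}

def parse_disagg_metrics(log_text: str) -> dict:
    """Return {metric_name: float} for every pattern that matches the log."""
    results = {}
    for name, pattern in DISAGG_SERVER_METRICS_LOG_QUERIES.items():
        match = pattern.search(log_text)
        if match:
            results[name] = float(match.group(1))
    return results
```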


1941-1942: Review GPU allocation in perf tests
In tests/integration/defs/perf/test_perf.py:1941–1942, overriding CUDA_VISIBLE_DEVICES inline can conflict with MPI or other GPU resource managers. Confirm these commands correctly isolate GPUs in your multi-GPU environment or switch to your harness’s native GPU-selection mechanism.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (3)
tests/integration/defs/perf/utils.py (1)

312-356: Add timeout handling for disaggregated server processes.

The disaggregated server orchestration launches multiple processes but doesn't have explicit timeout handling for each process. Consider adding timeout parameters to prevent indefinite hangs.

Apply this diff to add timeout handling:

         finally:
-            server_proc.terminate()
-            ctx_workers_proc.terminate()
-            gen_workers_proc.terminate()
-            server_proc.wait()
-            ctx_workers_proc.wait()
-            gen_workers_proc.wait()
+            # Terminate processes gracefully with timeout
+            for proc, name in [(server_proc, 'server'), 
+                                (ctx_workers_proc, 'ctx_workers'),
+                                (gen_workers_proc, 'gen_workers')]:
+                if proc is not None:
+                    proc.terminate()
+                    try:
+                        proc.wait(timeout=60)
+                    except subprocess.TimeoutExpired:
+                        print_error(f"Process {name} did not terminate gracefully, killing...")
+                        proc.kill()
+                        proc.wait()
tests/integration/defs/perf/test_perf.py (2)

311-312: Review threshold values for disaggregated server metrics.

The thresholds for DISAGG_SERVER_E2EL and DISAGG_SERVER_TTFT are set to (0.1, 50) with a comment "Ignore < 50ms?". Please confirm these are the intended values for your performance regression detection.
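For context, one plausible interpretation of a (0.1, 50) pair is a 10% relative tolerance combined with a 50 ms absolute floor, so that sub-50 ms jitter never triggers a regression. The helper below is a hypothetical sketch of that reading, not the harness's actual semantics:

```python
def exceeds_threshold(new_value_ms, baseline_ms, rel_tol=0.1, abs_floor_ms=50.0):
    """Flag a regression only if the latency grew by more than both
    rel_tol * baseline AND abs_floor_ms (a guess at the (0.1, 50) semantics)."""
    delta = new_value_ms - baseline_ms
    return delta > abs_floor_ms and delta > rel_tol * baseline_ms
```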


774-776: Add validation for disaggregated server configurations.

The validate method returns early for disaggregated server configurations, bypassing all validation logic. Consider adding at least minimal validation for disaggregated server-specific parameters.

🧹 Nitpick comments (3)
tests/integration/defs/perf/test_perf.py (3)

1570-1570: Remove or guard debug print statement.

The print_info call outputs the entire command output which could be verbose and unnecessary for production use.

-        print_info(outputs[cmd_idx].split("\n"))
+        # Debug output - uncomment if needed for troubleshooting
+        # print_info(outputs[cmd_idx].split("\n"))

1899-1945: Extract magic numbers into constants for configuration generation.

The disaggregated worker configuration contains several hardcoded values that should be configurable or at least defined as constants.

+    # Configuration constants
+    _DEFAULT_MAX_BATCH_SIZE = 32
+    _DEFAULT_MAX_NUM_TOKENS = 4096
+    _DEFAULT_MAX_SEQ_LEN = 4096
+    _DEFAULT_FREE_GPU_MEM_FRACTION = 0.5
+    _DEFAULT_CUDA_GRAPH_BATCH_SIZES = [1, 2, 4, 8, 16, 32]
+
     def _gen_disagg_worker_config(self):
         ctx_config = {
-            'max_batch_size': 32,
-            'max_num_tokens': 4096,
-            'max_seq_len': 4096,
+            'max_batch_size': self._DEFAULT_MAX_BATCH_SIZE,
+            'max_num_tokens': self._DEFAULT_MAX_NUM_TOKENS,
+            'max_seq_len': self._DEFAULT_MAX_SEQ_LEN,
             'tensor_parallel_size': self._config.ctx_tp_size,
             'enable_attention_dp': self._config.ctx_dp_size > 1,
             'print_iter_log': True,
             'disable_overlap_scheduler': True,
             'kv_cache_config': {
                 'enable_block_reuse': False,
-                'free_gpu_memory_fraction': 0.5,
+                'free_gpu_memory_fraction': self._DEFAULT_FREE_GPU_MEM_FRACTION,
                 'dtype': 'fp8',
             },

2008-2046: Consider making benchmark parameters configurable.

The benchmark command uses hardcoded values for input/output lengths and number of prompts. Consider making these configurable based on the test configuration.

     def _get_disagg_benchmark_command(self):
         benchmark_script = os.path.join(self._llm_root, "tensorrt_llm", "serve",
                                         "scripts", "benchmark_serving.py")
         model_path = MODEL_PATH_DICT[self._config.model_name]
         model_dir = os.path.join(llm_models_root(), model_path)
         shared_gpt_path = os.path.join(
             llm_models_root(), "datasets",
             "ShareGPT_V3_unfiltered_cleaned_split.json")
+        
+        # Use test configuration values if available, otherwise use defaults
+        input_len = self._config.input_lens[0] if self._config.input_lens else 1024
+        output_len = self._config.output_lens[0] if self._config.output_lens else 1024
+        num_prompts = self._config.num_reqs if hasattr(self._config, 'num_reqs') else 320
+        
         benchmark_cmd = [
             'python3',
             benchmark_script,
             '--model',
             model_dir,
             '--tokenizer',
             model_dir,
             '--dataset-name',
             'random',
             '--dataset-path',
             shared_gpt_path,
             '--random-input-len',
-            '1024',
+            str(input_len),
             '--random-output-len',
-            '1024',
+            str(output_len),
             '--random-prefix-len',
             '0',
             '--num-prompts',
-            '320',
+            str(num_prompts),
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between bbd3288 and 0753f77.

📒 Files selected for processing (2)
  • tests/integration/defs/perf/test_perf.py (15 hunks)
  • tests/integration/defs/perf/utils.py (4 hunks)
🧬 Code graph analysis (2)
tests/integration/defs/perf/utils.py (2)
tests/integration/defs/perf/misc.py (1)
  • check_output (42-52)
tests/integration/defs/trt_test_alternative.py (1)
  • popen (195-214)
tests/integration/defs/perf/test_perf.py (3)
tests/integration/defs/common.py (3)
  • convert_weights (163-522)
  • get_cpp_benchmark (711-726)
  • quantize_data (575-625)
tests/integration/defs/conftest.py (2)
  • get_llm_root (166-176)
  • llm_models_root (77-83)
tests/integration/defs/perf/utils.py (3)
  • PerfDisaggScriptTestCmds (312-355)
  • PerfMetricType (85-104)
  • PerfScriptTestCmds (107-191)
🪛 Ruff (0.12.2)
tests/integration/defs/perf/test_perf.py

2030-2031: One-line docstring should fit on one line

Reformat to one line

(D200)


2041-2042: One-line docstring should fit on one line

Reformat to one line

(D200)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (9)
tests/integration/defs/perf/utils.py (1)

103-104: LGTM! New disaggregated server metrics added correctly.

The new DISAGG_SERVER_E2EL and DISAGG_SERVER_TTFT metrics are properly added to the PerfMetricType enum following the existing naming convention.

tests/integration/defs/perf/test_perf.py (8)

25-25: LGTM! Import added for YAML configuration generation.

The yaml import is properly added to support configuration file generation for disaggregated server tests.


257-260: LGTM! Disaggregated server metric regex patterns defined.

The regex patterns for E2EL and TTFT metrics are correctly defined and match the expected output format.


464-468: LGTM! Disaggregated server configuration attributes added.

The new attributes for disaggregated server configuration are properly initialized in the constructor.


962-963: LGTM! Benchmark script set to None for disaggregated server.

Correctly sets benchmark_script to None for disaggregated server runtime since it doesn't use a traditional benchmark script.


1453-1460: LGTM! Disaggregated server command generation integrated.

The code correctly generates and returns PerfDisaggScriptTestCmds for disaggregated server tests with all required deployment commands.


1750-1764: LGTM! Disaggregated server metrics properly integrated.

The code correctly handles disaggregated server metrics by using DISAGG_SERVER_METRICS and the appropriate log query patterns.


2056-2056: Test runner instantiation verified; no change needed.

The run_perf_test function always creates a MultiMetricPerfTest instance. Contrary to past comments suggesting a separate DisaggregatedServerPerfTest, this is correct: MultiMetricPerfTest handles disaggregated server tests through the configuration's runtime property, so no conditional instantiation is required.


509-511: to_string handling verified; no change needed.

The method correctly delegates to _to_string_disagg for the disaggregated server runtime, returning early for the disagg_server case:

         elif self.runtime == "disagg_server":  # trtllm-serve
             entries.append(f"disagg_server")
             return self._to_string_disagg(entries)

@bo-nv
Collaborator Author

bo-nv commented Sep 4, 2025

/bot run

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (4)
tests/integration/defs/perf/utils.py (2)

319-351: Harden teardown: guard unbound locals and add wait timeouts with kill fallback.

Unconditional .terminate()/.wait() can raise UnboundLocalError if setup fails and can hang without timeouts.

Apply:

     def run_cmd(self, cmd_idx: int, venv) -> str:
         output = ""
+        server_proc = ctx_workers_proc = gen_workers_proc = None
+        TIMEOUT_SEC = 60
         try:
             with (  # Start ctx workers
@@
                 check_output(self.client_cmd, env=venv._new_env)
                 output += check_output(self.benchmark_cmd, env=venv._new_env)
         finally:
-            server_proc.terminate()
-            ctx_workers_proc.terminate()
-            gen_workers_proc.terminate()
-            server_proc.wait()
-            ctx_workers_proc.wait()
-            gen_workers_proc.wait()
+            for p in (server_proc, ctx_workers_proc, gen_workers_proc):
+                if p is None:
+                    continue
+                try:
+                    p.terminate()
+                    p.wait(timeout=TIMEOUT_SEC)
+                except Exception:
+                    try:
+                        p.kill()
+                        p.wait(timeout=10)
+                    except Exception:
+                        pass
         return output

354-356: Return a string and document placeholder rationale.

get_cmd_str returns a list here but other paths return a string; this can break result serialization. Also, add a short comment with the TODO/tracking ID for the placeholder.

Apply:

-    def get_cmd_str(self, cmd_idx) -> List[str]:
-        return ["disaggregated server tests, please check config files"]
+    def get_cmd_str(self, cmd_idx) -> str:
+        # TODO(TRACK-ID): disagg command serialization not yet implemented; emit placeholder for now.
+        return "disaggregated server tests, please check config files"
tests/integration/defs/perf/test_perf.py (2)

766-779: Add minimal validation for disagg configs.

Early return skips all validation. Assert sane worker counts to fail fast.

Apply:

-        if self.runtime == "disagg_server":
-            # TODO: validate disaggregated server config
-            return
+        if self.runtime == "disagg_server":
+            assert self.ctx_server_workers >= 0 and self.gen_server_workers >= 0, "Worker counts must be non-negative"
+            assert (self.ctx_server_workers + self.gen_server_workers) > 0, "At least one disagg worker must be configured"
+            return

1901-1947: Duplicate key in ctx_config ('disable_overlap_scheduler').

Key appears twice; Python keeps the last, hiding the first. Remove the duplicate.

Apply:

         ctx_config = {
@@
-            'disable_overlap_scheduler': True,
             'kv_cache_config': {
                 'enable_block_reuse': False,
                 # 'free_gpu_memory_fraction': ctx_free_gpu_memory_fraction,
                 'free_gpu_memory_fraction': 0.5,
                 'dtype': 'fp8',
             },
             'disable_overlap_scheduler': True,
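The underlying Python behavior that makes this a silent bug (a duplicate literal key is accepted without error and the last value wins) is easy to verify; the key names below mirror the config in question:

```python
# Duplicate keys in a dict literal raise no error; the last assignment wins.
ctx_config = {
    'disable_overlap_scheduler': True,
    'kv_cache_config': {'enable_block_reuse': False},
    'disable_overlap_scheduler': False,  # silently shadows the first entry
}

print(ctx_config['disable_overlap_scheduler'])  # False: the first True is lost
print(len(ctx_config))  # 2: only one copy of the duplicated key survives
```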
🧹 Nitpick comments (2)
tests/integration/defs/perf/test_perf.py (2)

1568-1574: Avoid dumping entire output logs on every metric parse.

This is noisy and bloats logs. Consider gating behind a debug flag.

Apply:

-        print_info(outputs[cmd_idx].split("\n"))
+        if os.environ.get("TRTLLM_DEBUG") == "1":
+            print_info(outputs[cmd_idx].split("\n"))

278-284: Remove unused DISAGG_SERVER_METRICS_LOG_QUERIES.

get_metric_regex reads from PERF_METRIC_LOG_QUERIES; this map is redundant.

Apply:

- DISAGG_SERVER_METRICS_LOG_QUERIES = {
-     PerfMetricType.DISAGG_SERVER_E2EL:
-     re.compile(r"Median E2EL \(ms\):\s*(\d+\.?\d*)"),
-     PerfMetricType.DISAGG_SERVER_TTFT:
-     re.compile(r"Median TTFT \(ms\):\s*(\d+\.?\d*)"),
- }
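For reference, the kind of log parsing these patterns support looks like the following; the sample benchmark output is illustrative, not captured from a real run:

```python
import re

# Same patterns as defined in test_perf.py for the disagg metrics.
E2EL_RE = re.compile(r"Median E2EL \(ms\):\s*(\d+\.?\d*)")
TTFT_RE = re.compile(r"Median TTFT \(ms\):\s*(\d+\.?\d*)")

sample_output = """
Median TTFT (ms): 41.7
Median E2EL (ms): 812.5
"""


def parse_metric(pattern, text):
    """Return the first matched metric value as a float, or None if absent."""
    m = pattern.search(text)
    return float(m.group(1)) if m else None


print(parse_metric(E2EL_RE, sample_output))  # 812.5
print(parse_metric(TTFT_RE, sample_output))  # 41.7
```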
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0753f77 and b35f82d.

📒 Files selected for processing (3)
  • tests/integration/defs/perf/test_perf.py (15 hunks)
  • tests/integration/defs/perf/utils.py (3 hunks)
  • tests/integration/test_lists/qa/llm_perf_sanity.yml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Filenames compiled into a target must be case-insensitively unique

Files:

  • tests/integration/test_lists/qa/llm_perf_sanity.yml
  • tests/integration/defs/perf/utils.py
  • tests/integration/defs/perf/test_perf.py
**/*.{h,hpp,hh,hxx,cc,cpp,cxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use spaces, not tabs; indent 4 spaces

Files:

  • tests/integration/defs/perf/utils.py
  • tests/integration/defs/perf/test_perf.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Code must target Python 3.8+
Indent with 4 spaces; do not use tabs (Python)
Maintain module namespace on import: prefer from package.subpackage import foo; use foo.Symbol()
Python filenames use snake_case
Python class names use PascalCase
Python functions and methods use snake_case
Python local variables use snake_case; if starting with a number concept, prefix with k (e.g., k_99th_percentile)
Python global variables use G_ prefix with UPPER_SNAKE_CASE
Python constants use UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes
Initialize all externally visible class members in init
For public interfaces, prefer docstrings over comments; comments should be for in-function or file-local interfaces
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes and variables inline with docstrings immediately after assignment
Avoid reflection when a non-reflective approach suffices
Limit except clauses to specific exceptions where possible
When using try/except for duck-typing, keep try body minimal and move logic to else

Files:

  • tests/integration/defs/perf/utils.py
  • tests/integration/defs/perf/test_perf.py
**/*.{cpp,cc,cxx,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

  • tests/integration/defs/perf/utils.py
  • tests/integration/defs/perf/test_perf.py
🧬 Code graph analysis (2)
tests/integration/defs/perf/utils.py (3)
tests/integration/defs/perf/misc.py (1)
  • check_output (42-52)
tests/integration/defs/trt_test_alternative.py (1)
  • popen (195-214)
tests/integration/defs/triton_server/local_venv.py (1)
  • run_cmd (95-107)
tests/integration/defs/perf/test_perf.py (3)
tests/integration/defs/common.py (2)
  • get_cpp_benchmark (711-726)
  • quantize_data (575-625)
tests/integration/defs/conftest.py (2)
  • get_llm_root (166-176)
  • llm_models_root (77-83)
tests/integration/defs/perf/utils.py (5)
  • AbstractPerfScriptTestClass (358-652)
  • PerfBenchScriptTestCmds (205-309)
  • PerfDisaggScriptTestCmds (312-355)
  • PerfMetricType (85-104)
  • PerfScriptTestCmds (107-191)
🪛 Ruff (0.12.2)
tests/integration/defs/perf/test_perf.py

2035-2036: One-line docstring should fit on one line

Reformat to one line

(D200)


2046-2047: One-line docstring should fit on one line

Reformat to one line

(D200)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tests/integration/defs/perf/utils.py (1)

29-30: Confirm check_output supports env
The imported defs.trt_test_alternative.check_output is defined as def check_output(*popenargs, timeout=None, start_new_session=True, **kwargs): and forwards all **kwargs (including env) to the underlying Popen call. No changes required.
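A minimal stand-in showing the forwarding behavior described (this is a sketch of the pattern, not the actual defs.trt_test_alternative implementation): all **kwargs, including env, pass straight through to the underlying subprocess call.

```python
import os
import subprocess
import sys


def check_output(*popenargs, timeout=None, **kwargs):
    """Forward all **kwargs (including env) to subprocess.check_output,
    mirroring how a thin wrapper passes env through to Popen."""
    return subprocess.check_output(*popenargs, timeout=timeout, **kwargs)


# A custom env var set by the caller is visible to the child process.
env = dict(os.environ, MY_FLAG="42")
out = check_output(
    [sys.executable, "-c", "import os; print(os.environ['MY_FLAG'])"],
    env=env,
    text=True,
)
print(out.strip())  # 42
```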

tests/integration/defs/perf/test_perf.py (1)

311-315: Revisit absolute thresholds for disagg metrics (units).

50 ms ignore window for E2EL/TTFT is large; confirm intended unit and target. Consider 5 ms (or a project-specific µs-scale value).

Apply if appropriate:

-    PerfMetricType.DISAGG_SERVER_E2EL: (0.1, 50),  # Ignore E2EL regression < 50ms
-    PerfMetricType.DISAGG_SERVER_TTFT: (0.1, 50),  # Ignore TTFT regression < 50ms
+    PerfMetricType.DISAGG_SERVER_E2EL: (0.1, 5),   # Ignore E2EL regression < 5ms
+    PerfMetricType.DISAGG_SERVER_TTFT: (0.1, 5),   # Ignore TTFT regression < 5ms
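One common way such a (relative, absolute) threshold pair is applied is to flag a regression only when the latency increase clears both the relative fraction and the absolute ignore window; the exact semantics in test_perf.py may differ, so treat this as a sketch:

```python
def is_regression(baseline_ms, current_ms, rel_thresh, abs_thresh_ms):
    """Flag a latency regression only when the increase exceeds both the
    absolute ignore window and the relative threshold over the baseline."""
    delta = current_ms - baseline_ms
    if delta <= abs_thresh_ms:
        return False  # inside the absolute ignore window
    return delta > baseline_ms * rel_thresh


# With a 50 ms window, a 40 ms TTFT jump on a 100 ms baseline is ignored...
print(is_regression(100.0, 140.0, 0.1, 50))  # False
# ...while the proposed 5 ms window would flag the same jump.
print(is_regression(100.0, 140.0, 0.1, 5))   # True
```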

@tensorrt-cicd
Collaborator

PR_Github #17672 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #17672 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13285 completed with status: 'FAILURE'

Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
@bo-nv bo-nv force-pushed the user/bo/main-add-disaggring-perf-tests branch from b35f82d to c5be58f Compare September 5, 2025 02:04
@bo-nv
Collaborator Author

bo-nv commented Sep 5, 2025

/bot run

2 similar comments
@bo-nv
Collaborator Author

bo-nv commented Sep 5, 2025

/bot run

@bo-nv
Collaborator Author

bo-nv commented Sep 5, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #17732 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #17732 [ run ] completed with state DISABLED
L0 testing is limited to prioritized users. User bo-nv is not in the prioritized list. L0 testing cannot be triggered.

@bo-nv
Collaborator Author

bo-nv commented Sep 5, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #17749 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #17749 [ run ] completed with state DISABLED
L0 testing is limited to prioritized users. User bo-nv is not in the prioritized list. L0 testing cannot be triggered.

@bo-nv
Collaborator Author

bo-nv commented Sep 5, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #17766 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #17766 [ run ] completed with state DISABLED
L0 testing is limited to prioritized users. User bo-nv is not in the prioritized list. L0 testing cannot be triggered.

@bo-nv
Collaborator Author

bo-nv commented Sep 7, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #17916 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #17916 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13423 completed with status: 'FAILURE'

@bo-nv
Collaborator Author

bo-nv commented Sep 7, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #17948 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #17948 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13451 completed with status: 'SUCCESS'

@bo-nv bo-nv changed the title [TRTLLM-7871][infra] (DO NOT REVIEW) initial changes [TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. Sep 8, 2025
@bo-nv bo-nv requested a review from Shixiaowei02 September 8, 2025 02:11
@bo-nv bo-nv self-assigned this Sep 8, 2025
@bo-nv bo-nv requested review from LarryXFly and ruodil September 8, 2025 02:14
@bo-nv
Copy link
Collaborator Author

bo-nv commented Sep 8, 2025

@Shixiaowei02 @LarryXFly @ruodil please help review, thanks!

@bo-nv bo-nv merged commit bf57829 into NVIDIA:main Sep 10, 2025
8 of 9 checks passed
Wong4j pushed a commit to Wong4j/TensorRT-LLM that referenced this pull request Sep 20, 2025
MrGeva pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Sep 21, 2025