[TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. #7503
📝 Walkthrough
Adds disaggregated-server support to perf tests: new metrics (`DISAGG_SERVER_E2EL`, `DISAGG_SERVER_TTFT`), parsing/thresholds, `PerfTestConfig` disagg runtime/config and command generation, a `PerfDisaggScriptTestCmds` orchestration to launch ctx/gen/server processes plus client/benchmark, and integration into metric collection and result output.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant CI as Test Runner
    participant Config as PerfTestConfig
    participant Cmds as PerfDisaggScriptTestCmds
    participant Ctx as Context Server (ctx)
    participant Gen as Generation Server (gen)
    participant Srv as Disagg Server
    participant Bench as Client / Benchmark
    CI->>Config: set_runtime_configs(disagg_server,...)
    CI->>Config: get_commands()
    Config-->>CI: PerfDisaggScriptTestCmds(ctx_cmd, gen_cmd, server_cmd, client_cmd, benchmark_cmd)
    CI->>Cmds: run_cmd(...)
    note over Cmds: concurrently start background processes
    Cmds->>Ctx: start ctx_cmd
    Cmds->>Gen: start gen_cmd
    Cmds->>Srv: start server_cmd
    Cmds->>Bench: run client_cmd -> benchmark_cmd
    Bench-->>Cmds: stdout (metrics)
    Cmds->>CI: aggregated output (metrics), redacted command
    Cmds->>Ctx: terminate
    Cmds->>Gen: terminate
    Cmds->>Srv: terminate
```
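The diagram above can be condensed into a minimal, runnable sketch of the same orchestration pattern. `run_disagg_benchmark` is a hypothetical name and the commands are placeholders; the real implementation is `PerfDisaggScriptTestCmds.run_cmd` launching `trtllm-serve` processes.

```python
import subprocess
import sys

def run_disagg_benchmark(ctx_cmd, gen_cmd, server_cmd, benchmark_cmd):
    """Launch ctx/gen/server as background processes, run the benchmark in
    the foreground, and always tear the background processes down."""
    ctx = subprocess.Popen(ctx_cmd)      # context workers
    gen = subprocess.Popen(gen_cmd)      # generation workers
    srv = subprocess.Popen(server_cmd)   # disagg server frontend
    try:
        # The benchmark's stdout carries the metrics parsed later.
        return subprocess.check_output(benchmark_cmd, text=True)
    finally:
        for proc in (srv, ctx, gen):
            proc.terminate()
            proc.wait()

# Demo with placeholder commands standing in for the real servers.
sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
metrics = run_disagg_benchmark(
    sleeper, sleeper, sleeper,
    [sys.executable, "-c", "print('Median TTFT (ms): 12.3')"])
```

The `finally` block mirrors the teardown arrows in the diagram: background processes are terminated whether or not the benchmark succeeds.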
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 7
🧹 Nitpick comments (7)
tests/integration/defs/perf/utils.py (4)
387-389: Docstring formatting inconsistency. The one-line docstring should be on a single line as per Python conventions (PEP 257).
```diff
-    def run_disagg_server(self,
-                          full_test_name: str,
-                          venv: Optional[PythonVenvRunnerImpl],
-                          gpu_clock_lock: GPUClockLock,
-                          session_data_writer: SessionDataWriter,
-                          output_dir: str,
-                          outputs: Dict[int, str] = {},
-                          original_test_name: str = None,
-                          cmd_idx: int = 0,
-                          **kwargs) -> List[str]:
-        """
-        Run the disaggregated server and write the results to the output csv and/or yaml files.
-        """
+    def run_disagg_server(self,
+                          full_test_name: str,
+                          venv: Optional[PythonVenvRunnerImpl],
+                          gpu_clock_lock: GPUClockLock,
+                          session_data_writer: SessionDataWriter,
+                          output_dir: str,
+                          outputs: Dict[int, str] = {},
+                          original_test_name: str = None,
+                          cmd_idx: int = 0,
+                          **kwargs) -> List[str]:
+        """Run the disaggregated server and write the results to the output csv and/or yaml files."""
```
409-409: Unused variable `tmpDir` can be removed. The variable `tmpDir` is assigned but never used in the function. The `temp_wd` context manager is likely managing the working directory change automatically.

```diff
-        tmpDir = temp_wd(self.get_working_dir())
```
474-479: Consider more graceful process termination. The processes are terminated unconditionally in the `finally` block. Consider checking process status first and attempting a graceful shutdown before termination.

```diff
 finally:
-    server_proc.terminate()
-    ctx_workers_proc.terminate()
-    gen_workers_proc.terminate()
-    server_proc.wait()
-    ctx_workers_proc.wait()
-    gen_workers_proc.wait()
+    # Gracefully shutdown processes
+    for proc, name in [(server_proc, "server"),
+                       (ctx_workers_proc, "ctx_workers"),
+                       (gen_workers_proc, "gen_workers")]:
+        if proc.poll() is None:  # Process is still running
+            print_info(f"Terminating {name} process...")
+            proc.terminate()
+            try:
+                proc.wait(timeout=5)
+            except subprocess.TimeoutExpired:
+                print_warning(f"{name} process didn't terminate gracefully, killing...")
+                proc.kill()
+                proc.wait()
```
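The terminate-then-kill escalation suggested here can be exercised standalone. `shutdown` is a hypothetical helper name, not part of the PR:

```python
import subprocess
import sys

def shutdown(named_procs, timeout=5.0):
    """Terminate each live process; escalate to kill() if SIGTERM is ignored."""
    for name, proc in named_procs:
        if proc.poll() is not None:
            continue  # already exited, nothing to do
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # Did not exit within the grace period; force it down.
            proc.kill()
            proc.wait()

# A child that would otherwise run for a minute is brought down promptly.
worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
shutdown([("worker", worker)])
```

The `poll()` guard avoids signalling processes that already exited, and the timeout bounds how long teardown can block the test runner.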
637-638: Consider making the placeholder string more descriptive. The serialized command is replaced with a generic placeholder. Consider making it more informative about what's being hidden.

```diff
-# serialized_cmd = self.get_commands().get_cmd_str(cmd_idx)
-serialized_cmd = "placeholder for disaggregated server"
+# serialized_cmd = self.get_commands().get_cmd_str(cmd_idx)
+serialized_cmd = "[REDACTED: disaggregated server multi-process command sequence]"
```

tests/integration/defs/perf/test_perf.py (3)
597-702: Large block of commented-out code should be removed or documented. There's a substantial amount of commented-out code in the `load_from_str` method. If this code is no longer needed, it should be removed. If it's being kept for reference or future use, add a clear comment explaining why. Would you like me to help clean up this commented code or convert it to proper documentation if it's meant to serve as a reference?
1823-2108: Consider breaking down the DisaggregatedServerPerfTest class. The `DisaggregatedServerPerfTest` class is quite large (285 lines) with multiple responsibilities. Consider extracting configuration generation and command building into separate helper classes for better maintainability. Consider using a builder pattern or factory pattern for creating the various configurations (ctx_config, gen_config, server_config) and commands. This would improve testability and make the code more modular.
1849-1901: Many commented configuration options in worker config generation. The `gen_worker_config` method contains many commented-out configuration options. These should either be removed if not needed, or properly documented if they represent future configuration options.

```diff
 def gen_worker_config(self):
+    """Generate worker configurations for context and generation servers.
+
+    Note: Additional configuration options are available but not currently used:
+    - max_batch_size, max_num_tokens, max_seq_len
+    - tensor_parallel_size, pipeline_parallel_size
+    - enable_attention_dp, kv_cache_config
+    See documentation for full configuration options.
+    """
     ctx_config = {
-        # 'max_batch_size': ctx_batch_size,
-        # 'max_num_tokens': ctx_max_num_tokens,
         # ... (remove other commented lines)
         'disable_overlap_scheduler': True,
```
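For reference, the harness serializes per-role worker configs to YAML. A minimal sketch of that step, with keys and values echoing this review's discussion; real values come from the test configuration and the authoritative schema is the PR itself:

```python
import yaml  # PyYAML; the harness writes per-role worker configs as YAML files

# Keys/values echo this review's discussion; placeholder values where the
# real ones come from the test config.
ctx_config = {
    'disable_overlap_scheduler': True,  # context role: prefill only, no overlap
    'kv_cache_config': {
        'enable_block_reuse': False,
        'free_gpu_memory_fraction': 0.5,
        'dtype': 'fp8',
    },
}

with open('ctx_config.yaml', 'w') as f:
    yaml.safe_dump(ctx_config, f)

# Round-trip to confirm the file parses back to the same structure.
with open('ctx_config.yaml') as f:
    loaded = yaml.safe_load(f)
```

Generating the dict in Python and dumping it once keeps the documented-but-unused options in code comments rather than commented-out YAML.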
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- tests/integration/defs/perf/test_perf.py (9 hunks)
- tests/integration/defs/perf/utils.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Filenames compiled into a target must be case-insensitively unique
Files:
tests/integration/defs/perf/utils.py, tests/integration/defs/perf/test_perf.py
**/*.{h,hpp,hh,hxx,cc,cpp,cxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use spaces, not tabs; indent 4 spaces
Files:
tests/integration/defs/perf/utils.py, tests/integration/defs/perf/test_perf.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Code must target Python 3.8+
Indent with 4 spaces; do not use tabs (Python)
Maintain module namespace on import: prefer from package.subpackage import foo; use foo.Symbol()
Python filenames use snake_case
Python class names use PascalCase
Python functions and methods use snake_case
Python local variables use snake_case; if starting with a number concept, prefix with k (e.g., k_99th_percentile)
Python global variables use G_ prefix with UPPER_SNAKE_CASE
Python constants use UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes
Initialize all externally visible class members in `__init__`
For public interfaces, prefer docstrings over comments; comments should be for in-function or file-local interfaces
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes and variables inline with docstrings immediately after assignment
Avoid reflection when a non-reflective approach suffices
Limit except clauses to specific exceptions where possible
When using try/except for duck-typing, keep try body minimal and move logic to else
Files:
tests/integration/defs/perf/utils.py, tests/integration/defs/perf/test_perf.py
**/*.{cpp,cc,cxx,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tests/integration/defs/perf/utils.py, tests/integration/defs/perf/test_perf.py
🧬 Code graph analysis (2)
tests/integration/defs/perf/utils.py (6)
- tests/integration/defs/perf/misc.py (1): check_output (42-52)
- tests/integration/defs/trt_test_alternative.py (1): popen (195-214)
- tests/integration/defs/triton_server/local_venv.py (1): PythonVenvRunnerImpl (14-113)
- tests/integration/defs/perf/gpu_clock_lock.py (2): GPUClockLock (67-360), InvalidGPUMonitoringResultError (37-38)
- tests/integration/defs/perf/session_data_writer.py (1): SessionDataWriter (26-80)
- tests/integration/defs/perf/test_perf.py (4): get_commands (1396-1486), get_commands (1993-1997), get_perf_result (1488-1566), get_perf_result (2084-2098)

tests/integration/defs/perf/test_perf.py (1)
- tests/integration/defs/perf/utils.py (11): PerfMetricType (85-104), AbstractPerfScriptTestClass (312-720), set_runtime_configs (327-331), get_test_name (318-324), get_commands (334-338), run_disagg_server (387-498), get_result_state (375-379), get_perf_result (341-345), get_threshold (348-352), get_absolute_threshold (354-359), get_metric_type (361-366)
🪛 Ruff (0.12.2)
tests/integration/defs/perf/utils.py
389-390: One-line docstring should fit on one line
Reformat to one line
(D200)
400-400: Local variable tmpDir is assigned to but never used
Remove assignment to unused variable tmpDir
(F841)
tests/integration/defs/perf/test_perf.py
2014-2015: One-line docstring should fit on one line
Reformat to one line
(D200)
2055-2055: Redefinition of unused set_runtime_configs from line 1818
(F811)
2090-2091: One-line docstring should fit on one line
Reformat to one line
(D200)
2101-2102: One-line docstring should fit on one line
Reformat to one line
(D200)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tests/integration/defs/perf/utils.py (2)
103-104: LGTM! New metric types added correctly. The new disaggregated server metric types `DISAGG_SERVER_E2EL` and `DISAGG_SERVER_TTFT` are appropriately added to the `PerfMetricType` enum following the established naming pattern.
29-30: Verified correct `popen` import: `defs.trt_test_alternative` defines `popen` with proper session handling and cleanup logic, so the import is correct.

tests/integration/defs/perf/test_perf.py (3)
25-25: LGTM! YAML import added for config generation. The yaml import is appropriately added to support the disaggregated server configuration file generation.
273-278: LGTM! Disaggregated server metrics properly integrated. The new `DISAGG_SERVER_METRICS_LOG_QUERIES` dictionary and `DISAGG_SERVER_METRICS` list are correctly defined with appropriate regex patterns for parsing E2EL and TTFT metrics.
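Those regexes can be sanity-checked against sample lines in the benchmark's reported format (the sample text below is illustrative, not actual tool output):

```python
import re

# The regexes quoted in this review for pulling medians out of benchmark output.
E2EL_RE = re.compile(r"Median E2EL \(ms\):\s*(\d+\.?\d*)")
TTFT_RE = re.compile(r"Median TTFT \(ms\):\s*(\d+\.?\d*)")

# Illustrative sample lines only; real output formatting may differ slightly.
sample = "Median TTFT (ms):   57.3\nMedian E2EL (ms):   812.0\n"

ttft = float(TTFT_RE.search(sample).group(1))
e2el = float(E2EL_RE.search(sample).group(1))
```

The single capture group isolates the numeric value, so the caller can convert straight to `float` for threshold comparison.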
1941-1942: Review GPU allocation in perf tests. In tests/integration/defs/perf/test_perf.py:1941-1942, overriding `CUDA_VISIBLE_DEVICES` inline can conflict with MPI or other GPU resource managers. Confirm these commands correctly isolate GPUs in your multi-GPU environment or switch to your harness's native GPU-selection mechanism.
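One conventional way to scope GPU visibility to a single child process rather than mutating the parent environment (sketch; `launch_on_gpus` is a hypothetical helper, not the PR's code):

```python
import os
import subprocess
import sys

def launch_on_gpus(cmd, gpu_ids):
    """Give one child its own CUDA_VISIBLE_DEVICES without touching
    os.environ, so concurrent workers (ctx/gen) don't clobber each other."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

# The child sees only the devices it was given.
result = launch_on_gpus(
    [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    [0, 1])
```

Passing `env=` per `Popen`/`run` call is also easier to reconcile with an MPI launcher, which may itself rewrite the variable.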
Actionable comments posted: 3
♻️ Duplicate comments (3)
tests/integration/defs/perf/utils.py (1)
312-356: Add timeout handling for disaggregated server processes. The disaggregated server orchestration launches multiple processes but doesn't have explicit timeout handling for each process. Consider adding timeout parameters to prevent indefinite hangs. Apply this diff to add timeout handling:

```diff
 finally:
-    server_proc.terminate()
-    ctx_workers_proc.terminate()
-    gen_workers_proc.terminate()
-    server_proc.wait()
-    ctx_workers_proc.wait()
-    gen_workers_proc.wait()
+    # Terminate processes gracefully with timeout
+    for proc, name in [(server_proc, 'server'),
+                       (ctx_workers_proc, 'ctx_workers'),
+                       (gen_workers_proc, 'gen_workers')]:
+        if proc is not None:
+            proc.terminate()
+            try:
+                proc.wait(timeout=60)
+            except subprocess.TimeoutExpired:
+                print_error(f"Process {name} did not terminate gracefully, killing...")
+                proc.kill()
+                proc.wait()
```

tests/integration/defs/perf/test_perf.py (2)
311-312: Review threshold values for disaggregated server metrics. The thresholds for `DISAGG_SERVER_E2EL` and `DISAGG_SERVER_TTFT` are set to (0.1, 50) with a comment "Ignore < 50ms?". Please confirm these are the intended values for your performance regression detection.
774-776: Add validation for disaggregated server configurations. The `validate` method returns early for disaggregated server configurations, bypassing all validation logic. Consider adding at least minimal validation for disaggregated server-specific parameters.
🧹 Nitpick comments (3)
tests/integration/defs/perf/test_perf.py (3)
1570-1570: Remove or guard debug print statement. The `print_info` call outputs the entire command output which could be verbose and unnecessary for production use.

```diff
-    print_info(outputs[cmd_idx].split("\n"))
+    # Debug output - uncomment if needed for troubleshooting
+    # print_info(outputs[cmd_idx].split("\n"))
```
1899-1945: Extract magic numbers into constants for configuration generation. The disaggregated worker configuration contains several hardcoded values that should be configurable or at least defined as constants.

```diff
+    # Configuration constants
+    _DEFAULT_MAX_BATCH_SIZE = 32
+    _DEFAULT_MAX_NUM_TOKENS = 4096
+    _DEFAULT_MAX_SEQ_LEN = 4096
+    _DEFAULT_FREE_GPU_MEM_FRACTION = 0.5
+    _DEFAULT_CUDA_GRAPH_BATCH_SIZES = [1, 2, 4, 8, 16, 32]
+
     def _gen_disagg_worker_config(self):
         ctx_config = {
-            'max_batch_size': 32,
-            'max_num_tokens': 4096,
-            'max_seq_len': 4096,
+            'max_batch_size': self._DEFAULT_MAX_BATCH_SIZE,
+            'max_num_tokens': self._DEFAULT_MAX_NUM_TOKENS,
+            'max_seq_len': self._DEFAULT_MAX_SEQ_LEN,
             'tensor_parallel_size': self._config.ctx_tp_size,
             'enable_attention_dp': self._config.ctx_dp_size > 1,
             'print_iter_log': True,
             'disable_overlap_scheduler': True,
             'kv_cache_config': {
                 'enable_block_reuse': False,
-                'free_gpu_memory_fraction': 0.5,
+                'free_gpu_memory_fraction': self._DEFAULT_FREE_GPU_MEM_FRACTION,
                 'dtype': 'fp8',
             },
```
2008-2046: Consider making benchmark parameters configurable. The benchmark command uses hardcoded values for input/output lengths and number of prompts. Consider making these configurable based on the test configuration.

```diff
 def _get_disagg_benchmark_command(self):
     benchmark_script = os.path.join(self._llm_root, "tensorrt_llm", "serve",
                                     "scripts", "benchmark_serving.py")
     model_path = MODEL_PATH_DICT[self._config.model_name]
     model_dir = os.path.join(llm_models_root(), model_path)
     shared_gpt_path = os.path.join(
         llm_models_root(), "datasets",
         "ShareGPT_V3_unfiltered_cleaned_split.json")
+
+    # Use test configuration values if available, otherwise use defaults
+    input_len = self._config.input_lens[0] if self._config.input_lens else 1024
+    output_len = self._config.output_lens[0] if self._config.output_lens else 1024
+    num_prompts = self._config.num_reqs if hasattr(self._config, 'num_reqs') else 320
+
     benchmark_cmd = [
         'python3',
         benchmark_script,
         '--model',
         model_dir,
         '--tokenizer',
         model_dir,
         '--dataset-name',
         'random',
         '--dataset-path',
         shared_gpt_path,
         '--random-input-len',
-        '1024',
+        str(input_len),
         '--random-output-len',
-        '1024',
+        str(output_len),
         '--random-prefix-len',
         '0',
         '--num-prompts',
-        '320',
+        str(num_prompts),
```
📜 Review details
📒 Files selected for processing (2)
- tests/integration/defs/perf/test_perf.py (15 hunks)
- tests/integration/defs/perf/utils.py (4 hunks)
🧬 Code graph analysis (2)
tests/integration/defs/perf/utils.py (2)
- tests/integration/defs/perf/misc.py (1): check_output (42-52)
- tests/integration/defs/trt_test_alternative.py (1): popen (195-214)

tests/integration/defs/perf/test_perf.py (3)
- tests/integration/defs/common.py (3): convert_weights (163-522), get_cpp_benchmark (711-726), quantize_data (575-625)
- tests/integration/defs/conftest.py (2): get_llm_root (166-176), llm_models_root (77-83)
- tests/integration/defs/perf/utils.py (3): PerfDisaggScriptTestCmds (312-355), PerfMetricType (85-104), PerfScriptTestCmds (107-191)
🪛 Ruff (0.12.2)
tests/integration/defs/perf/test_perf.py
2030-2031: One-line docstring should fit on one line
Reformat to one line
(D200)
2041-2042: One-line docstring should fit on one line
Reformat to one line
(D200)
🔇 Additional comments (9)
tests/integration/defs/perf/utils.py (1)
103-104: LGTM! New disaggregated server metrics added correctly. The new `DISAGG_SERVER_E2EL` and `DISAGG_SERVER_TTFT` metrics are properly added to the `PerfMetricType` enum following the existing naming convention.

tests/integration/defs/perf/test_perf.py (8)
25-25: LGTM! Import added for YAML configuration generation. The yaml import is properly added to support configuration file generation for disaggregated server tests.
257-260: LGTM! Disaggregated server metric regex patterns defined. The regex patterns for E2EL and TTFT metrics are correctly defined and match the expected output format.
464-468: LGTM! Disaggregated server configuration attributes added. The new attributes for disaggregated server configuration are properly initialized in the constructor.
962-963: LGTM! Benchmark script set to None for disaggregated server. Correctly sets `benchmark_script` to `None` for disaggregated server runtime since it doesn't use a traditional benchmark script.
1453-1460: LGTM! Disaggregated server command generation integrated. The code correctly generates and returns `PerfDisaggScriptTestCmds` for disaggregated server tests with all required deployment commands.
1750-1764: LGTM! Disaggregated server metrics properly integrated. The code correctly handles disaggregated server metrics by using `DISAGG_SERVER_METRICS` and the appropriate log query patterns.
2056-2056: Fix hardcoded test runner instantiation. The `run_perf_test` function now always creates a `MultiMetricPerfTest` instance. This should be conditional based on the test configuration. The code shows `MultiMetricPerfTest` which is correct, not `DisaggregatedServerPerfTest` as mentioned in past comments. The actual implementation handles disaggregated server tests within `MultiMetricPerfTest` through the configuration's `runtime` property.
509-511: Remove debug print statement in to_string method. The method correctly delegates to `_to_string_disagg` for disaggregated server runtime, but contains an incorrect return statement placement.

```python
elif self.runtime == "disagg_server":  # trtllm-serve
    entries.append(f"disagg_server")
    return self._to_string_disagg(entries)
```

The code is actually correct - it properly returns early for the disagg_server case. My apologies for the confusion.
/bot run
Actionable comments posted: 3
♻️ Duplicate comments (4)
tests/integration/defs/perf/utils.py (2)
319-351: Harden teardown: guard unbound locals and add wait timeouts with kill fallback. Unconditional `.terminate()`/`.wait()` can raise UnboundLocalError if setup fails and can hang without timeouts. Apply:

```diff
 def run_cmd(self, cmd_idx: int, venv) -> str:
     output = ""
+    server_proc = ctx_workers_proc = gen_workers_proc = None
+    TIMEOUT_SEC = 60
     try:
         with (
             # Start ctx workers
@@
             check_output(self.client_cmd, env=venv._new_env)
             output += check_output(self.benchmark_cmd, env=venv._new_env)
     finally:
-        server_proc.terminate()
-        ctx_workers_proc.terminate()
-        gen_workers_proc.terminate()
-        server_proc.wait()
-        ctx_workers_proc.wait()
-        gen_workers_proc.wait()
+        for p in (server_proc, ctx_workers_proc, gen_workers_proc):
+            if p is None:
+                continue
+            try:
+                p.terminate()
+                p.wait(timeout=TIMEOUT_SEC)
+            except Exception:
+                try:
+                    p.kill()
+                    p.wait(timeout=10)
+                except Exception:
+                    pass
     return output
```
354-356: Return a string and document placeholder rationale. `get_cmd_str` returns a list here but other paths return a string; this can break result serialization. Also, add a short comment with the TODO/tracking ID for the placeholder. Apply:

```diff
-    def get_cmd_str(self, cmd_idx) -> List[str]:
-        return ["disaggregated server tests, please check config files"]
+    def get_cmd_str(self, cmd_idx) -> List[str]:
+        # TODO(TRACK-ID): disagg command serialization not yet implemented; emit placeholder for now.
+        return "disaggregated server tests, please check config files"
```

tests/integration/defs/perf/test_perf.py (2)
766-779: Add minimal validation for disagg configs. Early return skips all validation. Assert sane worker counts to fail fast. Apply:

```diff
-    if self.runtime == "disagg_server":
-        # TODO: validate disaggregated server config
-        return
+    if self.runtime == "disagg_server":
+        assert self.ctx_server_workers >= 0 and self.gen_server_workers >= 0, "Worker counts must be non-negative"
+        assert (self.ctx_server_workers + self.gen_server_workers) > 0, "At least one disagg worker must be configured"
+        return
```
1901-1947: Duplicate key in `ctx_config` (`'disable_overlap_scheduler'`). The key appears twice; Python keeps the last, hiding the first. Remove the duplicate. Apply:

```diff
     ctx_config = {
 @@
-        'disable_overlap_scheduler': True,
         'kv_cache_config': {
             'enable_block_reuse': False,
             # 'free_gpu_memory_fraction': ctx_free_gpu_memory_fraction,
             'free_gpu_memory_fraction': 0.5,
             'dtype': 'fp8',
         },
-        'disable_overlap_scheduler': True,
+        'disable_overlap_scheduler': True,
```
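A quick demonstration of why the duplicate key is silently tolerated: Python dict literals keep only the last value for a repeated key, so the first copy is dead code and editing it would have no effect.

```python
# Illustrative only: a repeated key in a dict literal raises no error;
# the last occurrence simply wins.
cfg = {
    'disable_overlap_scheduler': True,
    'kv_cache_config': {'enable_block_reuse': False},
    'disable_overlap_scheduler': False,  # last occurrence wins
}
```

Static checkers (e.g. a linter rule for duplicate dict keys) catch this class of bug before it ships.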
🧹 Nitpick comments (2)
tests/integration/defs/perf/test_perf.py (2)
1568-1574: Avoid dumping entire output logs on every metric parse. This is noisy and bloats logs. Consider gating behind a debug flag. Apply:

```diff
-    print_info(outputs[cmd_idx].split("\n"))
+    if os.environ.get("TRTLLM_DEBUG") == "1":
+        print_info(outputs[cmd_idx].split("\n"))
```
278-284: Remove unused `DISAGG_SERVER_METRICS_LOG_QUERIES`. `get_metric_regex` reads from `PERF_METRIC_LOG_QUERIES`; this map is redundant. Apply:

```diff
-DISAGG_SERVER_METRICS_LOG_QUERIES = {
-    PerfMetricType.DISAGG_SERVER_E2EL:
-        re.compile(r"Median E2EL \(ms\):\s*(\d+\.?\d*)"),
-    PerfMetricType.DISAGG_SERVER_TTFT:
-        re.compile(r"Median TTFT \(ms\):\s*(\d+\.?\d*)"),
-}
```
📜 Review details
📒 Files selected for processing (3)
- tests/integration/defs/perf/test_perf.py (15 hunks)
- tests/integration/defs/perf/utils.py (3 hunks)
- tests/integration/test_lists/qa/llm_perf_sanity.yml (1 hunk)
🧬 Code graph analysis (2)
tests/integration/defs/perf/utils.py (3)
- tests/integration/defs/perf/misc.py (1): check_output (42-52)
- tests/integration/defs/trt_test_alternative.py (1): popen (195-214)
- tests/integration/defs/triton_server/local_venv.py (1): run_cmd (95-107)

tests/integration/defs/perf/test_perf.py (3)
- tests/integration/defs/common.py (2): get_cpp_benchmark (711-726), quantize_data (575-625)
- tests/integration/defs/conftest.py (2): get_llm_root (166-176), llm_models_root (77-83)
- tests/integration/defs/perf/utils.py (5): AbstractPerfScriptTestClass (358-652), PerfBenchScriptTestCmds (205-309), PerfDisaggScriptTestCmds (312-355), PerfMetricType (85-104), PerfScriptTestCmds (107-191)
🪛 Ruff (0.12.2)
tests/integration/defs/perf/test_perf.py
2035-2036: One-line docstring should fit on one line
Reformat to one line
(D200)
2046-2047: One-line docstring should fit on one line
Reformat to one line
(D200)
🔇 Additional comments (2)
tests/integration/defs/perf/utils.py (1)
29-30: Confirmcheck_outputsupportsenv
The importeddefs.trt_test_alternative.check_outputis defined asdef check_output(*popenargs, timeout=None, start_new_session=True, **kwargs):and forwards all**kwargs(includingenv) to the underlyingPopencall. No changes required.tests/integration/defs/perf/test_perf.py (1)
311-315: Revisit absolute thresholds for disagg metrics (units). The 50 ms ignore window for E2EL/TTFT is large; confirm the intended unit and target. Consider 5 ms (or a project-specific µs-scale value). Apply if appropriate:

```diff
-    PerfMetricType.DISAGG_SERVER_E2EL: (0.1, 50),  # Ignore E2EL regression < 50ms
-    PerfMetricType.DISAGG_SERVER_TTFT: (0.1, 50),  # Ignore TTFT regression < 50ms
+    PerfMetricType.DISAGG_SERVER_E2EL: (0.1, 5),  # Ignore E2EL regression < 5ms
+    PerfMetricType.DISAGG_SERVER_TTFT: (0.1, 5),  # Ignore TTFT regression < 5ms
```
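This review doesn't show the harness's gating logic, but a plausible reading of the `(relative, absolute)` tuples is that a delta must clear both bounds to count as a regression. A sketch under that assumption (`is_regression` is a hypothetical name):

```python
def is_regression(baseline_ms, current_ms, rel_threshold, abs_floor_ms):
    """Flag a regression only when the delta exceeds BOTH the relative
    threshold and the absolute ignore window, e.g. the (0.1, 50) tuples."""
    delta = current_ms - baseline_ms
    if delta <= abs_floor_ms:
        return False  # within the absolute ignore window
    return delta > baseline_ms * rel_threshold

# With (0.1, 50): a 40 ms slip on a 300 ms baseline is ignored,
# while a 60 ms slip (>= 50 ms and > 10%) is flagged.
small_slip = is_regression(300.0, 340.0, 0.1, 50)
large_slip = is_regression(300.0, 360.0, 0.1, 50)
```

The absolute floor suppresses noise on fast metrics like TTFT, where a 10% relative change can be only a few milliseconds.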
PR_Github #17672 [ run ] triggered by Bot

PR_Github #17672 [ run ] completed with state

Signed-off-by: Bo Deng <deemod@nvidia.com>

b35f82d to c5be58f

/bot run

/bot run

/bot run

PR_Github #17732 [ run ] triggered by Bot

PR_Github #17732 [ run ] completed with state

/bot run

PR_Github #17749 [ run ] triggered by Bot

PR_Github #17749 [ run ] completed with state

/bot run

PR_Github #17766 [ run ] triggered by Bot

PR_Github #17766 [ run ] completed with state

/bot run

PR_Github #17916 [ run ] triggered by Bot

PR_Github #17916 [ run ] completed with state

/bot run

PR_Github #17948 [ run ] triggered by Bot

PR_Github #17948 [ run ] completed with state

@Shixiaowei02 @LarryXFly @ruodil please help review, thanks!
…ests. (NVIDIA#7503) Signed-off-by: Bo Deng <deemod@nvidia.com>
Summary by CodeRabbit
New Features
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`

Provide a user friendly way for developers to interact with a Jenkins server.

Run `/bot [-h|--help]` to print this help message. See details below for each supported subcommand.

run

`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]`

Launch build/test pipelines. All previously running jobs will be killed.

- `--reuse-test (optional)pipeline-id` (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL): Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL): Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL): Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL): Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md` and the `scripts/test_to_stage_mapping.py` helper.

kill

`kill`: Kill all running builds associated with pull request.

skip

`skip --comment COMMENT`: Skip testing for latest commit on pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

`reuse-pipeline`: Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.