[TRTLLM-7733][feat] Executor changes to support helix parallelism by brb-nv · Pull Request #7972 · NVIDIA/TensorRT-LLM · GitHub

Conversation

@brb-nv
Collaborator

@brb-nv brb-nv commented Sep 24, 2025

Description

This PR makes initial executor changes to support helix parallelism, which is a form of context parallelism.

Main things to note:

  • Helix parallelism is a decode-only feature that runs with disaggregated serving on the generation server.
  • We partition the sequence into blocks and then distribute those blocks across CP ranks. Currently, we assign contiguous blocks to the same CP rank; alternatively, we could assign blocks in a round-robin fashion (see the sketch after this list).
  • In prepare_inputs, we do an allgather among CP ranks to find the full prompt length of the sequence, which determines the query token's position_id. This is important for computing the correct rope embeddings for the query token.
  • The feature is not yet exposed through trtllm-serve, quickstart_advanced, etc. This will be done once the remaining follow-up changes are merged.
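
To make the contiguous block assignment and the allgathered position_id concrete, here is a minimal standalone sketch. The helper names (partition_blocks_contiguous, query_position_id) are illustrative only and are not the actual helpers in executor_request_queue.py or model_engine.py:

from typing import Callable, List

def partition_blocks_contiguous(tokens: List[int], tokens_per_block: int,
                                cp_size: int, cp_rank: int) -> List[int]:
    """Split the prompt into blocks and give each CP rank a contiguous run of blocks."""
    blocks = [
        tokens[i:i + tokens_per_block]
        for i in range(0, len(tokens), tokens_per_block)
    ]
    blocks_per_rank = (len(blocks) + cp_size - 1) // cp_size
    start = cp_rank * blocks_per_rank
    # A round-robin assignment would instead take blocks[cp_rank::cp_size].
    my_blocks = blocks[start:start + blocks_per_rank]
    return [tok for block in my_blocks for tok in block]

def query_position_id(local_len: int, cp_allgather: Callable[[int], List[int]]) -> int:
    """No single CP rank holds the full prompt, so the decode query token's position_id
    is reconstructed by summing the per-rank lengths gathered across the CP group."""
    return sum(cp_allgather(local_len))

if __name__ == "__main__":
    prompt = list(range(10))  # 10 prompt tokens, tokens_per_block=4, cp_size=2
    for rank in range(2):
        print(rank, partition_blocks_contiguous(prompt, 4, 2, rank))
    # Fake allgather for a 2-rank CP group: this rank holds 8 tokens, the other holds 2.
    print(query_position_id(8, lambda n: [n, 2]))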

Open items to be addressed in a future PR:

  • For a given query token at decode, only one of the CP ranks should write the token's KV to its KV cache.

Test Coverage

Unit tests:

$ pytest tests/unittest/_torch/executor/test_pytorch_model_engine.py::PyTorchModelEngineTestCase::test_prepare_tp_inputs_with_helix_parallelism -s -v
$ pytest tests/unittest/_torch/executor/test_executor_request_queue.py::test_merge_helix_requests_insufficient_blocks_error -s -v
$ pytest tests/unittest/_torch/executor/test_executor_request_queue.py::test_merge_helix_requests_without_padding -s -v
$ pytest tests/unittest/_torch/executor/test_executor_request_queue.py::test_merge_helix_requests_with_padding -s -v

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
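
For example, a typical invocation that reruns only specific test stages without fail-fast might look like the following (the stage name is taken from the examples above and is purely illustrative):

/bot run --stage-list "A10-PyTorch-1" --disable-fail-fast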

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause the top of tree to break.

@brb-nv brb-nv force-pushed the user/brb/executor-changes-for-helix branch 5 times, most recently from 6714ba0 to 7ece934 on September 25, 2025 23:08
@brb-nv brb-nv marked this pull request as ready for review September 25, 2025 23:08
@brb-nv brb-nv requested review from a team as code owners September 25, 2025 23:08
@brb-nv
Collaborator Author

brb-nv commented Sep 25, 2025

/bot run

@coderabbitai
Contributor

coderabbitai bot commented Sep 25, 2025

📝 Walkthrough

Introduces Helix context parallelism across executor request merging, model input preparation, and state updates; adds CP communicator creation and CP allgather support; replaces the single has_cp with has_cp_ulysses/has_cp_helix; extends LlmRequest creation to accept position_ids; adds tests for Helix merging and TP input preparation.

Changes

  • Distributed CP communicator and capability flags — tensorrt_llm/_torch/distributed/communicator.py: Replaces has_cp with has_cp_ulysses and has_cp_helix; creates the CP communicator in __init__; adds create_cp_comm() and cp_allgather(obj); updates broadcast logic to use the Ulysses-specific check and message (see the sketch after this list).
  • Helix request merging path — tensorrt_llm/_torch/pyexecutor/executor_request_queue.py: Adds _merge_helix_requests(self, new_requests, tokens_per_block) to partition tokens/position_ids into blocks per CP rank, pad/trim, and attach child requests; updates _merge_requests to route HELIX to this path and refines the unsupported cp_type error.
  • Model engine HELIX handling — tensorrt_llm/_torch/pyexecutor/model_engine.py, tensorrt_llm/_torch/pyexecutor/py_executor.py: Skips warmup when cp_type is set; explicitly handles HELIX in input preparation and request-state updates by routing through TP paths; clarifies unsupported cp_type assertions/errors; ensures TP preparation/update is invoked post-branch.
  • Position IDs plumbed to LlmRequest — tensorrt_llm/_torch/pyexecutor/llm_request.py: Adds an optional position_ids parameter to executor_request_to_llm_request and forwards it to LlmRequest.
  • Tests: Helix merge and TP prepare — tests/unittest/_torch/executor/test_executor_request_queue.py, tests/unittest/_torch/executor/test_pytorch_model_engine.py: Adds HELIX merging tests (happy path and insufficient-blocks error); adds a TP-inputs-with-HELIX test using a mocked cp_allgather; extends imports (e.g., MagicMock, AttentionMetadata, CpType).
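
As a rough illustration of the new CP communicator and cp_allgather(obj) mentioned in the first item above, here is a minimal mpi4py sketch. It assumes an MPI backend and uses an illustrative class name (CpCommSketch); the actual communicator.py implementation may differ in structure and naming:

from mpi4py import MPI

class CpCommSketch:
    """Illustrative CP communicator: a sub-communicator over the ranks in cp_group."""

    def __init__(self, cp_group):
        world = MPI.COMM_WORLD
        # Build a sub-communicator containing only the ranks of this CP group.
        group = world.Get_group().Incl(cp_group)
        self.comm = world.Create_group(group)

    def cp_allgather(self, obj):
        # Gather one Python object from every CP rank; every rank receives the full list.
        return self.comm.allgather(obj)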

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant App as PyExecutor/ModelEngine
  participant Dist as Distributed
  participant CPComm as CP Communicator
  participant TP as TP Path

  rect rgb(245,248,255)
  note over App: Initialization
  App->>Dist: initialize mapping
  Dist->>Dist: create_cp_comm()
  Dist->>CPComm: instantiate from cp_group
  end

  rect rgb(245,255,245)
  note over App: Input preparation (HELIX)
  App->>App: _prepare_inputs(...)
  alt cp_type == STAR
    App->>TP: _prepare_star_inputs(...)
  else cp_type == HELIX
    App->>TP: _prepare_tp_inputs(...)
    TP->>Dist: cp_allgather(token_counts/metadata)
    Dist->>CPComm: allgather(obj)
    CPComm-->>Dist: gathered obj per CP rank
    Dist-->>TP: aggregated results
  else other cp_type
    App->>App: assert/raise unsupported
  end
  end
sequenceDiagram
  autonumber
  participant Queue as ExecutorRequestQueue
  participant R as RequestQueueItem
  participant Builder as LlmRequest Builder
  participant Child as Child Requests

  rect rgb(255,248,240)
  note over Queue: HELIX request merge
  Queue->>Queue: _merge_requests(new_requests, cp_config)
  alt cp_type == HELIX
    Queue->>Queue: _merge_helix_requests(..., tokens_per_block)
    loop per CP rank
      Queue->>Queue: slice tokens/position_ids into blocks
      Queue->>Queue: pad/trim last-rank padding
      Queue->>Builder: executor_request_to_llm_request(..., position_ids=...)
      Builder-->>Queue: LlmRequest per rank
      Queue->>Child: attach child requests
    end
    Queue-->>Queue: req_with_children (per-rank)
  else STAR/others
    Queue->>Queue: existing merge paths
  end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 24.14%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title Check ✅ Passed — The title "[TRTLLM-7733][feat] Executor changes to support helix parallelism" clearly summarizes the primary change by indicating the ticket, feature type, and the addition of executor support for helix parallelism in a concise and accurate way.
  • Description Check ✅ Passed — The pull request description includes the required ## Description, ## Test Coverage, and ## PR Checklist sections with clear explanations of what and why changes were made, lists the relevant unit tests for coverage, and follows the repository's template structure including bot help instructions.




Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
tensorrt_llm/_torch/pyexecutor/model_engine.py (2)

1526-1554: HELIX: avoid per‑beam allgather and fix position_id to be global.

  • Calling cp_allgather inside the beam loop is redundant and can cause unnecessary collectives.
  • More importantly, generation position_ids for HELIX should use the global position (sum across CP ranks). Compute once per request and reuse across beams.

This also sets up a single global position_id you can reuse for MRoPE generation below (see next comment).

Apply:

             request_ids.append(request.py_request_id)
             beam_width = request.sampling_config.beam_width
-            for beam in range(beam_width):
+            # Compute local and global position_id once per request (avoid per-beam collectives)
+            past_seen_token_num = (request.max_beam_num_tokens - 1
+                                   if (new_tokens_device is None or request.is_dummy or request.py_batch_idx is None)
+                                   else request.max_beam_num_tokens)
+            helix_position_id = past_seen_token_num
+            if self.mapping.has_cp_helix():
+                helix_position_id = sum(self.dist.cp_allgather(past_seen_token_num))
+
+            for beam in range(beam_width):
                 # the request has no previous tensor:
                 # (1) new_tokens_device is None, which means overlap scheduler is disabled; or
                 # (2) a dummy request; or
                 # (3) the first step in the generation server of disaggregated serving
                 if new_tokens_device is None or request.is_dummy or request.py_batch_idx is None:
                     # skip adding input_ids of CUDA graph dummy requests so that new_tokens_device
                     # can be aligned to the correct positions.
                     if not request.is_cuda_graph_dummy:
                         input_ids.append(request.get_last_tokens(beam))
-                    past_seen_token_num = request.max_beam_num_tokens - 1
                 else:
                     # the request has previous tensor
                     # previous_batch_indices is used per request, not per beam
                     # Only append it once for the first beam of each request
                     first_beam = 0
                     if beam == first_beam:
                         previous_batch_indices.append(request.py_batch_idx)
-                    past_seen_token_num = request.max_beam_num_tokens
-                position_id = past_seen_token_num
-                if self.mapping.has_cp_helix():
-                    # Do an allgather among CP ranks to get the complete sequence length seen by all CP ranks.
-                    past_seen_token_nums = self.dist.cp_allgather(
-                        past_seen_token_num)
-                    position_id = sum(past_seen_token_nums)
-                position_ids.append(position_id)
+                position_ids.append(helix_position_id)
                 num_cached_tokens_per_seq.append(past_seen_token_num)

1566-1572: HELIX + MRoPE: position deltas must add to global position, not local.

Generation MRoPE position ids currently add deltas to the local past_seen_token_num, diverging from the global position_ids you just fixed. Use the same global position.

Apply:

-                        gen_mrope_position_ids = (past_seen_token_num +
+                        gen_mrope_position_ids = (helix_position_id +
                                                   mrope_position_deltas).expand(
                                                       3, 1, 1)
tensorrt_llm/_torch/distributed/communicator.py (1)

452-460: Restore broadcast return value (and cover Helix CP)

The refactor drops the return statements, so broadcast() now always yields None, breaking every call site that relied on the object being returned. It also forgets to handle Helix CP, so a Helix-only mapping falls through the else branch and returns None. Please reinstate the returns and cover both CP flavors:

-        if self.mapping.has_cp_ulysses():
-            self.broadcast_cp(obj, root)
-        elif self.mapping.has_tp():
-            self.broadcast_tp(obj, root)
-        else:
-            pass
+        if self.mapping.has_cp_ulysses() or self.mapping.has_cp_helix():
+            return self.broadcast_cp(obj, root)
+        if self.mapping.has_tp():
+            return self.broadcast_tp(obj, root)
+        return obj

Without this, Helix ranks lose their config broadcast entirely.

🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/model_engine.py (1)

754-757: Avoid print; use logger and narrow the warmup skip condition.

Use logger instead of print. Also consider skipping warmup only when CP is actually active (cp_size > 1) and for known types that need it (e.g., STAR/HELIX) rather than any cp_type present.

Proposed change:

-        if cp_type is not None:
-            print("[ModelEngine::warmup] Skipping warmup for cp_type: ",
-                  cp_type.name)
+        if self.mapping.cp_size > 1 and cp_type in (CpType.STAR, CpType.HELIX):
+            logger.info(f"[ModelEngine.warmup] Skipping warmup for cp_type={cp_type.name}")
             return
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1eb6531 and 7ece934.

📒 Files selected for processing (7)
  • tensorrt_llm/_torch/distributed/communicator.py (4 hunks)
  • tensorrt_llm/_torch/pyexecutor/executor_request_queue.py (2 hunks)
  • tensorrt_llm/_torch/pyexecutor/llm_request.py (2 hunks)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py (3 hunks)
  • tensorrt_llm/_torch/pyexecutor/py_executor.py (1 hunks)
  • tests/unittest/_torch/executor/test_executor_request_queue.py (2 hunks)
  • tests/unittest/_torch/executor/test_pytorch_model_engine.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tests/unittest/_torch/executor/test_executor_request_queue.py
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/_torch/pyexecutor/model_engine.py
  • tensorrt_llm/_torch/distributed/communicator.py
  • tests/unittest/_torch/executor/test_pytorch_model_engine.py
  • tensorrt_llm/_torch/pyexecutor/llm_request.py
  • tensorrt_llm/_torch/pyexecutor/executor_request_queue.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tests/unittest/_torch/executor/test_executor_request_queue.py
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/_torch/pyexecutor/model_engine.py
  • tensorrt_llm/_torch/distributed/communicator.py
  • tests/unittest/_torch/executor/test_pytorch_model_engine.py
  • tensorrt_llm/_torch/pyexecutor/llm_request.py
  • tensorrt_llm/_torch/pyexecutor/executor_request_queue.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tests/unittest/_torch/executor/test_executor_request_queue.py
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/_torch/pyexecutor/model_engine.py
  • tensorrt_llm/_torch/distributed/communicator.py
  • tests/unittest/_torch/executor/test_pytorch_model_engine.py
  • tensorrt_llm/_torch/pyexecutor/llm_request.py
  • tensorrt_llm/_torch/pyexecutor/executor_request_queue.py
🧬 Code graph analysis (6)
tests/unittest/_torch/executor/test_executor_request_queue.py (4)
tensorrt_llm/mapping.py (1)
  • CpType (21-29)
tensorrt_llm/_torch/pyexecutor/executor_request_queue.py (2)
  • RequestQueueItem (24-38)
  • _merge_helix_requests (601-661)
tensorrt_llm/_torch/distributed/communicator.py (4)
  • rank (29-30)
  • cp_size (45-46)
  • cp_rank (57-58)
  • cp_config (97-98)
tensorrt_llm/_torch/pyexecutor/llm_request.py (1)
  • LlmRequest (293-464)
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
tensorrt_llm/mapping.py (1)
  • CpType (21-29)
tensorrt_llm/_torch/pyexecutor/model_engine.py (2)
tensorrt_llm/_torch/distributed/communicator.py (2)
  • has_cp_helix (93-94)
  • cp_allgather (380-381)
tensorrt_llm/mapping.py (2)
  • has_cp_helix (414-416)
  • CpType (21-29)
tensorrt_llm/_torch/distributed/communicator.py (2)
tensorrt_llm/_utils.py (1)
  • mpi_comm (482-483)
tensorrt_llm/mapping.py (1)
  • cp_group (376-377)
tests/unittest/_torch/executor/test_pytorch_model_engine.py (5)
tensorrt_llm/_torch/attention_backend/interface.py (1)
  • AttentionMetadata (40-336)
tensorrt_llm/_torch/pyexecutor/scheduler.py (1)
  • ScheduledRequests (18-39)
tensorrt_llm/mapping.py (2)
  • CpType (21-29)
  • Mapping (32-519)
tests/unittest/_torch/executor/test_executor_request_queue.py (1)
  • mock_dist (17-33)
tensorrt_llm/_torch/pyexecutor/resource_manager.py (1)
  • KVCacheManager (142-1022)
tensorrt_llm/_torch/pyexecutor/executor_request_queue.py (3)
tensorrt_llm/_torch/distributed/communicator.py (3)
  • cp_size (45-46)
  • cp_rank (57-58)
  • cp_config (97-98)
tensorrt_llm/mapping.py (2)
  • cp_rank (351-353)
  • CpType (21-29)
tensorrt_llm/_torch/pyexecutor/llm_request.py (1)
  • executor_request_to_llm_request (504-601)
🪛 Ruff (0.13.1)
tensorrt_llm/_torch/pyexecutor/py_executor.py

1718-1718: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

tensorrt_llm/_torch/pyexecutor/model_engine.py

2324-2324: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

tensorrt_llm/_torch/pyexecutor/executor_request_queue.py

618-620: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tensorrt_llm/_torch/pyexecutor/llm_request.py (1)

509-550: Position IDs plumbing looks good

Forwarding position_ids into the LlmRequest ctor is exactly what we need for the Helix path—no further concerns here.

tests/unittest/_torch/executor/test_pytorch_model_engine.py (1)

327-448: (informational) Helix TP input test exercises the right surface area.

@tensorrt-cicd
Collaborator

PR_Github #20000 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20000 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15062 completed with status: 'FAILURE'

@brb-nv brb-nv force-pushed the user/brb/executor-changes-for-helix branch from 7ece934 to 068e1a7 on September 26, 2025 16:51
@brb-nv
Collaborator Author

brb-nv commented Sep 26, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #20111 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20111 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15154 completed with status: 'FAILURE'

Collaborator

@nv-yilinf nv-yilinf left a comment


LGTM. Left some minor comments

Collaborator

@lfr-0531 lfr-0531 left a comment


LGTM~

@brb-nv brb-nv force-pushed the user/brb/executor-changes-for-helix branch from 068e1a7 to bcb3257 on October 1, 2025 17:18
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
@brb-nv brb-nv force-pushed the user/brb/executor-changes-for-helix branch from c58e82d to 251320e on October 1, 2025 17:33
@brb-nv
Collaborator Author

brb-nv commented Oct 1, 2025

/bot run --disable-fail-fast

@brb-nv brb-nv enabled auto-merge (squash) October 1, 2025 17:36
@tensorrt-cicd
Collaborator

PR_Github #20480 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20480 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15445 completed with status: 'ABORTED'

@brb-nv
Collaborator Author

brb-nv commented Oct 1, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #20490 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20490 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15454 completed with status: 'SUCCESS'
Pipeline passed with automatically retried tests. Check the rerun report for details.

@brb-nv brb-nv merged commit bd3d0ad into NVIDIA:main Oct 2, 2025
5 checks passed
faradawn pushed a commit to faradawn/TensorRT-LLM that referenced this pull request Oct 2, 2025
…IDIA#7972)

Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Faradawn Yang <faradawny@gmail.com>
evezhier pushed a commit to evezhier/TensorRT-LLM that referenced this pull request Oct 3, 2025
…IDIA#7972)

Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
faradawn pushed a commit to faradawn/TensorRT-LLM that referenced this pull request Oct 3, 2025
…IDIA#7972)

Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Faradawn Yang <faradawny@gmail.com>