[nvbug/5337601][fix] Fix disagg + speculative decoding #5525

mikeiovine · 2025-06-26T18:34:01Z

Description

Fix two issues with the two-model flow:

SPEC_RESOURCE_MANAGER and DRAFT_KV_CACHE_MANAGER need to call prepare_resources on new requests.
prepare_draft_requests needs to be called for fitting disagg requests.

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

Tabrizian · 2025-06-27T16:52:11Z

tensorrt_llm/_torch/pyexecutor/py_executor.py

                scheduled_batch, fitting_disagg_gen_init_requests, num_fitting_reqs = self._schedule(
                )

+                if self.draft_model_engine is not None or is_ngram:


I think this is not required, since we already added the draft tokens before calling the schedule. It works correctly when I remove this change. Will add more targeted testing to make sure we don't incorrectly allocate KV cache in this scenario.

CC @pcastonguay / @lfr-0531 if I'm missing anything.

raayandhar · 2025-06-27T21:13:23Z

tensorrt_llm/_torch/pyexecutor/py_executor.py

+                    ResourceManagerType.SEQ_SLOT_MANAGER,
+                    ResourceManagerType.SPEC_RESOURCE_MANAGER,
+                    ResourceManagerType.DRAFT_KV_CACHE_MANAGER):
+                if resource_mgr_type in self.resource_manager.resource_managers:


Should we also check if self.resource_manager.resource_managers[resource_mgr_type] is not None? For example, when there is no draft model engine (e.g. ngram?) DRAFT_KV_CACHE_MANAGER is part of the resource manager dictionary, but is set to None, and as a result at this point we get a NoneType error when it calls .prepare_resources(). My impression is that this setting to None is by design?

(This might be a duplicate - I made this comment earlier but it seems to have disappeared or I can't find it. Also, oops, did not mean to review. Can delete if needed.)

Thanks, @raayandhar. I added this check. https://github.com/NVIDIA/TensorRT-LLM/pull/5558/files
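For readers following the thread, the guard discussed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code that landed in #5558: the enum is a stand-in carrying only the three members visible in the diff, and `RecordingManager` and `prepare_new_request_resources` are hypothetical names invented for the example. It assumes, as the comment describes, that the resource managers live in a dictionary in which `DRAFT_KV_CACHE_MANAGER` may be present but set to `None`.

```python
from enum import Enum, auto


class ResourceManagerType(Enum):
    # Stand-in for the real enum; only the members named in the diff.
    SEQ_SLOT_MANAGER = auto()
    SPEC_RESOURCE_MANAGER = auto()
    DRAFT_KV_CACHE_MANAGER = auto()


class RecordingManager:
    """Hypothetical manager that records every batch it is asked to prepare."""

    def __init__(self):
        self.prepared = []

    def prepare_resources(self, scheduled_batch):
        self.prepared.append(scheduled_batch)


def prepare_new_request_resources(resource_managers, scheduled_batch):
    """Call prepare_resources on each manager that is registered and not None.

    Returns the manager types that were actually invoked.
    """
    called = []
    for mgr_type in (ResourceManagerType.SEQ_SLOT_MANAGER,
                     ResourceManagerType.SPEC_RESOURCE_MANAGER,
                     ResourceManagerType.DRAFT_KV_CACHE_MANAGER):
        # dict.get returns None both for a missing key and for a key
        # explicitly mapped to None, so one check covers both cases.
        manager = resource_managers.get(mgr_type)
        if manager is None:
            continue
        manager.prepare_resources(scheduled_batch)
        called.append(mgr_type)
    return called


# Setup resembling the case from the comment: no draft model engine,
# so the draft KV cache manager is registered but set to None.
managers = {
    ResourceManagerType.SEQ_SLOT_MANAGER: RecordingManager(),
    ResourceManagerType.SPEC_RESOURCE_MANAGER: RecordingManager(),
    ResourceManagerType.DRAFT_KV_CACHE_MANAGER: None,
}
called = prepare_new_request_resources(managers, ["req-0"])
```

With only a membership test (`in`), the loop would reach the `None` entry and fail when calling `.prepare_resources()`, which is the error described above; the explicit `None` check skips that entry.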

Tabrizian · 2025-06-27T23:11:29Z

Closing in favour of https://github.com/NVIDIA/TensorRT-LLM/pull/5558/files

[nvbug/5337601][fix] Fix disagg + speculative decoding

f6f2c6d

Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

mikeiovine requested a review from Tabrizian June 26, 2025 18:34

mikeiovine requested a review from a team as a code owner June 26, 2025 18:34

mikeiovine requested a review from schetlur-nv June 26, 2025 18:34

Tabrizian reviewed Jun 27, 2025

raayandhar reviewed Jun 27, 2025

Tabrizian closed this Jun 27, 2025

Tabrizian mentioned this pull request Jul 1, 2025

[nvbug/5337601][fix] Fix disagg + speculative decoding #5558

Merged

mikeiovine deleted the fix-disagg branch July 23, 2025 18:02
