[None][doc] Update kvcache part #7549
Conversation
📝 Walkthrough

Documentation updates clarifying KV cache configuration and retention semantics, adding KvCacheRetentionConfig API examples, reformatting code blocks, and removing/relaxing speculative-decoding constraints (the overlap scheduler is auto-disabled for two-model setups). Minor wording and capitalization edits across related docs.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant U as User
    participant L as LLM API
    participant E as Engine
    participant K as KV Cache Manager
    U->>L: llm.generate(prompts, kv_cache_config, kv_cache_retention_config)
    L->>E: initialize/dispatch with configs
    E->>K: allocate/tag KV blocks (respect kv_cache_config)
    rect rgb(220,235,255)
        note over K: Apply retention policy<br/>token-range priorities & default priority
        E->>K: tag blocks with priorities/durations
    end
    alt Reuse eligible
        E->>K: reuse high-priority blocks
    else Eviction needed
        K-->K: evict lower-priority blocks first
    end
    E-->>L: token stream
    L-->>U: generated output
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Force-pushed from 94a6315 to 9b44c22
Actionable comments posted: 2
🧹 Nitpick comments (8)
docs/source/features/speculative-decoding.md (3)
188-189: Grammar: add article and tighten phrasing.
“Two-model based speculation implementations do not support overlap scheduler. It will be disabled automatically.” → improve clarity.

```diff
-Two-model based speculation implementations do not support overlap scheduler. It will be disabled automatically.
+Two-model-based speculation implementations do not support the overlap scheduler; it is disabled automatically.
```
43-44: Make overlap-scheduler guidance consistent with the “auto-disabled” statement.
Examples still pass `disable_overlap_scheduler=True` unconditionally. Either remove it for two-model setups or gate it on `eagle3_one_model`.

```diff
-llm = LLM("/path/to/target_model", speculative_config=speculative_config, disable_overlap_scheduler=True)
+llm = LLM("/path/to/target_model", speculative_config=speculative_config)
```

```diff
-# Only need to disable overlap scheduler if eagle3_one_model is False.
-llm = LLM("/path/to/target_model", speculative_config=speculative_config, disable_overlap_scheduler=True)
+# Disable overlap scheduler only for the two-model path.
+llm = LLM("/path/to/target_model",
+          speculative_config=speculative_config,
+          disable_overlap_scheduler=not eagle3_one_model)
```

```diff
-llm = LLM("/path/to/target_model", speculative_config=speculative_config, disable_overlap_scheduler=True)
+llm = LLM("/path/to/target_model", speculative_config=speculative_config)
```

Also consider updating the YAML example (Lines 134–140) to reflect this nuance.
Also applies to: 64-66, 86-87
228-233: Typos and grammar in acceptance description.
Fix missing word and “drat” typo.

```diff
-Currently, only greedy sampling is supported for speculative decoding. A draft token is accepted if
-matches the previously decoded token exactly.
+Currently, only greedy sampling is supported for speculative decoding. A draft token is accepted if
+it matches the previously decoded token exactly.
@@
-`[t, d1, d2, d3]`, where `d1`, `d2`, and `d3` are drat tokens.
+`[t, d1, d2, d3]`, where `d1`, `d2`, and `d3` are draft tokens.
```

docs/source/examples/kvcacheconfig.md (2)
3-3: Inline code formatting: don’t include “argument” inside backticks.

````diff
-Set KV cache behavior by providing the optional ```kv_cache_config argument``` when you create the LLM engine.
+Set KV cache behavior by providing the optional `kv_cache_config` argument when you create the LLM engine.
````
3-3: Consistency: “quickstart” vs. “quick start”.
Use one form throughout (“quick start” is used elsewhere).

````diff
-Consider the quickstart example found in ```examples/pytorch/quickstart.py```:
+Consider the quick start example found in `examples/pytorch/quickstart.py`:
````

```diff
-This code disables block reuse for the quickstart example.
+This code disables block reuse for the quick start example.
```

(Apply the same spelling uniformly across the docs.)
Also applies to: 47-47
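For context, the pattern these kvcacheconfig.md comments refer to can be sketched as follows. This is a minimal illustration assuming the `tensorrt_llm` Python LLM API (`LLM`, `SamplingParams`, and `KvCacheConfig` from `tensorrt_llm.llmapi`); the model name is illustrative rather than the one used in the quick start:

```python
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import KvCacheConfig

# Pass the optional kv_cache_config argument when creating the LLM engine;
# here block reuse is disabled, mirroring what the doc describes.
kv_cache_config = KvCacheConfig(enable_block_reuse=False)
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative model
          kv_cache_config=kv_cache_config)

prompts = ["Hello, my name is"]
outputs = llm.generate(prompts, SamplingParams(max_tokens=32))
for output in outputs:
    print(output.outputs[0].text)
```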
docs/source/examples/kvcacheretentionconfig.md (2)
31-31: Clarify token-range bounds (inclusive/exclusive).
You say “first four tokens” and pass (0, 4). If end is exclusive, that’s correct; please state it explicitly to avoid ambiguity.

```diff
-Assume you know that the first four tokens of each prompt represent a system prompt ...
+Assume you know that the first four tokens of each prompt (token indices 0–3; end index is exclusive) represent a system prompt ...
```
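If the range is indeed half-open, a one-line sketch makes the bound explicit. This assumes the positional order `TokenRangeRetentionConfig(start, end, priority, duration_ms)` implied by the `(0, 4, 100, None)` call quoted in the doc:

```python
from tensorrt_llm.llmapi import KvCacheRetentionConfig

# With an exclusive end, (0, 4) covers token indices 0, 1, 2, 3 -- the "first
# four tokens" of the prompt; 100 is the documented maximum priority.
system_prompt_range = KvCacheRetentionConfig.TokenRangeRetentionConfig(0, 4, 100, None)
```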
49-56: Variable naming style in example.
Prefer snake_case for Python variables.

```diff
-    tokenRangeRetentionConfig = KvCacheRetentionConfig.TokenRangeRetentionConfig(0, 4, 100, None)
-    kv_cache_retention_config = KvCacheRetentionConfig(
-        token_range_retention_configs=[tokenRangeRetentionConfig],
+    token_range_retention_config = KvCacheRetentionConfig.TokenRangeRetentionConfig(0, 4, 100, None)
+    kv_cache_retention_config = KvCacheRetentionConfig(
+        token_range_retention_configs=[token_range_retention_config],
```

docs/source/features/kvcache.md (1)
45-46: Class name casing.
Use `KvCacheConfig` (matching the API and link target), not “KVCacheConfig”.

```diff
-Many of the features in the KV cache system are optional or have user defined properties that alter how they work. Users can control KV cache features through class [KVCacheConfig](https://nvidia.github.io/TensorRT-LLM/llm-api/reference.html#tensorrt_llm.llmapi.KvCacheConfig).
+Many of the features in the KV cache system are optional or have user-defined properties that alter how they work. Users can control KV cache features through class [KvCacheConfig](https://nvidia.github.io/TensorRT-LLM/llm-api/reference.html#tensorrt_llm.llmapi.KvCacheConfig).
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- docs/source/examples/kvcacheconfig.md (2 hunks)
- docs/source/examples/kvcacheretentionconfig.md (4 hunks)
- docs/source/features/kvcache.md (2 hunks)
- docs/source/features/speculative-decoding.md (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: thorjohnsen
PR: NVIDIA/TensorRT-LLM#6910
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:0-0
Timestamp: 2025-08-14T21:04:50.248Z
Learning: In KV cache onboarding logic during prefill in cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, when calculating which blocks fall within the attention window, use getTokensPerBlock() to advance token indices rather than block->getUniqueTokens().size(), because the calculation needs to consider the post-prefill state where blocks will be filled to capacity, not their current token count.
📚 Learning: 2025-08-14T21:04:50.248Z
Learnt from: thorjohnsen
PR: NVIDIA/TensorRT-LLM#6910
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:0-0
Timestamp: 2025-08-14T21:04:50.248Z
Learning: In KV cache onboarding logic during prefill in cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, when calculating which blocks fall within the attention window, use getTokensPerBlock() to advance token indices rather than block->getUniqueTokens().size(), because the calculation needs to consider the post-prefill state where blocks will be filled to capacity, not their current token count.
Applied to files:
docs/source/examples/kvcacheretentionconfig.md
docs/source/features/kvcache.md
📚 Learning: 2025-08-15T06:46:54.897Z
Learnt from: eopXD
PR: NVIDIA/TensorRT-LLM#6767
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:0-0
Timestamp: 2025-08-15T06:46:54.897Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp addToken function, newly allocated blocks are unshared by design. The beam search path in addToken (when sequence.getNumTokens() > windowSize) is currently broken/non-functional with SWA, so the block allocation doesn't follow a shared-then-unshared pattern.
Applied to files:
docs/source/features/kvcache.md
🪛 LanguageTool
docs/source/examples/kvcacheretentionconfig.md
[grammar] ~31-~31: There might be a mistake here.
Context: ... 35 on a scale from 1 to 100, where 100 is highest priority and 1 is lowest priori...
(QB_NEW_EN)
[grammar] ~31-~31: There might be a mistake here.
Context: ...00, where 100 is highest priority and 1 is lowest priority. Assume you know that t...
(QB_NEW_EN)
docs/source/features/speculative-decoding.md
[grammar] ~188-~188: There might be a mistake here.
Context: ...ased speculation implementations do not support overlap scheduler. It will be disabled ...
(QB_NEW_EN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (1)
docs/source/features/kvcache.md (1)
33-34: Cross‑doc consistency: reuse + speculative decoding.
This now claims “reuse across requests is supported by all speculative decoding models.” Please verify against the current backend behavior, and align examples in speculative‑decoding.md (which still set disable_overlap_scheduler=True in two‑model examples).
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Force-pushed from 9b44c22 to 7a7f006
Actionable comments posted: 0
🧹 Nitpick comments (11)
docs/source/features/speculative-decoding.md (5)
188-188: Polish grammar: add article and hyphenation.

Suggested: “Two-model-based speculation implementations do not support the overlap scheduler; it will be disabled automatically.”
```diff
-Two-model based speculation implementations do not support overlap scheduler. It will be disabled automatically.
+Two-model-based speculation implementations do not support the overlap scheduler; it will be disabled automatically.
```
43-44: Align examples with auto-disable behavior: remove redundant flag.

Since two-model setups auto-disable overlap scheduling, drop the explicit `disable_overlap_scheduler=True` to avoid confusion.

```diff
-llm = LLM("/path/to/target_model", speculative_config=speculative_config, disable_overlap_scheduler=True)
+llm = LLM("/path/to/target_model", speculative_config=speculative_config)
```
65-66: Make EAGLE 3 snippet reflect conditional need.

Either omit the flag entirely (recommended) or show it conditionally only when using two-model. Example below removes it for clarity.
```diff
-# Only need to disable overlap scheduler if eagle3_one_model is False.
-llm = LLM("/path/to/target_model", speculative_config=speculative_config, disable_overlap_scheduler=True)
+# Two-model setups auto-disable overlap scheduling.
+llm = LLM("/path/to/target_model", speculative_config=speculative_config)
```
86-87: NGram example: drop `disable_overlap_scheduler=True`.

Auto-disable applies to two-model algorithms; keep the example minimal.
```diff
-llm = LLM("/path/to/target_model", speculative_config=speculative_config, disable_overlap_scheduler=True)
+llm = LLM("/path/to/target_model", speculative_config=speculative_config)
```
134-140: YAML sample: remove `disable_overlap_scheduler` or note it’s auto/ignored.

To reduce user confusion, either delete the key or add a brief comment that it’s auto-disabled for two-model setups.
```diff
-disable_overlap_scheduler: true
 speculative_config:
   decoding_type: Eagle
   max_draft_len: 4
   speculative_model: /path/to/draft/model
```

docs/source/examples/kvcacheconfig.md (3)
3-3: Inline code formatting: remove triple backticks and “argument” from code span.

Use single backticks and keep prose outside the code span.
````diff
-Set KV cache behavior by providing the optional ```kv_cache_config argument``` when you create the LLM engine. Consider the quickstart example found in ```examples/pytorch/quickstart.py```:
+Set KV cache behavior by providing the optional `kv_cache_config` argument when you create the LLM engine. Consider the quick start example found in `examples/pytorch/quickstart.py`:
````
31-31: Consistent terminology: “quick start” (two words).

Matches usage elsewhere in the docs.
```diff
-You can reduce this value to 0.7 by adding the following lines to the quickstart example:
+You can reduce this value to 0.7 by adding the following lines to the quick start example:
```
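The lines the doc refers to are not quoted in this review; a plausible sketch follows, assuming the value being reduced is `KvCacheConfig.free_gpu_memory_fraction`:

```python
from tensorrt_llm.llmapi import KvCacheConfig

# Assumption: "this value" is the fraction of free GPU memory the KV cache may
# claim; lowering it to 0.7 leaves more headroom for other allocations.
kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.7)
```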
39-39: Inline code formatting for class name.

Use single backticks, not triple.
````diff
-You can also set properties after you create ```KvCacheConfig```. For example:
+You can also set properties after you create `KvCacheConfig`. For example:
````

docs/source/examples/kvcacheretentionconfig.md (3)
3-3: Inline code formatting: use single backticks.

Applies to both occurrences on this line.
````diff
-You can change block priority by providing the optional ```kv_cache_retention_config``` argument when you submit a request to the LLM engine. Consider the quick start example found in ```examples/pytorch/quickstart.py```:
+You can change block priority by providing the optional `kv_cache_retention_config` argument when you submit a request to the LLM engine. Consider the quick start example found in `examples/pytorch/quickstart.py`:
````
49-56: PEP 8 naming + clarify token range bounds.
- Use snake_case for variables in Python examples.
- Please clarify whether the `end` index is inclusive or exclusive to prevent off-by-one errors.

```diff
-    tokenRangeRetentionConfig = KvCacheRetentionConfig.TokenRangeRetentionConfig(0, 4, 100, None)
-    kv_cache_retention_config = KvCacheRetentionConfig(
-        token_range_retention_configs=[tokenRangeRetentionConfig],
+    token_range_retention_config = KvCacheRetentionConfig.TokenRangeRetentionConfig(0, 4, 100, None)
+    kv_cache_retention_config = KvCacheRetentionConfig(
+        token_range_retention_configs=[token_range_retention_config],
         decode_retention_priority=35,  # Set generated tokens to default priority
         decode_duration_ms=None)
```

Follow-up: If the constructor expects a half-open range [start, end), consider adding a note like “Indices are [start, end) (end exclusive).”
68-68: Inline code formatting: single backticks.

````diff
-This example uses a single ```kv_cache_retention_config``` object for all the prompts. You can also provide a list that must have the same length as the list of prompts.
+This example uses a single `kv_cache_retention_config` object for all the prompts. You can also provide a list that must have the same length as the list of prompts.
````
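To make the single-object-versus-list point concrete, here is a sketch of the per-request usage the doc describes. It assumes `LLM.generate` accepts a `kv_cache_retention_config` keyword argument (one object applied to all prompts, or a list with the same length as `prompts`), and uses an illustrative model name:

```python
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import KvCacheRetentionConfig

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # illustrative model

# Keep the first four prompt tokens at maximum priority; generated tokens stay
# at the default priority (35), matching the values quoted in the review above.
retention = KvCacheRetentionConfig(
    token_range_retention_configs=[
        KvCacheRetentionConfig.TokenRangeRetentionConfig(0, 4, 100, None)],
    decode_retention_priority=35,
    decode_duration_ms=None)

prompts = ["System: be terse. User: hello", "System: be terse. User: goodbye"]
# A single object applies to every prompt; a list must match len(prompts).
outputs = llm.generate(prompts, SamplingParams(max_tokens=16),
                       kv_cache_retention_config=retention)
```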
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- docs/source/examples/kvcacheconfig.md (2 hunks)
- docs/source/examples/kvcacheretentionconfig.md (4 hunks)
- docs/source/features/kvcache.md (2 hunks)
- docs/source/features/speculative-decoding.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/source/features/kvcache.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-14T21:04:50.248Z
Learnt from: thorjohnsen
PR: NVIDIA/TensorRT-LLM#6910
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:0-0
Timestamp: 2025-08-14T21:04:50.248Z
Learning: In KV cache onboarding logic during prefill in cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, when calculating which blocks fall within the attention window, use getTokensPerBlock() to advance token indices rather than block->getUniqueTokens().size(), because the calculation needs to consider the post-prefill state where blocks will be filled to capacity, not their current token count.
Applied to files:
docs/source/examples/kvcacheretentionconfig.md
🪛 LanguageTool
docs/source/examples/kvcacheretentionconfig.md
[grammar] ~31-~31: There might be a mistake here.
Context: ... 35 on a scale from 1 to 100, where 100 is highest priority and 1 is lowest priori...
(QB_NEW_EN)
[grammar] ~31-~31: There might be a mistake here.
Context: ...00, where 100 is highest priority and 1 is lowest priority. Assume you know that t...
(QB_NEW_EN)
docs/source/features/speculative-decoding.md
[grammar] ~188-~188: There might be a mistake here.
Context: ...ased speculation implementations do not support overlap scheduler. It will be disabled ...
(QB_NEW_EN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
/bot run

PR_Github #17758 [ run ] triggered by Bot

PR_Github #17758 [ run ] completed with state

/bot skip --comment "docs only change"

PR_Github #17759 [ skip ] triggered by Bot

PR_Github #17759 [ skip ] completed with state
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Cherry-pick #7382 into 1.0 branch
Summary by CodeRabbit
New Features
Documentation