Cached linear SSM + causal conv support with Bamba demo by lucaslie · Pull Request #134 · nv-auto-deploy/TensorRT-LLM · GitHub

Conversation

@lucaslie
Collaborator

@lucaslie lucaslie commented Sep 19, 2025

See title and Slack.

@Copilot Copilot AI review requested due to automatic review settings September 19, 2025 22:49
@lucaslie lucaslie changed the base branch from feat/ad_coverage_week1 to feat/ad_linear_attention September 19, 2025 22:49

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
@lucaslie lucaslie changed the title Basic linear caching support Cached linear SSM + causal conv support with Bamba demo Sep 21, 2025
@lucaslie lucaslie self-assigned this Sep 21, 2025
@lucaslie lucaslie merged commit 84ac4cc into feat/ad_linear_attention Sep 21, 2025
3 of 5 checks passed
groups: int = 1,
padding_mode: str = "zeros",
) -> torch.Tensor:
assert padding_mode == "zeros", "padding_mode must be zeros"

Seems unused?

To stay close to the conv1d signature.
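For illustration, a minimal sketch (hypothetical names, not the actual op) of keeping signature parity with torch.nn.Conv1d while only supporting the default padding_mode:

import torch
import torch.nn.functional as F

# Sketch only: accept the same keyword arguments as torch.nn.Conv1d so call
# sites and graph patterns line up, even though padding_mode is merely checked.
def conv1d_like_sketch(
    input: torch.Tensor,
    weight: torch.Tensor,
    bias: torch.Tensor = None,
    stride: int = 1,
    padding: int = 0,
    dilation: int = 1,
    groups: int = 1,
    padding_mode: str = "zeros",
) -> torch.Tensor:
    assert padding_mode == "zeros", "padding_mode must be zeros"
    return F.conv1d(input, weight, bias, stride, padding, dilation, groups)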

- pages_per_seq: [ps_0, ps_1, ..., ps_{b-1}] where ps_i is the number of pages allocated for
sequence i. Note that, for example, cache_loc[p_0:p_1] will correspond to the pages associated
with sequence 1 in the batch.
- slot_idx: [s_0, s_1, ..., s_{b-1}]

What is a slot?

Sequence slot from the request object: a unique ID in the range [0, max_batch_size) assigned by the runtime.

Paged attention doesn't care about the sequence mapping; it only cares about which pages hold cache for a particular sequence.

For SSM, there's no notion of a page; you need the whole state.
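To make the distinction concrete, a small sketch (sizes and names are made up for illustration):

import torch

# SSM state cache: indexed directly by slot; one full state per sequence slot.
max_batch_size, num_heads, head_dim, state_dim = 8, 4, 64, 16
ssm_state_cache = torch.zeros(max_batch_size, num_heads, head_dim, state_dim)
slot_idx = torch.tensor([3, 0])          # slots assigned by the runtime for this batch
states = ssm_state_cache[slot_idx]       # whole state per sequence, no pages involved

# Paged KV cache: indexed indirectly through cache_loc / pages_per_seq.
num_pages, page_size = 32, 64
kv_cache = torch.zeros(num_pages, page_size, num_heads, head_dim)
cache_loc = torch.tensor([5, 9, 2])      # flattened page list for the batch
pages_per_seq = torch.tensor([2, 1])     # seq 0 -> cache_loc[0:2], seq 1 -> cache_loc[2:3]
seq1_pages = kv_cache[cache_loc[2:3]]    # pages holding sequence 1's cache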

)


def _segment_sum(input_tensor):

Should we reference where these come from?
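For context, a sketch of what a segment-sum helper typically computes in SSD/Mamba-2 style code, assuming it mirrors the Hugging Face Mamba2 reference:

import torch

def segment_sum_sketch(x: torch.Tensor) -> torch.Tensor:
    # Returns a [..., T, T] tensor where entry (i, j) is sum(x[..., j+1:i+1])
    # for i >= j and -inf above the diagonal.
    t = x.size(-1)
    expanded = x[..., None].expand(*x.shape, t)   # entry (i, j) equals x[..., i]
    mask = torch.tril(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=-1)
    segsum = torch.cumsum(expanded.masked_fill(~mask, 0), dim=-2)
    mask = torch.tril(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=0)
    return segsum.masked_fill(~mask, float("-inf"))

A = torch.randn(2, 3, 8)                  # e.g. per-step log-decay terms within a chunk
L = torch.exp(segment_sum_sketch(A))      # lower-triangular decay factors for the chunked scan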

y = y[:, :seq_len, :, :]
y = y.reshape(batch_size, seq_len, num_heads, head_dim)

return y, ssm_state

Might as well omit ssm_state from return; I'd added it originally to update the cache in the caller.

updated_cache = conv_state_cache.roll(shifts=-1, dims=-1)
# [B, T=1, C] -> [B, C]
new_sample_bc = input.transpose(1, 2)[..., 0]
updated_cache[:, :, -1] = new_sample_bc.to(updated_cache.dtype).to(updated_cache.device)

Out of curiosity, when would they end up on different devices?

Probably Cursor.
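For reference, a self-contained sketch of the roll-and-append decode update (shapes are illustrative; bias and activation omitted):

import torch

B, C, K = 2, 8, 4
conv_state_cache = torch.zeros(B, C, K)     # last K samples per channel
weight = torch.randn(C, 1, K)               # depthwise conv1d weight
new_token = torch.randn(B, 1, C)            # single decode step, [B, T=1, C]

# Shift the window left by one and write the newest sample into the last slot.
updated_cache = conv_state_cache.roll(shifts=-1, dims=-1)
updated_cache[:, :, -1] = new_token.transpose(1, 2)[..., 0]

# The decode output is a per-channel dot product with the conv weights.
y = torch.einsum("bck,ck->bc", updated_cache, weight.squeeze(1))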

end_i = start_i + length_i

mask_i = (flat_idx >= start_i.to(torch.long)) & (flat_idx < end_i.to(torch.long))
idx_i = torch.nonzero(mask_i, as_tuple=False).squeeze(-1)

Btw, this can cause host-device synchronization if mask_i is a CUDA tensor, since the output shape of torch.nonzero depends on the data.
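For illustration, one way around it, assuming the per-sequence start/length metadata is already available as host-side integers, is to build the indices with arange instead of a data-dependent nonzero:

import torch

seq_lens = [5, 3, 7]                        # Python ints on the host
starts = [0]
for ln in seq_lens[:-1]:
    starts.append(starts[-1] + ln)
# No CUDA mask is read back, so no implicit synchronization.
idx_per_seq = [torch.arange(s, s + ln) for s, ln in zip(starts, seq_lens)]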

cls, source_attn_node: Node, cache_config: CacheConfig
) -> CacheInitializerDict:
inp_fake: torch.Tensor = source_attn_node.args[0].meta["val"]
w_fake: torch.Tensor = source_attn_node.args[1].meta["val"]

Who is passing these in at runtime?

# Reference by per-sequence prefill
y_ref = torch.empty_like(y)
for i, ln in enumerate(lens):
st = 0 if i == 0 else lens[0]

Would this still be valid if lens had more than 2 elements?
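For illustration, one way to compute the start offsets that stays valid for any number of sequences (placeholder tensors, not the actual test code):

import torch

lens = [4, 2, 3]                            # per-sequence prefill lengths
y = torch.randn(1, sum(lens), 8)
y_ref = torch.empty_like(y)
start = 0
for ln in lens:
    # placeholder for the per-sequence reference computation on this slice
    y_ref[:, start:start + ln] = y[:, start:start + ln]
    start += ln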

def mamba_env():
device = "cuda"
dtype = torch.float16
atol = 5e-2

Most of the diffs are actually 0.0, the largest being one of the conv cache's state at 0.003. Might be worth tightening these bounds?
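For example, something in this ballpark would still pass on the reported diffs while catching larger regressions (values are a suggestion, not measured limits):

import torch

atol, rtol = 5e-3, 1e-3                     # vs. the current atol = 5e-2
torch.testing.assert_close(
    torch.tensor([0.0, 0.003]),             # diffs like those reported: mostly 0.0, max ~3e-3
    torch.tensor([0.0, 0.0]),
    atol=atol,
    rtol=rtol,
)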

)


@AttentionRegistry.register("torch_causal_conv")

This is what attn_backend in the config YAML maps to.
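A generic sketch of the pattern (not the actual AttentionRegistry API): the string from the config, e.g. attn_backend: torch_causal_conv, selects the registered class by name.

# Hypothetical stand-in for the real registry, for illustration only.
_REGISTRY: dict = {}

def register(name: str):
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator

@register("torch_causal_conv")
class TorchCausalConvBackend:
    pass

backend_cls = _REGISTRY["torch_causal_conv"]   # what attn_backend resolves to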

lucaslie added a commit that referenced this pull request Sep 29, 2025
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
nvchenghaoz pushed a commit that referenced this pull request Oct 1, 2025
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
nvchenghaoz pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>