Fix the sampler and update the triton/cuda kernels by nvchenghaoz · Pull Request #146 · nv-auto-deploy/TensorRT-LLM · GitHub
@nvchenghaoz
@coderabbitai summary

nvchenghaoz and others added 12 commits September 22, 2025 14:16
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
@nvchenghaoz nvchenghaoz merged commit 4b50b3e into feat/ad_linear_attention Sep 26, 2025
2 of 3 checks passed
lucaslie pushed a commit that referenced this pull request Sep 29, 2025
* Fix the bamba unit test

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

* none: Add triton backend for ssm_transform and cuda backend for conv

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

* Fully Use the TRT LLM kernels

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

* Add fake version for ssm transform op

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

* Fix the datatype error in fake op

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

* Fix the conv test error

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

* Fix the triton ssm error

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

* Fix the DemoLLM sampler mismatch

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

* Update the implementation for triton/cuda kernels

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

* Fix the d2d memcpy for decode

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

* Revert the generator and remove the redundant code

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

---------

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
nvchenghaoz added a commit that referenced this pull request Oct 1, 2025
nvchenghaoz added a commit that referenced this pull request Oct 3, 2025
3 participants