[Issue Template] Short one-line summary of the issue · Issue #270 · triton-inference-server/tensorrtllm_backend


@juney-nvidia

Description


Environment

If applicable, please include the following:

  • CPU architecture (e.g., x86_64, aarch64)
  • CPU/Host memory size (if known)
  • GPU properties
    • GPU name (e.g., NVIDIA H100, NVIDIA A100, NVIDIA L40S)
    • GPU memory size (if known)
    • Clock frequencies used (if applicable)
  • Libraries
    • TensorRT-LLM backend branch or tag (e.g., main, v0.7.1)
    • TensorRT-LLM backend commit (if known)
    • Versions of TensorRT, AMMO, CUDA, cuBLAS, etc. used
    • Container used (if running TensorRT-LLM backend in a container)
  • NVIDIA driver version
  • OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10)
  • Any other information that may be useful in reproducing the bug
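Several of the environment details above can be gathered with a short script instead of by hand. The sketch below is hypothetical: it is not part of TensorRT-LLM or the TensorRT-LLM backend, and it assumes Python 3.7+ on a Linux host. nvidia-smi is queried only if it is on PATH, using its standard --query-gpu fields. Library versions (TensorRT, AMMO, CUDA, cuBLAS), the backend branch or commit, and the container image still need to be filled in manually.

```python
"""Hypothetical helper that collects some of the environment details for a bug report.

Assumes a Linux host; nvidia-smi is queried only if it is found on PATH.
"""
import os
import platform
import shutil
import subprocess


def host_memory_gib():
    """Return total physical host memory in GiB, or None if sysconf is unavailable (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30
    except (AttributeError, ValueError, OSError):
        return None


def gpu_info():
    """Return one CSV line per GPU (name, memory, current SM/memory clocks, driver), or []."""
    if shutil.which("nvidia-smi") is None:
        return []
    fields = "name,memory.total,clocks.sm,clocks.mem,driver_version"
    try:
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={fields}", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=30,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        # nvidia-smi failed or timed out; report no GPU details rather than crash.
        return []
    return [line.strip() for line in out.splitlines() if line.strip()]


def collect_environment():
    """Gather CPU architecture, host memory, OS string, and per-GPU details into a dict."""
    mem = host_memory_gib()
    return {
        "cpu_architecture": platform.machine(),
        "host_memory_gib": round(mem, 1) if mem is not None else None,
        "os": platform.platform(),
        "gpus": gpu_info(),
    }


if __name__ == "__main__":
    # Print one "key: value" line per field, ready to paste into the Environment section.
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Running the script prints one line per field; its output can be pasted directly into this section alongside the manually collected library versions.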

Reproduction Steps

Provide detailed reproduction steps for the issue here, including any commands run on the command line.

Expected Behavior

Provide a brief summary of the expected behavior of the software. Provide output files or examples if possible.

Actual Behavior

Describe the actual behavior of the software and how it deviates from the expected behavior. Provide output files or examples if possible.

Additional Notes

Provide any additional context you think might be useful for the TensorRT-LLM team to help debug this issue (such as experiments already tried or potential areas to investigate).
