
[Bug]: Issue with MedusaConfig always using LLaMAConfig instead of QWenConfig #7208

@singularity-123

Description


System Info

  • CPU architecture x86_64
  • CPU/Host memory size 32 GB
  • GPU properties
    • GPU name NVIDIA GeForce RTX 4090
    • GPU memory size 24,564 MiB (24 GB) per GPU
    • Clock frequencies used: N/A
  • Libraries
    • TensorRT-LLM branch or tag v0.16.0
    • TensorRT-LLM commit: N/A
    • Versions of TensorRT, Modelopt, CUDA, cuBLAS, etc. used
      • TensorRT version 10.7.0.post1
      • Modelopt version 0.21.1
      • CUDA version 12.6
      • cuBLAS version 12.6.4.1
    • Container used
  • NVIDIA driver version
  • OS Ubuntu 24.04.1 LTS

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from tensorrt_llm.models.medusa.config import MedusaConfig

config_dict = {
    "architecture": "MedusaForCausalLM",
    "dtype": "float16",
    "logits_dtype": "float16",
    "num_hidden_layers": 28,
    "num_attention_heads": 12,
    "num_key_value_heads": 2,
    "hidden_size": 1536,
    "norm_epsilon": 1e-06,
    "vocab_size": 155676,
    "max_position_embeddings": 32768,
    "hidden_act": "silu",
    "embedding_sharding_dim": 0,
    "qwen_type": "qwen2",
    "model_type": "qwen",
}

medusa_config = MedusaConfig(**config_dict)

print(f"Medusa GenericMedusaConfig: {medusa_config.config}")

Expected behavior

[TensorRT-LLM] TensorRT-LLM version: 0.16.0
Medusa GenericMedusaConfig: <tensorrt_llm.models.qwen.config.QWenConfig object at 0x7f7bb32591f0>

Actual behavior

[TensorRT-LLM] TensorRT-LLM version: 0.16.0
Medusa GenericMedusaConfig: <tensorrt_llm.models.llama.config.LLaMAConfig object at 0x7f6fd2d65550>

I expect that when model_type is set to "qwen", the GenericMedusaConfig should resolve to QWenConfig instead of LLaMAConfig.
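
For a programmatic check of the same expectation (a sketch; the QWenConfig import path is taken from the expected output above):

from tensorrt_llm.models.qwen.config import QWenConfig

# With model_type == "qwen", the wrapped generic config should be a QWenConfig.
# On v0.16.0 this assertion fails because a LLaMAConfig is returned instead.
assert isinstance(medusa_config.config, QWenConfig), type(medusa_config.config)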

Additional notes

I'm currently working with TensorRT-LLM v0.16.0. The issue occurs in the code at https://github.com/NVIDIA/TensorRT-LLM/blob/v0.16.0/tensorrt_llm/models/medusa/config.py#L34.
The problem is that kwargs is a plain dictionary, and dictionary keys are not attributes, so hasattr(kwargs, 'model_type') always returns False. The qwen branch is therefore never taken, and the configuration silently falls back to LLaMAConfig.

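The dictionary behaviour is easy to confirm in isolation:

kwargs = {"model_type": "qwen", "qwen_type": "qwen2"}
print(hasattr(kwargs, "model_type"))   # False: dict keys are not attributes
print("model_type" in kwargs)          # True
print(kwargs.get("model_type"))        # qwen

A possible correction (a sketch only; the surrounding dispatch in medusa/config.py is paraphrased from the behaviour described above, not copied from the repository) is to test for the key instead of the attribute:

# Reported pattern (paraphrased): the qwen branch is guarded by an attribute
# check that a plain dict can never satisfy.
if hasattr(kwargs, "model_type"):        # always False for a dict
    ...                                  # qwen branch, never reached

# Suggested replacement: look the key up directly.
if kwargs.get("model_type") == "qwen":   # True for the reproduction above
    ...                                  # e.g. build a QWenConfig here
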
Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

  • Investigating
  • LLM API<NV> (High-level LLM Python API & tools (e.g., trtllm-llmapi-launch) for TRTLLM inference/workflows)
  • bug (Something isn't working)
  • triaged (Issue has been triaged by maintainers)
