rope_theta and max_position_embeddings from config by Yard1 · Pull Request #1096 · vllm-project/vllm · GitHub

Conversation

@Yard1
Collaborator

@Yard1 Yard1 commented Sep 19, 2023

This PR lets rope_theta and max_position_embeddings be read from model configs instead of hardcoding them. Notably, this allows CodeLlama to work without issues at longer context lengths.

Fixes #904
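
For readers unfamiliar with these config fields, here is a minimal sketch (not the PR's actual diff) of reading both values from a Hugging Face config, with fallbacks matching transformers' LlamaConfig defaults:

```python
# Minimal sketch, not the PR's actual code: read RoPE parameters from the
# Hugging Face model config instead of hardcoding them. Field names follow
# transformers' LlamaConfig; the fallbacks match its defaults.
from transformers import AutoConfig

def get_rope_params(model_name_or_path: str):
    config = AutoConfig.from_pretrained(model_name_or_path)
    rope_theta = getattr(config, "rope_theta", 10000.0)
    max_position_embeddings = getattr(config, "max_position_embeddings", 2048)
    return rope_theta, max_position_embeddings

# CodeLlama, for example, ships rope_theta=1e6 and
# max_position_embeddings=16384 in its config.json, which is why hardcoded
# defaults broke long-context generation.
print(get_rope_params("codellama/CodeLlama-7b-hf"))
```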

@WoosukKwon
Collaborator

A quick question: don't we have to use get_max_model_len (`def get_max_model_len(self) -> int:`) instead of the model's configuration?

BTW, there's a duplicated PR: #1057

@Yard1
Collaborator Author

Yard1 commented Sep 19, 2023

Hmm good question. The HF Transformers behavior is to load it from config. I have no strong feelings one way or another, though I lean towards consistency between vllm and transformers. We can raise an exception if this value is set below max model len.

My bad about this being a duplicate, happy to close this one if needed

@WoosukKwon
Collaborator

Hmm good question. The HF Transformers behavior is to load it from config. I have no strong feelings one way or another, though I lean towards consistency between vllm and transformers. We can raise an exception if this value is set below max model len.

My concern is that the bug will appear when the user-specified maximum model length is larger than the value in the model's configuration. To my knowledge, at least mathematically, increasing max_positions for RoPE wouldn't affect the outputs as long as the actual positions fit within max_positions.

My bad about this being a duplicate, happy to close this one if needed

No worries. As this is an urgent bug fix, I think we can take this PR and have @wanmok as a co-author (if you are ok with it).
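
To illustrate the point about max_positions above: in a standard RoPE implementation the rotation at position p depends only on p, the head dimension, and rope_theta, so enlarging the precomputed table leaves earlier positions untouched. A minimal sketch with generic RoPE code (not vLLM's kernel):

```python
import torch

def rope_cos_sin(max_positions: int, head_dim: int, theta: float = 10000.0):
    # Inverse frequencies theta^(-2i/d) for i in [0, d/2), as in standard RoPE.
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_positions).float()
    # One rotation angle per (position, frequency) pair.
    angles = torch.outer(positions, inv_freq)
    return angles.cos(), angles.sin()

# The first 4096 rows are identical whether we precompute 4096 or 16384
# positions, so a larger max_positions is a no-op for shorter prompts.
cos_small, _ = rope_cos_sin(4096, 128)
cos_large, _ = rope_cos_sin(16384, 128)
assert torch.allclose(cos_small, cos_large[:4096])
```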

@Yard1
Collaborator Author

Yard1 commented Sep 19, 2023

Ofc I am fully ok with coauthorship!

How about that exception, then? If there is a mismatch between the two, I feel it's better to let the user know explicitly and have them fix it, instead of trying to magic it away

@WoosukKwon
Collaborator

How about that exception, then? If there is a mismatch between the two, I feel it's better to let the user know explicitly and have them fix it, instead of trying to magic it away

Got it. Then what's the role of the max_model_len argument? I thought it was to allow users to manually configure the length.

@Yard1
Collaborator Author

Yard1 commented Sep 19, 2023

It can be used to set the length below the model maximum, e.g. to limit the context length when serving or when memory-constrained.
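
For context, a short usage sketch of that kind of override (model name and length are illustrative, not from this PR):

```python
from vllm import LLM

# Cap the context at 4096 tokens even if the model config allows more,
# e.g. to bound KV-cache memory when serving. Values are illustrative.
llm = LLM(model="codellama/CodeLlama-7b-hf", max_model_len=4096)
```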

@WoosukKwon
Collaborator

WoosukKwon commented Sep 19, 2023

It can be used to set the length below the model maximum, e.g. to limit the context length when serving or when memory-constrained.

Got it. Then I'm good with your idea to raise an error.
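
A minimal sketch of the agreed-upon check, with hypothetical names rather than the refactored config.py code:

```python
def check_max_model_len(user_max_model_len: int, derived_max_model_len: int) -> int:
    # Fail loudly instead of silently extrapolating past what the model
    # config supports. Names here are hypothetical, not vLLM's actual API.
    if user_max_model_len > derived_max_model_len:
        raise ValueError(
            f"User-specified max_model_len ({user_max_model_len}) is greater "
            f"than the maximum length derived from the model config "
            f"({derived_max_model_len})."
        )
    return user_max_model_len
```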

@Yard1
Collaborator Author

Yard1 commented Sep 20, 2023

I think this PR will also fix #905

Collaborator

@WoosukKwon WoosukKwon left a comment


@Yard1 LGTM! I slightly refactored the code in config.py to make the logic a bit clearer.

@WoosukKwon WoosukKwon merged commit 3302f0a into vllm-project:main Sep 20, 2023
@Yard1 Yard1 deleted the read_rope_from_config branch September 20, 2023 20:40
@Rogerspy

I think this PR will also fix #905

No, this PR cannot fix it.

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
jikunshang added a commit to jikunshang/vllm that referenced this pull request Apr 25, 2025
