rope_theta and max_position_embeddings from config #1096
Conversation
Hmm, good question. The HF Transformers behavior is to load it from the config. I have no strong feelings one way or another, though I lean towards consistency between vLLM and Transformers. We can raise an exception if this value is set below the max model len. My bad about this being a duplicate; happy to close this one if needed.
My concern is that the bug will happen when the user-specified maximum model length is larger than the model's configured maximum. To my knowledge, at least mathematically, increasing …
No worries. As this is an urgent bug fix, I think we can take this PR and have @wanmok as a co-author (if you are ok with it).
Ofc I am fully ok with co-authorship! How about that exception, then? If there is a mismatch between the two, I feel it's better to let the user know explicitly and have them fix it, instead of trying to magic it away.
Got it. Then what's the role of the user-specified maximum model length?
It can be used to set the length below the model maximum, e.g. limiting the context length when serving or when memory-constrained.
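For instance, a minimal sketch of that use case (assuming vLLM's offline `LLM` entrypoint and its `max_model_len` argument):

```python
from vllm import LLM

# Serve CodeLlama with a context window capped below the model's configured
# maximum, e.g. to fit the KV cache in limited GPU memory.
llm = LLM(model="codellama/CodeLlama-7b-hf", max_model_len=4096)
```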
Got it. Then I'm good with your idea to raise an error.
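A minimal sketch of the check being discussed (not the exact code that landed in `config.py`; the function and variable names are illustrative):

```python
def verify_max_model_len(user_max_model_len, derived_max_model_len):
    """Validate a user-specified max model length against the config-derived one."""
    if user_max_model_len is None:
        # No override: use the maximum derived from the model config.
        return derived_max_model_len
    if user_max_model_len > derived_max_model_len:
        # Surface the mismatch instead of silently extending the context.
        raise ValueError(
            f"User-specified max model length ({user_max_model_len}) is "
            f"greater than the maximum length derived from the model config "
            f"({derived_max_model_len}).")
    # Capping the length below the model maximum is fine.
    return user_max_model_len
```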
I think this PR will also fix #905
@Yard1 LGTM! I slightly refactored the code in config.py to make the logic a bit clearer.
No, this PR cannot fix it.
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: wnma3mz <wnma3mz@gmail.com>
This PR lets `rope_theta` and `max_position_embeddings` be read from the model config instead of hardcoding them. Notably, this allows CodeLlama to work without issues at longer contexts. Fixes #904
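A minimal sketch of the idea (the attribute names follow HF Transformers' Llama-family configs; the fallback value for `rope_theta` is an assumption based on the common Llama default):

```python
from transformers import AutoConfig

hf_config = AutoConfig.from_pretrained("codellama/CodeLlama-7b-hf")

# Read the RoPE base and the maximum context length from the config instead of
# hardcoding them; older configs may omit rope_theta, so fall back to the
# common Llama default of 10000.
rope_theta = getattr(hf_config, "rope_theta", 10000.0)
max_position_embeddings = hf_config.max_position_embeddings
```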