Make `max_model_len` configurable by Yard1 · Pull Request #972 · vllm-project/vllm

Conversation

Collaborator

@Yard1 Yard1 commented Sep 7, 2023

Allows the user to override the derived `max_model_len` if they so desire.

We can also warn the user if `max_model_len` is set above what vLLM has derived; let me know if you think that's a good idea!
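For context, a rough usage sketch of what this enables, assuming `max_model_len` is forwarded from the `LLM` entrypoint through `EngineArgs` as in this change (the model name and values below are purely illustrative, not part of the diff):

```python
# Illustrative sketch: cap the context window below the value vLLM would
# otherwise derive from the model's config. The `max_model_len` kwarg is
# assumed to be plumbed through to EngineArgs by this PR; when it is not
# given, the derived value is used as before.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", max_model_len=512)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=16),
)
print(outputs[0].outputs[0].text)
```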

Member

@zhuohan123 zhuohan123 left a comment


LGTM! Thanks for your contribution. And yes, please warn the user if `max_model_len` is set above what vLLM has derived.

Collaborator Author

@Yard1 Yard1 commented Sep 12, 2023

@zhuohan123 added warning
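For reference, a minimal sketch of what the added warning could look like; the function name and structure here are illustrative rather than the merged code:

```python
# Sketch only (not the merged diff): keep the user override of max_model_len,
# but warn when it exceeds the value derived from the model config.
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_max_model_len(derived_max_len: int, user_max_len: Optional[int]) -> int:
    """Return the context length to use, preferring the user override."""
    if user_max_len is None:
        return derived_max_len
    if user_max_len > derived_max_len:
        logger.warning(
            "User-specified max_model_len (%d) is greater than the derived "
            "max_model_len (%d); using it may lead to incorrect model outputs "
            "or runtime errors.",
            user_max_len,
            derived_max_len,
        )
    return user_max_len
```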

Member

@zhuohan123 zhuohan123 left a comment


LGTM! Thanks for the contribution!

@zhuohan123 zhuohan123 merged commit 0bb1e88 into vllm-project:main Sep 12, 2023
@Yard1 Yard1 deleted the configurable_model_max_len branch September 13, 2023 01:03
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request May 12, 2025: "Enable fp32 softmax in flat_pa_mla for accuracy."