Fix OOM in attention kernel test by WoosukKwon · Pull Request #1223 · vllm-project/vllm · GitHub

Conversation

@WoosukKwon (Collaborator) commented:

Currently, we use a very large sequence length (e.g., 40K) to test prompt attention. This causes OOM in the reference implementation, which does not use flash attention. As xformers is already covered by its own tests, I believe it's OK to use shorter sequences in our tests.
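For context, the OOM comes from the quadratic memory behavior of a naive reference attention: it materializes the full (seq_len × seq_len) score matrix, which flash attention avoids. The sketch below is illustrative only, not the actual reference code in the test; `ref_attention` and the head count are hypothetical, chosen just to show the scale of the problem at 40K tokens:

```python
import torch

def ref_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Naive reference attention: holds the full (seq_len x seq_len)
    score matrix in memory, so usage grows quadratically with seq_len."""
    scale = q.shape[-1] ** -0.5
    # [num_heads, seq_len, seq_len] scores, all materialized at once.
    scores = torch.einsum("hqd,hkd->hqk", q, k) * scale
    probs = torch.softmax(scores.float(), dim=-1).to(v.dtype)
    return torch.einsum("hqk,hkd->hqd", probs, v)

# Back-of-the-envelope memory for the fp32 score matrix alone,
# with an illustrative 40 heads and a 40K-token prompt:
seq_len, num_heads = 40_000, 40
score_bytes = num_heads * seq_len * seq_len * 4
print(f"score matrix: {score_bytes / 2**30:.0f} GiB")  # ~238 GiB -> OOM
# A prompt a few thousand tokens long keeps the same matrix in the MiB range.
```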

@zhuohan123 (Member) left a comment:

LGTM! Thanks for the fix!

@WoosukKwon WoosukKwon merged commit 6f88f76 into main Sep 28, 2023
@WoosukKwon WoosukKwon deleted the fix-attn-test branch September 28, 2023 21:33
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024