KEMBAR78
Add minimum capability requirement for AWQ by WoosukKwon · Pull Request #1064 · vllm-project/vllm · GitHub
Skip to content

Conversation

@WoosukKwon
Copy link
Collaborator

Closes #1063

@esmeetu
Copy link
Member

esmeetu commented Sep 17, 2023

@WoosukKwon I think setup.py file might need add compute capability check whether to build quant kernel or not.

@WoosukKwon
Copy link
Collaborator Author

@esmeetu Thanks for the suggestion! I've instead added a guard to prevent compilation for the supported GPUs.

Copy link
Member

@zhuohan123 zhuohan123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments.

Comment on lines +14 to +15
namespace vllm {
namespace awq {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this namespace needed?

Copy link
Collaborator Author

@WoosukKwon WoosukKwon Sep 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are optional, but for better coding convention. From google cpp style guide:

With few exceptions, place code in a namespace.

Namespace prevents naming conflicts, so it's pretty useful for external code like the AWQ kernels.

Comment on lines +71 to +78
capability = torch.cuda.get_device_capability()
capability = capability[0] * 10 + capability[1]
if capability < quant_config.get_min_capability():
raise ValueError(
f"The quantization method {model_config.quantization} is not "
"supported for the current GPU. "
f"Minimum capability: {quant_config.get_min_capability()}. "
f"Current capability: {capability}.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the assert false in C++ if we have the check here?

Copy link
Member

@zhuohan123 zhuohan123 Sep 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw the comments in the PR. Can we just change setup.py instead of the C++ files?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's problematic when we want to build the wheel for all GPU architectures (e.g., for pypi publication or building docker image). In such a case, we cannot selectively include the extension according to the architecture. Therefore, I believe this is an easier solution, and in fact we already used this kind of guard for bfloat16 attention kernels, which do not support Turing and Volta GPUs.

Copy link
Member

@zhuohan123 zhuohan123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the fix!

@WoosukKwon WoosukKwon merged commit 2b1c116 into main Sep 18, 2023
@WoosukKwon WoosukKwon deleted the awq-warning branch September 18, 2023 19:02
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AWQ does not support Turing GPUs

3 participants