[FIX] Don't initialize parameter by default by zhuohan123 · Pull Request #1067 · vllm-project/vllm · GitHub

Conversation

@zhuohan123
Member

Without this fix, many models cannot run. For example, gpt2 will run the default initializer on its vocab parameters, and the assert that follows immediately fails.
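A minimal sketch of the idea (the class below is illustrative, not the actual vLLM diff): allocate parameters as uninitialized tensors and let checkpoint loading fill them in, instead of running a default random initializer.

```python
# Sketch only: skip default parameter initialization, since weight loading
# overwrites the values anyway.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VocabEmbedding(nn.Module):
    """Hypothetical vocab embedding; not the vLLM class."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        # torch.empty allocates without initializing; running a random
        # init here would only waste time before the checkpoint is loaded.
        self.weight = nn.Parameter(torch.empty(vocab_size, hidden_size))

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return F.embedding(input_ids, self.weight)
```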

Collaborator

@WoosukKwon WoosukKwon left a comment

LGTM. Let's actually remove the parameters when refactoring.

@zhuohan123 zhuohan123 merged commit 90979c3 into main Sep 18, 2023
@zhuohan123 zhuohan123 deleted the no-params-init branch October 16, 2023 21:02
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
yiliu30 pushed a commit to yiliu30/vllm-fork that referenced this pull request Apr 15, 2025
…llm-project#1067)

Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
amy-why-3459 pushed a commit to amy-why-3459/vllm that referenced this pull request Sep 15, 2025
[CI] MoE alltoall communication optimization
The DeepSeek V3/R1 model has 256 routed experts. During parallel
inference, if one EP rank carries a high load, the overall communication
and computation time is slowed down, so uneven load distribution becomes
a weakness of parallel inference. The data volume in the prefill phase is
large, and per-card communication and computation time are closely tied
to that volume. A small, non-linear precision loss can therefore be
traded for a near-linear performance improvement.

During parallel inference, communication implies global synchronization:
cards with low load finish their computation first and then wait for the
most heavily loaded card, so when the load is unbalanced, the busiest
card dominates the overall time. Significant performance gains can be
achieved by discarding a small number of tokens. This is unacceptable in
some precision-sensitive scenarios, but, much like quantization, it
trades an acceptable precision loss for performance where that trade is
acceptable. The balance between performance and precision can also be
tuned by configuring the proportion of discarded tokens, as sketched
below.
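The configurable drop proportion can be realized as a per-expert capacity cap. The sketch below is illustrative only (the function name, signature, and capacity formula are assumptions, not the code in this commit): it keeps at most a fixed number of routed slots per expert and drops the overflow, which bounds the work and alltoall traffic on the most heavily loaded EP rank.

```python
import torch
import torch.nn.functional as F


def capacity_mask(topk_expert_ids: torch.Tensor,
                  num_experts: int,
                  capacity_factor: float = 1.25) -> torch.Tensor:
    """Keep-mask over routing slots; overflow beyond each expert's capacity is dropped."""
    flat = topk_expert_ids.reshape(-1)                      # (num_slots,)
    # Balanced share per expert, scaled by the configured factor.
    capacity = max(int(capacity_factor * flat.numel() / num_experts), 1)
    one_hot = F.one_hot(flat, num_experts)                  # (num_slots, num_experts)
    # Exclusive running count of earlier slots routed to the same expert.
    position_in_expert = one_hot.cumsum(dim=0) - one_hot
    rank = (position_in_expert * one_hot).sum(dim=-1)       # (num_slots,)
    return (rank < capacity).reshape(topk_expert_ids.shape)


# Example: 8 tokens, top-2 routing over 4 experts.
topk_ids = torch.randint(0, 4, (8, 2))
keep = capacity_mask(topk_ids, num_experts=4, capacity_factor=1.0)
# Slots with keep == False skip expert dispatch, so no single EP rank
# can be overloaded far beyond the balanced share.
```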

We tested on A3 with batch size 8 (B), prompt length 3.5K tokens (S),
and the following parallel configuration: AttnDP=2, AttnTP=8, MoeTP=1,
MoeEP=16. In this scenario we measured a 10%-15% performance gain.

In the next version, we will add an alltoallv MoE.

---------

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>