Use TGI-like incremental detokenization by Yard1 · Pull Request #984 · vllm-project/vllm

Conversation

@Yard1 (Collaborator) commented Sep 8, 2023

Allows us to simplify the code and ensure that the incremental decoding is correct. Notably, when tests/models/test_models.py is run for meta-llama/Llama-2-7b-hf, this change fixes one of the mismatched outputs between vLLM and HF.

vLLM before:
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.It is designed to be used in production

vLLM with this PR:
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be used in production

Notice the space after the dot.
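
For readers unfamiliar with the failure mode, below is a minimal sketch (not the PR's code) of why decoding tokens one at a time drops these spaces, and how the TGI-style approach of decoding the growing prefix and emitting only the text diff avoids it. It assumes access to the meta-llama/Llama-2-7b-hf tokenizer; any SentencePiece-based tokenizer behaves the same way.

```python
# Minimal sketch of the bug this PR fixes. Assumes access to the
# meta-llama/Llama-2-7b-hf tokenizer; not the PR's actual code.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
token_ids = tokenizer.encode("LLMs. It is designed", add_special_tokens=False)

# Naive per-token decoding: SentencePiece stores the leading space inside
# the token itself (e.g. "▁It"), and decode() on a lone token strips it.
naive = "".join(tokenizer.decode([t]) for t in token_ids)

# TGI-style incremental decoding: decode the whole prefix at each step
# and emit only the newly appended suffix.
text, pieces = "", []
for i in range(1, len(token_ids) + 1):
    new_text = tokenizer.decode(token_ids[:i])
    pieces.append(new_text[len(text):])
    text = new_text

print(naive)            # e.g. "LLMs.Itisdesigned"  (leading spaces lost)
print("".join(pieces))  # "LLMs. It is designed"    (spaces preserved)
```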

Yard1 added 2 commits (Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>)
@Yard1 (Collaborator, Author) commented Sep 8, 2023

cc @WoosukKwon

@WoosukKwon self-requested a review September 8, 2023 05:28
@zhuohan123 (Member) commented:

Does this PR overlap with #589?

@Yard1 (Collaborator, Author) commented Sep 8, 2023

I didn't look too deeply, but I think they both should fix the issue!

@WoosukKwon (Collaborator) left a comment


@Yard1 Thanks for submitting the PR! Left comments on the coding style and performance impact.

Yard1 added 3 commits (Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>)
@Yard1 marked this pull request as draft September 8, 2023 17:54
@Yard1 (Collaborator, Author) commented Sep 8, 2023

Marking as draft until benchmarking is complete, but it does look like this change brings extra overhead. cc @WoosukKwon

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
@Yard1 marked this pull request as ready for review September 12, 2023 19:56
@Yard1 (Collaborator, Author) commented Sep 12, 2023

@WoosukKwon I have reworked the code to combine the correctness of the TGI approach with the high performance of the vLLM approach. The new iteration of this PR is not only more correct than the current vLLM implementation but also more performant. PTAL!
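
A simplified sketch of that combined scheme follows; the function name and return convention are assumptions for illustration, not the exact code merged here. The idea is to keep two offsets into the token list and detokenize only a short trailing window, holding text back while it ends in an incomplete multi-byte character.

```python
# Simplified sketch of a TGI-style offset scheme; an illustration under
# stated assumptions, not the exact implementation merged in this PR.
def decode_step(tokenizer, tokens, prefix_offset, read_offset):
    """tokens is a list of token strings (from convert_ids_to_tokens).

    Returns (new_text, new_prefix_offset, new_read_offset)."""
    # Detokenize only the trailing window, never the full sequence.
    prefix_text = tokenizer.convert_tokens_to_string(
        tokens[prefix_offset:read_offset])
    new_text = tokenizer.convert_tokens_to_string(tokens[prefix_offset:])
    # A trailing U+FFFD means the newest token(s) decode to an incomplete
    # multi-byte character; emit nothing and wait for more tokens.
    if len(new_text) > len(prefix_text) and not new_text.endswith("\ufffd"):
        return new_text[len(prefix_text):], read_offset, len(tokens)
    return "", prefix_offset, read_offset
```

Because the decoded window stays small, per-step cost no longer grows with sequence length, which is where the speedup over re-decoding the full prefix each step comes from.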

@Yard1 requested a review from @WoosukKwon September 12, 2023 19:56
@WoosukKwon (Collaborator) commented:

@Yard1 Could you please update the test as well? I got errors in the test:

```
>           output_tokens = prev_output_tokens + new_tokens
E           TypeError: unsupported operand type(s) for +: 'int' and 'list'
```

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
@WoosukKwon (Collaborator) left a comment


@Yard1 Awesome, thanks a lot for the PR! This further boosts our performance.

Left several comments on the style.

Yard1 and others added 2 commits September 13, 2023 12:30
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
@Yard1 requested a review from @WoosukKwon September 13, 2023 19:38
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
@WoosukKwon (Collaborator) left a comment


@Yard1 This is super nice. Thanks again for your hard work!

@WoosukKwon merged commit 9841d48 into vllm-project:main Sep 13, 2023
@Yard1 deleted the detokenize_incrementally_rework branch September 13, 2023 20:43
@HermitSun (Contributor) commented Sep 14, 2023

I found that after applying this PR, some models will produce unexpected tokens when generating non-ASCII text (especially spaces and </s>). For example, with the default generation params except max_tokens=256 and temperature=0.0:

  1. baichuan-inc/Baichuan-13B-Chat
    prompt:
    <reserved_102>你是谁<reserved_103>
    before:
    我的名字叫百川大模型,是由百川智能的工程师们创造的人工智能大语言模型,我拥有回答问题、聊天互动、文本创作、逻辑推理、数学计算、代码生成等技能,您可以随时向我提问。\n您需要什么帮助吗?
    after:
    我的名字叫百川大模型,是由百川智能的工程师们创造的人工智能大语言模型,我拥有回答问题、聊天互动、文本创作、逻辑推理、数学计算、代码生成等技能,您可以随时向我提问。\n您需要什么帮助吗??</s>
    problem:
    an extra space at the front, and an extra ? and </s> at the end.

  2. baichuan-inc/Baichuan2-13B-Chat
    prompt:
    <reserved_102>你是谁<reserved_103>
    before:
    我是百川大模型,是由百川智能的工程师们创造的大语言模型,我可以和人类进行自然交流、解答问题、协助创作,帮助大众轻松、普惠的获得世界知识和专业服务。如果你有任何问题,可以随时向我提问
    after:
    我是百川大模型,是由百川智能的工程师们创造的大语言模型,我可以和人类进行自然交流、解答问题、协助创作,帮助大众轻松、普惠的获得世界知识和专业服务。如果你有任何问题,可以随时向我提问问</s>
    problem:
    an extra 问 and </s> at the end.

Do you have any ideas?
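
For context on why non-ASCII text is especially sensitive to incremental detokenization: tokenizers with byte-level BPE or byte fallback can split one CJK character across several tokens, so a partially decoded prefix ends in the U+FFFD replacement character. A small illustrative repro, using GPT-2 as a stand-in byte-level tokenizer (it is not one of the models in the report above):

```python
# Illustrative repro of the multi-byte hazard; GPT-2 is an assumed stand-in
# byte-level tokenizer, not a model from the report above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
ids = tokenizer.encode("问")  # one CJK character, usually several byte tokens
for i in range(1, len(ids) + 1):
    print(i, repr(tokenizer.decode(ids[:i])))
# Typical output (exact split depends on the tokenizer's merges):
# 1 '�'
# 2 '�'
# 3 '问'
```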

@Yard1 (Collaborator, Author) commented Sep 14, 2023

Let me check; I believe the space up front is actually correct, but I'm not sure about the characters at the end.
