Use TGI-like incremental detokenization #984
Conversation
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
cc @WoosukKwon

Does this PR overlap with #589?

I didn't look too deeply, but I think they both should fix the issue!
@Yard1 Thanks for submitting the PR! Left comments on the coding style and performance impact.
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Marking as draft until benchmarking is complete, but it does look like it brings extra overhead @WoosukKwon

@WoosukKwon I have reworked the code to combine the correctness of the TGI approach with the high performance of the vLLM approach. The new iteration of this PR is not only more correct than the current vLLM implementation but also more performant. PTAL!
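For readers following along, here is a minimal sketch of the TGI-style incremental detokenization being discussed, assuming a HuggingFace tokenizer and access to the Llama-2 checkpoint referenced in this thread. The function name, offset bookkeeping, and driver loop are illustrative, not the PR's actual code.

```python
from transformers import AutoTokenizer


def detokenize_incrementally(tokenizer, all_token_ids, prefix_offset, read_offset):
    """Return (new_text, new_prefix_offset, new_read_offset).

    Decode a window that starts slightly before the already-emitted position so
    the tokenizer has enough context to place spaces correctly, then emit only
    the text that extends past what was already read.
    """
    prefix_text = tokenizer.decode(all_token_ids[prefix_offset:read_offset])
    full_text = tokenizer.decode(all_token_ids[prefix_offset:])

    if len(full_text) > len(prefix_text) and not full_text.endswith("\ufffd"):
        # The newest token(s) completed valid characters: emit the delta and
        # slide both offsets forward.
        return full_text[len(prefix_text):], read_offset, len(all_token_ids)

    # The tail is an incomplete multi-byte character ("\ufffd"): emit nothing
    # and keep the offsets unchanged so the next call retries with more tokens.
    return "", prefix_offset, read_offset


# Example driver: feed token ids one at a time, as they would be sampled.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
generated, prefix_offset, read_offset, output = [], 0, 0, ""
for token_id in tokenizer.encode("LLMs. It is designed", add_special_tokens=False):
    generated.append(token_id)
    delta, prefix_offset, read_offset = detokenize_incrementally(
        tokenizer, generated, prefix_offset, read_offset)
    output += delta
print(output)  # should match a one-shot tokenizer.decode(generated)
```

The key design point is that only a short window is re-decoded per step, so the per-token cost stays roughly constant regardless of sequence length, which is what keeps the approach fast.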
@Yard1 Could you please update the test as well? I got errors in the test:
@Yard1 Awesome, thanks a lot for the PR! This further boosts our performance.
Left several comments on the style.
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
@Yard1 This is super nice. Thanks again for your hard work!
I found that after applying this PR, some models will produce unexpected tokens when generating non-ASCII text (especially spaces and
Do you have any ideas?

Let me check. I believe the space up front is actually correct, but I'm not sure about the characters at the end.
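To illustrate the non-ASCII part of the question: with byte-fallback tokenizers such as Llama-2's, a single character can span several tokens, so decoding a prefix that cuts a character in half yields the Unicode replacement character. This is only a sketch under that assumption (the emoji and the exact token splits are illustrative, and this is not the code path in vLLM or this PR):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
ids = tok.encode("vLLM 🚀", add_special_tokens=False)

for end in range(1, len(ids) + 1):
    text = tok.decode(ids[:end])
    # While the emoji's UTF-8 bytes are only partially present, the decoded
    # text ends in "\ufffd"; an incremental detokenizer should hold that
    # output back instead of emitting the replacement character.
    print(end, repr(text), text.endswith("\ufffd"))
```

Holding back any text that ends in "\ufffd", as in the sketch further up, is what keeps those stray characters from reaching the client.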
Allows us to simplify the code and ensure that the incremental decoding is correct. Notably, when tests/models/test_models.py is run for meta-llama/Llama-2-7b-hf, this change fixes one of the differing outputs between vLLM and HF.

vLLM before:
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.It is designed to be used in production

vLLM with this PR:
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be used in production

Notice the space after the dot.
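As a hedged illustration of where that space goes (not the exact code path in either implementation): decoding tokens in isolation strips each token's SentencePiece leading-space marker, while decoding with the preceding tokens as context preserves it.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
ids = tok.encode("LLMs. It is", add_special_tokens=False)

# Decoding each token on its own drops the "▁" (leading-space) markers,
# so spaces such as the one after the period disappear.
print("".join(tok.decode([i]) for i in ids))  # e.g. 'LLMs.Itis'

# Decoding the growing sequence with context keeps them: 'LLMs. It is'.
print(tok.decode(ids))
```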