[WIP] Adding GPT2 with Multi Query Attention by lvwerra · Pull Request #21253 · huggingface/transformers

Conversation

@lvwerra (Member) commented on Jan 23, 2023

Adding GPT2 with Multi Query Attention

This PR adds a GPT2 architecture with Multi Query Attention (MQA). With MQA, the key and value projections are shared across attention heads and only the queries are head-specific, which shrinks the cached keys/values and makes it possible to run the model with very large batches.

This is the architecture used in BigCode's SantaCoder.
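
For intuition, a minimal sketch of the attention shapes under MQA (hypothetical tensor names and sizes, not the code in this PR): queries keep a per-head dimension, while a single shared key/value head is broadcast across all query heads, so the cached keys/values are num_heads times smaller.

import torch

batch, n_heads, head_dim, q_len, kv_len = 4, 12, 64, 128, 128

# Per-head queries, but a single shared key/value head (no head dimension).
query = torch.randn(batch, n_heads, q_len, head_dim)
key = torch.randn(batch, kv_len, head_dim)
value = torch.randn(batch, kv_len, head_dim)

# The shared K/V head broadcasts across all query heads.
attn_weights = query @ key.unsqueeze(1).transpose(-1, -2)   # (batch, n_heads, q_len, kv_len)
attn_weights = torch.softmax(attn_weights / head_dim ** 0.5, dim=-1)
attn_output = attn_weights @ value.unsqueeze(1)             # (batch, n_heads, q_len, head_dim)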

There are a few things to do before we can merge the PR:

  • add performance improvements suggested by @jlamypoirier
  • fix tests:
    • there is an issue with past (the cached key/values)
    • there is an issue with loading the tokenizer (I guess a vocab file is missing from the repo?)
    • fix the generation examples

You can run the tests with:

RUN_SLOW=1 python -m pytest -s -v ./tests/models/gpt2mqa/

cc @bigximik @jlamypoirier @RaymondLi0

For review once this is ready, tagging @ArthurZucker and @younesbelkada.

@bigximik

Regarding the tests test_batch_generation and test_batch_generation_2heads: if the tokenizer initialisation class is changed from GPT2Tokenizer to GPT2TokenizerFast, the test passes through until the generated-tokens assertion. Is this the intended behaviour, or should the loading functionality have rerouted from the default class?
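
For reference, the change described above is only the class used to load the tokenizer in the test setup (the checkpoint path below is a placeholder):

from transformers import GPT2Tokenizer, GPT2TokenizerFast

# Slow tokenizer: needs vocab.json and merges.txt in the repo.
# tokenizer = GPT2Tokenizer.from_pretrained("path/to/gpt2mqa-checkpoint")

# Fast tokenizer: can also load from tokenizer.json, so it may succeed
# even when the slow vocab files are missing.
tokenizer = GPT2TokenizerFast.from_pretrained("path/to/gpt2mqa-checkpoint")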

attn_weights = attn_weights.view(batch_size, self.num_heads, query_length, key_length)

if self.scale_attn_weights:
    attn_weights = attn_weights / torch.tensor(
Contributor

Synchronisation issue (see #20061).
Should be attn_weights = attn_weights / value.size(-1) ** 0.5 (that PR isn't great; we don't want to create a tensor here).
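
A minimal sketch of that suggestion in context (not the final code):

if self.scale_attn_weights:
    # Divide by a plain Python float; creating a tensor here means an extra
    # host-to-device transfer (the synchronisation issue referenced above).
    attn_weights = attn_weights / (value.size(-1) ** 0.5)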

mask_value = torch.finfo(attn_weights.dtype).min
# Need to be a tensor, otherwise we get error: `RuntimeError: expected scalar type float but found double`.
# Need to be on the same device, otherwise `RuntimeError: ..., x and y to be on the same device`
mask_value = torch.tensor(mask_value, dtype=attn_weights.dtype).to(attn_weights.device)
Contributor

Another synchronisation issue: mask_value = torch.full([], mask_value, dtype=attn_weights.dtype).to(attn_weights.device) (this one does need to be a tensor, per the comment above).
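
A minimal sketch of that suggestion (torch.full with an empty size gives a 0-dim tensor, and the device can be passed directly instead of a separate .to call):

mask_value = torch.finfo(attn_weights.dtype).min
# Keep it a tensor so torch.where does not mix float and double scalars.
mask_value = torch.full([], mask_value, dtype=attn_weights.dtype, device=attn_weights.device)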

if layer_past is not None:
    past_key, past_value = layer_past
    # Concatenate on sequence dimension
    key = torch.cat((past_key, key), dim=-1)
Contributor

This is the slow op. Note that avoiding this will probably change the return type of the function, since we need to return the buffer cache and some extra information.
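
For illustration only, the usual way to avoid the per-step torch.cat is to write into a preallocated buffer. A rough sketch under assumed names and a (batch, seq, head_dim) layout (the layout in this PR differs, and as noted the layer would then have to return the buffer plus the current length):

# Allocated once before generation starts (hypothetical names and layout).
key_cache = hidden_states.new_zeros(batch_size, max_length, head_dim)
value_cache = hidden_states.new_zeros(batch_size, max_length, head_dim)
cache_length = 0

# Inside the forward pass: in-place copy instead of torch.cat.
new_length = cache_length + key.size(1)
key_cache[:, cache_length:new_length] = key
value_cache[:, cache_length:new_length] = value
key = key_cache[:, :new_length]
value = value_cache[:, :new_length]
cache_length = new_length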

# self.c_attn = Conv1D(3 * self.embed_dim, self.embed_dim)
self.q_attn = Conv1D(self.embed_dim, self.embed_dim)
# Keys and values are shared across heads
self.kv_attn = Conv1D(2 * self.head_dim, self.embed_dim)
Contributor

That's MQA_2, with q and kv projected separately, right? That is likely a bit slower than keeping them together.
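
A sketch of the fused variant being suggested, where a single Conv1D produces the queries plus the shared key/value head and the output is split afterwards (assuming the Conv1D(out_features, in_features) convention used in this file):

# In __init__: one projection for Q and the shared K/V head.
self.c_attn = Conv1D(self.embed_dim + 2 * self.head_dim, self.embed_dim)

# In forward(): a single matmul, then split into Q, K, V.
query, key, value = self.c_attn(hidden_states).split(
    [self.embed_dim, self.head_dim, self.head_dim], dim=-1
)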

>>> samples_img = [
... np.reshape(np.rint(127.5 * (clusters[s] + 1.0)), [height, width, 3]).astype(np.uint8) for s in samples
... ] # convert color cluster tokens back to pixels
>>> ] # convert color cluster tokens back to pixels
Contributor

Are these intended?

Member Author

No - I thought I removed those. Not sure why they came back 😂

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@lvwerra (Member Author) commented on Apr 21, 2023

Closing in favour of #22575

@lvwerra closed this on Apr 21, 2023