[FlaxBert] Add ForCausalLM #16995
Conversation
The documentation is not available anymore as the PR was closed or merged.
Force-pushed from 9c9e49b to 62173c7.
```diff
  self.assertEqual(len(fx_outputs), len(pt_outputs), "Output lengths differ between Flax and PyTorch")
  for fx_output, pt_output in zip(fx_outputs, pt_outputs):
-     self.assert_almost_equals(fx_output, pt_output.numpy(), 1e-5)
+     self.assert_almost_equals(fx_output, pt_output.numpy(), 4e-2)
```
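For context, here is a minimal sketch of what an equivalence check like `assert_almost_equals` typically does; the exact helper in the test suite may differ, so treat the implementation below as an assumption rather than the repository's code:

```python
import numpy as np

def assert_almost_equals(fx_output, pt_output, tol):
    # Compare the Flax and PyTorch outputs elementwise and fail if the largest
    # absolute difference exceeds the tolerance (1e-5 vs. 4e-2 in the diff above).
    diff = np.abs(np.asarray(fx_output) - np.asarray(pt_output)).max()
    assert diff <= tol, f"Flax/PyTorch outputs differ by {diff:.2e} (tolerance {tol})"
```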
@ydshieh is 1e-5 now the default testing precision?
Found a bug in the FlaxBertModelTester and fixed it! Thresholds are now back to 1e-5 and passing (even with the randomly initialised decoder attention mask) :-)
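For readers unfamiliar with the tester setup, the sketch below shows roughly how randomly initialised decoder inputs and attention masks are produced in such a model tester; the shapes and names are assumptions for illustration, not the actual FlaxBertModelTester code:

```python
import numpy as np

# Illustrative shapes only.
batch_size, seq_length, vocab_size = 13, 7, 99

decoder_input_ids = np.random.randint(0, vocab_size, size=(batch_size, seq_length))
# Random 0/1 mask; a bug in how a mask like this was built and passed in the tester
# inflated the Flax-vs-PyTorch differences and originally motivated the looser 4e-2 threshold.
decoder_attention_mask = np.random.randint(0, 2, size=(batch_size, seq_length))
```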
Yes. So far, for anything higher than 1e-5, I was able to find some issues, either in the models or in the model tests.
Looks good to me - @sanchit-gandhi, could you check though which models don't pass with 1e-5, and ideally why?
Overall, 4e-2 is fine for me, though. cc @ydshieh, what do you think?
Cool, feel free to merge @sanchit-gandhi
* [FlaxBert] Add ForCausalLM
* make style
* fix output attentions
* Add RobertaForCausalLM
* remove comment
* fix fx-to-pt model loading
* remove comment
* add modeling tests
* add enc-dec model tests
* add big_bird
* add electra
* make style
* make repo-consitency
* add to docs
* remove roberta test
* quality
* amend cookiecutter
* fix attention_mask bug in flax bert model tester
* tighten pt-fx thresholds to 1e-5
* add 'copied from' statements
* amend 'copied from' statements
* amend 'copied from' statements
* quality
What does this PR do?
Adds cross-attention blocks to the following module classes:
Adds the following ForCausalLM model classes:
Adds the following model tests:
Note: FlaxBertForCausalLM is excluded from these tests due to the name mismatch with its PyTorch equivalent, BertLMHeadModel. It is implicitly tested through the FlaxRobertaForCausalLM model tests, as well as in the following encoder-decoder model tests:
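As a rough illustration of what the new classes enable, the sketch below uses FlaxBertForCausalLM standing in for the other added causal-LM classes; the checkpoint names are illustrative and the exact loading behaviour (e.g. warnings about freshly initialised head weights) is an assumption:

```python
from transformers import BertTokenizerFast, FlaxBertForCausalLM, FlaxEncoderDecoderModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")

# Standalone causal-LM head on top of the Flax BERT stack.
lm = FlaxBertForCausalLM.from_pretrained("bert-base-cased")
inputs = tokenizer("Hello, my dog is cute", return_tensors="np")
logits = lm(**inputs).logits  # (batch, sequence, vocab) next-token scores

# BERT-to-BERT encoder-decoder: the decoder side relies on the cross-attention
# blocks added to the Flax BERT modules in this PR.
enc_dec = FlaxEncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-cased", "bert-base-cased"
)
```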