TF generate refactor - Greedy Search by patrickvonplaten · Pull Request #15562 · huggingface/transformers · GitHub

@patrickvonplaten patrickvonplaten commented Feb 8, 2022

What does this PR do?

This PR is the first step in refactoring TF's generate() method, similar in spirit to https://discuss.huggingface.co/t/big-generate-refactor/1857 .

Motivation

The main reasons for the refactor are:

  • Make generate more readable and easier to understand.
  • Disentangle different components of generate, e.g. the model_input preparation, the logits processor creation and application, the sub-generation methods. Each of these components of generate should be as independent and extendable as possible. We've seen that such a design greatly helps to foster community contributions to the generate method for PyTorch.
  • Make generate more generally applicable for future use cases. This mostly concerns multi-modal models, such as speech to text and image to text.

In addition, the final goal is to significantly speed up TF's generate method by making it XLA-compatible. This PR should greatly help in making the generate() method compatible with XLA by:

  • clearly separating the model inputs preparation part and the auto-regressive part. The auto-regressive part is handled in the sub-generation method which should now be much shorter. This is the part which will require some heavy changes to work with XLA.
  • Moving all of the logits processing logic out into a LogitsProcessorList class. We can now test each of those classes separately for XLA compatibility and don't have to make every one of them XLA-compatible. E.g. bad_token_ids and ngram_repeat are probably not at all XLA-compatible.
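To illustrate the idea of independent, separately testable logits processors, here is a minimal framework-free sketch. It uses plain Python lists instead of TF tensors, and the class names (MinLengthProcessor, BanTokensProcessor, ProcessorList) are hypothetical stand-ins, not the classes added in this PR.

```python
import math
from typing import List


class MinLengthProcessor:
    """Hypothetical processor: forbid EOS until min_length tokens exist."""

    def __init__(self, min_length: int, eos_token_id: int):
        self.min_length = min_length
        self.eos_token_id = eos_token_id

    def __call__(self, input_ids: List[int], scores: List[float]) -> List[float]:
        scores = list(scores)  # never mutate the caller's scores
        if len(input_ids) < self.min_length:
            scores[self.eos_token_id] = -math.inf
        return scores


class BanTokensProcessor:
    """Hypothetical processor: always forbid a fixed set of token ids."""

    def __init__(self, banned: List[int]):
        self.banned = set(banned)

    def __call__(self, input_ids: List[int], scores: List[float]) -> List[float]:
        return [-math.inf if i in self.banned else s for i, s in enumerate(scores)]


class ProcessorList(list):
    """Applies each processor in order; each one can be tested in isolation."""

    def __call__(self, input_ids: List[int], scores: List[float]) -> List[float]:
        for processor in self:
            scores = processor(input_ids, scores)
        return scores


processors = ProcessorList([MinLengthProcessor(3, eos_token_id=2), BanTokensProcessor([0])])
out = processors([5, 6], [0.1, 0.2, 0.3, 0.4])
print(out)
```

Because every processor has the same call signature, one that turns out not to be XLA-friendly can be tested, replaced, or left out without touching the others.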

Proposed refactor design and steps to achieve it

TF's generate can currently handle more or less 4 different types of generation:

  • a) greedy search
  • b) sampling (GPT-like generation)
  • c) beam search (default use case for translation, etc...)
  • d) beam sample (exotic case where sampling is combined with beam_search)

Currently a) and b) are handled by the sub-method _generate_no_beam_search while c) and d) are handled by _generate_beam_search .
This is not very readable and understandable. I propose to remove those sub-methods and instead define all four generation methods a), b), c) and d) separately.
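As a rough sketch of what "defining all four generation methods separately" implies for dispatch, the mode can be derived from the num_beams and do_sample arguments of generate(). The helper name select_generation_mode is hypothetical and only illustrates the decision, not the code in this PR.

```python
def select_generation_mode(num_beams: int = 1, do_sample: bool = False) -> str:
    """Map the two generate() flags onto the four modes a) to d)."""
    if num_beams < 1:
        raise ValueError("num_beams must be a positive integer")
    if num_beams == 1:
        # a) greedy search or b) sampling
        return "sample" if do_sample else "greedy_search"
    # c) beam search or d) beam sample
    return "beam_sample" if do_sample else "beam_search"


for beams, sample in [(1, False), (1, True), (4, False), (4, True)]:
    print(beams, sample, select_generation_mode(beams, sample))
```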

I propose to handle the complete generate refactor in multiple steps.

  1. Define the new generate() design only for the simplest use case being greedy_search (a)). Leave all existing code for now to not break any of b), c), d)
  2. At this point we can already start changing greedy search to be XLA-compatible.
  3. Analogous to 1), we define the sub-method for b) and can then fully remove _generate_no_beam_search
  4. Refactor c) beam search. This is a bit more complex and I'd like to handle this in a second step.
  5. Finally we can refactor d) which should be relatively simple after 4) and can then remove the _generate_beam_search function.

This first PR lays the foundation for the complete refactor and finishes 1).

In short, it does the following:

  • i.) Adds more aggressive greedy_search tests for a decoder-only model - GPT2 - and an encoder-decoder model - T5.
  • ii.) Define the TF logic for logits processors
  • iii.) Define the new generate() method which is in a first step called _generate() so that it will only be used for a) greedy_search. It should be much more readable by adding clear # 1. ... 2. comments for each model input preparation step.
  • iv.) Fully refactor greedy_search
  • v.) Clean-up some duplicated code that was heavily used by generate and related functions (i.e. tf_utils.py).
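For readers unfamiliar with the auto-regressive part that the refactored greedy_search isolates, the following is a minimal single-sequence sketch in plain Python. The toy_model function and the greedy_search signature here are assumptions made for illustration; the real method operates on batched TF tensors and model outputs.

```python
from typing import Callable, List


def greedy_search(
    score_fn: Callable[[List[int]], List[float]],
    input_ids: List[int],
    max_length: int,
    eos_token_id: int,
) -> List[int]:
    """Repeatedly append the highest-scoring next token until EOS or max_length."""
    generated = list(input_ids)
    while len(generated) < max_length:
        scores = score_fn(generated)
        next_token = max(range(len(scores)), key=scores.__getitem__)  # argmax
        generated.append(next_token)
        if next_token == eos_token_id:
            break
    return generated


def toy_model(ids: List[int]) -> List[float]:
    """Hypothetical stand-in for a model: prefers (last token + 1) mod vocab size."""
    vocab_size = 5
    scores = [0.0] * vocab_size
    scores[(ids[-1] + 1) % vocab_size] = 1.0
    return scores


result = greedy_search(toy_model, [0], max_length=10, eos_token_id=3)
print(result)
```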

Next steps

=> In a next step, I propose to tackle both 2) and 3) at the same time.
@Rocketknight1 Maybe you could take care of 2), and @gante, maybe it would make sense for you to start with 3) (should be pretty straightforward after this PR). IMO, this would also greatly help in making more people in the team knowledgeable about how generate() works.
Before we move on to 4), I'm happy to do the past/encoder_outputs refactor (see all those TODO(Patrick) comments in generate). We still rely on a very ugly hack that puts encoder_outputs into the past variable. I didn't want to change this here because it requires changing the prepare_inputs_for_generation method of multiple models and IMO the PR would have become too difficult to review.

Thoughts? @Rocketknight1 @gante @LysandreJik @sgugger

patrickvonplaten commented Feb 14, 2022

All of the following slow tests pass:

RUN_SLOW=1 pytest -sv \
tests/bart/test_modeling_tf_bart.py \
tests/t5/test_modeling_tf_t5.py \
tests/gpt2/test_modeling_tf_gpt2.py \
tests/vision_encoder_decoder/test_modeling_tf_vision_encoder_decoder.py \
tests/encoder_decoder/test_modeling_tf_encoder_decoder.py \
tests/speech_to_text/test_modeling_tf_speech_to_text.py

as well as

RUN_SLOW=1 pytest -sv tests/test_modeling_tf_rag.py

(this one is tested on brutasse as it requires a heavy index).

@sgugger sgugger left a comment

Amazing work and very clear roadmap! Thanks a lot for working on this!

@gante gante left a comment

Thanks for this big rework <3

Comment on lines +153 to +155
score_penalties = self._create_score_penalties(input_ids, scores)

scores = tf.math.multiply(scores, score_penalties)
Member

Looking at PT's implementation: wouldn't TF's gather and scatter work here as well? If there is no strong reason against it, you can leave it as a TODO for me, as this PR was already a ton of work 👍 (it should be faster and it would look closer to PT's implementation)
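To make the gather/scatter suggestion concrete, here is a framework-free sketch of a repetition penalty that only touches the scores of tokens already present in input_ids, instead of multiplying the whole score vector by a penalty mask. The rule (multiply negative scores by the penalty, divide non-negative ones) follows the usual repetition-penalty formulation; the function name and the plain-list representation are assumptions, not the TF implementation under review.

```python
from typing import List


def apply_repetition_penalty(input_ids: List[int], scores: List[float], penalty: float) -> List[float]:
    """Penalize only tokens that were already generated."""
    scores = list(scores)
    for token_id in set(input_ids):  # "gather": look up seen tokens only
        score = scores[token_id]
        # "scatter": write the penalized value back at the same index
        scores[token_id] = score * penalty if score < 0 else score / penalty
    return scores


penalized = apply_repetition_penalty([1, 3, 1], [2.0, -1.0, 0.5, 4.0], penalty=2.0)
print(penalized)
```

In TF the same pattern would presumably be expressed with gather and scatter-update ops, which avoids building a full vocabulary-sized penalty tensor.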

Contributor Author

Those logits processors are definitely slow right now and it would be a very welcome change to speed them up!

Comment on lines 137 to 138
input_ids = input_ids.cpu()
logits = logits.cpu()
Member

I believe this is PT code that can be deleted, TF shouldn't need to move data around -- .cpu() is a valid operation that does move the data to CPU, but it is deprecated

(there are a few more of these that can be deleted/changed, will simply comment PT code :) )

@patrickvonplaten patrickvonplaten Feb 15, 2022

Actually, I think I had to add them to make the code work with TF 2.7.0. Is it possible that we need this for older TF versions?

Just re-verified and all tests pass on GPU - sorry you're 100% correct. Adapting :-)

Member

That is curious. Maybe it is related to having to run in TF 1.x compatibility mode (which used to specify devices). I will try to look into it in the future -- since it works, let's ignore them for now

Contributor Author

Yeah not really sure what was going on, but I now removed all .cpu() statements and re-tested it. It seems to work fine now!

Comment on lines +113 to +126
input_ids = tf.constant([[0, 1, 3, 1], [0, 1, 0, 1]], dtype=tf.int32)
bad_word_tokens = [[1], [4], [1, 0], [0, 1, 2], [1, 3, 1, 3]]
scores = self._get_uniform_logits(batch_size, vocab_size)

no_bad_words_dist_proc = TFNoBadWordsLogitsProcessor(bad_words_ids=bad_word_tokens, eos_token_id=eos_token_id)

filtered_scores = no_bad_words_dist_proc(input_ids, tf.identity(scores))

# batch 1: 1st, 2nd, and 4th (0, 1, 3) token are forbidden
# batch 2: 1st, 2nd, and 3rd (0, 1, 2) token are forbidden
self.assertListEqual(
    tf.math.is_inf(filtered_scores).numpy().tolist(),
    [[True, True, False, True, True], [True, True, True, False, True]],
)
Member

I'm looking at this test and I noticed I probably didn't get the processor correctly. If we have bad_word_tokens=[[99, 100]], it means that the pair ([99, 100]) is forbidden, but the individual tokens (99 and 100) are not, correct?

If what I wrote above is correct, shouldn't the first batch have (1, 3) and the second batch (1, 2, 3) forbidden? What am I missing?

Contributor Author

Maybe the comments aren't super clear here, but the logic is the following. For both inputs (batch_1 and batch_2) we want to know which word ids we are allowed to generate for the next step. The logits vector is of dim 5 thus covering the word ids [0-4] . Now the word ids [1], [4] are always forbidden, which is why we have True for both output lists in the last place [4th logit] and the second one [1st]. The other three sequences are then responsible for setting the other values (0th and 3rd for the first batch) and (0th and 2nd for the second batch)

Member

Gotcha -- e.g. because both batches end with a 1 and [1, 0] is a bad sequence, generating a 0 in the next step is forbidden. Thanks 👍
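The logic discussed in this thread can be reproduced with a small pure-Python sketch: a bad-word sequence bans its last token whenever all of its preceding tokens match the end of the current input. The helper banned_next_tokens is hypothetical, uses lists rather than tensors, and ignores any special handling of eos_token_id.

```python
from typing import List, Set


def banned_next_tokens(prev_tokens: List[int], bad_words_ids: List[List[int]]) -> Set[int]:
    """Return the token ids that must not be generated at the next step."""
    banned = set()
    for seq in bad_words_ids:
        prefix, last = seq[:-1], seq[-1]
        if len(prefix) > len(prev_tokens):
            continue  # the input is too short to complete this sequence
        # single-token bad words have an empty prefix and are always banned
        if not prefix or prev_tokens[-len(prefix):] == prefix:
            banned.add(last)
    return banned


input_ids = [[0, 1, 3, 1], [0, 1, 0, 1]]
bad_word_tokens = [[1], [4], [1, 0], [0, 1, 2], [1, 3, 1, 3]]
vocab_size = 5

mask = [
    [token in banned_next_tokens(row, bad_word_tokens) for token in range(vocab_size)]
    for row in input_ids
]
print(mask)
```

With the inputs from the test above, the first row bans 0 (via [1, 0]) and 3 (via [1, 3, 1, 3]), the second row bans 0 and 2 (via [0, 1, 2]), and both rows ban 1 and 4.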

@Rocketknight1
Member

Just finished and it looks very cool! I had a few nits but it seems like a great refactor, which will make it easy to optimize sub-methods for performance later.

patrickvonplaten and others added 4 commits February 15, 2022 16:59
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
@patrickvonplaten
Contributor Author

Thanks a lot for the review @gante and @Rocketknight1 . Adapted the PR according to your comments and will merge once tests are passing.

@patrickvonplaten
Contributor Author

Failing tests are flaky and unrelated. Merging

@patrickvonplaten patrickvonplaten merged commit 2e12b90 into huggingface:master Feb 15, 2022
@patrickvonplaten patrickvonplaten deleted the tf_generate_refactor branch February 15, 2022 16:54
FrancescoSaverioZuppichini pushed a commit that referenced this pull request Feb 15, 2022
* TF generate start refactor

* Add tf tests for sample generate

* re-organize

* boom boom

* Apply suggestions from code review

* re-add

* add all code

* make random greedy pass

* make encoder-decoder random work

* further improvements

* delete bogus file

* make gpt2 and t5 tests work

* finish logits tests

* correct logits processors

* correct past / encoder_outputs drama

* refactor some methods

* another fix

* refactor shape_list

* fix more shape list

* import shape_list

* finish docs

* fix imports

* make style

* correct tf utils

* Fix TFRag as well

* Apply Lysandre's and Sylvais suggestions

* Update tests/test_generation_tf_logits_process.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Update src/transformers/tf_utils.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* remove cpu according to gante

* correct logit processor

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
FrancescoSaverioZuppichini added a commit that referenced this pull request Feb 17, 2022
…architecture details in maskformer

added detr inside maskformer

added swin inside maskformer

removed #Copied from

swin now matches master

update model auto

update model auto

update swin in maskformer

update swin in maskformer

make fixies

fixed the docstring destroyed by make-fixies

typos

typos

typos

Rebase (#15606)

TF MT5 embeddings resize (#15567)

* Fix TF MT5 vocab resize

* more assertive testing

 🖍 remove broken link (#15615)

Fix _configuration_file argument getting passed to model (#15629)

[deepspeed docs] misc additions (#15585)

* [deepspeed docs] round_robin_gradients

* training and/or eval/predict loss is

* Update docs/source/main_classes/deepspeed.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

[research_projects] deal with security alerts (#15594)

* [research_projects] deal with security alerts

* add a note of the original PL ver and warning

Custom feature extractor (#15630)

* Rework AutoFeatureExtractor.from_pretrained internal

* Custom feature extractor

* Add more tests

* Add support for custom feature extractor code

* Clean up

Fix grammar in tokenizer_summary (#15614)

"to make ensure" is redundant.

Add push to hub to feature extractor (#15632)

* Add push to hub to feature extractor

* Quality

* Clean up

[Fix doc example] FlaxVisionEncoderDecoder (#15626)

* Fix wrong checkpoint name: vit

* Fix missing import

* Fix more missing import

* make style

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Fix a bug that ignores max_seq_len in preprocess (#15238)

Report only the failed imports in `requires_backends` (#15636)

Make Swin work with VisionEncoderDecoderModel (#15527)

* Add attribute_map

* Add mention in docs

* Set hidden_size attribute correctly

* Add note about Transformer-based models only

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>

Remove redundant error logging in from_pretrained() method (#15631)

* Remove error logging in from_pretrained() method

Register feature extractor (#15634)

* Rework AutoFeatureExtractor.from_pretrained internal

* Custom feature extractor

* Add more tests

* Add support for custom feature extractor code

* Clean up

* Add register API to AutoFeatureExtractor

fix bug for the log of  RNG states are not properly loaded  exception. (#15638)

Co-authored-by: muz <muzhi1991@limuzhideMBP-2.lan>

[SpeechEncoderDecoder] Make sure no EOS is generated in test (#15655)

logger doc

Revert "logger doc"

This reverts commit 41168a4.

Require tokenizers>=0.11.1 (#15266)

`tokenizers` version that supports the feature to choose the direction of truncation

Fix ASR pipelines from local directories with wav2vec models that have language models attached (#15590)

* Fix loading pipelines with wav2vec models with lm when in local paths

* Adding tests

* Fix test

* Adding tests

* Flake8 fixes

* Removing conflict files :(

* Adding task type to test

* Remove unnecessary test and imports

Fix typo in speech2text2 doc (#15617)

Forward looks for inputs, not input_ids

Allow custom code for Processors (#15649)

* Allow custom code for Processors

* Add more test

* Test all auto_map configs are properly set

add scores to Wav2Vec2WithLMOutput (#15413)

* add scores to Wav2Vec2WithLMOutput

* style fixup

Update bad_words_ids usage (#15641)

* Improve the parameter `bad_word_ids' usage

* Update the bad_words_ids strategy

updated with latest PL and Ray (#15653)

Add section about doc testing (#15659)

* Add doctesting section

* Improve

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Fix quality

add a network debug script and document it (#15652)

* add a network debug script and document it

* doc

Re-export `KeyDataset`. (#15645)

* Re-export `KeyDataset`.

* Update the docs locations.

Add `decoder_kwargs` to send to LM on asr pipeline. (#15646)

Co-authored-by: Giuseppe Attanasio <giuseppeattanasio6@gmail.com>

Co-authored-by: Giuseppe Attanasio <giuseppeattanasio6@gmail.com>

TF generate refactor - Greedy Search (#15562)

[pipeline doc] fix api (#15660)

* [pipeline doc] fix api

* remove duplicate

Fix TFSequenceSummary's activation (#15643)

* fix TFSequenceSummary

* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Fix model equivalence tests (#15670)

* Fix model equivalence tests

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Fix vit test (#15671)

Add a missing space in a deprecation message (#15651)

[t5/t0/mt5 models] faster/leaner custom layer norm (#14656)

* [t5] faster/leaner custom layer norm

* wip

* apex.normalization.FusedRMSNorm

* cleanup

* cleanup

* add doc

* add catch all

* Trigger CI

* expand

Add push_to_hub method to processors (#15668)

* Add push_to_hub method to processors

* Fix test

* The other one too!

Usage examples for logger (#15657)

* logger

* Update docs/source/main_classes/logging.mdx

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update docs/source/main_classes/logging.mdx

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Fix dec_attn_mask in TFTransfoXLMainLayer (#15665)

* fix attn

* clean-up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

prefixed MaskFormer to FPN

Update README_ko.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Update docs/source/index.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

update maskformer.mdx

update maskformer.mdx

fixes

replaced pretrained weight name

test for maskformer

Update docs/source/model_doc/maskformer.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Update docs/source/model_doc/maskformer.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Typos in doc

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

line in __init__

line in __init__

indent in maskformer.mdx

removed maskformer in auto mapping

resolved conversation for maskformer config

resolved conversations

minor fixes for ci

fixed MASK_FOMER -> MASKFORMER in __init__

added export MaskFormerPreTrainedModel

fixes for ci

MaskFormerForInstanceSegmentation docstring

🔥 Remove build_doc_test github action (#15680)

Add register method to AutoProcessor (#15669)

* Add push_to_hub method to processors

* Fix test

* The other one too!

* Add register method to AutoProcessor

* Update src/transformers/models/auto/processing_auto.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

[Wav2Vec2ProcessorWithLM] Fix auto processor with lm (#15683)

Fix Funnel configuration doc (#15686)

* fix doc

* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Implementation of activations as pytorch modules (#15616)

* Implement activations as pytorch modules

* Apply fixup

* Add missing tests for activations

* Update docstring

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Add image classification notebook (#15667)

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

Add PoolFormer (#15531)

* Added all files, PoolFormerFeatureExtractor still failing tests

* Fixed PoolFormerFeatureExtractor not being able to import

* Completed Poolformer doc

* Applied Suggested fixes

* Fixed errors in modeling_auto.py

* Fix feature extractor, convert docs to Markdown, styling of code

* Remove PoolFormer from check_repo and fix integration test

* Remove Poolformer from check_repo

* Fixed configuration_poolformer.py docs and removed inference.py from poolformer

* Ran with black v22

* Added PoolFormer to _toctree.yml

* Updated poolformer doc

* Applied suggested fixes and added on README.md

* Did make fixup and make fix-copies, tests should pass now

* Changed PoolFormer weights conversion script name and fixed README

* Applied fixes in test_modeling_poolformer.py and modeling_poolformer.py

* Added PoolFormerFeatureExtractor to AutoFeatureExtractor API

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>

Update docs/source/model_doc/maskformer.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Update docs/source/model_doc/maskformer.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

modeling_maskformer docstring

make style
stevhliu pushed a commit to stevhliu/transformers that referenced this pull request Feb 18, 2022

ydshieh commented Feb 23, 2022

@patrickvonplaten

About the following

We still rely on a very ugly hack that puts encoder_outputs into the past variable.
I didn't want to change this here because it requires changing the prepare_inputs_for_generation
method of multiple models and IMO the PR would have become too difficult to review.

I haven't looked at the details, but it is something different from the PyTorch models (and generate), right?
During the fix for PT/TF inconsistency, this causes some problems for strict equivalence testing.
Maybe I can work on this once other inconsistencies are fixed (and by that time, generate will already have had some clean refactoring by you guys!).

@patrickvonplaten
Contributor Author

Hey @ydshieh,

Great initiative! Yes that's indeed an important refactor we should do soon - would you like to look into it? In short we should do the following in TF:

  • Always force return_dict=True. Then we don't need any of these self._use_cache() methods anymore (we can just check whether past_key_values is present in the output). Also, this way we can clearly differentiate between encoder_outputs and past_key_values for the encoder-decoder case. Would you like to open a PR for it? Happy to help you along the way!

ydshieh commented Feb 23, 2022

Hey @ydshieh,

Great initiative! Yes that's indeed an important refactor we should do soon - would you like to look into it? In short we should do the following in TF:

* Always force `return_dict=True`. Then we don't need any of these `self._use_cache()` methods anymore (we can just check whether `past_key_values` is present in the output). Also, this way we can clearly differentiate between `encoder_outputs` and `past_key_values` for the encoder-decoder case. Would you like to open a PR for it? Happy to help you along the way!

Sure - after I finish the work of adding tests for past_key_values for TF encoder models (those can be used for causal LM and can have cross-attention). Currently there is no test for it, which left #15477 undetected for quite some time.


@slow
def test_lm_generate_distilgpt2(self):
def test_lm_generate_distilgpt2_batch_special(self):
@patrickvonplaten patrickvonplaten Feb 25, 2022

add a new aggressive test

spacemanidol pushed a commit to neuralmagic/transformers that referenced this pull request Mar 15, 2022
* Make sure custom configs work with Transformers (#15569)

* Make sure custom configs work with Transformers

* Apply code review suggestions

* Add Wav2Vec2 Adapter Weights to Flax (#15566)

* Add Wav2Vec2 Adapter Weights to Flax

* Suggested changes

* Upgrade click version (#15579)

* [Flax tests/FlaxBert] make from_pretrained test faster (#15561)

* Add implementation of typical sampling (#15504)

* typical decoding

* changing arg name

* add test config params

* forgotten arg rename

* fix edge case where scores are same

* test for typical logits warper

* code quality fixes

* Constrained Beam Search [without disjunctive decoding] (#15416)

* added classes to get started with constrained beam search

* in progress, think i can directly force tokens now but not yet with the round robin

* think now i have total control, now need to code the bank selection

* technically works as desired, need to optimize and fix design choices leading to undersirable outputs

* complete PR #1 without disjunctive decoding

* removed incorrect tests

* Delete k.txt

* Delete test.py

* Delete test.sh

* revert changes to test scripts

* genutils

* full implementation with testing, no disjunctive yet

* shifted docs

* passing all tests realistically ran locally

* removing accidentally included print statements

* fixed source of error in initial PR test

* fixing the get_device() vs device trap

* fixed documentation docstrings about constrained_beam_search

* fixed tests having failing for Speech2TextModel's floating point inputs

* fix cuda long tensor

* added examples and testing for them and founx & fixed a bug in beam_search and constrained_beam_search

* deleted accidentally added test halting code with assert False

* code reformat

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

* fixing based on comments on PR

* took out the testing code that should but work fails without the beam search moditification ; style changes

* fixing comments issues

* docstrings for ConstraintListState

* typo in PhrsalConstraint docstring

* docstrings improvements

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Trigger doc build

* Fix quality

* Fix tests hub failure (#15580)

* Expose hub test problem

* Fix tests

* update serving_output for some TF models (#15568)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [trainer docs] document how to select specific gpus (#15551)

* [trainer docs] document how to select specific gpus

* expand

* add urls

* add accelerate launcher

* Add link (#15588)

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>

* Expand tutorial for custom models (#15587)

* Expand tutorial for custom models

* Style

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Make slow tests slow

* Reformat tokenization_fnet

* Add Tensorflow handling of ONNX conversion (#13831)

* Add TensorFlow support for ONNX export

* Change documentation to mention conversion with Tensorflow

* Refactor export into export_pytorch and export_tensorflow

* Check model's type instead of framework installation to choose between TF and Pytorch

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Alberto Bégué <alberto.begue@della.ai>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Add example batch size to all commands (#15596)

* Compute loss independent from decoder for TF EncDec models (as #14139) (#15175)

* Compute loss independent from decoder (as 14139)

* fix expected seq_len + style

* Apply the same change to TFVisionEncoderDecoderModel

* fix style

* Add case with labels in equivalence test

* uncomment

* Add case with labels in equivalence test

* add decoder_token_labels

* use hf_compute_loss

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Add copied from

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fix Seq2SeqTrainer (#15603)

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>

* Add local and TensorFlow ONNX export examples to docs (#15604)

* Add local and TensorFlow ONNX export examples to docs

* Use PyTorch - TensorFlow split

* Correct JSON format (#15600)

* [Generate] Small refactor (#15611)

* Mark "code in the Hub" API as experimental (#15624)

* Enable ONNX export when PyTorch and TensorFlow installed in the same environment (#15625)

* TF: Add informative warning for inexistent CPU backprop ops (#15612)

* Add informative warning

* Rebase (#15606)

* TF MT5 embeddings resize (#15567)

* Fix TF MT5 vocab resize

* more assertive testing

* 🖍 remove broken link (#15615)

* Fix _configuration_file argument getting passed to model (#15629)

* [deepspeed docs] misc additions (#15585)

* [deepspeed docs] round_robin_gradients

* training and/or eval/predict loss is

* Update docs/source/main_classes/deepspeed.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [research_projects] deal with security alerts (#15594)

* [research_projects] deal with security alerts

* add a note of the original PL ver and warning

* Custom feature extractor (#15630)

* Rework AutoFeatureExtractor.from_pretrained internal

* Custom feature extractor

* Add more tests

* Add support for custom feature extractor code

* Clean up

* Fix grammar in tokenizer_summary (#15614)

"to make ensure" is redundant.

* Add push to hub to feature extractor (#15632)

* Add push to hub to feature extractor

* Quality

* Clean up

* [Fix doc example] FlaxVisionEncoderDecoder (#15626)

* Fix wrong checkpoint name: vit

* Fix missing import

* Fix more missing import

* make style

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fix a bug that ignores max_seq_len in preprocess (#15238)

* Report only the failed imports in `requires_backends` (#15636)

* Make Swin work with VisionEncoderDecoderModel (#15527)

* Add attribute_map

* Add mention in docs

* Set hidden_size attribute correctly

* Add note about Transformer-based models only

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>

* Remove redundant error logging in from_pretrained() method (#15631)

* Remove error logging in from_pretrained() method

* Register feature extractor (#15634)

* Rework AutoFeatureExtractor.from_pretrained internal

* Custom feature extractor

* Add more tests

* Add support for custom feature extractor code

* Clean up

* Add register API to AutoFeatureExtractor

* Fix bug in the log for the "RNG states are not properly loaded" exception. (#15638)

Co-authored-by: muz <muzhi1991@limuzhideMBP-2.lan>

* [SpeechEncoderDecoder] Make sure no EOS is generated in test (#15655)

* logger doc

* Revert "logger doc"

This reverts commit 41168a49ce61685ac5c9c38cd5b88fd883c0d811.

* Require tokenizers>=0.11.1 (#15266)

`tokenizers` version that supports the feature to choose the direction of truncation

* Fix ASR pipelines from local directories with wav2vec models that have language models attached (#15590)

* Fix loading pipelines with wav2vec models with lm when in local paths

* Adding tests

* Fix test

* Adding tests

* Flake8 fixes

* Removing conflict files :(

* Adding task type to test

* Remove unnecessary test and imports

* Fix typo in speech2text2 doc (#15617)

Forward looks for inputs, not input_ids

* Allow custom code for Processors (#15649)

* Allow custom code for Processors

* Add more test

* Test all auto_map configs are properly set

* add scores to Wav2Vec2WithLMOutput (#15413)

* add scores to Wav2Vec2WithLMOutput

* style fixup

* Update bad_words_ids usage (#15641)

* Improve the parameter `bad_words_ids` usage

* Update the bad_words_ids strategy

* updated with latest PL and Ray (#15653)

* Add section about doc testing (#15659)

* Add doctesting section

* Improve

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix quality

* add a network debug script and document it (#15652)

* add a network debug script and document it

* doc

* Re-export `KeyDataset`. (#15645)

* Re-export `KeyDataset`.

* Update the docs locations.

* Add `decoder_kwargs` to send to LM on asr pipeline. (#15646)

Co-authored-by: Giuseppe Attanasio <giuseppeattanasio6@gmail.com>

Co-authored-by: Giuseppe Attanasio <giuseppeattanasio6@gmail.com>

* TF generate refactor - Greedy Search (#15562)

* TF generate start refactor

* Add tf tests for sample generate

* re-organize

* boom boom

* Apply suggestions from code review

* re-add

* add all code

* make random greedy pass

* make encoder-decoder random work

* further improvements

* delete bogus file

* make gpt2 and t5 tests work

* finish logits tests

* correct logits processors

* correct past / encoder_outputs drama

* refactor some methods

* another fix

* refactor shape_list

* fix more shape list

* import shape_list

* finish docs

* fix imports

* make style

* correct tf utils

* Fix TFRag as well

* Apply Lysandre's and Sylvain's suggestions

* Update tests/test_generation_tf_logits_process.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Update src/transformers/tf_utils.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* remove cpu according to gante

* correct logit processor

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* [pipeline doc] fix api (#15660)

* [pipeline doc] fix api

* remove duplicate

* Fix TFSequenceSummary's activation (#15643)

* fix TFSequenceSummary

* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix model equivalence tests (#15670)

* Fix model equivalence tests

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix vit test (#15671)

* Add a missing space in a deprecation message (#15651)

* [t5/t0/mt5 models] faster/leaner custom layer norm (#14656)

* [t5] faster/leaner custom layer norm

* wip

* apex.normalization.FusedRMSNorm

* cleanup

* cleanup

* add doc

* add catch all

* Trigger CI

* expand

* Add push_to_hub method to processors (#15668)

* Add push_to_hub method to processors

* Fix test

* The other one too!

* Usage examples for logger (#15657)

* logger

* Update docs/source/main_classes/logging.mdx

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update docs/source/main_classes/logging.mdx

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix dec_attn_mask in TFTransfoXLMainLayer (#15665)

* fix attn

* clean-up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* 🔥 Remove build_doc_test github action (#15680)

* Add register method to AutoProcessor (#15669)

* Add push_to_hub method to processors

* Fix test

* The other one too!

* Add register method to AutoProcessor

* Update src/transformers/models/auto/processing_auto.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* [Wav2Vec2ProcessorWithLM] Fix auto processor with lm (#15683)

* Fix Funnel configuration doc (#15686)

* fix doc

* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Implementation of activations as pytorch modules (#15616)

* Implement activations as pytorch modules

* Apply fixup

* Add missing tests for activations

* Update docstring

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Add image classification notebook (#15667)

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

* Add PoolFormer (#15531)

* Added all files, PoolFormerFeatureExtractor still failing tests

* Fixed PoolFormerFeatureExtractor not being able to import

* Completed Poolformer doc

* Applied Suggested fixes

* Fixed errors in modeling_auto.py

* Fix feature extractor, convert docs to Markdown, styling of code

* Remove PoolFormer from check_repo and fix integration test

* Remove Poolformer from check_repo

* Fixed configuration_poolformer.py docs and removed inference.py from poolformer

* Ran with black v22

* Added PoolFormer to _toctree.yml

* Updated poolformer doc

* Applied suggested fixes and added on README.md

* Did make fixup and make fix-copies, tests should pass now

* Changed PoolFormer weights conversion script name and fixed README

* Applied fixes in test_modeling_poolformer.py and modeling_poolformer.py

* Added PoolFormerFeatureExtractor to AutoFeatureExtractor API

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>

* Minor fix on README.md (#15688)

* fix README

* fix more arxiv links

* make fix-copies

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix shapes in model docstrings (#15696)

* Add SimMIM (#15586)

* Add first draft

* Make model importable

* Make SwinForMaskedImageModeling importable

* Fix imports

* Add missing inits

* Add support for Swin

* Fix bug

* Fix bug

* Fix another bug

* Fix Swin MIM implementation

* Fix default encoder stride

* Fix Swin

* Add print statements for debugging

* Add image_size data argument

* Fix Swin

* Fix image_size

* Add print statements for debugging

* Fix print statement

* Remove print statements

* Improve reshaping of bool_masked_pos

* Add support for DeiT, fix tests

* Improve docstrings

* Apply new black version

* Improve script

* Fix bug

* Improve README

* Apply suggestions from code review

* Remove DS_Store and add to gitignore

* Apply suggestions from code review + fix BEiT Flax

* Revert BEiT changes

* Improve README

* Fix code quality

* Improve README

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

* Adding a model, more doc for pushing to the hub (#15690)

* doc for adding a model to the hub

* run make style

* resolved conversation

* removed a line

* removed )

* Update docs/source/add_new_model.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/add_new_model.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make style

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix CLIP fast tokenizer and change some properties of the slow version (#15067)

Major changes to the CLIP fast tokenizer, which did not match the CLIP slow tokenizer

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix SiluActivation (#15718)

* TF: add initializer_std with a small value in TFFunnelModelTester (#15684)

* Fix DETR model deprecation warnings for int div (#15702)

* Fix LongformerModel hidden states (#15537)

* add undo padding

* fix

* fix tuple issue

* make style and quality

* move unpad logic to LongformerEncoder + unpad attentions + update tests

* move unpad logic to TFLongformerEncoder

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add PLBart (#13269)

* Init PLBART

* Add missing configuration file

* Add conversion script and configuration file

* Fix style

* Update modeling and conversion scripts

* Fix scale embedding in config

* Add comment

* Fix conversion script

* Add classification option to conversion script

* Fix vocab size in config doc

* Add tokenizer files from MBart50

* Allow no lang code in regular tokenizer

* Add PLBart Tokenizer Converters

* Remove mask from multi tokenizer

* Remove mask from multi tokenizer

* Change from MBart-50 to MBart tokenizer

* Fix names and modify src/tgt behavior

* Fix imports for tokenizer

* Remove <mask> from multi tokenizer

* Fix style

* Change tokenizer_class to processor_class

* Add attribute map to config class

* Update modeling file to modified MBart code

* Update configuration file to MBart style configuration

* Fix tokenizer

* Separate tokenizers

* Fix error in tokenization auto

* Copy MBart tests

* Replace with MBart tokenization tests

* Fix style

* Fix language code in multi tokenizer

* Fix configuration docs

* Add entry for plbart_multi in transformers init

* Add dummy objects and fix imports

* Fix modeling tests

* Add TODO in config

* Fix copyright year

* Fix modeling docs and test

* Fix some tokenization tests and style

* Add changes from review

* Fix copies

* Fix docs

* Fix docs

* Fix style

* Fix year

* Add changes from review

* Remove extra changes

* Fix base tokenizer and doc

* Fix style

* Fix modeling and slow tokenizer tests

* Remove Multi-tokenizer Converter and Tests

* Delete QA model and Multi Tokenizer dummy objects

* Fix repo consistency and code quality issues

* Fix example documentation

* Fix style

* Remove PLBartTokenizer from type checking in init

* Fix consistency issue

* Add changes from review

* Fix style

* Remove PLBartTokenizerFast

* Remove FastTokenizer converter

* Fix AutoTokenizer mapping

* Add plbart to toctree and fix consistency issues

* Add language codes tokenizer test

* Fix styling and doc issues

* Add fixes for failing tests

* Fix copies

* Fix failing modeling test

* Change assert to assertTrue in modeling tests

* style_doc handles decorators in examples (#15719)

* Fix auto (#15706)

* fix: hfdeepspeed config argument (#15711)

`HfDeepSpeedConfig` accepts a dictionary or path to `.json` file containing DS configurations, not `TrainingArguments`.

* fix bug in PT speech-encoder-decoder (#15699)

* fix bug in PT speech-encoder-decoder

* add pt test for `inputs is not None`

* fix test

* new pt test

* Update tests/test_modeling_speech_encoder_decoder.py

* make fixup

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Add missing PLBart entry in README (#15721)

* Add missing PLBart entry in index

* Fix README

* Fix README

* Fix style

* Change to master model doc

* Remove input and target reset after preprocessing (#15741)

Remove input and target reset after preprocessing

* Fix minor comment typos (#15740)

* add VisionTextDualEncoder and CLIP fine-tuning script (#15701)

* begin script

* update script

* fix features and data args

* main

* add requirements

* add column name args

* fix captions

* don't jit transforms

* fix caption

* fix labels, handle attention mask

* convert pixel values to numpy

* labels => input_ids

* transform images on the fly

* use AutoModel class, create the hybrid model outside of the script

* fix version message

* add readme

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* address review comments

* add more comments

* allow freezing vision and text models

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Add layer_idx to CrossAttention of GPT2 model (#15730)

* Add layer_idx to CrossAttention

* Add layer_idx to crossattention of ImageGPT model

* TF text classification examples (#15704)

* Working example with to_tf_dataset

* updated text_classification

* more comments

* revert temporary addition to test next version of CLIPTokenizerFast (#15717)

* added link to our writing-doc document (#15756)

* TF train_step docstring (#15755)

* TF train_step docstring

* Gelu10 (#15676)

* Add GeLU10 (clipped version of GeLU) to transformers to improve quantization performances.

* Add unittests.

* Import tensorflow after `is_tf_available` check.

* Fix tensorflow wrong function `tf.tensor` to `tf.constant`

* style.

* use `tf.math.max`

* Fix tf tests.

* style.

* style style style style style style

* style style style style style style

* Address @sgugger comments.

* Fix wrong operator for raising ValueError for ClippedGELUActivation.

* Time stamps for CTC models (#15687)

* [Wav2Vec2 Time Stamps]

* Add first version

* add word time stamps

* Fix

* save intermediate space

* improve

* [Finish CTC Tokenizer]

* remove @

* remove @

* push

* continue with phonemes

* up

* finish PR

* up

* add example

* rename

* finish

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* correct split

* finalize

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fixed pipeline code (#15607)

Co-authored-by: Boumadane Abdelmoumene <moumene.boumadane@gmail.com>

* Fix typo on examples/pytorch/question-answering (#15644)

cna -> can

* Cleanup transformers-cli (#15767)

* Fix `HfArgumentParser` when passing a generator (#15758)

* Fix `HfArgumentParser` when passing a generator

* Add missing import

* Always convert `dataclass_types` into a list

* Adding ZeroShotImageClassificationPipeline (#12119)

* [Proposal] Adding ZeroShotImageClassificationPipeline

- Based on CLIP

* WIP, Resurrection in progress.

* Resurrection... achieved.

* Reword handling different `padding_value` for `feature_extractor` and
`tokenizer`.

* Thanks doc-builder !

* Adding docs + global namespace `ZeroShotImageClassificationPipeline`.

* Fixing templates.

* Make the test pass and be robust to floating error.

* Addressing suraj's comments on docs mostly.

* Tf support start.

* TF support.

* Update src/transformers/pipelines/zero_shot_image_classification.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* [M2M100, XGLM] fix create_position_ids_from_inputs_embeds (#15751)

* Supporting Merges.txt files that contain an endline. (#15782)

(`hf-internal-testing/tiny-clip` for instance)

* [CLIP] fix grad ckpt (#15789)

* [ViLT] Fix checkpoint url in config (#15790)

* [ViLT] Fix checkpoint url in config

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Enable `image-segmentation` on `AutoModelForSemanticSegmentation` (#15647)

* Enabling Beit SegFormer to `image-segmentation`.

* Fixing the score.

* Fix import ?

* Missing in type hint.

* Multiple test fixes:

- Add `raw_image` support. It should be the default IMHO since in Python
  world it doesn't make any sense to base64 encode the image (Sorry
  @mishig, didn't catch that in my review). I really think we should
  consider breaking BC here.
- Add support for Segformer tiny test (needed
  `SegformerModelTester.get_config` to enable TinyConfig
  @NielsRogge)
- Add the check that `batch_size` works correctly on that pipeline.
  Uncovered that it doesn't for Detr, which IMO is OK since images
  after `feature_extractor` don't have the same size. Comment should
  explain.

* Type hint as a string.

* Make fixup + update black.

* torch+vision protections.

* Don't use torchvision, use F.interpolate instead (no new dep).

* Last fixes for Segformer.

* Update test to reflect new image (which was broken)

* Update tests.

* Major BC modification:

- Removed the compressed PNG string; that's a job for users,
  `transformers` stays in Python land.
- Removed the `score` for semantic segmentation. It has hardly a meaning
  on its own in this context.
- Don't include the grayscale with logits for now (which could enable
  users to get a sense of confidence). Might be done later.
- Don't include the surface of the mask (could be used for sorting by
  users, to filter out small masks). It's already calculable, and
  it's easier to add later, than to add now and break later if we need.

* `make fixup`.

* Small changes.

* Rebase + doc fixup.

* [doc] custom_models: mention security features of the Hub (#15768)

* custom_models: tiny doc addition

* mention security feature earlier in the section

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Align documentation with code defaults (#15468)

In the code, `do_normalize` defaults to True

* HTML dev docs (#15678)

Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>

* Fix indent in doc-builder CI (#15798)

* 🧼  NLP task guides (#15731)

* clean commit of changes to NLP tasks

* 🖍 apply feedback

* 📝 move tf data collator in multiple choice

Co-authored-by: Steven <stevhliu@gmail.com>

* [Test refactor 1/5] Per-folder tests reorganization (#15725)

* Per-folder tests reorganization

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>

* [Test refactor 2/5] Tests fetcher (#15726)

* Tests fetcher

* Review comments

Co-authored-by: sgugger <sylvain.gugger@gmail.com>

* [Test refactor 3/5] Notification service improvement (#15727)

* Per-folder tests reorganization

* Review comments

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>

* [Test refactor 4/5] Improve the scheduled tests (#15728)

* [Test refactor 5/5] Build docker images (#15729)

* Fix build_documentation CI (#15803)

* Scheduled tests should only run on a daily basis

* Docker images should only run on a daily basis

* Fix model templates (#15806)

* Fix model templates

* Update paths

* Fix add-new-model-like when old model checkpoint is not found (#15805)

* Fix add-new-model-like command when old checkpoint can't be recovered

* Style

* Fix from_pretrained with default base_model_prefix (#15814)

* Revert changes in logit size for semantic segmentation models (#15722)

* Revert changes in logit size for semantic segmentation models

* Address review comments

* [Unispeech] Fix slow tests (#15818)

* remove soundfile old way of loading audio

* Adapt slow test

* [Barthez Tokenizer] Fix saving (#15815)

* [TFXLNet] Correct tf xlnet generate (#15822)

* [TFXLNet] Correct tf xlnet

* adapt test comment

* Fix the push run (#15807)

* Fix semantic segmentation pipeline test (#15826)

* Fix dummy_inputs() to dummy_inputs in symbolic_trace doc (#15776)

* Add model specific output classes to PoolFormer model docs (#15746)

* Added model specific output classes to poolformer docs

* Fixed Segformer typo in Poolformer docs

* Adding the option to return_timestamps on pure CTC ASR models. (#15792)

* Adding the option to return_timestamps on pure CTC ASR models.

* Remove `math.prod` which was introduced in Python 3.8

* int are not floats.

* Reworking the PR to support "char" vs "word" output.

* Fixup!

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Quality.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* HFTracer.trace should use/return self.graph to be compatible with torch.fx.Tracer (#15824)

* Fix tf.concatenate + test past_key_values for TF models (#15774)

* fix wrong method name tf.concatenate

* add tests related to causal LM / decoder

* make style and quality

* clean-up

* Fix TFBertModel's extended_attention_mask when past_key_values is provided

* Fix tests

* fix copies

* More tf.int8 -> tf.int32 in TF test template

* clean-up

* Update TF test template

* revert the previous commit + update the TF test template

* Fix TF template extended_attention_mask when past_key_values is provided

* Fix some styles manually

* clean-up

* Fix ValueError: too many values to unpack in the test

* Fix more: too many values to unpack in the test

* Add a comment for extended_attention_mask when there is past_key_values

* Fix TFElectra extended_attention_mask when past_key_values is provided

* Add tests to other TF models

* Fix for TF Electra test: add prepare_config_and_inputs_for_decoder

* Fix not passing training arg to lm_head in TFRobertaForCausalLM

* Fix tests (with past) for TF Roberta

* add testing for past_key_values for TFElectra model

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [examples/summarization and translation] fix readme (#15833)

* Add ONNX Runtime quantization for text classification notebook (#15817)

* Re-enable doctests for the quicktour (#15828)

* Re-enable doctests for the quicktour

* Re-enable doctests for task_summary (#15830)

* Remove &

* Framework split model report (#15825)

* Add TFConvNextModel (#15750)

* feat: initial implementation of convnext in tensorflow.

* fix: sample code for the classification model.

* chore: added checked for  from the classification model.

* chore: set bias initializer in the classification head.

* chore: updated license terms.

* chore: removed unused imports

* feat: enabled  argument during using drop_path.

* chore: replaced tf.identity with layers.Activation(linear).

* chore: edited default checkpoint.

* fix: minor bugs in the initializations.

* partial-fix: tf model errors for loading pretrained pt weights.

* partial-fix: call method updated

* partial-fix: cross loading of weights (4x3 variables to be matched)

* chore: removed unneeded comment.

* removed playground.py

* rebasing

* rebasing and removing playground.py.

* fix: renaming TFConvNextStage conv and layer norm layers

* chore: added initializers and other minor additions.

* chore: added initializers and other minor additions.

* add: tests for convnext.

* fix: integration tester class.

* fix: issues mentioned in pr feedback (round 1).

* fix: how output_hidden_states arg is propagated inside the network.

* feat: handling of  arg for pure cnn models.

* chore: added a note on equal contribution in model docs.

* rebasing

* rebasing and removing playground.py.

* feat: encapsulation for the convnext trunk.

* Fix variable naming; Test-related corrections; Run make fixup

* chore: added Joao as a contributor to convnext.

* rebasing

* rebasing and removing playground.py.

* rebasing

* rebasing and removing playground.py.

* chore: corrected copyright year and added comment on NHWC.

* chore: fixed the black version and ran formatting.

* chore: ran make style.

* chore: removed from_pt argument from test, ran make style.

* rebasing

* rebasing and removing playground.py.

* rebasing

* rebasing and removing playground.py.

* fix: tests in the convnext subclass, ran make style.

* rebasing

* rebasing and removing playground.py.

* rebasing

* rebasing and removing playground.py.

* chore: moved convnext test to the correct location

* fix: locations for the test file of convnext.

* fix: convnext tests.

* chore: applied  sgugger's suggestion for dealing w/ output_attentions.

* chore: added comments.

* chore: applied updated quality environment style.

* chore: applied formatting with quality environment.

* chore: revert to the previous tests/test_modeling_common.py.

* chore: revert to the original test_modeling_common.py

* chore: revert to previous states for test_modeling_tf_common.py and modeling_tf_utils.py

* fix: tests for convnext.

* chore: removed output_attentions argument from convnext config.

* chore: revert to the earlier tf utils.

* fix: output shapes of the hidden states

* chore: removed unnecessary comment

* chore: reverting to the right test_modeling_tf_common.py.

* Styling nits

Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>

* [UniSpeechSat] correct unispeech sat (#15847)

* Flax Speech-Encoder-Decoder Model (#15613)

* rebase

* Delete shift tokens func

* downsample decoder input seq len for init

* correct attention mask

* add tests

* pt flax cross test

* make fixup

* init file for import

* change pt-flax cross test threshold

* pt-flax test logits only

* move tests

* make repo-consistency

* consistent indentation

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix (deprecated) ONNX exporter to account for new tf2onnx API (#15856)

* Fix (deprecated) ONNX exporter to account for new tf2onnx API

* Fixing the timestamps with chunking. (#15843)

* Fixing the timestamps with chunking.

* The changes modified (and fixed) the striding tests.

* Adding a tokenizer test.

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Defense -> comment.

* Update src/transformers/models/wav2vec2/tokenization_wav2vec2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* [TF-PT-Tests] Fix PyTorch - TF tests for different GPU devices (#15846)

* Add Data2Vec (#15507)

* Add data2vec model cloned from roberta

* Add checkpoint conversion script

* Fix copies

* Update docs

* Add checkpoint conversion script

* Remove fairseq data2vec_text script and fix format

* Add comment on where to get data2vec_text.py

* Remove mock implementation cheat.py and fix style

* Fix copies

* Remove TF and Flax classes from init

* Add back copy from fairseq data2vec_text.py and fix style

* Update model name in docs/source/index.mdx to be CamelCase

* Revert model name in table to lower-case to get check_table test to pass

* Update src/transformers/models/data2vec/__init__.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/model_doc/data2vec.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/data2vec.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update documentation

* Copy-paste Data2VecConfig from BertConfig

* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency

* Update config special tokens to match RoBERTa

* Split multiple assertions and add individual error messages

* Rename Data2VecModel to Data2VecForTextModel

* Add Data2Vec to _toctree.yml

* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings

* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).

* finish audio model

* finish audio file

* Update names and fix style, quality and repo consistency

* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initialization in positional conv layers. Move back configurations for audio and text to separate files.

* add inputs to logits to data2vec

* correct audio models

* correct config auto

* correct tok auto

* Update utils/tests_fetcher.py

* delete unnecessary files

* delete unnecessary files

* further renaming

* make all tests pass

* finish

* remove useless test file

* Update tests/test_modeling_common.py

* Update utils/check_repo.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec_text.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix copies

* Update docs

* Remove fairseq data2vec_text script and fix format

* Add comment on where to get data2vec_text.py

* Remove mock implementation cheat.py and fix style

* Fix copies

* Remove TF and Flax classes from init

* Add back copy from fairseq data2vec_text.py and fix style

* Update model name in docs/source/index.mdx to be CamelCase

* Revert model name in table to lower-case to get check_table test to pass

* Update documentation

* Update src/transformers/models/data2vec/__init__.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Copy-paste Data2VecConfig from BertConfig

* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency

* Update config special tokens to match RoBERTa

* Split multiple assertions and add individual error messages

* Rename Data2VecModel to Data2VecForTextModel

* Add Data2Vec to _toctree.yml

* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings

* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).

* finish audio model

* finish audio file

* add inputs to logits to data2vec

* Update names and fix style, quality and repo consistency

* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initialization in positional conv layers. Move back configurations for audio and text to separate files.

* correct audio models

* correct config auto

* correct tok auto

* delete unnecessary files

* delete unnecessary files

* Update utils/tests_fetcher.py

* further renaming

* make all tests pass

* finish

* remove useless test file

* Update tests/test_modeling_common.py

* Update utils/check_repo.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec_text.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Move data2vec tests to new structure

* Fix test imports for text tests

* Remove fairseq files

* Change paper link to arxiv

* Modify Data2Vec documentation to reflect that the encoder is not shared across the audio and text models in the current implementation.

* Update text model checkpoint to be facebook/data2vec-text-base

* Add 'Copy from' statements and update paper links and docs

* fix copy from statements

* improve copied from

* correct more copied from statements

* finish copied from stuff

* make style

* add model to README

* add to master

Co-authored-by: Eduardo Gonzalez Ponferrada <eduardo@ferrumhealth.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [Benchmark tools] Deprecate all (#15848)

* [Benchmark tools] Deprecate all

* up

* Add PT + TF automatic builds (#15860)

* Add PT + TF automatic builds

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Wrap up

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update TF LM examples (#15855)

* Add time stamps for wav2vec2 with lm (#15854)

* [Wav2Vec2 With LM] add timestamps

* correct

* correct

* Apply suggestions from code review

* correct

* Update src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py

* make style

* Update src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* make style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add link to notebooks (#15791)

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

* Scatter should run on CUDA (#15872)

* [vision] Add problem_type support (#15851)

* Add problem_type to missing models

* Fix deit test

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

* use python 3.7 for flax self-push tests (#15865)

* set python 3.7 for flax tests

* setup-python@v2

* python-dev

* install -y

* python3-dev

* install kenlm from source

* install cython

* cd to kenlm

* kenlm install

* don't install kenlm

* change flax pretrained to run flax tests

* cleanup

* remove python-dev

* Bump up doc node version to 16 (#15874)

* No self-hosted runner for dev documentation (#15710)

* Inference for multilingual models (#15836)

* 📝 first draft for multilingual models

* 🖍 make style

* fix deepspeed tests (#15881)

* fix deepspeed tests

* style

* more fixes

* Remove stash for now (#15882)

* M2M100 support for ONNX export (#15193)

* Add M2M100 support for ONNX export

* Delete useless imports

* Add M2M100 to tests

* Fix protobuf issue

* [Bart] Fix implementation note doc (#15879)

* Add TF generate sample tests with all logit processors (#15852)

* Add GPT2 TF generate sample test with all logits processor

* Add T5 generate sample test

* Adding timestamps for CTC with LM in ASR pipeline. (#15863)

* Adding timestamps for CTC with LM in ASR pipeline.

* Remove print.

* Nit change.

* Update TF QA example (#15870)

* Updates in Trainer to support new features in SM Model Parallel library (#15877)

* Create optimizer after model creation for SMP

* update dp_rank to rdp_rank for opt_state_dict

* update world_size and process_index for smp

* Address comments

* Lint fix

Co-authored-by: Cavdar <dcavdar@a07817b12d7e.ant.amazon.com>

* Fix tiny typo (#15884)

* Maskformer (#15682)

* maskformer

* conflicts

* conflicts

* minor fixes

* feature extractor test fix

refactor MaskFormerLoss following conversation

MaskFormer related types should not trigger a module time import error

missed one

removed all the types that are not used

update config mapping

minor updates in the doc

resolved conversation that doesn't need a discussion

minor changes

resolved conversations

fixed DetrDecoder

* minor changes

minor changes

fixed mdx file

test feature_extractor return types

functional losses -> classes

removed the return type test for the feature extractor

minor changes + style + quality

* conflicts?

* rebase master

* readme

* added missing files

* deleted poolformers tests that were in the wrong place

* CI

* minor changes

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* resolved conversations

* minor changes

* conversations

[Unispeech] Fix slow tests (#15818)

* remove soundfile old way of loading audio

* Adapt slow test

[Barthez Tokenizer] Fix saving (#15815)

[TFXLNet] Correct tf xlnet generate (#15822)

* [TFXLNet] Correct tf xlnet

* adapt test comment

Fix the push run (#15807)

Fix semantic segmentation pipeline test (#15826)

Fix dummy_inputs() to dummy_inputs in symbolic_trace doc (#15776)

Add model specific output classes to PoolFormer model docs (#15746)

* Added model specific output classes to poolformer docs

* Fixed Segformer typo in Poolformer docs

Adding the option to return_timestamps on pure CTC ASR models. (#15792)

* Adding the option to return_timestamps on pure CTC ASR models.

* Remove `math.prod` which was introduced in Python 3.8

* int are not floats.

* Reworking the PR to support "char" vs "word" output.

* Fixup!

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Quality.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

HFTracer.trace should use/return self.graph to be compatible with torch.fx.Tracer (#15824)

Fix tf.concatenate + test past_key_values for TF models (#15774)

* fix wrong method name tf.concatenate

* add tests related to causal LM / decoder

* make style and quality

* clean-up

* Fix TFBertModel's extended_attention_mask when past_key_values is provided

* Fix tests

* fix copies

* More tf.int8 -> tf.int32 in TF test template

* clean-up

* Update TF test template

* revert the previous commit + update the TF test template

* Fix TF template extended_attention_mask when past_key_values is provided

* Fix some styles manually

* clean-up

* Fix ValueError: too many values to unpack in the test

* Fix more: too many values to unpack in the test

* Add a comment for extended_attention_mask when there is past_key_values

* Fix TFElectra extended_attention_mask when past_key_values is provided

* Add tests to other TF models

* Fix for TF Electra test: add prepare_config_and_inputs_for_decoder

* Fix not passing training arg to lm_head in TFRobertaForCausalLM

* Fix tests (with past) for TF Roberta

* add testing for past_key_values for TFElectra model

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

[examples/summarization and translation] fix readme (#15833)

Add ONNX Runtime quantization for text classification notebook (#15817)

Re-enable doctests for the quicktour (#15828)

* Re-enable doctests for the quicktour

* Re-enable doctests for task_summary (#15830)

* Remove &

Framework split model report (#15825)

Add TFConvNextModel (#15750)

* feat: initial implementation of convnext in tensorflow.

* fix: sample code for the classification model.

* chore: added checked for  from the classification model.

* chore: set bias initializer in the classification head.

* chore: updated license terms.

* chore: removed unused imports

* feat: enabled  argument during using drop_path.

* chore: replaced tf.identity with layers.Activation(linear).

* chore: edited default checkpoint.

* fix: minor bugs in the initializations.

* partial-fix: tf model errors for loading pretrained pt weights.

* partial-fix: call method updated

* partial-fix: cross loading of weights (4x3 variables to be matched)

* chore: removed unneeded comment.

* removed playground.py

* rebasing

* rebasing and removing playground.py.

* fix: renaming TFConvNextStage conv and layer norm layers

* chore: added initializers and other minor additions.

* chore: added initializers and other minor additions.

* add: tests for convnext.

* fix: integration tester class.

* fix: issues mentioned in pr feedback (round 1).

* fix: how output_hidden_states arg is propagated inside the network.

* feat: handling of  arg for pure cnn models.

* chore: added a note on equal contribution in model docs.

* rebasing

* rebasing and removing playground.py.

* feat: encapsulation for the convnext trunk.

* Fix variable naming; Test-related corrections; Run make fixup

* chore: added Joao as a contributor to convnext.

* rebasing

* rebasing and removing playground.py.

* rebasing

* rebasing and removing playground.py.

* chore: corrected copyright year and added comment on NHWC.

* chore: fixed the black version and ran formatting.

* chore: ran make style.

* chore: removed from_pt argument from test, ran make style.

* rebasing

* rebasing and removing playground.py.

* rebasing

* rebasing and removing playground.py.

* fix: tests in the convnext subclass, ran make style.

* rebasing

* rebasing and removing playground.py.

* rebasing

* rebasing and removing playground.py.

* chore: moved convnext test to the correct location

* fix: locations for the test file of convnext.

* fix: convnext tests.

* chore: applied  sgugger's suggestion for dealing w/ output_attentions.

* chore: added comments.

* chore: applied updated quality environment style.

* chore: applied formatting with quality environment.

* chore: revert to the previous tests/test_modeling_common.py.

* chore: revert to the original test_modeling_common.py

* chore: revert to previous states for test_modeling_tf_common.py and modeling_tf_utils.py

* fix: tests for convnext.

* chore: removed output_attentions argument from convnext config.

* chore: revert to the earlier tf utils.

* fix: output shapes of the hidden states

* chore: removed unnecessary comment

* chore: reverting to the right test_modeling_tf_common.py.

* Styling nits

Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>

* minor changes

* doc fix in feature extractor

* doc

* typos

* removed detr logic from config

* removed detr logic from config

* removed num_labels

* small fix in the config

* auxilary -> auxiliary

* make style

* some test is failing

* fix a weird char in config preventing doc-builder

* retry to fix the doc-builder issue

* make style

* new try to fix the doc builder

* CI

* change weights to facebook

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>

* Fix Bug in FlaxWav2Vec2 Slow Test (#15887)

* [SegFormer] Add deprecation warning (#15889)

* Add deprecation warning

* Remove from docs and hide in kwargs

* Improve implementation

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

* TF generate refactor - Sample (#15793)

* Add TF logits wrappers 

* Add sample method

* add tests for TF logit wrappers

* TF generate sample tests now run on CPU

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* [XGLM] run sampling test on CPU to be deterministic (#15892)

* run sampling test on CPU to be deterministic

* input_ids on CPU

* Fix SegformerForImageClassification (#15895)

* Fix reshape

* Apply suggestion from code review

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

* Update delete-dev-doc job to match build-dev-doc (#15891)

* Update delete-dev-doc job to match build-dev-doc

* More debug info

* More debug info

* Stash if needed

* Remove the comment update

* Fix paths

* Wtf is going on..

* Fix git status test

* Try another way

* I don't understand what's happening

* Bash shell

* What's happening now...

* What's happening now...

* Try like this

* Back to trying to use bash

* And like that?

* Refine tests

* Stash after adding new files

* Stash after adding new files

* Proper commit sha and PR number

* Address review comments

* Fix doc links in release utils (#15903)

* Fix a TF Vision Encoder Decoder test (#15896)

* send PyTorch inputs to the correct device

* Fix: TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [Fix link in pipeline doc] (#15906)

* Fix and improve REALM fine-tuning (#15297)

* Draft

* Add test

* Update src/transformers/models/realm/modeling_realm.py

* Apply suggestion

* Add block_mask

* Update

* Update

* Add block_embedding_to

* Remove no_grad

* Use AutoTokenizer

* Remove model.to overriding

* Freeze FlaxWav2Vec2 Feature Encoder (#15873)

* Freeze FlaxWav2Vec2 Feature Encoder

* add to all module apply

* add backprop test

* The tests were not updated after the addition of `torch.diag` (#15890)

in the scoring (which is more correct)

* [Doctests] Fix ignore bug and add more doc tests (#15911)

* finish speech doc tests

* finish

* boom

* Update src/transformers/models/speech_to_text/modeling_speech_to_text.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* v4.18.0.dev.0

* Enabling MaskFormer in pipelines (#15917)

* Enabling MaskFormer in pipelines

No AutoModel though :(

* Ooops local file.

* Mark slow tests as slow

* fix for the output from post_process_panoptic_segmentation (#15916)

* Add vision models to doc tests (#15905)

* Add vision models to doc tests

* Apply suggestions from code review

* Add more models

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

* Fix #15898 (#15928)

* Update README.md (#15926)

* Re-enabling all fast pipeline tests. (#15924)

* Update README.md

* Support CLIPTokenizerFast for CLIPProcessor (#15913)

* Fix to support fast tokenizer with `CLIPProcessor`

* Update CLIPProcessor test for fast tokenizer

* Fix Docstring Style

* Rename into meaningful Variable name in test code

* Updating the slow tests: (#15893)

Linked to https://github.com/huggingface/transformers/pull/15826

* Making MaskFormerForInstanceSegmentation. (#15934)

Small adjustments.

Adding in type hint.

Last fix ?

Only include the default dict thing, not the pipelines.

* Add missing support for Flax XLM-RoBERTa (#15900)

* Adding Flax XLM-RoBERTa

* Add Flax to __init__

* Adding doc and dummy objects

* Add tests

* Add Flax XLM-R models autodoc

* Fix tests

* Add Flax XLM-RoBERTa to TEST_FILES_WITH_NO_COMMON_TESTS

* Update src/transformers/models/xlm_roberta/modeling_flax_xlm_roberta.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update tests/xlm_roberta/test_modeling_flax_xlm_roberta.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update tests/xlm_roberta/test_modeling_flax_xlm_roberta.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Remove test on large Flax XLM-RoBERTa

* Add tokenizer to the test

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* [FlaxT5 Example] fix flax t5 example pretraining (#15835)

* Do not change the output from tuple to list - to match PT's version (#15918)

* Do not change the output from tuple to list - to match PT's version

* Fix the same issues for 5 other models and the template

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Tests for MaskFormerFeatureExtractor's post_process*** methods (#15929)

* proper tests for post_process*** methods in feature extractor

* mask th == 0

* Update tests/maskformer/test_feature_extraction_maskformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Constrained Beam Search [*With* Disjunctive Decoding] (#15761)

* added classes to get started with constrained beam search

* in progress, think i can directly force tokens now but not yet with the round robin

* think now i have total control, now need to code the bank selection

* technically works as desired, need to optimize and fix design choices leading to undesirable outputs

* complete PR #1 without disjunctive decoding

* removed incorrect tests

* Delete k.txt

* Delete test.py

* Delete test.sh

* revert changes to test scripts

* genutils

* full implementation with testing, no disjunctive yet

* shifted docs

* passing all tests realistically ran locally

* removing accidentally included print statements

* fixed source of error in initial PR test

* fixing the get_device() vs device trap

* fixed documentation docstrings about constrained_beam_search

* fixed tests failing for Speech2TextModel's floating point inputs

* fix cuda long tensor

* added examples and testing for them and found & fixed a bug in beam_search and constrained_beam_search

* deleted accidentally added test halting code with assert False

* code reformat

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

* fixing based on comments on PR

* took out the testing code that should work but fails without the beam search modification; style changes

* fixing comments issues

* docstrings for ConstraintListState

* typo in PhrasalConstraint docstring

* docstrings improvements

* finished adding what is sort of an opinionated implementation of disjunctive generation, but it revealed errors in inner beam search logic during testing.

* fixed bug found in constrained beam search that used beam_idx that were not global across all the batches

* disjunctive constraint working 100% correctly

* passing all tests

* Accidentally included mlruns

* Update src/transformers/generation_beam_constraints.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/generation_beam_constraints.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* complete overhaul of type complexities and other nits

* strict type checks in generate()

* fixing second round of feedback by narsil

* fixed failing generation test because of type check overhaul

* generation test fail fix

* fixing test fails

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Move dependency to call method (#15941)

* made MaskFormerModelTest faster (#15942)

* [Bug Fix] Beam search example in docs fails & a fix (integrating `max_length` in `BeamScorer.finalize()`) (#15555)

* added the test and fix
…
@gante gante mentioned this pull request Mar 25, 2022
7 tasks

8 participants