LLaMA Implementation by zphang · Pull Request #21955 · huggingface/transformers · GitHub

@zphang
Contributor

@zphang zphang commented Mar 5, 2023

What does this PR do?

Implementation of LLaMA models (https://arxiv.org/abs/2302.13971). Model weights can be requested here. Weight conversion script is included.

Weights conversion can be run via:

python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights \
    --model_size 7B \
    --output_dir /output/path

Models can then be loaded via:

import transformers

tokenizer = transformers.LLaMATokenizer.from_pretrained("/output/path/tokenizer/")
model = transformers.LLaMAForCausalLM.from_pretrained("/output/path/llama-7b/")

Example:

batch = tokenizer(
    "The primary use of LLaMA is research on large language models, including",
    return_tensors="pt", 
    add_special_tokens=False
)
batch = {k: v.cuda() for k, v in batch.items()}
generated = model.generate(batch["input_ids"], max_length=100)
print(tokenizer.decode(generated[0]))

Fixes #21796

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker @younesbelkada

@ewof

ewof commented Mar 5, 2023

does this work with int8?

@zphang
Contributor Author

zphang commented Mar 5, 2023

does this work with int8?

No idea! I haven't messed with int8 too much myself. It ought to be compatible with whatever is already supported in the HF models.

@YellowRoseCx

nice work! thanks for the upload and I hope it gets pulled

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Mar 5, 2023

The documentation is not available anymore as the PR was closed or merged.

@zphang
Contributor Author

zphang commented Mar 5, 2023

It looks like the tests which are currently failing are unrelated to the LLaMA code, so this should be good to review/use.

If folks can try it out (particularly with the larger, sharded models) and see if there are any issues, that will be helpful!

@USBhost

USBhost commented Mar 5, 2023

It looks like the tests which are currently failing are unrelated to the LLaMA code, so this should be good to review/use.

If folks can try it out (particularly with the larger, sharded models) and see if there are any issues, that will be helpful!

At least the conversion script seems to work fine. I was able to convert 7B through 30B. I do not have enough RAM to convert 65B.

@deepdiffuser

Great work. thanks for putting this together

@USBhost

USBhost commented Mar 5, 2023

After replacing the transformers in Kobold with this PR, I am able to load the shards as expected. I just can't generate anything yet because Kobold still needs some changes.

@zsc

zsc commented Mar 5, 2023

does this work with int8?

No idea! I haven't messed with int8 too much myself. It ought to be compatible with whatever is already supported in the HF models.

Int8 does not seem to work, but float16 is fine in my hastily put-together test at https://github.com/zsc/llama_infer. Please leave a comment if you find something!

@zsc

zsc commented Mar 5, 2023

@zphang I'm not able to get something like tokenizer = AutoTokenizer.from_pretrained("/data/llama/hf/7b/tokenizer/") to work. Is this intentional or just leaving AutoTokenizer for future work?

@zphang
Contributor Author

zphang commented Mar 5, 2023

@zphang I'm not able to get something like tokenizer = AutoTokenizer.from_pretrained("/data/llama/hf/7b/tokenizer/") to work. Is this intentional or just leaving AutoTokenizer for future work?

What issue are you having / what is the error?

@oobabooga
Contributor

I have tested the code and these are my findings:

  1. The conversion script works.
  2. Loading the model works.
  3. Loading the tokenizer with transformers.LLaMATokenizer.from_pretrained works.
  4. Loading the tokenizer with AutoTokenizer.from_pretrained does not work and generates this error:
OSError: /tmp/converted/tokenizer/ does not appear to have a file named config.json. Checkout
'https://huggingface.co//tmp/converted/tokenizer//None' for available files.

  5. The generated text seems to be incoherent. If I try these default values for the generation parameters:
model.generate(input_ids, eos_token_id=2, do_sample=True, temperature=1, top_p=1, typical_p=1, repetition_penalty=1, top_k=50, min_length=0, no_repeat_ngram_size=0, num_beams=1, penalty_alpha=0, length_penalty=1, early_stopping=False, max_new_tokens=200).cuda()

with this prompt:

Common sense questions and answers

Question: What color is the sky?
Factual answer:

I get

Common sense questions and answers

Question: What color is the sky?
Factual answer: Tags: python, django, django-models

Question: Using Django with multiple databases

I am attempting to use django with multiple databases, and I have the following code:

\begin{code}
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': ':memory:',
    },
    'db_one': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'db_one',
    },
    'db_two': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'db_two',
    },
}

It seems to me that prompts are being completely ignored.

  6. Loading in 8-bit mode with load_in_8bit=True works.

@zsc

zsc commented Mar 5, 2023

This is OK: tokenizer = transformers.LLaMATokenizer.from_pretrained("/data/llama/hf/7b/tokenizer/")

If using tokenizer = AutoTokenizer.from_pretrained("/data/llama/hf/7b/tokenizer/"), it complains that there is no "config.json":

OSError: /data/llama/hf/7b/tokenizer/ does not appear to have a file named config.json. Checkout 
'https://huggingface.co//data/llama/hf/7b/tokenizer//None' for available files.

I then hacked around it by symlinking /data/llama/hf/7b/tokenizer/special_tokens_map.json to /data/llama/hf/7b/tokenizer/config.json, and it works. So maybe just rename the file?

Anyway, can now happily play with LLaMA in Hugging Face world and thanks for the great work!

@zphang
Contributor Author

zphang commented Mar 5, 2023

Thanks for the comments. Looks like the saved tokenizer doesn't work for AutoTokenizer but works if you directly instantiate from LLaMATokenizer. Maybe one of the HF folks can chime in on the best way to address that.

The generated text seems to be incoherent. If I try these default values for the generation parameters:

Can you check the input_ids you're using to generate? The tokenizer currently adds both BOS and EOS tokens by default, and an EOS might cause the model to ignore your prompt.

Perhaps I can set EOS to not be added by default so it operates closer to expected behavior.

@oobabooga
Contributor

For this prompt:

'Common sense questions and answers\n\nQuestion: What color is the sky?\nFactual answer:'

these are the input_ids:

tensor([[    1, 13103,  4060,  5155,   322,  6089,    13,    13, 16492, 29901,
          1724,  2927,   338,   278, 14744, 29973,    13, 29943, 19304,  1234,
         29901,     2]], device='cuda:0')

I do not know how to interpret these numbers, but if there is an EOS token in that tensor and that token is causing the text generation to derail, changing that default would be valuable.

@zphang
Contributor Author

zphang commented Mar 5, 2023

1 is BOS and 2 is EOS. Can you try without the last input id?

I also added an example in my PR message.

@oobabooga
Contributor

I confirm that doing this

    input_ids = input_ids[:, :-1]

to remove the last input id before calling model.generate(...) causes the text generation to become coherent:

Common sense questions and answers

Question: What color is the sky?
Factual answer: The sky is blue. The sky is blue, and it is a fact that it is blue. The sky is indisputably blue.
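
The workaround above drops the last id unconditionally. A slightly safer variant, sketched here on plain Python lists rather than tensors, removes the last id only when it is the EOS token. The helper name `strip_trailing_eos` is hypothetical and not part of transformers; the ids 1 (BOS) and 2 (EOS) and the prompt ids are taken from the comments above.

```python
# Token ids reported in this thread: 1 is BOS, 2 is EOS.
BOS_TOKEN_ID = 1
EOS_TOKEN_ID = 2


def strip_trailing_eos(input_ids, eos_token_id=EOS_TOKEN_ID):
    """Return a copy of input_ids without a trailing EOS token, if one is present."""
    if input_ids and input_ids[-1] == eos_token_id:
        return input_ids[:-1]
    return list(input_ids)


# The input_ids posted above for the "What color is the sky?" prompt.
prompt_ids = [
    1, 13103, 4060, 5155, 322, 6089, 13, 13, 16492, 29901,
    1724, 2927, 338, 278, 14744, 29973, 13, 29943, 19304, 1234,
    29901, 2,
]

cleaned = strip_trailing_eos(prompt_ids)
print(len(prompt_ids), len(cleaned))  # one id shorter once the EOS is removed
```

With a 2-D tensor of shape (1, sequence_length), the same check would compare `input_ids[0, -1]` with the EOS id before slicing with `input_ids[:, :-1]`.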

@wdyxwzyh

wdyxwzyh commented Mar 17, 2023

@longzhang418

Hello, I have a question about training llama. Does the current training code not support DeepSpeed's ZeRO strategy? I added the DeepSpeed configuration file, but it doesn't seem to take effect. Looking forward to your reply, thank you~

Hi, we successfully used accelerate with the ds_3_zero3_offload policy to train llama-13B on 4 A100 (80GB) GPUs (batch size=16, length=256 works for us). We first converted the 13B weights into the HF weights format; our accelerate config is below:

compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: 'cpu'
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'fp16'
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
use_cpu: false

FSDP should work too, as in the implementation in the Stanford repo.
Notably, our tests of 13B with ds0/ds1/ds2 on one A100 all failed because of insufficient CUDA memory.

Finally, our tests of some model pipeline-parallel methods, e.g. fairscale pp, failed; that is the next thing we need to work on.
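
A back-of-the-envelope estimate suggests why the single-A100 runs ran out of memory. The sketch below assumes mixed-precision Adam at roughly 16 bytes per parameter (fp16 weights and gradients, plus fp32 master weights, momentum, and variance, which is the accounting used in the ZeRO paper), a nominal 13e9 parameters, and no CPU offload; it ignores activations and temporary buffers. `zero_memory_gb` is a hypothetical helper, not part of DeepSpeed or accelerate.

```python
def zero_memory_gb(num_params, num_gpus, stage):
    """Approximate per-GPU model-state memory (in GB) for a given ZeRO stage.

    Assumes mixed-precision Adam: 2 bytes/param for fp16 weights, 2 for fp16
    gradients, and 12 for fp32 optimizer states. Activations are not counted.
    """
    weights = 2 * num_params
    grads = 2 * num_params
    optimizer_states = 12 * num_params

    if stage >= 1:  # stage 1 partitions optimizer states across GPUs
        optimizer_states /= num_gpus
    if stage >= 2:  # stage 2 also partitions gradients
        grads /= num_gpus
    if stage >= 3:  # stage 3 also partitions the weights themselves
        weights /= num_gpus

    return (weights + grads + optimizer_states) / 1e9


NUM_PARAMS = 13e9  # nominal size of the 13B model (assumption)

for stage in (0, 1, 2):
    print("1 GPU, stage", stage, zero_memory_gb(NUM_PARAMS, 1, stage), "GB")
print("4 GPUs, stage 3", zero_memory_gb(NUM_PARAMS, 4, 3), "GB")
```

With a single GPU there is nothing to partition across, so under these assumptions every stage needs the full model state on one device, well above 80GB. Stage 3 over four GPUs brings the estimate under 80GB per device before any optimizer offload to CPU.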

@haorannlp

haorannlp commented Mar 19, 2023

What's the motivation for setting the three special tokens tokenizer.pad_token, tokenizer.bos_token, tokenizer.eos_token = '' when converting the llama tokenizer?

TheTerrasque pushed a commit to TheTerrasque/text-generation-webui that referenced this pull request Mar 19, 2023
commit 0cbe2dd
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Mar 18 12:24:54 2023 -0300

    Update README.md

commit 36ac7be
Merge: d2a7fac 705f513
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Mar 18 11:57:10 2023 -0300

    Merge pull request oobabooga#407 from ThisIsPIRI/gitignore

    Add loras to .gitignore

commit d2a7fac
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Mar 18 11:56:04 2023 -0300

    Use pip instead of conda for pytorch

commit 705f513
Author: ThisIsPIRI <thisispiri@gmail.com>
Date:   Sat Mar 18 23:33:24 2023 +0900

    Add loras to .gitignore

commit a0b1a30
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Mar 18 11:23:56 2023 -0300

    Specify torchvision/torchaudio versions

commit c753261
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Mar 18 10:55:57 2023 -0300

    Disable stop_at_newline by default

commit 7c945cf
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Mar 18 10:55:24 2023 -0300

    Don't include PeftModel every time

commit 86b9900
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Mar 18 10:27:52 2023 -0300

    Remove rwkv dependency

commit a163807
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Mar 18 03:07:27 2023 -0300

    Update README.md

commit a7acfa4
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 22:57:46 2023 -0300

    Update README.md

commit bcd8afd
Merge: dc35861 e26763a
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 22:57:28 2023 -0300

    Merge pull request oobabooga#393 from WojtekKowaluk/mps_support

    Fix for MPS support on Apple Silicon

commit e26763a
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 22:56:46 2023 -0300

    Minor changes

commit 7994b58
Author: Wojtek Kowaluk <wojtek@Wojteks-MacBook-Pro.local>
Date:   Sat Mar 18 02:27:26 2023 +0100

    clean up duplicated code

commit dc35861
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 21:05:17 2023 -0300

    Update README.md

commit 30939e2
Author: Wojtek Kowaluk <wojtek@Wojteks-MacBook-Pro.local>
Date:   Sat Mar 18 00:56:23 2023 +0100

    add mps support on apple silicon

commit 7d97da1
Author: Wojtek Kowaluk <wojtek@Wojteks-MacBook-Pro.local>
Date:   Sat Mar 18 00:17:05 2023 +0100

    add venv paths to gitignore

commit f2a5ca7
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 20:50:27 2023 -0300

    Update README.md

commit 8c8286b
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 20:49:40 2023 -0300

    Update README.md

commit 0c05e65
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 20:25:42 2023 -0300

    Update README.md

commit adc2003
Merge: 20f5b45 66e8d12
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 20:19:33 2023 -0300

    Merge branch 'main' of github.com:oobabooga/text-generation-webui

commit 20f5b45
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 20:19:04 2023 -0300

    Add parameters reference oobabooga#386 oobabooga#331

commit 66e8d12
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 19:59:37 2023 -0300

    Update README.md

commit 9a87111
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 19:52:22 2023 -0300

    Update README.md

commit d4f38b6
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 18:57:48 2023 -0300

    Update README.md

commit ad7c829
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 18:55:01 2023 -0300

    Update README.md

commit 4426f94
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 18:51:07 2023 -0300

    Update the installation instructions. Tldr use WSL

commit 9256e93
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 17:45:28 2023 -0300

    Add some LoRA params

commit 9ed2c45
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 16:06:11 2023 -0300

    Use markdown in the "HTML" tab

commit f0b2645
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 13:07:17 2023 -0300

    Add a comment

commit 7da742e
Merge: ebef4a5 02e1113
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 12:37:23 2023 -0300

    Merge pull request oobabooga#207 from EliasVincent/stt-extension

    Extension: Whisper Speech-To-Text Input

commit ebef4a5
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 11:58:45 2023 -0300

    Update README

commit cdfa787
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 11:53:28 2023 -0300

    Update README

commit 3bda907
Merge: 4c13067 614dad0
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 11:48:48 2023 -0300

    Merge pull request oobabooga#366 from oobabooga/lora

    Add LoRA support

commit 614dad0
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 11:43:11 2023 -0300

    Remove unused import

commit a717fd7
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 11:42:25 2023 -0300

    Sort the imports

commit 7d97287
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 11:41:12 2023 -0300

    Update settings-template.json

commit 29fe7b1
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 11:39:48 2023 -0300

    Remove LoRA tab, move it into the Parameters menu

commit 214dc68
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 11:24:52 2023 -0300

    Several QoL changes related to LoRA

commit 4c13067
Merge: ee164d1 53b6a66
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Mar 17 09:47:57 2023 -0300

    Merge pull request oobabooga#377 from askmyteapot/Fix-Multi-gpu-GPTQ-Llama-no-tokens

    Update GPTQ_Loader.py

commit 53b6a66
Author: askmyteapot <62238146+askmyteapot@users.noreply.github.com>
Date:   Fri Mar 17 18:34:13 2023 +1000

    Update GPTQ_Loader.py

    Correcting decoder layer for renamed class.

commit 0cecfc6
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Mar 16 21:35:53 2023 -0300

    Add files

commit 104293f
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Mar 16 21:31:39 2023 -0300

    Add LoRA support

commit ee164d1
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Mar 16 18:22:16 2023 -0300

    Don't split the layers in 8-bit mode by default

commit 0a2aa79
Merge: dd1c596 e085cb4
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Mar 16 17:27:03 2023 -0300

    Merge pull request oobabooga#358 from mayaeary/8bit-offload

    Add support for memory maps with --load-in-8bit

commit e085cb4
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Mar 16 13:34:23 2023 -0300

    Small changes

commit dd1c596
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Mar 16 12:45:27 2023 -0300

    Update README

commit 38d7017
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Mar 16 12:44:03 2023 -0300

    Add all command-line flags to "Interface mode"

commit 83cb20a
Author: awoo <awoo@awoo>
Date:   Thu Mar 16 18:42:53 2023 +0300

    Add support for --gpu-memory witn --load-in-8bit

commit 23a5e88
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Mar 16 11:16:17 2023 -0300

    The LLaMA PR has been merged into transformers

    huggingface/transformers#21955

    The tokenizer class has been changed from

    "LLaMATokenizer"

    to

    "LlamaTokenizer"

    It is necessary to edit this change in every tokenizer_config.json
    that you had for LLaMA so far.
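
The rename described in the commit message above can be applied in bulk. The following is a minimal sketch, assuming the class name is stored under the "tokenizer_class" key of each tokenizer_config.json and that the files are plain UTF-8 JSON; `rename_tokenizer_class` is a hypothetical helper, not part of transformers or text-generation-webui.

```python
import json
from pathlib import Path

OLD_NAME = "LLaMATokenizer"
NEW_NAME = "LlamaTokenizer"


def rename_tokenizer_class(root):
    """Update every tokenizer_config.json under root; return the paths that changed."""
    changed = []
    for path in sorted(Path(root).rglob("tokenizer_config.json")):
        config = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: the class name lives under the "tokenizer_class" key.
        if config.get("tokenizer_class") == OLD_NAME:
            config["tokenizer_class"] = NEW_NAME
            path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `rename_tokenizer_class("models")` on a hypothetical `models` directory would rewrite only the files that still name the old class and leave all other configs untouched.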

commit d54f3f4
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Mar 16 10:19:00 2023 -0300

    Add no-stream checkbox to the interface

commit 1c37896
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Mar 16 10:18:34 2023 -0300

    Remove unused imports

commit a577fb1
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Mar 16 00:46:59 2023 -0300

    Keep GALACTICA special tokens (oobabooga#300)

commit 25a00ea
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 23:43:35 2023 -0300

    Add "Experimental" warning

commit 599d313
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 23:34:08 2023 -0300

    Increase the reload timeout a bit

commit 4d64a57
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 23:29:56 2023 -0300

    Add Interface mode tab

commit b501722
Merge: ffb8986 d3a280e
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 20:46:04 2023 -0300

    Merge branch 'main' of github.com:oobabooga/text-generation-webui

commit ffb8986
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 20:44:34 2023 -0300

    Mini refactor

commit d3a280e
Merge: 445ebf0 0552ab2
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 20:22:08 2023 -0300

    Merge pull request oobabooga#348 from mayaeary/feature/koboldai-api-share

    flask_cloudflared for shared tunnels

commit 445ebf0
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 20:06:46 2023 -0300

    Update README.md

commit 0552ab2
Author: awoo <awoo@awoo>
Date:   Thu Mar 16 02:00:16 2023 +0300

    flask_cloudflared for shared tunnels

commit e9e76bb
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 19:42:29 2023 -0300

    Delete WSL.md

commit 09045e4
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 19:42:06 2023 -0300

    Add WSL guide

commit 9ff5033
Merge: 66256ac 055edc7
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 19:37:26 2023 -0300

    Merge pull request oobabooga#345 from jfryton/main

    Guide for Windows Subsystem for Linux

commit 66256ac
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 19:31:27 2023 -0300

    Make the "no GPU has been detected" message more descriptive

commit 055edc7
Author: jfryton <35437877+jfryton@users.noreply.github.com>
Date:   Wed Mar 15 18:21:14 2023 -0400

    Update WSL.md

commit 89883a3
Author: jfryton <35437877+jfryton@users.noreply.github.com>
Date:   Wed Mar 15 18:20:21 2023 -0400

    Create WSL.md guide for setting up WSL Ubuntu

    Quick start guide for Windows Subsystem for Linux (Ubuntu), including port forwarding to enable local network webui access.

commit 67d6247
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 18:56:26 2023 -0300

    Further reorganize chat UI

commit ab12a17
Merge: 6a1787a 3028112
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 18:31:39 2023 -0300

    Merge pull request oobabooga#342 from mayaeary/koboldai-api

    Extension: KoboldAI api

commit 3028112
Author: awoo <awoo@awoo>
Date:   Wed Mar 15 23:52:46 2023 +0300

    KoboldAI api

commit 6a1787a
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 16:55:40 2023 -0300

    CSS fixes

commit 3047ed8
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 16:41:38 2023 -0300

    CSS fix

commit 87b84d2
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 16:39:59 2023 -0300

    CSS fix

commit c1959c2
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 16:34:31 2023 -0300

    Show/hide the extensions block using javascript

commit 348596f
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 15:11:16 2023 -0300

    Fix broken extensions

commit c5f14fb
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 14:19:28 2023 -0300

    Optimize the HTML generation speed

commit bf812c4
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 14:05:35 2023 -0300

    Minor fix

commit 658849d
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 13:29:00 2023 -0300

    Move a checkbutton

commit 05ee323
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 13:26:32 2023 -0300

    Rename a file

commit 40c9e46
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 13:25:28 2023 -0300

    Add file

commit d30a140
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 13:24:54 2023 -0300

    Further reorganize the UI

commit ffc6cb3
Merge: cf2da86 3b62bd1
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 12:56:21 2023 -0300

    Merge pull request oobabooga#325 from Ph0rk0z/fix-RWKV-Names

    Fix rwkv names

commit cf2da86
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 12:51:13 2023 -0300

    Prevent *Is typing* from disappearing instantly while streaming

commit 4146ac4
Merge: 1413931 29b7c5a
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 12:47:41 2023 -0300

    Merge pull request oobabooga#266 from HideLord/main

    Adding markdown support and slight refactoring.

commit 29b7c5a
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 12:40:03 2023 -0300

    Sort the requirements

commit ec972b8
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 12:33:26 2023 -0300

    Move all css/js into separate files

commit 693b53d
Merge: 63c5a13 1413931
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 12:08:56 2023 -0300

    Merge branch 'main' into HideLord-main

commit 1413931
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 12:01:32 2023 -0300

    Add a header bar and redesign the interface (oobabooga#293)

commit 9d6a625
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Mar 15 11:04:30 2023 -0300

    Add 'hallucinations' filter oobabooga#326

    This breaks the API since a new parameter has been added.
    It should be a one-line fix. See api-example.py.

commit 3b62bd1
Author: Forkoz <59298527+Ph0rk0z@users.noreply.github.com>
Date:   Tue Mar 14 21:23:39 2023 +0000

    Remove PTH extension from RWKV

    When loading the current model was blank unless you typed it out.

commit f0f325e
Author: Forkoz <59298527+Ph0rk0z@users.noreply.github.com>
Date:   Tue Mar 14 21:21:47 2023 +0000

    Remove Json from loading

    no more 20b tokenizer

commit 128d18e
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Tue Mar 14 17:57:25 2023 -0300

    Update README.md

commit 1236c7f
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Tue Mar 14 17:56:15 2023 -0300

    Update README.md

commit b419dff
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Tue Mar 14 17:55:35 2023 -0300

    Update README.md

commit 72d207c
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Tue Mar 14 16:31:27 2023 -0300

    Remove the chat API

    It is not implemented, has not been tested, and this is causing confusion.

commit afc5339
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Tue Mar 14 16:04:17 2023 -0300

    Remove "eval" statements from text generation functions

commit 5c05223
Merge: b327554 87192e2
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Tue Mar 14 08:05:24 2023 -0300

    Merge pull request oobabooga#295 from Zerogoki00/opt4-bit

    Add support for quantized OPT models

commit 87192e2
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Tue Mar 14 08:02:21 2023 -0300

    Update README

commit 265ba38
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Tue Mar 14 07:56:31 2023 -0300

    Rename a file, add deprecation warning for --load-in-4bit

commit 3da73e4
Merge: 518e5c4 b327554
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Tue Mar 14 07:50:36 2023 -0300

    Merge branch 'main' into Zerogoki00-opt4-bit

commit b327554
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Tue Mar 14 00:18:13 2023 -0300

    Update bug_report_template.yml

commit 33b9a15
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 23:03:16 2023 -0300

    Delete config.yml

commit b5e0d3c
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 23:02:25 2023 -0300

    Create config.yml

commit 7f301fd
Merge: d685332 02d4075
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 22:41:21 2023 -0300

    Merge pull request oobabooga#305 from oobabooga/dependabot/pip/accelerate-0.17.1

    Bump accelerate from 0.17.0 to 0.17.1

commit 02d4075
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Tue Mar 14 01:40:42 2023 +0000

    Bump accelerate from 0.17.0 to 0.17.1

    Bumps [accelerate](https://github.com/huggingface/accelerate) from 0.17.0 to 0.17.1.
    - [Release notes](https://github.com/huggingface/accelerate/releases)
    - [Commits](huggingface/accelerate@v0.17.0...v0.17.1)

    ---
    updated-dependencies:
    - dependency-name: accelerate
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit d685332
Merge: 481ef3c df83088
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 22:39:59 2023 -0300

    Merge pull request oobabooga#307 from oobabooga/dependabot/pip/bitsandbytes-0.37.1

    Bump bitsandbytes from 0.37.0 to 0.37.1

commit 481ef3c
Merge: a0ef82c 715c3ec
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 22:39:22 2023 -0300

    Merge pull request oobabooga#304 from oobabooga/dependabot/pip/rwkv-0.4.2

    Bump rwkv from 0.3.1 to 0.4.2

commit df83088
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Tue Mar 14 01:36:18 2023 +0000

    Bump bitsandbytes from 0.37.0 to 0.37.1

    Bumps [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) from 0.37.0 to 0.37.1.
    - [Release notes](https://github.com/TimDettmers/bitsandbytes/releases)
    - [Changelog](https://github.com/TimDettmers/bitsandbytes/blob/main/CHANGELOG.md)
    - [Commits](https://github.com/TimDettmers/bitsandbytes/commits)

    ---
    updated-dependencies:
    - dependency-name: bitsandbytes
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit 715c3ec
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Tue Mar 14 01:36:02 2023 +0000

    Bump rwkv from 0.3.1 to 0.4.2

    Bumps [rwkv](https://github.com/BlinkDL/ChatRWKV) from 0.3.1 to 0.4.2.
    - [Release notes](https://github.com/BlinkDL/ChatRWKV/releases)
    - [Commits](https://github.com/BlinkDL/ChatRWKV/commits)

    ---
    updated-dependencies:
    - dependency-name: rwkv
      dependency-type: direct:production
      update-type: version-update:semver-minor
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit a0ef82c
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 22:35:28 2023 -0300

    Activate dependabot

commit 3fb8196
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 22:28:00 2023 -0300

    Implement "*Is recording a voice message...*" for TTS oobabooga#303

commit 0dab2c5
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 22:18:03 2023 -0300

    Update feature_request.md

commit 79e519c
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 20:03:08 2023 -0300

    Update stale.yml

commit 1571458
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 19:39:21 2023 -0300

    Update stale.yml

commit bad0b0a
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 19:20:18 2023 -0300

    Update stale.yml

commit c805843
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 19:09:06 2023 -0300

    Update stale.yml

commit 60cc7d3
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 18:53:11 2023 -0300

    Update stale.yml

commit 7c17613
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 18:47:31 2023 -0300

    Update and rename .github/workflow/stale.yml to .github/workflows/stale.yml

commit 47c941c
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 18:37:35 2023 -0300

    Create stale.yml

commit 511b136
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 18:29:38 2023 -0300

    Update bug_report_template.yml

commit d6763a6
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 18:27:24 2023 -0300

    Update feature_request.md

commit c6ecb35
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 18:26:28 2023 -0300

    Update feature_request.md

commit 6846427
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 18:19:07 2023 -0300

    Update feature_request.md

commit bcfb7d7
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 18:16:18 2023 -0300

    Update bug_report_template.yml

commit ed30bd3
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 18:14:54 2023 -0300

    Update bug_report_template.yml

commit aee3b53
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 18:14:31 2023 -0300

    Update bug_report_template.yml

commit 7dbc071
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 18:09:58 2023 -0300

    Delete bug_report.md

commit 69d4b81
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 18:09:37 2023 -0300

    Create bug_report_template.yml

commit 0a75584
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 18:07:08 2023 -0300

    Create issue templates

commit 02e1113
Author: EliasVincent <riesyeti@outlook.de>
Date:   Mon Mar 13 21:41:19 2023 +0100

    add auto-transcribe option

commit 518e5c4
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Mar 13 16:45:08 2023 -0300

    Some minor fixes to the GPTQ loader

commit 8778b75
Author: Ayanami Rei <wennadocta@protonmail.com>
Date:   Mon Mar 13 22:11:40 2023 +0300

    use updated load_quantized

commit a6a6522
Author: Ayanami Rei <wennadocta@protonmail.com>
Date:   Mon Mar 13 22:11:32 2023 +0300

    determine model type from model name

commit b6c5c57
Author: Ayanami Rei <wennadocta@protonmail.com>
Date:   Mon Mar 13 22:11:08 2023 +0300

    remove default value from argument

commit 63c5a13
Merge: 683556f 7ab45fb
Author: Alexander Hristov Hristov <polimonom@gmail.com>
Date:   Mon Mar 13 19:50:08 2023 +0200

    Merge branch 'main' into main

commit e1c952c
Author: Ayanami Rei <wennadocta@protonmail.com>
Date:   Mon Mar 13 20:22:38 2023 +0300

    make argument non case-sensitive

commit b746250
Author: Ayanami Rei <wennadocta@protonmail.com>
Date:   Mon Mar 13 20:18:56 2023 +0300

    Update README

commit 3c9afd5
Author: Ayanami Rei <wennadocta@protonmail.com>
Date:   Mon Mar 13 20:14:40 2023 +0300

    rename method

commit 1b99ed6
Author: Ayanami Rei <wennadocta@protonmail.com>
Date:   Mon Mar 13 20:01:34 2023 +0300

    add argument --gptq-model-type and remove duplicate arguments

commit edbc611
Author: Ayanami Rei <wennadocta@protonmail.com>
Date:   Mon Mar 13 20:00:38 2023 +0300

    use new quant loader

commit 345b6de
Author: Ayanami Rei <wennadocta@protonmail.com>
Date:   Mon Mar 13 19:59:57 2023 +0300

    refactor quant models loader and add support of OPT

commit 48aa528
Author: EliasVincent <riesyeti@outlook.de>
Date:   Sun Mar 12 21:03:07 2023 +0100

    use Gradio microphone input instead

commit 683556f
Author: HideLord <polimonom@gmail.com>
Date:   Sun Mar 12 21:34:09 2023 +0200

    Adding markdown support and slight refactoring.

commit 3b41459
Merge: 1c0bda3 3375eae
Author: Elias Vincent Simon <riesyeti@outlook.de>
Date:   Sun Mar 12 19:19:43 2023 +0100

    Merge branch 'oobabooga:main' into stt-extension

commit 1c0bda3
Author: EliasVincent <riesyeti@outlook.de>
Date:   Fri Mar 10 11:47:16 2023 +0100

    added installation instructions

commit a24fa78
Author: EliasVincent <riesyeti@outlook.de>
Date:   Thu Mar 9 21:18:46 2023 +0100

    tweaked Whisper parameters

commit d5efc06
Merge: 00359ba 3341447
Author: Elias Vincent Simon <riesyeti@outlook.de>
Date:   Thu Mar 9 21:05:34 2023 +0100

    Merge branch 'oobabooga:main' into stt-extension

commit 00359ba
Author: EliasVincent <riesyeti@outlook.de>
Date:   Thu Mar 9 21:03:49 2023 +0100

    interactive preview window

commit 7a03d0b
Author: EliasVincent <riesyeti@outlook.de>
Date:   Thu Mar 9 20:33:00 2023 +0100

    cleanup

commit 4c72e43
Author: EliasVincent <riesyeti@outlook.de>
Date:   Thu Mar 9 12:46:50 2023 +0100

    first implementation
oushu1zhangxiangxuan1 added a commit to oushu1zhangxiangxuan1/transformers that referenced this pull request Mar 20, 2023
* Fix 2 quicktour file doctest (#21742)

* Update expect output values - as Hub repo. files are updated

* Update expect output values - as librosa is from 0.9.2 to 0.10.0 on CI docker

* fix

* update one more

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`GPTNeo`] Fix gradient checkpointing bug (#21733)

* fix bug

* forward contrib credits from discussions

* change logic

---------

Co-authored-by: edbeeching <edbeeching@users.noreply.github.com>

* Generate: Fix GIT batched captioning (#21738)

* Skip test_log_level for now

* Added Type Hints for modeling_tf_encoder_decoder.py (#21673)

* Ran Black formatting

* Added imports and reformatted

* Update src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Auto api Value Error addition to Troubleshoot (#21708)

* troubleshooting guide: added an error description for missing auto-mapping

* minor polishing

* changed the example

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/troubleshooting.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [deepspeed tests] fix issues introduced by #21700 (#21769)

* [deepspeed tests] fix issues introduced by #21700

* fix

* fix

* Graphormer fix  (#21699)

* Removed useless check for backend

* fix style check for graphormer

* Reverted change and corrected requires_backend for cython

* code qual

* fix: Change is_last chunk calc and add conditional break in chunk_iter (#21612)

* fix: Change is_last chunk calc and add conditional break

* format fix

* account for 0 and full stride_rights, add comment

* add new test

* make style

* update slow whisper asr test timestamps

* use nested_simplify on output and round timestamp to hundredths place

* [Flax] adding support for batch norm layers (#21581)

* [flax] adding support for batch norm layers

* fixing bugs related to pt+flax integration

* cleanup, batchnorm support in sharded pt to flax

* support for batchnorm tests in pt+flax integration

* simplifying checking batch norm layer

* [Examples] Generalise run audio classification for log-mel models (#21756)

* [Examples] Generalise run audio classification for log-mel models

* batch feature extractor

* make style

* Different behavior in DistilBERT when using "inputs_embeds" (#21752)

* Different behavior in DistilBERT when using "inputs_embeds"
Fixes #21089

* fix failing test

* [Flax] Fix erroneous kwargs being passed to generate config (#21765)

* [Whisper] Add SpecAugment (#21298)

* Return and rescale attention_mask

* Add SpecAugment to Whisper modeling

* Fix test

* Update docstring

* Add SpecAug related parameters to model config

* Add the _mask_input_features function to doc

* Fix quality

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove dev comments

* Add test

* Resolve conflict

* feat: mask {feature, time} prob fast tests

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix-ci-whisper (#21767)

* fix history

* input_features instead of input ids for TFWhisper doctest

* use translate instead of transcribe

* Generate - update cookie cutters to not initialize cache with training and gradient checkpointing (#21759)

* [time series] updated expected values for integration test. (#21762)

* updated expected

* prediction_length fix

* prediction_length default value

* default prediction_length 24

* revert back prediction_length default

* move prediction_length test

* [GPT2, ProphetNet] Fix gradient checkpointing bug (#21772)

* fix gradient checkpointing bug

* fix gradient checkpointing bug

* ran make fix-copies

* fixed bug

* fixed bug

* [SpeechT5] Fix HiFiGAN tests (#21788)

* Fix resume_from_checkpoint for deepspeed (#21735)

* Fix resume_from_checkpoint for deepspeed

Fix resume_from_checkpoint for deepspeed, by ensuring that the deepspeed engine is the one to load the checkpoint.

* Empty commit to trigger CI

* Removed deepspeed skipping 

Removed deepspeed skipping inside the _load_from_checkpoint function, as it is obsolete

* another adjustment

* Trigger CI

* trigger circleci

* style

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>

* [examples/summarization] deal with `max_length` and `num_beams` (#21740)

* Override the decoding parameters of Seq2SeqTrainer

* Fix quality

* Fix max_length parameter

* Fix quality

* Remove redundant parameter max_length

* Separate the preprocess of train and validation to use different max_target_length

* Fix type in gpt2 config docstring (#21782)

Fix docstring gpt2 config

* Fix en documentation typos (#21799)

* fix wrong url

* typos in english documentation

* [FX tracer] Make `concrete_args` from outside available (#21775)

make concrete_args from outside available

* [Pipeline] Add zero shot audio classification pipeline (#21600)

* add pipeline

* update init

* add zero shot to init

* update inits and correct checkpoints

* update base to support input features

* add tests

* Update src/transformers/pipelines/zero_shot_audio_classification.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/pipelines/zero_shot_audio_classification.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* update pipeline code

* use tiny checkpoint

* nits and expected value with tiny model

* style

* last nit on tests values

* fix styling

* fix collate fn that was casting to float

* update

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* [torch] remove deprecated uint8 in favor of bool (#21384)

* uint8 -> bool

* fix copies

* style

* update test modeling common when checking attention buffers

* style

* use logical not on random mask instead of subtraction with 1

* remove torch uint8

* quality

* remove modified modeling utils

* Update based on review

Co-authored-by: sgugger <sylvain.gugger@gmail.com>

---------

Co-authored-by: sgugger <sylvain.gugger@gmail.com>

* [`tests`] add `accelerate` marker (#21743)

* add `accelerate` marker

* add to docs

* Update docs/source/en/testing.mdx

* Fix PyTorch Perceiver `PerceiverFourierPositionEncoding` with fp16 (#21787)

* fix perceiver fp16

* hopefully fix tests

* Fix nn.init.trunc_normal_ call on torch.float16 data (#21789)

fix nn.init.trunc_normal_ call on half data

* Fix gradient checkpointing bug in gptneox (#21815)

* Fix gradient checkpointing bug in gptneox

* Remove use_cache block

* Inheritance-based framework detection (#21784)

* Fix quality with `ruff==0.0.253` (#21828)

fix quality with ruff 0.0.253

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* introduce `logger.warning_once` and use it for grad checkpointing code (#21804)

* logger.warning_once

* style
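
A warn-once logger of the kind this commit names can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the code added in #21804: `warning_once` is written here as a free function cached on the message text, and the logger name, the `ListHandler` class, and the message string are hypothetical.

```python
import functools
import logging

# Hypothetical logger name, used only for this demonstration.
logger = logging.getLogger("warn_once_demo")
logger.propagate = False  # keep the demo records out of the root logger


class ListHandler(logging.Handler):
    """Collects formatted messages so the demo can count them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)


@functools.lru_cache(maxsize=None)
def warning_once(message: str) -> None:
    """Log `message` the first time it is seen; repeated calls hit the cache and do nothing."""
    logger.warning(message)


# A forward pass runs once per training step, so the same warning would
# otherwise be printed on every step.
for _ in range(3):
    warning_once("use_cache=True is incompatible with gradient checkpointing (illustrative text)")

print(len(handler.messages))
```

Caching on the message string means two different messages are each logged once, while the cache also keeps every distinct message alive for the life of the process.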

* Rename `MobileViTModelTest` to `TFMobileViTModelTest` (#21825)

Let's give TF a bit more love ❤️ 🙏

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix gradient checkpointing bug BioGpt (#21844)

Co-authored-by: saswatmeher <saswatmeher@cse.iitb.ac.in>

* check for None forced tokens (#21793)

* Fix gradient checkpointing bug in git (#21818)

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix gradient checkpointing imagegpt (#21816)

* Fix gradient checkpointing bug in gptneox

* Fix gradient checkpointing bug in modeling_imagegpt.py

* Revert gpt neox changes

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix tf random token masking probability in data collator (#21834)

* fix tf random mask tokens probability

* fix tf random mask tokens probability in collator for language modelling

* [`T5`] Fix torchquant issue (#21843)

* fix torchquant issue

* add tests

* [`Blip2`] Add `Blip2Model` (#21817)

* add v1

* add `Blip2Model`

- add relevant functions
- add tests
- add on automapping

* fix docs

* fix doctest

* Fix the issue of blip model returning loss even when the label is not provided.  (#21811)

* Fix the issue of blip model returning loss even when the label is not provided

* Fix ruff failure

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* [GPTJ] Fix gradient checkpointing bug  (#21794)

* If applied, this commit fixes generate bug in gptj

* Remove extra same code block

* formatting and test fix

* Conflict fix and declaration error fix

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add: task guide for zero shot object detection (#21829)

* zero shot object detection part 1

* added batch prediction section

* added image guided object detection section

* make style

* added the task guide to the TOC

* minor polishing

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>

* added embedded owlvit demo

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* minor fix

* make style

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Make Slack CI reporting stronger (#21823)

* Use token

* Avoid failure

* better error

* Fix

* fix style

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`Blip2`] Fix Blip-2 multi gpu (#21707)

* fix blip multi gpu

* fix

* final changes

* adapt suggestions

* fix failing slow test

* forward contrib credits from testing and suggestions

* reformat

---------

Co-authored-by: akkikiki <akkikiki@users.noreply.github.com>

* Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval (#21684)

* Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval

* minor fix return_dict

* implement test for loss computation

---------

Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>

* 🔥Rework pipeline testing by removing `PipelineTestCaseMeta` 🚀 (#21516)

* Add PipelineTesterMixin

* remove class PipelineTestCaseMeta

* move validate_test_components

* Add for ViT

* Add to SPECIAL_MODULE_TO_TEST_MAP

* style and quality

* Add feature-extraction

* update

* raise instead of skip

* add tiny_model_summary.json

* more explicit

* skip tasks not in mapping

* add availability check

* Add Copyright

* A way to disable irrelevant tests

* update with main

* remove disable_irrelevant_tests

* skip tests

* better skip message

* better skip message

* Add all pipeline task tests

* revert

* Import PipelineTesterMixin

* subclass test classes with PipelineTesterMixin

* Add pipieline_model_mapping

* Fix import after adding pipieline_model_mapping

* Fix style and quality after adding pipieline_model_mapping

* Fix one more import after adding pipieline_model_mapping

* Fix style and quality after adding pipieline_model_mapping

* Fix test issues

* Fix import requirements

* Fix mapping for MobileViTModelTest

* Update

* Better skip message

* pipieline_model_mapping could not be None

* Remove some PipelineTesterMixin

* Fix typo

* revert tests_fetcher.py

* update

* rename

* revert

* Remove PipelineTestCaseMeta from ZeroShotAudioClassificationPipelineTests

* style and quality

* test fetcher for all pipeline/model tests

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Improve TF weight loading, especially PT crossloading (#21792)

* First commit for the improved PT-TF weight loading

* Remove workarounds from TFEncoderDecoder tests

* Allow a custom weight renaming function in from_pretrained and use that to clean up EncoderDecoder

* make fixup

* First attempt at visionencoderdecoder

* Disable tensorfloat32 in tests to get consistent outputs

* Quick fix to tf_vision_encoder_decoder tests

* make fixup

* Update Blenderbot tests

* Remove unused arg in modeling_tf_opt

* load_tf_sharded_weights had strict=True! This meant transfer learning was impossible, so I'm setting it to False.

* Support prefixes when loading sharded TF checkpoints

* make fixup

* Add test to load sharded models with a weight prefix

* Fix sharded weight loading test

* Add a test for transfer from a sharded checkpoint

* make fixup

* Add test to check that crossloading from PT with a prefix works

* Refactor from_pretrained in the encoderdecoder classes

* Refactor from_pretrained in the encoderdecoder classes

* missmatched -> mismatched

* Explicitly check for None

* No comments showing my very impressive and attractive knowledge of Py3.9+

* Disable TF32 across all TF tests

* Fix flaky test for log level (#21776)

* Fix flaky test for log level

* Fix other flaky test

* prepare for "__floordiv__ is deprecated  and its behavior will change in a future version of pytorch" (#20211)

* rounding_mode = "floor"  instead of // to prevent behavioral change

* add other TODO

* use `torch_int_div` from pytorch_utils

* same for tests

* fix copies

* style

* use relative imports when needed

* Co-authored-by: sgugger <sylvain.gugger@gmail.com>
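
The behavioral difference behind this change is between rounding toward negative infinity (floor) and rounding toward zero (truncation); the two only disagree when the quotient is negative. The sketch below shows this in plain Python with no PyTorch dependency. `floor_div` and `trunc_div` are illustrative names, not the library's `torch_int_div` helper.

```python
import math


def floor_div(a: int, b: int) -> int:
    """Round the quotient toward negative infinity (what rounding_mode="floor" requests)."""
    return math.floor(a / b)


def trunc_div(a: int, b: int) -> int:
    """Round the quotient toward zero (truncation)."""
    return math.trunc(a / b)


for a, b in [(7, 2), (-7, 2)]:
    print(f"{a} / {b}: floor={floor_div(a, b)} trunc={trunc_div(a, b)} python_floordiv={a // b}")
```

For positive operands both helpers agree, which is why code that only divides non-negative indices is unaffected by the choice of rounding mode.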

* [ConvBert] Fix #21523 (#21849)

* fix reshaping
Fixes #21523

* add test

* styling

* last fixes

* Update src/transformers/models/convbert/modeling_convbert.py

* code quality

* Flax beam search fix (#21857)

* Fix gradient checkpointing bug Bart (#21866)

Co-authored-by: saswatmeher <saswatmeher@cse.iitb.ac.in>

* [deepspeed] check whether model is NLP one instead of counting on input type (#21800)

* trying to figure out whether model is NLP

* drop my changes and apply easier fix

* trying to handle all int input types

* fix logic

---------

Co-authored-by: Stas Bekman <stas@stason.org>

* Change the way tensor is reshaped in BartAttention (from .view to .reshape) (#21860)

* Change the .view call to .reshape

* Change the .view call to .reshape to all the copies from bart attention

* Fix copies and style

* Fix copies and style

* Fix copies and style

* Fix copies and style

* Fix copies and style

* Revert unnecessary changes

* Revert unnecessary changes

* Revert unnecessary changes

* Revert unnecessary changes

* Italian translation of community.mdx (#21871)

Italian translation of community.mdx gh-17459

* [`Blip`] Fix blip doctest (#21868)

fix blip doctest

* Removed BLIP mention from the troubleshooting guide (#21872)

removed BLIP mention from the troubleshooting guide

* update FSDP and add XLA-FSDP documentation (#21812)

* update FSDP and add XLA-FSDP documentation

* resolving comments

* minor update

* fix xla-fsdp docs

* [doc] deepspeed tests (#21859)

* Add an utility file to get information from test files (#21856)

* Add an utility file to get information from test files

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add check for different embedding types in examples (#21881)

* Add check for different embedding types in examples

* Correctly update summarization example

* Make loading of pretrained gpt2 faster by avoiding initialization of Conv1D's weights (#21879)

apply normal_ after assigning weight as nn.Parameter to avoid unnecessary initialization computation

* Add TFVisionTextDualEncoder (#21873)

* Temporary commit to stash everything so far

* Temporary commit to stash everything so far

* stash commit

* Refactor from_pretrained

* Fix final test, make fixup

* Update dummies

* Add model to TEST_FILES_WITH_NO_COMMON_TESTS

* Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add TFVisionTextDualEncoder to utils/documentation_tests.txt

* make fixup

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add ALIGN to transformers (#21741)

Adds the ALIGN model to transformers. ALIGN is introduced in "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision" by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.

* Fix Gradient checkpointing bug BigBird (#21882)

Co-authored-by: saswatmeher <saswatmeher@cse.iitb.ac.in>

* Fix `WhisperModelTest`  (#21883)

* force on the same device

* fix tests

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix `test_load_default_pipelines_pt` for `ClapModel` (#21886)

* fix tests

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* fix checkpoint (#21874)

* [Refactor] Relative imports wherever we can (#21880)

* initial commit

* update

* second batch

* style

* fix imports

* fix relative import on pipeline

* [ZAC] fix ci daily  (#21893)

add correct revision after model was overwritten

* Use PyAV instead of Decord in examples (#21572)

* Use PyAV instead of Decord

* Get frame indices

* Fix number of frames

* Update src/transformers/models/videomae/image_processing_videomae.py

* Fix up

* Fix copies

* Update timesformer doctests

* Update docstrings

* Add `inputs_embeds` functionality when generating with BioGPT  (#21889)

* initial commit to add inputs_embeds to generation

* formatting

* [T5 doc] Fix confusing documentation about `d_kv` (#21896)

* Confusing documentation in T5

* Fix confusing documentation in T5 configuration file

* [Whisper] Add rescaling function with `do_normalize` (#21263)

* add `zero_mean_unit_var_norm` function

* normalize before MEL computation

* fixup

* add simple test

* quality

* Update tests/models/whisper/test_feature_extraction_whisper.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fixup

* use attention masks if padding was applied

* Update based on review

Co-authored-by: bofeng huang <bofenghuang7@gmail.com>

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: bofeng huang <bofenghuang7@gmail.com>

* fix typo in Bart's attention (#21898)

* [GPT-J] add deprecation warning (#21869)

* add deprecation warning

* remove pos ids from args docstring

* fix failing test

* fsdp bf16 enable autocast (#21847)

* Fix gradient checkpointing bug LED (#21840)

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix gradient checkpointing bug M2M 100 (#21841)

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix gradient checkpointing bug marian (#21842)

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Mark pipeline tests to skip them easily (#21887)

* Mark pipeline tests to skip them easily

* Mark the mixin as pipeline test

* Update src/transformers/testing_utils.py

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Clean up auto mapping names (#21903)

* add new test

* fix after new test

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Prophetnet batch dimension inversion fix (#21870)

* decoder forward pass is working

* no model has forward pass returning attentions

* decoder ngram changed to not mix batch size

* current basic forward pass returns identical result

* passed test_model attentions

* passed test_encoder_decoder_model_generate

* passed test_headmasking

* removed old block

* removed comments bug/fixme

* removed bug comments

* applied styling

* applied fix-copies

* applied ngram forward comments

* corrected dimension notation

* applied styling and comment fixes

* changed asserts for raise ValueError

* changed question gen test

* updated hidden_states integration test

* applied styling

* Make schedulers picklable by making lr_lambda fns global (#21768)

* Make schedulers picklable by making lr_lambda fns global

* add unused _get_constant_schedule_lr_lambda arg

* remove unneeded _get_constant_schedule_lr_lamda

* add test

* make style

* rebase, remove torch dep, put lambda back

* repo-consistency and style
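
Why moving the `lr_lambda` functions to module level makes a scheduler picklable can be sketched in plain Python. This is an illustration of the general technique, not the code from #21768: the function names, the warmup formula, and the `is_picklable` helper are all assumptions made for the example.

```python
import pickle
from functools import partial


def _linear_warmup_lr_lambda(current_step: int, *, num_warmup_steps: int) -> float:
    """Module-level schedule function: pickle can store it by qualified name."""
    if current_step < num_warmup_steps:
        return current_step / max(1, num_warmup_steps)
    return 1.0


def make_picklable_lambda(num_warmup_steps: int):
    """Bind the hyperparameter with functools.partial instead of a closure."""
    return partial(_linear_warmup_lr_lambda, num_warmup_steps=num_warmup_steps)


def make_closure_lambda(num_warmup_steps: int):
    """The older style: a nested function that captures its hyperparameter."""

    def lr_lambda(current_step: int) -> float:
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        return 1.0

    return lr_lambda


def is_picklable(obj) -> bool:
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print("closure picklable:", is_picklable(make_closure_lambda(10)))
    print("partial picklable:", is_picklable(make_picklable_lambda(10)))
```

Both factories return functions that compute the same multiplier; only the partial over a module-level function can be serialized, because pickle stores functions by reference to an importable name and a nested function has none.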

* Refactor whisper asr pipeline to include language too. (#21427)

* [WIP] whisper refactor to support language output.

* Handling merges.

* A bit more cleanup and comments.

* Many improvements.

Lots of details everywhere.

* Cleanup old code and tests.

* Handle lone timestamp tokens (just recover when something bad happens).

* Adding return_language example.

* No ffmpeg.

* Hmm.

* Some corrections.

* Both fast and slow.

* New black.

* Update src/transformers/models/whisper/tokenization_whisper.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/whisper/tokenization_whisper.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove print.

* Undoing tests modifications.

* Smaller test modifications.

* Rename.

* Remove maxDiff.

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add Blip and Blip2 for pipeline tests (#21904)

* fix

* add to tests

* style and quality

* add missing

---------

Co-authored-by: NielsRogge <NielsRogge@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Temporarily skip 3 tests in `BridgeTowerModelTest` (#21908)

skip for now

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Faster zero shot image (#21897)

* Make ZeroShotImageClassificationPipeline faster

The pipeline makes separate calls to model for each candidate label.
This commit combines all labels into one call.
Original code takes more than 60 seconds to process one image and 1000
candidate labels. Updated code takes less than 2 seconds.

* implement batching

* code formatting

* Creating an even faster zero-shot-image-classification.

Unfortunately super tailored towards CLIP.

Co-Authored-By: Yessen Kanapin <yessen@deepinfra.com>

* Quality.

* Cleanup.

* Order different on the CI it seems.

* Cleanup.

* Quality.

---------

Co-authored-by: Yessen Kanapin <yessen@deepinfra.com>

* [time series] Add Time series inputs tests (#21846)

* initial test of inputs

* added test for generation

* remove asserts

* fixed test

* Update tests/models/time_series_transformer/test_modeling_time_series_transformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Avoid modeling tests run in pipeline CI jobs (#21911)

* rework is_pipeline_test

* bring back 3 tests

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix doctests for TFVisionTextDualEncoder (#21910)

* faster forward following what is done for images (#21906)

* faster forward following what is done for images

* add missing licence

* Fix gradient checkpointing bug in MBart (#21918)

* Fix gradient checkpointing bug in mvp (#21920)

* Fix gradient checkpointing megatron bert (#21921)

* Update `model_split_percents` for `WhisperModelTest` (#21922)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Use large VM for `repo_utils_job` (#21928)

upgrade to large VM

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Cleanup more auto mapping names  (#21909)

* fix auto 2

* fix auto 2

* fix task guide issue

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* feat: filter try/except when looking at custom code (#21914)

* feat: filter try/except

* Update src/transformers/dynamic_module_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix `AlignModelTest` tests (#21923)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Avoid failure in `check_repo.py` due to missing backends (#21930)

* Update utils/check_repo.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update utils/check_repo.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix wrong documentation about DataCollator padding defaults (#21919)

* Fix wrong documentation about DataCollator padding defaults

* Fix styling

* [Flan-UL2] Add-flan-ul2 (#21929)

* add doc and readme

* add model docs

* update toctree and fix copies

* update

* update doc file

* fix

* add FLAN-UL2 to configuration mapping

* fixup

* Apply suggestions from code review

* more clarification

---------

Co-authored-by: younesbelakda <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update README logo (#21933)

* [CLAP] Support batched inputs for CLAP. Fixes pipeline issues (#21931)

* fix pipeline

* fix feature_extraction clap

* you can now batch the `is_longer` attribute

* add tests

* fixup

* add expected scores

* comment on is_longer

* [Whisper] Fix feature normalization in `WhisperFeatureExtractor` (#21938)

Fix feature normalization in WhisperFeatureExtractor

* Fix gradient checkpointing bug in OPT (#21943)

* Fix gradient checkpointing bug in Pegasus (#21944)

* Fix gradient checkpointing bug in Rembert (#21945)

* Fix gradient checkpointing bug in Roformer (#21946)

* Fixed gradient_checkpointing/use_cache bug in blenderbot (#21833)

* Fixed gradient_checkpointing/use_cache bug in blenderbot

* Update modeling_blenderbot.py

* Added back if statement

* Formatted using black
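
The gradient checkpointing fixes listed in this log share one pattern: resolve the conflict between `use_cache` and gradient checkpointing before any cache container is created. A minimal sketch of that guard follows, assuming a simplified signature; `resolve_use_cache`, `init_decoder_cache`, and the logger name are hypothetical and do not reproduce any model's actual forward code.

```python
import logging

logger = logging.getLogger("grad_ckpt_demo")  # hypothetical logger name


def resolve_use_cache(use_cache: bool, gradient_checkpointing: bool, training: bool) -> bool:
    """Disable caching when it cannot work: checkpointed layers are recomputed
    during the backward pass, so cached key/value states are not kept."""
    if gradient_checkpointing and training and use_cache:
        logger.warning("use_cache=True is incompatible with gradient checkpointing; disabling it.")
        return False
    return use_cache


def init_decoder_cache(use_cache: bool, gradient_checkpointing: bool, training: bool):
    """Create the cache container only after the flag has been resolved.

    Building the container first and flipping the flag afterwards would leave
    an empty tuple where downstream code expects None.
    """
    use_cache = resolve_use_cache(use_cache, gradient_checkpointing, training)
    return () if use_cache else None


print(init_decoder_cache(use_cache=True, gradient_checkpointing=True, training=True))
print(init_decoder_cache(use_cache=True, gradient_checkpointing=False, training=True))
```

The ordering is the point of the sketch: the same check placed after the container is initialized leaves the flag and the container out of sync.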

* Update expected values in `XLMProphetNetModelIntegrationTest` (#21957)

update values

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [CI] Fix ci  (#21940)

* fix `get_proposal_pos_embed`

* fix order

* style

* zero shot simplify test

* add approximate values for zero shot audio classification

* Disable DDP for neuron (#21953)

Disable DDP for neuron

Co-authored-by: EC2 Default User <ec2-user@ip-172-31-42-72.us-west-2.compute.internal>

* Fix bert issue (#21963)

Co-authored-by: saswatmeher <saswatmeher@cse.iitb.ac.in>

* [Generate] Fix gradient_checkpointing and use_cache bug for BLOOM (#21956)

Step 1 - Change use_cache fix

* Add missing parameter definition in layoutlm config (#21960)

Four parameters in `LayoutLM` config were missing definitions, Added their definition (copied from BertConfig).

* Use larger atol in `torch.allclose` for some tests (#21966)

Use larger atol

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add TF contrastive image text finetuning example (#21939)

* Initial commit

* stash commit

* Add model checkpointing and pushing

* Fix model name inference

* Update README

* Update README

* Remove a couple of Torch references

* Update copyright date

* make fixup

* Update PushToHubCallback args!

* Remove the torch summary

* Add strategy.scope

* Update expected values for `test_xglm_sample` (#21975)

update expected values for xglm

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix gradient checkpointing bug in BigBird Pegasus (#21976)

* Fix gradient checkpointing bug in Blenderbot Small (#21977)

* Fix gradient checkpointing bug in BlipText (#21978)

Make Format

* Fix gradient checkpointing bug in Codegen (#21979)

* Fix gradient checkpointing bug in ESM (#21980)

* docs: improve clarity for language modeling (#21952)

* docs: improve clarity for clm/mlm

* docs: remove incorrect explanation

* docs: remove incorrect explanation

---------

Co-authored-by: pdhall99 <pdhall99>

* Update `Jukebox` tests (#21984)

* update expected values for jukebox

* update expected values for jukebox

* update expected values for jukebox

* update expected values for jukebox

* update expected values for jukebox

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add check before int casting for PIL conversion (#21969)

* Add check before int casting for PIL conversion

* Line length

* Tidier logic

* Fix MinNewTokensLengthLogitsProcessor when used with a list of eos tokens (#21959)

* Fix MinNewTokensLengthLogitsProcessor when used with a list of eos tokens

* fix docs

* Empty commit

* formatting

* [DETR, YOLOS] Fix device bug (#21974)

* Fix integration test

* Add test

* Add test

* Remove unneeded casts to bool (#21983)

Remove cast to Bool

* Update `notification_service.py` (#21992)

* better check

* better check

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Skip `test_multi_gpu_data_parallel_forward` for some model tests (#21991)

skip test_multi_gpu_data_parallel_forward for some model tests

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [Whisper] Add model for audio classification (#21754)

* [Whisper] Add model for audio classification

* make fix-copies

* add to docs

* add docstring

* empty returns

* add code example

* switch to fleurs

* stick everything on one line

* Stop requiring Torch for our TF examples! (#21997)

* Stop requiring Torch for our TF examples!

* Slight tweak to logging in the example itself

* [TF] Fix creating a PR while pushing in TF framework (#21968)

* add create pr arg

* style

* add test

* fixup

* update test

* last nit fix typo

* add `is_pt_tf_cross_test` marker for the tests

* [DETR and friends] Remove is_timm_available (#21814)

* First draft

* Fix to_dict

* Improve conversion script

* Update config

* Remove timm dependency

* Fix dummies

* Fix typo, add integration test

* Upload 101 model as well

* Remove timm dummies

* Fix style

---------

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

* [Time-Series] informer model (#21099)

* added informer to gitignore

* added informer to gitignore

* WIP informer2020

* added checking that instantiate works

* added config using gluonTS by kashif

* WIP config

* adding InformerConfig. need to remove FeatureEmbedder

* done InformerConfig, but need to change the names

* Done informer model init. working on enc-dec

* added things to address, after reading again enc-dec in the paper

* done modeling - checking initialization work

* added informer to gitignore

* WIP informer2020

* added checking that instantiate works

* added config using gluonTS by kashif

* WIP config

* adding InformerConfig. need to remove FeatureEmbedder

* done InformerConfig, but need to change the names

* Done informer model init. working on enc-dec

* added things to address, after reading again enc-dec in the paper

* done modeling - checking initialization work

* moved enc-dec init to InformerEncoder/Decoder init

* added 'init_std' to config, now model init works!

* WIP conversion script, and added code sources

* WIP conversion script: loading original informer pth works

* WIP conversion script: change defaults in the config

* WIP conversion script: supporting Informer input embedding

* WIP conversion script: added parameters for the informer embed

* WIP conversion script: change dim_feedforward=2048

* WIP conversion script: remove unused args for loading checkpoint

* just cleaning up

* DataEmbedding removed, after thinking with Kashif

* working on forward pass

* WIP forward pass: trying to establish working batch for forward pass

* cleaning and finalizing

* adding HF names and docs

* init after cleaning works

* WIP in tests

* added docs for the informer specific args

* fix style

* undo change

* cleaning informer, now need to work only enc-dec

* initial enc-dec classes

* added encoder and decoder

* added todo

* add todos for conv_layers

* added decoder docs from vanilla

* added encoder docs from vanilla

* remove encoder decoder from the original informer

* removed AttentionLayer from the original paper

* removed TriangularCausalMask, same as decoder_attention_mask

* initial sparse attention

* use conv_layers

* fixed test_config test

* fix parenthesis when iterating zip(layers, conv_layers)

* error found in prob attention, added sizes as comments

* fix sizes

* added proposal for q_reduce indexing, and remove unused

* WIP ProbMask, and changed factor=2 for testing

* remove unused libs for this PR for creating the env

* fix checking the attn_weights.size() after bmm

* Q_reduce: changed from torch.gather to simple slicing

* WIP calculate final attn_output

* finish adding v_aggregated, attn_output ready

* changed tgt_len to u in attention_mask, need to fix the size error

* comment attention_mask for encoder, and fix if cond for v_agg

* added ProbMask support (wip), removed old original code

* finished ProbMask 😃

* Revert "remove unused libs for this PR for creating the env"

This reverts commit 11a081e09e92771e51a5d2758d53a9afb59547f0.

* fixes

* make style

* fix initial tests

* fix more tests

* dry

* make style

* remove unused files

* style

* added integration tests

* fix num_static_real_features

* fix header

* remove unused function

* fix example

* fix docs

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/modeling_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fixes for reviewer

* use prediction_length from model

* fix style

* fixed informer.mdx

* added to index

* updated readme

* undo

* make fix-copies

* typo

* fix copy

* added Informer to toctree

* in order

* fixed comments

* remove unneeded new lines in docs

* make static real and cat optional

* fix use of distil conv layers

* fixed integration test

* added checkpoint for convlayer

* make fix-copies

* updated from time series model

* make fix-copies

* copy decoder

* fix unit tests

* updated scaling config

* fix integration tests

* IGNORE_NON_TESTED

* IGNORE_NON_AUTO_CONFIGURED

* IGNORE_NON_AUTO_CONFIGURED

* updated check configs

* fix formatting

* undo change from time series

* prediction_length should not be None

* align with the blog: prettify ProbSparse and change attention_factor to sampling_factor

* make style

* make fix-copies

* niels CR: update contributed by

* niels CR: update configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* niels CR: update kashif -> huggingface

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* niels CR: `sampling_factor` only relevant when `attention_type`=prob

* make style

* fixed U_part: added multiplication by `L_Q`

* fixed bug: remove `is not None` from `if config.distil`

* fixed test: `decoder_seq_length` to `encoder_seq_length` in cross_attentions check

* fix integration tests

* updated model hub

* do not shift as in training

* undo

* fix make-copies

* make fix-copies

* added `if prediction_length is None`

* changed `ProbSparseAttention` to `InformerProbSparseAttention`

* changed `V_sum` -> `v_mean_dim_time`

* changed `ConvLayer` to `InformerConvLayer` and fixed `super()`

* TimeSeriesTransformer->Informer in decoder's Copied from

* more descriptive in ProbSparse

* make style

* fix copied from

* Revert "added `if prediction_length is None`"

This reverts commit b4cbddfa05e3bd739b79569cd3c3b89e316f2451.

* fixed indent

* use InformerSinusoidalPositionalEmbedding

* make fix-style

* fix from #21860

* fix name

* make fix-copies

* use time series utils

* fix dec num_heads

* docstring

* added time series util doc

* _import_structure

* formatting

* changes from review

* make style

* fix docs

* fix doc

* removed NegativeLogLikelihood

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update tiny model creation script and some others files (#22006)

* Update 1

* Update 2

* Update 3

* Update 4

* Update 5

* Update 6

* Update 7

* Update 8

* Update 9

* Update 10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Generate - add 1 to cur_len to make up the new beam length (#21993)

* add 1 to cur_len to make up the new beam length

cur_len is 1 token shorter than the length of the sequence whose best_sum_logprobs is the numerator.

* cur_len+=1 before check if beam hyp is done

* format code

* reformat with black

---------

Co-authored-by: Chiming <chiming@biomap.com>
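
The length issue described in the commit above can be illustrated with a minimal sketch. It assumes the common beam-search convention of scoring a hypothesis as the summed log-probability divided by the sequence length raised to a length penalty. The function `normalized_score` and the numbers are hypothetical and are not taken from the library code.

```python
def normalized_score(sum_logprobs: float, seq_len: int, length_penalty: float = 1.0) -> float:
    """Length-normalized beam score (hypothetical helper, for illustration only)."""
    return sum_logprobs / (seq_len ** length_penalty)


# Assumed example values: the summed log-probability already covers 4 tokens,
# but the length counter has not yet been incremented for the newest token.
sum_logprobs = -6.0
cur_len = 3

stale = normalized_score(sum_logprobs, cur_len)      # divides by a length that is 1 too short
fixed = normalized_score(sum_logprobs, cur_len + 1)  # length matches the numerator

print(stale, fixed)  # -2.0 -1.5
```

With the shorter length, the score looks worse than it should, which can change whether a beam hypothesis is considered done.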

* VideoMAE doctest - use valid dummy pixel values (#22022)

Use valid dummy pixel values

* update: bertology paper (#22012)

* Update `AudioClassificationPipelineTests::test_small_model_pt` for PT 2.0.0 (#22023)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`bnb`] Fix bnb error message (#22026)

* fix error message

* make style

* [WIP] Add BridgeTowerForContrastiveLearning (#21964)

* Add BridgeTower for ITC

* Fix review feedback

* Rename BridgeTowerForITC, cleanup

* Fix style and quality

* implement tests

---------

Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>

* Fix test for torchneuroncore in Trainer (#22028)

* Add tokenize_kwargs parameter definition in the FeatureExtractionPipeline (#22031)

add tokenize_kwargs doc in the FeatureExtractionPipeline

* [examples/speech-recognition] Add SpecAugment to run_speech_recognition_seq2seq.py (#21942)

* Add specaugment to run_speech_recognition_seq2seq.py

* Remove useless argument: text_column

* Fix quality

* Update return_attention_mask condition

* Update specaugment arguments only for whisper models

* Remove SpecAugment arguments from ModelArguments, only leave default values for simplicity

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update apply_spec_augment only for whisper models

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Rename return_attention_mask to forward_attention_mask to avoid confusion with wav2vec2 models

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fixes the gradient checkpointing of whisper (#22019)

* fixing

* Update modeling_whisper.py

* Update modeling_whisper.py

* Update src/transformers/models/whisper/modeling_whisper.py

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Avoid `text_config_dict` and `vision_config_dict` being saved  for CLIP-like models (#22035)

* Avoid text_config_dict and vision_config_dict being saved

* for other CLIP-like models

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Mark all `BridgeTower` tests slow for now (#22039)

* slow me

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Bug fix: token classification pipeline while passing offset_mapping (#22034)

fix slow tokenizers with passing offset_mapping

* Update ALIGN docs (#22025)

* Fix typos and add code examples, resources

* [21737][T5]: Fix gradient checkpoint bug (#22036)

* [21737][T5]: Fix gradient checkpoint bug

* [21737][T5]: Fix gradient checkpoint bug

* [21737][T5]: Fix gradient checkpoint bug

* Update src/transformers/models/mt5/modeling_mt5.py

* Update src/transformers/models/t5/modeling_t5.py

---------

Co-authored-by: njindal <njindal@adobe.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Docs Improvement - In ZSH, not using ' ' around pip install fails, fix it (#22045)

In ZSH, not using ' ' around pip install fails

Running 
```
pip install transformers[torch]
```
in the default ZSH terminal will fail with the error `zsh: no matches found: transformers[torch]`

The solution is to wrap the installation path in ' ' like 
```
pip install 'transformers[torch]'
```

Relevant StackOverflow: https://stackoverflow.com/questions/30539798/zsh-no-matches-found-requestssecurity
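
When the install command is assembled by a script rather than typed by hand, the same quoting can be produced programmatically. The sketch below uses Python's standard `shlex.quote`; the variable names are illustrative only and are not part of the commit.

```python
import shlex

# The extras specifier contains square brackets, which zsh treats as a glob pattern.
package = "transformers[torch]"

# shlex.quote wraps the string in single quotes because '[' and ']' are not shell-safe.
command = f"pip install {shlex.quote(package)}"

print(command)  # pip install 'transformers[torch]'
```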

* Can't install tf2 on M1 Chip by default (#22046)

* Remove set_access_token usage + fail tests if FutureWarning (#22051)

* Remove set_access_token usage + fail tests if FutureWarning

* do not fail on FutureWarning in CI

---------

Co-authored-by: testbot <lucainp@hf.co>

* Show the number of `huggingface_hub` warnings in CI report (#22054)

* show hfh warnings

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Return analysis for hyperparameter_search with Ray backend (#22040)

* return analysis for hyperparameter_search with ray backend

* Revert "return analysis for hyperparameter_search with ray backend"

This reverts commit cd5179070930e03020d96d98eb51dec3eb21ef75.

* add run_summary attribute to BestRun and return analysis for ray backend

* fix typo

* add doc for run_summary for ray backend

* pt-to-tf model architecture override (#22055)

* Add an argument to pt-to-tf to allow overriding the model class

* make fixup

* Minor fix to error message

* Remove unused extra conversion from the script

* rm $ symbol from code block from contributing.md (#22057)

rm $ symbol from code block 

Removed the $ symbol from the code block to make copy-pasting easier.

* [deepspeed] offload + non-cpuadam optimizer exception (#22043)

* [deepspeed] offload + non-cpuadam optimizer exception

* flip

* revert min version

* Edit the docstring of `image_processing_donut` to match code (#22033)

* Edit the docstring of `image_processing_donut` to match code

* improve style

* more style improvement after installing quality

* Skip 3 tests for `WhisperEncoderModelTest` (#22060)

* skip 3 tests

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add setters by type of args to TrainingArguments (#21570)

* Add setters by type of args to TrainingArguments

* Define more setters

* Update tiny model creation script (#22058)

Update the script

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix case when using --gradient_accumulation_steps with DDP disabled. (#22007)

Co-authored-by: EC2 Default User <ec2-user@ip-172-31-42-72.us-west-2.compute.internal>

* Add a progress bar for the total download of shards (#22062)

* Add a progress bar for the total download of shards

* Check for no cache at all

* Fix check

* Fix gradient checkpointing bug in Speech2Text (#22079)

* Fix gradient checkpointing bug in Speech2Text

* Update modeling_speech_to_text.py

* Update modeling_speech_to_text_2.py

* Fix gradient checkpointing bug in switch transformer (#22081)

* [GPT2] Propose fix for #21080 (#21853)

* Make sure position ids are masked

* test that padded input produce the same results

* fix failing tests

* fixup

* fix batch test

* Fix small typo in flan-ul2.mdx (#22068)

* Update flan-ul2.mdx

* Update flan-ul2.mdx

* Generate - Fix broken documentation links (#22078)

fix broken links

* Fix gradient checkpointing bug in Speecht5 (#22080)

* Fix gradient checkpointing bug in Speecht5

* Update modeling_speech_to_text.py

* Update src/transformers/models/speech_to_text/modeling_speech_to_text.py

* Fix change errors

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix hint in src/transformers/modeling_utils.py (#22074)

fix hint

* handle numpy inputs in whole word mask data collator (#22032)

* GPT-J specific half precision on CPU note (#22086)

* re: #21989

* update re: #21989

* removed cpu option

* make style

* Fix imports of TF MobileViT (#22065)

* Fix imports of TF MobileViT

* Fix copies

* Revert "[GPT2] Propose fix for #21080" (#22093)

Revert "[GPT2] Propose fix for #21080 (#21853)" to avoid CI failure

This reverts commit a3fef89b2694fac4dd642a3f77d3e96d0c3df82a.

* [Whisper] Remove embed_tokens from encoder docstring (#21996)

* [Whisper] Remove embed_tokens from encoder docstring

* new line to retrigger CI

* remove new line

* Add AutoModelForZeroShotImageClassification (#22087)

Adds AutoModelForZeroShotImageClassification to transformers

* add new model of MGP-STR (#21418)

* add new model of MGP-STR

* fix the check failings

* remove torch and numpy from mgp_tokenization

* remove unused import from modeling_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str.py

* add test_processing_mgp_str

* add test_processing_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str and add softmax outs to model

* rm test_processing_mgp_str and add softmax outs to model

* rewrite the code of mgp-str according to PR suggestions

* rewrite the code of mgp-str according to PR suggestions

* add new model of MGP-STR

* fix the check failings

* remove torch and numpy from mgp_tokenization

* remove unused import from modeling_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str.py

* add test_processing_mgp_str

* add test_processing_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str and add softmax outs to model

* rewrite the code of mgp-str according to PR suggestions

* rewrite the code of mgp-str according to PR suggestions

* remove representation_size from MGPSTRConfig

* reformat configuration_mgp_str.py

* format test_processor_mgp_str.py

* add test for tokenizer and complete model/processor test and model file

* rm unnecessary tuple in modeling_mgp_str

* reduce hidden_size/layers/label_size in test_model

* add integration tests and change MGPSTR to Mgpstr

* add test for logit values

* reformat test model file

---------

Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>

* Add pr_checks.mdx Italian translation (#17459) (#22116)

* Add pr_checks.mdx Italian translation (#17459)

* Updated pr_checks.mdx Italian translation (#17459)

* Fix gradient checkpointing bug in xglm (#22127)

* Fix gradient checkpointing bug in Trajectory Transformer (#22125)

* Fix gradient checkpointing bug in xlm_roberta_xl (#22128)

* Added big_models.mdx italian translation #17600  (#22115)

* updated toctree

* italian translation big_model.mdx

* italian translation big_models

* [`Blip2`] skip accelerate test (#22124)

skip accelerate test

* Fix gradient checkpointing bug in xmod (#22129)

* Fix gradient checkpointing bug in LongT5 (#22130)

* Fix gradient checkpointing bug in trocr (#22126)

* Fix gradient checkpointing bug in trocr

* Fix format

* Update src/transformers/models/trocr/modeling_trocr.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Zero-shot image classification task guide (#22132)

* WIP

* WIP

* manual inference example

* make style

* Apply suggestions from code review

Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>

---------

Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>

* Fix doc link for MGP-STR (#22138)

* Adding Type Hints to TF_Pegasus model (#21941)

* Adding Type Hints to TF_Pegasus model

* Updated some parameters per maintainer comments

* Add a new script to check model testers' config (#22063)

* Add script

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update configuration_align.py (projected_dim=640) (#22139)

Update configuration_align.py

updated projected_dim=640 from 512 in arguments of AlignConfig

* [`Whisper`] add `get_input_embeddings` to `WhisperForAudioClassification` (#22133)

* add `get_input_embeddings` to `WhisperForAudioClassification`

* add common tests

* fix another common test

* Update tests/models/whisper/test_modeling_whisper.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix style

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Trainer: let generate pick its inputs (#22108)

* Let generate pick its inputs

* fix squad seq2seq example

* Enforce same behavior as PyTorch 2.0 for older versions (#22136)

* [trainer] fix bug in grad accum with multiple epochs (#22098)

* [trainer] fix bug in grad accum

* comment out debug

* fix one-off

* rename counter

* [deepspeed docs] Activation Checkpointing (#22099)

* [deepspeed docs] Activation Checkpointing

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update deepspeed.mdx

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove backend check for torch.compile (#22140)

* Remove backend enforcement for torch.compile

* Update error

* Update src/transformers/training_args.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Style

---------

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* [Safetensors] Add explicit  flag to from pretrained (#22083)

* [Safetensors] Add explicit  flag to from pretrained

* add test

* remove @

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Prepare daily CI for torch 2.0.0 (#22135)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* docs:  New terms and updates to glossary (#21982)

* Updated glossary with new terms, added abbreviations for certain terms and merged autoencoding models, autoregressive models and causal language modeling into encoder and decoder models

* Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Added link to 'Pipeline for inference' tutorial

* Trigger CI

* Update docs/source/en/glossary.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Added entry for self supervised learning, added deleted entries + fixed broken links

* Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [🛠️] Fix-whisper-breaking-changes  (#21965)

* temp fix

* temporary fix

* update

* fix tests

* fixup

* update based on review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* update to fix tests

* update docstring

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Move `is_pipeline_test_to_skip` to specific model test classes (#21999)

* Move `is_pipeline_test_to_skip` to specific model test classes

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add ConvNeXT V2 (#21679)

* Add ConvNeXt V2 to transformers
* TF model is separated from the PR to fix issues

* Update 2 doctest expected values for torch 2.0.0 (#22148)

update values

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Translation Italian: perf_train_cpu and perf_train_cpu_many (#22151)

* added translated files

added perf_train_cpu and perf_train_cpu_many

* updated toctree

* Fix big model inference for T5 models in float16 (#22095)

* Fix big model inference for T5 models in float16

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Style

* Trigger CI with latest release

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Create MaskedImageCompletionOutput and fix ViT docs (#22152)

* create MaskedImageCompletionOutput

* fix bugs

* fix bugs

* to_pil - don't rescale if int and in range 0-255 (#22158)

* Don't rescale if int and in range 0-255

* Raise value error if int values too large

* Update tests/test_image_transforms.py

* Update tests/test_image_transforms.py
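
The rule named in the commits above (skip rescaling for integer values in 0-255, raise an error for integer values that are too large) can be sketched as follows. This is a simplified, hypothetical illustration on a flat list of pixel values, not the library implementation. The float branch (values in [0, 1] get rescaled) is an assumption added for completeness.

```python
from typing import Sequence, Union

Number = Union[int, float]


def needs_rescale(pixels: Sequence[Number]) -> bool:
    """Decide whether pixel values should be multiplied by 255 before PIL conversion.

    Hypothetical helper for illustration only.
    """
    if all(float(p).is_integer() for p in pixels):
        # Integer-valued data: already in PIL's range, or invalid.
        if all(0 <= p <= 255 for p in pixels):
            return False
        raise ValueError("Integer pixel values must lie in the range 0-255.")
    # Assumption: non-integer data is expected to be normalized to [0, 1].
    if all(0 <= p <= 1 for p in pixels):
        return True
    raise ValueError("Float pixel values must lie in the range [0, 1].")


print(needs_rescale([0, 128, 255]))    # False
print(needs_rescale([0.0, 0.5, 1.0]))  # True
```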

* [trainer] add `--optim adamw_torch_fused` for pt-2.0+ (#22144)

* [trainer] add --optim adamw_torch_fused

* change optim default

* deal with non-torch

* revert default change; prep; add fp16/amp assert

* typo

* typo

* Revert "Enforce same behavior as PyTorch 2.0 for older versions" (#22163)

Revert "Enforce same behavior as PyTorch 2.0 for older versions (#22136)"

This reverts commit 1c801d65eb42a71ea52db797af760bd96c8b113f.

* v4.28.0.dev0

* Load optimizer state on CPU to avoid CUDA OOM (#22159)

* Run all tests by default (#22162)

* Fix: unfinished_sequences with correct device  (#22184)

Fix: unfinished_sequences with correct device 

The original code was causing errors when running torch.jit.trace due to the tensor options being incorrect. I fixed this by using torch.ones to create a tensor with the correct device and dtype. This should resolve the issue with running torch.jit.trace.

* Revert 22152 MaskedImageCompletionOutput changes (#22187)

Revert changes

* Regression pipeline device (#22190)

* Fix regression in pipeline when device=-1 is passed

* Add regression test

* Update BridgeTowerForContrastiveLearning (#22145)

* Use return_loss for BridgeTowerForContrastiveLearning, add example

* fix tests

* Update example in BridgeTowerForContrastiveLearning

* Update test_modeling_bridgetower.py

* update model output format

* minor update

* Update src/transformers/models/bridgetower/modeling_bridgetower.py

* make style

---------

Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* t5 remove data dependency (#22097)

* t5 remove data dependency

* make style

* make fix-copies

---------

Co-authored-by: Prathik Rao <prathikrao@microsoft.com>

* Fix DeepSpeed CI (#22194)

* Deal with torch-tensorrt

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typo in  Align docs  (#22199)

Fix align docs typo

* Update expected values in `MgpstrModelIntegrationTest` (#22195)

Update values

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Italian Translation of migration.mdx (#22183)

* Translation Italian: migration

* Update migration.mdx

minor fixes

* Update _toctree.yml

* Delete migration.mdx

* Add italian translation of migration.mdx

* Update of migration.mdx translation and toctree

* LLaMA Implementation (#21955)

* LLaMA

* sharding and docs

* tweak

* black

* inits

* ruff

* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP

* init

* no checkpoint

* docs

* ruff

* type_vocab_size

* tokenizer fixes

* tokenizer fixes

* Update tokenization_llama.py

* Update tokenization_llama.py

* Update configuration_llama.py

* Update modeling_llama.py

* tokenizer add_bos by default

* licenses

* remove decoder

* norms and mlp

* rope overhaul

* tweaks

* black

* mention OPT implementation

* off-by-one naming

* typo

* fix

* tokenization fix and slicing bug

* padding config

* cleanup

* black

* update tests

* undo typo

* fix vocab caching logic

* ruff

* docbuilder

* attn fix from BlackSamorez

* initial feedback

* typo

* docs

* llama case

* llama case

* load checkpoint docs

* comment about tokenizer

* tokenizer defaults

* clear past_key_values if use_cache=False

* last tweaks

* last tweaks

* last tweaks

* last tweaks

---------

Co-authored-by: Stella Biderman <stellabiderman@gmail.com>

* Update tiny model creation script (#22202)

* Update UNCONVERTIBLE_MODEL_ARCHITECTURES

* Deal with 2 model tester classes in single test file

* Deal with 2 model tester classes in single test file

* Deal with 2 model tester classes in single test file

* make style and quality

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Temporarily fix ONNX model exporting error (#21830)

* Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143

* Reduced column width

* Fix formatting.

* Revert "Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143"

This reverts commit 6e95a108042118d204da447729f3834affa354fc.

* Fix export error.

* Revert "Fix formatting."

This reverts commit 8310f60da10358edbdf77a2a2f3c83ee55066cb8.

* Propagated changes made in SwinV2 to Swin2SR

* [`XGLM`] Add `accelerate` support for XGLM (#22207)

* add `accelerate` support for XGLM

* fix order

* fixes a typo in WhisperFeatureExtractor docs. (#22208)

* fixes a typo

* .

* 🔥py38 + torch 2 🔥🔥🔥🚀 (#22204)

* py38 + torch 2

* increment cache versions

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Hotfix for natten issue with torch 2.0.0 on CircleCI (#22218)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* fix typos in llama.mdx (#22223)

* fix code example in mgp-str doc (#22219)

Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>

* Use `dash==2.8.1` for now for daily CI (#22227)

Use dash 2.8.1 for now

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Depth estimation task guide (#22205)

* added doc to toc, auto tip with  supported models, mention of task guide in model docs

* make style

* removed "see also"

* minor fix

* LLaMA house-keeping (#22216)

* LLaMA house-keeping

* Doc links

* fix AutoTP in deepspeed could not work for bloom (#22196)

* fix AutoTP in deepspeed could not work for bloom

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* add a method in BloomModel to build ailib

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Add LlamaForSequenceClassification (#22209)

* Add LlamaForSequenceClassification

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Add docstring

* Add test

* Add input embedding getter and setter

* Remove dead code

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Removed .mdx extensi…
@MrZilinXiao

What's the motivation for the three special tokens tokenizer.pad_token, tokenizer.bos_token, tokenizer.eos_token = '' when converting the llama tokenizer?

Yeah, I am also quite confused. And I noticed even with add_special_tokens=True, LlamaTokenizer will not append an EOS token explicitly.
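
Until the tokenizer conversion is fixed, one possible workaround is to append the EOS id yourself after tokenizing. This is only a minimal sketch on plain id lists: `append_eos` and `ASSUMED_EOS_ID` are hypothetical names, and the value 2 is an assumption about the LLaMA vocabulary (in real code, read it from `tokenizer.eos_token_id`).

```python
from typing import List

# Assumption: 2 is the EOS id in the LLaMA sentencepiece vocabulary.
# In real code, prefer tokenizer.eos_token_id over a hard-coded value.
ASSUMED_EOS_ID = 2


def append_eos(input_ids: List[int], eos_id: int = ASSUMED_EOS_ID) -> List[int]:
    """Return a copy of input_ids that ends with exactly one EOS id."""
    if input_ids and input_ids[-1] == eos_id:
        # Already terminated: return an unchanged copy.
        return list(input_ids)
    return list(input_ids) + [eos_id]
```

The helper returns a new list rather than mutating its input, so it is safe to call on ids that are reused elsewhere.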

@ArthurZucker
Collaborator

ArthurZucker commented Mar 27, 2023

Just started working on a fix for the tokenizer conversion, will link the PR asap: #22402

raghavanone pushed a commit to raghavanone/transformers that referenced this pull request Apr 5, 2023
* LLaMA

* sharding and docs

* tweak

* black

* inits

* ruff

* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP

* init

* no checkpoint

* docs

* ruff

* type_vocab_size

* tokenizer fixes

* tokenizer fixes

* Update tokenization_llama.py

* Update tokenization_llama.py

* Update configuration_llama.py

* Update modeling_llama.py

* tokenizer add_bos by default

* licenses

* remove decoder

* norms and mlp

* rope overhaul

* tweaks

* black

* mention OPT implementation

* off-by-one naming

* typo

* fix

* tokenization fix and slicing bug

* padding config

* cleanup

* black

* update tests

* undo typo

* fix vocab caching logic

* ruff

* docbuilder

* attn fix from BlackSamorez

* initial feedback

* typo

* docs

* llama case

* llama case

* load checkpoint docs

* comment about tokenizer

* tokenizer defaults

* clear past_key_values if use_cache=False

* last tweaks

* last tweaks

* last tweaks

* last tweaks

---------

Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
xloem pushed a commit to xloem/transformers that referenced this pull request Apr 9, 2023
@ucas010

ucas010 commented Apr 12, 2023

Which version of huggingface-transformers includes LLaMA?

@hscspring

hscspring commented Apr 12, 2023

When I convert the LLaMA weights to HF format (using convert_llama_weights_to_hf.py), each per-layer shard is as large as the whole original file.
Has anyone run into the same issue?

python convert_llama_weights_to_hf.py  --input_dir models/ --model_size 7B --output_dir llama/fp32/

The shard files so far (conversion still in progress):

-rw-rw-r--. 1 wyd anaconda  26G Apr 12 15:55 pytorch_model-1-of-33.bin
-rw-rw-r--. 1 wyd anaconda  26G Apr 12 15:59 pytorch_model-2-of-33.bin
-rw-rw-r--. 1 wyd anaconda 129M Apr 12 15:59 pytorch_model-3-of-33.bin

the raw file:

-rw-rw-r--. 1 wyd anaconda  26G Apr 12 15:50 consolidated.00.fp32.pth

The conversion works correctly when I convert LLaMA to fp16.

Ph0rk0z pushed a commit to Ph0rk0z/text-generation-webui-testing that referenced this pull request Apr 17, 2023
huggingface/transformers#21955

The tokenizer class has been changed from

"LLaMATokenizer"

to

"LlamaTokenizer"

You need to apply this change in every tokenizer_config.json
that you have for LLaMA so far.
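
The rename described above could be applied in bulk with a small script. This is a sketch, not part of the referenced commit: `rename_tokenizer_class` is a hypothetical helper, and it assumes the class name is stored under the `tokenizer_class` key of each tokenizer_config.json.

```python
import json
from pathlib import Path

OLD_NAME = "LLaMATokenizer"
NEW_NAME = "LlamaTokenizer"


def rename_tokenizer_class(root: Path) -> list:
    """Rewrite tokenizer_config.json files under root; return the paths changed."""
    changed = []
    for config_path in sorted(root.rglob("tokenizer_config.json")):
        config = json.loads(config_path.read_text(encoding="utf-8"))
        # Assumption: the class name lives under the "tokenizer_class" key.
        if config.get("tokenizer_class") == OLD_NAME:
            config["tokenizer_class"] = NEW_NAME
            config_path.write_text(
                json.dumps(config, indent=2) + "\n", encoding="utf-8"
            )
            changed.append(config_path)
    return changed
```

Calling `rename_tokenizer_class(Path("/path/to/models"))` (a placeholder path) leaves files that already use the new name untouched and keeps every other key in the config as it was.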
@constanzafierro

I think there might be something off with this model and tokenizer: https://discuss.huggingface.co/t/llamatokenizerfast-returns-token-type-ids-but-the-forward-pass-of-the-llamamodel-does-not-receive-token-type-ids/42431. Could someone take a look?

@ArthurZucker
Collaborator

ArthurZucker commented Jun 9, 2023

Hey! Yes this is fixed by #24042

novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023
model = transformers.LlamaForCausalLM.from_pretrained("/output/path/llama-7b/")
```

- The LLaMA tokenizer is based on [sentencepiece](https://github.com/google/sentencepiece). One quirk of sentencepiece is that when decoding a sequence, if the first token is the start of a word (e.g. "Banana"), the tokenizer does not prepend the prefix space to the string. To have the tokenizer output the prefix space, set `decode_with_prefix_space=True` in the `LlamaTokenizer` object or in the tokenizer configuration.

@zphang Hi,
While developing a streaming (token-by-token) generation application, I found that I need this functionality. I couldn't find its implementation after several minutes of searching. Could you point it out?

Collaborator

The add_prefix_space will be added in #25224
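
For the streaming use case, one workaround that does not depend on a prefix-space option is to re-decode the whole sequence after each new token and emit only the new suffix. The sketch below is an illustration under stated assumptions, not the library's implementation: `toy_decode` and its four-piece vocabulary are hypothetical stand-ins for a sentencepiece-style `decode`, and `stream_text` is a hypothetical helper.

```python
from typing import Callable, Iterable, Iterator, List


def toy_decode(ids: List[int]) -> str:
    """Stand-in for a sentencepiece-style decode: drops the leading prefix space."""
    pieces = ["▁Hello", "▁wor", "ld", "!"]  # hypothetical vocabulary
    text = "".join(pieces[i] for i in ids).replace("▁", " ")
    return text[1:] if text.startswith(" ") else text


def stream_text(
    ids: Iterable[int], decode: Callable[[List[int]], str]
) -> Iterator[str]:
    """Yield the text added by each new token, preserving inter-word spaces."""
    seen: List[int] = []
    emitted = 0  # number of characters already yielded
    for token_id in ids:
        seen.append(token_id)
        text = decode(seen)  # decode the full prefix, not the single token
        yield text[emitted:]
        emitted = len(text)
```

Decoding each token on its own with `toy_decode` would give "Hello", "wor", "ld", "!" and lose the space between words, while `stream_text` keeps it. The trade-off is that re-decoding the full prefix costs more as the sequence grows, so a real application may want to decode only a sliding window of recent tokens.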
