feat: extract rev in attn_implementation kernels via @ by drbh · Pull Request #40009 · huggingface/transformers · GitHub


@drbh
Contributor

@drbh drbh commented Aug 7, 2025

This PR adds the ability to specify kernel revisions via the @ symbol in the attn_implementation argument of AutoModelForCausalLM.from_pretrained

Example usage

uv run repro.py
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "accelerate",
#     "torch==2.7.0",
#     "transformers",
#     "kernels>=0.9.0",
# ]
#
# [tool.uv.sources]
# transformers = { git = "https://github.com/drbh/transformers.git", branch = "allow-kernel-rev" }
# ///
import time
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig


torch.set_float32_matmul_precision("high")

model_id = "meta-llama/Llama-3.2-3b-Instruct"
model = (
    AutoModelForCausalLM.from_pretrained(
        model_id,
        # attn_implementation="kernels-community/flash-attn@main",
        # attn_implementation="kernels-community/flash-attn@56449c1aa267bd0f48a191f0e6979dedf9f2ec32", # main's full commit SHA
        attn_implementation="kernels-community/flash-attn@09eec95", # short SHA of main~1 (the commit before main's head)
        torch_dtype=torch.bfloat16,
    )
    .eval()
    .cuda()
)
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")

print("+ Model loaded successfully.")

prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.95,
    top_k=50,
    num_beams=1,
    max_new_tokens=50,
    do_sample=True,
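)
# GenerationConfig does not seed sampling; set the torch seed directly for reproducible sampling.
torch.manual_seed(42)
start_time = time.time()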
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        generation_config=generation_config,
    )
end_time = time.time()
print(f"+ Generation time: {end_time - start_time:.2f} seconds")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
print("+ Inference completed successfully.")

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@pcuenca pcuenca left a comment

🔥

Contributor

@MekkCyber MekkCyber left a comment

LGTM! Thanks for adding this 🤗! A minor nit: update the regex on L:2724 to reflect this change

Contributor

@MekkCyber MekkCyber left a comment

LGTM

Comment on lines 2724 to 2738
# Accept HF kernel references in the form:
# <namespace>/<repo_name>[@<revision>][:<kernel_name>]
#
# - <namespace> and <repo_name> are any non-"/" and non-":" sequences.
# - "@<revision>" is optional (branch, tag, or commit-ish), e.g. "@main", "@v1.2.0", "@abc123".
# - ":<kernel_name>" is optional and selects a function inside the kernel repo.
# - Both options can appear together and in this order only: @revision first, then :kernel_name.
# - We intentionally allow a leading "<wrapper>|" prefix (e.g., "flash|...") because the code
# strips it before loading; '|' is not excluded in the character classes here.
#
# Examples that match:
# "org/model"
# "org/model@main"
# "org/model:custom_kernel"
# "org/model@v1.2.3:custom_kernel"
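
A minimal sketch of a pattern that accepts the grammar described in the comment above, assuming Python's standard `re` module. The pattern, its group names, and the `parse_kernel_ref` helper are hypothetical illustrations, not the regex merged in transformers. Unlike the comment, this sketch also excludes "@" from the name character classes and captures the optional "<wrapper>|" prefix explicitly.

```python
import re
from typing import Optional

# Hypothetical pattern written from the comment above; it is NOT the regex merged in transformers.
# Names exclude "/", ":" and "@" so that "@<revision>" and ":<kernel_name>" split unambiguously.
KERNEL_REF = re.compile(
    r"^(?:(?P<wrapper>[^/:@|]+)\|)?"    # optional "<wrapper>|" prefix, e.g. "flash|"
    r"(?P<repo>[^/:@]+/[^/:@]+)"        # <namespace>/<repo_name>
    r"(?:@(?P<revision>[^/:@]+))?"      # optional "@<revision>" (branch, tag or commit)
    r"(?::(?P<kernel_name>[^/:@]+))?$"  # optional ":<kernel_name>", only after the revision
)


def parse_kernel_ref(ref: str) -> Optional[dict]:
    """Split a kernel reference into its parts, or return None if it does not match."""
    match = KERNEL_REF.match(ref)
    return match.groupdict() if match else None
```

For example, `parse_kernel_ref("kernels-community/flash-attn@09eec95")` yields the revision `"09eec95"`, while `"org/model:k@main"` is rejected because the kernel name comes before the revision. One limitation of this sketch: revisions that themselves contain "/" (such as `refs/pr/1`) do not match.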
Contributor

Very cool! Thanks

@MekkCyber MekkCyber requested a review from ArthurZucker August 11, 2025 08:26
Member

@pcuenca pcuenca left a comment

ah, great call about the regexp @MekkCyber! Looks good!

Collaborator

@ArthurZucker ArthurZucker left a comment

Very very nice! The doc would be better in https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L4494-L4522 but otherwise absolutely perfect (way better than the kernel kwargs I had in mind)

@drbh drbh merged commit 1cea763 into huggingface:main Aug 11, 2025
24 checks passed
tc-mb pushed a commit to tc-mb/transformers that referenced this pull request Aug 27, 2025
* unpin `torchcodec==0.5.0` and use `torch 2.8` on daily CI (#40072)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* fix: resolve triton version check compatibility on windows (#39986)

* fix: resolve triton version check compatibility on windows

* style: remove trailing space

* fix: fix typo

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* [qwen-vl] fix beam search with videos (#39726)

* fix

* fix copies

* [gemma3] update conversion key mapping (#39778)

update conversion key mapping

* fix: move super().__init__ after vision_config init in Mistral3Config (#40063)

fix: move super().__init__ after vision_config init in Mistral3Config (#40062)

* Remove deprecated cache-related objects (#40035)

remove them

* guard on model.eval when using torch.compile + FSDP2 (#37413)

guard on model.eval

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Fix repo consistency (#40077)

fix

* added Textnet fast image processor (#39884)

* feat: add fast image processor implementation for TextNet model

* chore: override to_dict method to TextNetImageProcessorFast for slow processor compatibility tests

* chore: update init method

* chore: coding and style checks

* chore: fixed code quality issue

* chore: override resize to handle size_divisor, move all preprocessing logic to child class

* fix: autoImageProcessor issue for textnet

* chore: cleanup

* simplify resize

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>

* Fix `time_spent ` in `notification_service.py`. (#40081)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* chore: standardize DeBERTa model card (#37409)

* chore: standardize DeBERTa model card

* Apply suggestions from code review in docs

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: Update deberta.md with code cleanup suggestions

* Update docs/source/en/model_doc/deberta.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/deberta.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update deberta.md

* Update deberta.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [`GPT Big Code`] Fix attention scaling (#40041)

* fix

* update integration tests

* fmt

* add regression test

* feat: extract rev in attn_implementation kernels via @ (#40009)

* feat: extract rev in attn_implementation kernels via @

* fix: adjust for ruff

* fix: update regex and add explanatory comment

* fix: move attn_implementation kernel doc

* fix: remove extra line

* Update notification service MI325 (#40078)

add mi325 to amd_daily_ci_workflows

* Fix PerceptionLM image preprocessing for non-tiled image input. (#40006)

* Fix PerceptionLM image preprocessing for non-tiled image input.

* Add test for single tile vanilla image processing.

* ruff format

* recover missing test skip

* Simplify test.

* minor test name fix

* Revert FA2 kwargs construction (#40029)

* revert

* use imports

* went way too high in imports level

* style

* [fix] batch inference for llava_onevision (#40021)

* [fix] llava onevision batch inference

* style

* cannot pass inconsistent list & handle text-only case

* [docs] Zero Shot Object Detection Task (#40096)

* refactor zsod task docs

* keeping the image guided od section

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update docs/source/en/tasks/zero_shot_object_detection.md

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

* Update Glm4V processor and add tests (#39988)

* update GLm4V and add tests

* Update tests/models/glm4v/test_processor_glm4v.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* remove min/max pixels for BC

* fix video tests

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Add glm4.5&&glm4.5V doc (#40095)

* Docs: GLM-4-MoE & GLM-4V-MoE pages

* Docs: polish GLM-4V-MoE intro, remove placeholders; pin image

* Docs

---------

Co-authored-by: wujiahan <lambert@gmail.com>

* Causal loss for `ForConditionalGeneration` (#39973)

* feat: add ForConditionalGeneration loss to LOSS_MAPPING

* consistent spelling of "recognized"

* Audio encodings now match conv2d weight dtype in Gemma3nAudioSSCPConvBlock (#39743)

audio encodings now match conv weight dtype in Gemma3nAudioSSCPConvBlock

* New DynamicSlidingWindowLayer & associated Cache (#40039)

* start adding the layer

* style

* improve

* modular

* fix

* fix

* improve

* generate integration

* comment

* remove old one

* remove

* fix

* fix

* fix

* fix all recompiles

* fix

* doc

* fix

* add text config check

* fix encoderdecoder cache

* add it for all models with sliding/hybrid support

* revert

* start fixing

* prophetnet

* fsmt

* fix ddp_data

* add test for mistral

* improve mistral test and add gemma2 test

* docstrings

* Enable SIM rules (#39806)

* Enable SIM rules

Signed-off-by: cyy <cyyever@outlook.com>

* More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>

* feat: add `is_fast` to ImageProcessor (#39603)

* feat: add `is_fast` to ImageProcessor

* Update test_image_processing_common.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* feat: add missing BaseImageProcessorFast import

* fix: `issubclass` for discriminating subclass of BaseImageProcessorFast

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
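
The `is_fast` commit above distinguishes fast processors with `issubclass`. A minimal sketch of that idea, using hypothetical stand-in classes (the real `BaseImageProcessor` and `BaseImageProcessorFast` live in transformers and are not imported here); it is not the merged implementation.

```python
class BaseImageProcessor:
    """Hypothetical stand-in for the slow base class."""

    @property
    def is_fast(self) -> bool:
        # The name is looked up at call time, so the subclass defined below is visible here.
        return issubclass(type(self), BaseImageProcessorFast)


class BaseImageProcessorFast(BaseImageProcessor):
    """Hypothetical stand-in for the fast (torchvision-backed) base class."""


class ToySlowProcessor(BaseImageProcessor):
    pass


class ToyFastProcessor(BaseImageProcessorFast):
    pass
```

Checking the class hierarchy rather than a per-class flag means any new subclass of the fast base class reports `is_fast == True` without extra bookkeeping.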

* Re-apply make style (#40106)

make style

* Replace `logger.warning` with `logger.warning_once` in `GradientCheckpointingLayer` (#40091)
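
A warning emitted inside a layer's forward pass would otherwise repeat once per layer per step. A minimal stdlib sketch of "warn once" semantics, assuming memoization on the message via `functools.lru_cache`; the `warning_once` function and logger name below are hypothetical and only approximate the transformers logger method.

```python
import functools
import logging

logger = logging.getLogger("gradient_checkpointing_demo")  # hypothetical logger name


@functools.lru_cache(maxsize=None)
def warning_once(message: str) -> None:
    """Log `message` at WARNING level the first time it is seen.

    Later calls with the same message hit the cache and log nothing.
    """
    logger.warning(message)
```

Because the cache key is the message itself, distinct messages are each logged once; messages must be hashable (plain strings are).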

* Fix regression in mllama vision encoder (#40083)

fix mllama vision encoder

Signed-off-by: Isotr0py <2037008807@qq.com>

* Switch the order of args in StaticCache (for BC and future logic) (#40100)

* switch order for BC and future logic

* in generate as well

* Fix Qwen3 MoE GGUF architecture mismatch (#39976)

* fix qwen3moe gguf architecture

* Fix Qwen3Moe GGUF loading

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Jinuk Kim <jusjinuk@snu.ac.kr>

* Fix error on importing unavailable torch.distributed (#40038)

Currently model_debugging_utils.py would have an unguarded `import torch.distributed.tensor`. This PR ensures that the distributed module is available before including its tensor module.
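
A minimal sketch of that kind of guard, assuming only the public `torch.distributed.is_available()` call; the helper name `load_distributed_tensor` is hypothetical and this is not the merged code. It returns `None` when PyTorch is missing, was built without distributed support, or is too old to ship `torch.distributed.tensor`.

```python
import importlib


def load_distributed_tensor():
    """Return the torch.distributed.tensor module if it can be used, else None."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed at all
    # False on PyTorch builds compiled without distributed support.
    if not torch.distributed.is_available():
        return None
    try:
        return importlib.import_module("torch.distributed.tensor")
    except ImportError:
        return None  # older PyTorch releases do not have this public module
```

Callers can branch on `None` instead of failing at import time.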

* Default to dequantize if cpu in device_map for mxfp4 (#39993)

* default to dq if cpu

* an other check

* style

* revert some changes
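
The MXFP4 change above falls back to dequantized weights when the device_map places anything on CPU. The helper below is a hypothetical sketch of that decision, assuming a device_map is either a string strategy or a mapping from module names to devices; it is not the check in the merged code. String strategies such as "auto" are resolved to concrete devices later, so this sketch does not try to predict where they place modules.

```python
from typing import Mapping, Optional, Union

DeviceMap = Union[str, Mapping[str, Union[int, str]]]


def should_dequantize(device_map: Optional[DeviceMap]) -> bool:
    """Hypothetical check: dequantize when any module is mapped to CPU."""
    if device_map is None:
        return False
    if isinstance(device_map, str):
        return device_map == "cpu"
    # str() also handles torch.device("cpu"), whose string form is "cpu".
    return any(str(device) == "cpu" for device in device_map.values())
```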

* [`Flash Attention`] Fix flash attention integration (#40002)

* fix flash attention

* i got a stroke reading that comment

* change dropout kwarg back to before

* rename _fa3... as it's used for multiple variants and should work as fallback instead

* simplify imports and support kwargs for fa

* style

* fix comments order

* small fix

* skip kernels test (causes CUDA illegal memory accesses without cleanup), fix fa test in general esp for models like bart

* style

* allow fullgraph by preloading on init

* make globals "private"

* ci pls be happy

* change skip conditions based on backend flag (indicating missing mask interface)

* move globals support to a function to prepare kwargs

* style

* generalize supported kwargs

* small change to doc

* fix

* add comments

* style

* revert prep during generate

* style

* revert weird style changes

* add fa kwarg prep during generate with fixes back

* how did this even happen

* how

* add comment

* [trainer] ensure special tokens in model configs are aligned with tokenizer at train time (#38441)

* tmp commit

* add test

* make fixup

* reset warns/info in test

* Fix Causality Handling in Flash Attention to Support Bidirectional Attention (#39707)

Fix the is_causal logic to enable bidirectional attention

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* [docs] Add reference to HF-maintained `custom_generate` collections (#39894)

decoding -> generation; add collections

* Add model card for MobileViT (#40033)

* Add model card for MobileViT

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* remove sequence parallel in llama4 (#40084)

* 🌐 [i18n-KO] Translated `tiny_agents.md` to Korean (#39913)

* docs: ko: tiny_agents.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits

* [bugfix] Fix tensor device in Idefics2, Idefics3, and SmolVLM (#39975)

* [bugfix] ensure correct tensor device in Idefics2, Idefics3, and SmolVLM models

* to cuda

* changed xLSTMRMSNorm to RMSNorm (#40113)

* changed xLSTMRMS.. to RMS...

* fix linter error

---------

Co-authored-by: Nikita <nikita@Nikitas-MacBook-Pro.local>

* Fix QuantoQuantizedCache import issues (#40109)

* fix quantoquantized

* [serve] allow array `content` inputs for LLMs (#39829)

fix bug; add tests

* `decoding_method` argument in generate (#40085)

* factor out expand inputs

* callable arg

* improve docs, add test

* Update docs/source/en/generation_strategies.md

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Collated reports (#40080)

* Add initial collated reports script and job definition

* provide commit hash for this run. Also use hash in generated artifact name. Json formatting

* tidy

* Add option to upload collated reports to hf hub

* Add glob pattern for test report folders

* Fix glob

* Use machine_type as path filter instead of glob. Include machine_type in collated report

* DOCS: Add missing space in SECURITY.md (#40087)

* [trainer] handle case where EOS token is None in `generation_config` (#40127)

* handle case where EOS token is None in gen config

* update eli5 dataset

* Fix hidden torchvision>=0.15 dependency issue (#39928)

* use pil_torch_interpolation_mapping for NEAREST/NEAREST_EXACT

* fix min torchvision version

* use InterpolationMode directly

* remove unused is_torchvision_greater_or_equal,

* nit

* 🌐 [i18n-KO] Translated `main_classes/processors.md` to Korean (#39519)

* docs: ko: processors.md

* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/main_classes/processors.md

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Update docs/source/ko/main_classes/processors.md

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

---------

Co-authored-by: TaskerJang <bymyself103@naver.com>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* 🌐 [i18n-KO] Translated `jamba.md` to Korean (#39890)

* docs: ko: jamba.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestion

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

---------

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `main_classes/optimizer_schedules.md` to Korean (#39713)

* docs: ko: main_classes/optimizer_schedules

* feat: nmt draft

* fix: improve TOC anchors and expressions in optimizer_schedules

- Add TOC anchors to all section headers
- Fix terminology and improve Korean expressions

* fix: Correct translation of 'weight decay fixed' to '가중치 감쇠가 적용된' ("with weight decay applied")

Changed '가중치 감쇠가 수정된' ("with weight decay modified") to '가중치 감쇠가 적용된' ("with weight decay applied") for more accurate translation of 'weight decay fixed' in the context of optimization.

* fix: Use more natural Korean inheritance expression

Changed '에서 상속받는' (with the particle 에서, "from") to '을 상속받는' (with the object particle 을) to follow natural Korean grammar patterns for inheritance terminology.

* fix: Use consistent '미세 조정' translation for 'finetuned models'

Changed '파인튜닝된' ("fine-tuned", a transliteration) to '미세 조정된 모델' ("fine-tuned model", the native Korean term) to follow the established translation glossary for 'finetuned models' terminology.

* 🚨🚨  [generate] ignore `cache_implementation="hybrid"` hub defaults (#40135)

* working?

* fix tests

* 🌐 [i18n-KO] Translated `gpt2.md` to Korean (#39808)

* docs: ko: bamba.md

* feat: nmt draft

* fix: manual edits

* docs: ko: gpt2.md

* feat: nmt draft

* fix: manual edits

* Remove bamba.md from docs/source/ko/model_doc/

* Update _toctree.yml

* 🌐 [i18n-KO] Translated `optimizers.md` to Korean (#40011)

* docs: ko: optimizers.md

* feat: optimizers draft

* fix: manual edits

* docs: ko: update optimizers.md

* Update docs/source/ko/optimizers.md

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

* Update docs/source/ko/optimizers.md

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

* Update docs/source/ko/optimizers.md

Co-authored-by: Jaehyeon Shin <108786184+skwh54@users.noreply.github.com>

* docs: ko: final updates to optimizers and toctree

---------

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>
Co-authored-by: Jaehyeon Shin <108786184+skwh54@users.noreply.github.com>

* 🌐 [i18n-KO] Translated grounding-dino.md to Korean (#39861)

* docs: ko: grounding-dino.md

* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* docs: add AP explanation for better readability

---------

Co-authored-by: TaskerJang <bymyself103@naver.com>
Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* 🚨 Use lru_cache for sine pos embeddings MaskFormer (#40007)

* use lru_cache for sine pos embeddings maskformer

* fix calls to pos embed

* change maxsize to 1
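
The MaskFormer change above caches sine position embeddings with `lru_cache` and a `maxsize` of 1. A minimal stdlib sketch of that caching pattern, assuming a simplified 1D sinusoidal table (MaskFormer's real embedding is 2D and tensor-based); `sine_position_embedding` is a hypothetical name, not the merged function.

```python
import functools
import math


@functools.lru_cache(maxsize=1)
def sine_position_embedding(length: int, dim: int, temperature: float = 10000.0) -> tuple:
    """Return a (length x dim) sinusoidal table as nested tuples.

    Tuples are immutable, so every caller can safely share the single cached result.
    """
    if dim % 2:
        raise ValueError("dim must be even")
    table = []
    for pos in range(length):
        row = []
        for i in range(dim // 2):
            # Frequency decreases geometrically with the channel-pair index i.
            angle = pos / temperature ** (2 * i / dim)
            row.extend((math.sin(angle), math.cos(angle)))  # interleave sin/cos per pair
        table.append(tuple(row))
    return tuple(table)
```

With `maxsize=1` only the most recent arguments are kept, so the cache pays off when consecutive calls share a resolution; alternating between two shapes evicts the entry each time. Since the cached object is shared, callers must not modify it in place (the reason this sketch returns tuples).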

* 🌐 [i18n-KO] Translated `pipelines.md` to Korean (#39577)

* docs: ko: pipelines.md

* feat: gpt draft

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update _toctree.yml

* Update _toctree.yml

Revise translated document

* Update pipelines.md

Revise ToC

* Update pipelines.md

---------

Co-authored-by: xhaktm <tnwjd318@hs.ac.kr>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* gpt oss is important (#40139)

* Fix Janus (#40140)

fix

* Add Segment Anything 2 (SAM2) (#32317)

* initial comment

* test

* initial conversion for outline

* intermediate commit for configuration

* chore:init files for sam2

* adding arbitrary undefined config

* check

* add vision

* make style

* init sam2 base model

* Fix imports

* Linting

* chore:sam to sam2 classes

* Linting

* Add sam2 to models.__init__

* chore:match prompt encoder with sam2 code

* chore:prepare kwargs for mask decoder

* Add image/video predictors

* Add CUDA kernel

* Add output classes

* linting

* Add logging info

* tmp commit

* docs for sam2

* enable image processing

* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize

* enable promptencoder of sam2

* fix promptencoder

* Confirmed that PromptEncoder is exactly the same (be aware of the bfloat16 and float32 difference)

* Confirmed that ImageEncoder is exactly the same (be aware of the linting of init)

* Confirmed that MaskDecoder is exactly the same (TO DO: lint variable name)

* SamModel is now available (Need more chore for name)

* make fix-copies

* make style

* make CI happy

* Refactor VisionEncoder and PositionEmbedding

* TO DO : fix the image_embeddings and sparse_embeddings part

* pure image inference done

* reusable features fix and make style

* styling

* refactor memoryattention

* tmp

* tmp

* refactor memoryencoder
TO DO : convert and inference the video pipeline

* TO DO : fix the image_encoder shape

* conversion finish
TO DO: need to check video inference

* make style

* remove video model

* lint

* change

* python utils/check_docstrings.py --check_all

* python utils/check_config_attributes.py

* remove copies for sam2promptencoder due to configuration

* change __init__.py

* remove tensorflow version

* fix that to not use direct comparison

* make style

* add missing import

* fix image_embedding_size

* refactor Sam2 Attention

* add fully working video inference (refactoring todo)

* clarify _prepare_memory_conditioned_features

* simplify modeling code, remove unused paths

* use one model

* use auto_docstring

* refactor rope embeddings

* nit

* not using multimask when several points given

* add all sam2.1

* add video tmp

* add Sam2VideoSessionState + fast image proc + video proc

* remove init_states from model

* fix batch inference

* add image integration tests

* uniformize modeling code with other sam models and use modular

* pass vision tests and most model tests

* All tests passing

* add offloading inference state and video to cpu

* fix inference from image embedding and existing mask

* fix multi_boxes mask inference

* Fix batch images + batch boxes inference

* improve processing for image inference

* add support for mask generation pipeline

* add support for get_connected_components post processing in mask generation

* add fast image processor sam, image processor tests and use modular for sam2 image processor

* fix mistake in sam after #39120

* fix init weights

* refactor convert

* add integration tests for video + other improvements

* add needed missing docstrings

* Improve docstrings and

* improve inference speed by avoiding cuda sync

* add test

* skip test for vision_model

* minor fix for vision_model

* fix vision_model by adding sam2model and change the torch dependencies

* remove patch_size

* remove image_embedding_size

* fix patch_size

* fix test

* make style

* Separate hieradet and vision encoder in sam2

* fixup

* review changes part 1

* remove MemoryEncoderConfig and MemoryAttentionConfig

* pass q_stride instead of q_pool module

* add inference on streamed videos

* explicitly process streamed frames

* nit

* Improve docstrings in Sam2Model

* update sam2 modeling with better management of inference state and cache, and separate Sam2Model and Sam2VideoModel

* improve video inference api

* change inference_state to inference_session

* use modular for Sam2Model

* fix convert sam2 hf

* modular

* Update src/transformers/models/sam2/video_processing_sam2.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix minor config

* fix attention loading error

* update modeling tests to use hub checkpoints

* Use CI A10 runner for integration tests values + higher tolerance for video integration tests

* PR review part 1

* fix doc

* nit improvements

* enforce one input format for points, labels and boxes

* nit

* last few nits from PR review

* fix style

* fix the input type

* fix docs

* add sam2 model as conversion script

* improve sam2 doc

* nit fixes + optimization

* split sam2 and sam2_video in two models

* PR review part 1

* fix None for default slow processor of sam2

* remove unnecessary code path in sam2_video

* refactor/simplify RoPE

* replace embedding module list with embedding matrix

* fix tests

* remove kernel

* nit

* use lru_cache for sine_pos_embeddings

* reorder sam2_video methods

* simplify sam2_video

* PR review part 1

* simplify sam2 video a lot

* more simplification

* update integration tests with updated conftest

* more explicit config for hieradet

* do post_processing outside of sam2 video model

* Improve Sam2VideoVisionRotaryEmbedding

* fix tests

* update docs and fix mask2former/oneformer

* avoid unnecessary reshapes/permute

* fix device concatenating points

* small dtype fix

* PR review

* nit

* fix style and finish up doc

* fix style

* fix docstrings

* fix modular

---------

Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* [docs] Fix ko toctree (#40138)

Update _toctree.yml

* Remove an old badly designed test (#40142)

remove it

* updated visualBERT modelcard (#40057)

* updated visualBERT modelcard

* fix: Review for VisualBERT card

* 🌐 [i18n-KO] Translated `gemma3.md` to Korean (#39865)

* docs: ko: gemma3.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>

* fix: resolve suggestions

---------

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>

* Fix quantized cache with only cache_implementation in generate (#40144)

* fix args

* comment

* Add pytest marker: `torch_compile_test` and `torch_export_test` (#39950)

* new marker

* trigger CI

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update Dockerfiles to install packages inside a virtual environment (#39098)

* Removed unnecessary virtual environment creation in Dockerfiles.

* Updated Dockerfiles to install packages in a virtual environment.

* use venv's python

* update

* build and trigger

* trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Create self-scheduled-amd-mi355-caller.yml (#40134)

* [Cohere2Vision] remove unused arg (#40103)

* remove unused arg

* remove the arg from test as well

* [efficientloftr] fix bugs and follow original cross attn implementation strictly (#40141)

* fix: changed is_causal to be False

* fix: Added original cross attention bug

* fix: fixed the way border removal is computed

* fix: added missing normalization on coarse features

* test: fixed integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix CI: Use correct import in SAM for torchvision InterpolationMode (#40160)

fix ci

* [Continuous Batching] set head_dim when config.head_dim is None (#40159)

* set head_dim when config.head_dim is None

* use model's actual TP setting
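
The continuous-batching fix above sets head_dim when config.head_dim is None. A minimal sketch of that fallback, assuming the common convention `head_dim = hidden_size // num_attention_heads`; `ToyConfig` and `resolve_head_dim` are hypothetical names, not the merged code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    # Hypothetical stand-in for a model config; field names follow common transformers configs.
    hidden_size: int
    num_attention_heads: int
    head_dim: Optional[int] = None


def resolve_head_dim(config: ToyConfig) -> int:
    """Use config.head_dim when set, otherwise split hidden_size evenly across heads."""
    head_dim = getattr(config, "head_dim", None)
    if head_dim is None:
        if config.hidden_size % config.num_attention_heads:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        head_dim = config.hidden_size // config.num_attention_heads
    return head_dim
```

An explicit head_dim always wins, which matters for models whose head size is not simply hidden_size divided by the head count.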

* Replace `self.tokenizer` by `self.processing_class` (#40119)

* [FA2] Fix it finally - revert fa kwargs preparation (#40161)

revert

* [bugfix] fix flash-attention2 unavailable error for Ascend NPU (#40151)

* [bugfix] fix flash-attention2 unavailable error for Ascend NPU

* remove redundant apply_rotary_emb usage

* fix ruff check error

* pad_input and unpad_input use same implementation as fa2

* rollback redundant codes

* fix ruff check error

* optimize fa2 judgement logic

* Fix docs typo (#40167)

* DINOv3 model

* working version

* linter revert

* linter revert

* linter revert

* fix init

* remove flex and add convert to hf script

* DINOv3 convnext

* working version of convnext

* adding to auto

* Dinov3 -> DINOv3

* PR feedback

* complete convert checkpoint

* fix assertion

* bf16 -> fp32

* add fast image processor

* fixup

* change conversion script

* Use Pixtral attention

* minor renaming

* simplify intermediates capturing

* refactor DINOv3ViTPatchEmbeddings

* Refactor DINOv3ViTEmbeddings

* [WIP] rope: remove unused params

* [WIP] rope: rename period -> inv_freq for consistency

* [WIP] rope: move augs

* change inv_freq init (not persistent anymore)

* [WIP] rope: move coords to init

* rope - done!

* use default LayerScale

* conversion: truncate expected outputs

* remove commented code

* Refactor MLP layers

* nit

* clean up config params

* nit docs

* simplify embeddings

* simplify compile compat lru_cache

* fixup

* dynamic patch coords

* move augmentation

* Fix docs

* fixup and type hints

* fix output capturing

* fix tests

* fixup

* fix auto mappings

* Add draft docs

* fix dtype cast issue

* add push to hub

* add image processor tests

* fixup

* add modular

* update modular

* convert and test convnext

* update conversion script

* update prefix

* Update LayerNorm

* refactor DINOv3ConvNextLayer

* rename

* refactor convnext model

* fix doc check

* fix docs

* fix convnext config

* tmp fix for check docstring

* remove unused arg

* fix tests

* (nit) change init

* standardize gated MLP

* clear namings and sat493m

* fix tensors on different devices

* revert linter

* pr

* pr feedback ruff format

* missing headers

* fix code snippet and collection link in docs

* DINOv3 description

* fix checkpoints in tests

* not doc fixes in configs

* output_hidden_states

* x -> features

* remove sequential

---------

Co-authored-by: Cijo Jose <cijose@meta.com>

* build: Add fast image processor tvp (#39529)

* build: add TvpImageProcessorFast

- Introduced TvpImageProcessorFast to enhance image processing capabilities.
- Updated image processing auto registration to include the new fast processor.
- Modified tests to accommodate both TvpImageProcessor and TvpImageProcessorFast, ensuring comprehensive coverage for both classes.

* fix: TvpImageProcessorFast with new resize method and update processing logic

* build: add TvpImageProcessorFast

* refactor: clean up whitespace and formatting in TvpImageProcessorFast and related tests

- Removed unnecessary whitespace and ensured consistent formatting in image_processing_tvp_fast.py.
- Updated import order in test_image_processing_tvp.py for clarity.
- Minor adjustments to maintain code readability and consistency.

* fix: Enhance TvpFastImageProcessorKwargs and update documentation

- Added TvpFastImageProcessorKwargs class to define valid kwargs for TvpImageProcessorFast.
- Updated the documentation in tvp.md to include the new class and its parameters.
- Refined the image processing logic in image_processing_tvp_fast.py for better handling of padding and resizing.
- Improved test cases in test_image_processing_tvp.py to ensure compatibility with the new processing logic and tensor inputs.

* fix: tested now with python 3.9

* fix: remove tvp kwargs from docs

* simplify processing

* remove import and fix tests

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>

* Add GptOssForSequenceClassification for GPT-OSS models (#40043)

* Add GptOssForSequenceClassification

* Tiny fix

* make fixup

* trigger CI rerun

* Check config type instead

---------

Co-authored-by: Yuefeng Zhan <yuefzh@microsoft.com>

* Standardize BARTpho model card: badges, new examples, fixed broken im… (#40051)

* Standardize BARTpho model card: badges, new examples, fixed broken image section, and links (#36979)

Update bartpho.md

* Update bartpho.md

Removed non-required/unsupported sections: Quantization, Attention visualizer, and Resources (plus stray tokenizer header).

Added code snippets which were suggested

* Update bartpho.md

Updated with necessary tags

* Update bartpho.md

* Update bartpho.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Add dates to the model docs (#39320)

* added dates to the models with a single hf papers link

* added the dates for models with multiple papers

* half of no_papers models done

* rest of no_papers models also done, only the exceptions left

* added copyright disclaimer to sam_hw, cohere, cohere2 + dates

* some more fixes, hf links + typo

* some new models + a rough script

* the script looks robust, changed all paper links to hf

* minor change to handle technical reports along with blogs

* ran make fixup to remove the white space

* refactor

* Pin torch to 2.7.1 on CircleCI for now (#40174)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update dynamic attnt setter for multimodals (#39908)

* update

* fix the test for DepthPro

* PR comments

* wait, I didn't delete this in prev commit?

* fix

* better way

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* [MINOR:TYPO] Update base.py (#40169)

* [MINOR:TYPO] Update base.py

All other occurrences in the docs use lowercase. (https://github.com/search?q=repo%3Ahuggingface%2Ftransformers%20translation_XX_to_YY&type=code)

Also, uppercase doesn't work: "translation_EN_to_FR" fails and instead returns: `ValueError: The task does not provide any default models for options ('EN', 'FR')`

It might be a good idea to allow for uppercase, but that's for another issue.

* [MINOR:TYPO] Update __init__.py
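The uppercase failure described above comes from splitting the task name into a language pair and looking that exact pair up in a table of defaults. A minimal sketch, assuming the task string is split on underscores into `translation`, source, `to`, target; the table `DEFAULT_TRANSLATION_MODELS`, its placeholder values and both helper functions are hypothetical and are not the pipeline's actual code:

```python
from typing import Optional, Tuple

# Hypothetical defaults table keyed on lowercase language pairs.
# Placeholder strings stand in for real default checkpoints.
DEFAULT_TRANSLATION_MODELS = {
    ("en", "fr"): "placeholder-en-fr",
    ("en", "de"): "placeholder-en-de",
}


def parse_translation_task(task: str) -> Optional[Tuple[str, str]]:
    """Split 'translation_XX_to_YY' into ('XX', 'YY'); return None otherwise."""
    tokens = task.split("_")
    if len(tokens) == 4 and tokens[0] == "translation" and tokens[2] == "to":
        return tokens[1], tokens[3]
    return None


def default_model_for(task: str) -> str:
    """Look up a default model for a translation task (no case normalization)."""
    options = parse_translation_task(task)
    if options is None:
        raise KeyError(f"Unknown task {task}")
    if options not in DEFAULT_TRANSLATION_MODELS:
        # The pair is used verbatim, so ('EN', 'FR') misses the lowercase key.
        raise ValueError(f"The task does not provide any default models for options {options}")
    return DEFAULT_TRANSLATION_MODELS[options]
```

Because the lookup key is the literal pair, `("EN", "FR")` misses a table keyed on lowercase codes, which reproduces the reported error message.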

* make model doc device agnostic (#40143)

* make model doc device agnostic

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update align.md

* Update aya_vision.md

* Update byt5.md

* refine

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update granitevision.md

* Update src/transformers/pytorch_utils.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add doc

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* 3 more

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix to avoid modifying a view in place (#40162)

* fix to avoid modifying a view in place

* add backward test in tensor parallel

* add test to test_modeling_gpt_oss.py

* linting

* Fix fsdp for generic-task models (#40191)

* remove abc inheritance

* add fast test

* Add repr to EncoderDecoderCache (#40195)

* add repr

* oups

* Fix typos (#40175)

Signed-off-by: cyy <cyyever@outlook.com>

* Remove _prepare_flash_attention_from_position_ids (#40069)

Signed-off-by: cyy <cyyever@outlook.com>

* Avoid CUDA stream sync (#40060)

Signed-off-by: cyy <cyyever@outlook.com>

* Fix various Pylint warnings (#40107)

Tidy code

Signed-off-by: cyy <cyyever@outlook.com>

* Update: add type hints to check_tokenizers.py (#40094)

* Update check_tokenizers.py

chore(typing): add type hints to check_tokenizers script

- Annotate params/returns for helper functions
- Keep tokenizer instances as `Any` to avoid runtime coupling
- Make `check_LTR_mark` return `bool` explicitly (no behavior change)

* Update check_tokenizers.py

chore(typing): replace Any with PreTrainedTokenizerBase in check_tokenizers.py

- Use transformers.tokenization_utils_base.PreTrainedTokenizerBase for `slow` and `fast` params
- Covers both PreTrainedTokenizer and PreTrainedTokenizerFast
- Exposes required methods (encode, decode, encode_plus, tokenize)
- Removes generic Any typing while staying implementation-agnostic

* Benchmarking improvements (#39768)

* Start revamping benchmarking

* Start refactoring benchmarking

* Use Pandas for CSV

* import fix

* Remove benchmark files

* Remove sample data

* Address review comments

* Add X-Codec model (#38248)

* add working x-codec

* nit

* fix styling + copies

* fix docstring

* fix docstring and config attribute

* Update args + config

* update conversion script

* update docs + cleanup

* Ruff fix

* fix docstrings

* Fix GPT-OSS `swiglu_limit` not passed in for MXFP4 (#40197)

Add swiglu_limit = 7.0

* docs: Update LayoutLM model card according to new standardized format (#40129)

* docs: Update LayoutLM model card with standardized format

* Apply suggestions from code review

This commit incorporates all suggestions provided in the recent review. Further changes will be committed separately to address remaining comments.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Address remaining review comments

* Address few more review comments:
1. remove transformer-cli section
2. put resources after notes
3. change API refs to 2nd level header

* Update layoutlm.md

* Update layoutlm.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Revert "Pin torch to 2.7.1 on CircleCI for now" + Final fix for `too long with no output` (#40201)

* Revert "Pin torch to 2.7.1 on CircleCI for now (#40174)"

This reverts commit 31b6e6e1dac0d32f74ec5cd6b3c1868534ccd7b5.

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Use correct `model_input_names` for PixtralImageProcessor (#40226)

add image_sizes to model_input_names

* fix error vocab_size at Qwen2_5_VLForConditionalGeneration loss_function (#40130)

* fix error vocab_size at Qwen2_5_VLForConditionalGeneration loss_function

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* fix similar error in qwen2_vl and run make fix-copies

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* pass in kwargs for loss_func at qwen2_vl and qwen2_5_vl

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* Apply style fixes

---------

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* [SAM 2] Change checkpoints in docs and tests (#40213)

* change checkpoints in docs and tests

* add notebook

* Fix more typos (#40212)

Signed-off-by: cyy <cyyever@outlook.com>

* Fix ESM token_dropout crash when using inputs_embeds instead of input_ids (#40181)

* fix: Error after calling ESM model with input embeddings not input ids

* propagate changes to other models

* AMD scheduled CI ref env file (#40243)

* Reference env-file to be used in docker running the CI

* Disable MI300 CI for now

* Add Ovis2 model and processor implementation (#37088)

* Add Ovis2 model and processor implementation

* Apply style fixes

* Add unit tests for Ovis2 image processing and processor

* Refactor image processing functions for clarity and efficiency

* Add Ovis2 ImageProcessorFast

* Refactor Ovis2 code

* Refactor Ovis2 model components and update processor functionality

* Fix repo consistency issues for Ovis2: docstring, config cleanup

* Update Ovis2 model integration tests

* Update Ovis2 configuration and processing classes for improved documentation

* Remove duplicate entry for 'ovis2' in VLM_CLASS_NAMES

* Fix conflict

* Fix import order

* Update image processor class names

* Update Ovis2 model structure

* Refactor Ovis2 configuration

* Fix typos

* Refactor Ovis2 model classes and remove unused code

* Fix typos

* Refactor Ovis2 model initialization

* Fix typos

* Remove Ovis2 model mapping from MODEL_MAPPING_NAMES in modeling_auto.py

* Add license and update type hints

* Refactor token function and update docstring handling

* Add license

* Add Ovis2 model support and update documentation

* Refactor Ovis2 model structure and enhance multimodal capabilities

* Update Ovis2 weight mapping for consistency and clarity in key patterns

* Remove unused 'grids' parameter from Ovis2 model and Update processing logic to handle image grids more efficiently.

* Refactor Ovis2 model test structure to include Ovis2Model

* Add optional disable_grouping param to Ovis2ImageProcessorFast

* Refactor type hints in Ovis2 modules

* Add licensing information in Ovis2 modules and tests

* Refactor Ovis2 model by removing unused methods

* Refactor Ovis2 model tests by renaming test classes and removing skipped tests

* Refactor Ovis2 model output classes

* Refactor Ovis2 weight conversion and Update model embedding classes

* Refactor Ovis2 model imports and remove unused functions

* Enhance vision configuration extraction in Ovis2 weight conversion

* Refactor Ovis2 model's forward method to remove interpolation option

* Update Ovis2 model documentation

* Refactor Ovis2 model input handling and tokenizer configuration

* Update return type hints in Ovis2 model

* Remove commented-out code

* fix config for tests and remove key mappings

* Update tokenizer configuration to use add_special_tokens method

* skip torchscript

* Fix image placeholder generation in Ovis2Processor

* Refactor Ovis2 model to rename visual_table to visual_embeddings_table

* Enhance Ovis2 model by adding vision_feature_select_strategy parameter

* Refactor Ovis2 model weights conversion and architecture

* Refactor Ovis2 model by removing vision_feature_select_strategy parameter

* Update Ovis2 model examples

* Refactor Ovis2 model

* Update Ovis2 model

* Update Ovis2 model configuration

* Refactor Ovis2 model test setup

* Refactor flash attention support

* Refactor

* Fix typo

* Refactor

* Refactor model classes

* Update expected output in Ovis2

* Refactor docstrings

* Fix

* Fix

* Fix

* Update input in tests

* Fix

* Fix get_decoder method

* Refactor

* Refactor Ovis2

* Fix

* Fix

* Fix test

* Add get_placeholder_mask

* Refactor Ovis2 model tests

* Fix

* Refactor

* Fix

* Fix

* Fix Ovis2 test

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Fix more pylint warnings (#40204)

Fix pylint warnings

Signed-off-by: cyy <cyyever@outlook.com>

* 🚨 Always return Cache objects in modelings (to align with generate) (#39765)

* watch the world burn

* fix models, pipelines

* make the error a warning

* remove kwargs and return_legacy_cache

* fix reformer

* remove transpose_for_scores call in ESM-2 (#40210)

* remove transpose_for_scores call

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

* fix copied evolla code

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

---------

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

* Add `chat_template` (`jinja2`) as an extra dependency (#40128)

* add jinja2 as a dependency

* Make jinja2 a core dependency in install_requires

- Add jinja2 to install_requires list in setup.py for automatic installation
- Add jinja2 to runtime version checks in dependency_versions_check.py
- Resolves issue where pip install transformers doesn't install jinja2

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Make jinja2 a core dependency in install_requires

* Make jinja2 an extra dependency instead of adding a core dep

---------

Co-authored-by: Claude <noreply@anthropic.com>

* [typing] fix type annotation error in DepthPro model image processor (#40238)

* fix type annotation error in DepthPro model image processor

* fix

* run make fix-copies

* [serve] guard imports (#39825)

guard imports

* [`CI`] Fix repo consistency (#40249)

* fix

* doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Fixes for EncoderDecoderCache (#40008)

* Add expectation to t5 for rocm 9.4

* Made EncoderDecoderCache compatible with nn.DataParallel

* Fixed t5gemma EncoderDecoderCache

* Added todos in autoformer

* Ruff

* Init is self-contained

* Review compliance

* Fixed kwargs init of EncoderDecoderCache

* fix: Catch correct ConnectionError for additional_chat_templates (#39874)

* fix: Catch correct ConnectionError for additional_chat_templates

* fix: don't catch timeout

* fix: formatting

* Model card for NLLB (#40074)

* initializing branch and draft PR

* updated model card .md file

* minor

* minor

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* resolving comments + adding visuals

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* NllbTokenizerFast and NllbTokenizer added

* endline

* minor

* Update nllb.md

---------

Co-authored-by: Sahil Kabir <sahilkabir@Sahils-MacBook-Pro.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Correct typo and update notes in docs Readme (#40234)

* Correct typo and update notes in docs readme

* Update docs/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix benchmark workflow (#40254)

Correct init_db.sql path

Co-authored-by: Akos Hadnagy <akoshuggingface@mi325x8-123.atl1.do.cpe.ice.amd.com>

* docs: Update OLMo model card (#40233)

* Updated OLMo model card

* Update OLMo description

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix typo

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix cli typo

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix cli example

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Add bitsandbytes info

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Skip broken tests (#40157)

skip these tests

* Remove MI300 CI (#40270)

Remove MI300 CI (in history if we need it back)

* set inputs_embeds to None while generate to avoid audio encoder forward in generation process (#40248)

* set inputs_embeds to None while generate to avoid audio encoder forward in generation process

* set input_features to none instead

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>

* [detection] fix attention mask for RT-DETR-based models (#40269)

* Fix get_contrastive_denoising_training_group attention

* Add bool attention_mask conversion

* Fix slow static cache export tests (#40261)

* 🚨🚨 Switch default compilation to fullgraph=False (#40137)

* switch default

* docstring

* docstring

* rework tests and remove outdated restrictions

* simplify

* we need a check for static cache

* fix

* rename var

* fix

* revert

* style

* rename test

* Fix setting attention for multimodal models (#39984)

* fix

* use non-explicit `None`

* keep previously set attn if exists

* [detection] fix correct `k_proj` weight and bias slicing in D-FINE (#40257)

Fix: correct k_proj weight and bias conversion in D-FINE

* Add Kosmos-2.5 (#31711)

Add Microsoft Kosmos-2.5

---------

Co-authored-by: kirp@umich.edu <tic-top>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Skipping pytree registration in case fsdp is enabled (#40075)

* Skipping pytree registration in case fsdp is enabled

* Beauty changes

* Beauty changes

* Moved the is_fsdp_available function to import utils

* Moved is_fsdp_available to integrations.fsdp

* Skipping pytree registration in case fsdp is enabled

* Beauty changes

* Beauty changes

* Moved the is_fsdp_available function to import utils

* Moved is_fsdp_available to integrations.fsdp

* Added pytree registration inside dynamic cache class

* Making ci/cd lords happy

* Adding a check if DynamicCache is already a leaf

* Adding try/catch for multiple initializations of DynamicCache in test suites

* Moving dynamic cache pytree registration to executorch

* Adding try catch back

* Update image_processing_perception_lm_fast.py to allow for proper override of vision_input_type (#40252)

* Update image_processing_perception_lm_fast.py

Allow for a proper override of vision_input_type in hf fast image processor, otherwise we need to resort to manually setting the attribute.

* Update processing_perception_lm.py to match kwargs vision input type

* Update image_processing_perception_lm_fast.py kwargs to signature args

* fix which routing method (#40283)

* Fix chat CLI GPU loading and request_id validation issues (#40230) (#40232)

* Fix chat CLI GPU loading and request_id validation issues (#40230)

This commit addresses two critical bugs in the transformers chat CLI:

1. **GPU Loading Issue**: Changed default device from "cpu" to "auto" in ChatArguments
   - Chat CLI now automatically uses GPU when available instead of defaulting to CPU
   - Matches the behavior of the underlying serving infrastructure

2. **Request ID Validation Error**: Added request_id field to TransformersCompletionCreateParamsStreaming schema
   - Fixes "Unexpected keys in the request: {'request_id'}" error on second message
   - Allows request_id to be properly sent and validated by the server

Both fixes target the exact root causes identified in issue #40230:
- Users will now get GPU acceleration by default when available
- Chat sessions will no longer break after the second message

* Remove unrelated request_id field from TransformersCompletionCreateParamsStreaming

* docs(layoutlm): add missing `id=usage` to `<hfoptions>` tag in LayoutLM model card (#40273)

docs(layoutlm): add missing 'id=usage' to <hfoptions> tag in LayoutLM model card

* Standardize RAG model card (#40222)

* Standardize RAG model card

Update rag.md to follow the new Hugging Face model card template:
- Added friendly overview in plain language
- Added pipeline and AutoModel usage examples
- Included quantization example with BitsAndBytesConfig
- Added notes and resources sections
- Removed abstract and FlashAttention badge

* Standardize RAG model card

Update rag.md to follow the new Hugging Face model card template:
- Added friendly overview in plain language
- Added AutoModel usage example
- Included quantization example with BitsAndBytesConfig

* docs: Update TrOCR model card to new format (#40240)

* docs: Update TrOCR model card to new format

* Updated suggestions

* Update model card for gpt neox japanese (#39862)

* Update GPT-NeoX-Japanese model card

* Apply suggestions from code review

* Update gpt_neox_japanese.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* SmolVLM and InternVL: Ensure pixel values are converted to the correct dtype for fp16/bf16 (#40121)

* Ensure pixel values are converted to the correct dtype for fp16/bf16

* add to modular

* Standardize BertGeneration model card (#40250)

* Standardize BertGeneration model card: new format, usage examples, quantization

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply reviewer feedback: update code examples

* Add missing code example

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Adjust ROCm test output expectations (#40279)

Adjust ROCm output expectations

* SmolVLM test fixes (#40275)

* Fix SmolVLM tests

* Add the proper CUDA expectations as well

* Split A10 and A100 expectations

* Ruff

---------

Co-authored-by: Akos Hadnagy <akoshuggingface@mi325x8-123.atl1.do.cpe.ice.amd.com>

* make model docs device agnostic (2) (#40256)

* doc cont.

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* more models

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mixtral.md

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [3/3] make docs device agnostic, all en docs for existing models done (#40298)

docs to device agnostic cont.

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Add MetaCLIP 2 (#39826)

* First draft

* Make fixup

* Use eos_token_id

* Improve tests

* Update clip

* Make fixup

* Fix processor tests

* Add conversion script

* Update docs

* Update tokenization_auto

* Make fixup

* Use check_model_inputs

* Rename to lowercase

* Undo CLIP changes

* Address comment

* Convert all checkpoints

* Update auto files

* Rename checkpoints

* Allow to be able to run `torch.compile` tests with `fullgraph=True` (#40164)

* fix

* address comment

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`FA`] Fix dtype in varlen with position ids (#40295)

fix

* [docs] delete more TF/Flax docs (#40289)

* delete some TF docs

* update documentation checks to ignore tf/flax

* a few more removals

* nit

* Update utils/check_repo.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Clean up X-Codec. (#40271)

* Clean up xcodec addition.

* Clean up config.

* Switch to fixtures test.

* Small stuff.

* Remove OTel SDK dependencies (#40305)

* Fix GOT-OCR2 and Cohere2Vision image processor patches calculation (#40312)

fix got-ocr patches calculation

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* [`fix`] Pass adamw optimizer parameters to StableAdamW (#40184)

* fix: pass adamw optimizer parameters to StableAdamW

* add test for stable_adamw initialization with trainer arguments

* address copilot suggestion

* fix: update weight_decay handling in stable_adamw kwargs

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
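The StableAdamW fix above forwards the AdamW hyperparameters from the training arguments instead of leaving the optimizer on its own defaults. A minimal sketch of that mapping, assuming the optimizer accepts `lr`, `betas`, `eps` and `weight_decay` keywords as AdamW-style optimizers typically do; the field names mirror `TrainingArguments` (`learning_rate`, `adam_beta1`, `adam_beta2`, `adam_epsilon`, `weight_decay`), while the helper `stable_adamw_kwargs` is hypothetical and not the Trainer's actual code:

```python
from types import SimpleNamespace
from typing import Any, Dict


def stable_adamw_kwargs(args: Any) -> Dict[str, Any]:
    """Map AdamW-style training arguments onto optimizer keyword arguments.

    Hypothetical helper: it only shows which fields are forwarded.
    """
    return {
        "lr": args.learning_rate,
        "betas": (args.adam_beta1, args.adam_beta2),
        "eps": args.adam_epsilon,
        "weight_decay": args.weight_decay,
    }


# Stand-in for a TrainingArguments instance (only the fields used above).
example_args = SimpleNamespace(
    learning_rate=1e-3,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    weight_decay=0.01,
)
example_kwargs = stable_adamw_kwargs(example_args)
```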

* chore: fix typo in `find_executable_batch_size` to match new 0.9 ratio (#40206)
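The 0.9 ratio in the title refers to how much the batch size shrinks after each out-of-memory failure. A minimal sketch of that retry loop, assuming the batch size is multiplied by 0.9 and truncated on each failure until it reaches zero; the decorator `find_executable_batch_size_sketch` and the `SimulatedOOM` exception are hypothetical stand-ins, not Accelerate's `find_executable_batch_size` itself, which also inspects the exception to decide whether it is really an OOM:

```python
import functools


class SimulatedOOM(RuntimeError):
    """Hypothetical stand-in for a CUDA out-of-memory error."""


def find_executable_batch_size_sketch(starting_batch_size: int = 128, ratio: float = 0.9):
    """Retry the wrapped function with a smaller batch size after each simulated OOM."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            batch_size = starting_batch_size
            while True:
                if batch_size == 0:
                    raise RuntimeError("No executable batch size found, reached zero.")
                try:
                    return fn(batch_size, *args, **kwargs)
                except SimulatedOOM:
                    # Shrink by 10% per failure instead of a fixed halving.
                    batch_size = int(batch_size * ratio)

        return wrapper

    return decorator
```

Usage: decorating a training function that fails above 100 samples tries 128, 115, 103 and then succeeds at 92.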

* 🚨 [`Flash Attention`] Fix sliding window size (#40163)

* swa fix

* add comment, make fix symmetrical

* modify fa inference test to force swa correctness check

* fixup comment

* Remove unnecessary contiguous calls for modern torch (#40315)

* Add support for Florence-2 (#38188)

* init

* add modular

* fixup

* update configuration

* add processing file

* update auto files

* update

* update modular

* green setup_and_quality ci

* it works

* fix some tests

* commit florence2

* update test

* make test cases done - 16 left

* style

* fix few test cases

* fix some tests

* fix init test

* update florence2 vision style

* hope is green

* fix init test

* fix init

* update modular

* refactor vision module

* fix: channel attention use dynamic scale

* update modular

* update

* update attention mask

* update

* fix naming

* Update src/transformers/models/florence2/processing_florence2.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* spatial block works

* more beautiful

* more more beautiful

* merge main

* merge main and fixup

* fix typing hint

* update modeling

* fix eager matches sdpa

* fix style

* fix compile test - all green

* remove florence2 language

* remove Florence2LanguageModel things

* fix style

* update florence2 model

* override prepare encoder_decoder for generation

* add weight conversion script

* rewrite channel attention to use sdpa

* eliminate 1 transpose op

* support fa2

* fix quality check

* chore: reformat `test_modeling_florence2.py`

* some refactor for processor

* some refactor for processor

* update naming convention and remove BC

* make it pass the test

* fix: correct Embedding Cosine

* update comments and docstring

* support input_embeds

* support input embeds ideally

* fix style

* fix style

* fix style again :D

* add test processor

* refactor processor and add test for processor

* reformat test processor

* make fixup

* fix schema check

* remove image_token

* ensure image token in tokenizer and fix integration tests

* fix processor test

* add more integration tests for large model and rename test_processor to test_processing

* test_assisted_decoding_sample should pass

* update doc and make model work with image text to text pipeline

* docs: add SDPA badge

* resolve cyril's comments

* fix import torch error

* add helper get_placeholder_mask

* inherit from llava

* florence2 may not _supports_attention_backend because of bart ...

* move florence2 model card to multimodal

* let base model always return_dict

* fix style

* tiny update doc

* set _checkpoint_conversion_mapping = {}

* fix code quality

* support flex and compile graph and move external func to internal func

* remove condition because it is always true

* remove window funcs

* move post processor config out

* fix ci

* new intro to trigger test

* remove `kernel_size` argument

---------

Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Qwen2.5-Omni test fixes (#40307)

Updated expectations, and mp tests

* Add back `_tp_plan` attribute (#39944)

* Update modeling_utils.py

* make sure we update with the module's plan

* use public api

* oups

* update

* fix failing test

* Update src/transformers/integrations/tensor_parallel.py

* Update src/transformers/integrations/tensor_parallel.py

* fix

* make the API more friendly!

* fix tests

* fix styling

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* byebye torch 2.1 (#40317)

* Bump minimum torch version to 2.2

* Remove is_torch_greater_or_equal_than_2_2

* update versions table

* Deprecate is_torch_sdpa_available (except for backward compat), remove require_torch_sdpa

* No more `natten` (#40287)

get rid of natten

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`GPT OSS`] Refactor the tests as it was not properly checking the outputs (#40288)

* it was long due!

* use the official kernel

* more permissive

* update the kernel as well

* mmm should it be this?

* up pu

* fixup

* Update test_modeling_gpt_oss.py

* style

* start with 20b

* Update CI with nightly torch workflow file (#40306)

* fix nightly ci

* Apply suggestions from code review

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>

* Fix: Apply `get_placeholder_mask` in Ovis2 (#40280)

* Refactor special image mask

* Refactor get_placeholder_mask method

* Revert "Refactor special image mask"

This reverts commit 9eb1828ae930329656d6f323a510c5e6033e1f85.

* Fix

* Revert "Refactor get_placeholder_mask method"

This reverts commit 07aad6484bb08d6351d5b605e9db574d28edcd15.

* Update notification service amd_daily_ci_workflows definition (#40314)

* One cache class to rule them all (#40276)

* remove all classes

* fix generate

* start replacing everywhere

* finish removing everywhere

* typo

* typo

* fix

* typo

* remove num_layers=1

* CI

* fix all docstrings

* review

* style

* Fix chunked attention mask with left-padding (#40324)

* add fix

* add test

* raise proper warning for older versions

* fix

* fix and add 2nd test

* fix for flex and torch 2.5

* [docs] remove flax references from `/en/model_doc` (#40311)

* 1st commit

* all models up to D

* all models up to G

* all models up to M

* all remaining models

* Fix qwen-omni processor text only mode (#40336)

* Fix qwen-omni processor text only mode

* remove try except

---------

Co-authored-by: yuekaiz <yuekaiz@mgmt1-login.cm.cluster>

* Change Qwen2RMSNorm to RMSNorm from PyTorch (#40066)

* Unify Qwen2RMSNorm definitions and use RMSNorm from PyTorch

Signed-off-by: cyy <cyyever@outlook.com>

* subclass RMSNorm

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>

* Add DeepseekV3ForSequenceClassification for Deepseek V3 models (#40200)

* Add Sequence Classification Support for Deepseek v3 model DeepseekV3ForSequenceClassification

* After run make fixup

* Fix deprecation warning version (#40343)

fix

* Add missing arguments to class constructors (#40068)

* Add missing arguments

Signed-off-by: cyy <cyyever@outlook.com>

* Fix typos

Signed-off-by: cyy <cyyever@outlook.com>

* More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>

* [docs] remove TF references from `/en/model_doc` (#40344)

* models up to F

* models up to M

* all models

* Fix: Only call Trainer.align_special_tokens if model has "config" attribute (#40322)

* Only call Trainer.align_special_tokens if model has "config" attribute

* Add efficient test for training a model without model.config

* Reformat

* add type hints (#40319)

* add basic type hints to import module

* run make fixup

* remove optional

* fixes

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Fix an infinite loop bug in recursive search of relative imports (#40326)

Fix bug in recursive search of relative imports

* Fix links in Glm4vMoe configuration classes to point to the correct H… (#40310)

* Fix links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository

* run fixup to update links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository

* T5 test and target device fixes (#40313)

* Fix cache setup related issues

* Fix target-device-related issues

* Ruff

* Address review comments

* Update `test_spm_converter_bytefallback_warning` (#40284)

fff

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* (small) fix conditional for input_ids and input_embeds in marian (#40045)

* (small) fix conditional for input_ids and input_embeds in marian

* address comment

* Fix attention vizualizer (#40285)

* make visualizer rely on create causal mask

* format

* fixup

* fixup

* read token

* read token, duh

* what is up with that token

* small tests?

* adjust

* try with flush

* normalize for ANSI

* buffer shenanigans

* [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification (#35991)

* [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification

* fix the modular conversion

* Clean up XCodec and other codecs (#40348)

* Clean up xcodec addition.

* Clean up config.

* Switch to fixtures test.

* Small stuff.

* Polish XCodec and standardize across codecs.

* Update src/transformers/models/xcodec/modeling_xcodec.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Format and fix test.

* Update tol.

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* [serve] add cors warnings (#40112)

* add cors warnings

* Update src/transformers/commands/serving.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/commands/serving.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

* make fixup

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* [detection] use consistent dtype for Conditional and DAB DETR positional embeddings (#40300)

fix: use consistent dtype for sine positional embeddings

* Remove more PyTorch 2.2 compatible code (#40337)

Signed-off-by: cyy <cyyever@outlook.com>

* [`FA`] Fix some model tests (#40350)

* fix

* cleanup, revert aimv2 fa changes

* fix aria

* i searched a long time but the cross dependency is for the recent models so...

* this was something... evolla

* fix modernbert decoder + make fa test more robust

* nit

* Qwen2.5-VL test fixes for ROCm (#40308)

* [generate] handle support for cache classes when num enc layers != num dec layers (#40277)

* handle support for cache classes when num enc layers != num dec layers

* handle overwrites

* one more corner case

* Update src/transformers/generation/utils.py

* Update src/transformers/generation/utils.py

* Apply suggestions from code review

* handle corner case :o

* [4/N]more docs to device agnostic (#40355)

* more docs to device agnostic

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* more

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* 1

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* 2

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update vitpose.md

* Update camembert.md

* Update camembert.md

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* DOCS: Clarification on the use of `label_names` as an argument to TrainingArguments (#40353)

* Update trainer.md

* Update trainer.md

Removed the detail about label_names argument usage from the tip/warning section

* Update training_args.py

Added the label_names usage clarification in the docstring

* Update trainer.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* HunYuan opensource (#39606)

* merge opensource_hunyuan

* add head_dim

* fix assertion error

* fix seen_tokens

* ready_for_upstream (merge request !17)

Squash merge branch 'ready_for_upstream' into 'main'

* fix configuration type&docstring
* fix style

* ready_for_upstream (merge request !18)

Squash merge branch 'ready_for_upstream' into 'main'
* add doc
* fix testcode
* fix configuration type&docstring

* rename base model

* remove assert

* update

* remove tiktoken

* update

* fix moe and code style (#3)

* update

* fix format

* update

* revert makefile

* fix moe config

* fix numel()

* remove prepare_inputs_for_generation

* fix kv_seq_len

* add docs/toctree

* remove unused parameter & add licence

* add licence

* remove unused parameter

* fix code

* dense modular

update import

fix

fix

use mistralmodel

fix qknorm

add sliding_window

make style

fix

dense done

hunyuan moe

fix import

fix modular

fixup

fixup

* update model path

* fix mlp_bias

* fix modular

* Fix modeling (#5)
…
Guo-Chenxu added a commit to Guo-Chenxu/transformers that referenced this pull request Aug 27, 2025
* unpin `torchcodec==0.5.0` and use `torch 2.8` on daily CI (#40072)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* fix: resolve triton version check compatibility on windows (#39986)

* fix: resolve triton version check compatibility on windows

* style: remove trailing space

* fix: fix typo

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* [qwen-vl] fix beam search with videos (#39726)

* fix

* fix copies

* [gemma3] update conversion key mapping (#39778)

update conversion key mapping

* fix: move super().__init__ after vision_config init in Mistral3Config (#40063)

fix: move super().__init__ after vision_config init in Mistral3Config (#40062)

* Remove deprecated cache-related objects (#40035)

remove them

* guard on model.eval when using torch.compile + FSDP2 (#37413)

guard on model.eval

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Fix repo consistency (#40077)

fix

* added Textnet fast image processor (#39884)

* feat: add fast image processor implementation for TextNet model

* chore: override to_dict method to TextNetImageProcessorFast for slow processor compatibility tests

* chore: update init method

* chore: coding and style checks

* chore: fixed code quality issue

* chore: override resize to handle size_divisor, move all preprocessing logic to child class

* fix: autoImageProcessor issue for textnet

* chore: cleanup

* simplify resize

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>

* Fix `time_spent ` in `notification_service.py`. (#40081)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* chore: standardize DeBERTa model card (#37409)

* chore: standardize DeBERTa model card

* Apply suggestions from code review in docs

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: Update deberta.md with code cleanup suggestions

* Update docs/source/en/model_doc/deberta.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/deberta.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update deberta.md

* Update deberta.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [`GPT Big Code`] Fix attention scaling (#40041)

* fix

* update integration tests

* fmt

* add regression test

* feat: extract rev in attn_implementation kernels via @ (#40009)

* feat: extract rev in attn_implementation kernels via @

* fix: adjust for ruff

* fix: update regex and add explanatory comment

* fix: move attn_implementation kernel doc

* fix: remove extra line
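
The commit above (#40009) lets `attn_implementation` carry a kernel revision after an `@`, for example `"kernels-community/flash-attn@09eec95"`. The split it describes can be sketched as follows. This is a minimal illustration, not the regex Transformers actually uses. The function name `split_kernel_revision` and its pattern are hypothetical, and the assumption is that built-in names such as `"sdpa"` never contain a `/`.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern (not the library's): "<org>/<repo>" optionally followed by
# "@<revision>", where the revision is a branch, tag, or (short) commit SHA.
_KERNEL_REPO_RE = re.compile(r"^(?P<repo_id>[^/@\s]+/[^/@\s]+)(?:@(?P<revision>[^@\s]+))?$")


def split_kernel_revision(attn_implementation: str) -> Optional[Tuple[str, Optional[str]]]:
    """Split 'org/repo@rev' into (repo_id, revision).

    Returns (repo_id, None) when no revision is given, and None when the
    string does not look like a Hub kernel repo (e.g. "sdpa" or "eager").
    """
    match = _KERNEL_REPO_RE.match(attn_implementation)
    if match is None:
        return None
    return match.group("repo_id"), match.group("revision")


if __name__ == "__main__":
    for value in (
        "kernels-community/flash-attn@main",
        "kernels-community/flash-attn@09eec95",
        "kernels-community/flash-attn",
        "sdpa",
    ):
        print(value, "->", split_kernel_revision(value))
```

Splitting on the `@` keeps the repo id and the revision independent. The revision can then be handed to whatever loader fetches the kernel from the Hub (presumably the `kernels` package's revision argument). A missing revision is left as `None` so that the loader's own default applies.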

* Update notification service MI325 (#40078)

add mi325 to amd_daily_ci_workflows

* Fix PerceptionLM image preprocessing for non-tiled image input. (#40006)

* Fix PerceptionLM image preprocessing for non-tiled image input.

* Add test for single tile vanilla image processing.

* ruff format

* recover missing test skip

* Simplify test.

* minor test name fix

* Revert FA2 kwargs construction (#40029)

* revert

* use imports

* went way too high in imports level

* style

* [fix] batch inference for llava_onevision (#40021)

* [fix] llava onevision batch inference

* style

* cannot pass inconsistent list & handle text-only case

* [docs] Zero Shot Object Detection Task (#40096)

* refactor zsod task docs

* keeping the image guided od section

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update docs/source/en/tasks/zero_shot_object_detection.md

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

* Update Glm4V processor and add tests (#39988)

* update GLm4V and add tests

* Update tests/models/glm4v/test_processor_glm4v.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* remove min/max pixels for BC

* fix video tests

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Add glm4.5&&glm4.5V doc (#40095)

* Docs: GLM-4-MoE & GLM-4V-MoE pages

* Docs: polish GLM-4V-MoE intro, remove placeholders; pin image

* Docs

---------

Co-authored-by: wujiahan <lambert@gmail.com>

* Causal loss for `ForConditionalGeneration` (#39973)

* feat: add ForConditionalGeneration loss to LOSS_MAPPING

* consistent spelling of "recognized"

* Audio encodings now match conv2d weight dtype in Gemma3nAudioSSCPConvBlock (#39743)

audio encodings now match conv weight dtype in Gemma3nAudioSSCPConvBlock

* New DynamicSlidingWindowLayer & associated Cache (#40039)

* start adding the layer

* style

* improve

* modular

* fix

* fix

* improve

* generate integration

* comment

* remove old one

* remove

* fix

* fix

* fix

* fix all recompiles

* fix

* doc

* fix

* add text config check

* fix encoderdecoder cache

* add it for all models with sliding/hybrid support

* revert

* start fixing

* prophetnet

* fsmt

* fix ddp_data

* add test for mistral

* improve mistral test and add gemma2 test

* docstrings

* Enable SIM rules (#39806)

* Enable SIM rules

Signed-off-by: cyy <cyyever@outlook.com>

* More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>

* feat: add `is_fast` to ImageProcessor (#39603)

* feat: add `is_fast` to ImageProcessor

* Update test_image_processing_common.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* feat: add missing BaseImageProcessorFast import

* fix: `issubclass` for discriminating subclass of BaseImageProcessorFast

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* Re-apply make style (#40106)

make style

* Replace `logger.warning` with `logger.warning_once` in `GradientCheckpointingLayer` (#40091)

* Fix regression in mllama vision encoder (#40083)

fix mllama vision encoder

Signed-off-by: Isotr0py <2037008807@qq.com>

* Switch the order of args in StaticCache (for BC and future logic) (#40100)

* switch order for BC and future logic

* in generate as well

* Fix Qwen3 MoE GGUF architecture mismatch (#39976)

* fix qwen3moe gguf architecture

* Fix Qwen3Moe GGUF loading

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Jinuk Kim <jusjinuk@snu.ac.kr>

* Fix error on importing unavailable torch.distributed (#40038)

Currently model_debugging_utils.py would have an unguarded `import torch.distributed.tensor`. This PR ensures that the distributed module is available before including its tensor module.

* Default to dequantize if cpu in device_map for mxfp4 (#39993)

* default to dq if cpu

* an other check

* style

* revert some changes

* [`Flash Attention`] Fix flash attention integration (#40002)

* fix flash attention

* i got a stroke reading that comment

* change dropout kwarg back to before

* rename _fa3... as it's used for multiple variants and should work as fallback instead

* simplify imports and support kwargs for fa

* style

* fix comments order

* small fix

* skip kernels test (causes cuda illegal memories w/o cleanup), fix fa test in general esp for models like bart

* style

* allow fullgraph by preloading on init

* make globals "private"

* ci pls be happy

* change skip conditions based on backend flag (indicating missing mask interface)

* move globals support to a function to prepare kwargs

* style

* generalize supported kwargs

* small change to doc

* fix

* add comments

* style

* revert prep during generate

* style

* revert weird style changes

* add fa kwarg prep during generate with fixes back

* how did this even happen

* how

* add comment

* [trainer] ensure special tokens in model configs are aligned with tokenizer at train time (#38441)

* tmp commit

* add test

* make fixup

* reset warns/info in test

* Fix Causality Handling in Flash Attention to Support Bidirectional Attention (#39707)

Fix the is_causal logic to enable bidirectional attention

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* [docs] Add reference to HF-maintained `custom_generate` collections (#39894)

decoding -> generation; add collections

* Add model card for MobileViT (#40033)

* Add model card for MobileViT

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* remove sequence parallel in llama4 (#40084)

* 🌐 [i18n-KO] Translated `tiny_agents.md` to Korean (#39913)

* docs: ko: tiny_agents.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits

* [bugfix] Fix tensor device in Idefics2, Idefics3, and SmolVLM (#39975)

* [bugfix] ensure correct tensor device in Idefics2, Idefics3, and SmolVLM models

* to cuda

* changed xLSTMRMSNorm to RMSNorm (#40113)

* changed xLSTMRMS.. to RMS...

* fix linter error

---------

Co-authored-by: Nikita <nikita@Nikitas-MacBook-Pro.local>

* Fix QuantoQuantizedCache import issues (#40109)

* fix quantoquantized

* [serve] allow array `content` inputs for LLMs (#39829)

fix bug; add tests

* `decoding_method` argument in generate (#40085)

* factor out expand inputs

* callable arg

* improve docs, add test

* Update docs/source/en/generation_strategies.md

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Collated reports (#40080)

* Add initial collated reports script and job definition

* provide commit hash for this run. Also use hash in generated artifact name. Json formatting

* tidy

* Add option to upload collated reports to hf hub

* Add glob pattern for test report folders

* Fix glob

* Use machine_type as path filter instead of glob. Include machine_type in collated report

* DOCS: Add missing space in SECURITY.md (#40087)

* [trainer] handle case where EOS token is None in `generation_config` (#40127)

* handle case where EOS token is None in gen config

* update eli5 dataset

* Fix hidden torchvision>=0.15 dependency issue (#39928)

* use pil_torch_interpolation_mapping for NEAREST/NEAREST_EXACT

* fix min torchvision version

* use InterpolationMode directly

* remove unused is_torchvision_greater_or_equal

* nit

* 🌐 [i18n-KO] Translated `main_classes/processors.md` to Korean (#39519)

* docs: ko: processors.md

* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/main_classes/processors.md

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Update docs/source/ko/main_classes/processors.md

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

---------

Co-authored-by: TaskerJang <bymyself103@naver.com>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* 🌐 [i18n-KO] Translated `jamba.md` to Korean (#39890)

* docs: ko: jamba.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestion

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

---------

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `main_classes/optimizer_schedules.md` to Korean (#39713)

* docs: ko: main_classes/optimizer_schedules

* feat: nmt draft

* fix: improve TOC anchors and expressions in optimizer_schedules

- Add TOC anchors to all section headers
- Fix terminology and improve Korean expressions

* fix: Correct translation of 'weight decay fixed' to '가중치 감쇠가 적용된'

Changed '가중치 감쇠가 수정된' to '가중치 감쇠가 적용된' for more accurate translation of 'weight decay fixed' in the context of optimization.

* fix: Use more natural Korean inheritance expression

Changed '에서 상속받는' to '을 상속받는' to follow natural Korean grammar patterns for inheritance terminology.

* fix: Use consistent '미세 조정' translation for 'finetuned models'

Changed '파인튜닝된' to '미세 조정된 모델' to follow the established translation glossary for 'finetuned models' terminology.

* 🚨🚨  [generate] ignore `cache_implementation="hybrid"` hub defaults (#40135)

* working?

* fix tests

* 🌐 [i18n-KO] Translated `gpt2.md` to Korean (#39808)

* docs: ko: bamba.md

* feat: nmt draft

* fix: manual edits

* docs: ko: gpt2.md

* feat: nmt draft

* fix: manual edits

* Remove bamba.md from docs/source/ko/model_doc/

* Update _toctree.yml

* 🌐 [i18n-KO] Translated `optimizers.md` to Korean (#40011)

* docs: ko: optimizers.md

* feat: optimizers draft

* fix: manual edits

* docs: ko: update optimizers.md

* Update docs/source/ko/optimizers.md

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

* Update docs/source/ko/optimizers.md

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

* Update docs/source/ko/optimizers.md

Co-authored-by: Jaehyeon Shin <108786184+skwh54@users.noreply.github.com>

* docs: ko: final updates to optimizers and toctree

---------

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>
Co-authored-by: Jaehyeon Shin <108786184+skwh54@users.noreply.github.com>

* 🌐 [i18n-KO] Translated grounding-dino.md to Korean (#39861)

* docs: ko: grounding-dino.md

* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* docs: add AP explanation for better readability

---------

Co-authored-by: TaskerJang <bymyself103@naver.com>
Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* 🚨 Use lru_cache for sine pos embeddings MaskFormer (#40007)

* use lru_cache for sine pos embeddings maskformer

* fix calls to pos embed

* change maxsize to 1

* 🌐 [i18n-KO] Translated `pipelines.md` to Korean (#39577)

* docs: ko: pipelines.md

* feat: gpt draft

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update _toctree.yml

* Update _toctree.yml

Fix translated document

* Update pipelines.md

Fix ToC

* Update pipelines.md

---------

Co-authored-by: xhaktm <tnwjd318@hs.ac.kr>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* gpt oss is important (#40139)

* Fix Janus (#40140)

fix

* Add Segment Anything 2 (SAM2) (#32317)

* initial comment

* test

* initial conversion for outline

* intermediate commit for configuration

* chore:init files for sam2

* adding arbitrary undefined config

* check

* add vision

* make style

* init sam2 base model

* Fix imports

* Linting

* chore:sam to sam2 classes

* Linting

* Add sam2 to models.__init__

* chore:match prompt encoder with sam2 code

* chore:prepare kwargs for mask decoder

* Add image/video predictors

* Add CUDA kernel

* Add output classes

* linting

* Add logging info

* tmp commit

* docs for sam2

* enable image processing

* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize

* enable promptencoder of sam2

* fix promptencoder

* Confirmed that PromptEncoder is exactly the same (be aware of the bfloat16 vs. float32 difference)

* Confirmed that ImageEncoder is exactly the same (be aware of the linting of init)

* Confirmed that MaskDecoder is exactly the same (TO DO: lint variable name)

* SamModel is now available (Need more chore for name)

* make fix-copies

* make style

* make CI happy

* Refactor VisionEncoder and PositionEmbedding

* TO DO : fix the image_embeddings and sparse_embeddings part

* pure image inference done

* reusable features fix and make style

* styling

* refactor memoryattention

* tmp

* tmp

* refactor memoryencoder
TO DO : convert and inference the video pipeline

* TO DO : fix the image_encoder shape

* conversion finish
TO DO: need to check video inference

* make style

* remove video model

* lint

* change

* python utils/check_docstrings.py --check_all

* python utils/check_config_attributes.py

* remove copies for sam2promptencoder due to configuration

* change __init__.py

* remove tensorflow version

* fix that to not use direct comparison

* make style

* add missing import

* fix image_embedding_size

* refactor Sam2 Attention

* add fully working video inference (refactoring todo)

* clarify _prepare_memory_conditioned_features

* simplify modeling code, remove unused paths

* use one model

* use auto_docstring

* refactor rope embeddings

* nit

* not using multimask when several points given

* add all sam2.1

* add video tmp

* add Sam2VideoSessionState + fast image proc + video proc

* remove init_states from model

* fix batch inference

* add image integration tests

* uniformize modeling code with other sam models and use modular

* pass vision tests and most model tests

* All tests passing

* add offloading inference state and video to cpu

* fix inference from image embedding and existing mask

* fix multi_boxes mask inference

* Fix batch images + batch boxes inference

* improve processing for image inference

* add support for mask generation pipeline

* add support for get_connected_components post processing in mask generation

* add fast image processor sam, image processor tests and use modular for sam2 image processor

* fix mistake in sam after #39120

* fix init weights

* refactor convert

* add integration tests for video + other improvements

* add needed missing docstrings

* Improve docstrings and

* improve inference speed by avoiding cuda sync

* add test

* skip test for vision_model

* minor fix for vision_model

* fix vision_model by adding sam2model and change the torch dependencies

* remove patch_size

* remove image_embedding_size

* fix patch_size

* fix test

* make style

* Separate hieradet and vision encoder in sam2

* fixup

* review changes part 1

* remove MemoryEncoderConfig and MemoryAttentionConfig

* pass q_stride instead of q_pool module

* add inference on streamed videos

* explicitly process streamed frames

* nit

* Improve docstrings in Sam2Model

* update sam2 modeling with better management of inference state and cache, and separate Sam2Model and Sam2VideoModel

* improve video inference api

* change inference_state to inference_session

* use modular for Sam2Model

* fix convert sam2 hf

* modular

* Update src/transformers/models/sam2/video_processing_sam2.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix minor config

* fix attention loading error

* update modeling tests to use hub checkpoints

* Use CI A10 runner for integration tests values + higher tolerance for video integration tests

* PR review part 1

* fix doc

* nit improvements

* enforce one input format for points, labels and boxes

* nit

* last few nits from PR review

* fix style

* fix the input type

* fix docs

* add sam2 model as conversion script

* improve sam2 doc

* nit fixes + optimization

* split sam2 and sam2_video in two models

* PR review part 1

* fix None for default slow processor of sam2

* remove unnecessary code path in sam2_video

* refactor/simplify RoPE

* replace embedding module list with embedding matrix

* fix tests

* remove kernel

* nit

* use lru_cache for sine_pos_embeddings

* reorder sam2_video methods

* simplify sam2_video

* PR review part 1

* simplify sam2 video a lot

* more simplification

* update integration tests with updated conftest

* more explicit config for hieradet

* do post_processing outside of sam2 video model

* Improve Sam2VideoVisionRotaryEmbedding

* fix tests

* update docs and fix mask2former/oneformer

* avoid unnecessary reshapes/permute

* fix device concatenating points

* small dtype fix

* PR review

* nit

* fix style and finish up doc

* fix style

* fix docstrings

* fix modular

---------

Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* [docs] Fix ko toctree (#40138)

Update _toctree.yml

* Remove an old badly designed test (#40142)

remove it

* updated visualBERT modelcard (#40057)

* updated visualBERT modelcard

* fix: Review for VisualBERT card

* 🌐 [i18n-KO] Translated `gemma3.md` to Korean (#39865)

* docs: ko: gemma3.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>

* fix: resolve suggestions

---------

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>

* Fix quantized cache with only cache_implementation in generate (#40144)

* fix args

* comment

* Add pytest marker: `torch_compile_test` and `torch_export_test` (#39950)

* new marker

* trigger CI

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update Dockerfiles to install packages inside a virtual environment (#39098)

* Removed unnecessary virtual environment creation in Dockerfiles.

* Updated Dockerfiles to install packages in a virtual environment.

* use venv's python

* update

* build and trigger

* trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Create self-scheduled-amd-mi355-caller.yml (#40134)

* [Cohere2Vision] remove unused arg (#40103)

* remove unused arg

* remove the arg from test as well

* [efficientloftr] fix bugs and follow original cross attn implementation strictly (#40141)

* fix: changed is_causal to be False

* fix: Added original cross attention bug

* fix: fixed the way border removal is computed

* fix: added missing normalization on coarse features

* test: fixed integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix CI: Use correct import in SAM for torchvision InterpolationMode (#40160)

fix ci

* [Continous Batching] set head_dim when config.head_dim is None (#40159)

* set head_dim when config.head_dim is None

* use model's actual TP setting

* Replace `self.tokenizer` by `self.processing_class` (#40119)

* [FA2] Fix it finally - revert fa kwargs preparation (#40161)

revert

* [bugfix] fix flash-attention2 unavailable error for Ascend NPU (#40151)

* [bugfix] fix flash-attention2 unavailable error for Ascend NPU

* remove redundant apply_rotary_emb usage

* fix ruff check error

* pad_input and unpad_input use same implementation as fa2

* rollback redundant codes

* fix ruff check error

* optimize fa2 judgement logic

* Fix docs typo (#40167)

* DINOv3 model

* working version

* linter revert

* linter revert

* linter revert

* fix init

* remove flex and add convert to hf script

* DINOv3 convnext

* working version of convnext

* adding to auto

* Dinov3 -> DINOv3

* PR feedback

* complete convert checkpoint

* fix assertion

* bf16 -> fp32

* add fast image processor

* fixup

* change conversion script

* Use Pixtral attention

* minor renaming

* simplify intermediates capturing

* refactor DINOv3ViTPatchEmbeddings

* Refactor DINOv3ViTEmbeddings

* [WIP] rope: remove unused params

* [WIP] rope: rename period -> inv_freq for consistency

* [WIP] rope: move augs

* change inv_freq init (not persistent anymore)

* [WIP] rope: move coords to init

* rope - done!

* use default LayerScale

* conversion: truncate expected outputs

* remove commented code

* Refactor MLP layers

* nit

* clean up config params

* nit docs

* simplify embeddings

* simplify compile compat lru_cache

* fixup

* dynamic patch coords

* move augmentation

* Fix docs

* fixup and type hints

* fix output capturing

* fix tests

* fixup

* fix auto mappings

* Add draft docs

* fix dtype cast issue

* add push to hub

* add image processor tests

* fixup

* add modular

* update modular

* convert and test convnext

* update conversion script

* update prefix

* Update LayerNorm

* refactor DINOv3ConvNextLayer

* rename

* refactor convnext model

* fix doc check

* fix docs

* fix convnext config

* tmp fix for check docstring

* remove unused arg

* fix tests

* (nit) change init

* standardize gated MLP

* clear namings and sat493m

* fix tensors on different devices

* revert linter

* pr

* PR feedback, ruff format

* missing headers

* fix code snippet and collection link in docs

* DINOv3 description

* fix checkpoints in tests

* nit doc fixes in configs

* output_hidden_states

* x -> features

* remove sequential

---------

Co-authored-by: Cijo Jose <cijose@meta.com>

* build: Add fast image processor tvp (#39529)

* build: add TvpImageProcessorFast

- Introduced TvpImageProcessorFast to enhance image processing capabilities.
- Updated image processing auto registration to include the new fast processor.
- Modified tests to accommodate both TvpImageProcessor and TvpImageProcessorFast, ensuring comprehensive coverage for both classes.

* fix: TvpImageProcessorFast with new resize method and update processing logic

* build: add TvpImageProcessorFast

* refactor: clean up whitespace and formatting in TvpImageProcessorFast and related tests

- Removed unnecessary whitespace and ensured consistent formatting in image_processing_tvp_fast.py.
- Updated import order in test_image_processing_tvp.py for clarity.
- Minor adjustments to maintain code readability and consistency.

* fix: Enhance TvpFastImageProcessorKwargs and update documentation

- Added TvpFastImageProcessorKwargs class to define valid kwargs for TvpImageProcessorFast.
- Updated the documentation in tvp.md to include the new class and its parameters.
- Refined the image processing logic in image_processing_tvp_fast.py for better handling of padding and resizing.
- Improved test cases in test_image_processing_tvp.py to ensure compatibility with the new processing logic and tensor inputs.

* fix: tested now with python 3.9

* fix: remove tvp kwargs from docs

* simplify processing

* remove import and fix tests

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>

* Add GptOssForSequenceClassification for GPT-OSS models (#40043)

* Add GptOssForSequenceClassification

* Tiny fix

* make fixup

* trigger CI rerun

* Check config type instead

---------

Co-authored-by: Yuefeng Zhan <yuefzh@microsoft.com>

* Standardize BARTpho model card: badges, new examples, fixed broken im… (#40051)

* Standardize BARTpho model card: badges, new examples, fixed broken image section, and links (#36979)

Update bartpho.md

* Update bartpho.md

Removed non-required/unsupported sections: Quantization, Attention visualizer, and Resources (plus stray tokenizer header).

Added code snippets which were suggested

* Update bartpho.md

Updated with necessary tags

* Update bartpho.md

* Update bartpho.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Add dates to the model docs (#39320)

* added dates to the models with a single hf papers link

* added the dates for models with multiple papers

* half of no_papers models done

* rest of no_papers models also done, only the exceptions left

* added copyright disclaimer to sam_hw, cohere, cohere2 + dates

* some more fixes, hf links + typo

* some new models + a rough script

* the script looks robust, changed all paper links to hf

* minor change to handle technical reports along with blogs

* ran make fixup to remove the white space

* refactor

* Pin torch to 2.7.1 on CircleCI for now (#40174)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update dynamic attention setter for multimodal models (#39908)

* update

* fix the test for DepthPro

* PR comments

* wait, I didn't delete this in prev commit?

* fix

* better way

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* [MINOR:TYPO] Update base.py (#40169)

* [MINOR:TYPO] Update base.py

All other occurrences in the docs use lowercase. (https://github.com/search?q=repo%3Ahuggingface%2Ftransformers%20translation_XX_to_YY&type=code)

Also, using uppercase doesn't work: tested with "translation_EN_to_FR" which doesn't work and instead returns:  `ValueError: The task does not provide any default models for options ('EN', 'FR')`

It might be a good idea to allow for uppercase, but that's for another issue.

* [MINOR:TYPO] Update __init__.py

* make model doc device agnostic (#40143)

* make model doc device agnostic

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update align.md

* Update aya_vision.md

* Update byt5.md

* refine

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update granitevision.md

* Update src/transformers/pytorch_utils.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add doc

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* 3 more

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix to avoid modifying a view in place (#40162)

* fix to avoid modifying a view in place

* add backward test in tensor parallel

* add test to test_modeling_gpt_oss.py

* linting

* Fix fsdp for generic-task models (#40191)

* remove abc inheritance

* add fast test

* Add repr to EncoderDecoderCache (#40195)

* add repr

* oups

* Fix typos (#40175)

Signed-off-by: cyy <cyyever@outlook.com>

* Remove _prepare_flash_attention_from_position_ids (#40069)

Signed-off-by: cyy <cyyever@outlook.com>

* Avoid CUDA stream sync (#40060)

Signed-off-by: cyy <cyyever@outlook.com>

* Fix various Pylint warnings (#40107)

Tidy code

Signed-off-by: cyy <cyyever@outlook.com>

* Update: add type hints to check_tokenizers.py (#40094)

* Update check_tokenizers.py

chore(typing): add type hints to check_tokenizers script

- Annotate params/returns for helper functions
- Keep tokenizer instances as `Any` to avoid runtime coupling
- Make `check_LTR_mark` return `bool` explicitly (no behavior change)

* Update check_tokenizers.py

chore(typing): replace Any with PreTrainedTokenizerBase in check_tokenizers.py

- Use transformers.tokenization_utils_base.PreTrainedTokenizerBase for `slow` and `fast` params
- Covers both PreTrainedTokenizer and PreTrainedTokenizerFast
- Exposes required methods (encode, decode, encode_plus, tokenize)
- Removes generic Any typing while staying implementation-agnostic

* Benchmarking improvements (#39768)

* Start revamping benchmarking

* Start refactoring benchmarking

* Use Pandas for CSV

* import fix

* Remove benchmark files

* Remove sample data

* Address review comments

* Add X-Codec model (#38248)

* add working x-codec

* nit

* fix styling + copies

* fix docstring

* fix docstring and config attribute

* Update args + config

* update conversion script

* update docs + cleanup

* Ruff fix

* fix docstrings

* Fix GPT-OSS `swiglu_limit` not passed in for MXFP4 (#40197)

Add swiglu_limit = 7.0
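
For context, a scalar sketch of what a `swiglu_limit` does, assuming the GPT-OSS-style clamped SwiGLU: the gate branch is clamped only from above, the linear branch is clamped symmetrically, the gate uses `sigmoid(alpha * gate)` with `alpha = 1.702`, and the linear branch is offset by +1. The function name is hypothetical and this is plain Python for illustration, not the MXFP4 kernel path the fix touches.

```python
import math


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def clamped_swiglu(gate: float, up: float, limit: float = 7.0, alpha: float = 1.702) -> float:
    """Hypothetical scalar sketch of a GPT-OSS-style clamped SwiGLU (assumed formulation)."""
    # Gate branch: clamp from above only.
    gate = min(gate, limit)
    # Linear branch: clamp to [-limit, limit].
    up = max(-limit, min(up, limit))
    # Sharpened SiLU-like gating: gate * sigmoid(alpha * gate).
    glu = gate * _sigmoid(alpha * gate)
    # Linear branch is offset by +1 before the product.
    return (up + 1.0) * glu


# With the limit, large activations saturate; without it (limit=inf) they pass through.
print(clamped_swiglu(100.0, 100.0))  # same value as clamped_swiglu(7.0, 7.0)
print(clamped_swiglu(100.0, 100.0, limit=float("inf")))
```

If the limit is never passed to the quantized path, the second behaviour (unbounded activations) is what you would get, which is why the default of 7.0 matters.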

* docs: Update LayoutLM model card according to new standardized format (#40129)

* docs: Update LayoutLM model card with standardized format

* Apply suggestions from code review

This commit incorporates all suggestions provided in the recent review. Further changes will be committed separately to address remaining comments.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Address remaining review comments

* Address few more review comments:
1. remove transformer-cli section
2. put resources after notes
3. change API refs to 2nd level header

* Update layoutlm.md

* Update layoutlm.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Revert "Pin torch to 2.7.1 on CircleCI for now" + Final fix for `too long with no output` (#40201)

* Revert "Pin torch to 2.7.1 on CircleCI for now (#40174)"

This reverts commit 31b6e6e1dac0d32f74ec5cd6b3c1868534ccd7b5.

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Use correct `model_input_names` for PixtralImageProcessor (#40226)

add image_sizes to model_input_names

* fix incorrect vocab_size in Qwen2_5_VLForConditionalGeneration loss_function (#40130)

* fix incorrect vocab_size in Qwen2_5_VLForConditionalGeneration loss_function

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* fix similar error in qwen2_vl and run make fix-copies

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* pass kwargs to loss_func in qwen2_vl and qwen2_5_vl

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* Apply style fixes

---------

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* [SAM 2] Change checkpoints in docs and tests (#40213)

* change checkpoints in docs and tests

* add notebook

* Fix more typos (#40212)

Signed-off-by: cyy <cyyever@outlook.com>

* Fix ESM token_dropout crash when using inputs_embeds instead of input_ids (#40181)

* fix: error when calling ESM model with input embeddings instead of input ids

* propagate changes to other models

* AMD scheduled CI ref env file (#40243)

* Reference env-file to be used in docker running the CI

* Disable MI300 CI for now

* Add Ovis2 model and processor implementation (#37088)

* Add Ovis2 model and processor implementation

* Apply style fixes

* Add unit tests for Ovis2 image processing and processor

* Refactor image processing functions for clarity and efficiency

* Add Ovis2 ImageProcessorFast

* Refactor Ovis2 code

* Refactor Ovis2 model components and update processor functionality

* Fix repo consistency issues for Ovis2: docstring, config cleanup

* Update Ovis2 model integration tests

* Update Ovis2 configuration and processing classes for improved documentation

* Remove duplicate entry for 'ovis2' in VLM_CLASS_NAMES

* Fix conflict

* Fix import order

* Update image processor class names

* Update Ovis2 model structure

* Refactor Ovis2 configuration

* Fix typos

* Refactor Ovis2 model classes and remove unused code

* Fix typos

* Refactor Ovis2 model initialization

* Fix typos

* Remove Ovis2 model mapping from MODEL_MAPPING_NAMES in modeling_auto.py

* Add license and update type hints

* Refactor token function and update docstring handling

* Add license

* Add Ovis2 model support and update documentation

* Refactor Ovis2 model structure and enhance multimodal capabilities

* Update Ovis2 weight mapping for consistency and clarity in key patterns

* Remove unused 'grids' parameter from Ovis2 model and Update processing logic to handle image grids more efficiently.

* Refactor Ovis2 model test structure to include Ovis2Model

* Add optional disable_grouping param to Ovis2ImageProcessorFast

* Refactor type hints in Ovis2 modules

* Add licensing information in Ovis2 modules and tests

* Refactor Ovis2 model by removing unused methods

* Refactor Ovis2 model tests by renaming test classes and removing skipped tests

* Refactor Ovis2 model output classes

* Refactor Ovis2 weight conversion and Update model embedding classes

* Refactor Ovis2 model imports and remove unused functions

* Enhance vision configuration extraction in Ovis2 weight conversion

* Refactor Ovis2 model's forward method to remove interpolation option

* Update Ovis2 model documentation

* Refactor Ovis2 model input handling and tokenizer configuration

* Update return type hints in Ovis2 model

* Remove commented-out code

* fix config for tests and remove key mappings

* Update tokenizer configuration to use add_special_tokens method

* skip torchscript

* Fix image placeholder generation in Ovis2Processor

* Refactor Ovis2 model to rename visual_table to visual_embeddings_table

* Enhance Ovis2 model by adding vision_feature_select_strategy parameter

* Refactor Ovis2 model weights conversion and architecture

* Refactor Ovis2 model by removing vision_feature_select_strategy parameter

* Update Ovis2 model examples

* Refactor Ovis2 model

* Update Ovis2 model

* Update Ovis2 model configuration

* Refactor Ovis2 model test setup

* Refactor flash attention support

* Refactor

* Fix typo

* Refactor

* Refactor model classes

* Update expected output in Ovis2

* Refactor docstrings

* Fix

* Fix

* Fix

* Update input in tests

* Fix

* Fix get_decoder method

* Refactor

* Refactor Ovis2

* Fix

* Fix

* Fix test

* Add get_placeholder_mask

* Refactor Ovis2 model tests

* Fix

* Refactor

* Fix

* Fix

* Fix Ovis2 test

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Fix more pylint warnings (#40204)

Fix pylint warnings

Signed-off-by: cyy <cyyever@outlook.com>

* 🚨 Always return Cache objects in modelings (to align with generate) (#39765)

* watch the world burn

* fix models, pipelines

* make the error a warning

* remove kwargs and return_legacy_cache

* fix reformer

* remove transpose_for_scores call in ESM-2 (#40210)

* remove transpose_for_scores call

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

* fix copied evolla code

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

---------

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

* Add `chat_template` (`jinja2`) as an extra dependency (#40128)

* add jinja2 as a dependency

* Make jinja2 a core dependency in install_requires

- Add jinja2 to install_requires list in setup.py for automatic installation
- Add jinja2 to runtime version checks in dependency_versions_check.py
- Resolves issue where pip install transformers doesn't install jinja2

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Make jinja2 a core dependency in install_requires

* Make jinja2 an extra dependency instead of adding a core dep

---------

Co-authored-by: Claude <noreply@anthropic.com>

* [typing] fix type annotation error in DepthPro model image processor (#40238)

* fix type annotation error in DepthPro model image processor

* fix

* run make fix-copies

* [serve] guard imports (#39825)

guard imports

* [`CI`] Fix repo consistency (#40249)

* fix

* doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Fixes for EncoderDecoderCache (#40008)

* Add expectation to t5 for rocm 9.4

* Made EncoderDecoderCache compatible with nn.DataParallel

* Fixed t5gemma EncoderDecoderCache

* Added todos in autoformer

* Ruff

* Init is self-contained

* Review compliance

* Fixed kwargs init of EncoderDecoderCache

* fix: Catch correct ConnectionError for additional_chat_templates (#39874)

* fix: Catch correct ConnectionError for additional_chat_templates

* fix: don't catch timeout

* fix: formatting

* Model card for NLLB (#40074)

* initializing branch and draft PR

* updated model card .md file

* minor

* minor

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* resolving comments + adding visuals

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* NllbTokenizerFast and NllbTokenizer added

* endline

* minor

* Update nllb.md

---------

Co-authored-by: Sahil Kabir <sahilkabir@Sahils-MacBook-Pro.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Correct typo and update notes in docs Readme (#40234)

* Correct typo and update notes in docs readme

* Update docs/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix benchmark workflow (#40254)

Correct init_db.sql path

Co-authored-by: Akos Hadnagy <akoshuggingface@mi325x8-123.atl1.do.cpe.ice.amd.com>

* docs: Update OLMo model card (#40233)

* Updated OLMo model card

* Update OLMo description

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix typo

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix cli typo

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix cli example

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Add bitsandbytes info

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Skip broken tests (#40157)

skip these tests

* Remove MI300 CI (#40270)

Remove MI300 CI (in history if we need it back)

* set inputs_embeds to None during generate to avoid an audio encoder forward pass in the generation process (#40248)

* set inputs_embeds to None during generate to avoid an audio encoder forward pass in the generation process

* set input_features to none instead

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>

* [detection] fix attention mask for RT-DETR-based models (#40269)

* Fix get_contrastive_denoising_training_group attention

* Add bool attention_mask conversion

* Fix slow static cache export tests (#40261)

* 🚨🚨 Switch default compilation to fullgraph=False (#40137)

* switch default

* docstring

* docstring

* rework tests and remove outdated restrictions

* simplify

* we need a check for static cache

* fix

* rename var

* fix

* revert

* style

* rename test

* Fix setting attention for multimodal models (#39984)

* fix

* use non-explicit `None`

* keep previously set attn if exists

* [detection] fix correct `k_proj` weight and bias slicing in D-FINE (#40257)

Fix: correct k_proj weight and bias conversion in D-FINE

* Add Kosmos-2.5 (#31711)

Add Microsoft Kosmos-2.5

---------

Co-authored-by: kirp@umich.edu <tic-top>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Skipping pytree registration in case fsdp is enabled (#40075)

* Skipping pytree registration in case fsdp is enabled

* Beauty changes

* Beauty changes

* Moved the is_fsdp_available function to import utils

* Moved is_fsdp_available to integrations.fsdp

* Skipping pytree registration in case fsdp is enabled

* Beauty changes

* Beauty changes

* Moved the is_fsdp_available function to import utils

* Moved is_fsdp_available to integrations.fsdp

* Added pytree registration inside dynamic cache class

* Making ci/cd lords happy

* Adding a check if DynamicCache is already a leaf

* Adding try/catch for multiple initializations of DynamicCache in test suites

* Moving dynamic cache pytree registration to executorch

* Adding try catch back

* Update image_processing_perception_lm_fast.py to allow for proper override of vision_input_type (#40252)

* Update image_processing_perception_lm_fast.py

Allow for a proper override of vision_input_type in hf fast image processor, otherwise we need to resort to manually setting the attribute.

* Update processing_perception_lm.py to match kwargs vision input type

* Update image_processing_perception_lm_fast.py kwargs to signature args

* fix which routing method (#40283)

* Fix chat CLI GPU loading and request_id validation issues (#40230) (#40232)

* Fix chat CLI GPU loading and request_id validation issues (#40230)

This commit addresses two critical bugs in the transformers chat CLI:

1. **GPU Loading Issue**: Changed default device from "cpu" to "auto" in ChatArguments
   - Chat CLI now automatically uses GPU when available instead of defaulting to CPU
   - Matches the behavior of the underlying serving infrastructure

2. **Request ID Validation Error**: Added request_id field to TransformersCompletionCreateParamsStreaming schema
   - Fixes "Unexpected keys in the request: {'request_id'}" error on second message
   - Allows request_id to be properly sent and validated by the server

Both fixes target the exact root causes identified in issue #40230:
- Users will now get GPU acceleration by default when available
- Chat sessions will no longer break after the second message

* Remove unrelated request_id field from TransformersCompletionCreateParamsStreaming

* docs(layoutlm): add missing `id=usage` to `<hfoptions>` tag in LayoutLM model card (#40273)

docs(layoutlm): add missing 'id=usage' to <hfoptions> tag in LayoutLM model card

* Standardize RAG model card (#40222)

* Standardize RAG model card

Update rag.md to follow the new Hugging Face model card template:
- Added friendly overview in plain language
- Added pipeline and AutoModel usage examples
- Included quantization example with BitsAndBytesConfig
- Added notes and resources sections
- Removed abstract and FlashAttention badge

* Standardize RAG model card

Update rag.md to follow the new Hugging Face model card template:
- Added friendly overview in plain language
- Added AutoModel usage example
- Included quantization example with BitsAndBytesConfig

* docs: Update TrOCR model card to new format (#40240)

* docs: Update TrOCR model card to new format

* Updated suggestions

* Update model card for gpt neox japanese (#39862)

* Update GPT-NeoX-Japanese model card

* Apply suggestions from code review

* Update gpt_neox_japanese.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* SmolVLM and InternVL: Ensure pixel values are converted to the correct dtype for fp16/bf16 (#40121)

* Ensure pixel values are converted to the correct dtype for fp16/bf16

* add to modular

* Standardize BertGeneration model card (#40250)

* Standardize BertGeneration model card: new format, usage examples, quantization

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply reviewer feedback: update code examples

* Add missing code example

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Adjust ROCm test output expectations (#40279)

Adjust ROCm output expectations

* SmolVLM test fixes (#40275)

* Fix SmolVLM tests

* Add the proper CUDA expectations as well

* Split A10 and A100 expectations

* Ruff

---------

Co-authored-by: Akos Hadnagy <akoshuggingface@mi325x8-123.atl1.do.cpe.ice.amd.com>

* make model docs device agnostic (2) (#40256)

* doc cont.

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* more models

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mixtral.md

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [3/3] make docs device agnostic, all en docs for existing models done  (#40298)

docs to device agnostic cont.

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Add MetaCLIP 2 (#39826)

* First draft

* Make fixup

* Use eos_token_id

* Improve tests

* Update clip

* Make fixup

* Fix processor tests

* Add conversion script

* Update docs

* Update tokenization_auto

* Make fixup

* Use check_model_inputs

* Rename to lowercase

* Undo CLIP changes

* Address comment

* Convert all checkpoints

* Update auto files

* Rename checkpoints

* Allow running `torch.compile` tests with `fullgraph=True` (#40164)

* fix

* address comment

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`FA`] Fix dtype in varlen with position ids (#40295)

fix

* [docs] delete more TF/Flax docs (#40289)

* delete some TF docs

* update documentation checks to ignore tf/flax

* a few more removals

* nit

* Update utils/check_repo.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Clean up X-Codec. (#40271)

* Clean up xcodec addition.

* Clean up config.

* Switch to fixtures test.

* Small stuff.

* Remove OTel SDK dependencies (#40305)

* Fix GOT-OCR2 and Cohere2Vision image processor patches calculation (#40312)

fix got-ocr patches calculation

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* [`fix`] Pass adamw optimizer parameters to StableAdamW (#40184)

* fix: pass adamw optimizer parameters to StableAdamW

* add test for stable_adamw initialization with trainer arguments

* address copilot suggestion

* fix: update weight_decay handling in stable_adamw kwargs

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* chore: fix typo in `find_executable_batch_size` to match new 0.9 ratio (#40206)
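
A minimal sketch of the retry loop this title refers to, assuming the batch size is multiplied by 0.9 (rather than halved) after each out-of-memory error. `find_batch_size_sketch`, `FakeOOM`, and `train_step` are hypothetical names; the real helper lives in Accelerate and detects actual OOM errors itself.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


class FakeOOM(RuntimeError):
    """Hypothetical stand-in for a CUDA out-of-memory error."""


def find_batch_size_sketch(
    fn: Callable[[int], T], starting_batch_size: int = 128, ratio: float = 0.9
) -> Tuple[int, T]:
    """Call fn(batch_size), shrinking the batch size by `ratio` after each simulated OOM."""
    batch_size = starting_batch_size
    while batch_size > 0:
        try:
            return batch_size, fn(batch_size)
        except FakeOOM:
            # Shrink by 10% instead of halving, landing closer to the largest size that fits.
            batch_size = int(batch_size * ratio)
    raise RuntimeError("No executable batch size found, reached zero.")


def train_step(batch_size: int) -> str:
    # Pretend anything above 50 samples runs out of memory.
    if batch_size > 50:
        raise FakeOOM("out of memory")
    return f"trained with {batch_size}"


print(find_batch_size_sketch(train_step))
```

Starting from 128, halving would stop at 32, while the 0.9 ratio settles on a size just under the (simulated) limit of 50, at the cost of more retries.
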

* :rotating_light: [`Flash Attention`] Fix sliding window size (#40163)

* swa fix

* add comment, make fix symmetrical

* modify fa inference test to force swa correctness check

* fixup comment
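
To illustrate the off-by-one at stake, the sketch below compares key sets under flash-attn's documented `window_size=(left, right)` semantics: with equal query and key lengths, query `i` attends keys in `[i - left, i + right]`, and causal masking makes `right` effectively 0. It assumes the model's `sliding_window` counts the query token itself; both helper names are hypothetical, and only the causal (left) side is modelled.

```python
def sliding_window_keys(i: int, sliding_window: int) -> set[int]:
    """Keys a causal query at position i should see: itself plus the previous sliding_window - 1 tokens."""
    return {j for j in range(i + 1) if i - j < sliding_window}


def flash_window_keys(i: int, left: int) -> set[int]:
    """Keys selected by a flash-attn style window (left, 0) under causal masking: [i - left, i]."""
    return set(range(max(0, i - left), i + 1))


W = 4
# Passing the window unchanged as `left` lets one extra past token through.
print(sorted(flash_window_keys(10, W)))      # [6, 7, 8, 9, 10]
print(sorted(flash_window_keys(10, W - 1)))  # [7, 8, 9, 10]
print(sorted(sliding_window_keys(10, W)))    # [7, 8, 9, 10]
```
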

* Remove unnecessary contiguous calls for modern torch (#40315)

* Add support for Florence-2 (#38188)

* init

* add modular

* fixup

* update configuration

* add processing file

* update auto files

* update

* update modular

* green setup_and_quality ci

* it works

* fix some tests

* commit florence2

* update test

* make test cases done - 16 left

* style

* fix few test cases

* fix some tests

* fix init test

* update florence2 vision style

* hope is green

* fix init test

* fix init

* update modular

* refactor vision module

* fix: channel attention use dynamic scale

* update modular

* update

* update attention mask

* update

* fix naming

* Update src/transformers/models/florence2/processing_florence2.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* spatial block works

* more beautiful

* more more beautiful

* merge main

* merge main and fixup

* fix typing hint

* update modeling

* fix eager matches sdpa

* fix style

* fix compile test - all green

* remove florence2 language

* remove Florence2LanguageModel things

* fix style

* update florence2 model

* override prepare encoder_decoder for generation

* add weight conversion script

* rewrite channel attention to use sdpa

* eliminate one transpose op

* support fa2

* fix quality check

* chore: reformat `test_modeling_florence2.py`

* some refactor for processor

* some refactor for processor

* update naming convention and remove BC

* make it pass the test

* fix: correct Embedding Cosine

* update comments and docstring

* support input_embeds

* support input embeds ideally

* fix style

* fix style

* fix style again :D

* add processor test

* refactor processor and add test for processor

* reformat test processor

* make fixup

* fix schema check

* remove image_token

* ensure image token in tokenizer and fix integration tests

* fix processor test

* add more integration tests for large model and rename test_processor to test_processing

* test_assisted_decoding_sample should pass

* update doc and make model work with image text to text pipeline

* docs: add sdpa badge

* resolve cyril's comments

* fix import torch error

* add helper get_placeholder_mask

* inherit from llava

* florence2 may not _supports_attention_backend because of bart ...

* move florence2 model card to multimodal

* let base model always return_dict

* fix style

* tiny update doc

* set _checkpoint_conversion_mapping = {}

* fix code quality

* support flex and compile graph and move external func to internal func

* remove condition because it is always true

* remove window funcs

* move post processor config out

* fix ci

* new intro to trigger test

* remove `kernel_size` argument

---------

Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Qwen2.5-Omni test fixes (#40307)

Updated expectations, and mp tests

* Add back `_tp_plan` attribute (#39944)

* Update modeling_utils.py

* make sure we update with the module's plan

* use public api

* oups

* update

* fix failing test

* Update src/transformers/integrations/tensor_parallel.py

* Update src/transformers/integrations/tensor_parallel.py

* fix

* make the API more friendly!

* fix tests

* fix styling

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* byebye torch 2.1 (#40317)

* Bump minimum torch version to 2.2

* Remove is_torch_greater_or_equal_than_2_2

* update versions table

* Deprecate is_torch_sdpa_available (except for backward compat), remove require_torch_sdpa

* No more `natten` (#40287)

get rid of natten

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`GPT OSS`] Refactor the tests as it was not properly checking the outputs (#40288)

* it was long overdue!

* use the official kernel

* more permissive

* update the kernel as well

* mmm should it be this?

* up pu

* fixup

* Update test_modeling_gpt_oss.py

* style

* start with 20b

* Update CI with nightly torch workflow file (#40306)

* fix nightly ci

* Apply suggestions from code review

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>

* Fix: Apply `get_placeholder_mask` in Ovis2 (#40280)

* Refactor special image mask

* Refactor get_placeholder_mask method

* Revert "Refactor special image mask"

This reverts commit 9eb1828ae930329656d6f323a510c5e6033e1f85.

* Fix

* Revert "Refactor get_placeholder_mask method"

This reverts commit 07aad6484bb08d6351d5b605e9db574d28edcd15.

* Update notification service amd_daily_ci_workflows definition (#40314)

* One cache class to rule them all (#40276)

* remove all classes

* fix generate

* start replacing everywhere

* finish removing everywhere

* typo

* typo

* fix

* typo

* remove num_layers=1

* CI

* fix all docstrings

* review

* style

* Fix chunked attention mask with left-padding (#40324)

* add fix

* add test

* raise proper warning for older versions

* fix

* fix and add 2nd test

* fix for flex and torch 2.5

* [docs] remove flax references from `/en/model_doc` (#40311)

* 1st commit

* all models up to D

* all models up to G

* all models up to M

* all remaining models

* Fix qwen-omni processor text only mode (#40336)

* Fix qwen-omni processor text only mode

* remove try except

---------

Co-authored-by: yuekaiz <yuekaiz@mgmt1-login.cm.cluster>

* Change Qwen2RMSNorm to RMSNorm from PyTorch (#40066)

* Unify Qwen2RMSNorm definitions and use RMSNorm from PyTorch

Signed-off-by: cyy <cyyever@outlook.com>

* subclass RMSNorm

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>

* Add DeepseekV3ForSequenceClassification for Deepseek V3 models (#40200)

* Add Sequence Classification Support for Deepseek v3 model DeepseekV3ForSequenceClassification

* After run make fixup

* Fix deprecation warning version (#40343)

fix

* Add missing arguments to class constructors (#40068)

* Add missing arguments

Signed-off-by: cyy <cyyever@outlook.com>

* Fix typos

Signed-off-by: cyy <cyyever@outlook.com>

* More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>

* [docs] remove TF references from `/en/model_doc` (#40344)

* models up to F

* models up to M

* all models

* Fix: Only call Trainer.align_special_tokens if model has "config" attribute (#40322)

* Only call Trainer.align_special_tokens if model has "config" attribute

* Add efficient test for training a model without model.config

* Reformat

* add type hints (#40319)

* add basic type hints to import module

* run make fixup

* remove optional

* fixes

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Fix an infinite loop bug in recursive search of relative imports (#40326)

Fix bug in recursive search of relative imports

* Fix links in Glm4vMoe configuration classes to point to the correct H… (#40310)

* Fix links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository

* run fixup to update links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository

* T5 test and target device fixes (#40313)

* Fix cache setup related issues

* Fix target-device-related issues

* Ruff

* Address review comments

* Update `test_spm_converter_bytefallback_warning` (#40284)

fff

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* (small) fix conditional for input_ids and input_embeds in marian (#40045)

* (small) fix conditional for input_ids and input_embeds in marian

* address comment

* Fix attention visualizer (#40285)

* make visualizer rely on create causal mask

* format

* fixup

* fixup

* read token

* read token, duh

* what is up with that token

* small tests?

* adjust

* try with flush

* normalize for ANSI

* buffer shenanigans

* [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification (#35991)

* [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification

* fix the modular conversion

* Clean up XCodec and other codecs (#40348)

* Clean up xcodec addition.

* Clean up config.

* Switch to fixtures test.

* Small stuff.

* Polish XCodec and standardize across codecs.

* Update src/transformers/models/xcodec/modeling_xcodec.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Format and fix test.

* Update tol.

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* [serve] add cors warnings (#40112)

* add cors warnings

* Update src/transformers/commands/serving.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/commands/serving.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

* make fixup

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* [detection] use consistent dtype for Conditional and DAB DETR positional embeddings (#40300)

fix: use consistent dtype for sine positional embeddings

* Remove more PyTorch 2.2 compatible code (#40337)

Signed-off-by: cyy <cyyever@outlook.com>

* [`FA`] Fix some model tests (#40350)

* fix

* cleanup, revert aimv2 fa changes

* fix aria

* i searched a long time but the cross dependency is for the recent models so...

* this was something... evolla

* fix modernbert decoder + make fa test more robust

* nit

* Qwen2.5-VL test fixes for ROCm (#40308)

* [generate] handle support for cache classes when num enc layers != num dec layers (#40277)

* handle support for cache classes when num enc layers != num dec layers

* handle overwrites

* one more corner case

* Update src/transformers/generation/utils.py

* Update src/transformers/generation/utils.py

* Apply suggestions from code review

* handle corner case :o

* [4/N] more docs to device agnostic (#40355)

* more docs to device agnostic

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* more

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* 1

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* 2

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update vitpose.md

* Update camembert.md

* Update camembert.md

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* DOCS: Clarification on the use of `label_names` as an argument to TrainingArguments (#40353)

* Update trainer.md

* Update trainer.md

Removed the detail about label_names argument usage from the tip/ warning section

* Update training_args.py

Added the label_names usage clarification in the docstring

* Update trainer.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* HunYuan opensource (#39606)

* merge opensource_hunyuan

* add head_dim

* fix assertion error

* fix seen_tokens

* ready_for_upstream (merge request !17)

Squash merge branch 'ready_for_upstream' into 'main'

* fix configuration type&docstring
* fix style

* ready_for_upstream (merge request !18)

Squash merge branch 'ready_for_upstream' into 'main'
* add doc
* fix testcode
* fix configuration type&docstring

* rename base model

* remove assert

* update

* remove tiktoken

* update

* fix moe and code style (#3)

* update

* fix format

* update

* revert makefile

* fix moe config

* fix numel()

* remove prepare_inputs_for_generation

* fix kv_seq_len

* add docs/toctree

* remove unused parameter & add licence

* add licence

* remove unused parameter

* fix code

* dense modular

update import

fix

fix

use mistralmodel

fix qknorm

add sliding_window

make style

fix

dense done

hunyuan moe

fix import

fix modular

fixup

fixup

* update model path

* fix mlp_bias

* fix modular

* Fix modeling (#5)

…
