Support input_embeds in torch exportable decoders #39836
Conversation
Force-pushed from 49e7e97 to 89e912a
Force-pushed from 89e912a to 79e095a
Force-pushed from 51c518f to 62da12e
Force-pushed from d3aea48 to 8ea7821
Hey @jackzhxng, I left a few comments below. Not sure this will work with vision multimodal models right now.
```python
if not config:
    config = model.config
if not generation_config:
    generation_config = model.generation_config

if not hasattr(config, "use_cache") or config.use_cache is False:
```
I think we don't need to explicitly pass configs. In the case of multimodal models, the text decoder config is available via `model.config.get_text_config()`, which returns the text decoder config for any type of model. And the needed generation config is usually the model's own generation config, not the LM's generation config.

So we can do:

```python
text_config = model.config.get_text_config()
gen_config = model.generation_config

# NOTE: model.language_model will not have an lm_head for all vision multimodal models,
# so we need to support exporting the whole model, but without multimodal inputs
TorchExportableModuleForDecoderOnlyLM(model=voxtral)
```

```python
ensuring that the exported model can be executed in `ExecuTorch` out-of-the-box.
"""
_, seqlen = input_ids.shape
position_ids = cache_position.unsqueeze(0)
```
Any reason to remove `position_ids`?
Seemed unnecessary since models like Llama already do this in the forward:

```python
position_ids = cache_position.unsqueeze(0)
```
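For reference, a minimal sketch of that behavior, with a hypothetical helper name; it mirrors the pattern used in decoders such as `LlamaModel` when `position_ids` is not provided, but it is not copied from the modeling code:

```python
import torch

def derive_position_ids(cache_position: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: reuse cache_position with a batch dimension added,
    # which is what many decoder forwards do when position_ids is None.
    return cache_position.unsqueeze(0)

# Example: positions for a 3-token prefill starting at cache slot 0.
print(derive_position_ids(torch.arange(3)))  # tensor([[0, 1, 2]])
```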
Ah right! As long as it doesn't break export, I'm fine. Wanted to make sure it wasn't deleted accidentally.
Jack, are you sure all models do this internally? I remember position ids are used for cache updates, and I'm not sure all models do this. Let's make sure to verify this.
Hi @zucchini-nlp, thanks for the review! We are only keeping the decoder portion of multimodal in transformers right now; the rest of the exportable modules stay in Optimum ET (huggingface/optimum-executorch#111) so we can iterate more quickly, since there is a lot of work going on around multimodal at the moment.
Force-pushed from a2c29fb to 8b06186
Force-pushed from 8b06186 to 14610ed
Thanks for iterating, overall LGTM! I would like someone from Optimum to review the PR as well before merging.
```python
max_batch_size (int): Maximum batch size for the cache.
max_cache_len (int): Maximum sequence length for the cache.
```
Not sure if `TorchExportableModuleWithHybridCache` is used commonly; we can do a small deprecation cycle.
Thanks for the review! I believe it's just used for Gemma at the moment
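For illustration only, a small deprecation cycle like the one mentioned above could look roughly like the sketch below. The class body is a simplified stand-in and the warning text is an assumption, not the PR's actual code:

```python
import warnings

class TorchExportableModuleWithHybridCache:
    # Simplified stand-in: only the deprecation shim is sketched, not the real export logic.
    def __init__(self, model, max_batch_size=None, max_cache_len=None):
        if max_batch_size is not None or max_cache_len is not None:
            warnings.warn(
                "`max_batch_size` and `max_cache_len` are deprecated; cache sizing is now "
                "derived from the model's generation config.",
                FutureWarning,
            )
        self.model = model
```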
Force-pushed from 7747302 to 540e187
[For maintainers] Suggested jobs to run (before merge): run-slow: gemma, gemma2, gemma3, llama, olmo, phi3, qwen2, qwen3
```python
self,
model: PreTrainedModel,
max_batch_size: int = 1,
max_cache_len: int = 4096,
```
Why remove these?
Ok, I guess you are relying on the generation config.
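For illustration, deriving the cache sizing from the generation config might look roughly like this. The `cache_config` attribute and its field names are assumptions for the sketch, not necessarily what the PR implements:

```python
def cache_dims_from_generation_config(generation_config, default_batch_size=1, default_cache_len=4096):
    # Hypothetical helper; `cache_config` and its keys are assumptions for this sketch.
    cache_config = getattr(generation_config, "cache_config", None) or {}
    if not isinstance(cache_config, dict):
        cache_config = getattr(cache_config, "to_dict", lambda: vars(cache_config))()
    max_batch_size = cache_config.get("batch_size", default_batch_size)
    max_cache_len = cache_config.get("max_cache_len", default_cache_len)
    return max_batch_size, max_cache_len
```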
Looks good to me as well. Can we merge?
@zucchini-nlp am I able to merge this myself or do I have to wait for someone from HF to merge it?
Okay, let's merge it.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
The export tests are failing for several models after this PR. @jackzhxng will you have time to take a look? 👀
Yup, let me fix that.
@zucchini-nlp can I confirm that the error you are seeing is this? Also, is there a way for me to see the status / logs from these periodic slow tests?
Yep, exactly. I am attaching the job links for all failed tests in case it helps. These tests are run every day and we usually get pinged internally by a bot when anything new starts failing. Not sure if we can add you there. Running all possibly affected tests before merging a PR is fine imo, and if anything fails on export tests even after that, then we will tag you. Same as for this PR :)
@zucchini-nlp fix here! #40261
What does this PR do?

Allows specifying `inputs_embeds` in `TorchExportableModule`s in order to support export of multimodal models' text decoders. Adds `config` and `generation_config` to the constructors to support multimodal models, since the `TorchExportableModule` will wrap the nested text decoder model, which doesn't have its config and generation config as attributes. E.g. for exporting Voxtral's text decoder we need to:
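A minimal sketch of that usage follows, assuming the Voxtral classes available in recent transformers releases; the checkpoint name, the `.language_model` attribute, and the exact constructor call are assumptions for illustration, not code copied from the PR:

```python
from transformers import VoxtralForConditionalGeneration
from transformers.integrations.executorch import TorchExportableModuleForDecoderOnlyLM

# Assumed checkpoint name for illustration.
voxtral = VoxtralForConditionalGeneration.from_pretrained("mistralai/Voxtral-Mini-3B-2507")

# The nested text decoder does not carry its own config/generation_config,
# so they are passed in explicitly (the new constructor arguments from this PR).
exportable = TorchExportableModuleForDecoderOnlyLM(
    voxtral.language_model,
    config=voxtral.config.get_text_config(),
    generation_config=voxtral.generation_config,
)
# The exported module can then be fed `inputs_embeds` produced by the multimodal front-end.
```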
@echarlaix @michaelbenayoun @zucchini-nlp