Make cache_config not mandatory #40316
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Converting to draft, I think there are other places in the code that need the same treatment.
I just noticed that the static cache config has been removed starting from #39106, and apparently we no longer need to pass anything to instantiate a Cache class except max_cache_len.
@gante are we going to get rid of cache config totally? In that case we can update export to require *kwargs for early initialization.
@zucchini-nlp cache config as a plain dict is still supported, as a means to fully parameterize the desired cache. However, due to all recent changes, I would expect bugs in its utilization 🤗
Considering
I think we can indeed remove the warnings; the defaults you gave are sound IMO.
Let's do the same in
+1, I'd expect it to fail as well, but miraculously the tests aren't throwing errors with StaticCache.
Took care of the comments above 🤗 |
Are we able to update the failing tests instead and specify a cache_config on the generation_config? I think it's good to explicitly specify max_cache_len, since it might be hard to debug issues later on if the user is unaware that there is a default value of 4096.
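If the tests were updated instead, the change could look roughly like the sketch below. This assumes the exportable path reads batch_size and max_cache_len from generation_config.cache_config (as in the diff later in this thread); the model id and values are purely illustrative.

```python
from transformers import AutoModelForCausalLM

# Illustrative only: make the cache sizing explicit in the test instead of
# relying on a hidden 4096 default.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model.generation_config.cache_config = {
    "batch_size": 1,        # export is traced for a single sequence here
    "max_cache_len": 1024,  # chosen for the test; application-specific in general
}
```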
Force-pushed from 76e1f18 to eb00c25
@jackzhxng if the
Force-pushed from cb7a379 to 1b8ba14
Using defaults LGTM as an easy-to-set-up export for beginner users. We can add documentation on how advanced users can configure the model cache and change the default values.
@jackzhxng does that sound good?
Force-pushed from 1b8ba14 to 1e6f647
LGTM as well, but let's wait for feedback from @jackzhxng before merging.
@jackzhxng, why don't you think the defaults are a good idea? IMO, since the exportable classes only take the model as an argument, it's a bit awkward to force the user to first set the cache_config in the generation_config of the model itself before exporting; I do like the defaults as well. Any particular reason you were against them?
cache_config = {} if generation_config.cache_config is None else generation_config.cache_config
batch_size = cache_config.get("batch_size", 1)
max_cache_len = cache_config.get("max_cache_len", 4096)
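For context, the behaviour originally described in this PR (fall back to the old defaults but emit a warning so the values are easy to find) could be sketched as follows; the helper name and warning wording are made up for illustration.

```python
from transformers.utils import logging

logger = logging.get_logger(__name__)


def _resolve_cache_config(generation_config):
    """Hypothetical helper: keep the legacy defaults, but make the fallback visible."""
    cache_config = generation_config.cache_config or {}
    if "batch_size" not in cache_config:
        logger.warning("`cache_config` has no `batch_size`; defaulting to 1.")
    if "max_cache_len" not in cache_config:
        logger.warning("`cache_config` has no `max_cache_len`; defaulting to 4096.")
    return cache_config.get("batch_size", 1), cache_config.get("max_cache_len", 4096)
```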
If we default to something, should we default to model.config.max_position_embeddings instead? I would prefer no default at all, to be honest: the maximum length is often highly application-specific, and a bad number is quite impactful (incorrect model behaviour and/or wasted memory).
For instance, 4096 is too large for some models (e.g., they will start generating gibberish after 2k tokens) and too small for some use cases (like processing documents).
model.config.max_position_embeddings is wayyyy too big for some models IMO, we'll waste so much memory we could OOM on cache init.
Many models (even super small ones) have a max_position_embeddings of more than 100k, e.g. https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/config.json
I agree, and that's why I would prefer to have no defaults 😢 I don't think there is a good value that works in 99% of cases.
A missing value with a good exception is trivial to fix. A bad default is a pain to find.
No default == crash is a no-no IMO 😉
We can check if the model has an EOT token -> it will stop generating by itself; if not, we use a good default based on the number of parameters, something simple like 7B -> 512 (since it's fast), 14B -> 256, etc., just so we don't spend too much time but still generate a bit!
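A rough illustration of that heuristic; the size buckets and helper name are entirely hypothetical, only sketching the "smaller default for bigger models" idea.

```python
def _default_max_cache_len(model) -> int:
    # Hypothetical buckets: keep export quick by shrinking the default cache
    # as the model gets bigger.
    num_params = model.num_parameters()  # available on any PreTrainedModel
    if num_params >= 14e9:
        return 256
    if num_params >= 7e9:
        return 512
    return 1024
```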
> If we default to something, should we default to model.config.max_position_embeddings instead? I would prefer no default at all, to be honest, the maximum length is often highly application-specific and a bad number is quite impactful (incorrect model behaviour and/or wasted memory)

To add onto this, not only is it quite impactful, it's also very tricky to debug the further down you go in the stack, especially when running a model in ExecuTorch. As @gante said, max_seq_len is one of, if not the, most important export variables, so I think it's good to clear up confusion about this value early on by erroring out instead of setting an arbitrary default value that is hard to find.
Yep, we need to either make the default value very, very easy to find, or error out. Weighing both in both contexts.
Alright, from this discussion it looks like more people are pro-erroring-out. WDYT about making them arguments of the exportable class, though, to simplify? Instead of having to set model.generation_config.cache_config, which is a bit awkward.
Following this discussion, I removed the defaults and added batch_size, max_cache_len, and device as args. In order to not break the current API, if those arguments are not passed, we check the cache_config for corresponding values, and error out if they are not found.
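Conceptually, the resolution order described here (explicit argument, then cache_config, then an error) can be sketched like this; the helper itself is illustrative, only the argument names come from the PR.

```python
def _resolve(name, explicit_value, cache_config):
    """Prefer the explicit constructor argument, then cache_config, otherwise error out."""
    if explicit_value is not None:
        return explicit_value
    if name in cache_config:
        return cache_config[name]
    raise ValueError(
        f"`{name}` was not passed to the exportable class and is not set in "
        "`generation_config.cache_config`; please provide it explicitly."
    )

# Usage inside the exportable class __init__ (sketch):
# batch_size = _resolve("batch_size", batch_size, cache_config)
# max_cache_len = _resolve("max_cache_len", max_cache_len, cache_config)
```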
Force-pushed from 1e6f647 to d5d2f66
[For maintainers] Suggested jobs to run (before merge): run-slow: gemma, gemma2, gemma3, qwen2
Perfect thanks! I think this version makes everyone happy!
With recent changes introduced in #39836, the tests related to executorch.py, such as test_export_text_only_with_hybrid_cache, will fail if the model generation_config has no cache_config or if any of the required attributes is missing from cache_config. This is the case for models such as https://huggingface.co/google/gemma-3-4b-it, which is already used in the gemma3 model tests.

This PR aims to remove the requirement of cache_config by restoring the old default values, albeit while throwing a warning so that users are aware these attributes are missing from the config. This will ensure backwards compatibility for old models without having to edit their hub config.

cc @jackzhxng @zucchini-nlp as this relates to #39836