KEMBAR78
Fix tests fsdp by SunMarc · Pull Request #41422 · huggingface/transformers · GitHub
Skip to content

Conversation

@SunMarc
Copy link
Member

@SunMarc SunMarc commented Oct 7, 2025

What does this PR do?

This PR fixes all fsdp tests as there were some regressions:

  • fsdp typing changed so hf parser had issues
  • fused_adamw_torch (new default for users with torch >=2.8) + fp16 + FSDP don't work cc @winglian @qgallouedec did you face this issue ?
  • Fix cmd by using fsdp_config, this was broken for way too long

Run tests with 2 gpus

RUN_SLOW=True pytest tests/fsdp/test_fsdp.py

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc SunMarc added the for patch Tag issues / labels that should be included in the next patch label Oct 7, 2025
@cyyever
Copy link
Contributor

cyyever commented Oct 8, 2025

What issues are in hf parser?

@SunMarc
Copy link
Member Author

SunMarc commented Oct 8, 2025

What issues are in hf parser?

It has trouble dealing with union of different types and there were some strange cases involving optional. As you can see in this pr, --fsdp arg was incorrectly parsed. We need to fix the parser at some point ;)

@cyyever
Copy link
Contributor

cyyever commented Oct 8, 2025

@SunMarc The current behaviour was made by me. Now if not specified fsdp_config is []. What tests are now broken so I have a chance to look and fix? We also have to fix union in parser because Optional[A] can be represented as A | None in newer typing.

@SunMarc
Copy link
Member Author

SunMarc commented Oct 8, 2025

If you are interested in fixing the parser, you can use play with the following script and use for example --fsdp arg to see if the trainings_args are correctly parsed ! Without this PR for example, you still see that
python script.py --fsdp "full_shard auto_wrap offload" will fail.

from transformers import TrainingArguments, HfArgumentParser

parser = HfArgumentParser((TrainingArguments,))
training_args, = parser.parse_args_into_dataclasses()

The tests should be fixed by this PR but this is not a proper fix tbh

@cyyever
Copy link
Contributor

cyyever commented Oct 8, 2025

@SunMarc Yes, give me time to reproduce and fix.

Copy link
Member

@Cyrilvallez Cyrilvallez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Trusting you here for fsdp!

@SunMarc SunMarc merged commit add4df6 into main Oct 9, 2025
29 checks passed
@SunMarc SunMarc deleted the fix-fsdp-tests branch October 9, 2025 12:09
AhnJoonSung pushed a commit to AhnJoonSung/transformers that referenced this pull request Oct 12, 2025
* Fix tests

* fix !

* fix
Cyrilvallez pushed a commit that referenced this pull request Oct 14, 2025
* Fix tests

* fix !

* fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

for patch Tag issues / labels that should be included in the next patch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants