Fix bnb fsdp loading for pre-quantized checkpoint #41415
Conversation
cc @winglian

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Left a few comments about naming for clarity, otherwise LGTM!
src/transformers/modeling_utils.py
Outdated
val_kwargs = value.__dict__
if value.dtype in [torch.uint8, torch.int8]:
Maybe just value.is_floating_point() if that works?
That should work! I think at some point it should even be fine to remove that check entirely, if the modules are correctly initialized.
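A minimal illustration of the check being discussed, assuming `value` is a plain `torch.Tensor` (in the real code it is a bnb parameter subclass):

```python
import torch

# Packed quantized storage is typically uint8/int8; regular weights are floating point.
quantized = torch.empty(4, dtype=torch.uint8)
regular = torch.empty(4, dtype=torch.float16)

# Original-style check: enumerate the integer dtypes used for quantized storage.
print(quantized.dtype in [torch.uint8, torch.int8])  # True
print(regular.dtype in [torch.uint8, torch.int8])    # False

# Suggested simplification: quantized storage is simply "not floating point".
print(not quantized.is_floating_point())             # True
print(not regular.is_floating_point())               # False
```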
src/transformers/modeling_utils.py
Outdated
if value.dtype in [torch.uint8, torch.int8]:
val_kwargs["requires_grad"] = False
value = type(value)(value.data.to(param_to), **val_kwargs, **value.__dict__)
param_to = "meta" if is_fsdp_enabled() and not is_local_dist_rank_0() else "cpu"
Let's just call it device IMO, param_to is a bit weird
done
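For context, a rough sketch of that device-selection line after the rename, assuming the `is_fsdp_enabled` / `is_local_dist_rank_0` helpers from `transformers.modeling_utils`; the surrounding code in the PR may differ:

```python
import torch
from transformers.modeling_utils import is_fsdp_enabled, is_local_dist_rank_0

# Under FSDP, only local rank 0 materializes pre-quantized params on CPU;
# the other ranks keep them on the meta device.
device = "meta" if is_fsdp_enabled() and not is_local_dist_rank_0() else "cpu"

# Stand-in for a packed pre-quantized weight loaded from the checkpoint.
value = torch.empty(8, dtype=torch.uint8)
value = value.to(device)
```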
def update_param_name(self, param_name: str) -> str:
    """
Let's maybe call it get_param_name instead as it does not update it
done
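A hypothetical sketch of what the renamed helper could look like; the suffix handling below is an assumption about how bitsandbytes serializes quantization state, not the PR's actual implementation:

```python
class _ExampleQuantizer:
    # Hypothetical rename of update_param_name: the method only *returns* a
    # (possibly rewritten) name, so "get" describes it better than "update".
    def get_param_name(self, param_name: str) -> str:
        """Return the model-side name for a checkpoint parameter key."""
        # Assumption for illustration: 4-bit checkpoints store quant stats under
        # keys like "...weight.quant_state.bitsandbytes__nf4"; map them back to
        # the owning "weight" parameter.
        if ".quant_state." in param_name:
            return param_name.split(".quant_state.")[0]
        return param_name


print(_ExampleQuantizer().get_param_name("model.layers.0.q_proj.weight.quant_state.bitsandbytes__nf4"))
# -> "model.layers.0.q_proj.weight"
```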
# special case for gpt_oss model, we wait for the param to be leave the meta device before casting it to cpu
if model.config.model_type == "gpt_oss" and value.device.type == "meta":
# We need to wait until the quantized value is created
if value.device.type == "meta":
Still a bit weird to me that we have to do this, but I wanted to investigate further anyway to remove the gpt-oss special exception - already happy to see it a bit more general and not gpt-oss-specific!
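A schematic sketch of the generalized check (not the PR's exact code, helper name invented): any parameter still on the meta device is skipped, since its quantized value has not been created yet:

```python
import torch

def maybe_move_to_cpu(value: torch.Tensor) -> torch.Tensor:
    # Illustration only: a meta tensor carries no data yet, so there is
    # nothing to cast; wait until the quantized value has been materialized.
    if value.device.type == "meta":
        return value
    return value.to("cpu")

print(maybe_move_to_cpu(torch.empty(2, device="meta")).device)  # meta
print(maybe_move_to_cpu(torch.ones(2)).device)                  # cpu
```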
[For maintainers] Suggested jobs to run (before merge): run-slow: mxfp4
* fix
* fix
* get_param_name
* fix device name
What does this PR do?
This PR fixes bnb loading when using FSDP for pre-quantized checkpoints. It broke because we changed how we load quantized checkpoints: we now need to cache all the quantization stats before creating the quantized weight.
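A rough sketch of that caching idea, with invented names (this is not the PR's actual helper): the packed weight and its quantization stats arrive as separate checkpoint entries, so they are buffered per weight until every expected piece is present and the quantized parameter can be built.

```python
import torch

# Illustration only; all names are invented for this sketch. Entries belonging
# to one quantized weight (the packed uint8 data plus stats such as absmax)
# are cached per weight; the quantized parameter is only created once every
# expected piece has been seen (the real code would build a bnb parameter).
_cache: dict[str, dict[str, torch.Tensor]] = {}

def add_entry(weight_name: str, part: str, tensor: torch.Tensor, expected_parts: set[str]):
    bucket = _cache.setdefault(weight_name, {})
    bucket[part] = tensor
    if expected_parts.issubset(bucket):
        # Everything is available: hand back the pieces for quantized-weight creation.
        return _cache.pop(weight_name)
    return None  # still waiting for more pieces

pieces = add_entry("layer.weight", "data", torch.empty(8, dtype=torch.uint8), {"data", "absmax"})
pieces = add_entry("layer.weight", "absmax", torch.ones(1), {"data", "absmax"})
print(sorted(pieces))  # ['absmax', 'data'] -> ready to build the quantized weight
```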