[LoRA] feat: lora support for SANA. #10234
Conversation
return noise, input_ids, pipeline_inputs

@unittest.skip("Not supported in Sana.")
Skipped tests are the same as Mochi.
| "prompt": "", | ||
| "negative_prompt": "", |
Check this internal thread:
https://huggingface.slack.com/archives/C065E480NN9/p1734324025408149
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
So I was not requested for review, but I saw the latest commit in my email notifications. I had shared this concern in a previous LoRA refactor PR and in this comment. This is because I often find myself having to refer to the documentation for different pipelines instead of just being able to use one consistent parameter name to pass the LoRA scale with, and it is frustrating because you wait for the pipeline to load only to find it fail immediately. I'm not sure if others resonate with this, but anyone using LoRAs often will have faced this. We have ...
Yeah, I don't mind.
Thanks for the super fast work! Looks good to merge after some of the more important reviews are addressed.
vae.to(dtype=torch.float32)
transformer.to(accelerator.device, dtype=weight_dtype)
# because Gemma2 is particularly suited for bfloat16.
text_encoder.to(dtype=torch.bfloat16)
I think we could instead load with torch_dtype=torch.bfloat16 and keep the same comment. This is because weight casting this way ignores _keep_modules_in_fp32. I could not get our numerical precision unit tests to match when using the two different ways while working on the integration PR.
Oh this is coming from the example provided in https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana#diffusers.SanaPipeline.__call__.example. In this case, we're doing the exact same thing and we are not fine-tuning the text encoder.
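For reference, a minimal sketch of the two loading approaches being compared, assuming the text encoder is loaded from the `text_encoder` subfolder with a generic `transformers` auto class; the class and arguments the training script actually uses may differ.

```python
import torch
from transformers import AutoModel

# Hypothetical loading code for illustration; the script's exact class and
# arguments may differ.
model_id = "Efficient-Large-Model/Sana_1600M_1024px_diffusers"

# Suggested approach: pass the dtype at load time so any modules the loader is
# configured to keep in fp32 are respected while the weights are loaded.
text_encoder = AutoModel.from_pretrained(
    model_id, subfolder="text_encoder", torch_dtype=torch.bfloat16
)

# Current approach: load first, then blanket-cast every parameter with `.to()`,
# which bypasses any keep-in-fp32 handling.
text_encoder = AutoModel.from_pretrained(model_id, subfolder="text_encoder")
text_encoder = text_encoder.to(dtype=torch.bfloat16)
```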
)
# VAE should always be kept in fp32 for SANA (?)
vae.to(dtype=torch.float32)
FP32 should be good, but I'm not 100% sure. I think the AutoencoderDC models were all trained in bf16. Maybe @lawrence-cj can comment.
This is just to be sure the VAE's precision isn't a bottleneck for getting good-quality training runs. It's a small VAE anyway, so it won't matter too much, I guess.
Yes. AutoencoderDC is trained in BF16, and FP32 testing is also fine; it will just cost a lot of additional GPU memory in FP32.
We're offloading it to CPU when it's not in use, and when cache_latents is supplied through the CLI, we precompute the latents and delete the VAE. So I guess it's okay for now?
I mean that it's the VAE.decode() call, not the VAE model itself, that will consume much more GPU memory if it runs in FP32, especially when the batch_size is more than 1. Not sure if I understand right.
Oh I think we should be good as we barely make use of decode() in training.
OK, that's cool. Then the only remaining concern is when we visualize the training results during training, since that does go through decode().
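To make the cache_latents point above concrete, here is a rough sketch of the idea; `vae`, `accelerator`, `train_dataloader`, and `args.cache_latents` are stand-ins for the script's actual objects and flags, and the exact output attribute of the encoder may differ.

```python
import gc
import torch

# Rough sketch of the `cache_latents` path described above.
latents_cache = []
if args.cache_latents:
    vae.to(accelerator.device, dtype=torch.float32)
    with torch.no_grad():
        for batch in train_dataloader:
            pixel_values = batch["pixel_values"].to(accelerator.device, dtype=torch.float32)
            # Only the encoder is needed during training; decode() never runs here.
            latents_cache.append(vae.encode(pixel_values).latent.cpu())
    # The VAE is no longer needed, so free its memory entirely.
    del vae
    gc.collect()
    torch.cuda.empty_cache()
```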
clean_caption: bool = False,
max_sequence_length: int = 300,
complex_human_instruction: Optional[List[str]] = None,
lora_scale: Optional[float] = None,
Are we training the text encoder? If not, we can maybe remove these changes.
This was so there are no surprises for our users when text encoder training support is merged. It's common for the encode_prompt() method to be equipped to handle lora_scale.
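For context, a sketch of the common pattern referred to here, where encode_prompt() temporarily applies the LoRA scale to the text encoder using the diffusers PEFT helpers; the function body below is illustrative and not the pipeline's exact code.

```python
import torch
from diffusers.utils import USE_PEFT_BACKEND, scale_lora_layers, unscale_lora_layers

def encode_prompt(text_encoder, tokenizer, prompt, device, lora_scale=None, max_sequence_length=300):
    # Temporarily apply the requested LoRA scale to the text encoder's LoRA layers.
    if lora_scale is not None and USE_PEFT_BACKEND:
        scale_lora_layers(text_encoder, lora_scale)

    inputs = tokenizer(
        prompt,
        padding="max_length",
        max_length=max_sequence_length,
        truncation=True,
        return_tensors="pt",
    ).to(device)
    with torch.no_grad():
        prompt_embeds = text_encoder(inputs.input_ids, attention_mask=inputs.attention_mask)[0]

    # Undo the scaling so subsequent calls see the original weights.
    if lora_scale is not None and USE_PEFT_BACKEND:
        unscale_lora_layers(text_encoder, lora_scale)
    return prompt_embeds
```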
Co-authored-by: Aryan <aryan@huggingface.co>
@a-r-r-o-w your comments have been addressed. @lawrence-cj could you review / test the training script if you want?
thanks!
Failing tests are unrelated and can safely be ignored. Will add a training test in a follow-up PR.
Working on it. Will fine-tune the model using your pokemon dataset.
* feat: lora support for SANA.
* make fix-copies
* rename test class.
* attention_kwargs -> cross_attention_kwargs.
* Revert "attention_kwargs -> cross_attention_kwargs." This reverts commit 23433bf.
* exhaust 119 max line limit
* sana lora fine-tuning script.
* readme
* add a note about the supported models.
* Apply suggestions from code review (Co-authored-by: Aryan <aryan@huggingface.co>)
* style
* docs for attention_kwargs.
* remove lora_scale from pag pipeline.
* copy fix

Co-authored-by: Aryan <aryan@huggingface.co>
What does this PR do?
Example LoRA fine-tuning command:
Notes
mixed_precision="fp16" is leading to NaN loss values despite the recommendation to use FP16 for "Efficient-Large-Model/Sana_1600M_1024px_diffusers".
Results
https://wandb.ai/sayakpaul/dreambooth-sana-lora/runs/tf9fo8o6
Pre-trained LoRA: https://huggingface.co/sayakpaul/yarn_art_lora_sana
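As a quick way to try the pre-trained LoRA linked above, a minimal inference sketch (the prompt, dtype, and device choices are illustrative):

```python
import torch
from diffusers import SanaPipeline

pipeline = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Loading LoRA weights into SanaPipeline is what this PR enables.
pipeline.load_lora_weights("sayakpaul/yarn_art_lora_sana")

image = pipeline(prompt="a puppy, yarn art style").images[0]
image.save("yarn_art_puppy.png")
```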