KEMBAR78
deepspeed train flux1 dreambooth lora can not save model · Issue #9393 · huggingface/diffusers · GitHub
Skip to content

deepspeed train flux1 dreambooth lora can not save model #9393

@ldtgodlike

Description

@ldtgodlike

Describe the bug

when I run the script train_dreambooth_lora_flux.py. It raise ValueError: unexpected save model: <class 'deepspeed.runtime.engine.DeepSpeedEngine'>. something bug in save_model_hook?
Uploading image.png…

Reproduction

accelerate launch train_dreambooth_lora_flux_custom.py
--pretrained_model_name_or_path=$MODEL_NAME
--instance_data_dir=$INSTANCE_DIR
--output_dir=$OUTPUT_DIR
--mixed_precision="bf16"
--instance_prompt="bedroom, YF_CN style"
--resolution=1024
--train_batch_size=1
--guidance_scale=1
--gradient_accumulation_steps=4
--optimizer="prodigy"
--learning_rate=1.
--report_to="tensorboard"
--lr_scheduler="constant"
--lr_warmup_steps=0
--num_train_epochs=30
--validation_prompt="bedroom, YF_CN style"
--validation_epochs=80
--checkpointing_steps=500
--seed="0"
--gradient_checkpointing
--use_8bit_adam
--rank=4

Logs

No response

System Info

torch==2.3.1
accelerate==0.34.2
deepspeed==0.15.1+8ac42ed7
diffusers==0.31.0.dev0

default_config.yaml as follow:

compute_environment: LOCAL_MACHINE
debug: true
deepspeed_config:
gradient_accumulation_steps: 1
gradient_clipping: 1.0
offload_optimizer_device: none
offload_param_device: none
zero3_init_flag: false
zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
enable_cpu_affinity: false
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: fals

Who can help?

@sayakpaul

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions