Description
Describe the bug
When I run the script train_dreambooth_lora_flux.py, it raises ValueError: unexpected save model: <class 'deepspeed.runtime.engine.DeepSpeedEngine'>. Is there a bug in save_model_hook?
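To illustrate the suspected cause, here is a minimal sketch with stand-in classes, not the script's actual code. It assumes the hook decides what to save with an `isinstance` check against the bare model class, and that the wrapper exposes the wrapped model as `.module`. `Transformer`, `EngineWrapper`, `unwrap`, and `classify` are hypothetical names used only for illustration.

```python
class Transformer:
    """Stand-in for the real transformer model class (hypothetical)."""


class EngineWrapper:
    """Stand-in for a wrapper such as DeepSpeedEngine (hypothetical).

    Assumption: the wrapper keeps the original model in `.module`.
    """

    def __init__(self, module):
        self.module = module


def unwrap(model):
    """Peel off every wrapper layer that exposes the inner model as `.module`."""
    while hasattr(model, "module"):
        model = model.module
    return model


def classify(model, expected_type, unwrap_first):
    """Mimic the type check a save hook might perform on each model."""
    candidate = unwrap(model) if unwrap_first else model
    if isinstance(candidate, expected_type):
        return "transformer"
    raise ValueError(f"unexpected save model: {model.__class__}")


wrapped = EngineWrapper(Transformer())

# Checking the wrapper itself fails, like the error reported here.
try:
    classify(wrapped, Transformer, unwrap_first=False)
    strict_result = "no error"
except ValueError as err:
    strict_result = str(err)

# Unwrapping before the check lets the same object pass.
unwrapped_result = classify(wrapped, Transformer, unwrap_first=True)

print(strict_result)
print(unwrapped_result)
```

If the hook really does receive the DeepSpeed-wrapped object, unwrapping each model before the type check might avoid the error, but I have not verified this against the training script.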
Reproduction
accelerate launch train_dreambooth_lora_flux_custom.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--mixed_precision="bf16" \
--instance_prompt="bedroom, YF_CN style" \
--resolution=1024 \
--train_batch_size=1 \
--guidance_scale=1 \
--gradient_accumulation_steps=4 \
--optimizer="prodigy" \
--learning_rate=1. \
--report_to="tensorboard" \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_train_epochs=30 \
--validation_prompt="bedroom, YF_CN style" \
--validation_epochs=80 \
--checkpointing_steps=500 \
--seed="0" \
--gradient_checkpointing \
--use_8bit_adam \
--rank=4
Logs
No response
System Info
torch==2.3.1
accelerate==0.34.2
deepspeed==0.15.1+8ac42ed7
diffusers==0.31.0.dev0
default_config.yaml is as follows:
compute_environment: LOCAL_MACHINE
debug: true
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
enable_cpu_affinity: false
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false