Description
Describe the bug
AttributeError when running model parallel distributed training with accelerate
Reproduction
```shell
accelerate launch --config_file train_dreambooth_lora_flux.py \
  --resolution=1024 \
  --mixed_precision=bf16 \
  --pretrained_model_name_or_path=black-forest-labs/FLUX.1-dev \
  --num_validation_images=8 \
  --validation_epochs=100 \
  --rank=16 \
  --train_batch_size=1 \
  --learning_rate=1e-4 \
  --guidance_scale=3.5 \
  --checkpointing_steps=200 \
  --instance_prompt=xyz \
  --instance_data_dir=xyz \
  --output_dir=xyz \
  --logging_dir=xyz \
  --validation_prompt=xyz
```
accelerate config:

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
use_cpu: false
gpu_ids: '0, 1'
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
```
Logs

```
if transformer.config.guidance_embeds:
AttributeError: 'DistributedDataParallel' object has no attribute 'config'
```

System Info
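The error occurs because under `MULTI_GPU` the model is wrapped in `torch.nn.parallel.DistributedDataParallel`, which does not forward custom attributes like `.config` to the underlying module; the wrapped model must be unwrapped (e.g. via `accelerator.unwrap_model(transformer)` or `transformer.module`) before reading `config`. The toy sketch below reproduces the attribute-hiding behavior with a stand-in wrapper class (`DDPLikeWrapper` and `ToyTransformer` are hypothetical names for illustration, not diffusers or accelerate APIs):

```python
class DDPLikeWrapper:
    """Toy stand-in for DistributedDataParallel: it stores the wrapped
    model in `.module` and does NOT forward arbitrary attributes such
    as `.config`, which is what triggers the AttributeError above."""
    def __init__(self, module):
        self.module = module


class ToyTransformer:
    """Minimal model with a `.config` attribute, like a diffusers transformer."""
    class Config:
        guidance_embeds = True

    config = Config()


def unwrap_model(model):
    # Mirrors the unwrapping idea behind accelerate's
    # `Accelerator.unwrap_model`: peel off wrappers via `.module`.
    while hasattr(model, "module"):
        model = model.module
    return model


wrapped = DDPLikeWrapper(ToyTransformer())
# `wrapped.config` would raise AttributeError, reproducing the bug;
# unwrapping first restores access to the config:
print(hasattr(wrapped, "config"))                        # False
print(unwrap_model(wrapped).config.guidance_embeds)      # True
```

In the training script, the analogous fix is to call `accelerator.unwrap_model(transformer)` (a real accelerate method) before `transformer.config.guidance_embeds` is checked.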
diffusers from source
accelerate==0.33.0
transformers==4.44.1
training on A100s
Who can help?
No response