Avoid grad sync on each step even when doing accumulation by KohakuBlueleaf · Pull Request #1064 · kohya-ss/sd-scripts

Conversation

@KohakuBlueleaf
Contributor

Refer to the official document: https://huggingface.co/docs/accelerate/concept_guides/gradient_synchronization

We only need to sync gradients when we are about to update the weights, but the fix by @Isotr0py forces a gradient sync on every batch, even when no weight update happens.

I want to confirm with @Isotr0py that my modification is safe. I tried it on my 2-GPU machine and it works.
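
For reference, this is the pattern the linked Accelerate guide recommends; a minimal sketch with placeholder model/data/loss names, not the actual sd-scripts training loop:

```python
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=2)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    # accumulate() wraps non-update steps in DDP's no_sync(), so the
    # gradient all-reduce only happens on the step that updates weights.
    with accelerator.accumulate(model):
        loss = compute_loss(model, batch)  # placeholder loss function
        accelerator.backward(loss)
        # On accumulation steps Accelerate turns step()/zero_grad()
        # into no-ops, so they can be called unconditionally here.
        optimizer.step()
        optimizer.zero_grad()
```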

@Isotr0py
Contributor

Isotr0py commented Jan 23, 2024

Thanks for the correction! I forgot to test with gradient accumulation before. I think your modification is safe.

I tested with gradient_accumulation_steps=2 and the outputs seem to be OK:

Rank 1, weight: 0.00012163435894763097, grad: -6.936413110558703e-10, sync:False, step=2
Rank 0, weight: 0.00012163435894763097, grad: 1.6429112292826176e-08, sync:False, step=2

Rank 1, weight: 0.00012163435894763097, grad: -1.8629542353210127e-13, sync:True, step=3
Rank 0, weight: 0.00012163435894763097, grad: -1.8629542353210127e-13, sync:True, step=3

Rank 1, weight: 0.0001216398595715873, grad: -9.085246333029318e-09, sync:False, step=4
Rank 0, weight: 0.0001216398595715873, grad: -2.3792381398379803e-08, sync:False, step=4

Rank 0, weight: 0.0001216398595715873, grad: -3.675208466551172e-13, sync:True, step=5
Rank 1, weight: 0.0001216398595715873, grad: -3.675208466551172e-13, sync:True, step=5

Rank 0, weight: 0.00012164600775577128, grad: -1.2868744292404699e-08, sync:False, step=6
Rank 1, weight: 0.00012164600775577128, grad: -7.038276628179574e-09, sync:False, step=6

Rank 1, weight: 0.00012164600775577128, grad: -1.0241067976285434e-12, sync:True, step=7
Rank 0, weight: 0.00012164600775577128, grad: -1.0241067976285434e-12, sync:True, step=7

Rank 0, weight: 0.00012166520900791511, grad: -6.624031811952591e-08, sync:False, step=8
Rank 1, weight: 0.00012166520900791511, grad: -5.537489045082111e-08, sync:False, step=8

Rank 0, weight: 0.00012166520900791511, grad: -1.0303240465664443e-12, sync:True, step=9
Rank 1, weight: 0.00012166520900791511, grad: -1.0303240465664443e-12, sync:True, step=9

Rank 1, weight: 0.00012176889867987484, grad: -3.346940502524376e-07, sync:False, step=9
Rank 0, weight: 0.00012176889867987484, grad: -1.471926225349307e-07, sync:False, step=9

Rank 1, weight: 0.00012176889867987484, grad: -5.062247062509462e-12, sync:True, step=10
Rank 0, weight: 0.00012176889867987484, grad: -5.062247062509462e-12, sync:True, step=10
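
As expected, sync is False on accumulation steps (the two ranks hold different local gradients) and True on update steps, where the all-reduced gradient is identical across ranks. A hypothetical sketch of instrumentation that could print lines in this shape; the step counter, parameter choice, and loss function are assumptions, not the actual test script:

```python
with accelerator.accumulate(model):
    loss = compute_loss(model, batch)  # placeholder
    accelerator.backward(loss)
    p = next(model.parameters())
    # accelerator.sync_gradients reports whether this step all-reduced
    # gradients across processes; process_index identifies the rank.
    print(
        f"Rank {accelerator.process_index}, "
        f"weight: {p.detach().mean().item()}, "
        f"grad: {p.grad.mean().item()}, "
        f"sync:{accelerator.sync_gradients}, step={step}"
    )
    optimizer.step()
    optimizer.zero_grad()
```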

@KohakuBlueleaf
Contributor Author

@Isotr0py Thanks for your test!!

@kohya-ss
Owner

Thank you for this PR. It looks good!

@kohya-ss kohya-ss merged commit 7a20df5 into kohya-ss:dev Jan 23, 2024
nana0304 pushed a commit to nana0304/sd-scripts that referenced this pull request Jun 4, 2025
Avoid grad sync on each step even when doing accumulation