Avoid grad sync on each step even when doing accumulation by KohakuBlueleaf · Pull Request #1064 · kohya-ss/sd-scripts

Conversation

@KohakuBlueleaf
Contributor

Refer to the official document: https://huggingface.co/docs/accelerate/concept_guides/gradient_synchronization

We only need to sync gradients when we are about to update the weights, but the fix by @Isotr0py forces a gradient sync on every batch, even when no weight update happens.

I want to confirm with @Isotr0py that my modification is safe. I tried it on my 2-GPU machine and it works.
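
For reference, this is the pattern the linked Accelerate guide recommends; a minimal sketch with placeholder model/data/loss names, not the actual sd-scripts training loop:

```python
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=2)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    # accumulate() wraps non-update steps in DDP's no_sync(), so the
    # gradient all-reduce only happens on the step that updates weights.
    with accelerator.accumulate(model):
        loss = compute_loss(model, batch)  # placeholder loss function
        accelerator.backward(loss)
        # On accumulation steps Accelerate turns step()/zero_grad()
        # into no-ops, so they can be called unconditionally here.
        optimizer.step()
        optimizer.zero_grad()
```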

@Isotr0py
Contributor

Isotr0py commented Jan 23, 2024

Thanks for the correction! I forgot to test with gradient accumulation before. I think your modification is safe.

I tested with gradient_accumulation_steps=2 and the outputs seem to be OK:

Rank 1, weight: 0.00012163435894763097, grad: -6.936413110558703e-10, sync:False, step=2
Rank 0, weight: 0.00012163435894763097, grad: 1.6429112292826176e-08, sync:False, step=2

Rank 1, weight: 0.00012163435894763097, grad: -1.8629542353210127e-13, sync:True, step=3
Rank 0, weight: 0.00012163435894763097, grad: -1.8629542353210127e-13, sync:True, step=3

Rank 1, weight: 0.0001216398595715873, grad: -9.085246333029318e-09, sync:False, step=4
Rank 0, weight: 0.0001216398595715873, grad: -2.3792381398379803e-08, sync:False, step=4

Rank 0, weight: 0.0001216398595715873, grad: -3.675208466551172e-13, sync:True, step=5
Rank 1, weight: 0.0001216398595715873, grad: -3.675208466551172e-13, sync:True, step=5

Rank 0, weight: 0.00012164600775577128, grad: -1.2868744292404699e-08, sync:False, step=6
Rank 1, weight: 0.00012164600775577128, grad: -7.038276628179574e-09, sync:False, step=6

Rank 1, weight: 0.00012164600775577128, grad: -1.0241067976285434e-12, sync:True, step=7
Rank 0, weight: 0.00012164600775577128, grad: -1.0241067976285434e-12, sync:True, step=7

Rank 0, weight: 0.00012166520900791511, grad: -6.624031811952591e-08, sync:False, step=8
Rank 1, weight: 0.00012166520900791511, grad: -5.537489045082111e-08, sync:False, step=8

Rank 0, weight: 0.00012166520900791511, grad: -1.0303240465664443e-12, sync:True, step=9
Rank 1, weight: 0.00012166520900791511, grad: -1.0303240465664443e-12, sync:True, step=9

Rank 1, weight: 0.00012176889867987484, grad: -3.346940502524376e-07, sync:False, step=9
Rank 0, weight: 0.00012176889867987484, grad: -1.471926225349307e-07, sync:False, step=9

Rank 1, weight: 0.00012176889867987484, grad: -5.062247062509462e-12, sync:True, step=10
Rank 0, weight: 0.00012176889867987484, grad: -5.062247062509462e-12, sync:True, step=10
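
As expected, sync is False on accumulation steps (the two ranks hold different local gradients) and True on update steps, where the all-reduced gradient is identical across ranks. A hypothetical sketch of instrumentation that could print lines in this shape; the step counter, parameter choice, and loss function are assumptions, not the actual test script:

```python
with accelerator.accumulate(model):
    loss = compute_loss(model, batch)  # placeholder
    accelerator.backward(loss)
    p = next(model.parameters())
    # accelerator.sync_gradients reports whether this step all-reduced
    # gradients across processes; process_index identifies the rank.
    print(
        f"Rank {accelerator.process_index}, "
        f"weight: {p.detach().mean().item()}, "
        f"grad: {p.grad.mean().item()}, "
        f"sync:{accelerator.sync_gradients}, step={step}"
    )
    optimizer.step()
    optimizer.zero_grad()
```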

@KohakuBlueleaf
Contributor Author

@Isotr0py Thanks for your test!!

@kohya-ss
Owner

Thank you for this PR. It looks good!

@kohya-ss kohya-ss merged commit 7a20df5 into kohya-ss:dev Jan 23, 2024
nana0304 pushed a commit to nana0304/sd-scripts that referenced this pull request Jun 4, 2025
Avoid grad sync on each step even when doing accumulation