Enable dreambooth lora finetune example on other devices by jiqing-feng · Pull Request #10602 · huggingface/diffusers · GitHub

Conversation

@jiqing-feng
Contributor

@jiqing-feng jiqing-feng commented Jan 17, 2025

This PR makes two main changes to enable the example on other devices:

  1. Replace torch.cuda.amp.autocast() with torch.amp.autocast(device) so other devices can use it as well (see the sketch after this list).
  2. Empty the cache for XPU.
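
A minimal sketch of the device-agnostic pattern being described (the device_type selection below is an assumption for illustration; the actual script derives the device from accelerate):

```python
import torch

# Pick a device type string that torch.amp.autocast understands.
if torch.cuda.is_available():
    device_type = "cuda"
elif hasattr(torch, "xpu") and torch.xpu.is_available():
    device_type = "xpu"
else:
    device_type = "cpu"

# torch.amp.autocast takes the device type, so the same code path works on
# CUDA, XPU, and CPU instead of being tied to torch.cuda.amp.autocast.
with torch.amp.autocast(device_type):
    pass  # forward pass / inference goes here
```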

@jiqing-feng jiqing-feng marked this pull request as draft January 17, 2025 06:56
@kaixuanliu
Contributor

LGTM

@jiqing-feng jiqing-feng marked this pull request as ready for review January 20, 2025 01:04
@jiqing-feng
Contributor Author

Hi @sayakpaul, would you please review this PR? Thanks!

Member

@sayakpaul sayakpaul left a comment


Thanks for the contribution! I just left some comments, LMK if they make sense.

Comment on lines 836 to 838
torch.cuda.empty_cache()
if hasattr(torch, "xpu") and torch.xpu.is_available():
    torch.xpu.empty_cache()
Member


Same as above.
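
For reference, a fully device-agnostic variant of the snippet above could be factored into a small helper (the name clear_device_cache is hypothetical and not part of this PR):

```python
import gc

import torch


def clear_device_cache() -> None:
    """Hypothetical helper: release cached allocator memory on whichever accelerator is present."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    elif hasattr(torch, "xpu") and torch.xpu.is_available():
        torch.xpu.empty_cache()
```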

@jiqing-feng
Contributor Author

Hi @sayakpaul, I have addressed your comments. For mixed precision, I am not sure, because we have both torch_dtype and weight_dtype here. In my view, mixed precision should apply to the LoRA module but not to the SD base model, since the base model's weights have already been cast to the mixed-precision weight_dtype.
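
Roughly, the dtype split being described looks like this sketch (split_precision is a hypothetical helper for illustration, not the script's actual code; the real example works with weight_dtype and the LoRA parameters' requires_grad flags):

```python
import torch


def split_precision(model: torch.nn.Module, device: torch.device, weight_dtype: torch.dtype) -> None:
    """Hypothetical helper: keep the frozen base model in weight_dtype (e.g. fp16/bf16)
    while the trainable LoRA parameters stay in float32 for mixed-precision training."""
    model.to(device, dtype=weight_dtype)
    for param in model.parameters():
        if param.requires_grad:  # only the LoRA adapter weights are trainable
            param.data = param.data.to(torch.float32)
```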

Member

@sayakpaul sayakpaul left a comment


Just one comment and we should be good to go.

Have you verified if it works effectively? If so, could you share some results?

@jiqing-feng
Contributor Author

jiqing-feng commented Jan 21, 2025

Just one comment and we should be good to go.

Have you verified if it works effectively? If so, could you share some results?

For CUDA, the change only enables DDP finetuning; the results are unchanged from before.

The result on an Intel Xeon CPU is:
[sample image from the Intel Xeon CPU run]

Sorry, I cannot release the latency performance data because it is not allowed by Intel.

@sayakpaul
Member

For CUDA, the change only enables DDP finetuning; the results are unchanged from before.

What do you mean by this? I don't see the dog from the training set appear here. Am I missing something?

@jiqing-feng
Contributor Author

For CUDA, the change only enables DDP finetuning; the results are unchanged from before.

What do you mean by this? I don't see the dog from the training set appear here. Am I missing something?

Sorry, I didn't upload the CUDA result image successfully. I will run it again to get the result and give you feedback ASAP.

@jiqing-feng
Contributor Author

Hi @sayakpaul, here is the CUDA result, run on 2x A100 cards:
[sample image from the 2x A100 run]

@sayakpaul
Member

And how about the XPU result?

@jiqing-feng
Contributor Author

And how about the XPU result?

Just got the XPU result, with DDP across 2 cards:
[sample image from the XPU run]

The XPU run still needs some argument tuning to reach a high-quality result.

@sayakpaul
Member

Nice, this is good.

@jiqing-feng
Contributor Author

The failed test seems unrelated to my changes. Please let me know if anything else needs to change, or request other reviewers before merging. Thanks!

@sayakpaul sayakpaul merged commit 012d08b into huggingface:main Jan 21, 2025
11 of 12 checks passed
@sayakpaul
Member

Thanks much!

