-
Notifications
You must be signed in to change notification settings - Fork 6.4k
[Sana] Add Sana, including SanaPipeline
, SanaPAGPipeline
, LinearAttentionProcessor
, Flow-based DPM-sovler
and so on.
#9982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# Conflicts: # src/diffusers/models/normalization.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
without complex human instruction: with: is it possible there is something wrong with the CHI implementation here? it makes all images worse. for example with CHI enabled it's putting 508 tokens of input through the model instead of just 300 (206 from CHI plus the 300 prompt tokens (padded) and i don't know why we need this many tokens. is it supposed to be 300 total? |
What’s your inference code? @bghira |
we use |
What's your prompt? @bghira |
@hlky Would you like to give the changes to schedulers here a review? I'm preparing to merge it shortly after I add the integration tests in the next hour since YiYi has approved and confirmed on Slack. I've tested all the normal models (not the multilingual ones) and they seem to work well (I did the conversions myself when testing, but for the integration tests, I will be using the remote checkpoints and match slices). I have not exhaustively tested all scheduler changes though - only DPMSolverMultistep and FlowMatchEulerDiscrete, but I think that should be okay since it is copied logic (from |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@a-r-r-o-w Scheduler changes look good, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @lawrence-cj and team! The paper was very insightful and it was very cool to come across the ideas developed.
Thanks for bearing with our reviews too! Will merge the PR once the CI passes
Thank you so much for your effort! Love you guys. I was stuck by other things, sorry for the late reply! ! |
…AttentionProcessor`, `Flow-based DPM-sovler` and so on. (#9982) * first add a script for DC-AE; * DC-AE init * replace triton with custom implementation * 1. rename file and remove un-used codes; * no longer rely on omegaconf and dataclass * replace custom activation with diffuers activation * remove dc_ae attention in attention_processor.py * iinherit from ModelMixin * inherit from ConfigMixin * dc-ae reduce to one file * update downsample and upsample * clean code * support DecoderOutput * remove get_same_padding and val2tuple * remove autocast and some assert * update ResBlock * remove contents within super().__init__ * Update src/diffusers/models/autoencoders/dc_ae.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * remove opsequential * update other blocks to support the removal of build_norm * remove build encoder/decoder project in/out * remove inheritance of RMSNorm2d from LayerNorm * remove reset_parameters for RMSNorm2d Co-authored-by: YiYi Xu <yixu310@gmail.com> * remove device and dtype in RMSNorm2d __init__ Co-authored-by: YiYi Xu <yixu310@gmail.com> * Update src/diffusers/models/autoencoders/dc_ae.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * Update src/diffusers/models/autoencoders/dc_ae.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * Update src/diffusers/models/autoencoders/dc_ae.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * remove op_list & build_block * remove build_stage_main * change file name to autoencoder_dc * move LiteMLA to attention.py * align with other vae decode output; * add DC-AE into init files; * update * make quality && make style; * quick push before dgx disappears again * update * make style * update * update * fix * refactor * refactor * refactor * update * possibly change to nn.Linear * refactor * make fix-copies * replace vae with ae * replace get_block_from_block_type to get_block * replace downsample_block_type from Conv to conv for consistency * add scaling factors * incorporate changes for all checkpoints * make style * move mla to attention processor file; split qkv conv to linears * refactor * add tests * from original file loader * add docs * add standard autoencoder methods * combine attention processor * fix tests * update * minor fix * minor fix * minor fix & in/out shortcut rename * minor fix * make style * fix paper link * update docs * update single file loading * make style * remove single file loading support; todo for DN6 * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * add abstract * 1. add DCAE into diffusers; 2. make style and make quality; * add DCAE_HF into diffusers; * bug fixed; * add SanaPipeline, SanaTransformer2D into diffusers; * add sanaLinearAttnProcessor2_0; * first update for SanaTransformer; * first update for SanaPipeline; * first success run SanaPipeline; * model output finally match with original model with the same intput; * code update; * code update; * add a flow dpm-solver scripts * 🎉[important update] 1. Integrate flow-dpm-sovler into diffusers; 2. finally run successfully on both `FlowMatchEulerDiscreteScheduler` and `FlowDPMSolverMultistepScheduler`; * 🎉🔧[important update & fix huge bugs!!] 1. add SanaPAGPipeline & several related Sana linear attention operators; 2. `SanaTransformer2DModel` not supports multi-resolution input; 2. fix the multi-scale HW bugs in SanaPipeline and SanaPAGPipeline; 3. fix the flow-dpm-solver set_timestep() init `model_output` and `lower_order_nums` bugs; * remove prints; * add convert sana official checkpoint to diffusers format Safetensor. * Update src/diffusers/models/transformers/sana_transformer_2d.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/diffusers/models/transformers/sana_transformer_2d.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/diffusers/models/transformers/sana_transformer_2d.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/diffusers/pipelines/pag/pipeline_pag_sana.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/diffusers/models/transformers/sana_transformer_2d.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/diffusers/models/transformers/sana_transformer_2d.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/diffusers/pipelines/sana/pipeline_sana.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/diffusers/pipelines/sana/pipeline_sana.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * update Sana for DC-AE's recent commit; * make style && make quality * Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG (#9932) * fix progress bar updates in SD 1.5 PAG Img2Img pipeline --------- Co-authored-by: Vinh H. Pham <phamvinh257@gmail.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * make the vae can be None in `__init__` of `SanaPipeline` * Update src/diffusers/models/transformers/sana_transformer_2d.py Co-authored-by: hlky <hlky@hlky.ac> * change the ae related code due to the latest update of DCAE branch; * change the ae related code due to the latest update of DCAE branch; * 1. change code based on AutoencoderDC; 2. fix the bug of new GLUMBConv; 3. run success; * update for solving conversation. * 1. fix bugs and run convert script success; 2. Downloading ckpt from hub automatically; * make style && make quality; * 1. remove un-unsed parameters in init; 2. code update; * remove test file * refactor; add docs; add tests; update conversion script * make style * make fix-copies * refactor * udpate pipelines * pag tests and refactor * remove sana pag conversion script * handle weight casting in conversion script * update conversion script * add a processor * 1. add bf16 pth file path; 2. add complex human instruct in pipeline; * fix fast \tests * change gemma-2-2b-it ckpt to a non-gated repo; * fix the pth path bug in conversion script; * change grad ckpt to original; make style * fix the complex_human_instruct bug and typo; * remove dpmsolver flow scheduler * apply review suggestions * change the `FlowMatchEulerDiscreteScheduler` to default `DPMSolverMultistepScheduler` with flow matching scheduler. * fix the tokenizer.padding_side='right' bug; * update docs * make fix-copies * fix imports * fix docs * add integration test * update docs * update examples * fix convert_model_output in schedulers * fix failing tests --------- Co-authored-by: Junyu Chen <chenjydl2003@gmail.com> Co-authored-by: YiYi Xu <yixu310@gmail.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: chenjy2003 <70215701+chenjy2003@users.noreply.github.com> Co-authored-by: Aryan <aryan@huggingface.co> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: hlky <hlky@hlky.ac>
@lawrence-cj @hlky diffusers/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py Lines 420 to 424 in 924f880
it does not work when we pass our own timesteps here. since it can be either timesteps or num_timesteps. do i just overwrite the timesteps here? and compute sigmas and alphas from them? my timesteps are custom one's with additional sigmoid timeshifting etc?
|
What does this PR do?
This PR will add the official Sana (SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer) into the diffusers lib. Sana first makes the Text-to-Image available on 32x compressed latent space, powered by DC-AE(https://arxiv.org/abs/2410.10733v1) without performance degradation. Also, Sana contains several popular efficiency related techs, like DiT with Linear Attention processor and we use Decoder-only LLM (Gemma-2B-IT) for low GPU requirement and fast speed.
Paper: https://arxiv.org/abs/2410.10629
Original code repo: https://github.com/NVlabs/Sana
Project: https://nvlabs.github.io/Sana
Core contributor of DC-AE:
work with @johnny_ez@163.com
Core library:
We want to collaborate on this PR together with friends from HF. Feel free to contact me here. Cc: @sayakpaul, @yiyixuxu
Core library:
HF projects:
-->
Images is generated by
SanaPAGPipeline
withFlowDPMSolverMultistepScheduler