²School of Computer Science, Peking University
³Computer Center, Peking University
- [2025/10/02] We updated the paper with comparison and visualization results against Flow-GRPO, Flow-DPO (offline), and Flow-DPO (online) on SD3.5-M LoRA!
- [2025/07/30] We released the model checkpoint fine-tuned from FLUX.1 Dev with the MixGRPO algorithm, using HPSv2, ImageReward, and Pick Score as multi-rewards!
- [2025/07/30] We released the paper and code!
- Add a comparison with Flow-GRPO and update our technical report.
conda create -n MixGRPO python=3.12
conda activate MixGRPO
sudo yum install -y pdsh pssh mesa-libGL # centos
bash env_setup.sh
The environment dependency is basically the same as DanceGRPO.
Download the FLUX HuggingFace repository to "./data/flux":
mkdir ./data/flux
huggingface-cli login
huggingface-cli download --resume-download black-forest-labs/FLUX.1-dev --local-dir ./data/flux
Clone the HPSv2 repository:
git clone https://github.com/tgxs002/HPSv2.git
Download the "HPS_v2.1_compressed.pt"
and "open_clip_model.safetensors"
to "./hps_ckpt"
mkdir hps_ckpt
huggingface-cli login
huggingface-cli download --resume-download xswu/HPSv2 HPS_v2.1_compressed.pt --local-dir ./hps_ckpt/
huggingface-cli download --resume-download laion/CLIP-ViT-H-14-laion2B-s32B-b79K open_clip_pytorch_model.bin --local-dir ./hps_ckpt/
Run the demo code below to automatically download the PickScore model to "~/.cache/huggingface":
python fastvideo/models/reward_model/pick_score.py \
    --device cuda \
    --http_proxy <Your HTTP_PROXY> \
    --https_proxy <Your HTTPS_PROXY>
# --http_proxy and --https_proxy default to None; omit them if no proxy is needed.
Down the "ImageReward.pt"
and "med_config.json"
to "./image_reward_ckpt"
huggingface-cli login
huggingface-cli download --resume-download THUDM/ImageReward med_config.json --local-dir ./image_reward_ckpt/
huggingface-cli download --resume-download THUDM/ImageReward ImageReward.pt --local-dir ./image_reward_ckpt/
Run the demo code below to automatically download the CLIP model to "~/.cache/huggingface":
python fastvideo/models/reward_model/clip_score.py \
    --device cuda \
    --http_proxy <Your HTTP_PROXY> \
    --https_proxy <Your HTTPS_PROXY>
# --http_proxy and --https_proxy default to None; omit them if no proxy is needed.
Adjust the prompt_path parameter in "./scripts/preprocess/preprocess_flux_rl_embeddings.sh" to obtain the embeddings of the prompt dataset.
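For example, the variable might simply point at the bundled HPDv2 training prompts (the path below is the repository default mentioned later in this README; double-check the exact variable name inside the script):

# In scripts/preprocess/preprocess_flux_rl_embeddings.sh (illustrative)
prompt_path="./data/prompts.txt"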
bash scripts/preprocess/preprocess_flux_rl_embeddings.sh
The training dataset consists of the training prompts from HPDv2, as shown in "./data/prompts.txt".
We use the "pdsh"
command for multi-node training with "torchrun"
. The default resource configuration consists of 4 nodes, each with 8 GPUs, totaling 32 GPUs.
First, set your multi-node IPs in "data/hosts/hostfile".
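The hostfile format depends on your launcher setup; as a minimal sketch, it might just list one worker IP per line (the addresses below are placeholders):

# data/hosts/hostfile (placeholder IPs; replace with your own nodes)
192.168.0.1
192.168.0.2
192.168.0.3
192.168.0.4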
Then, run the following script to set the environment variable INDEX_CUSTOM on each node to 0, 1, 2, and 3, respectively.
bash scripts/preprocess/set_env_multinode.sh
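As an optional sanity check (assuming pdsh can reach every host in the hostfile and the variable is exported into the remote shells' environment), you can print the value on each node:

# Should print 0, 1, 2, and 3, one value per node.
pdsh -w ^data/hosts/hostfile 'echo $INDEX_CUSTOM'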
Next, set wandb_key to your Weights & Biases (WandB) API key in "./scripts/finetune/finetune_flux_grpo_FastGRPO.sh".
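Inside that script, this is just a shell variable assignment, roughly like the following (the value is a placeholder for your own API key):

# In scripts/finetune/finetune_flux_grpo_FastGRPO.sh (illustrative)
wandb_key="<your WandB API key>"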
Finally, run the following training script:
bash scripts/finetune/finetune_flux_grpo_FastGRPO.sh
The test dataset likewise consists of the test prompts from HPDv2, as shown in "./data/prompts_test.txt".
First, download the MixGRPO model weights "diffusion_pytorch_model.safetensors" to the "./mix_grpo_ckpt" directory.
mkdir mix_grpo_ckpt
huggingface-cli login
huggingface-cli download --resume-download tulvgengenr/MixGRPO diffusion_pytorch_model.safetensors --local-dir ./mix_grpo_ckpt/
Then, adjust the input parameters in "scripts/inference/inference_flux.sh" (currently set to defaults) and execute the single-node script below.
bash scripts/inference/inference_flux.sh
In "scripts/evaluate/eval_reward.sh", set prompt_file to the path of the JSON file generated during inference, then run the single-node script below.
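The assignment in the evaluation script looks roughly like this (the path is a placeholder for whatever your inference run produced):

# In scripts/evaluate/eval_reward.sh (illustrative)
prompt_file="<path to the JSON file generated by inference_flux.sh>"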
bash scripts/evaluate/eval_reward.sh
We are deeply grateful to the following GitHub repositories; their valuable code and efforts have been incredibly helpful:
MixGRPO is licensed under the License Terms of MixGRPO. See ./License.txt for more details.
If you find MixGRPO useful for your research and applications, please cite using this BibTeX:
@misc{li2025mixgrpounlockingflowbasedgrpo,
title={MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE},
author={Junzhe Li and Yutao Cui and Tao Huang and Yinping Ma and Chun Fan and Miles Yang and Zhao Zhong},
year={2025},
eprint={2507.21802},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2507.21802},
}