GitHub - PKU-YuanGroup/Edit-R1: Edit-R1: Reinforce Image Editing with Diffusion Negative-Aware Finetuning and MLLM Implicit Feedback
📣 News

[2025/10/19]: We release Edit-R1, which employs DiffusionNFT and a training-free reward model derived from pretrained MLLMs to fine-tune diffusion models for image editing. UniWorld-Qwen-Image-Edit-2509 and UniWorld-FLUX.1-Kontext-Dev are open-sourced.

🎨 Case Comparisons

Original Prompt Nano-banana GPT-4o Qwen-Image-Edit UniWorld-V2 (Ours)
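Case 1: Move the bird into the red box, delete the current bird, and finally remove the red box (✅ instruction executed correctly)
Case 2: Change the hand gesture of the girl in the middle, who is wearing white clothes and a mask, to an OK sign (✅ OK gesture)
Case 3: Extract the guitar from the image (✅ tuning pegs: two on top, three on the bottom)
Case 4: Change all the text at the bottom to a calligraphy font. Change the "月满中秋" in the middle to "千里团圆". Also change the moon into a blurry mooncake. (✅ blurry mooncake, ✅ calligraphy font)
Case 5: Have the character in the image sit in an upscale Western restaurant, holding a knife and fork in both hands and eating steak (✅ character features, ✅ knife and fork)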

🗝️ Train

Deploy vLLM Reward Server

Start the reward server:

python reward_server/reward_server.py

To check that the reward server is up and responding, run the test script:

python reward_server/test_reward_server.py
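
As a lighter-weight complement to the test script, the sketch below only checks that something is accepting TCP connections at the reward server address. It is not part of the repository: `parse_endpoint` and `is_reachable` are hypothetical helper names, and the default port 12341 is taken from the `REWARD_SERVER` example in the Run Training section. It says nothing about whether the reward model itself has finished loading.

```python
import os
import socket


def parse_endpoint(endpoint: str) -> tuple[str, int]:
    """Split a 'host:port' string such as '10.0.0.5:12341' into (host, port)."""
    host, _, port = endpoint.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {endpoint!r}")
    return host, int(port)


def is_reachable(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = parse_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Assumption: REWARD_SERVER uses the same 'host:port' form as in Run Training.
    endpoint = os.environ.get("REWARD_SERVER", "127.0.0.1:12341")
    status = "reachable" if is_reachable(endpoint) else "not reachable"
    print(f"{endpoint}: {status}")
```

Running this on a training node before launching `torchrun` can catch a wrong IP address or a firewall rule early, before any GPU time is spent.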

Data Format

Directory structure:

- dataset-dir
  - images/
     - YOUR_IMAGE_DATA
     - ...
  - train_metadata.jsonl
  - test_metadata.jsonl

train_metadata.jsonl and test_metadata.jsonl format:

{"prompt": "PROMPT", "image": "IMAGE_RELATIVE_PATH", "requirement": "TASK_REQUIREMENT"}
...
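
The format above can be sanity-checked before training with a short script. This is a minimal sketch, not code from the repository: `validate_metadata` is a hypothetical helper, and it assumes that the `image` field is a path relative to the dataset directory (for example `images/0001.png`); adjust the path join if your paths are relative to `images/` instead.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("prompt", "image", "requirement")


def validate_metadata(dataset_dir: str, filename: str = "train_metadata.jsonl") -> list[str]:
    """Return a list of human-readable problems found in one metadata file."""
    root = Path(dataset_dir)
    problems = []
    with open(root / filename, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"{filename}:{lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"{filename}:{lineno}: expected a JSON object")
                continue
            missing = [key for key in REQUIRED_KEYS if key not in record]
            if missing:
                problems.append(f"{filename}:{lineno}: missing keys {missing}")
                continue
            # Assumption: "image" is relative to the dataset directory.
            if not (root / record["image"]).is_file():
                problems.append(f"{filename}:{lineno}: image not found: {record['image']}")
    return problems
```

For example, `validate_metadata("dataset-dir")` followed by `validate_metadata("dataset-dir", "test_metadata.jsonl")` checks both files; an empty list means every record has the three expected keys and points at an existing image.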

Configure Training

See config/qwen_image_edit_nft.py and config/kontext_nft.py for available configurations.

Run Training

export REWARD_SERVER=[YOUR_REWARD_SERVICE_IP_ADDR]:12341

torchrun --nproc_per_node=8 \
    scripts/train_nft_qwen_image_edit.py --config config/qwen_image_edit_nft.py:config_name

You can also refer to the example scripts in examples/.

⚡️ Reproduction

For reproducibility, we provide the reproduction scripts in reproduction/.

See Reproduction Details for more information.

👍 Acknowledgement

🔒 License

See LICENSE for details. The FLUX weights fall under the FLUX.1 [dev] Non-Commercial License.

✏️ Citation

@article{li2025uniworldv2,
    title={Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback},
    author={Li, Zongjian and Liu, Zheyuan and Zhang, Qihui and Lin, Bin and Yuan, Shenghai and Yan, Zhiyuan and Ye, Yang and Yu, Wangbo and Niu, Yuwei and Yuan, Li},
    journal={arXiv preprint arXiv:2510.16888},
    year={2025}
}

@article{lin2025uniworld,
  title={Uniworld: High-resolution semantic encoders for unified visual understanding and generation},
  author={Lin, Bin and Li, Zongjian and Cheng, Xinhua and Niu, Yuwei and Ye, Yang and He, Xianyi and Yuan, Shenghai and Yu, Wangbo and Wang, Shaodong and Ge, Yunyang and others},
  journal={arXiv preprint arXiv:2506.03147},
  year={2025}
}

@article{ye2025imgedit,
  title={Imgedit: A unified image editing dataset and benchmark},
  author={Ye, Yang and He, Xianyi and Li, Zongjian and Lin, Bin and Yuan, Shenghai and Yan, Zhiyuan and Hou, Bohan and Yuan, Li},
  journal={arXiv preprint arXiv:2505.20275},
  year={2025}
}
