Star us if you find this project useful! ⭐
- [10/2025] 🔥 HuggingFace Space Demo is online! Try it out!
- [10/2025] 🔥 Model checkpoints, MultiID-Bench, and MultiID-2M are released!
- [10/2025] 🔥 Codebase and Project Page are released!
- Inference scripts
- WithAnyone - FLUX.1
- WithAnyone.K.preview - FLUX.1 Kontext
- WithAnyone.Ke.preview - FLUX.1 Kontext
- WithAnyone - FLUX.1 Kontext
- MultiID-Bench
- MultiID-2M Part 1
- MultiID-2M Part 2
- WithAnyone - FLUX.1 Krea
- Training codebase (As soon as the repo reaches 1k stars)
Highlights of WithAnyone
- Controllable: WithAnyone aims to mitigate "copy-paste" artifacts in face generation. Previous methods tend to copy and paste the reference face directly onto the generated image, leading to poor controllability of expressions, hairstyles, accessories, and even poses. They fall into a clear trade-off between similarity and copy-paste: the more similar the generated face is to the reference, the more copy-paste artifacts it has. WithAnyone is an attempt to break this trade-off.
- Multi-ID Generation: WithAnyone can generate multiple given identities in a single image. With the help of controllable face generation, all generated faces can fit harmoniously in one group photo.
Model | Description | Download |
---|---|---|
WithAnyone 1.0 - FLUX.1 | Main model with FLUX.1 | HuggingFace |
WithAnyone.K.preview - FLUX.1 Kontext | For t2i generation with FLUX.1 Kontext | HuggingFace |
WithAnyone.Ke.preview - FLUX.1 Kontext | For face-editing with FLUX.1 Kontext | HuggingFace |
If you just want to try it out, please use the base model WithAnyone - FLUX.1. The other models are for the following use cases:
WithAnyone.K
This is a preliminary version of WithAnyone with FLUX.1 Kontext. It can be used for text-to-image generation with multiple given identities. However, stability and quality are not as good as the base model. Please use it with caution. We are working on improving it.
WithAnyone.Ke
This is a face editing version of WithAnyone with FLUX.1 Kontext, leveraging the editing capabilities of FLUX.1 Kontext. Please use it with `gradio_edit.py` instead of `gradio_app.py`. It is still a preliminary version, and we are working on improving it.
Use `pip install -r requirements.txt` to install the necessary packages.
You can download the necessary model checkpoints in one of the two ways:
- Directly run the inference scripts. The checkpoints will be downloaded automatically by the `hf_hub_download` function in the code to your `$HF_HOME` (default: `~/.cache/huggingface`).
- Use `huggingface-cli download <repo name>` to download:
  - `black-forest-labs/FLUX.1-dev`
  - `xlabs-ai/xflux_text_encoders`
  - `openai/clip-vit-large-patch14`
  - `google/siglip-base-patch16-256-i18n`
  - `withanyone/withanyone`
  Then run the inference scripts. You can download only the checkpoints you need to speed up setup and save disk space.
Example for `black-forest-labs/FLUX.1-dev`:
huggingface-cli download black-forest-labs/FLUX.1-dev flux1-dev.safetensors
huggingface-cli download black-forest-labs/FLUX.1-dev ae.safetensors
Ignore the text encoder in the `black-forest-labs/FLUX.1-dev` model repo (it is there for `diffusers` calls). All checkpoints together require about 51 GB of disk space (~40 GB in hub and ~10 GB in xet).
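For readers who prefer to prefetch from Python, the following is a minimal sketch that uses `hf_hub_download` for the two FLUX.1-dev files named above. The helper name `fetch_flux_files` and its injectable `download` parameter are assumptions made for illustration (the parameter exists only so the function can be exercised offline); neither is part of the WithAnyone codebase.

```python
FLUX_REPO = "black-forest-labs/FLUX.1-dev"
FLUX_FILES = ("flux1-dev.safetensors", "ae.safetensors")


def fetch_flux_files(download=None):
    """Fetch the FLUX.1-dev files listed above and return {filename: local_path}.

    `download` is a hypothetical injection point. When omitted, it falls back
    to huggingface_hub.hf_hub_download, which caches files under $HF_HOME.
    """
    if download is None:
        # Imported lazily so this module loads even without huggingface_hub.
        from huggingface_hub import hf_hub_download as download
    return {name: download(repo_id=FLUX_REPO, filename=name) for name in FLUX_FILES}
```

Calling `fetch_flux_files()` with no arguments needs network access and may require a logged-in Hugging Face account. The returned paths can then be passed to the inference scripts.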
After downloading, set the following arguments in the inference script to the local paths of the downloaded checkpoints:
--flux_path <path to flux1-dev.safetensors>
--clip_path <path to clip-vit-large-patch14>
--t5_path <path to xflux_text_encoders>
--siglip_path <path to siglip-base-patch16-256-i18n>
--ipa_path <path to withanyone>
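As a hedged sketch of how these arguments could be assembled, the hypothetical helper below (`build_path_args` is not part of the repository) takes a mapping from flag name to local path, stops early if any path is missing, and returns an argument list suitable for `subprocess.run`. Only the flag names come from this README; everything else is an assumption.

```python
from pathlib import Path

# Flag names as listed above.
PATH_FLAGS = ("flux_path", "clip_path", "t5_path", "siglip_path", "ipa_path")


def build_path_args(paths):
    """Return ['--flux_path', '<path>', ...] after checking that every path exists."""
    missing = [flag for flag in PATH_FLAGS if flag not in paths]
    if missing:
        raise KeyError(f"no path given for: {', '.join(missing)}")
    args = []
    for flag in PATH_FLAGS:
        path = Path(paths[flag]).expanduser()
        if not path.exists():
            raise FileNotFoundError(f"--{flag} points to a missing path: {path}")
        args += [f"--{flag}", str(path)]
    return args
```

The returned list could be appended to `["python", "gradio_app.py"]` so that a mistyped checkpoint path fails immediately rather than after the models start loading.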
mv models/antelopev2/ models/antelopev2_
mv models/antelopev2_/antelopev2/ models/antelopev2/
rm -rf models/antelopev2_ antelopev2.zip
The Gradio GUI demo is a good starting point to experiment with WithAnyone. Run it with:
python gradio_app.py --flux_path <path to flux1-dev directory> --ipa_path <path to withanyone directory> \
--clip_path <path to clip-vit-large-patch14> \
--t5_path <path to xflux_text_encoders> \
--siglip_path <path to siglip-base-patch16-256-i18n> \
--model_type "flux-dev" # or "flux-kontext" for WithAnyone.K
❗ WithAnyone requires face bounding boxes (bboxes). You should provide them to indicate where faces are. You can provide face bboxes in two ways:
- Upload an example image with desired face locations in `Mask Configuration (Option 1: Automatic)`. The face bboxes will be extracted automatically, and faces will be generated in the same locations. Do not worry if the given image has a different resolution or aspect ratio; the face bboxes will be resized accordingly.
- Input face bboxes directly in `Mask Configuration (Option 2: Manual)`. The format is `x1,y1,x2,y2` for each face, one per line.
- (NOT recommended) Leave both options empty, and the face bboxes will be randomly chosen from a pre-defined set.
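To make the manual format concrete, here is a minimal sketch that parses `x1,y1,x2,y2` lines and rescales boxes from a reference image size to the output size. The function names and the simple proportional rescaling are assumptions for illustration; the demo's own parsing and resizing code may differ.

```python
def parse_bboxes(text):
    """Parse one 'x1,y1,x2,y2' box per non-empty line into integer tuples."""
    boxes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between boxes
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 4:
            raise ValueError(f"line {line_no}: expected 4 values, got {len(parts)}")
        x1, y1, x2, y2 = (int(float(p)) for p in parts)
        if x2 <= x1 or y2 <= y1:
            raise ValueError(f"line {line_no}: box must satisfy x1 < x2 and y1 < y2")
        boxes.append((x1, y1, x2, y2))
    return boxes


def rescale_bboxes(boxes, src_size, dst_size):
    """Scale boxes from a (width, height) source image to a (width, height) target."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return [
        (round(x1 * sx), round(y1 * sy), round(x2 * sx), round(y2 * sy))
        for x1, y1, x2, y2 in boxes
    ]


boxes = parse_bboxes("100,80,200,220\n300,90,400,230")
print(rescale_bboxes(boxes, src_size=(1024, 1024), dst_size=(512, 512)))
# prints [(50, 40, 100, 110), (150, 45, 200, 115)]
```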
⭕ WithAnyone works well with LoRA. If you have any stylized LoRA checkpoints, pass `--additional_lora_ckpt <path to lora checkpoint>` when launching the demo. The LoRA will be merged into the diffusion model.
python gradio_app.py --flux_path <path to flux1-dev directory> --ipa_path <path to withanyone directory> \
--additional_lora_ckpt <path to lora checkpoint> \
--lora_scale 0.8 # adjust the weight as needed
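For intuition about what merging with a scale means, the sketch below applies the generic LoRA update, W' = W + scale * (B A), to a tiny weight matrix using plain Python lists. This is the textbook formulation and only an assumption about what `--lora_scale` controls; the repository's merge code may apply additional factors such as alpha/rank. The function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_down, lora_up, scale):
    """Return weight + scale * (lora_up @ lora_down) without modifying the inputs."""
    delta = matmul(lora_up, lora_down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


weight = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 base weight
lora_up = [[1.0], [2.0]]           # 2x1 factor (rank 1)
lora_down = [[3.0, 4.0]]           # 1x2 factor
print(merge_lora(weight, lora_down, lora_up, scale=0.5))
# prints [[2.5, 2.0], [3.0, 5.0]]
```

Under this formulation, a smaller scale keeps the merged weights closer to the base model, which is why lowering the value weakens the style.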
⭕ In `Advanced Options`, there is a slider controlling whether outputs are more "similar in spirit" or "similar in form" to the reference faces.
- Move the slider to the right to preserve more details in the reference image (expression, makeup, accessories, hairstyle, etc.). Identity will also be better preserved.
- Move it to the left for more freedom and creativity. Stylization can be stronger, and hairstyle and makeup can be changed.
How the slider works and some tips
The slider controls the relative weights of the SigLIP embedding and the ArcFace embedding. The former preserves more mid-level semantic details, while the latter preserves more high-level identity information. SigLIP is a general image embedding model that captures more than just faces, while ArcFace is a face-specific embedding model that captures only identity information.
When using a high ArcFace weight (slider to the left), add more description of the identity to the prompt, since the ArcFace embedding may lose information such as hairstyle, skin color, body build, and age.
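The relationship between the slider and the two weights can be sketched as follows. This assumes a slider value in [0, 1] whose right end corresponds to a higher SigLIP weight, and assumes the two weights sum to 1, which is the usual setting suggested for batch inference. The function name `slider_to_weights` is hypothetical, and the demo's actual mapping may differ.

```python
def slider_to_weights(slider):
    """Map a slider position in [0, 1] to (siglip_weight, id_weight).

    Assumed convention: 0.0 is the far left (ArcFace identity weight only),
    1.0 is the far right (SigLIP weight only).
    """
    if not 0.0 <= slider <= 1.0:
        raise ValueError("slider must be between 0 and 1")
    siglip_weight = float(slider)
    id_weight = 1.0 - siglip_weight
    return siglip_weight, id_weight


for position in (0.0, 0.25, 1.0):
    siglip_w, id_w = slider_to_weights(position)
    print(f"--siglip_weight {siglip_w} --id_weight {id_w}")
```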
Be prepared: the first few runs may not be very satisfying.
- Provide detailed prompts describing the identity. WithAnyone is "controllable", so it needs more information to be controlled. Here are some things that might go wrong if not specified:
  - Skin color (generally the race is fine, but for people of Asian descent, if not specified, it may generate a darker skin tone);
  - Age (e.g., instead of "a man", try "a young man"; if not specified, it may generate an older figure);
  - Body build;
  - Hairstyle;
  - Accessories (glasses, hats, earrings, etc.);
  - Makeup.
- Use the slider to balance between "Resemblance in Spirit" and "Resemblance in Form" according to your needs. If you want to preserve more details in the reference image, move the slider to the right; if you want more freedom and creativity, move it to the left.
- Try it with LoRAs from the community. They are usually fantastic.
You can use `infer_withanyone.py` for batch inference. The script supports generating multiple images with MultiID-Bench.
Download from HuggingFace.
huggingface-cli download WithAnyone/MultiID-Bench --repo-type dataset --local-dir <path to MultiID-Bench directory>
Then convert the parquet file to a folder of images and a JSON file using `MultiID_Bench/parquet2bench.py`:
python MultiID_Bench/parquet2bench.py --parquet <path to local dir> --output_dir <path to output directory>
You will get a folder with the following structure:
<output_dir>/
├── p1/untar
├── p2/untar
├── p3/
├── p1.json
├── p2.json
└── p3.json
# --use_matting: set to True when siglip_weight > 0.0
# --siglip_weight: Resemblance in Spirit vs. Resemblance in Form; higher means more similar to the reference
# --id_weight: usually set to 1 - siglip_weight; higher means more controllable
python infer_withanyone.py \
--eval_json_path <path to MultiID-Bench subset json> \
--data_root <path to MultiID-Bench subset images> \
--save_path <path to save results> \
--use_matting True \
--siglip_weight 0.0 \
--id_weight 1.0 \
--t5_path <path to xflux_text_encoders> \
--clip_path <path to clip-vit-large-patch14> \
--ipa_path <path to withanyone> \
--flux_path <path to flux1-dev>
Here, `data_root` should be `p1/untar`, `p2/untar`, or `p3/`, depending on which subset you want to evaluate, and `eval_json_path` should be the corresponding JSON file converted from the parquet file.
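The mapping from subset name to these two arguments can be written down in a few lines. The helper `subset_paths` is hypothetical; it only encodes the folder layout shown above, and `bench_out` is a placeholder directory name.

```python
from pathlib import Path

# Image folder for each subset, relative to the conversion output directory.
SUBSET_DATA_DIRS = {"p1": "p1/untar", "p2": "p2/untar", "p3": "p3"}


def subset_paths(output_dir, subset):
    """Return (eval_json_path, data_root) for one MultiID-Bench subset."""
    if subset not in SUBSET_DATA_DIRS:
        raise ValueError(
            f"unknown subset {subset!r}; expected one of {sorted(SUBSET_DATA_DIRS)}"
        )
    root = Path(output_dir)
    return root / f"{subset}.json", root / SUBSET_DATA_DIRS[subset]


eval_json_path, data_root = subset_paths("bench_out", "p1")
print("--eval_json_path", eval_json_path, "--data_root", data_root)
```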
You can use `gradio_edit.py` for face editing with FLUX.1 Kontext and WithAnyone.Ke.
python gradio_edit.py --flux_path <path to flux1-dev directory> --ipa_path <path to withanyone directory> \
--clip_path <path to clip-vit-large-patch14> \
--t5_path <path to xflux_text_encoders> \
--siglip_path <path to siglip-base-patch16-256-i18n> \
--model_type "flux-kontext"
The code of WithAnyone is released under the Apache License 2.0, while the WithAnyone model and associated datasets are made available solely for non-commercial academic research purposes.
- License Terms: The WithAnyone model is distributed under the FLUX.1 [dev] Non-Commercial License v1.1.1. All underlying base models remain governed by their respective original licenses and terms, which shall continue to apply in full. Users must comply with all such applicable licenses when using this project.
- Permitted Use: This project may be used for lawful academic research, analysis, and non-commercial experimentation only. Any form of commercial use, redistribution for profit, or application that violates applicable laws, regulations, or ethical standards is strictly prohibited.
- User Obligations: Users are solely responsible for ensuring that their use of the model and dataset complies with all relevant laws, regulations, institutional review policies, and third-party license terms.
- Disclaimer of Liability: The authors, developers, and contributors make no warranties, express or implied, regarding the accuracy, reliability, or fitness of this project for any particular purpose. They shall not be held liable for any damages, losses, or legal claims arising from the use or misuse of this project, including but not limited to violations of law or ethical standards by end users.
- Acceptance of Terms: By downloading, accessing, or using this project, you acknowledge and agree to be bound by the applicable license terms and legal requirements, and you assume full responsibility for all consequences resulting from your use.
We thank the following prior art for their excellent open source work:
If you find this project useful in your research, please consider citing:
@article{xu2025withanyone,
title={WithAnyone: Towards Controllable and ID-Consistent Image Generation},
author={Hengyuan Xu and Wei Cheng and Peng Xing and Yixiao Fang and Shuhan Wu and Rui Wang and Xianfang Zeng and Gang Yu and Xinjun Ma and Yu-Gang Jiang},
journal={arXiv preprint arXiv:2510.14975},
year={2025}
}