This is an unofficial PyTorch implementation of Image Super-Resolution via Iterative Refinement (SR3).
Some implementation details may differ from the paper's description, and therefore from the actual SR3
structure, because the paper leaves some details out. Specifically, we:
- Used the ResNet block and channel concatenation style like vanilla DDPM.
- Used the attention mechanism in low-resolution features ($16 \times 16$) like vanilla DDPM.
- Encoded $\gamma$ as the FiLM structure did in WaveGrad, and embedded it without affine transformation.
- Defined the posterior variance as $\dfrac{1-\gamma_{t-1}}{1-\gamma_{t}} \beta_t$ rather than $\beta_t$, which gives similar results to the vanilla paper.
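The posterior variance choice in the last point can be made concrete with a small numeric sketch. It assumes a linear $\beta$ schedule and takes $\gamma_t$ to be the cumulative product of $(1-\beta_s)$ with $\gamma_0 = 1$; the schedule endpoints and the function names are illustrative and are not taken from this repository.

```python
def linear_betas(n_steps, start=1e-4, end=2e-2):
    """Hypothetical linear beta schedule; the endpoints are placeholders."""
    if n_steps == 1:
        return [start]
    step = (end - start) / (n_steps - 1)
    return [start + i * step for i in range(n_steps)]


def posterior_variances(betas):
    """Return (1 - gamma_{t-1}) / (1 - gamma_t) * beta_t for t = 1..T.

    gamma_t is the cumulative product of (1 - beta_s) for s <= t, with gamma_0 = 1.
    """
    variances = []
    gamma_prev = 1.0
    for beta in betas:
        gamma = gamma_prev * (1.0 - beta)
        variances.append((1.0 - gamma_prev) / (1.0 - gamma) * beta)
        gamma_prev = gamma
    return variances


betas = linear_betas(1000)
variances = posterior_variances(betas)
```

Under these assumptions the first entry is zero (because $\gamma_0 = 1$), and every entry is at most the corresponding $\beta_t$, since the ratio in front of $\beta_t$ never exceeds 1.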
If you just want to upscale
NEW: The follow-up Palette-Image-to-Image-Diffusion-Models is now available; see the details here.
- 16×16 -> 128×128 on FFHQ-CelebaHQ
- 64×64 -> 512×512 on FFHQ-CelebaHQ
- 128×128 face generation on FFHQ
- 1024×1024 face generation by a cascade of 3 models
- log / logger
- metrics evaluation
- multi-gpu support
- resume training / pretrained model
- validate alone script
- Weights and Biases Logging (NEW)
Note: We set the maximum reverse steps budget with an Nvidia 1080Ti in mind, so image noise and hue deviation occasionally appear in high-resolution images, resulting in low scores. There is a lot of room for optimization. We welcome any contributions for more extensive experiments and code enhancements.
Tasks/Metrics | SSIM(+) | PSNR(+) | FID(-) | IS(+) |
---|---|---|---|---|
16×16 -> 128×128 | 0.675 | 23.26 | - | - |
64×64 -> 512×512 | 0.445 | 19.87 | - | - |
128×128 | - | - | | |
1024×1024 | - | - | | |
- 16×16 -> 128×128 on FFHQ-CelebaHQ [More Results]
- 64×64 -> 512×512 on FFHQ-CelebaHQ [More Results]
- 128×128 face generation on FFHQ [More Results]
pip install -r requirement.txt
This paper is based on "Denoising Diffusion Probabilistic Models", and we build both DDPM/SR3 network structures, which use timesteps/gamma as model embedding inputs, respectively. In our experiments, the SR3 model can achieve better visual results with the same reverse steps and learning rate. You can select the JSON files with annotated suffix names to train the different models.
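The difference between the two embedding inputs can be illustrated with a sinusoidal encoding, a common choice for both. This is a minimal sketch and not the repository's module: the function name `sinusoidal_encoding`, the dimension, and the example values are assumptions. A DDPM-style model would encode an integer timestep, while an SR3-style model would encode a continuous noise level derived from $\gamma$.

```python
import math


def sinusoidal_encoding(value, dim):
    """Encode a scalar as [sin(value * f_i)..., cos(value * f_i)...].

    The frequencies f_i are geometrically spaced, decreasing from 1 towards 1/10000.
    """
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return ([math.sin(value * f) for f in freqs]
            + [math.cos(value * f) for f in freqs])


# DDPM-style input: an integer timestep (example value).
timestep_embedding = sinusoidal_encoding(500, dim=8)
# SR3-style input: a continuous noise level (example value).
gamma_embedding = sinusoidal_encoding(0.37, dim=8)
```

Both calls return a vector of the same length, so the rest of the network can stay unchanged when switching between the two inputs.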
Tasks | Platform (Code: qwer) |
---|---|
16×16 -> 128×128 on FFHQ-CelebaHQ | Google Drive / Baidu Yun |
64×64 -> 512×512 on FFHQ-CelebaHQ | Google Drive / Baidu Yun |
128×128 face generation on FFHQ | Google Drive / Baidu Yun |
# Download the pretrained model and edit [sr|sample]_[ddpm|sr3]_[resolution option].json about "resume_state":
"resume_state": [your pretrained model's path]
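If you prefer to make this edit from a script, the sketch below shows one way to do it. It assumes the config is JSON with `//` comments and that `//` never appears inside a string value; the helper names, the example config, and the checkpoint path are hypothetical. It updates every existing `resume_state` key rather than assuming where the key is nested.

```python
import json


def load_json_with_comments(text):
    """Parse JSON text after dropping `//` comments.

    Assumes `//` never occurs inside a string value (for example in a URL).
    """
    stripped = "\n".join(line.split("//")[0] for line in text.splitlines())
    return json.loads(stripped)


def set_key_everywhere(node, key, value):
    """Set `key` to `value` in every nested dict that already has it; return the count."""
    count = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                count += 1
            else:
                count += set_key_everywhere(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            count += set_key_everywhere(item, key, value)
    return count


# Hypothetical config text, not a file from the repository.
example = """
{
    "name": "demo", // comment that will be stripped
    "path": {
        "resume_state": null
    }
}
"""
config = load_json_with_comments(example)
updated = set_key_everywhere(config, "resume_state", "checkpoints/my_model")
```

Writing the result back with `json.dump` would drop the original comments, so editing the file by hand may still be preferable for annotated configs.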
If you don't have the data, you can prepare it with the following steps:
Download the dataset and prepare it in LMDB or PNG format using the script.
# Resize to get 16×16 LR_IMGS and 128×128 HR_IMGS, then prepare 128×128 Fake SR_IMGS by bicubic interpolation
python data/prepare_data.py --path [dataset root] --out [output root] --size 16,128 -l
Then change the datasets config to match your data path and image resolution:
"datasets": {
    "train": {
        "dataroot": "dataset/ffhq_16_128", // [output root] in the prepare_data.py script
        "l_resolution": 16, // low resolution to be super-resolved
        "r_resolution": 128, // high resolution
        "datatype": "lmdb", // lmdb or img, path of img files
    },
    "val": {
        "dataroot": "dataset/celebahq_16_128", // [output root] in the prepare_data.py script
    }
},
You can also use your own image data with the following steps; there are some examples in the dataset folder.
First, organize the images in the layout below. data/prepare_data.py can do this step automatically:
# set the high/low resolution images, bicubic interpolation images path
dataset/celebahq_16_128/
├── hr_128 # same as the sr_16_128 directory if you don't have ground-truth images
├── lr_16 # vanilla low-resolution images
└── sr_16_128 # images ready for super-resolution
# super resolution from 16 to 128
python data/prepare_data.py --path [dataset root] --out celebahq --size 16,128 -l
Note: The above script can be used whether or not you have the vanilla high-resolution images.
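Before training, it can help to confirm that the folders exist. The sketch below derives the three folder names from the two resolutions, following the `hr_`/`lr_`/`sr_` pattern shown in the layout above; the function names are hypothetical and are not part of the repository.

```python
import os


def expected_subdirs(l_resolution, r_resolution):
    """Folder names following the hr_/lr_/sr_ naming pattern of the layout."""
    return [
        f"hr_{r_resolution}",
        f"lr_{l_resolution}",
        f"sr_{l_resolution}_{r_resolution}",
    ]


def missing_subdirs(dataroot, l_resolution, r_resolution):
    """Return the expected subfolders that do not exist under dataroot."""
    return [
        name
        for name in expected_subdirs(l_resolution, r_resolution)
        if not os.path.isdir(os.path.join(dataroot, name))
    ]
```

For example, `missing_subdirs("dataset/celebahq_16_128", 16, 128)` returns an empty list when all three folders are present.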
Then change the dataset config to match your data path and image resolution:
"datasets": {
    "train|val": { // train and validation part
        "dataroot": "dataset/celebahq_16_128",
        "l_resolution": 16, // low resolution to be super-resolved
        "r_resolution": 128, // high resolution
        "datatype": "img", // lmdb or img, path of img files
    }
},
# Use sr.py and sample.py to train the super resolution task and unconditional generation task, respectively.
# Edit json files to adjust network structure and hyperparameters
python sr.py -p train -c config/sr_sr3.json
# Edit json to add the pretrained model path and run the evaluation
python sr.py -p val -c config/sr_sr3.json
# Quantitative evaluation alone using SSIM/PSNR metrics on given result root
python eval.py -p [result root]
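For reference, PSNR can be computed from the mean squared error between two images. The following is a minimal sketch of the standard formula on flat pixel sequences, assuming 8-bit values (maximum 255); it is not the implementation used by eval.py, and the function name is illustrative.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return float("inf")  # identical inputs
    return 20.0 * math.log10(max_value / math.sqrt(mse))
```

Higher values indicate a closer match to the reference, which is why the results table marks PSNR with (+).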
Set the image path as described in the Own Data steps, then run the script:
# run the script
python infer.py -c [config file]
The library now supports experiment tracking, model checkpointing and model prediction visualization with Weights and Biases. You will need to install W&B and log in with your access token.
pip install wandb
# get your access token from wandb.ai/authorize
wandb login
W&B logging functionality is added to the sr.py, sample.py and infer.py files. You can pass -enable_wandb to start logging.
- -log_wandb_ckpt: Pass this argument along with -enable_wandb to save model checkpoints as W&B Artifacts. Both sr.py and sample.py support model checkpointing.
- -log_eval: Pass this argument along with -enable_wandb to save the evaluation result as interactive W&B Tables. Note that only sr.py supports this feature. If you run sample.py in eval mode, the generated images will automatically be logged as an image media panel.
- -log_infer: While running infer.py, pass this argument along with -enable_wandb to log the inference results as interactive W&B Tables.
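The way these flags combine can be sketched with a small stand-in parser. This is not the argument parser from sr.py, sample.py or infer.py; it only reuses the four flag names listed above and assumes that the three logging flags have no effect unless -enable_wandb is also given. The function names are hypothetical.

```python
import argparse


def build_parser():
    """Stand-in parser that only knows the four W&B flags (illustrative)."""
    parser = argparse.ArgumentParser(description="Hypothetical stand-in parser")
    for flag in ("-enable_wandb", "-log_wandb_ckpt", "-log_eval", "-log_infer"):
        parser.add_argument(flag, action="store_true")
    return parser


def active_wandb_features(args):
    """Return the logging features that take effect, assuming they need -enable_wandb."""
    if not args.enable_wandb:
        return []
    optional = ("log_wandb_ckpt", "log_eval", "log_infer")
    return [name for name in optional if getattr(args, name)]


args = build_parser().parse_args(["-enable_wandb", "-log_eval"])
features = active_wandb_features(args)
```

With the arguments above, only the evaluation-table logging is active; dropping -enable_wandb leaves nothing active under this sketch's assumption.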
You can find more on using these features here.
Our work is based on the following theoretical works:
- Denoising Diffusion Probabilistic Models
- Image Super-Resolution via Iterative Refinement
- WaveGrad: Estimating Gradients for Waveform Generation
- Large Scale GAN Training for High Fidelity Natural Image Synthesis
Furthermore, we have benefited a lot from the following projects: