Structure-Aware Motion Deblurring Using Multi-Adversarial Optimized CycleGAN

Yang Wen, Jie Chen, Bin Sheng, Member, IEEE, Zhihua Chen, Ping Li, Member, IEEE, Ping Tan, Senior Member, IEEE, and Tong-Yee Lee, Senior Member, IEEE

Abstract— Recently, Convolutional Neural Networks (CNNs) have achieved great improvements in blind image motion deblurring. However, most existing image deblurring methods require a large amount of paired training data and fail to maintain satisfactory structural information, which greatly limits their application scope. In this paper, we present an unsupervised image deblurring method based on a multi-adversarial optimized cycle-consistent generative adversarial network (CycleGAN). Although the original CycleGAN can handle unpaired training data well, the generated high-resolution images are prone to losing content and structure information. To solve this problem, we utilize a multi-adversarial mechanism based on CycleGAN for blind motion deblurring to generate high-resolution images iteratively. In this multi-adversarial manner, the hidden layers of the generator are gradually supervised, and implicit refinement is carried out to continuously generate high-resolution images. Meanwhile, we also introduce a structure-aware mechanism that enhances the structure and detail retention ability of the multi-adversarial network by taking the edge map as guidance information and adding multi-scale edge constraint functions. Our approach not only avoids the strict need for paired training data and the errors caused by blur kernel estimation, but also maintains structural information better through multi-adversarial learning and the structure-aware mechanism. Comprehensive experiments on several benchmarks show that our approach surpasses the state-of-the-art methods for blind image motion deblurring.

Index Terms— Unsupervised image deblurring, multi-adversarial, structure-aware, edge refinement.

Manuscript received December 12, 2020; revised May 15, 2021; accepted June 15, 2021. Date of publication July 2, 2021; date of current version July 9, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 61872241 and Grant 61572316; in part by The Hong Kong Polytechnic University under Grant P0030419, Grant P0030929, and Grant P0035358; and in part by the Ministry of Science and Technology, Taiwan, under Grant 108-2221-E-006-038-MY3. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jiaying Liu. (Corresponding authors: Bin Sheng; Zhihua Chen.)
Yang Wen and Bin Sheng are with the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: shengbin@sjtu.edu.cn).
Jie Chen is with the Samsung Electronics (China) Research and Development Centre, Nanjing 210012, China (e-mail: ada.chen@samsung.com).
Zhihua Chen is with the Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China (e-mail: czh@ecust.edu.cn).
Ping Li is with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong (e-mail: p.li@polyu.edu.hk).
Ping Tan is with the School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada (e-mail: pingtan@sfu.ca).
Tong-Yee Lee is with the Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan 70101, Taiwan (e-mail: tonylee@mail.ncku.edu.tw).
This article has supplementary downloadable material available at https://doi.org/10.1109/TIP.2021.3092814, provided by the authors.
Digital Object Identifier 10.1109/TIP.2021.3092814

I. INTRODUCTION

MOTION blur is a painful problem when taking photos with lightweight devices like mobile phones. Because of inevitable factors in the image acquisition process, especially under poor lighting conditions, image quality is degraded to undesired blurry images. The image motion deblurring problem is to restore the sharp image from a given blurry image [2]–[5]. There are mainly two types of image deblurring methods: blind and non-blind deblurring. Many works have focused on non-blind deblurring in recent years, which are based on the assumption that the blur function is known beforehand, like blur caused by camera shake. However, it is a severely ill-posed problem to find the blur kernel for every pixel. Aiming at the problem of non-blind image deblurring, some methods parameterize the blur model according to the assumed blur source. In [6], Whyte et al. assume that the blurs are only caused by the movement of 3D cameras, but this assumption does not always hold in practice. Recently, CNNs have shown strong semantic analysis ability and have been widely used in blind image deblurring. In [7], Madam et al. propose an architecture that consists of an autoencoder to learn the data prior and an adversarial network to generate and discriminate between the sharp and blurred features. In [8], Schuler et al. describe how to use a trainable model to learn blind deconvolution. In [9], Xu et al. propose a model that contains two stages, suppressing extraneous details and enhancing sharp edges. In [10], Nah et al. propose a multi-scale convolutional neural network (CNN) for blind image deblurring.

Although significant improvements have been made by the emergence of deep learning, three major challenges still stand in the way of the blind motion deblurring problem. (1) Missing handcrafted yet critical prior features: Deep CNNs often ignore the traditional manual features based on statistical prior knowledge for image deblurring. Previous studies [11], [12] have shown that the traditional manual features are very important for image deblurring. (2) Obsolete handling of multi-scale deblurring: Although the multi-scale architecture has long been used to solve the deblurring problem [10], it may emphasize high-level semantic information and underestimate the key role of underlying features in deblurring. (3) Limited training data: The traditional motion deblurring methods always aim to find the cause of blurring and estimate the approximate blur kernel so as to obtain the training data. The estimation method often has a certain error, so the generated blurred training data can only cover several general categories.

In addition, training data must contain pairs of blurred and sharp images [10], [13]–[15], which are often quite difficult to obtain in reality. Moreover, there is a large distribution difference between synthesized and real blurred images, so the generalization of a network model trained on sharp images and their synthesized blurred counterparts needs to be further improved.

To relax the paired training data requirement, various unsupervised CNN-based methods have been proposed. Nimisha et al. [16] propose an unsupervised generative adversarial network (GAN) based method with an additional reblur loss and a multi-scale gradient loss. Although this method shows good performance on synthetic data, it only targets a specific blur type and cannot achieve a satisfactory effect on real blurred images. Other existing unsupervised GAN-based methods for image-to-image translation mainly involve learning the mapping from the blurred image domain to the sharp image domain, such as CycleGAN [1] and the discover generative adversarial network (DiscoGAN) [17]. In this paper, we choose CycleGAN [1], which is well known for its unpaired image-to-image translation, in place of the previous network models. We take advantage of CycleGAN to treat blurred images and sharp images as two different data distributions to overcome the paired training data problem mentioned above. Based on CycleGAN, a more flexible deblurring effect can be achieved with an unpaired image dataset than with other methods that can only be trained on pairs of sharp and blurred images. For the obsolete handling of the multi-scale deblurring problem, we utilize a multi-adversarial architecture that includes a series of slightly modified dense blocks [13] to improve the deblurring performance. The multi-adversarial strategy iteratively generates sharp images from low resolution to high resolution. For the missing handcrafted yet critical prior features problem, since previous studies have shown that sharp edge restoration plays a very important role in the structural maintenance of deblurring [9], [11], [12], we use a structure-aware strategy that includes edge guidance, by adding the edge map as part of the input, and structure enhancement, by minimizing the edge loss. Moreover, our architecture can avoid introducing other noise factors (such as color and texture) into the generated deblurred images, which easily occurs in the original CycleGAN, and keeps the structure and detail information consistent with the corresponding sharp image as much as possible. Combining the perceptual loss [18] and the multi-scale structural similarity (MS-SSIM [19]) loss, we obtain significantly better image deblurring results than most existing methods. As shown in Fig. 1, compared to the classical unsupervised methods, our results in Fig. 1(c) are more satisfying.

Fig. 1. Comparison of deblurred images by our method and the original CycleGAN on real images. (a) Blurred images. (b) Deblurring results using the original CycleGAN [1]. (c) Deblurring results by our method. Our method is more satisfying, especially in the pink and yellow rectangles.

Our work makes the following three main contributions:
• We introduce an unsupervised approach based on CycleGAN [1] for blind motion deblurring without assuming any restricted blur kernel model. It avoids the errors caused by blur kernel estimation and overcomes the drawback that other methods require pairwise images as training data [10], [14]. In addition, our model can also automatically generate blurred images from sharp images simultaneously, providing more available data for subsequent studies.
• We propose a multi-adversarial architecture to solve the artifact problem in high-resolution image generation. Different from the traditional multi-scale methods [10] and [20], the multi-adversarial constraints push the network to produce results closest to the sharp images at different resolutions. Although the multi-adversarial structure is somewhat more burdensome than the original CycleGAN, it effectively suppresses artifacts in high-resolution image generation.
• We present a structure-aware mechanism based on edge clues for motion deblurring. Since effectively restoring sharp edges is vital for the deblurring effect according to previous research [9], [12], [21], the blurred image and its edge map are fused as the input. Besides, multi-scale edge constraints are introduced in the multi-adversarial architecture to make the adversarial network generate persuasive structural information at different resolutions.

II. RELATED WORK

In recent years, blind image motion deblurring has attracted considerable research attention in the field of computer vision [22]–[24]. In general, image motion deblurring tasks are based on the assumption that the blur is uniform and spatially invariant [9], and a large number of solutions [10], [14], [15] have been proposed. According to the need for blur kernel estimation, image deblurring methods can be divided into two categories: kernel-based and kernel-free.

A. Kernel Estimation Method for Motion Deblurring

1) Traditional Kernel Estimation Method for Deblurring: Commonly, diverse methods tend to take advantage of sharp edges to estimate the blur kernel. Some kernel estimation approaches [25]–[28] rely on implicit or explicit extraction of edge information to detect and enhance the image edges


through a variety of technologies, such as bilateral filtering and gradient amplitude. In [29], Xu et al. propose an L0-regularized gradient prior based on sharp edge information for blind image deblurring. In [12], Pan et al. develop an optimization method based on an L0-regularized intensity and gradient prior to generate reliable intermediate results for blur kernel estimation. In [30], Sun et al. use dictionary learning to predict sharp edges with the sharp edge patches of clear images for deblurring. In [31], Pan et al. describe a blind image deblurring method with the dark channel prior. In [32], Kim et al. propose to estimate the motion flow and the latent sharp image simultaneously based on the total variation (TV)-L1 model. In [33], Bai et al. propose a multi-scale latent structure prior and gradually restore sharp images from coarse to fine scales on a blurry image. Recently, thanks to the powerful semantic analysis and deep mining ability of CNNs, more works tend to use large-scale samples to solve blind image deblurring problems.

2) CNNs Based Kernel Estimation Method for Deblurring: In recent years, CNNs have shown an unparalleled advantage in solving computer vision problems, including image deblurring, and have achieved many promising results [7]–[9]. Some methods use CNNs to estimate the blur kernel to achieve the deblurring task. For instance, Sun et al. estimate the probabilistic distribution of the unknown motion blur kernel based on a CNN for deblurring [34]. However, these methods have strict requirements for paired training data, cannot directly realize the transformation from the blurred image to the sharp image, and still cannot avoid errors in the CNN-based blur kernel estimation process [35], [36]. In contrast, our approach can avoid these errors, since our method is based on unsupervised image-to-image translation with unpaired training data and can directly realize the transformation from blurred images to sharp images without a kernel estimation process. In this paper, we show a comparison with [12], [31], [34] to verify our advantages in Section IV-E.

B. Kernel-Free Learning for Motion Deblurring

Since the popularity of GAN, which was originally designed to solve different image-to-image translation problems [37], [38], more people try to generate deblurred images directly from the blurred images with GAN to avoid the distortion caused by kernel estimation. In [39], Xu et al. use a CNN to learn the deconvolution operation guided by traditional deconvolution schemes. In [7], Nimisha et al. propose a novel deep filter based on a GAN architecture integrated with a global skip connection and a dense architecture to tackle this problem. In [13], a special GAN with a densely connected generator and a discriminator is used to generate a deep filter for deblurring. Kupyn et al. [14] propose the DeblurGAN method based on a conditional adversarial network and a multi-component loss function for blind motion deblurring. In [40], Li et al. propose a depth-guided network which contains a deblurring branch and a depth refinement branch for dynamic scene deblurring. Although breakthroughs have been made in these methods, the problems of missing structure information and demanding paired training data still need to be solved. Even though the subsequent methods [16], [41], [42] can realize the deblurring task by unsupervised use of unpaired training data, [16] and [41] only target specific image domain deblurring problems, while [42] encodes other factors (color, texture, etc., instead of blur information) into the generated deblurred image. Different from these previous methods, our unsupervised method can solve the demand for paired training data in image deblurring. Meanwhile, we utilize the multi-adversarial architecture and structure-aware mechanism to further remove the unpleasant artifacts and maintain structure information effectively.

III. PROPOSED METHOD

Our overall flowchart is shown in Fig. 2. In Fig. 2, G_B and G_S are two generator sub-networks which transform from the sharp image to the blurred image and from the blurred image to the sharp image, respectively. D_B and D_S are the discriminators that distinguish real images from generated images and give feedback to the generators. Different from the traditional CycleGAN [1], we use multi-adversarial constraints at different resolutions to gradually improve the quality of the generated images, and use skip connections so that low-level information better guides the high-level generation structure. Meanwhile, we design a structure-aware mechanism by introducing multi-scale edge constraints in the multi-adversarial architecture to make the adversarial network generate persuasive structural information at different resolutions; the edge map is also used as part of the input to facilitate the network's retention of structural information. Besides, we add a variety of loss functions (the structural MS-SSIM loss and the perceptual loss obtained with VGG16) to further strengthen the constraints and reduce generated false information. Compared with other methods, our method can not only remove the demand for paired data, but also maintain more structural information and achieve a better deblurring effect.

A. Original CycleGAN-Based Deblurring Method

Inspired by the success of the unsupervised method CycleGAN [1], we try to handle the demand for paired training data in an unsupervised image-to-image translation manner. Based on the original CycleGAN for deblurring, the architecture includes two generator sub-networks G_S and G_B that transform from the blurred image b to the deblurred (sharp) image s and from the sharp (deblurred) image s to the blurred image b, respectively. D_B and D_S are the discriminators for the blurred image and the sharp (deblurred) image, respectively. The loss function of CycleGAN contains two parts: an adversarial loss and a cycle-consistency loss. On one hand, the adversarial loss aims to match the distribution of generated images to the data distribution in the target domain. On the other hand, the cycle-consistency loss ensures that the cyclic transformation can bring the image back to its original state. Based on the traditional CycleGAN, we can successfully transform from the blurred image domain to the sharp image domain with unpaired training data. However, some annoying artifacts (such as color and texture) will be encoded into the generated results, and some structure information is also


Fig. 2. The flowchart of our structure-aware multi-adversarial optimized CycleGAN. Our architecture relies on unsupervised image-to-image translation to learn the mapping between blurred images and deblurred (sharp) images with unpaired training data. G_S and G_B are two generator sub-networks for translating a blurred image to a sharp image and a sharp image to a blurred image, respectively. D_S64, D_S128 and D_S256 are the discriminators that determine whether the image generated by G_S is real or fake at three resolutions; D_B64, D_B128 and D_B256 do the same for G_B. We restore sharp images in this multi-adversarial manner to iteratively generate high-resolution images from low-resolution ones. In addition, we introduce the structure-aware mechanism by adding an edge input to guide the generation procedure and multi-scale edge losses to maintain more structure details at different resolutions. Besides, we utilize the cycle-consistency loss, perceptual loss and MS-SSIM loss to enforce constraints on the structure generation.

sometimes lost [16], [43]. In order to solve these problems, we improve the generation effect step by step with the multi-adversarial architecture and the structure-aware mechanism.

B. Multi-Adversarial Generative Network

As discussed in Section II-B, the classical GAN-based structure often introduces artifacts when generating realistic images, especially as the resolution increases. To solve this problem, a multi-scale approach is preferred to improve the quality of the generated images [10]. Ideally, a mature multi-scale approach should not only significantly improve network performance but also minimize parameters to reduce time consumption and hardware burden. However, in some multi-scale methods [10], [20], the parameters at each scale are still independent of each other. Given this, we introduce the multi-adversarial architecture in our unsupervised deblurring model to make full use of the input information and avoid the problem of false information increasing with resolution.

Inspired by the traditional encoder-decoder network structure [44], the generator G_S in our proposed multi-adversarial network is shown in Fig. 3. The input of the generator sub-network G_S is the blurred image and the corresponding edge map obtained by the Sobel operator. The edge map used as part of the input can provide additional structural information to the network. G_S contains a series of convolution layers, deconvolution layers and upsampling layers. Feature maps from each deconvolution layer pass through a 3 × 3 convolution forward layer to produce output images at different resolutions. As shown in Fig. 3, generator G_S produces output images at three resolution levels. Then, three independent discriminators judge the authenticity of the generated images at the different resolutions and feed information back to the generators. The hidden layers with different resolutions in the network are constrained, and the feature maps are iteratively optimized to generate higher quality results. Additionally, the generated edge maps at three different resolutions are used for multi-scale edge constraints to improve the structure retention performance of the network. We also use skip connections to take full advantage of the low-level information to guide the deconvolution process.

For a blurred image b, generator G_S generates the synthesized sharp images s_b1, s_b2, s_b3 as outputs. The output of the last deconvolution layer, s_b3, is sent as the input of G_B to generate three reconstructions \hat{b}_1, \hat{b}_2 and \hat{b}_3. Similarly, for a deblurred (sharp) image s as input, G_B outputs the synthesized blurred images b_s1, b_s2 and b_s3. With b_s3 as the input, the generator G_S produces three reconstructions \hat{s}_1, \hat{s}_2 and \hat{s}_3. We then supervise these different outputs to force them closer to the targets at different resolutions. D_S64, D_S128 and D_S256 are defined for G_S; D_B64, D_B128 and D_B256 are defined for G_B. Three resolutions of 64 × 64, 128 × 128 and 256 × 256 are applied on the corresponding deconvolution layers, respectively. The adversarial losses can be written as Eq. (1) and Eq. (2):

\mathcal{L}_{adv}(G_S, D_{S_i}) = \mathbb{E}_{b \sim p(b)}[\log(1 - D_{S_i}(G_S(b)_i))] + \mathbb{E}_{s_i \sim p(s_i)}[\log D_{S_i}(s_i)]   (1)

\mathcal{L}_{adv}(G_B, D_{B_i}) = \mathbb{E}_{s \sim p(s)}[\log(1 - D_{B_i}(G_B(s)_i))] + \mathbb{E}_{b_i \sim p(b_i)}[\log D_{B_i}(b_i)]   (2)
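To make the multi-adversarial supervision concrete, the following PyTorch sketch accumulates the Eq. (1) terms over the three output scales. The module interfaces (a generator returning its three resolution outputs as a list, and one discriminator per scale) are our assumptions for illustration, not the authors' released code.

import torch

def multi_scale_adv_loss(G_S, D_S_list, b, sharp_pyramid, eps=1e-8):
    """Sum of Eq. (1) over the 64/128/256 scales.

    Hypothetical interfaces: G_S(b) -> [s_b1, s_b2, s_b3];
    D_S_list = [D_S64, D_S128, D_S256]; sharp_pyramid holds real sharp
    images downsampled to the same three resolutions.
    """
    loss = 0.0
    for D_Si, fake_i, real_i in zip(D_S_list, G_S(b), sharp_pyramid):
        # E_b[log(1 - D_Si(G_S(b)_i))] + E_{s_i}[log D_Si(s_i)]
        loss = loss + torch.log(1.0 - D_Si(fake_i) + eps).mean() \
                    + torch.log(D_Si(real_i) + eps).mean()
    return loss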


Fig. 3. Network structure of the proposed multi-adversarial generator. G_S is the generator sub-network for the translation from the blurred image to the deblurred (sharp) image. The input of G_S is the blurred image and the corresponding edge map obtained by the Sobel operator. In the multi-adversarial manner, G_S produces three outputs at different resolutions (64 × 64, 128 × 128 and 256 × 256). Multi-adversarial supervision is achieved through multiple discriminators on the hidden layers; discriminators D_S64, D_S128 and D_S256 are defined for G_S at the three resolutions, respectively. In addition, the generated edge maps at the three resolutions are used for multi-scale edge constraints to improve the structure retention performance of the network. The specific parameters of the generator sub-network are shown in the figure, so that we can train our multi-adversarial model at a specific size and test images of any size.

where G_S(b)_i = s_bi, G_B(s)_i = b_si and i = 1, 2, 3 corresponds to the three different resolutions. b_i and s_i are the blurred image and sharp image at the i-th resolution, respectively. D_Bi and D_Si are the discriminators corresponding to G_B and G_S at the i-th scale, respectively.

As for the cycle-consistency loss in the traditional CycleGAN, it can be extended to multiple resolutions:

\mathcal{L}_{cyc_{b_i}} = \|\hat{b}_i - b_i\|_1 = \|G_B(G_S(b)_3)_i - b_i\|_1   (3)

\mathcal{L}_{cyc_{s_i}} = \|\hat{s}_i - s_i\|_1 = \|G_S(G_B(s)_3)_i - s_i\|_1   (4)

where G_S(b)_3 = s_b3 and G_B(s)_3 = b_s3. The final multi-adversarial objective function is defined as:

\mathcal{L}_{MultiGAN}(G_S, G_B, D_S, D_B) = \sum_{i=1}^{3} \big( \mathcal{L}_{adv}(G_S, D_{S_i}) + \mathcal{L}_{adv}(G_B, D_{B_i}) + \mu_i (\mathcal{L}_{cyc_{b_i}} + \mathcal{L}_{cyc_{s_i}}) \big)   (5)

simplified as:

\mathcal{L}_{MultiGAN} = \sum_{i=1}^{3} (\mathcal{L}_{adv_i} + \mu_i \mathcal{L}_{cyc_i})   (6)

where \mu_i is the weight parameter at the i-th resolution that balances the different components, \mathcal{L}_{cyc_i} = \mathcal{L}_{cyc_{s_i}} + \mathcal{L}_{cyc_{b_i}}, and \mathcal{L}_{adv_i} = \mathcal{L}_{adv}(G_S, D_{S_i}) + \mathcal{L}_{adv}(G_B, D_{B_i}).
L adv (G S , D Si ) + L adv (G B , D Bi ) the structure retention ability of the network and generate
a more satisfactory deblurring effect through the ablation
C. Structure-Aware Mechanism for Deblurring experiments.
The high-frequency details of the image are weakened The proposed structure-aware mechanism can emphasize
to some extent due to the blurring process, how to restore the protection of image geometry to alleviate the important
the structure and details as much as possible in the image ambiguity problem of the original CycleGAN. In this paper,
deblurring task is very important. Previous studies [11], [16], the proposed structure-aware mechanism network is shown
[45] prove that image edge is of great significance in subjective in Fig. 3. Due to the input edge guidance, the Eq. (1) and

Authorized licensed use limited to: National Cheng Kung Univ.. Downloaded on July 20,2021 at 04:18:24 UTC from IEEE Xplore. Restrictions apply.
WEN et al.: STRUCTURE-AWARE MOTION DEBLURRING USING MULTI-ADVERSARIAL OPTIMIZED CycleGAN 6147

Fig. 4. Comparative experiment on the structure maintenance effect. (a) The original blurred image. (b) Deblurring result using CycleGAN [1]. (c) Deblurring result with the edge map as input. (d) Deblurring result with the edge loss. (e) Deblurring result with both the edge map as input and the edge loss. Our method is more satisfying, especially in the yellow rectangles.

Due to the input edge guidance, Eq. (1) and Eq. (2) can be revised as Eq. (7) and Eq. (8):

\mathcal{L}_{adv}(G_S, D_{S_i}) = \mathbb{E}_{b \sim p(b)}[\log(1 - D_{S_i}(G_S(b, b^e)_i))] + \mathbb{E}_{s_i \sim p(s_i)}[\log D_{S_i}(s_i, s_i^e)]   (7)

\mathcal{L}_{adv}(G_B, D_{B_i}) = \mathbb{E}_{s \sim p(s)}[\log(1 - D_{B_i}(G_B(s, s^e)_i))] + \mathbb{E}_{b_i \sim p(b_i)}[\log D_{B_i}(b_i, b_i^e)]   (8)

where b^e and s^e are the edge maps of image b and image s obtained by the Sobel operator, respectively, and b_i^e and s_i^e are the corresponding edge maps at the i-th resolution. In this edge guidance manner, we can take advantage of the additional edge information to make the generated images in the target domain contain edge structure information similar to that of the source domain, and to better guide the discriminator in distinguishing the generated images from the real images. However, even though the edge guidance improves the accuracy of discrimination, we find that the generated deblurred image still exhibits ringing and over-sharpening problems.
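The edge maps b^e and s^e can be computed with a fixed Sobel filter; a minimal PyTorch sketch follows (the grayscale conversion and the gradient-magnitude form are our assumptions, since the paper does not specify these details).

import torch
import torch.nn.functional as F

def sobel_edge_map(img):
    """Gradient-magnitude edge map of an (N, C, H, W) image batch."""
    gray = img.mean(dim=1, keepdim=True)          # simple luminance proxy
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]], device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                       # Sobel kernel for y
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)

# The conditioned generator input concatenates image and edge map:
# x = torch.cat([blurred, sobel_edge_map(blurred)], dim=1)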
In order to handle these problems and force the structure of the generated deblurred image to match its corresponding sharp image, we introduce the multi-scale edge losses in the multi-adversarial structure. Since our unsupervised method has no access to the corresponding reference image and it is difficult to generate an accurate corresponding edge map, we follow the heuristic from [16], [46] and utilize the fact that the resized image b_η, obtained by shrinking a blurred image b with a factor of η, is sharper than the image b itself. Thus, we introduce the multi-scale edge losses to enforce the edges of the generated deblurred image to match those of its corresponding sharp image. The factor η in our model is set to 0, 1/2 and 1/4 for the three different scales, respectively. The introduced multi-scale edge losses are defined as:

\mathcal{L}_{Grad_{b_i}} = \|\nabla s_{b_i} - \nabla b_i\|_1 = \|\nabla(G_S(b)_i) - \nabla b_i\|_1   (9)

\mathcal{L}_{Grad_{s_i}} = \|\nabla b_{s_i} - \nabla s_i\|_1 = \|\nabla(G_B(s)_i) - \nabla s_i\|_1   (10)

where \nabla is the Sobel operator that calculates the gradient map of an image, and \mathcal{L}_{Grad_i} = \mathcal{L}_{Grad_{b_i}} + \mathcal{L}_{Grad_{s_i}}.
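Using the sobel_edge_map sketched earlier, the per-scale edge term of Eq. (9) reduces to an L1 distance between gradient maps; the η-resizing of the references to each scale is assumed to happen before the call.

import torch.nn.functional as F

def multi_scale_edge_loss(gen_outputs, ref_pyramid):
    """Eq. (9) summed over scales: ||grad(G_S(b)_i) - grad(b_i)||_1.

    gen_outputs: generator outputs at 64/128/256; ref_pyramid: the
    eta-resized references at the same scales. Uses sobel_edge_map
    from the earlier sketch.
    """
    return sum(F.l1_loss(sobel_edge_map(g), sobel_edge_map(r))
               for g, r in zip(gen_outputs, ref_pyramid))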
Fig. 4 shows the effect of using only the edge loss versus adding the edge map as an input to the generator. From Fig. 4, most structure information can be migrated to the target domain with the edge input in Fig. 4(c), and most artificial noise can be effectively eliminated through the multi-scale edge losses in Fig. 4(d). The combination improves the motion deblurring performance further, as shown in Fig. 4(e).

D. The Network Structure

1) Generator: The generator in our architecture is shown in Fig. 3. It contains a series of convolution layers and residual blocks, specified as follows: C7S1−64, C3−128, C3−256, RB256×9, TC64, TC32, C7S1−3, where C7S1−k represents a 7 × 7 ConvBNReLU (Convolution+BatchNorm+ReLU) block with stride 1 and k filters, and C3−k represents a 3 × 3 ConvBNReLU block with stride 2 and k filters. RBk×n denotes n residual blocks with k filters, each containing two 3 × 3 convolution layers, and TCk represents a 3 × 3 TConvBNReLU (Transposed Convolution+BatchNorm+ReLU) block with stride 1/2 and k filters. In addition, we introduce the structure-aware architecture (including the edge input guidance and the multi-scale edge constraints) in G_S and G_B during the training process.
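The listed backbone can be written down directly in PyTorch. The sketch below follows the block specification; the 4-channel input (RGB plus edge map), the final Tanh, and the omission of the multi-adversarial side outputs and skip connections of Fig. 3 are our simplifications.

import torch.nn as nn

def conv_block(in_c, out_c, k, stride):
    return nn.Sequential(nn.Conv2d(in_c, out_c, k, stride, k // 2),
                         nn.BatchNorm2d(out_c), nn.ReLU(inplace=True))

class ResBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(conv_block(c, c, 3, 1),
                                  nn.Conv2d(c, c, 3, 1, 1),
                                  nn.BatchNorm2d(c))
    def forward(self, x):
        return x + self.body(x)

def tconv_block(in_c, out_c):
    # "TCk": transposed convolution with stride 1/2, i.e. 2x upsampling
    return nn.Sequential(
        nn.ConvTranspose2d(in_c, out_c, 3, 2, padding=1, output_padding=1),
        nn.BatchNorm2d(out_c), nn.ReLU(inplace=True))

backbone_GS = nn.Sequential(
    conv_block(4, 64, 7, 1),             # C7S1-64 (RGB + edge map input)
    conv_block(64, 128, 3, 2),           # C3-128
    conv_block(128, 256, 3, 2),          # C3-256
    *[ResBlock(256) for _ in range(9)],  # RB256x9
    tconv_block(256, 64),                # TC64
    tconv_block(64, 32),                 # TC32
    nn.Conv2d(32, 3, 7, 1, 3), nn.Tanh() # C7S1-3
)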
2) Discriminator: The discriminator is also shown in Fig. 3. A classic PatchGAN [47] is used as the discriminator to classify overlapping image blocks and determine whether they are real or fake. All the discriminator networks at the three resolutions mainly include: C64−C128−C256−C512, where Ck represents a 4 × 4 ConvBNLeakyReLU (Convolution+BatchNorm+LeakyReLU) block with stride 2 and k filters. The slope parameter of LeakyReLU is set to 0.2 in our experiments. Given the specific parameters of the generator and discriminator, we can train our multi-adversarial model at a specific size and test on images of any size.
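A corresponding PatchGAN sketch; the final 1-channel projection and the sigmoid are our assumptions, since the paper lists only the four Ck blocks (classic PatchGANs also usually omit BatchNorm in the first block).

import torch.nn as nn

def d_block(in_c, out_c):
    # Ck: 4x4 ConvBNLeakyReLU block with stride 2 and slope 0.2
    return nn.Sequential(nn.Conv2d(in_c, out_c, 4, 2, 1),
                         nn.BatchNorm2d(out_c),
                         nn.LeakyReLU(0.2, inplace=True))

patch_discriminator = nn.Sequential(
    d_block(3, 64), d_block(64, 128), d_block(128, 256), d_block(256, 512),
    nn.Conv2d(512, 1, 4, 1, 1), nn.Sigmoid()   # per-patch real/fake map
)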
E. Loss Functions

1) Multi-Scale SSIM Loss: The perceptually motivated Structural SIMilarity index (SSIM) [48] has often been used to measure the similarity of two images. To preserve the contrast, luminance and structure information in the generated images and alleviate the ambiguity problem of CycleGAN, we use the multi-scale SSIM (MS-SSIM) loss based on the SSIM between \hat{b}_i and b_i in our model. The MS-SSIM loss we use is defined as:

\mathcal{L}_{MSSSIM_{b_i}} = 1 - [l_M(b_i, \hat{b}_i)]^{\alpha_M} \prod_{j=1}^{M} [c_j(b_i, \hat{b}_i)]^{\beta_j} [m_j(b_i, \hat{b}_i)]^{\gamma_j}   (11)

where

l(b_i, \hat{b}_i) = \frac{2\mu_{b_i}\mu_{\hat{b}_i} + C_1}{\mu_{b_i}^2 + \mu_{\hat{b}_i}^2 + C_1}, \quad c(b_i, \hat{b}_i) = \frac{2\sigma_{b_i}\sigma_{\hat{b}_i} + C_2}{\sigma_{b_i}^2 + \sigma_{\hat{b}_i}^2 + C_2}, \quad m(b_i, \hat{b}_i) = \frac{\sigma_{b_i\hat{b}_i} + C_3}{\sigma_{b_i}\sigma_{\hat{b}_i} + C_3}

Here (b_i, \hat{b}_i) denotes the pair of input image and reconstructed image, respectively. \mu_{b_i}, \mu_{\hat{b}_i}, \sigma_{b_i}, \sigma_{\hat{b}_i} and \sigma_{b_i\hat{b}_i} indicate the means, standard deviations and cross-covariance of the image pair (b_i, \hat{b}_i), respectively. C_1, C_2 and C_3 are constants determined according to reference [48]. l(b_i, \hat{b}_i), c(b_i, \hat{b}_i) and m(b_i, \hat{b}_i) denote the comparison components of luminance, contrast and structure between b_i and \hat{b}_i, respectively. \alpha, \beta and \gamma are hyper-parameters set according to [48], which control the relative weights of the three comparison components.

Similarly, the MS-SSIM loss function \mathcal{L}_{MSSSIM_{s_i}} between \hat{s}_i and s_i is defined in the same way, and the total MS-SSIM loss at the i-th resolution is \mathcal{L}_{MSSSIM_i} = \mathcal{L}_{MSSSIM_{b_i}} + \mathcal{L}_{MSSSIM_{s_i}}.
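For reference, a compact sketch of this loss: it computes the luminance term and a fused contrast-structure term from box-filter local statistics (Eq. (11) keeps c and m separate; the fused c·m form below is a common simplification we adopt for brevity), evaluates them over M = 3 dyadic scales, and returns 1 minus the product.

import torch
import torch.nn.functional as F

def ssim_terms(x, y, C1=0.01 ** 2, C2=0.03 ** 2, win=11):
    """Luminance l and fused contrast-structure cs from local statistics."""
    p = win // 2
    mu_x, mu_y = F.avg_pool2d(x, win, 1, p), F.avg_pool2d(y, win, 1, p)
    var_x = F.avg_pool2d(x * x, win, 1, p) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, win, 1, p) - mu_y ** 2
    cov = F.avg_pool2d(x * y, win, 1, p) - mu_x * mu_y
    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)
    cs = (2 * cov + C2) / (var_x + var_y + C2)
    return l, cs

def ms_ssim_loss(x, y, M=3):
    cs_vals = []
    for _ in range(M - 1):
        cs_vals.append(ssim_terms(x, y)[1].mean())
        x, y = F.avg_pool2d(x, 2), F.avg_pool2d(y, 2)
    l, cs = ssim_terms(x, y)
    return 1.0 - l.mean() * cs.mean() * torch.prod(torch.stack(cs_vals))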
2) Perceptual Loss: Previous work [38] shows that cyclic perceptual-consistency losses have the ability to preserve the original image structure by combining high-level and low-level features extracted from the second and fifth pooling layers of the VGG16 [49] architecture. According to [38], the formulation of the cyclic perceptual-consistency loss is given below, where (b_i, \hat{b}_i) refers to the blurred and ground truth image pair and \phi is a VGG16 [38], [49] feature extractor from the second and fifth pooling layers:

\mathcal{L}_{Perceptual_{b_i}} = \|\phi(\hat{b}_i) - \phi(b_i)\|_2^2   (12)

Similarly, \mathcal{L}_{Perceptual_{s_i}} between \hat{s}_i and s_i is defined in the same way, and the total perceptual loss at the i-th resolution is \mathcal{L}_{Perceptual_i} = \mathcal{L}_{Perceptual_{s_i}} + \mathcal{L}_{Perceptual_{b_i}}.
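A sketch of the feature extractor φ using torchvision's VGG16, where indices 9 and 30 of the features module correspond to the second and fifth pooling layers; the standard ImageNet input normalization is omitted here and would be an assumption of ours either way.

import torch.nn as nn
import torchvision.models as models

class VGGPerceptual(nn.Module):
    """phi of Eq. (12): frozen VGG16 features at pool2 and pool5."""
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(pretrained=True).features.eval()
        for p in vgg.parameters():
            p.requires_grad = False
        self.pool2 = vgg[:10]   # up to and including the 2nd max-pool
        self.pool5 = vgg[:31]   # up to and including the 5th max-pool

    def forward(self, x, y):
        # squared L2 distance between feature maps, as in Eq. (12)
        return ((self.pool2(x) - self.pool2(y)) ** 2).mean() \
             + ((self.pool5(x) - self.pool5(y)) ** 2).mean()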
3) Identity Preserving Loss: In addition, we use an identity preserving loss to reinforce the identity information of the input image during the unpaired image-to-image translation. Thus, information such as the color of the input and output images can be mapped as accurately as possible. The identity preserving loss between the source domain and the target domain can be defined as:

\mathcal{L}_{Id_{b_i}} = \|G_B(b)_i - b_i\|_1   (13)

\mathcal{L}_{Id_{s_i}} = \|G_S(s)_i - s_i\|_1   (14)

The total identity preserving loss at the i-th resolution is \mathcal{L}_{Id_i} = \mathcal{L}_{Id_{b_i}} + \mathcal{L}_{Id_{s_i}}. From the loss functions described in Eq. (1) ∼ Eq. (14), the total loss for our deblurring model is:

\mathcal{L} = \sum_{i=1}^{3} (\mathcal{L}_{adv_i} + \omega_1 \mathcal{L}_{cycle_i} + \omega_2 \mathcal{L}_{Grad_i} + \omega_3 \mathcal{L}_{MSSSIM_i} + \omega_4 \mathcal{L}_{Id_i} + \omega_5 \mathcal{L}_{Perceptual_i})   (15)
translation, during the training process, we firstly segregate the
where, ω1 , ω2 , ω3 , ω4 and ω5 are non-negative constants GoPro dataset into two parts. We just use the blurred images
to adjust different influence on overall deblurring effects. i from one part and the clean (sharp) image from the second
denotes the component at i t h resolution. Similar to other part so that there are no corresponding pairs while the training
previous methods [1], [10], parameters ω1 , ω2 , ω3 , ω4 and process. 2103 blurred/clear unpaired images in GoPro dataset
ω5 in Eq. (15) are set according to the data characteristics for are used for training and the remaining 1111 images are used
different cases and we weight each loss empirically to balance for evaluation. We ensure no overlap in the training pairs and
the importance of each component. randomly crop the image into 256 × 256 image blocks in both
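Putting the pieces together, Eq. (15) is a straightforward weighted sum per resolution; the defaults below are the GoPro weights reported in Section IV-B and are otherwise dataset-dependent.

def total_loss(adv, cyc, grad, msssim, idt, perc,
               w=(5.0, 0.5, 0.5, 10.0, 1.0)):
    """Eq. (15); each argument is a list of three per-scale scalar losses."""
    w1, w2, w3, w4, w5 = w
    return sum(a + w1 * c + w2 * g + w3 * m + w4 * i + w5 * p
               for a, c, g, m, i, p in zip(adv, cyc, grad, msssim, idt, perc))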

IV. EXPERIMENTAL RESULTS

A. Implementation Details

We conduct our training and testing experiments on a workstation with an Intel Xeon E5 CPU and an NVIDIA 2080Ti GPU. Our model is implemented on the PyTorch platform [50]. For fairness, all the experiments are run with the same dataset and environment except where otherwise specified. Throughout our experiments, we use the ADAM [51] solver for model training with parameters \beta_1 = 0.9 and \beta_2 = 0.999. Limited by memory, the batch size is set to 2 for all the methods. The initial learning rate is fixed to 0.0002 for the first 30 epochs and then decays to one-tenth every 30 epochs. In total, 200 epochs already satisfy the convergence condition.
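This optimization setup maps directly onto PyTorch's built-in Adam solver and step scheduler; the generator modules below are stand-ins for the actual networks.

import itertools
import torch

G_S = torch.nn.Conv2d(3, 3, 3, padding=1)   # stand-ins for the real generators
G_B = torch.nn.Conv2d(3, 3, 3, padding=1)

optimizer = torch.optim.Adam(
    itertools.chain(G_S.parameters(), G_B.parameters()),
    lr=2e-4, betas=(0.9, 0.999))
# decay the learning rate to one-tenth every 30 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(200):
    # ... one pass over the unpaired blurred/sharp sets with batch size 2 ...
    scheduler.step()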

B. Datasets and Metrics

For the blurred text images, we use the BMVC_TEXT dataset [52], which contains 66K text images of size 300 × 300 in total. This dataset contains both defocus blur generated by an anti-aliased disc and motion blur generated by a random walk. The blurred images in BMVC_TEXT are divided into two parts, a training set and a test set (50% of the total, with no crossover), and the corresponding sharp images are divided in the same way. During the training process, we crop the images into 128 × 128 blocks in both the blur set and the sharp set. The parameter \omega_1 is set to 5, the parameters \omega_2 and \omega_3 are set to 0.5, \omega_4 is set to 10 and \omega_5 is set to 0 in Eq. (15), because we find that the perceptual loss has little impact on overall performance here. To compare with other classical deblurring methods, we choose the algorithms given by Pan et al. [12], [31], Xu et al. [29], Sun et al. [34], MS-CNN [10] and DeblurGAN [14]. We also choose the other unsupervised methods CycleGAN [1], Madam et al. [16] and UID-GAN [43], trained on the same text training dataset with our unpaired data.

For the blurred face images, we use the CelebA dataset [53], which includes more than 200K face images of size 178 × 218. We first select 200K images from the dataset, where 100K are sharp images and the other 100K are blurred images. In addition, we select 2000 images from the remaining images for testing. We scale all the images to 128 × 128 and ensure that there is no paired data during unsupervised training. The method of generating blurred images from sharp images is consistent with the method proposed in UID-GAN [43]. The parameters \omega_1 ∼ \omega_4 are set in the same way as for the BMVC_TEXT [52] dataset, and \omega_5 is set to 5.

For the motion blurred images, as in [10], we first use the GoPro dataset proposed in [10] to train our model. Since our model is based on unsupervised image-to-image translation, during the training process we first segregate the GoPro dataset into two parts. We use only the blurred images from one part and the clean (sharp) images from the second part, so that there are no corresponding pairs during training. 2103 blurred/clear unpaired images in the GoPro dataset are used for training and the remaining 1111 images are used for evaluation. We ensure no overlap in the training pairs and randomly crop the images into 256 × 256 blocks in both the blur set and the sharp set. The parameter \omega_1 is set to 5, the parameters \omega_2 and \omega_3 are set to 0.5, \omega_4 is set to 10 and \omega_5 is set to 1 in Eq. (15). We use the PSNR and SSIM metrics to show quantitative comparisons with other deblurring algorithms.

TABLE I
Ablation Study on the Effectiveness of Different Components in Our Model. All the Results Are Tested on the GoPro Dataset [10]. G_S Means the Translation From the Blur Domain to the Sharp Domain, and G_B Means the Translation From the Sharp Domain to the Blur Domain

C. Ablation Study

To analyze the effectiveness of each important component or loss (perceptual, etc.), we perform an ablation study in this section. Both quantitative and qualitative results on the GoPro dataset are presented for the following six variants of our method, adding each component gradually: 1) the original CycleGAN method [1]; 2) adding the multi-adversarial structure; 3) adding the edge map input component; 4) adding the multi-scale edge constraints; 5) adding the multi-scale SSIM loss; 6) adding all the above components.

We present the PSNR and SSIM for each variant in Table I. G_S (blur-sharp) means the translation from the blurred domain to the sharp domain, and G_B (sharp-blur) means the translation from the sharp domain to the blurred domain. From Table I, we can see that the multi-adversarial structure significantly improves the deblurring performance because of the multi-resolution constraints. Meanwhile, the structure-aware mechanism (with the edge as input and multi-scale edge constraints) can also preserve the structure and details because of the additional edge information and edge constraints. Although the original CycleGAN basically implements the unsupervised translation from blurred to sharp and from sharp to blurred, it introduces unpleasant noise information (colors, textures, etc.). In contrast, after adding the multi-adversarial structure, the discriminators are able to determine whether the resulting clear image is true or false at multiple resolutions and then feed back to the generators. With the edge map as part of the input, more structure-guided information can be transferred to the target domain. With the multi-scale edge constraints guiding the deblurring process, some unwanted ringing artifacts at the boundaries of the generated images can be removed effectively. With the multi-scale SSIM loss, the generated image can preserve the luminance, contrast and structure information effectively. The overall deblurring performance in Table I also shows that there is a close relationship between our multi-adversarial learning and the structure-aware mechanism.

Fig. 5. Stability analysis for our proposed model. (a) The overall loss variation. (b) The perceptual loss variation. (c) The multi-scale edge losses variation of our method at resolution 256 × 256. (d), (e) and (f) are the identity loss variation at resolutions 64 × 64, 128 × 128 and 256 × 256, respectively. (a), (b) and (c) show that the different losses of our model steadily decrease as the number of iterations increases during the training process. (d), (e) and (f) indicate that the identity preserving loss of our model decreases steadily with the increase of iteration times at different resolutions.

To illustrate the stability of the proposed model, Fig. 5 shows the loss change curves of our proposed method. Fig. 5(a) is the overall loss variation curve, Fig. 5(b) is the perceptual loss variation curve, and Fig. 5(c) is the multi-scale edge losses variation of our method at resolution 256 × 256. Fig. 5(d), Fig. 5(e) and Fig. 5(f) indicate that the identity preserving loss of our model decreases steadily with the increase of iteration times at different resolutions (64 × 64, 128 × 128 and 256 × 256, respectively). As seen from the change curves, the different types of losses and the losses at different resolutions steadily decline as the number of iterations increases during training, which fully indicates that our model is relatively stable.

D. Parameter Sensitivity

As mentioned in Section III-E, the weight \omega_1 for the cycle-consistency loss, \omega_4 for the identity preserving loss and \omega_5 for the perceptual loss need to be tuned so that the deblurred image neither stays too close to the original blurred image nor contains many artifacts. The quantitative performance is shown in Fig. 6.


Fig. 6. Quantitative results for different settings of \omega_1 for the cycle-consistency loss, \omega_4 for the identity preserving loss and \omega_5 for the perceptual loss. The orange bars represent the average PSNR value on the GoPro test set when the parameters \omega_1, \omega_4 and \omega_5 are set to 1, respectively. Correspondingly, the yellow bars represent the average PSNR when they are set to 5, and the green bars when they are set to 10. Different parameter settings have a certain influence on the final deblurring effect.

Fig. 7. Visualizations of a sample image in the GoPro dataset with different settings of \omega_5 for the perceptual loss. As shown in (d), when \omega_5 is set to 0.1, the generated deblurred image is very blurred. As shown in (e) and (f), when \omega_5 is set too high (\omega_5 = 5 and \omega_5 = 10), vast artifacts are introduced and cause quality degradation.

From Fig. 6, we can see that the setting of parameter \omega_4 for the identity loss is greatly different from traditional CycleGAN-based tasks (such as Photo-Sketch). As our method is based on multi-resolution adversarial learning, the identity loss has a great impact on the overall deblurring effect; when \omega_4 is set to 10, the deblurring effect is the best. If parameter \omega_1 is set too high (\omega_1 = 10), the deblurred image generated by G_S becomes very blurred and the quantitative performance is poor. In contrast, if \omega_1 is set too low (\omega_1 = 1), vast artifacts will be introduced. \omega_5 for the perceptual loss also has a certain influence on the overall deblurring effect. We set the parameters as \omega_1 = 5, \omega_4 = 10 and \omega_5 = 1 on the GoPro test set. As shown in Fig. 6, many experiments have proved that relatively good results can be obtained when \omega_5 = 1. Fig. 7 also shows the visualizations of a sample image in the GoPro dataset with different settings of \omega_5 for the perceptual loss. From Fig. 7(d), when \omega_5 is set to 0.1, the generated deblurred image is very blurred. In contrast, Fig. 7(e) and Fig. 7(f) show that if \omega_5 is set too high, vast artifacts will be introduced into the generated images, especially in the colored rectangular areas. In real experiments, the parameters \omega_1 ∼ \omega_5 are set according to the data characteristics of the different cases.

E. Comparison With State-of-the-arts

1) BMVC_TEXT Dataset [52] and Face Dataset [53]: In order to compare the performance of different algorithms on the text images and face images, we use the same training data (described in Section IV-B) to retrain the CNN-based methods. We randomly select 100 samples from the test set of the BMVC_TEXT dataset and 2000 samples from the face dataset [53] (as described in Section IV-B) for evaluation. The quantitative results are presented in Table II, whose last column shows the quality metrics of our deblurring method. From Table II, we can conclude that our method significantly outperforms the other state-of-the-art supervised (Pan et al. [12], Pan et al. [31], Xu et al. [29], Sun et al. [34], MS-CNN [10] and DeblurGAN [14]) and unsupervised methods (CycleGAN [1], UID-GAN [43] and Madam et al. [16]) for text image and face image deblurring. Fig. 8 presents several examples from the BMVC_TEXT dataset [52] to illustrate the qualitative comparison of other methods with ours. In Fig. 8, especially in the central character parts, the deblurring results by our method achieve the clearest characters. These examples are sufficient to prove that our method can achieve quite effective results on the BMVC_TEXT dataset [52].

2) GoPro Dataset: Table III shows the quantitative comparison results with other state-of-the-art deblurring methods on the GoPro dataset [10]. The average PSNR and SSIM for image quality assessment show our significant improvement in deblurring effect compared with other popular methods. From Table III we can see that, compared with almost all the classical conventional deblurring algorithms (Xu et al. [29], Whyte et al. [6] and Kim et al. [32]) and the latest unsupervised CNN-based deblurring approaches (CycleGAN [1], DiscoGAN [17], UID-GAN [43], Madam et al. [16]), our algorithm shows a quite attractive deblurring effect. Meanwhile, compared with most supervised CNN-based deblurring methods (Pix2Pix [47] and Sun et al. [34]), we can still achieve relatively satisfactory results. Although our method is slightly inferior to the supervised CNN-based method [10] and DeblurGAN [14] on GoPro, the reason is that it is more difficult to learn from unpaired data than from paired data, and CycleGAN itself has performance flaws in handling the generation of high-resolution images. Meanwhile, our method can also achieve better performance on multiple other databases (such as the BMVC_TEXT dataset [52] and the face dataset [53]). Additionally, the methods [10] and [14] require a large amount of paired training data, unlike our unsupervised learning, which can greatly reduce the strong need for paired training data.

Fig. 9 shows some visual examples from the GoPro [10] test set. It shows that even in some cases of GoPro, our approach is as desirable as method [10]. From Fig. 9, it is obvious that the classical conventional deblurring algorithms cannot keep structure information well and most unsupervised methods introduce new artifacts, while our method can better maintain the structure in areas such as the girl's head flower or arm. We also provide the visual contrast on the Köhler dataset in Fig. 10, which also verifies our better performance compared with both supervised and unsupervised methods.


TABLE II
Peak Signal-to-Noise Ratio and Structural Similarity Measure, Mean on the BMVC_TEXT [52] and Face Datasets [53]

TABLE III
Peak Signal-to-Noise Ratio and Structural Similarity Measure, Mean on the GoPro Dataset [10]

Fig. 8. Comparison of deblurred images by our method and other popular approaches on some images from the BMVC_TEXT dataset [52]. (a) Blurred images. (b) Deblurring results using Pan et al. [12]. (c) Deblurring results using Pan et al. [31]. (d) Deblurring results using Xu et al. [29]. (e) Deblurring results using Sun et al. [34]. (f) Deblurring results using MS-CNN [10]. (g) Deblurring results using CycleGAN [1]. (h) Our results. The characters in our results are much clearer.

Fig. 9. Comparison of deblurred images by our method and other popular approaches on one sample from the GoPro dataset [10]. (a) Blurred image. (b) Deblurring results using Pan et al. [12]. (c) Deblurring results using Xu et al. [29]. (d) Deblurring results using Sun et al. [34]. (e) Deblurring results using MS-CNN [10]. (f) Deblurring results using CycleGAN [1]. (g) Deblurring result using DiscoGAN [17]. (h) Our results. Our results are more satisfying, especially in the pink and yellow rectangles.


Fig. 10. Comparison of deblurred images by our method and other popular approaches on one sample taken from the Köhler dataset [55]. (a) Blurred image. (b) Deblurring result using Pan et al. [12]. (c) Deblurring result using Xu et al. [29]. (d) Deblurring result using Sun et al. [34]. (e) Deblurring result using MS-CNN [10]. (f) Deblurring result using CycleGAN [1]. (g) Deblurring result using DiscoGAN [17]. (h) Our results. Our results are more satisfying, especially in the pink and yellow rectangles.

Fig. 11. Comparison of deblurred images by our method and other popular approaches on one real image taken from the Lai dataset [54]. (a) Blurred image. (b) Deblurring result using [31]. (c) Deblurring result using [29]. (d) Deblurring result using [12]. (e) Deblurring result using [34]. (f) Deblurring result using [16]. (g) Deblurring result using CycleGAN [1]. (h) Deblurring result using [17]. (i) Deblurring result using [47]. (j) Deblurring result by our method.

3) Real Dataset: In order to compare the effects of the different deblurring algorithms on real blurred images, we use the model trained on the GoPro dataset to test the real blurred images in the real set of the Lai dataset [54]. Since the real blurred images do not come with corresponding sharp images, it is impossible to evaluate the deblurring effect with full-reference image quality evaluation methods (such as SSIM and PSNR). Therefore, we compare the deblurring performance of the different algorithms on real blurred images with the help of subjective user analysis. Inspired by [56], we use the Bradley-Terry model to estimate the subjective scores, as sketched below. Each blurred image is processed with the deblurring methods of Pan et al. [12], Xu et al. [29], Whyte et al. [6], Sun et al. [30], MS-CNN [10], CycleGAN [1] and DeblurGAN [14]. We test all these methods with the corresponding models trained on GoPro. Together with the original blurred images, all these results are sent for pairwise comparison (22 human raters are involved) to form the winning matrix.
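As an illustration of this scoring step, the sketch below estimates Bradley-Terry abilities from such a winning matrix with the standard minorization-maximization update; it is our own minimal implementation, not the authors' evaluation code.

import numpy as np

def bradley_terry_scores(wins, iters=200):
    """wins[i, j] = number of raters preferring method i over method j."""
    n = wins.shape[0]
    p = np.ones(n)
    pair_counts = wins + wins.T            # comparisons per method pair
    for _ in range(iters):
        for i in range(n):
            denom = sum(pair_counts[i, j] / (p[i] + p[j])
                        for j in range(n) if j != i)
            p[i] = wins[i].sum() / max(denom, 1e-12)
        p /= p.sum()                       # fix the arbitrary scale
    return p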
The quantitative results in Table IV show that the CNN-based methods usually have a better effect than the conventional methods, and our method achieves a more satisfying deblurring effect on real blurred images compared with most existing methods. From Fig. 11, our method shows superior performance compared with the other methods, especially in the girl's eyes and mouth.

According to the above experiments, we can conclude that our method has obvious advantages in solving the deblurring task on all the test datasets when compared with most existing unsupervised deblurring methods [1], [16], [43]. We can also infer that our unsupervised deblurring method achieves results competitive with the supervised deblurring algorithms [10], [12], [14], [29] on most datasets except for the GoPro dataset. We believe this is mainly due to CycleGAN's limited ability to generate high-resolution images and the difficulty of learning from unpaired data compared


TABLE IV
AVERAGE S UBJECTIVE E VALUATION S CORES OF D EBLURRING P ERFORMANCE ON THE R EAL D ATASET [54]

TABLE V
Average Running Time Comparisons of Our Method With Several Other Classical Methods on the BMVC_TEXT Dataset [52]
F. Evaluation of the Running Time

Table V shows the average per-image running time of several classical deblurring methods on 512 × 512 images from the test set of the BMVC_TEXT dataset [52]. According to Table V, the proposed unsupervised method achieves state-of-the-art deblurring quality while maintaining a relatively high and competitive speed compared with most existing supervised and unsupervised methods on this dataset. Although our runtime is slightly longer than those of CycleGAN [1] and MS-CNN [10] because of the multi-adversarial structure and the multiple constraints, we obtain a better deblurring effect. In future work, we are committed to further streamlining the network and improving its running efficiency.
V. CONCLUSION AND FUTURE WORK

In this paper, we propose a structure-aware motion deblurring method based on a multi-adversarial optimized CycleGAN model. Unlike previous work, our CycleGAN-based method avoids the errors of blur kernel estimation and does not need paired training data, which makes training more flexible. In addition, the multi-adversarial constraints in the generator of our CycleGAN differ from the traditional multi-scale manner: they ensure that the results closest to the sharp images are generated at every resolution. Besides, we introduce a structure-aware method based on edge clues so that the generated deblurred image keeps as much structural information as possible. Extensive experiments on different benchmark datasets demonstrate the effectiveness of the proposed method. In the future, we are committed to solving the problem of salient target deblurring and further reducing the complexity of the network. Besides, we will further explore unsupervised motion deblurring methods with better performance and apply the proposed network model to the video deblurring problem.
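To illustrate the kind of edge-based constraint summarized above, the following is a simplified sketch in the spirit of our structure-aware term, not its exact formulation: it extracts Sobel edge maps at several scales and penalizes the L1 discrepancy between the edges of the generated image and those of the reference (or guidance) image. The scale choices and function names here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sobel_edges(img):
    """Per-channel Sobel gradient magnitude of a (B, C, H, W) tensor."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                  # vertical-gradient kernel
    c = img.shape[1]
    gx = F.conv2d(img, kx.repeat(c, 1, 1, 1), padding=1, groups=c)
    gy = F.conv2d(img, ky.repeat(c, 1, 1, 1), padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def multi_scale_edge_loss(generated, reference, scales=(1, 2, 4)):
    """L1 edge-consistency penalty accumulated over several resolutions."""
    loss = 0.0
    for s in scales:
        g = F.avg_pool2d(generated, s) if s > 1 else generated
        r = F.avg_pool2d(reference, s) if s > 1 else reference
        loss = loss + F.l1_loss(sobel_edges(g), sobel_edges(r))
    return loss / len(scales)
```

The exact guidance edges and loss weighting follow the formulation in the method section; this sketch only conveys how multi-scale edge consistency constrains the structure of the generated image.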


REFERENCES

[1] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2242–2251.
[2] V. Papyan and M. Elad, "Multi-scale patch-based image restoration," IEEE Trans. Image Process., vol. 25, no. 1, pp. 249–261, Jan. 2016.
[3] M. Temerinac-Ott, O. Ronneberger, P. Ochs, W. Driever, T. Brox, and H. Burkhardt, "Multiview deblurring for 3-D images from light-sheet-based fluorescence microscopy," IEEE Trans. Image Process., vol. 21, no. 4, pp. 1863–1873, Apr. 2012.
[4] A. Danielyan, V. Katkovnik, and K. Egiazarian, "BM3D frames and variational image deblurring," IEEE Trans. Image Process., vol. 21, no. 4, pp. 1715–1728, Apr. 2012.
[5] W. Dong, L. Zhang, G. Shi, and X. Wu, "Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization," IEEE Trans. Image Process., vol. 20, no. 7, pp. 1838–1857, Jul. 2011.
[6] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce, "Non-uniform deblurring for shaken images," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 491–498.
[7] T. M. Nimisha, A. K. Singh, and A. N. Rajagopalan, "Blur-invariant deep learning for blind-deblurring," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 4762–4770.
[8] C. J. Schuler, M. Hirsch, S. Harmeling, and B. Schölkopf, "Learning to deblur," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 7, pp. 1439–1451, Jul. 2016.
[9] X. Xu, J. Pan, Y.-J. Zhang, and M.-H. Yang, "Motion blur kernel estimation via deep learning," IEEE Trans. Image Process., vol. 27, no. 1, pp. 194–205, Jan. 2018.
[10] S. Nah, T. H. Kim, and K. M. Lee, "Deep multi-scale convolutional neural network for dynamic scene deblurring," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 257–265.
[11] S. Vasu and A. N. Rajagopalan, "From local to global: Edge profiles to camera motion in blurred images," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 558–567.
[12] J. Pan, Z. Hu, Z. Su, and M.-H. Yang, "Deblurring text images via L0-regularized intensity and gradient prior," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 2901–2908.
[13] S. Ramakrishnan, S. Pachori, A. Gangopadhyay, and S. Raman, "Deep generative filter for motion deblurring," in Proc. IEEE Int. Conf. Comput. Vis. Workshops (ICCVW), Oct. 2017, pp. 2993–3000.
[14] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas, "DeblurGAN: Blind motion deblurring using conditional adversarial networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8183–8192.
[15] X. Tao, H. Gao, X. Shen, J. Wang, and J. Jia, "Scale-recurrent network for deep image deblurring," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8174–8182.
[16] N. T. Madam, S. Kumar, and A. N. Rajagopalan, "Unsupervised class-specific deblurring," in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2018, pp. 353–369.
[17] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim, "Learning to discover cross-domain relations with generative adversarial networks," in Proc. Int. Conf. Mach. Learn., 2017, pp. 1857–1865.
[18] J. Johnson, A. Alahi, and F.-F. Li, "Perceptual losses for real-time style transfer and super-resolution," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 694–711.
[19] Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multiscale structural similarity for image quality assessment," in Proc. 37th Asilomar Conf. Signals, Syst. Comput., vol. 2, 2003, pp. 1398–1402.
[20] Y. Gan, X. Xu, W. Sun, and L. Lin, "Monocular depth estimation with affinity, vertical pooling, and label enhancement," in Proc. Eur. Conf. Comput. Vis., 2018, pp. 232–247.
[21] C. J. Schuler, H. C. Burger, S. Harmeling, and B. Schölkopf, "A machine learning approach for non-blind image deconvolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 1067–1074.
[22] S. Oh and G. Kim, "Robust estimation of motion blur kernel using a piecewise-linear model," IEEE Trans. Image Process., vol. 23, no. 3, pp. 1394–1407, Mar. 2014.
[23] P. Chandramouli, M. Jin, D. Perrone, and P. Favaro, "Plenoptic image motion deblurring," IEEE Trans. Image Process., vol. 27, no. 4, pp. 1723–1734, Apr. 2018.
[24] Y. Wen, B. Sheng, P. Li, W. Lin, and D. D. Feng, "Deep color guided coarse-to-fine convolutional network cascade for depth image super-resolution," IEEE Trans. Image Process., vol. 28, no. 2, pp. 994–1006, Feb. 2019.
[25] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman, "Removing camera shake from a single photograph," ACM Trans. Graph., vol. 25, no. 3, pp. 787–794, Jul. 2006.
[26] Q. Shan, J. Jia, and A. Agarwala, "High-quality motion deblurring from a single image," ACM Trans. Graph., vol. 27, no. 3, pp. 73:1–73:10, 2008.
[27] L. Xu and J. Jia, "Two-phase kernel estimation for robust motion deblurring," in Proc. Eur. Conf. Comput. Vis., 2010, pp. 157–170.
[28] D. Krishnan, T. Tay, and R. Fergus, "Blind deconvolution using a normalized sparsity measure," in Proc. CVPR, Jun. 2011, pp. 233–240.
[29] L. Xu, S. Zheng, and J. Jia, "Unnatural L0 sparse representation for natural image deblurring," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 1107–1114.
[30] L. Sun, S. Cho, J. Wang, and J. Hays, "Edge-based blur kernel estimation using patch priors," in Proc. IEEE Int. Conf. Comput. Photography (ICCP), Apr. 2013, pp. 1–8.
[31] J. Pan, D. Sun, H. Pfister, and M.-H. Yang, "Deblurring images via dark channel prior," IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 10, pp. 2315–2328, Oct. 2018.
[32] T. H. Kim and K. M. Lee, "Segmentation-free dynamic scene deblurring," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 2766–2773.
[33] Y. Bai, H. Jia, M. Jiang, X. Liu, X. Xie, and W. Gao, "Single-image blind deblurring using multi-scale latent structure prior," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 7, pp. 2033–2045, Jul. 2020.
[34] J. Sun, W. Cao, Z. Xu, and J. Ponce, "Learning a convolutional neural network for non-uniform motion blur removal," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 769–777.
[35] D. Ren, W. Zuo, D. Zhang, J. Xu, and L. Zhang, "Partial deconvolution with inaccurate blur kernel," IEEE Trans. Image Process., vol. 27, no. 1, pp. 511–524, Jan. 2018.
[36] D. Gong et al., "From motion blur to motion flow: A deep learning solution for removing heterogeneous motion blur," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 3806–3815.
[37] C. Ledig et al., "Photo-realistic single image super-resolution using a generative adversarial network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 105–114.
[38] D. Engin, A. Genç, and H. K. Ekenel, "Cycle-Dehaze: Enhanced CycleGAN for single image dehazing," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2018, pp. 938–946.
[39] L. Xu, J. S. J. Ren, C. Liu, and J. Jia, "Deep convolutional neural network for image deconvolution," in Proc. Neural Inf. Process. Syst., 2014, pp. 1790–1798.
[40] L. Li, J. Pan, W.-S. Lai, C. Gao, N. Sang, and M.-H. Yang, "Dynamic scene deblurring by depth guided model," IEEE Trans. Image Process., vol. 29, pp. 5273–5288, 2020.
[41] B. Lu, J.-C. Chen, and R. Chellappa, "Unsupervised domain-specific deblurring via disentangled representations," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2019, pp. 10217–10226.
[42] Q. Yuan, J. Li, L. Zhang, Z. Wu, and G. Liu, "Blind motion deblurring with cycle generative adversarial networks," Vis. Comput., vol. 36, no. 8, pp. 1591–1601, Aug. 2020.
[43] B. Lu, J.-C. Chen, and R. Chellappa, "UID-GAN: Unsupervised image deblurring via disentangled representations," IEEE Trans. Biometrics, Behav., Identity Sci., vol. 2, no. 1, pp. 26–39, Jan. 2020.
[44] L. Wang, V. Sindagi, and V. Patel, "High-quality facial photo-sketch synthesis using multi-adversarial networks," in Proc. 13th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), May 2018, pp. 83–90.
[45] Z. Fu, Y. Zheng, H. Ye, Y. Kong, J. Yang, and L. He, "Edge-aware deep image deblurring," CoRR, vol. abs/1907.02282, pp. 1–9, Jul. 2019.
[46] Y. Bahat and M. Irani, "Blind dehazing using internal patch recurrence," in Proc. IEEE Int. Conf. Comput. Photography (ICCP), vol. 8691, May 2016, pp. 783–798.
[47] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5967–5976.
[48] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[49] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learn. Represent., 2015, pp. 1–14.
[50] A. Paszke et al., "Automatic differentiation in PyTorch," in Proc. Neural Inf. Process. Syst. Workshop, 2017, pp. 1–4.
[51] D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learn. Represent., 2015, pp. 1–15.
[52] M. Hradiš, J. Kotera, P. Zemčík, and F. Šroubek, "Convolutional neural networks for direct text deblurring," in Proc. Brit. Mach. Vis. Conf., 2015, pp. 6:1–6:13.
[53] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 3730–3738.
[54] W.-S. Lai, J.-B. Huang, Z. Hu, N. Ahuja, and M.-H. Yang, "A comparative study for single image blind deblurring," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 1701–1709.
[55] R. Köhler, M. Hirsch, B. Mohler, B. Schölkopf, and S. Harmeling, "Recording and playback of camera shake: Benchmarking blind deconvolution with a real-world database," in Proc. Eur. Conf. Comput. Vis., 2012, pp. 27–40.
[56] O. Kupyn, T. Martyniuk, J. Wu, and Z. Wang, "DeblurGAN-v2: Deblurring (orders-of-magnitude) faster and better," in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2019, pp. 8877–8886.

Yang Wen received the M.Eng. degree in computer science from Xidian University, Xi'an, China, in 2015. She is currently pursuing the Ph.D. degree in computer science with the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China. Her current research interests include motion deblurring, convolutional neural networks, image/video processing, and computer graphics.

Jie Chen received the B.Eng. degree in computer science from Nanjing University, Nanjing, China. She is currently a Senior Chief Engineer and Senior Architect with Samsung Electronics (China) Research and Development Centre, Nanjing, where she is also the Head of the AI Department. Her current research interests include computer vision and big data.

Bin Sheng (Member, IEEE) received the B.A. degree in English and the B.Eng. degree in computer science from the Huazhong University of Science and Technology, Wuhan, China, in 2004, the M.Sc. degree in software engineering from the University of Macau, Taipa, Macau, in 2007, and the Ph.D. degree in computer science and engineering from The Chinese University of Hong Kong, Sha Tin, Hong Kong, in 2011. He is currently a Full Professor with the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China. His current research interests include virtual reality and computer graphics. He is an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY.


Zhihua Chen received the Ph.D. degree in computer science from Shanghai Jiao Tong University, Shanghai, China, in 2006. He is currently a Full Professor with the Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai. His current research interests include image/video processing and computer vision.

Ping Li (Member, IEEE) received the Ph.D. degree in computer science and engineering from The Chinese University of Hong Kong, Sha Tin, Hong Kong, in 2013. He is currently a Research Assistant Professor with The Hong Kong Polytechnic University, Kowloon, Hong Kong. He holds one national invention patent on image/video processing, and his research has been reported worldwide by ACM TechNews. His current research interests include image/video stylization, colorization, artistic rendering and synthesis, and creative media.

Ping Tan (Senior Member, IEEE) received the Ph.D. degree in computer science and engineering from The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, in 2007. He is currently an Associate Professor with the School of Computing Science, Simon Fraser University, Burnaby, BC, Canada. His current research interests include computer vision and computer graphics. He has served as an Area Chair for IEEE CVPR, ACM SIGGRAPH, and ACM SIGGRAPH Asia. He has served as an Editorial Board Member of IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE and the International Journal of Computer Vision.

Tong-Yee Lee (Senior Member, IEEE) received the Ph.D. degree in computer engineering from Washington State University, Pullman, in May 1995. He is currently a Chair Professor with the Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan. He leads the Computer Graphics Group, Visual System Laboratory, National Cheng Kung University (http://graphics.csie.ncku.edu.tw). His current research interests include computer graphics, non-photorealistic rendering, medical visualization, virtual reality, and media resizing. He is a member of the ACM.
