Fake Image Detection Using Transfer Learning
Divyasha Singh1, Tanjul Jain1, Nayan Gupta1, Bhavishya Tolani1 and Seeja K.R1
1
Department of Computer Science and Engineering,
Indira Gandhi Delhi Technical University for Women, Delhi, India
lncs@springer.com
Abstract. The volume of photos generated has increased dramatically in the last
decade as a result of technological advancements and easy access to the internet.
The authenticity of these photos must be guaranteed because they have such a
huge impact on people's lives and are sometimes used as evidence in the investi-
gation of serious crimes. Image fraud must be detected in order to protect the
image's integrity and legitimacy. Deep fakes, copy-move forgery, image splicing,
Generative Adversarial Networks (GANs), and other methods are used
to make fabricated images. Most forgery detection approaches and detectors are
focused on only one sort of fraud, whether computer or human generated, and
little progress has been made in constructing robust detectors that can success-
fully and efficiently deal with various types of forgery. As a result, the project's
purpose is to develop a reliable fake picture detector that can recognise phoney
photos created in a number of ways. A human-generated forgery detector is cur-
rently being developed. The CASIA v1 Dataset was used, which included both
legitimate and faked photos. Our model was trained on 80% of the dataset, with
the remaining 20% being used for testing. Transfer learning was the methodology
we used in our implementation. Neural networks have consistently performed well in
this field, and after extensive research we fine-tuned the XceptionNet model, which
reached 51 percent accuracy on human-generated fake photos. In the future, we would
like to construct a model that can handle both GAN-generated and manually created
falsified photos.
Keywords: Fake Image, Real Image, Machine Learning, CNN, Copy Move
Forgery, GAN, Splicing.
1 Introduction
We have all heard the cliché "A picture is worth a thousand words." Images assist us in
learning, grabbing our attention, explaining difficult topics, and inspiring us, but with
the advent of technology, people are losing trust in the credibility of images. Social
media plays a significant part in people's everyday lives in this modern era. Most indi-
viduals use Twitter, Snapchat, Facebook, Instagram and other social media often to post
and share text, photographs, and videos. On social media, images are one of the most
extensively shared types of material. As a result, there is a demand for image surveil-
lance on social media.
We are surrounded by a huge volume of images, especially from social media, and
many of them are undoubtedly fake. Fake images can be extremely dangerous.
They have the potential to harm a person's reputation, spread false news or information,
and incite mobs. Image forgery technology is advancing so quickly that it is becoming
increasingly difficult to detect forgery with the human eye. Individuals and small
businesses can now readily create and distribute these images in a short amount of
time, jeopardising news credibility and public faith in social media.
In the last decade, deep learning has seen tremendous success in the fields of com-
puter vision, image processing, and natural language processing. Deep neural networks
have outperformed humans in a variety of situations. Generative Adversarial Networks
(GANs) are a type of neural network architecture in which two networks, a generator
and a discriminator, compete: the generator produces samples and the discriminator
tries to tell them apart from real data, which drives the generator towards high-quality
outputs that resemble the original inputs. GANs have been widely utilised to create new
realistic pictures and to
improve existing ones. However, machine learning algorithms such as GANs can be
misused to generate misleading data and deceive humans. Fake faces created by GANs,
for example, have the potential to fool both people and machine learning classifiers.
Synthetic photographs for identification and authentication purposes, for example, can
be used maliciously.
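For reference, the competition between generator and discriminator described above is commonly written as a minimax objective. The notation below follows the standard GAN formulation and is not taken from the works cited in this paper: G is the generator, D the discriminator, p_data the distribution of real images and p_z the noise distribution fed to the generator.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to maximise V by assigning high probability to real images and low probability to generated ones, while the generator tries to minimise it by producing images that the discriminator accepts as real.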
Furthermore, advanced picture editing software such as Adobe Photoshop allows for
the alteration of complicated input photographs as well as the creation of high-quality
new images. These techniques have improved to the point that they can now build re-
alistic and intricate false pictures that are difficult to distinguish from the genuine thing.
YouTube has step-by-step directions and tutorials for making these sorts of fictitious
graphics. As a result, defamation, impersonation, and factual distortion are all possible
with these technologies. Furthermore, with social media, fraudulent material may be
swiftly and extensively shared on the Internet.
It is therefore crucial to develop a robust, efficient and effective image
forgery detector that can put a stop to such malicious uses and restore users' trust
in images.
2 Related Work
Fake image detection has attracted a great deal of research, which has produced new
techniques and algorithms. To assess the validity of the photographs, several previous
systems relied on image format features and metadata information. Despite the availa-
bility of numerous forensic tools and new algorithms for detecting forgery and fake
components in images, attackers are employing cutting-edge image processing and ma-
chine learning techniques to circumvent well-known forgery detection techniques,
making it a difficult problem to solve. With new advances in GAN-based technology
every day, it becomes increasingly difficult to differentiate between forged and real
images. Some of the detectors and methodologies proposed in the literature are
described below:
Muhammed Afsal Villan et al. [1] proposed a forgery detector that combines the
results of metadata analysis and Error Level Analysis (ELA). The combination is
motivated by the fact that metadata alone is not a reliable way to detect a fake
image, as metadata can be easily manipulated and certain image formats save only
limited information. ELA exploits the observation that, in a fake image, the
compression ratio of the authentic content differs from that of the foreign content.
The system begins by saving the image at high quality. The same image is then re-saved
in JPEG format at 90% quality, and the pixel-wise difference between the two versions
is computed. The resulting picture is the ELA image, which is then passed to a neural
network for further processing. This approach can be useful when parts of a picture
have been modified. ELA, on the other hand, is unable to recognise differing error
levels in GAN-generated pictures, so the strategy is ineffective for that kind of
forgery.
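The ELA step described above can be sketched in a few lines. This is a minimal illustration that assumes the Pillow library; it is not the implementation from [1]. The 90% quality setting comes from the description above, while the function name `ela_image` and the brightness rescaling at the end are our own assumptions, added only to make the differences visible.

```python
import io

from PIL import Image, ImageChops


def ela_image(img, quality=90):
    """Return an Error Level Analysis image for a PIL image (illustrative sketch)."""
    original = img.convert("RGB")

    # Re-save the image as JPEG at the chosen quality, in memory.
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer).convert("RGB")

    # Pixel-wise absolute difference between the original and the re-saved copy.
    diff = ImageChops.difference(original, resaved)

    # Assumption: stretch the result so the largest difference maps to 255.
    max_diff = max(high for _, high in diff.getextrema()) or 1
    scale = 255.0 / max_diff
    return diff.point(lambda v: min(255, int(v * scale)))


if __name__ == "__main__":
    demo = Image.new("RGB", (64, 64), (120, 60, 200))  # synthetic stand-in image
    print(ela_image(demo).size)
```

Regions that were pasted in from a source with a different compression history tend to show a different error level in the result, which is the signal a downstream classifier would look for.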
Francesco Marra et al. [2] studied the detection of fake images generated by GANs, which
comprise two neural network models that compete to capture and reproduce the variations
within a dataset. The Generator fabricates data samples in order to confuse the
Discriminator, which is tasked with distinguishing between genuine and fraudulent
samples, and the two compete with each other throughout the training phase. On the
original uncompressed image dataset, Cozzolino2017 was found to have the highest average
accuracy. Cozzolino2017 is a shallow network that classifies almost perfectly in most
cases but fails badly in a few. The steganalysis and XceptionNet features also show
promise, but they too fail badly on some datasets. Under training mismatch, where the
test photos are compressed like those that come from Twitter, most of the compared
detectors suffer greatly from this simple routine procedure, particularly those relying
on handcrafted features or shallow neural networks, such as Steganalysis features and
Cozzolino2017. Deep networks, particularly XceptionNet, are more robust, with an accuracy
of 87.17%. When the classifiers are trained on compressed pictures akin to those found
on Twitter, Cozzolino2017 and Steganalysis features perform better, while XceptionNet
continues to lead with an accuracy of 89.03%. Forged photographs and videos are likely
to be published on social media in order to reach as many people as possible and promote
fake news. When photographs are uploaded, they are usually compressed automatically,
which disrupts the delicate patterns that most classifiers look for. As a result,
robustness is crucial when it comes to detecting image forgery.
Image forgery detection methods can be divided into active and passive methods. The two
main active methods are watermarking and steganography, in which legitimate information
is embedded into the digital image. When the image's legitimacy needs to be evaluated,
the previously embedded information is used to verify it. Among passive methods, the
most commonly studied forgery is copy-move forgery. In copy-move forgery, a portion of
the original image is copied and pasted into the same image to conceal important
information or simply duplicate visual elements. Because the cloned component comes from
the same image, crucial attributes like noise, colour, and texture remain unchanged,
making the detection procedure much more difficult. Shubham Sharma et al. [3] present a
survey of two popular methods to detect copy-move forgery: PCA (principal component
analysis) and DCT (discrete cosine transform). These are predicated on the fact that
pixel location and value are both fairly stable when analysing a still image, making
pixel analysis quite simple. The detection of copy-move forgery depends mostly on
identifying similar regions within a picture and establishing a correspondence between
the original image elements and their copied counterparts. In block-based approaches,
the picture is split into fixed-size blocks and features are extracted from each block.
The similarity between feature vectors is used to identify forged blocks. According to
the survey, PCA is resistant to lossy compression and additive noise but cannot detect
scaling or rotation. DCT coefficient-based features are resistant to compression, noise
and retouching, but they are ineffective in detecting scaled or rotated copied blocks.
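The block-based idea can be illustrated with a small NumPy sketch: every fixed-size block is described by its quantised low-frequency DCT coefficients, the feature vectors are sorted lexicographically, and neighbouring identical vectors that lie far enough apart in the image are reported as candidate copies. This is a simplified illustration and not the procedure of any surveyed method; the function names and the parameters `block`, `keep`, `q` and `min_shift` are assumptions chosen for the example.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)
    return mat


def find_duplicate_blocks(gray, block=8, keep=4, q=4.0, min_shift=8):
    """Return pairs of block coordinates whose quantised DCT features match."""
    gray = np.asarray(gray, dtype=float)
    d = dct_matrix(block)
    feats, coords = [], []
    rows, cols = gray.shape
    for r in range(rows - block + 1):
        for c in range(cols - block + 1):
            patch = gray[r:r + block, c:c + block]
            coeffs = d @ patch @ d.T  # 2-D DCT of the block
            # Keep the low-frequency corner, coarsely quantised.
            feats.append(np.round(coeffs[:keep, :keep] / q).ravel())
            coords.append((r, c))
    feats = np.array(feats)
    coords = np.array(coords)

    # Lexicographic sort brings identical feature vectors next to each other.
    order = np.lexsort(feats.T[::-1])
    pairs = []
    for a, b in zip(order[:-1], order[1:]):
        if np.array_equal(feats[a], feats[b]):
            # Ignore matches between blocks that are almost at the same place.
            if np.abs(coords[a] - coords[b]).max() >= min_shift:
                pairs.append((tuple(int(v) for v in coords[a]),
                              tuple(int(v) for v in coords[b])))
    return pairs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(32, 32)).astype(float)
    image[18:28, 18:28] = image[2:12, 2:12]  # simulate a copy-move forgery
    print(len(find_duplicate_blocks(image)), "matching block pairs")
```

Because the features are computed on axis-aligned blocks, a copied region that has been scaled or rotated no longer produces matching vectors, which is the limitation noted in the survey.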
Although the quantity of fake videos is growing, there are still limitations in terms
of establishing a benchmark for validating detection methods. To overcome this
issue, Korshunov and Marcel [4] used the open-source Faceswap-GAN code to construct
a large deepfake dataset of 620 videos based on the GAN model. The deepfake videos,
which can efficiently replicate facial expressions, lip movements, and eye blinking,
were made in low and high resolution using recordings from the VidTIMIT database.
The effectiveness of several deepfake detection methods was then tested on these
videos. According to the test results, popular face recognition algorithms based on
VGG [5] and Facenet are unable to detect deepfakes. Other methods, such as lip-syncing
approaches [6]–[8] and picture quality metrics with a support vector machine (SVM),
have very high error rates when used to detect deepfake videos from this newly produced
dataset.
Agarwal and Varshney [9] framed GAN-based deepfake detection as a hypothesis testing
problem and used the information-theoretic study of authentication to introduce a
statistical framework. The oracle error is the smallest difference between the
distribution of true pictures and that of images generated by a certain GAN. The
analytic findings show that as the GAN becomes less accurate, this gap increases,
making deepfakes easier to spot. To generate difficult-to-detect fake images from
high-resolution image inputs, an extremely accurate GAN is required.
Njood Mohammed AlShariah et al. [10] constructed models based on deep learning, namely
the AlexNet network, a Convolutional Neural Network (CNN), and transfer learning from
AlexNet, to exploit the capability of CNNs for fake picture identification, especially
on social media platforms. The networks were employed and compared in the research, with
some variations in training. The activation, convolution, pooling and softmax layers
were each responsible for completing a given task. The input picture was first obtained
from the image acquisition process and then divided into non-overlapping patches. To
produce a smaller feature set, the values of the features were normalised and
down-sampled. Finally, the output probability was calculated to categorise the provided
image as normal or fake. The AlexNet method was shown to detect counterfeit photographs
more successfully than standard approaches, with up to 97 percent accuracy. However, when
the model was run on data not seen during training, the accuracy was comparatively lower,
limiting its capacity to detect forged images on social platforms.
Shahroz Tariq et al. [11] proposed a shallow convolutional neural network architecture
called Shallow Convolutional Network for GAN-generated images (ShallowNet). They created
three ShallowNet versions, each with different layer settings. ShallowNetV1 performs
poorly on small pictures, so they built shallower structures in V2 and V3, which have
identical depths. The most significant change in V3 is the addition of a max pooling
layer, which improves speed on small pictures. Another advantage of the method is that
training time is greatly shortened by the shallow layers. The detection technique for
human-created fake face photos, on the other hand, is divided into two stages. The first
stage is pre-processing, which crops and filters the face areas; MTCNN, which is reported
to have the highest accuracy, is used for detecting and cropping faces. In the second
stage, once the cropped and aligned faces are available, a classifier model is trained to
distinguish fake photos made by humans from unmodified real photographs. Various
CNN-based models were trained for fake image detection, including VGG16, VGG19, ResNet,
DenseNet, NASNet, XceptionNet, and ShallowNetV1. For human-generated fake images,
XceptionNet outperforms the competing systems, whereas ShallowNet beats the other
neural network models on GAN-generated fake face photos. Although XceptionNet works
well overall, it performs poorly at distinguishing real from GAN-generated images at
lower resolutions; its accuracy decreases as picture resolution decreases.
3 Dataset
The dataset we have used is the CASIA v1 dataset from Kaggle. It consists of both
authentic images and forged images. To reduce the training time for the neural network
model, we have reduced the size of the dataset. It covers images from classes related to
animals, plants, architecture and nature. The authentic folder contains 100 images from
each class. The forged images were manipulated by copy-move forgery and splicing: there
are 114 copy-move images and 139 spliced images. The images are in JPG format, with
dimensions of 384 × 256 pixels. We have used 80% of both the fake and authentic images
for training and the remaining 20% for testing.
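A per-class 80/20 split of this kind can be sketched with the Python standard library alone. The file names below are hypothetical placeholders, and the fixed seed and rounding rule are assumptions made so that the example is reproducible; the paper does not specify how the split was drawn.

```python
import random


def split_80_20(paths, seed=0):
    """Shuffle a list of file paths reproducibly and split it 80/20."""
    items = sorted(paths)
    random.Random(seed).shuffle(items)
    cut = int(round(0.8 * len(items)))
    return items[:cut], items[cut:]


if __name__ == "__main__":
    # Hypothetical file names standing in for the 114 + 139 = 253 forged images.
    forged = [f"forged_{i:03d}.jpg" for i in range(253)]
    train, test = split_80_20(forged)
    print(len(train), len(test))
```

Applying the function separately to the authentic and the forged images keeps the 80/20 proportion within each class, as described above.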
4 Implementation
4.1 Implementation of the Model
We utilised XceptionNet as our base model. It was pre-trained on the ImageNet
dataset, a collection of manually labelled photographs that is used in computer vision
research and is organised into approximately 22,000 categories.
We added three new layers on top of the pretrained model so that it could be trained on
our dataset to distinguish fake from real photos: a Flatten layer to flatten all of the
features, a Dropout layer to reduce overfitting, and a Dense output layer with softmax
activation. Because the base model had already been pre-trained, the trainable property
of its layers was set to False. Finally, we compiled the model using the Adam optimizer
with categorical cross-entropy as the loss function.
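The architecture described in this subsection can be sketched with the Keras API as follows. This is an illustrative reconstruction under stated assumptions, not the exact training script: the dropout rate of 0.5, the two-unit output layer and the function name `build_model` are assumptions, since the text does not give these details.

```python
import tensorflow as tf


def build_model(weights="imagenet", dropout_rate=0.5, num_classes=2):
    """Frozen Xception base with Flatten, Dropout and a softmax Dense head."""
    base = tf.keras.applications.Xception(
        weights=weights, include_top=False, input_shape=(299, 299, 3)
    )
    base.trainable = False  # keep the pre-trained layers fixed

    inputs = tf.keras.Input(shape=(299, 299, 3))
    x = base(inputs, training=False)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dropout(dropout_rate)(x)  # assumed rate
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


if __name__ == "__main__":
    # weights=None avoids downloading the ImageNet weights for a quick check.
    build_model(weights=None).summary()
```

With the base frozen, only the kernel and bias of the final Dense layer are updated during training.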
4.2 Preprocessing of dataset
The photos in the dataset were of different sizes, so we had to resize them to
a fixed size before feeding them to the deep learning model for training. We resized the
photographs to 299 by 299 pixels, which is the Xception model's default input size.
The pretrained Xception model was therefore given an input tensor of shape (299, 299, 3),
with 3 being the number of channels.
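A minimal sketch of this resizing step, assuming Pillow and NumPy, is given below. The scaling of pixel values to the range [-1, 1] is an assumption on our part (it matches the convention used by the Keras Xception preprocessing function) and is not stated in the text; the function name `to_model_input` is likewise hypothetical.

```python
import numpy as np
from PIL import Image

TARGET_SIZE = (299, 299)  # (width, height) expected by Xception


def to_model_input(img):
    """Resize a PIL image to 299 x 299 RGB and scale pixels to [-1, 1]."""
    rgb = img.convert("RGB").resize(TARGET_SIZE)
    arr = np.asarray(rgb, dtype=np.float32)
    return arr / 127.5 - 1.0  # assumed scaling, see lead-in


if __name__ == "__main__":
    demo = Image.new("RGB", (384, 256), (255, 0, 128))  # synthetic stand-in image
    print(to_model_input(demo).shape)
```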
4.3 Training the model
We designed an Image Data Generator to train the models with different angles, flips,
rotations, and shifts of the photographs. The model was then trained with all of the
necessary parameters.
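The effect of such augmentation can be illustrated with a small NumPy stand-in. It is not the Keras image data generator used in our implementation: it covers only random horizontal flips and wrap-around shifts, omits rotations, and the function name `augment` and the `max_shift` parameter are assumptions made for the example.

```python
import numpy as np


def augment(image, rng, max_shift=20):
    """Randomly flip and shift an image array of shape (height, width, channels)."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # Wrap-around shift along the height and width axes.
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    return np.ascontiguousarray(out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.random((4, 299, 299, 3))
    augmented = np.stack([augment(img, rng) for img in batch])
    print(augmented.shape)
```

Each call produces a different variant of the same photograph, so the model sees slightly different inputs in every epoch.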
4.4 Making Predictions
On images from the test set, the trained models were used to make predictions.
Fig. 1. Classification of images as real and fake.
5 Results
First, we plotted a confusion matrix for the binary classification task. A confusion
matrix summarises the performance of a classification algorithm in terms of its true
positives, true negatives, false positives, and false negatives. The confusion matrix
obtained for our model is as follows:
Fig. 2. Confusion matrix obtained.
True positives (TP) occur when an observation is predicted to belong to a class
and it does. This is a correct prediction. Here, it is the case when a real image is
predicted as a real image. The value of true positives is 0.31.
True negatives (TN) occur when an observation is predicted not to belong to
a class and it does not. This is a correct prediction. Here, it is the case when a fake
image is predicted as a fake image. The value of true negatives is 0.64.
False positives (FP) occur when an observation is predicted to belong to a class but
it does not. This is an incorrect prediction. Here, it is the case when a fake image is
predicted as a real image. The value of false positives is 0.36.
False negatives (FN) occur when an observation is predicted not to belong to a class
when it actually does. This is an incorrect prediction. Here, it is the case when a real
image is predicted as a fake image. The value of false negatives is 0.69.
Fig. 3. Performance metrics of the model.
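The reported confusion-matrix values can be turned into per-class rates with a few lines of arithmetic. The sketch below assumes that the four values are normalised per true class, which is suggested by 0.31 + 0.69 = 1 and 0.64 + 0.36 = 1, and it treats real images as the positive class as in the definitions above. Under that assumption the balanced accuracy is (0.31 + 0.64) / 2 = 0.475. The overall accuracy additionally depends on the fraction of real images in the test set, so it is left as a parameter; the function names are our own.

```python
def rates_from_normalised_matrix(tp, fn, tn, fp):
    """Return sensitivity, specificity and balanced accuracy."""
    sensitivity = tp / (tp + fn)  # real images recognised as real
    specificity = tn / (tn + fp)  # fake images recognised as fake
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


def accuracy_for_prior(sensitivity, specificity, positive_fraction):
    """Overall accuracy when `positive_fraction` of the test images are real."""
    return sensitivity * positive_fraction + specificity * (1 - positive_fraction)


if __name__ == "__main__":
    sens, spec, bal = rates_from_normalised_matrix(tp=0.31, fn=0.69, tn=0.64, fp=0.36)
    print(round(sens, 3), round(spec, 3), round(bal, 3))
```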
A receiver operating characteristic (ROC) curve is a graph that depicts the
classification model's performance across all classification thresholds. The ROC
curve for our model is shown below:
Fig. 4. ROC Curve Obtained for the model.
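The curve is nearly linear, indicating that the true positive rate and the false
positive rate increase at the same pace. This is the behaviour expected of a classifier
that performs close to random guessing, which is consistent with the 51% accuracy.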
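For readers who want to reproduce such a curve, the following NumPy sketch computes ROC points and the area under the curve (AUC) from labels and predicted scores. The labels and scores in the example are made up for illustration and are not our model's outputs, and the sketch assumes that all scores are distinct. An AUC near 0.5 corresponds to a near-diagonal curve.

```python
import numpy as np


def roc_points(y_true, scores):
    """Return false and true positive rates as the threshold is lowered."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted)    # true positives at each cut-off
    fps = np.cumsum(~y_sorted)   # false positives at each cut-off
    tpr = np.concatenate(([0.0], tps / tps[-1]))
    fpr = np.concatenate(([0.0], fps / fps[-1]))
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 1, 0]                 # hypothetical ground truth
    scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]     # hypothetical predicted scores
    fpr, tpr = roc_points(labels, scores)
    print(auc(fpr, tpr))
```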
6 Conclusion
To conclude, we worked on the binary classification of images as forged or authentic.
With the advancement of AI and its applications, we used one of its deep learning
algorithms for classification. We used a pretrained model, XceptionNet, which was
initially trained on the ImageNet dataset from Kaggle. After training the model for
500 epochs on the training dataset, we obtained an accuracy of 51%.
7 Future Work
7.1 Creation of our own model to correctly classify images with better
accuracy:
There is still a lot of work to be done in the subject of fake image detection. Forgery
detectors with great efficiency are urgently required. It is necessary to create powerful
forgery detectors that can detect all kinds of forgeries. The existing Neural Network
models mostly focus on picture categorization; however, we also require neural net-
work models for fraudulent image identification.
To expand on this project, we'd like to develop our own model for classifying photos
from both GAN-generated and human-generated datasets, which would be capable of
doing classification with a higher degree of accuracy.
7.2 Fine Tune the Created Model for human generated fake images:
The accuracy currently obtained is 51% which is less than what XceptionNet is capable
of delivering. Therefore, we aim at fine tuning the model developed in order to achieve
a better accuracy.
7.3 Extensive training and testing on a variety of datasets:
To shorten the training time, we trained our model on a tiny fraction of human-gener-
ated photos. As a follow-up, we'll focus on training our model on a variety of datasets,
including human-generated and GAN-generated images, so that the new model can bet-
ter handle 'unseen' data if it's supplied to it.
7.4 Build an easily accessible simple User Interface Software to enable easier
detection of images by users:
We can focus on designing an easily accessible simple user interface programme after
the model is accurate enough to make digital interaction as simple, fluid, intuitive, and
efficient as possible. We'd strive to anticipate user needs and make things as simple as
possible to access, understand, and use, resulting in a great user experience.
A good user interface design should be unnoticeable. The user interface is meant to
give the user quick and easy access to the information they need.
References
1. Villan, M. A., Kuruvilla, A., Paul, J., & Elias, E. P. (2017). Fake Image Detection Using
Machine Learning. IRACST-International Journal of Computer Science and Information
Technology & Security (IJCSITS).
2. Marra, F., Gragnaniello, D., Cozzolino, D., & Verdoliva, L. (2018, April). Detection of gan-
generated fake images over social networks. In 2018 IEEE Conference on Multimedia In-
formation Processing and Retrieval (MIPR) (pp. 384-389). IEEE.
3. Sharma, S., Verma, S., & Srivastava, S. Detection of Image Forgery.
4. Korshunov, P., & Marcel, S. (2019). Vulnerability assessment and detection of deepfake
videos. In The 12th IAPR International Conference on Biometrics (ICB) (pp. 1-6).
5. Parkhi, O. M., Vedaldi, A., & Zisserman, A. (2015, September). Deep face recognition. In
Proceedings of the British Machine Vision Conference (BMVC) (pp. 41.1-41.12).
6. Chung, J. S., Senior, A., Vinyals, O., & Zisserman, A. (2017, July). Lip reading sentences
in the wild. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
(pp. 3444-3453).
7. Suwajanakorn, S., Seitz, S. M., & Kemelmacher-Shlizerman, I. (2017). Synthesizing
Obama: learning lip sync from audio. ACM Transactions on Graphics (TOG), 36(4), 1-13.
8. Korshunov, P., & Marcel, S. (2018, September). Speaker inconsistency detection in
tampered video. In 2018 26th European Signal Processing Conference (EUSIPCO)
(pp. 2375-2379). IEEE.
9. Agarwal, S., & Varshney, L. R. (2019). Limits of deepfake detection: A robust estimation
viewpoint. arXiv preprint arXiv:1905.03493.
10. Afchar, D., Nozick, V., Yamagishi, J., & Echizen, I. (2018, December). Mesonet: a compact
facial video forgery detection network. In 2018 IEEE International Workshop on Infor-
mation Forensics and Security (WIFS) (pp. 1-7). IEEE.
11. Tariq, S., Lee, S., Kim, H., Shin, Y., & Woo, S. S. (2018, January). Detecting both machine
and human created fake face images in the wild. In Proceedings of the 2nd international
workshop on multimedia privacy and security (pp. 81-87).