Image Tampering Detection Report With Alternatives
Abstract
Malicious image tampering has gradually become another threat to social stability and personal safety. Timely detection and precise localization can help reduce such risks and improve the overall safety of society. Owing to the limitations of training on narrowly targeted datasets and the low efficiency of low-level feature extraction, the generalization and real-world performance of recent tampering detection techniques have not yet met expectations. In this study, we propose a tampered-image detection method based on RDS-YOLOv5 feature enhancement transformation. First, a multi-channel feature enhancement fusion algorithm is proposed to enhance the tampering traces in tampered images. Then, an improved deep learning model named RDS-YOLOv5 is proposed for the recognition of tampered images, and a nonlinear aspect-ratio loss metric is introduced into the original SIoU loss function to better optimize the training process of the model. Finally, RDS-YOLOv5 is trained on a combination of the original images and their enhanced counterparts to improve the robustness of the detection model. A total of 6187 images covering three forms of tampering (splice, remove, and copy-move) were used to comprehensively evaluate the proposed algorithm. In the ablation tests, RDS-YOLOv5 improved on the original YOLOv5 model by 6.46%, 5.13%, and 3.15% in F1-score, mAP50, and mAP95, respectively. In the comparative experiments, using SRIoU as the loss function significantly improved the model's ability to locate the real tampered regions, by 2.54%, and training RDS-YOLOv5 on the fused dataset further improved overall detection performance by about 1%.
Introduction
As one of the most important ways for human beings to perceive and understand the world, images have become integrated into nearly every aspect of daily life. With the rapid development of science and technology, as well as people's growing material and spiritual needs, image "beautification" software has been embraced by more and more people. Users can easily make custom edits and modifications to images to suit their purposes, but this convenience also creates problems. The falsification of identity photos, the tampering of receipt images, and the malicious smearing or exaggerated rendering of real images have gradually become new means of endangering social security. Rapid and effective detection and localization of tampered regions in images is therefore of crucial significance for the security and stable development of society.
Currently, there are three main forms of image tampering: copy-move, remove, and splice. Generally speaking, these operations leave behind traces of digital processing on the original image, which can serve as an effective basis for discrimination. However, for carefully retouched images that have undergone subsequent filtering, smoothing, lighting adjustment, and other post-processing, it is difficult for ordinary people to distinguish the tampering, and even professionals often cannot see through it at a glance. This has prompted researchers to apply artificial intelligence techniques to capture the traces of image tampering.
Muhammad Hussain et al. proposed a copy-move tampering detection method based on a multi-resolution Weber law descriptor, which extracts features from the chromaticity components of the image to capture the relative information of different neighborhoods, and then uses a support vector machine to classify them and obtain the final result [1]. Jiachen Yang et al. proposed a multi-scale self-texture attention generation network, which simulates the forgery process from the perspective of image generation and traces potential forgery traces and texture features; finally, a loss function based on a classification probability constraint directly corrects the generation and monitoring of forgery traces [2]. Shilpa Dua et al. proposed an image forgery detection technique based on coding-artifact analysis, which divides a JPEG-compressed image into several non-overlapping blocks, performs an independent discrete cosine transform on each block, generates feature vectors by evaluating and counting the transform coefficients, and then uses a support vector machine for image discrimination [3] (a minimal sketch of this block-wise DCT idea appears at the end of this section). Rani Susan Oommen et al. proposed a hybrid method based on local fractal dimension (LFD) and singular value decomposition (SVD) to effectively detect and locate duplicated forgery regions in images: the image is divided into fixed-size blocks, the local fractal dimension of each block is estimated, the blocks are arranged into a B+ tree by LFD value, and the blocks in each fragment are compared by singular value, narrowing the comparison range, reducing computational complexity, and quickly screening duplicated parts of the image [4]. Other tampering detection methods using wavelet transforms, DCT features, and variant neural networks have also achieved results in their respective screening tasks [5-12]. However, most of these methods target specific types of images and data, and the associated deep learning models are inefficient at extracting features from the raw tampered input images.
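To make the block-based idea concrete, the following is a minimal sketch (not the authors' implementation; the function name and choice of statistics are illustrative) of block-wise DCT feature extraction in the spirit of the coding-artifact analysis described above:

```python
import numpy as np
from scipy.fft import dctn  # 2-D discrete cosine transform

def block_dct_features(gray: np.ndarray, block: int = 8) -> np.ndarray:
    """Split a grayscale image into non-overlapping blocks, DCT-transform
    each block, and summarize coefficient magnitudes into one vector."""
    h, w = gray.shape
    coeffs = []
    for y in range(0, h - h % block, block):
        for x in range(0, w - w % block, block):
            c = dctn(gray[y:y + block, x:x + block].astype(float), norm="ortho")
            coeffs.append(np.abs(c).ravel())
    coeffs = np.stack(coeffs)
    # Per-coefficient mean and std across blocks; such a vector would then
    # be fed to a classifier such as an SVM, as in the method above.
    return np.concatenate([coeffs.mean(axis=0), coeffs.std(axis=0)])
```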
Splicing
Splicing is a technique where different parts of multiple images are combined to form
a single, deceptive image. This type of tampering often involves merging multiple
photographs or altering the content of individual photographs to create a misleading
representation of an event or object.
Inpainting
Inpainting involves the removal or alteration of objects or areas within an image
by filling in the gaps with newly created, artificial content. The goal of this technique is
to ensure that the inpainted sections blend seamlessly with the rest of the image,
making it challenging to detect any modifications.
Deepfakes
Deepfakes use deep generative models to synthesize or swap faces and other content, producing manipulations that can be imperceptible to the human eye.
Model Research for Mobile Deployment
From the initial search, several promising avenues for mobile-optimized image tampering detection have emerged. These include approaches leveraging lightweight deep learning models and techniques specifically designed for mobile environments.
Key Models and Approaches:
RDS-YOLOv5: The Nature article [1] discusses an improved YOLOv5 model (RDS-
YOLOv5) for tampered image detection. This model incorporates a multi-channel
feature enhancement fusion algorithm and optimizes the loss function. While not
explicitly stated as mobile-optimized, YOLOv5 models are generally known for
their efficiency and can be adapted for mobile deployment.
When selecting an optimal model architecture for Android deployment, the following
factors are critical:
Model Size: Smaller models consume less storage space on the device and are
faster to download and load.
Inference Speed: The model must be able to process images quickly on mobile
hardware to provide a responsive user experience.
Computational Resources: Mobile devices have limited CPU, GPU, and memory.
The chosen model should have low computational complexity.
Accuracy: Despite optimization, the model must maintain a high level of
accuracy in detecting various types of image tampering.
Initial Assessment:
MobileNetV3:
Advantages for Mobile: Its small model size and efficient inference make it suitable for on-device processing, reducing latency and reliance on cloud resources.
Image Tampering Datasets
To effectively train and validate the chosen model, access to comprehensive image tampering datasets is essential. The search results provided several valuable resources:
Original and Tampered Image Dataset (Kaggle): [13] This dataset is specifically
designed for training and validating machine learning models for classifying
tampered images.
CASIA 2.0 Image Tampering Detection Dataset (Kaggle): [14] A widely used
dataset in image forensics research, containing various types of tampering.
Image Forgery Datasets List (GitHub): [15] A curated list of various image
forgery datasets, offering a broader range of tampering types and image sources.
CG-1050 Dataset: [16] This dataset contains 100 original images, 1050 tampered
images, and their corresponding masks, useful for both detection and
localization tasks.
Considering the need for a robust and efficient solution for mobile deployment,
MobileNetV3 appears to be the optimal choice. While MobileNetV2 is highly capable,
MobileNetV3 offers further architectural improvements that lead to better
performance and efficiency. Its suitability for deepfake detection also suggests its
potential for handling more sophisticated tampering techniques. The availability of
relevant datasets will be crucial for training and fine-tuning the chosen MobileNetV3
model.
Transfer Learning and Fine-Tuning
These models are typically pre-trained on the ImageNet dataset for image
classification. While ImageNet does not contain tampered images, the features
learned by the model (e.g., edge detection, texture recognition) are highly
transferable to other image-related tasks, including anomaly detection like
image tampering.
Fine-tuning involves adapting the pre-trained MobileNetV3 model to the specific task
of image tampering detection using a relevant dataset. The process generally includes
the following steps:
Dataset Preparation: Utilize image tampering datasets such as CASIA 2.0, CG-1050, or custom datasets containing both authentic and tampered images. The dataset should be split into training, validation, and test sets (a data-loading sketch follows this list).
Model Modification: The top layers of the pre-trained MobileNetV3 model (the
classification head) will be removed. New layers, typically a global average
pooling layer, a dense layer, and a final classification layer (with a sigmoid
activation for binary classification or softmax for multi-class classification if
different tampering types are to be identified), will be added.
Freezing Base Layers: Initially, the weights of the pre-trained MobileNetV3 base
layers will be frozen. This prevents large gradient updates from corrupting the
learned features during the initial training phase with the new top layers.
Training New Layers: The model will be trained for a few epochs, allowing the
newly added layers to learn to classify tampered images based on the features
extracted by the frozen base model.
Unfreezing and Fine-tuning: After the initial training, a portion or all of the base
layers can be unfrozen. The entire model is then fine-tuned with a very low
learning rate. This allows the model to slightly adjust the pre-trained weights to
better suit the image tampering detection task while preserving the general
features learned during pre-training.
Loss Function and Optimizer: Binary cross-entropy is a suitable loss function for
binary classification (tampered/authentic). Optimizers like Adam or RMSprop are
commonly used.
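Complementing the Dataset Preparation step above, a minimal Keras data-loading sketch; the data/ directory with authentic/ and tampered/ subfolders is a hypothetical layout, not one mandated by the datasets listed earlier:

```python
import tensorflow as tf

# Hypothetical layout: data/authentic/*.jpg and data/tampered/*.jpg.
# 80/20 train/validation split; a separate test set should be held out.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data", validation_split=0.2, subset="training", seed=42,
    image_size=(224, 224), batch_size=32, label_mode="binary")
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data", validation_split=0.2, subset="validation", seed=42,
    image_size=(224, 224), batch_size=32, label_mode="binary")
```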
TensorFlow Lite Conversion
Output: The conversion process will generate a .tflite file, which is the optimized model ready for integration into the Android application.
Example Workflow (Conceptual Python Code):
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# 1. Pre-trained base (MobileNetV3Small here for illustration), no ImageNet head
base_model = keras.applications.MobileNetV3Small(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")

# 2. Fine-tuning setup
base_model.trainable = False  # Freeze the base model
model = keras.Sequential([base_model,
                          layers.GlobalAveragePooling2D(),
                          layers.Dense(1, activation="sigmoid")])
model.compile(optimizer=keras.optimizers.Adam(),
              loss=keras.losses.BinaryCrossentropy(),
              metrics=[keras.metrics.BinaryAccuracy()])

# 3. Convert the trained model to TensorFlow Lite
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
```
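The block above trains only the new head. A sketch of the subsequent unfreeze-and-fine-tune step (continuing directly from the code above):

```python
# After the new head has been trained, e.g.:
# model.fit(train_ds, validation_data=val_ds, epochs=5)
base_model.trainable = True  # Unfreeze the base layers
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),  # very low LR
              loss=keras.losses.BinaryCrossentropy(),
              metrics=[keras.metrics.BinaryAccuracy()])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # fine-tune end to end
```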
This .tflite model will then be integrated into the Android application for on-device
image tampering detection.
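As a quick sanity check before integration, the exported file can be exercised with the TFLite Python interpreter (the file name matches the asset used in the Android snippet later in this report):

```python
import numpy as np
import tensorflow as tf

# Load the exported model and run one dummy inference
interpreter = tf.lite.Interpreter(model_path="image_tampering_detector.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
dummy = np.random.rand(*inp["shape"]).astype(np.float32)
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()
print("Tampering score:", interpreter.get_tensor(out["index"]))
```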
Android Application Architecture and Development
Process
User Interface (UI): Activities and Fragments for user interaction, image display,
and result presentation.
TFLite Inference Module: Encapsulates the logic for loading the TFLite model
and performing inference on input images.
TensorFlow Lite Library: Include the necessary TensorFlow Lite AAR (Android Archive) library in the build.gradle file to enable TFLite model loading and inference:

```gradle
dependencies {
    implementation 'org.tensorflow:tensorflow-lite-task-vision:0.4.0' // Or the latest version
    implementation 'org.tensorflow:tensorflow-lite-gpu:2.x.x' // For GPU delegation, if supported
    // Other necessary AndroidX libraries
}
```
Image Selection: Implement an Intent to allow users to pick images from their gallery (ACTION_PICK or ACTION_GET_CONTENT).
Model Loading: Load the TFLite model using the Interpreter or Task API (recommended for common vision tasks):

```java
// Using Task API (example for image classification)
ImageClassifier.ImageClassifierOptions options =
    ImageClassifier.ImageClassifierOptions.builder()
        .setBaseOptions(BaseOptions.builder().useGpu().build())
        .setMaxResults(1) // Or more, depending on output
        .build();
ImageClassifier imageClassifier = ImageClassifier.createFromFileAndOptions(
    context, "image_tampering_detector.tflite", options);
```
Post-processing: Convert the model's raw output (e.g., a float value between 0 and 1) into a human-readable format (e.g., a percentage or a clear "Tampered" or "Authentic" label), as sketched after this list.
UI Update: Update the UI to display the detection result. This could involve:
A simple text message: "Image is likely tampered (75% confidence)."
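A minimal, illustrative sketch of this thresholding logic (shown in Python for brevity; the Android app would implement the same mapping in Java or Kotlin):

```python
def format_result(score: float, threshold: float = 0.5) -> str:
    """Map the model's sigmoid output in [0, 1] to a human-readable verdict."""
    if score >= threshold:
        return f"Image is likely tampered ({score:.0%} confidence)"
    return f"Image is likely authentic ({1 - score:.0%} confidence)"

print(format_result(0.75))  # Image is likely tampered (75% confidence)
```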
2.6 Permissions
Declare and request the permissions required to read images from the device (e.g., READ_MEDIA_IMAGES on Android 13+ or READ_EXTERNAL_STORAGE on older versions) in AndroidManifest.xml, with runtime permission requests where applicable.
2.7 Error Handling and User Feedback
Implement robust error handling for cases like invalid image selection, model loading failures, or inference errors.
Provide clear feedback to the user throughout the process (e.g., "Analyzing image...", "Analysis complete.", "Error: Could not process image.").
3. Development Best Practices
User Experience (UX): Design an intuitive and responsive UI. Provide progress
indicators during analysis.
This outline provides a roadmap for developing the Android application, focusing on
integrating the MobileNetV3 TFLite model for effective image tampering detection. The
next phase will focus on testing and validation.
Thorough testing and validation are crucial to ensure the reliability, accuracy, and
performance of the image tampering detection mobile application. The testing
process will encompass various aspects, from functional correctness to model
accuracy and user experience.
1. Functional Testing
Functional testing will verify that all features of the Android application work as
intended. This includes:
2. Performance Testing
Performance testing will assess the application's efficiency and resource utilization on
target Android devices. Key metrics to evaluate include:
Inference Speed: Measuring the time taken for the TFLite model to process an
image and generate a prediction. This should be optimized for near real-time
performance.
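A rough host-side latency estimate can be sketched with the Python interpreter as below; on-device timings will differ, so final numbers should come from profiling on real hardware:

```python
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="image_tampering_detector.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
dummy = np.random.rand(*inp["shape"]).astype(np.float32)
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()  # warm-up run

latencies = []
for _ in range(50):
    t0 = time.perf_counter()
    interpreter.invoke()
    latencies.append((time.perf_counter() - t0) * 1000)
print(f"Median inference latency: {sorted(latencies)[len(latencies) // 2]:.1f} ms")
```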
3. Model Accuracy Validation
Model accuracy will be validated against held-out data:
Test Dataset: Using a dedicated, unseen test dataset of both authentic and various types of tampered images (copy-move, splicing, inpainting, deepfakes). This dataset should be distinct from the training and validation datasets used during model development.
ROC Curve and AUC: For binary classification, the Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) provide a comprehensive view of the model's performance across different classification thresholds (a short evaluation sketch follows this list).
Edge Cases and Adversarial Examples: Testing the model against challenging
cases, such as subtly tampered images, images with various compression
artifacts, or images that have undergone post-processing (e.g., resizing, re-
compression) after tampering. This helps assess the model's robustness.
User Studies (Optional but Recommended): Involving actual users to test the
application with their own images and provide feedback on the perceived
accuracy and usability. This can reveal real-world scenarios and limitations not
captured by synthetic datasets.
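As referenced in the ROC Curve and AUC item above, a minimal evaluation sketch using scikit-learn (y_true and y_score hold placeholder values; in practice they come from the held-out test set):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# 1 = tampered, 0 = authentic; scores are the model's sigmoid outputs
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))  # 0.75 for this toy data
```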
4. Usability Testing
Usability testing will ensure that the application is intuitive and easy to use for the
target audience. This includes:
User Interface (UI) Clarity: Assessing if the UI elements are clear, well-
organized, and easy to understand.
Workflow Efficiency: Evaluating how smoothly users can navigate through the
application's workflow, from image selection to result viewing.
Feedback Mechanisms: Checking if the application provides clear and timely
feedback to the user about the status of operations and the results of the
analysis.
By systematically performing these testing and validation steps, the image tampering
detection mobile application can be refined to deliver a robust, accurate, and user-
friendly solution for digital forensics on Android devices.
Given the challenges in directly implementing and building the mobile application
within the sandbox environment, exploring existing solutions is a viable alternative.
This section focuses on identifying and evaluating existing mobile applications and
open-source projects for image tampering detection.
1. Existing Applications and Tools
While the search results did not yield a definitive, all-encompassing mobile application for image tampering detection, several tools and applications offer relevant functionalities.
2. Open-Source Projects
The search for open-source Android applications specifically for image tampering
detection yielded limited results. However, some projects provide building blocks or
related functionalities:
Android-Tamper-Detector (GitHub): https://github.com/mukeshsolanki/Android-Tamper-Detector This library detects whether an Android application itself has been modded or tampered with (APK integrity); it does not analyze image content, but it illustrates integrity-checking patterns on Android.
Given the current landscape, there is a clear opportunity for a dedicated, open-source
Android application for image tampering detection. While a complete, ready-to-use
solution is not readily available, the following approaches can be considered:
Develop a New Application: The detailed design and roadmap provided in the
previous sections of this report can be used as a blueprint to develop a new
Android application. This would involve:
Following the outlined steps for model selection (MobileNetV3), fine-tuning,
and TensorFlow Lite conversion.
In conclusion, while a direct implementation was not feasible within the sandbox, the
provided documentation and exploration of alternative approaches offer a
comprehensive solution for addressing the user's request. The detailed report serves
as a valuable resource for building the desired mobile application, and the
investigation of existing tools provides immediate options for basic image analysis.