Backup
A PROJECT REPORT
Submitted By
PAZHANIRAJ V (422519205030)
SRIHARI S (422519205040)
VETRIVEL R (422419205044)
MAY 2023
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
ACKNOWLEDGEMENT
We wish to express our sincere thanks and gratitude to our Dean In-Charge
Dr. R SENTHIL M.E, Ph.D., for offering us all the facilities to do the project.
We also express our sincere thanks to Dr. P SEENUVASAN M.E, Ph.D.,
Head of the Department, Department of Information Technology, for his support
and guidance in the successful completion and implementation of our idea.
We would like to thank our Coordinator Dr. E KAVITHA M.E, Ph.D., for
their guidance and encouragement in bringing out this project successfully.
We express our sincere thanks to our Guide Dr. P SEENUVASAN M.E,
Ph.D., our internal project guide, Department of Information Technology, University
College of Engineering Villupuram, for his valuable suggestions and constant
encouragement.
We would like to thank all the Faculty Members, Technical Staff Members
and Support Staff Members in our department for their guidance in finishing this
project successfully. We would also like to thank all our family and friends for
their willing assistance.
This project consumed a huge amount of work, research and dedication. Still,
implementation would not have been possible if we did not have the support of many
individuals and organizations. We would like to extend our sincere gratitude to all of
them.
PAZHANIRAJ V
SRIHARI S
VETRIVEL R
ABSTRACT
Data in modern computing environments faces many threats, including operation
errors, security attacks, and hardware failure. Data backup is essential for protecting
against these threats, and cloud backup systems are commonly used for disaster
recovery. However, traditional cloud backup
systems may not offer sufficient data privacy and reliability. To address these
challenges, this work proposes a cloud-based backup system that uses multiple cloud
providers to enhance redundancy and reliability. The system also uses fog computing
to improve backup and restore performance by processing and storing data locally.
The proposed backup system uses AES encryption and supports both symmetric and
public-key encryption algorithms to ensure data security. This system can be used in
various applications where data privacy and reliability are crucial, such as enterprise
data storage, disaster recovery, and personal data backup. Future work includes
TABLE OF CONTENTS
ABSTRACT iii
TABLE OF CONTENTS iv
1 INTRODUCTION 1
1.2 TYPES 1
2 LITERATURE SURVEY 5
3 PROPOSED SYSTEM 16
4 IMPLEMENTATION 35
5 CONCLUSION AND FUTURE WORK 39
5.1 CONCLUSION 39
APPENDIX 40
Source Code 40
REFERENCES 48
LIST OF FIGURES
LIST OF ABBREVIATIONS
LZ77 - Lempel-Ziv-1977
CHAPTER 1
INTRODUCTION
1.2 Types
1.2.1 Types of Cloud Services
There are three types of cloud services:
i) Software-as-a-service (SaaS)
ii) Infrastructure-as-a-service (IaaS)
iii) Platform-as-a-service (PaaS)
i) Software-as-a-service (SaaS)
It involves the licensing of a software application to customers. Licenses
are typically provided through a pay-as-you-go or on-demand model. Microsoft
Office 365 is a familiar example of this type of service.
1.2.2 Deployment Model Types
There are four types of deployment models:
i) Public Cloud
ii) Private Cloud
iii) Hybrid Cloud
iv) Community Cloud
i) Public Cloud
1.3 Multi-Cloud
Multi-Cloud is a heterogeneous architecture utilizing various cloud computing
and storage facilities, which can come from a public cloud, a private cloud, or
standalone on-premises facilities. In a Multi-Cloud architecture, either the users are
aware of the multiple clouds responsible for managing the resources and services, or
a third party is responsible for managing them. There are various reasons to adopt a
Multi-Cloud architecture, including reducing dependency on any single provider,
cost efficiency, flexibility of choice, and disaster immunity. Additionally, the use of
multiple cloud providers can help to optimize backup and restore times by spreading
the workload across providers. This approach reduces the risk of bottlenecks that
may occur when backing up or restoring large amounts of data to a single cloud
provider.
1.4 Fog Computing
The Fog Computing concept was initially developed to minimize the latency of
data access from and to the cloud. Fog Computing provides data processing and
networking facilities at the network edge. The idea is to install dedicated servers,
located geographically at the edge of the network in micro/nano data centers, close
to the end-users. Whereas Cloud Computing provides resources that are centralized
in the network core, Fog Computing provides services and resources that are
distributed near or at the network edge. Fog nodes can be any of the typical network
elements, such as routers or middle-end servers, geographically positioned near the
end-users. These nodes are capable of executing applications and storing data to
provide the required services and enhance the user experience. They are connected
to the cloud core through high-speed links and can be thought of as the arms of the
cloud, while the brain remains at the core of the network.
CHAPTER 2
LITERATURE SURVEY
Dynamic Data Slicing in Multi Cloud Storage Using Cryptographic Technique
Evaluation results indicate that FogRoute outperforms existing
approaches in terms of delivery ratio, delay, and overhead. However, one potential
limitation of FogRoute is that it requires a certain level of infrastructure and
resources, which may not be available or feasible in certain fog computing
environments.
Scheduling Scientific Workflow Using Multi-Objective Algorithm
In this work, Mazen et al. propose a new approach to address the problem of
scheduling scientific workflows in distributed computing environments [16]. The
paper presents a multi-objective algorithm that considers factors such as task
deadline, task dependencies, and resource availability. The proposed algorithm aims
to optimize multiple objectives such as makespan, resource utilization, and energy
consumption. The paper also presents a case study and shows that the proposed
approach outperforms traditional scheduling algorithms in terms of efficiency and
effectiveness. However, the practical implementation of the proposed approach may
be challenging due to the need for accurate modeling of scientific workflows and
resource availability.
As a result, it is unclear how the device would be integrated into existing systems
and what the practical implications of its use would be.
Distributed Multi-Cloud Storage System to Improve Data Security with Hybrid
Encryption
In this work, Zaman et al. propose a distributed multi-cloud storage system
that enhances data security through hybrid encryption [7]. The paper discusses the
limitations of traditional cloud storage systems and the need for a more secure and
scalable approach. The proposed system is designed to address these issues by
distributing data across multiple cloud storage providers and encrypting it using a
combination of symmetric and asymmetric encryption techniques. The paper
describes the architecture of the system, which includes a client-side component that
encrypts data before it is uploaded to the cloud, as well as a server-side component
that handles the distribution and retrieval of data. The paper also presents
experimental results that demonstrate the effectiveness of the proposed system in
terms of data security and performance.
The work also describes the implementation of the tool, including its use of
cryptographic hashing to ensure data integrity and its support for multiple cloud
storage providers.
Fog Computing for Vehicular Ad-Hoc Networks: Paradigms, Scenarios and Issues
In this work, Kai et al. discuss the potential of fog computing for Vehicular
Ad-Hoc Networks (VANETs) and the various paradigms, scenarios, and issues
related to its implementation [8]. The paper presents an overview of VANETs and
the challenges they face, including limited bandwidth and high mobility. It then
explores the potential of fog computing to address these challenges by providing a
decentralized, low-latency computing infrastructure that can operate closer to the
network edge. For example, fog nodes can be placed on roadside infrastructure,
such as traffic lights or lamp posts, to improve communication among vehicles. Fog
nodes can also be deployed on mobile platforms such as buses or emergency vehicles
to provide better coverage in areas with poor connectivity. The paper also discusses
various scenarios in which fog computing can be applied in VANETs, such as real-
time traffic management and collision avoidance. Finally, the paper highlights some
of the issues and limitations of fog computing in VANETs, including security and
privacy concerns, and suggests future research directions to address these challenges.
A Cloud Based Automatic Recovery and Backup System with Video
Compression
In this work, Raigonda Rani Megha et al. propose a cloud-based system for
automatic data recovery and backup, which includes video compression techniques
to reduce storage requirements [19]. The system utilizes cloud computing resources
to enable rapid data backup and recovery, with a focus on minimizing downtime and
data loss. The paper discusses the design and implementation of the system, including
the use of video compression techniques to reduce storage requirements. The paper
also presents experimental results that demonstrate the effectiveness of the system in
terms of backup and recovery speed, as well as storage savings.
A Proposed Virtual Private Cloud-Based Disaster Recovery Strategy
In this work, S. Hamadah et al. introduce a proposed virtual private cloud-
based disaster recovery strategy that aims to improve the resilience of cloud-based
services [17]. The paper discusses the challenges associated with traditional disaster
recovery strategies and highlights the benefits of using a virtual private cloud
approach. The proposed strategy involves the creation of a virtual private cloud
environment that can be used for disaster recovery purposes. The paper also presents
a case study that demonstrates the effectiveness of the proposed strategy in terms of
recovery time and cost. The study shows that the proposed strategy can provide a
cost-effective and efficient approach to disaster recovery for cloud-based services.
A Conceptual Framework for Disaster Recovery and Business Continuity of
Database Services in Multi-Cloud
In this work, Mohammad Alshammari et al. present a conceptual framework
for disaster recovery and business continuity of database services in a Multi-Cloud
environment [18]. The framework aims to address the challenges of managing
database services across multiple clouds and ensuring their availability and resilience
in the event of disasters or disruptions. The framework includes a set of guidelines
and best practices for disaster recovery and business continuity planning, as well as
a set of tools and technologies for data replication, backup, and recovery. The work
also discusses the key issues and challenges associated with implementing a Multi
Cloud disaster recovery strategy, including data consistency, network latency, and
security. Overall, the proposed framework provides a comprehensive approach for
ensuring the availability and resilience of database services in a Multi Cloud
environment.
Fog Computing for Vehicular Ad-Hoc Networks: Paradigms, Scenarios and Issues
In this work, Cong et al. explore the use of fog computing in vehicular ad-hoc
networks, discussing different paradigms, scenarios, and issues associated with its
implementation [10]. Fog computing is a distributed computing model that extends
cloud computing capabilities to the edge of the network, enabling faster response
times and reduced network latency. In vehicular ad-hoc networks, fog computing can
be used to support real-time applications and services, such as vehicle-to-vehicle
communication, traffic management, and accident prevention. However, there are
several challenges that need to be addressed when deploying fog computing in
vehicular ad-hoc networks, such as network security, scalability, and resource
management. This work provides insights and recommendations to help address these
challenges and improve the effectiveness of fog computing in vehicular ad-hoc
networks.
2.1 Existing System
Duplicacy is a cloud backup tool that offers efficient and secure backup
solutions for individuals and organizations. Its lock-free deduplication algorithm
ensures that only unique data chunks are backed up to the cloud, reducing the amount
of data that needs to be transferred and stored. This results in optimized backup
processes and reduced storage costs, as the same data chunks are not stored multiple
times. Deduplication also helps cloud storage service providers use their disk space
efficiently, and when deduplication is done at the source, it saves upload bandwidth
by not transmitting duplicate copies of data.
Convergent encryption, also known as content hash keying, is a cryptographic
technique that generates the same ciphertext for identical plaintext files. This is
achieved by using a hash function to derive a unique identifier from the plaintext
data, which is then used as the encryption key. In cloud computing, convergent
encryption is used to remove duplicate files from storage without the provider having
access to the encryption keys: by comparing the unique identifiers, the cloud provider
can identify and remove duplicate files without decrypting them or compromising
the security of the data. This approach reduces storage costs and improves data
security, as it ensures that the cloud provider cannot access the plaintext data. Its
main limitation is that an attacker who can guess a file's exact contents can confirm
that guess, since identical plaintexts always produce identical ciphertexts. Despite
this limitation, convergent encryption is a useful technique in cloud computing for
efficient and secure data storage.
While existing backup systems have several disadvantages, such as complex
management, cost, lack of features, compatibility problems, and differing APIs,
Duplicacy offers a simple and effective solution to address these issues. Its efficient
and secure backup solutions are designed to meet the needs of individuals and
organizations.
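To make the idea concrete, the following Python sketch shows the essence of
convergent encryption. It is an illustration only, not Duplicacy's actual
implementation; the chunk handling and the use of AES in CTR mode here are
assumptions.

import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def convergent_encrypt(chunk):
    # The key is the SHA-256 hash of the plaintext chunk, so identical
    # chunks always encrypt to identical ciphertext, enabling deduplication.
    key = hashlib.sha256(chunk).digest()
    # A fixed nonce is deliberate here: random nonces would break the
    # "same plaintext -> same ciphertext" property convergent encryption needs.
    encryptor = Cipher(algorithms.AES(key), modes.CTR(b'\x00' * 16)).encryptor()
    ciphertext = encryptor.update(chunk) + encryptor.finalize()
    # The provider can deduplicate by comparing ciphertexts (or their hashes)
    # without ever holding the key.
    return key, ciphertext

key, ct = convergent_encrypt(b'example chunk')
assert convergent_encrypt(b'example chunk')[1] == ct  # identical input, identical output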
CHAPTER 3
PROPOSED SYSTEM
The proposed system stores backup data on a Multi-Cloud architecture, which
utilizes various cloud computing and storage facilities that can come from a public
cloud, a private cloud, or standalone on-premises facilities. In a Multi-Cloud
architecture, either the users are aware of the multiple clouds responsible for
managing the resources and services, or a third party manages them on their behalf.
In the context of Backup System, fog computing can improve the performance
and efficiency of backup and restore operations. By processing and storing data
locally on edge devices, Backup System can reduce the time and bandwidth required
to back up or restore data to the cloud. This is especially useful for large data sets or
for users with limited bandwidth. Fog computing can also help to reduce storage costs
for backup data by only transmitting the most critical or relevant data to the cloud.
The data that is processed and stored locally on edge devices can be compressed or
deduplicated, resulting in less data being transferred to the cloud and lower storage
costs. Overall, fog computing is a useful approach for improving the performance
and efficiency of backup and restore operations, and it can help organizations reduce
their storage costs and network bandwidth usage.
3.4 GNU Privacy Guard (GPG)
GNU Privacy Guard (GnuPG or GPG) is a free and open-source software tool
used for secure communication and data encryption. It uses the PGP standard to
provide end-to-end encryption and digital signature capabilities for email, files, and
other types of data. When using GnuPG, a user first generates a public-private key
pair. The public key is then shared with others, while the private key is kept secret.
When someone wants to send a secure message or file to the user, they encrypt it with
the user's public key, and the user then uses their private key to decrypt the message
or file.
In addition to encryption, GnuPG can also be used for digital signatures. A
user can sign a message or file with their private key, and anyone who has the user's
public key can verify the signature. The digital signature serves as proof that the
message or file has not been tampered with since it was signed, and that it was sent
by the claimed sender and not an imposter.
GnuPG is widely used by individuals, organizations, and governments around
the world for secure communication and data encryption. Its open-source nature
allows for transparency and continuous improvement, and its compatibility with
various operating systems and email clients makes it accessible to a wide range of
users.
GnuPG supports a wide range of cryptographic algorithms, including the
Advanced Encryption Standard (AES), Rivest-Shamir-Adleman (RSA) and the
Secure Hash Algorithm (SHA)-256, among others.
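As an illustration, the python-gnupg wrapper can drive a local GnuPG installation
from Python. This is a hedged sketch: it assumes GnuPG is installed, that a public
key for the hypothetical recipient alice@example.com is on the keyring, and that a
signing key protected by the passphrase shown exists.

import gnupg

gpg = gnupg.GPG()  # uses the local GnuPG installation and default keyring

# Encrypt a file for a recipient whose public key is on the keyring
# ('alice@example.com' is a hypothetical key identifier).
with open('backup.tar', 'rb') as f:
    result = gpg.encrypt_file(f, recipients=['alice@example.com'],
                              output='backup.tar.gpg')

# Sign data with our private key; anyone with the public key can verify it.
signed = gpg.sign('backup manifest', passphrase='my-passphrase')
verified = gpg.verify(str(signed))
print('Signature valid:', bool(verified))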
3.4.1 Pretty Good Privacy (PGP) Key Standard
A comprehensive Key Manager is provided with PGP public and private keys.
This Key Manager can be used to create keys, change keys, view keys and import
keys. These keys can be utilized within Cloud – Fog Data for automating PGP
encryption and decryption within your organization. This Key Manager can also be
used to export public keys for sharing with your trading partners.
Encryption is performed through the use of a key pair. The public key may be
published or sent to the recipient, while the private key is known only to the
recipient. The public key is used to encrypt the message or file, and the private key
is used to decrypt it.
The OpenPGP standard defines the format for encrypted messages, signatures and
keys.
Figure 3.4 (b) PGP Decryption
Rivest-Shamir-Adleman (RSA): RSA is an asymmetric encryption algorithm
whose keys are derived from the product of large prime numbers. The security of
RSA relies on the difficulty of factoring large composite numbers into their prime
factors.
Diffie-Hellman: The Diffie-Hellman algorithm is used for key exchange and
enables two parties to agree on a shared secret key over an insecure communication
channel without exchanging the key directly. The algorithm works by each party
generating a public-private key pair. They then exchange their public keys over the
insecure channel. Using their own private key and the other party's public key, they
compute a shared secret key. This key is the same for both parties and can be used to
encrypt and decrypt messages exchanged over the insecure channel.
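A minimal sketch of this exchange using the Python cryptography library (an
implementation choice assumed here; the report does not prescribe one). The
parameter sizes and the HKDF info label are illustrative.

from cryptography.hazmat.primitives.asymmetric import dh
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Both parties agree on public DH parameters (generator and prime).
parameters = dh.generate_parameters(generator=2, key_size=2048)

# Each party generates its own private key and derives a public key.
alice_private = parameters.generate_private_key()
bob_private = parameters.generate_private_key()

# Each side combines its private key with the other's public key;
# both arrive at the same shared secret.
alice_secret = alice_private.exchange(bob_private.public_key())
bob_secret = bob_private.exchange(alice_private.public_key())
assert alice_secret == bob_secret

# Derive a symmetric key from the raw shared secret.
key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
           info=b'handshake').derive(alice_secret)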
Elliptic Curve Cryptography (ECC): ECC is a type of asymmetric
encryption algorithm that uses elliptic curves instead of prime numbers. ECC is used
in various applications, including digital signatures, encryption, and key exchange. It
is widely used in mobile devices and other resource-constrained environments where
efficient use of computing resources is essential. ECC is also used in internet
protocols such as Transport Layer Security (TLS) and Secure Shell (SSH) to provide
secure communications.
ElGamal: ElGamal is a public-key encryption algorithm that is based on the
discrete logarithm problem. It is used for both encryption and digital signatures. In
the ElGamal encryption process, the receiver generates a public key and a private
key. The public key is shared with the sender, who uses it to encrypt the message
before sending it to the receiver. The receiver then uses their private key to decrypt
the message.
3.4.4 Symmetric Encryption Algorithm
In symmetric encryption, the sender and receiver share a single secret key. The
plaintext is combined with this key by an encryption algorithm, and the result is
ciphertext, which is the encrypted version of the plaintext. To decrypt the ciphertext,
the same secret key is used along with a decryption algorithm to convert the
ciphertext back into plaintext.
Some common examples of symmetric encryption algorithms include:
Data Encryption Standard (DES): Data Encryption Standard (DES) is a
symmetric encryption algorithm that was developed by IBM in the 1970s and was
widely used for data encryption until the late 1990s. However, due to its small key
size of 56 bits, which can be vulnerable to brute force attacks, it is now considered to
be outdated and insecure. To address the security issues of DES, a variant known as
Triple DES (3DES) was developed, which uses three separate keys and is more secure
than the original DES algorithm. However, even 3DES is no longer considered to be
the best option for encryption, as it is slower and less efficient than newer encryption
algorithms like AES.
Triple DES (3DES): Triple DES (3DES) is an encryption algorithm that is an
improvement over the original DES algorithm. 3DES uses a three-step encryption
process to increase the effective key size of the encryption. By using three separate
keys in this way, 3DES effectively has a key size of 168 bits (three 56-bit keys),
which makes it more secure than the original DES algorithm. The use of multiple
keys also provides additional security against brute force attacks. While 3DES is
more secure than DES, it is also slower and less efficient. As a result, newer
encryption algorithms like AES (Advanced Encryption Standard) have become more
popular for modern encryption applications.
Advanced Encryption Standard (AES): AES is a symmetric encryption
algorithm used for encrypting and decrypting data in 128-bit blocks. It supports key
sizes of 128, 192, or 256 bits. AES encryption involves four steps, while decryption
involves four inverse steps. AES is widely used for securing data in applications such
as email, file encryption, and SSL/TLS. AES is considered to be highly secure, with
no practical known attacks against the full algorithm.
3.5 Compression Algorithm
Compression algorithms are used to reduce the size of data by encoding it in a
more compact form, thereby reducing storage requirements and increasing
transmission efficiency. There are several compression algorithms available,
including lossless and lossy compression. Lossless compression algorithms, such as
ZIP and GZIP, allow data to be compressed and decompressed without losing any
information. Lossy compression algorithms, such as JPEG and MP3, allow data to be
compressed to a smaller size by discarding some of the information that is less
important or less noticeable to the human eye or ear. Compression algorithms are
widely used in a variety of applications, including email, file transfer and multimedia
streaming, to reduce storage requirements and improve transmission speed and
efficiency.
ZIP: ZIP is a file format and compression utility that is widely used for
archiving and compressing files. Developed in the 1980s as a way to compress files
for storage and transfer, the format uses lossless compression to reduce the size of
files without losing any information, so ZIP archives can be decompressed back to
their original form with no loss of quality or accuracy. It has become one of the most
widely used file compression formats in the world, particularly for archiving and
transferring files over the internet.
Portable Network Graphics (PNG): It is a file format used for storing digital
images, particularly for graphics and images with transparent backgrounds. PNG was
developed as a replacement for the GIF format, which had some limitations, such as
a limited color range and licensing issues. One of the main features of PNG is its
support for alpha channels, which allows for transparency in images. This makes it
particularly useful for web design and other applications where transparency is
important.
3.6 Proposed Architecture
Figure 3.6: Proposed architecture (Edge Devices, Backup System, Cloud)
The proposed system consists of the following components:
i. Edge devices
ii. Cloud Provider
iii. Backup System
iv. Data Encryption
v. Data compression
vi. Data Upload
vii. Data Retrieval
3.7.1 Edge devices
These are the end-user devices that need to back up and secure their data. These
edge nodes can be users’ mobile phones, laptops, IP cameras etc. The edge nodes
require a secure and fast backup interface with the ability to read, modify and delete
the stored data at any time. The privacy of each edge node should be protected so that
no edge device can access the data of the other nodes on the system. These
requirements should be achieved without complicated operations at the edge nodes.
3.7.2 Cloud Provider
The data will be encrypted and divided into multiple chunks before storing it on the
public cloud. This prevents any malicious cloud service provider from utilizing the
user data or compromising the user's privacy.
3.7.3 Backup System
The Backup System runs on the Fog layer. It offers a simple but safe backup
interface to the edge nodes while completely protecting the data on distributed
Multi-Cloud storage. The Backup System is responsible for all the complicated
operations needed to keep the user data secure and reliable, so the edge nodes do not
need to pay attention to these operations. In other words, the backup
system offloads the processing required by the edge devices to the Fog nodes. This
type of processing offloading allows utilizing state-of-the-art backup techniques
without being limited by the low resources of the edge nodes. Moreover, the edge
nodes are usually battery-operated and in low power consumption mode most of the
time. On the other hand, the fog nodes are connected to a power source, so there are
fewer restrictions on their activity time.
Delta data calculation refers to the process of identifying and calculating the
changes made to a file or dataset since the last backup. Rather than backing up the
entire file or dataset each time, only the changes (delta data) are backed up, which
can significantly reduce backup times and storage requirements. To save network
bandwidth and storage, the Backup System adopts a versioning scheme to
back up the data. At each periodic backup, the Backup System calculates the delta
from the last backup instant and only backs up the new delta. In the case of the first
backup, all data will be in the delta. The system generates the mandatory metadata to
enable data chain construction from the incremental backups when data recovery is
performed. These metadata are also used to calculate the delta between a backup
instant and a previous backup instant.
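A hedged sketch of one way such a delta could be computed, assuming fixed-size
chunks and a JSON manifest of per-chunk SHA-256 hashes as the metadata (the
report does not specify the chunking or manifest format; production systems often
use content-defined chunking instead):

import hashlib
import json
import os

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB chunks

def chunk_hashes(path):
    # Split the file into fixed-size chunks and hash each one.
    hashes = []
    with open(path, 'rb') as f:
        while chunk := f.read(CHUNK_SIZE):
            hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes

def compute_delta(path, manifest_path):
    # Compare against the manifest of the previous backup instant; only
    # chunks whose hash changed (or are new) belong to the delta.
    previous = []
    if os.path.exists(manifest_path):
        with open(manifest_path) as m:
            previous = json.load(m)
    current = chunk_hashes(path)
    delta = [i for i, h in enumerate(current)
             if i >= len(previous) or previous[i] != h]
    # Persist the new manifest as metadata for the next backup instant.
    with open(manifest_path, 'w') as m:
        json.dump(current, m)
    return delta  # indices of chunks to upload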
3.7.4 Data Encryption
Public key encryption, also known as asymmetric encryption, is a type of
encryption that uses a pair of keys to encrypt and decrypt data. The pair of keys
consists of a public key and a private key. The public key can be freely distributed,
while the private key must be kept secret. Public key encryption algorithms are based
on complex mathematical problems that are easy to solve in one direction but difficult
to solve in the other direction. These algorithms use the public key to encrypt the
data, and only the corresponding private key can decrypt the data. The symmetric-
key encryption algorithm used here is the Advanced Encryption Standard (AES). It
is a widely used and highly secure encryption algorithm that protects sensitive data
in various applications, including online transactions, secure messaging and data
storage. The generate_key() function generates a secure encryption key using the
Password-Based Key Derivation Function 2 (PBKDF2) algorithm with the given
password.
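A minimal version of such a generate_key() might look as follows; the salt size and
iteration count are illustrative assumptions, and the fuller encryption module
appears in the appendix.

import hashlib
import os

def generate_key(password, salt=None, iterations=100000):
    # Derive a 256-bit AES key from the password with PBKDF2-HMAC-SHA256.
    salt = salt or os.urandom(16)
    key = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                              salt, iterations)
    return key, salt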
The public-key encryption algorithm used is Elliptic Curve Cryptography (ECC).
ECC is a type of public key cryptography based on the mathematics of elliptic curves.
It provides the same level of security as other public key cryptography systems such
as RSA, but with smaller key sizes, making it more efficient for use in resource-
constrained environments such as mobile devices or IoT devices.
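For illustration, the sketch below performs an elliptic-curve Diffie-Hellman
(ECDH) exchange with the Python cryptography library and derives an AES key
from the shared secret. The curve choice (P-256) and the HKDF label are
assumptions, not details taken from this report.

from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Each party generates an EC key pair on the NIST P-256 curve.
device_key = ec.generate_private_key(ec.SECP256R1())
server_key = ec.generate_private_key(ec.SECP256R1())

# ECDH: combining one's private key with the peer's public key yields
# the same shared secret on both sides.
shared = device_key.exchange(ec.ECDH(), server_key.public_key())

# Derive a 256-bit symmetric key (e.g. for AES) from the shared secret.
aes_key = HKDF(algorithm=hashes.SHA256(), length=32,
               salt=None, info=b'backup-ecdh').derive(shared)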
3.7.5 Data compression
The system uses LZ77 compression, which is widely supported by
programming languages and libraries. LZ77 compression is a lossless data
compression algorithm that works by replacing repeated occurrences of data with
references to a single copy of that data existing earlier in the uncompressed data
stream. This is achieved by using a sliding window of previously encountered data in
the stream, and encoding each subsequent segment of data as either a literal value or
a pointer to a previous instance of that same segment in the window.
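The DEFLATE algorithm used by gzip and zlib combines this LZ77 sliding-window
scheme with Huffman coding. A small lossless roundtrip sketch in Python (the
sample data is arbitrary):

import zlib

data = b'abcabcabcabc' * 1000  # highly repetitive input compresses well

# DEFLATE (zlib) combines LZ77 back-references with Huffman coding.
compressed = zlib.compress(data, level=9)
restored = zlib.decompress(compressed)

assert restored == data  # lossless: the original bytes come back exactly
print(len(data), '->', len(compressed), 'bytes')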
3.7.6 Data Upload
This is the final stage of the backup scenario: the data chunks are uploaded
to the cloud servers. The Backup System authenticates itself
with each CSP automatically and uploads the user data. The Backup System adopts
a round-robin scheme to preserve the balance of storage usage on each CSP account.
Metadata is distributed and replicated in the same manner as the user data. This
scheme has been shown to keep the storage balanced on all the CSPs over the long
term and to avoid overwhelming any specific CSP.
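A hedged sketch of such a round-robin placement, assuming hypothetical client
objects that each expose an upload() method (the real system's CSP interfaces are
not specified in this report):

from itertools import cycle

def upload_round_robin(chunks, csp_clients):
    # Rotate through the configured CSP accounts so storage usage stays
    # balanced and no single provider is overwhelmed.
    placement = {}
    providers = cycle(csp_clients.items())
    for index, chunk in enumerate(chunks):
        name, client = next(providers)
        client.upload('chunk-%06d' % index, chunk)  # hypothetical client API
        placement[index] = name
    return placement  # metadata: which CSP holds which chunk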
3.7.7 Data Retrieval
Edge devices can restore their data at any time as long as they have access to
the Backup System. Even if there is no internet connection, edge devices can
download the data from the Backup System in case of any data loss incident at the
edge devices. This data retrieval can only be accomplished by authenticating the edge
device with the Backup System using its username and password. After
authentication, the edge device has immediate and full access to its previously
backed-up data.
In the event of a disaster in the Fog layer, data retrieval and recovery of the
entire system can be conveniently performed using the Backup System. Disaster
events in the Fog layer include Backup System software failures, hardware failures,
hard drive crashes, or other natural disasters. In order to recover the system, the
Backup System only needs to provide the master key and the credentials of the used
CSPs after bringing up a new Fog node. Then, the Backup System can search for the
metadata on all the configured CSPs and construct the backup chain. After that, the
data can be downloaded, decrypted, and reconstructed automatically. In the end, the
whole system is recovered again and the edge devices can access their data as usual.
3.8 Advantages
1) Speed
The Backup System is running on the Fog layer on user premises. User devices
and the Fog layer are in the same Local Area Network (LAN). This results in many
advantages. First, the data backup speed from the user devices to the Backup System
can reach LAN speeds, which is far faster than direct data upload to the cloud.
Second, the backup service is available even if there is no internet access.
2) Security
The Fog layer is a personal device. This means that the connected devices will
be well known and belong to the Fog layer owner. The security threats are more
controllable as the Fog layer and the edge devices are under user control. Backup
System restricts the Fog layer access to the specified user devices only through
authentication to allow better management and reduce security threats.
3) Privacy
Data privacy is maintained by isolating the edge devices’ data on the Fog layer
where no edge device can get access to the data of the other devices. Therefore the
Backup System can be used by all users on the same LAN without privacy concerns.
CHAPTER 4
IMPLEMENTATION
Processors
Processors, also known as central processing units (CPUs), are the brains of a
computer system. They are responsible for carrying out instructions and performing
calculations for the system. In this project, we are using an Intel® Core™ i5-4300M
processor at 2.60 GHz (1 socket, 2 cores, 2 threads per core) and 8 GB of DRAM.
Disk space
Disk space refers to the amount of storage capacity available on a hard drive
or other storage device. It is typically measured in gigabytes (GB) or terabytes (TB),
with one terabyte being equal to 1,000 gigabytes. In this system, we are using 320
GB for backup storage.
Operating systems
An operating system (OS) is software that manages computer hardware
and software resources and provides common services for computer programs. It
acts as an interface between computer hardware and software applications, allowing
them to communicate with each other and operate effectively. In this project, we are
using Windows 10, macOS, and Linux.
Python
Python is used for a wide range of applications, from web development and
data analysis to scientific computing and artificial intelligence. Its syntax is designed
to be easily readable and intuitive, with an emphasis on code readability. Python is
also known for its extensive libraries and frameworks, which make it easy to
accomplish complex tasks with minimal code. Python can be run on various operating
systems, including Windows, Mac OS, and Linux, and can be used for both desktop
and web-based applications. Its popularity has led to a large community of developers
and users who create and share libraries, tools, and resources to help others learn and
use Python effectively.
Cloud SDK
Cloud SDK is a set of tools and libraries that enables developers to easily create
and manage applications on cloud computing platforms such as Google Cloud
Platform (GCP). It includes command-line tools, libraries, and APIs for managing
resources, deploying and testing applications, and accessing cloud services. Cloud
SDK provides a streamlined way to interact with GCP services and resources,
allowing developers to focus on building applications rather than worrying about the
underlying infrastructure. It includes tools for managing authentication and access
control, as well as tools for monitoring and debugging applications. Cloud SDK is
available for Windows, macOS, and Linux operating systems, and supports multiple
programming languages including Python, Java, and Go. It is regularly updated with
new features and improvements to ensure that developers have access to the latest
tools and resources for building and deploying cloud applications.
technologies, and has helped to popularize the use of public-key encryption in the
modern era.
Compression Libraries
Compression libraries are software libraries that provide developers with a set
of tools and functions to compress and decompress data in various formats. These
libraries are designed to make it easy for developers to integrate compression and
decompression functionality into their applications, without having to implement the
algorithms themselves. In this project, we are using GNU Zip (gzip), a file format
and associated compression library that uses the DEFLATE algorithm to compress
and decompress data. Gzip is commonly used on Unix-based systems, and is also
supported by many other platforms and operating systems.
Hashing Libraries
Hashing libraries provide a set of functions or classes to implement hash
functions in programming languages. Hash functions take input data of arbitrary size
and produce a fixed-size output, known as a hash value or digest. The output of a
hash function is typically used for data integrity verification, message authentication,
and indexing in hash tables.
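In Python's standard library, hashlib provides these functions. A short example
(the sample input is arbitrary):

import hashlib

# Fixed-size digest of arbitrary input; used to verify that a restored
# chunk matches what was originally backed up.
chunk = b'backup chunk contents'
digest = hashlib.sha256(chunk).hexdigest()
print(digest)  # 64 hex characters, regardless of input size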
Other Libraries
Backup System uses various other libraries, such as cryptography and boto3.
The cryptography library is a popular Python library that provides various
cryptographic functionalities, including encryption, decryption, hashing, and digital
signatures. It is a high-level library that offers easy-to-use interfaces for performing
secure cryptographic operations. The boto3 library is the Amazon Web Services
(AWS) SDK for Python, which allows developers to easily integrate their Python
applications with AWS services.
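For example, uploading one backup chunk to Amazon S3 takes only a few lines (the
file, bucket, and key names below are hypothetical, and valid AWS credentials are
assumed to be configured):

import boto3

# Upload one encrypted chunk to an S3 bucket (names are hypothetical).
s3 = boto3.client('s3')
s3.upload_file('chunk-000001.bin.gz.enc', 'my-backup-bucket',
               'backups/device42/chunk-000001')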
CHAPTER 5
CONCLUSION AND FUTURE WORK
5.1 Conclusion
The proposed Backup System utilizes the latest technologies to provide users with a
secure and reliable backup service. By leveraging multiple cloud providers, the
system ensures that data is stored redundantly, reducing the risk of data loss due to
hardware failures or natural disasters. The use of fog computing improves the overall
performance of backup and restore operations, and the supported encryption schemes
offer users more flexibility and control over their data security. Overall, the Backup
System represents an important step forward in backup technology and provides
users with a reliable and efficient backup solution that addresses many of the
challenges of traditional cloud backup.
APPENDIX 1
Source Code
login.py
import argparse
import getpass
import hashlib

# Hypothetical credential store mapping usernames to SHA-256 password hashes.
USERS = {'admin': hashlib.sha256(b'admin').hexdigest()}

def authenticate_user(username, password):
    # Compare the hash of the supplied password with the stored hash.
    stored = USERS.get(username)
    if stored == hashlib.sha256(password.encode('utf-8')).hexdigest():
        return True
    else:
        return False

def main():
    parser = argparse.ArgumentParser(description='Log in to the Backup System')
    parser.add_argument('username', help='Username')
    args = parser.parse_args()
    username = args.username
    password = getpass.getpass('Password: ')
    if authenticate_user(username, password):
        print('Logged in successfully')
    else:
        print('Invalid credentials')

if __name__ == '__main__':
    main()
backup.py
import argparse

def main():
    # Parse the path to back up; the real work happens in the other modules.
    parser = argparse.ArgumentParser(description='Run a backup')
    parser.add_argument('path', help='File or directory to back up')
    args = parser.parse_args()
    print('Backing up', args.path)

if __name__ == '__main__':
    main()
encryption.py
import os
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def generate_key(password, salt):
    # Derive a 256-bit AES key from the password using PBKDF2 (SHA-256).
    return hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100000)

def encrypt_data(data, key):
    # Encrypt backup data using AES (CFB mode); the random IV is prepended.
    iv = os.urandom(16)
    encryptor = Cipher(algorithms.AES(key), modes.CFB(iv)).encryptor()
    encrypted_data = iv + encryptor.update(data.encode('utf-8')) + encryptor.finalize()
    return encrypted_data

def decrypt_data(encrypted_data, key):
    # Decrypt encrypted backup data using AES algorithm
    iv = encrypted_data[:16]
    decryptor = Cipher(algorithms.AES(key), modes.CFB(iv)).decryptor()
    decrypted_data = decryptor.update(encrypted_data[16:]).decode('utf-8')
    return decrypted_data

# Example usage
password = "mysecurepassword"
salt = os.urandom(16)
key = generate_key(password, salt)
print("Salt:", salt.hex())
compression.py
import gzip

# Copy the input file to the output file using gzip compression.
with open('backup.dat', 'rb') as f_in:
    with gzip.open('backup.dat.gz', 'wb') as f_out:
        f_out.writelines(f_in)
multi_cloud_backup.py
import os
import hashlib

from google.oauth2.credentials import Credentials
from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = ['https://www.googleapis.com/auth/drive']

def backup_to_clouds(data, cloud_providers):
    # Create a dictionary to hold the data hashes for each cloud provider
    cloud_hashes = {}
    for provider in cloud_providers:
        if provider == "dropbox":
            # Dropbox upload would go here (omitted in this sketch).
            pass
        elif provider == "google_drive":
            creds = None
            if os.path.exists('token.json'):
                creds = Credentials.from_authorized_user_file('token.json', SCOPES)
            if creds and creds.expired and creds.refresh_token:
                creds.refresh(Request())
            else:
                flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
                creds = flow.run_local_server(port=0)
                with open('token.json', 'w') as token:
                    token.write(creds.to_json())
            # Deduplicate: only upload if this hash is not already stored.
            hash_value = hashlib.sha256(data.encode('utf-8')).hexdigest()
            if check_cloud_provider(provider, hash_value):
                cloud_hashes[provider] = hash_value
            else:
                # Upload call elided in the original source, e.g.
                # service.files().create(..., fields='id').execute()
                cloud_hashes[provider] = hash_value
                store_cloud_provider(provider, hash_value)
    return cloud_hashes

def check_cloud_provider(provider, hash_value):
    # Placeholder: return True if this hash is already stored at the provider.
    pass

def store_cloud_provider(provider, hash_value):
    # Placeholder: record that this hash is now stored at the provider.
    pass
REFERENCES
[1] Z. Li, G. Chen and Y. Deng, "Duplicacy: A New Generation of Cloud Backup
Tool Based on Lock-Free Deduplication," in IEEE Transactions on Cloud
Computing, vol. 10, no. 4, pp. 2508-2520, 1 Oct.-Dec. 2022.
[3] K. Subramanian and F. L. John, "Dynamic data slicing in multi cloud storage
using cryptographic technique", Proc. World Congr. Comput. Commun. Technol.
(WCCCT), pp. 159-161, Feb. 2017.
[4] L. Gao, T. H. Luan, S. Yu, W. Zhou and B. Liu, "FogRoute: DTN-based data
dissemination model in fog computing", IEEE Internet Things J., vol. 4, no. 1, pp.
225-235, Feb. 2017.
[5] O. A. Nasr, Y. Amer and M. AboBakr, "The ‘droplet’: A new personal device to
enable fog computing", Proc. 3rd Int. Conf. Fog Mobile Edge Comput. (FMEC), pp.
93-99, Apr. 2018.
[6] P. Singh, N. Agarwal and B. Raman, "Secure data deduplication using secret
sharing schemes over cloud", Future Gener. Comput. Syst., vol. 88, pp. 156-167,
Nov. 2018.
[8] K. Kai, W. Cong and L. Tao, "Fog computing for vehicular ad-hoc networks:
Paradigms scenarios and issues", J. China Universities Posts Telecommun., vol. 23,
no. 2, pp. 56-96, Apr. 2016.
[9] S. Yi, C. Li and Q. Li, "A survey of fog computing: Concepts, applications and
issues", Proc. Workshop Mobile Big Data, pp. 37-42, Jun. 2015.
[10] K. Kai, W. Cong and L. Tao, "Fog computing for vehicular ad-hoc networks:
Paradigms scenarios and issues", J. China Universities Posts Telecommun., vol. 23,
no. 2, pp. 56-96, Apr. 2016.
[11] F. Tao (2019), "A Smart Manufacturing Service System Based on Edge
Computing, Fog Computing, and Cloud Computing," in IEEE Access, vol. 7, pp.
86769-86777, doi: 10.1109/ACCESS.2019.2923610.
[12] X. Yang, X. Pei, M. Wang, T. Li and C. Wang (2020) "Multi-Replica and Multi-
Cloud Data Public Audit Scheme Based on Blockchain" IEEE Access, vol. 8, pp.
144809-144822, doi: 10.1109/ACCESS.2020.3014510.
[16] M. Farid, R. Latip, M. Hussin and N. A. W. Abdul Hamid, "Scheduling
Scientific Workflow Using Multi-Objective Algorithm with …", IEEE Access, vol. 9,
pp. 24309-24321, 2020.
[17] S. Hamadah and D. Aqel, "A proposed virtual private cloud-based disaster
recovery strategy", Proc. 2019 IEEE Jordan Int. Joint Conf. Electr. Eng. Inf.
Technol. (JEEIT), pp. 469-473, 2019.
[18] M. Alshammari and A. Alwan, "A Conceptual Framework for Disaster
Recovery and Business Continuity of Database Services in Multi-Cloud", 2017.
[19] R. R. Megha and T. Fatima, "A Cloud Based Automatic Recovery and Backup
System with Video Compression", International Journal of Engineering and
Computer Science, vol. 5, pp. 17819-17822, 2016.
[20] T. N. Abdali, R. Hassan, A. H. M. Aman and Q. N. Nguyen, "Fog Computing
Advancement: Concept, Architecture, Applications, Advantages, and Open Issues",
IEEE Access, 2021.
[21] Y. Wei and F. Chen, "ExpanCodes: Tailored LDPC codes for big data storage",
Proc. IEEE 14th Intl Conf. Dependable Autonomic Secure Comput. 14th Intl Conf.
Pervas. Intell. Comput. 2nd Intl Conf. Big Data Intell. Comput. Cyber Sci. Technol.
Congr. (DASC/PiCom/DataCom/CyberSciTech), pp. 620-625, Aug. 2016.
[22] R. Pottier and J.-M. Menaud, "Trustydrive a multi-cloud storage service that
protects your privacy", Proc. IEEE 9th Int. Conf. Cloud Comput. (CLOUD), pp. 937-
940, Jun. 2016.
[23] Y. Wei, F. Chen and D. C. J. Sheng, "ExpanStor: Multiple cloud storage with
dynamic data distribution", Proc. IEEE 7th Int. Symp. Cloud Service Comput. (SC),
pp. 85-90, Nov. 2017.
[24] D. Leibenger and C. Sorge, "SEC-CS: Getting the most out of untrusted cloud
storage", Proc. IEEE 42nd Conf. Local Comput. Netw. (LCN), pp. 623-631, Oct.
2017.