
A SECURE BACKUP SYSTEM USING FOG COMPUTING

AND MULTI CLOUD COMPUTING

A PROJECT REPORT

Submitted By

PAZHANIRAJ V (422519205030)
SRIHARI S (422519205040)
VETRIVEL R (422519205044)

in partial fulfillment for the award of the degree of

BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY

UNIVERSITY COLLEGE OF ENGINEERING, VILLUPURAM

ANNA UNIVERSITY: CHENNAI 600 025

MAY 2023
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE

Certified that this project report “A SECURE BACKUP SYSTEM USING


MULTI CLOUD AND FOG COMPUTING” is the bonafide work of
“PAZHANIRAJ V (422519205030), SRIHARI S (422519205040), VETRIVEL
R (422519205044)” who carried out the project work under my supervision.

SIGNATURE SIGNATURE

Dr.P SEENUVASAN M.E, Ph.D., Dr.P SEENUVASAN M.E, Ph.D.,

HEAD OF THE DEPARTMENT, SUPERVISOR

Assistant Professor (Sr. Gr), Assistant Professor (Sr. Gr),


Department of Information Technology, Department of Information Technology,

University College of Engineering Villupuram, University College of Engineering Villupuram,

Kakuppam, Villupuram - 605103. Kakuppam, Villupuram - 605103.

Submitted For IT8811 Project Viva Voce Examination Held On ……………

INTERNAL EXAMINER EXTERNAL EXAMINER

ACKNOWLEDGEMENT

We wish to express our sincere thanks and gratitude to our Dean In-Charge
Dr. R SENTHIL M.E, Ph.D., for offering us all the facilities to do the project.
We also express our sincere thanks to Dr. P SEENUVASAN M.E, Ph.D.,
Head of the Department, Department of Information Technology for his support,
guidance and for successful completion and implementing our valuable idea.
We would like to thank our Coordinator Dr. E KAVITHA M.E, Ph.D., for
their guidance and encouragement in bringing out this project successfully.
We express our sincere thanks to our Guide Dr. P SEENUVASAN M.E,
Ph.D., our internal project guide, Department of Information Technology, University
College of Engineering Villupuram, for his valuable suggestions and constant
encouragement.
We would like to thank all the Faculty Members, Technical Staff Members
and Support Staff Members in our department for their guidance to finish this
project successfully. We also like to thank all our family and friends for their willing
assistance.
This project consumed a huge amount of work, research and dedication. Still,
implementation would not have been possible if we did not have the support of many
individuals and organizations. We would like to extend our sincere gratitude to all of
them.

PAZHANIRAJ V
SRIHARI S
VETRIVEL R

ABSTRACT

In the modern world, digital storage is becoming increasingly popular, but it poses many threats, including operational errors, security attacks, and hardware failures. Data backup is essential for protecting against these threats, and cloud backup systems are commonly used for disaster recovery. However, traditional cloud backup systems may not offer sufficient data privacy and reliability. To address these challenges, this work proposes a cloud-based backup system that uses multiple cloud providers to enhance redundancy and reliability. The system also uses fog computing to improve backup and restore performance by processing and storing data locally. The proposed backup system uses AES encryption and supports both symmetric and public-key encryption algorithms to ensure data security. This system can be used in various applications where data privacy and reliability are crucial, such as enterprise data storage, disaster recovery, and personal data backup. Future work includes further performance optimization and the integration of additional cloud providers and backup strategies.

TABLE OF CONTENTS

CHAPTER TITLE PAGE NO

ABSTRACT iii

TABLE OF CONTENTS iv

LIST OF FIGURES vii

LIST OF ABBREVIATION viii

1 INTRODUCTION 1

1.1 CLOUD COMPUTING 1

1.2 TYPES 1

1.2.1 Types of Cloud Service 2

1.2.2 Deployment Model 3

1.3 MULTI CLOUD 4

1.4 FOG COMPUTING 4

2 LITERATURE SURVEY 5

2.1 EXISTING SYSTEM 15

3 PROPOSED SYSTEM 16

3.1 PROPOSED WORK 16

3.2 MULTI CLOUD SUPPORT 16

3.3 FOG COMPUTING FUNCTION 17


3.4 GNU PRIVACY GUARD (GPG) 18

3.4.1 PGP key Standard 19

3.4.2 Standards Support for PGP Encryption 19

3.4.3 Asymmetric Encryption Algorithms 20

3.4.4 Symmetric Encryption Algorithm 21

3.5 COMPRESSION ALGORITHMS 23

3.6 PROPOSED ARCHITECTURE 25

3.7 MODULES DESCRIPTION 25

3.7.1 Edge Devices 26

3.7.2 Cloud Provider 26

3.7.3 Backup System 27

3.7.4 Data Encryption 29

3.7.5 Data Compression 31

3.7.6 Data Upload 33

3.7.7 Data Retrieval 33


4 IMPLEMENTATION 35

4.1 SYSTEM REQUIREMENTS 35

4.1.1 Hardware Requirements 35

4.1.2 Software Requirements 36

5 CONCLUSION AND FUTURE WORK 39

5.1 CONCLUSION 39

5.2 FUTURE WORK 39

APPENDIX 40

Source Code 40

REFERENCES 48

LIST OF FIGURES

FIGURE NO FIGURE NAME PAGE NO

1.2 Types of Cloud 1

3.4 (a) PGP Encryption 19

3.4 (b) PGP Decryption 19

3.5 Compression Algorithm 23

3.6 Proposed Architecture 26

3.7.1 Edge Devices 27

3.7.2 Cloud Provider 28

3.7.3 Backup System 28

3.7.4 Data Encryption 30

3.7.5 Data Compression 32

3.7.7 Data Retrieval 33

LIST OF ABBREVIATION

SAAS - Software as a service

PAAS - Platform as a service

IAAS - Infrastructure as a service

CSP - Cloud service provider

GnuPG - GNU privacy guard

PGP - Pretty Good privacy

AES - Advanced Encryption Standard

RSA - Rivest Shamir Adleman

SHA - Secure Hash Algorithm

CAST - Carlisle Adams and Stafford Tavares

SERPENT - Secure Encryption Reliable Protection Against Exposure

IDEA - International Data Encryption Algorithm

RIPEMD - RACE Integrity Primitives Evaluation Message Digest

ZIP - ZIP archive file format (not an acronym)

GZIP - GNU zip

PBKDF2 - Password based key derivation function 2

ECC - Elliptic Curve Cryptography

LZ77 - Lempel-Ziv-1977

SDK - Software Development Kit

CHAPTER 1

INTRODUCTION

1.1 Cloud Computing


Cloud computing refers to the delivery of computing services, including
servers, storage, databases, networking, software and analytics over the internet. It
enables users to access and use these resources on-demand, without having to invest
in and maintain their own IT infrastructure. The services are typically provided by
third-party vendors who own and manage the infrastructure and charge customers
based on their usage. Cloud computing offers several advantages over traditional on-
premises IT infrastructure, including scalability, cost-effectiveness, flexibility and
ease of use. Examples of Cloud Computing Services: AWS, Azure, Google Cloud.

1.2 Types

Figure 1.2 Types of Cloud

1.2.1 Types of Cloud Services
There are three types of cloud services:
i) Software-as-a-service (SaaS)
ii) Infrastructure-as-a-service (IaaS)
iii) Platform-as-a-service (PaaS)

i) Software-as-a-service (SaaS)
It involves the licensing of a software application to customers. Licenses
are typically provided through a pay-as-you-go model or on-demand. This type
of system can be found in Microsoft Office 365.

ii) Infrastructure-as-a-service (IaaS)


It involves a method for delivering everything from operating systems
to servers and storage through IP-based connectivity as part of an on-demand
service. Clients can avoid the need to purchase software or servers and instead
procure these resources in an outsourced, on-demand service. Popular
examples of the IaaS system include IBM Cloud and Microsoft Azure.

iii) Platform-as-a-service (PaaS)


It is often used by organizations that need to develop and deploy custom
applications, but do not have the resources or expertise to manage the
underlying infrastructure. It can also be used to build and deploy web and
mobile applications quickly and efficiently, as developers can use pre-built
components and services provided by the PaaS provider to speed up the
development process. Examples of PaaS providers include Heroku, Google
App Engine, and Microsoft Azure.

1.2.2 Deployment Model Types
There are four types of deployment models:
i) Public Cloud
ii) Private Cloud
iii) Hybrid Cloud
iv) Community Cloud
i) Public Cloud

Public cloud providers offer a shared platform that is accessible to the general public through an internet connection. Examples: Google Cloud, IBM Blue Cloud and Sun Cloud.

ii) Private Cloud

Private cloud provides computing services to a private internal network


(within the organization) and selected users instead of the general public.
Example: HP Data Centers, Microsoft and Elastra-private cloud.

iii) Hybrid cloud

Hybrid Cloud is the combination of private and public cloud. Examples of hybrid cloud include Google Cloud and VMware Cloud on AWS.

iv) Community Cloud

Community Cloud is a shared cloud infrastructure owned and managed


by one or more organizations, providing access to systems and services for a
group of organizations with common concerns, such as security and
compliance. An example of a community cloud is the GovCloud managed by
Amazon Web Services (AWS).

1.3 Multi Cloud
Multi-Cloud is a heterogeneous architecture utilizing various cloud computing
and storage facilities, which can come from a public cloud, a private cloud, or as
standalone cloud-like on-premise facilities. If the Multi-Cloud architecture is used,
users are aware of the multiple clouds responsible for managing the resources and the
services, or a third party is responsible for managing them. There are various reasons
to adopt a Multi Cloud architecture, including reducing dependency on any single
provider, cost efficiency, flexibility in choice, and disaster immunity. Additionally,
the use of multiple cloud providers can help to optimize backup and restore times by
spreading the workload across multiple providers. This approach helps to reduce the
risk of bottlenecks that may occur when backing up or restoring large amounts of
data to a single cloud provider.
1.4 Fog Computing

The Fog Computing concept was initially developed to minimize the data
access latency from and to the cloud. Fog Computing provides data processing and
networking facilities at the network edge. This concept is to install dedicated servers
located geographically at the Edge of the network in micro/nano data centers close to
the end-users. Whereas Cloud Computing provides resources that are centralized in
the network core, Fog Computing provides services and resources that are distributed
near/at the network edge. Fog nodes can be any of the typical network elements such as routers or middle-end servers geographically positioned near the end-users. These nodes are capable of executing applications and storing data to provide the required services and
enhance the user experience. They are connected to the cloud core through high-
speed links and can be considered the cloud arms while the brain is in the center of
the network.

CHAPTER 2
LITERATURE SURVEY

Fog Computing Advancement: Concept, Architecture, Applications,


Advantages and Open Issues
In this work, Taj et al. provide a comprehensive survey of Fog Computing
technology, its core concept, and its impact on real-life applications [20]. This article
defines Fog Computing and highlights the differences between cloud computing and
Fog Computing. This work also discusses the architecture of Fog Computing and its
flexible environment that can be adjusted based on users' customized requests. This
article concludes that Fog Computing is a significant technology that serves as a
broker for IoT, facilitating practical deployments. It also identifies open issues and
challenges related to the adoption of Fog Computing.

Secure Data Deduplication Using Secret Sharing Schemes Over Cloud


In this work, P. Singh et al. provide a secure method for data deduplication
over cloud storage systems using secret sharing schemes [6]. This work highlights
the importance of data deduplication in cloud storage systems and the potential
security risks associated with the use of public cloud infrastructures. This paper
ensures the confidentiality of user data by dividing it into multiple shares, which are
stored in different cloud servers using secret sharing schemes. This approach is
intended to protect user data against unauthorized access and ensure that the data
remains private even if some of the shares are compromised. This article also presents
a detailed analysis of the proposed method, including the security analysis and
performance evaluation. The security analysis examines the potential vulnerabilities
of the proposed method and assesses its ability to protect against attacks. This article
concludes that the proposed method provides a secure and efficient solution for data
deduplication over cloud storage systems.

Dynamic Data Slicing in Multi Cloud Storage Using Cryptographic Technique

In this work, K. Subramanian et al. provide a method for dynamic data slicing


in multi-cloud storage using cryptographic techniques to enhance the security of data
storage and retrieval [3]. This article highlights the importance of data security in
multi-cloud storage systems and proposes a method that dynamically slices and
distributes the data among multiple clouds using cryptographic techniques, including
AES encryption, hash functions, and Shamir's secret sharing scheme. The proposed
method ensures the confidentiality and integrity of user data by preventing
unauthorized access and tampering of data. This article also provides a detailed
performance evaluation of the proposed method and concludes that it provides an
effective and efficient solution for secure data storage and retrieval in multi-cloud
environments. However, one potential drawback of this method is that it may require
additional computational resources, such as encryption and decryption, which could
increase processing time and possibly lead to slower performance in certain use cases.

FogRoute: Delay Tolerant Networks (DTN)-based data dissemination model in


fog computing
In this work, Gao et al. present a novel approach for data dissemination in fog
computing using Delay Tolerant Networks (DTN) [4]. This paper highlights the
limitations of traditional data dissemination approaches in fog computing due to
factors such as network heterogeneity, intermittent connectivity, and mobility, and
introduces a new approach that uses DTN to overcome these challenges. The
proposed approach, called FogRoute, leverages the computing and storage resources
available in fog nodes to improve the efficiency and effectiveness of data
dissemination. This paper provides a detailed description of the FogRoute
architecture, its components and their functions, and evaluates its performance using
simulations. The results show that FogRoute outperforms existing data dissemination

approaches in terms of delivery ratio, delay, and overhead. However, one potential
limitation of FogRoute is that it requires a certain level of infrastructure and
resources, which may not be available or feasible in certain fog computing
environments.

A Smart Manufacturing Service System Based on Edge Computing, Fog


Computing, and Cloud Computing
In this paper, Tao et al. propose a new approach to support smart manufacturing
services by integrating edge computing, fog computing, and cloud computing [11].
The system architecture consists of three layers, each providing specific capabilities
to improve the efficiency and effectiveness of manufacturing processes. This paper
describes a prototype implementation and shows that the proposed system
outperforms traditional manufacturing service systems in terms of reliability,
scalability, and flexibility. However, the complexity of the system may pose
challenges in deployment and management.

Disaster-and-Evacuation-Aware Backup Datacenter Placement Based on Multi-


Objective Optimization
In this work, H. Lei et al. provide a new approach to address the problem of
backup datacenter placement in disaster scenarios [15]. This paper presents a multi-
objective optimization model that considers factors such as geographic location,
connectivity, and evacuation distance. The proposed model aims to minimize the
likelihood of data loss while ensuring timely data recovery in the event of a disaster.
The paper also presents a case study and shows that the proposed approach
outperforms traditional backup datacenter placement strategies. However, the
practical implementation of the proposed approach may be challenging due to the
need for accurate data on disaster scenarios and evacuation routes.

Scheduling Scientific Workflow Using Multi-Objective Algorithm
In this work, Mazen et al. propose a new approach to address the problem of
scheduling scientific workflows in distributed computing environments [16]. The
paper presents a multi-objective algorithm that considers factors such as task
deadline, task dependencies, and resource availability. The proposed algorithm aims
to optimize multiple objectives such as makespan, resource utilization, and energy
consumption. The paper also presents a case study and shows that the proposed
approach outperforms traditional scheduling algorithms in terms of efficiency and
effectiveness. However, the practical implementation of the proposed approach may
be challenging due to the need for accurate modeling of scientific workflows and
resource availability.

The ‘droplet’: A new personal device to enable fog computing


In this work, Nasr et al. introduce a new personal device designed to facilitate
the deployment of fog computing [5]. The paper discusses the limitations of
traditional cloud computing and the need for a more distributed computing model
that can handle the increasing volume of data generated by IoT devices. The paper
proposes the device as a solution that can extend the capabilities of IoT devices and
enable fog computing. The paper describes the architecture of the device, which
includes a microcontroller, Wi-Fi module, and Bluetooth Low Energy (BLE) module.
The paper also discusses the potential use cases of the device, such as healthcare
monitoring, smart homes, and industrial automation. However, the paper does not
provide any experimental results to demonstrate the effectiveness of the proposed
device. Furthermore, while the authors discuss the potential use cases of the device,
such as healthcare monitoring, smart homes, and industrial automation, they do not
provide any concrete examples of how the device could be used in these contexts. As

a result, it is unclear how the device would be integrated into existing systems and
what the practical implications of its use would be.

Distributed multi cloud storage system to improve data security with hybrid
encryption.
In this work, Zaman et al. propose a distributed multi-cloud storage system
that enhances data security through hybrid encryption [7]. The paper discusses the
limitations of traditional cloud storage systems and the need for a more secure and
scalable approach. The proposed system is designed to address these issues by
distributing data across multiple cloud storage providers and encrypting it using a
combination of symmetric and asymmetric encryption techniques. The paper
describes the architecture of the system, which includes a client-side component that
encrypts data before it is uploaded to the cloud, as well as a server-side component
that handles the distribution and retrieval of data. The paper also presents
experimental results that demonstrate the effectiveness of the proposed system in
terms of data security and performance.

Duplicacy: A New Generation of Cloud Backup Tool Based on Lock-Free


Deduplication.
In this work, Li et al. introduce a new cloud backup tool that utilizes
lock-free deduplication to improve backup efficiency and reduce storage costs [1].
The document discusses the limitations of traditional backup tools and the need for a
more efficient approach that can handle large amounts of data. The tool is designed
to address these issues by using a novel deduplication algorithm that operates without
locks, allowing for high concurrency and scalability. The document presents
experimental results that demonstrate the effectiveness of the tool in terms of backup
speed and storage savings. The document also discusses the design and

implementation of the tool, including its use of cryptographic hashing to ensure data
integrity and its support for multiple cloud storage providers.

Dynamic data slicing in multi cloud storage using cryptographic technique.


In this work, F. L. John et al. propose a new approach for storing data across
multiple cloud providers, called dynamic data slicing [3]. This approach involves
splitting the data into smaller, encrypted fragments and distributing them across
multiple cloud providers. The use of cryptographic techniques helps to improve
security by ensuring that the data remains protected even if one or more cloud
providers are compromised. This approach can potentially improve the reliability and
availability of data stored in the cloud, while also enhancing security.

Fog computing for vehicular ad-hoc networks: Paradigms scenarios and issues.
In this work, Kai et al. discuss the potential of fog computing for Vehicular
Ad-Hoc Networks (VANETs) and the various paradigms, scenarios, and issues
related to its implementation [8]. The paper presents an overview of VANETs and
the challenges they face, including limited bandwidth and high mobility. It then
explores the potential of fog computing to address these challenges by providing a
decentralized, low-latency computing infrastructure that can operate closer to the
network edge. For example, fog nodes can be placed on roadside infrastructure,
such as traffic lights or lamp posts, to improve communication among vehicles. Fog
nodes can also be deployed on mobile platforms such as buses or emergency vehicles
to provide better coverage in areas with poor connectivity. The paper also discusses
various scenarios in which fog computing can be applied in VANETs, such as real-
time traffic management and collision avoidance. Finally, the paper highlights some
of the issues and limitations of fog computing in VANETs, including security and
privacy concerns, and suggests future research directions to address these challenges.

A Cloud Based Automatic Recovery and Backup System with Video
compression.
In this work, Raigonda Rani Megha et al. propose a cloud-based system for
automatic data recovery and backup, which includes video compression techniques
to reduce storage requirements [19]. The system utilizes cloud computing resources
to enable rapid data backup and recovery, with a focus on minimizing downtime and
data loss. The paper discusses the design and implementation of the system, including
the use of video compression techniques to reduce storage requirements. The paper
also presents experimental results that demonstrate the effectiveness of the system in
terms of backup and recovery speed, as well as storage savings.

Disaster-and-Evacuation-Aware Backup Datacenter Placement Based on


Multi-Objective Optimization.
In this work, Wang et al. present a new approach for placing backup datacenters that takes into account the possibility of disasters and evacuations [15]. The proposed method utilizes a multi-objective optimization algorithm to determine the optimal placement of backup datacenters in order to minimize both latency and cost, while also ensuring that backup datacenters are located in safe areas that are less prone to natural disasters and other hazards. The paper presents experimental results that demonstrate the effectiveness of the proposed approach in terms of reducing backup latency and cost, while also improving the resilience and availability of backup datacenters. However, the limitations of the proposed approach include the need for
accurate disaster prediction and evacuation information, as well as the potential cost
and complexity of implementing backup datacenters in safe areas.

A proposed virtual private cloud-based disaster recovery strategy.
In this work, S. Hamadah et al. introduce a virtual private cloud-
based disaster recovery strategy that aims to improve the resilience of cloud-based
services [17]. The paper discusses the challenges associated with traditional disaster
recovery strategies and highlights the benefits of using a virtual private cloud
approach. The proposed strategy involves the creation of a virtual private cloud
environment that can be used for disaster recovery purposes. The paper also presents
a case study that demonstrates the effectiveness of the proposed strategy in terms of
recovery time and cost. The study shows that the proposed strategy can provide a
cost-effective and efficient approach to disaster recovery for cloud-based services.

A Management System for Servicing Multi-Organizations on Community Cloud


Model in Secure Cloud Environment.
In this work, K. Dubey et al. propose a management system for servicing multiple organizations in a community cloud model [14]. The system is designed to
operate in a secure cloud environment and provides features such as resource
allocation, service monitoring, and billing. The paper also describes the simulation
experiments conducted to evaluate the proposed system. The simulations
demonstrate the effectiveness of the proposed system in terms of resource utilization
and service availability. The experiments show that the system can effectively
allocate resources to different tenants based on their requirements, monitor their
services, and ensure their availability. The proposed system is intended to address the
challenges faced by community cloud providers in managing multiple tenants and
their resources. The authors present an architecture for the system and describe its
components and functions.

A Conceptual Framework for Disaster Recovery and Business Continuity of
Database Services in Multi- Cloud.
In this work, Mohammad Alshammari et al. present a conceptual framework
for disaster recovery and business continuity of database services in a Multi Cloud
environment [18]. The framework aims to address the challenges of managing
database services across multiple clouds and ensuring their availability and resilience
in the event of disasters or disruptions. The framework includes a set of guidelines
and best practices for disaster recovery and business continuity planning, as well as
a set of tools and technologies for data replication, backup, and recovery. The work
also discusses the key issues and challenges associated with implementing a Multi
Cloud disaster recovery strategy, including data consistency, network latency, and
security. Overall, the proposed framework provides a comprehensive approach for
ensuring the availability and resilience of database services in a Multi Cloud
environment.

Multi-Replica and Multi-Cloud Data Public Audit Scheme Based on Blockchain.


In this work, Yang et al. provide a data public audit scheme based on
blockchain technology that supports multi-replica and multi-cloud environments
[12]. The scheme aims to provide secure and efficient auditing of data stored in
multiple cloud replicas while ensuring data integrity, authenticity, and
confidentiality. By leveraging the immutability and transparency of blockchain
technology, the scheme enables third-party auditors to verify the correctness of data
without accessing the actual data or relying on any single cloud provider. The
proposed scheme also supports multi-cloud environments by allowing data to be
replicated across different cloud providers, which enhances data availability and
resilience against cloud failures. Overall, the scheme provides a practical and robust
solution for public auditing of data in multi-replica and multi-cloud environments.

Fog computing for vehicular ad-hoc networks: Paradigms scenarios and issues.
In this work, Cong et al. explore the use of fog computing in vehicular ad-hoc
networks, discussing different paradigms, scenarios, and issues associated with its
implementation [10]. Fog computing is a distributed computing model that extends
cloud computing capabilities to the edge of the network, enabling faster response
times and reduced network latency. In vehicular ad-hoc networks, fog computing can
be used to support real-time applications and services, such as vehicle-to-vehicle
communication, traffic management, and accident prevention. However, there are
several challenges that need to be addressed when deploying fog computing in
vehicular ad-hoc networks, such as network security, scalability, and resource
management. This work provides insights and recommendations to help address these
challenges and improve the effectiveness of fog computing in vehicular ad-hoc
networks.

Scheduling Scientific Workflow Using Multi-Objective Algorithm.

In this work, the authors present a method for scheduling scientific workflows using a


multi-objective algorithm [14]. Scientific workflows often involve the execution of
multiple tasks in a particular order, and efficient scheduling of these tasks can
significantly improve overall workflow performance. The proposed algorithm aims
to optimize multiple objectives simultaneously, such as minimizing execution time,
reducing resource usage, and balancing task distribution. The paper discusses the
challenges of scientific workflow scheduling and how the proposed method can
address these challenges. Experimental results show that the multi-objective
algorithm can effectively schedule scientific workflows, leading to improved
performance compared to traditional scheduling methods.

2.1 Existing System

Duplicacy is a cloud backup tool that offers efficient and secure backup
solutions for individuals and organizations. Its lock-free deduplication algorithm
ensures that only unique data chunks are backed up to the cloud, reducing the amount
of data that needs to be transferred and stored. This results in optimized backup
processes and reduced storage costs, as the same data chunks are not stored multiple
times. Deduplication technique also helps cloud storage service providers to
efficiently use their disk space. Additionally, if deduplication is done at the source, it
saves upload bandwidth by not transmitting duplicate data copies. Convergent
encryption, also known as content hash keying, is a cryptographic technique that
generates the same ciphertext for identical plaintext files. This is achieved by using a
hash function to generate a unique identifier for the plaintext data, which is then used
as the encryption key. In cloud computing, convergent encryption is used to remove
duplicate files from storage without the provider having access to the encryption
keys. By comparing the unique identifiers of the plaintext data, the cloud provider
can identify and remove duplicate files without decrypting them or compromising the
security of the data. This approach has several advantages, including reduced storage
costs and improved data security, as it ensures that the cloud provider cannot access
the plaintext data. Convergent encryption is therefore a useful technique in cloud computing for efficient and secure data storage: because identical plaintext files produce identical ciphertext, duplicate files can be identified and removed. While existing backup systems have several
disadvantages such as complex management, cost, lack of features, problems with
compatibility, and different APIs, Duplicacy offers a simple and effective solution
to address these issues. Its efficient and secure backup solutions are designed to meet
the needs of individuals and organizations.
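To make the idea of convergent encryption concrete, the Python sketch below derives the encryption key from a hash of the content itself, so identical plaintexts always yield identical ciphertexts and duplicates can be detected without revealing the data. The use of the cryptography package's AES-GCM primitive and the deterministic nonce are illustrative choices, not the exact construction used by Duplicacy.

# A minimal sketch of convergent encryption (content-hash keying); library and
# parameter choices are assumptions for illustration only.
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def convergent_encrypt(plaintext: bytes):
    # The key is derived from the content, so equal files produce equal ciphertexts.
    key = hashlib.sha256(plaintext).digest()           # 32-byte AES-256 key
    nonce = hashlib.sha256(key).digest()[:12]          # deterministic per-content nonce
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return key, ciphertext

def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce = hashlib.sha256(key).digest()[:12]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

data = b"identical files yield identical ciphertext"
k1, c1 = convergent_encrypt(data)
k2, c2 = convergent_encrypt(data)
assert c1 == c2                                        # duplicates are detectable
assert convergent_decrypt(k1, c1) == data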

CHAPTER 3

PROPOSED SYSTEM

3.1 Proposed Work

The proposed work is a secure backup system that aims to address the limitations of traditional cloud-based backup solutions by using a multi-cloud and fog computing architecture. Its main features are the following.

Multi-cloud support: the Backup System uses multiple cloud providers to store backup data, which helps to prevent a single point of failure and provides greater redundancy.

Fog computing: the Backup System also utilizes fog computing, which involves using local edge devices to process and store data. This approach reduces the amount of data that needs to be transferred to the cloud and can improve backup and restore performance.

Encryption and security: the Backup System employs strong encryption and security measures to protect backup data from unauthorized access or tampering.

Automatic backup scheduling: the Backup System can be set up to automatically back up data on a regular schedule, reducing the risk of data loss due to user error or forgetfulness.
3.2 Multi Cloud Support

By utilizing multiple cloud providers, Backup system provides users with a


backup solution that is more resilient to a range of potential risks. For example, if one
cloud provider experiences hardware failures or natural disasters that result in data
loss or downtime, backup data can still be retrieved from other cloud providers. This
helps to reduce the risk of permanent data loss and ensures that backup data remains
available in the event of an unexpected event. Additionally, the use of multiple cloud
providers can help to optimize backup and restore times by spreading the workload
across multiple providers. This approach helps to reduce the risk of bottlenecks that
may occur when backing up or restoring large amounts of data to a single cloud provider. Multi-Cloud is a heterogeneous architecture utilizing various cloud
computing and storage facilities, which can come from a public cloud, a private
cloud, or as standalone cloud-like on-premise facilities. If the Multi-Cloud
architecture is used, users are aware of the multiple clouds responsible for managing
the resources and the services, or a third party is responsible for managing them.
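As a rough illustration of this idea, the Python sketch below hides each provider behind a common put/get interface and replicates every chunk to all configured providers; the class and method names are hypothetical stand-ins rather than any vendor's SDK.

# A minimal sketch of a provider-agnostic storage layer with simple replication.
from abc import ABC, abstractmethod

class CloudStore(ABC):
    # Each real provider (AWS, Azure, GCP, ...) would be wrapped behind this interface.
    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...

class InMemoryStore(CloudStore):
    """Stand-in for a real provider backend, used only for demonstration."""
    def __init__(self):
        self._objects = {}

    def put(self, name, data):
        self._objects[name] = data

    def get(self, name):
        return self._objects[name]

def replicate_chunk(providers, name, chunk):
    # Store the same chunk on every configured provider, so the backup survives
    # the failure or outage of any single cloud.
    for provider in providers:
        provider.put(name, chunk)

def retrieve_chunk(providers, name):
    # Try each provider in turn; the first one that still holds the chunk answers.
    for provider in providers:
        try:
            return provider.get(name)
        except KeyError:
            continue
    raise KeyError(name)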

3.3 Fog Computing Function


Fog computing is an approach to data processing and storage that involves the
use of edge devices, such as routers, gateways, and switches, to perform tasks locally
instead of transmitting data to the cloud. This approach reduces latency and
bandwidth usage, resulting in faster data processing and reduced network congestion.

In the context of Backup System, fog computing can improve the performance
and efficiency of backup and restore operations. By processing and storing data
locally on edge devices, Backup System can reduce the time and bandwidth required
to back up or restore data to the cloud. This is especially useful for large data sets or
for users with limited bandwidth. Fog computing can also help to reduce storage costs
for backup data by only transmitting the most critical or relevant data to the cloud.
The data that is processed and stored locally on edge devices can be compressed or
deduplicated, resulting in less data being transferred to the cloud and lower storage
costs. Fog nodes can be any of the typical network elements such as routers or middle-
end servers geographically positioned near the end-users. These nodes are capable of
executing applications and storing data to provide the required services and enhance the user experience. Overall, fog computing is a useful approach for improving the
performance and efficiency of backup and restore operations, and it can help
organizations reduce their storage costs and network bandwidth usage.

3.4 GNU Privacy Guard (GPG)

GNU Privacy Guard (GnuPG or GPG) is a free and open-source software tool
used for secure communication and data encryption. It uses the PGP standard to
provide end-to-end encryption and digital signature capabilities for email, files, and
other types of data. When using GnuPG, a user first generates a public-private key
pair. The public key is then shared with others, while the private key is kept secret.
When someone wants to send a secure message or file to the user, they encrypt it with
the user's public key, and the user then uses their private key to decrypt the message
or file.
In addition to encryption, GnuPG can also be used for digital signatures. A
user can sign a message or file with their private key, which can be verified by anyone
who has the user's public key. This provides a way to verify the authenticity and
integrity of the message or file. This signature can be verified by anyone who has the
user's public key. The digital signature serves as proof that the message or file has
not been tampered with since it was signed by the sender. It also ensures that the
message or file was sent by the sender and not an imposter.
GnuPG is widely used by individuals, organizations, and governments around
the world for secure communication and data encryption. Its open-source nature
allows for transparency and continuous improvement, and its compatibility with
various operating systems and email clients makes it accessible to a wide range of
users.
GnuPG supports a wide range of encryption algorithms, including Advanced
Encryption Standard (AES), Rivest-Shamir-Adleman (RSA) and Secure Hash
Algorithm (SHA)-256 among others. It is widely used by individuals, organizations
and governments around the world for secure communication and data encryption.
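A possible way for a backup system to drive GnuPG is through its command-line interface, as in the Python sketch below; the recipient ID and file paths are placeholders, and error handling is omitted.

# A minimal sketch of calling the gpg command-line tool from Python.
import subprocess

def gpg_encrypt(path: str, recipient: str) -> str:
    # Encrypts "path" to the given recipient's public key and writes "<path>.gpg".
    out = path + ".gpg"
    subprocess.run(
        ["gpg", "--batch", "--yes", "--recipient", recipient,
         "--output", out, "--encrypt", path],
        check=True,
    )
    return out

def gpg_decrypt(path: str, out: str) -> None:
    # Decrypts with the private key held in the local GnuPG keyring.
    subprocess.run(
        ["gpg", "--batch", "--yes", "--output", out, "--decrypt", path],
        check=True,
    )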

3.4.1 Pretty Good Privacy (PGP) Key Standard

A comprehensive Key Manager is provided with PGP public and private keys.
This Key Manager can be used to create keys, change keys, view keys and import
keys. These keys can be utilized within Cloud – Fog Data for automating PGP
encryption and decryption within your organization. This Key Manager can also be
used to export public keys for sharing with your trading partners.
Encryption is performed through the use of a key pair. The public key may be
published or sent to the recipient. The private key is known only to the recipient. The public key is used to encrypt the message or the file and the
private key to decrypt it.

3.4.2 Standards Support for PGP Encryption


The PGP standard is a non-proprietary and industry-accepted protocol which

defines the standard format for encrypted messages, signatures and keys.

Figure 3.4 (a) PGP Encryption

Figure 3.4 (b) PGP Decryption

3.4.3 Asymmetric Encryption Algorithms

Asymmetric encryption algorithms, also known as public-key encryption, are


cryptographic methods that use a pair of keys to encrypt and decrypt data. In
asymmetric encryption, the keys used for encryption and decryption are different,
unlike symmetric encryption algorithms where the same key is used for both
encryption and decryption. The two keys used in asymmetric encryption are the
public key and the private key. The public key is shared with anyone who needs to
send encrypted data to the owner of the private key. The private key, on the other
hand, is kept secret and is only known by the owner of the key.
The most widely used asymmetric encryption algorithms include:

RSA (Rivest-Shamir-Adleman): RSA is a widely used asymmetric

encryption algorithm that is based on the factorization of large prime numbers. The

security of RSA relies on the difficulty of factoring large composite numbers into

their prime factors.

Diffie-Hellman: The Diffie-Hellman algorithm is used for key exchange and
enables two parties to agree on a shared secret key over an insecure communication
channel without exchanging the key directly. The algorithm works by each party
generating a public-private key pair. They then exchange their public keys over the
insecure channel. Using their own private key and the other party's public key, they
compute a shared secret key. This key is the same for both parties and can be used to
encrypt and decrypt messages exchanged over the insecure channel.
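The toy Python example below walks through one such exchange with an artificially small prime so the arithmetic is easy to follow; real deployments use primes of thousands of bits or elliptic-curve groups.

# Toy Diffie-Hellman exchange for illustration only (numbers far too small to be secure).
p, g = 23, 5                        # public parameters: prime modulus and generator

a = 6                               # Alice's private value
b = 15                              # Bob's private value

A = pow(g, a, p)                    # Alice sends A = g^a mod p  -> 8
B = pow(g, b, p)                    # Bob sends   B = g^b mod p  -> 19

shared_alice = pow(B, a, p)         # (g^b)^a mod p
shared_bob = pow(A, b, p)           # (g^a)^b mod p
assert shared_alice == shared_bob == 2   # both sides reach the same secret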
Elliptic Curve Cryptography (ECC): ECC is a type of asymmetric
encryption algorithm that uses elliptic curves instead of prime numbers. ECC is used
in various applications, including digital signatures, encryption, and key exchange. It
is widely used in mobile devices and other resource-constrained environments where
efficient use of computing resources is essential. ECC is also used in internet
protocols such as Transport Layer Security (TLS) and Secure Shell (SSH) to provide
secure communications.
ElGamal: ElGamal is a public-key encryption algorithm that is based on the
discrete logarithm problem. It is used for both encryption and digital signatures. In
the ElGamal encryption process, the recipient generates a public key and a private key. The public key is shared with the sender, who uses it to encrypt the message before sending it to the recipient. The recipient then uses their private key to decrypt the message.
3.4.4 Symmetric Encryption Algorithm

A symmetric encryption algorithm, also known as a secret key encryption


algorithm, is a type of encryption that uses a single secret key to encrypt and decrypt
data. In symmetric encryption, the same key is used for both encryption and
decryption.
The process of symmetric encryption involves taking plaintext (the original
message) and applying an encryption algorithm to it, along with a secret key. The

result is ciphertext, which is the encrypted version of the plaintext. To decrypt the
ciphertext, the same secret key is used along with a decryption algorithm to convert
the ciphertext back into plaintext.
Some common examples of symmetric encryption algorithms include:
Data Encryption Standard (DES): Data Encryption Standard (DES) is a
symmetric encryption algorithm that was developed by IBM in the 1970s and was
widely used for data encryption until the late 1990s. However, due to its small key
size of 56 bits, which can be vulnerable to brute force attacks, it is now considered to
be outdated and insecure. To address the security issues of DES, a variant known as
Triple DES (3DES) was developed, which uses three separate keys and is more secure
than the original DES algorithm. However, even 3DES is no longer considered to be
the best option for encryption, as it is slower and less efficient than newer encryption
algorithms like AES.
Triple DES (3DES): Triple DES (3DES) is an encryption algorithm that is an
improvement over the original DES algorithm. 3DES uses a three-step encryption
process to increase the effective key size of the encryption. By using three separate
keys in this way, 3DES effectively has a key size of 168 bits (three 56-bit keys),
which makes it more secure than the original DES algorithm. The use of multiple
keys also provides additional security against brute force attacks. While 3DES is
more secure than DES, it is also slower and less efficient. As a result, newer
encryption algorithms like AES (Advanced Encryption Standard) have become more
popular for modern encryption applications.
Advanced Encryption Standard (AES): AES is a symmetric encryption
algorithm used for encrypting and decrypting data in 128-bit blocks. It supports key
sizes of 128, 192, or 256 bits. AES encryption involves four steps, while decryption
involves four inverse steps. AES is widely used for securing data in applications such

as email, file encryption, and SSL/TLS. AES is considered to be highly secure, with
no practical known attacks against the full algorithm.
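As a small illustration, the sketch below encrypts and decrypts a block with AES in GCM mode using the Python cryptography package and a 256-bit key; the library and mode are example choices rather than a statement of what any particular system uses.

# A minimal AES-GCM round trip with a random key and a fresh nonce per message.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # 128, 192 or 256 bits are all valid
nonce = os.urandom(12)                      # 96-bit nonce, unique per encryption

ciphertext = AESGCM(key).encrypt(nonce, b"backup block", None)
plaintext = AESGCM(key).decrypt(nonce, ciphertext, None)
assert plaintext == b"backup block"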
3.5 Compression Algorithm
Compression algorithms are used to reduce the size of data by encoding it in a
more compact form, thereby reducing storage requirements and increasing
transmission efficiency. There are several compression algorithms available,
including lossless and lossy compression. Lossless compression algorithms, such as
ZIP and GZIP allow data to be compressed and decompressed without losing any
information. Lossy compression algorithms, such as JPEG and MP3 allow data to be
compressed to a smaller size by discarding some of the information that is less
important or less noticeable to the human eye or ear. Compression algorithms are
widely used in a variety of applications, including email, file transfer and multimedia
streaming, to reduce storage requirements and improve transmission speed and
efficiency.

Figure 3.5 Compression Algorithm

The most widely used compression algorithms include:

ZIP: ZIP is a file format and compression utility that is widely used for archiving and compressing files. It uses lossless compression to reduce the size of files without losing any information, so ZIP archives can be decompressed back to their original form with no loss of quality or accuracy. The format was developed in the 1980s as a way to compress files for storage and transfer and has become one of the most widely used file compression formats in the world, particularly for archiving and transferring files over the internet.

GNU zip (GZIP): It is a file compression and


decompression utility commonly used on Unix and Linux operating systems. It was
developed as part of the GNU Project and uses a lossless compression algorithm to
reduce the size of files for storage and transfer without losing any information. GZIP
is commonly used for compressing individual files, but it can also be used to archive
and compress multiple files. It is a popular choice for web servers because it can
significantly reduce the size of files transmitted over the internet, leading to faster
page load times for users.

Portable Network Graphics (PNG): It is a file format used for storing digital
images, particularly for graphics and images with transparent backgrounds. PNG was
developed as a replacement for the GIF format, which had some limitations, such as
a limited color range and licensing issues. One of the main features of PNG is its
support for alpha channels, which allows for transparency in images. This makes it
particularly useful for web design and other applications where transparency is
important.

3.6 Proposed Architecture
Figure 3.6 Proposed Architecture (Edge Devices, Backup System, Cloud)

3.7 Modules Description

i. Edge devices
ii. Cloud Provider
iii. Backup System
iv. Data Encryption
v. Data compression
vi. Data Upload
vii. Data Retrieval

3.7.1 Edge devices

These are the end-user devices that need to back up and secure their data. These
edge nodes can be users’ mobile phones, laptops, IP cameras etc. The edge nodes
require a secure and fast backup interface with the ability to read, modify and delete
the stored data at any time. The privacy of each edge node should be protected so that
no edge device can access the data of the other nodes on the system. These
requirements should be achieved without complicated operations at the edge nodes.

Figure 3.7.1 Edge devices

3.7.2 Cloud Provider

A cloud provider is a company or organization that offers cloud computing


services and infrastructure to businesses, organizations and individuals. Cloud
providers offer a range of services including virtual servers, storage, networking,
databases and applications that can be accessed and used over the internet. Examples
of popular cloud providers include Amazon Web Services (AWS), Microsoft Azure,
Google Cloud Platform (GCP) and IBM Cloud. Cloud providers offer a flexible and
scalable way for businesses to store and process data and run applications without the
need for on-premises hardware and infrastructure. The data are collected from the
edge devices on the backup system and periodically backed up to several public cloud
servers. Backing up the data on the cloud offers disaster recovery and increases
reliability.

The data will be encrypted and divided into multiple chunks before being stored on the
public cloud. This prevents any malicious cloud service provider from utilizing the
user data or compromising the user’s privacy.

Figure 3.7.2 Cloud Provider

3.7.3 Backup System

The backup system runs on the Fog layer. The Backup System offers a simple but safe backup interface to the edge nodes, while completely protecting the data on a distributed Multi Cloud storage. The Backup System is responsible for all complicated operations needed to keep the user data secure and reliable, so the edge nodes do not have to perform these operations themselves. In other words, the backup
system offloads the processing required by the edge devices to the Fog nodes. This
type of processing offloading allows utilizing state-of-the-art backup techniques
without being limited by the low resources of the edge nodes. Moreover, the edge
nodes are usually battery-operated and in low power consumption mode most of the

time. On the other hand, the fog nodes are connected to a power source. Hence, there
are lower restrictions on their activity time.

Figure 3.7.3 (a) Backup System


Figure 3.7.3 (b) Backup System

Delta data calculation refers to the process of identifying and calculating the
changes made to a file or dataset since the last backup. Rather than backing up the
entire file or dataset each time, only the changes (delta data) are backed up, which
can significantly reduce backup times and storage requirements. Backup System
saves network bandwidth and storage needs. Hence, it adopts a versioning scheme to
back up the data. At each periodic backup, the Backup System calculates the delta
from the last backup instant and only backs up the new delta. In the case of the first
backup, all data will be in the delta. The system generates the mandatory metadata to
enable data chain construction from the incremental backups when data recovery is
performed. These metadata are also used to calculate the delta between a backup
instant and a previous backup instant.
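A rough Python sketch of such a delta calculation is shown below; it assumes the previous backup stored a manifest mapping file paths to SHA-256 digests, which is an illustrative design rather than the exact metadata format of the system.

# A minimal sketch of delta calculation against a previous hash manifest.
import hashlib
from pathlib import Path

def digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def compute_delta(root: Path, last_manifest: dict) -> dict:
    """Return only the files that are new or changed since the previous backup."""
    delta = {}
    for path in root.rglob("*"):
        if path.is_file():
            rel = str(path.relative_to(root))
            h = digest(path)
            if last_manifest.get(rel) != h:
                delta[rel] = h          # new or modified file: include in this backup
    return delta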
3.7.4 Data Encryption

In the Backup System, data encryption is implemented using the PGP encryption scheme.


The owner of the backup system provides the necessary key pairs for encryption and
decryption. The data delta generated in the first stage is encrypted using the PGP
scheme, ensuring that the data is protected from unauthorized access or tampering.
PGP is an encryption standard that uses a combination of symmetric-key and public-
key encryption to secure data. It works by generating a public-private key pair, where
the public key can be shared with others for encrypting messages, while the private
key is kept secret and used for decrypting messages. When a message is encrypted
using the public key, only the owner of the private key can decrypt it. Symmetric key
encryption algorithms, also known as secret key algorithms, are encryption
techniques that use the same secret key for both encryption and decryption of data.
The secret key is a shared secret between the sender and receiver of the encrypted
data. Symmetric-key encryption algorithms are generally faster and require less
computational power than public-key algorithms.

Public key encryption, also known as asymmetric encryption, is a type of
encryption that uses a pair of keys to encrypt and decrypt data. The pair of keys
consists of a public key and a private key. The public key can be freely distributed,
while the private key must be kept secret. Public key encryption algorithms are based
on complex mathematical problems that are easy to solve in one direction but difficult
to solve in the other direction. These algorithms use the public key to encrypt the
data, and only the corresponding private key can decrypt the data. The Symmetric
key encryption algorithm is the Advanced Encryption Standard (AES). It is a widely
used and highly secure encryption algorithm that is used to protect sensitive data in
various applications, including online transactions, secure messaging and data
storage. The generate_key() function generates a secure encryption key using the
Password-Based Key Derivation Function 2 (PBKDF2) algorithm with the given
password.
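A minimal sketch of such a key derivation is given below; the function name mirrors the generate_key() routine described above, but the salt size and iteration count are illustrative assumptions.

# A minimal sketch of password-based key derivation with PBKDF2-HMAC-SHA256.
import os
import hashlib

def generate_key(password: str, salt: bytes = None):
    # Derive a 256-bit AES key from the password.
    salt = salt or os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000, dklen=32)
    return key, salt                    # the salt must be stored with the backup metadata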

Figure 3.7.4 Data Encryption

Public key encryption algorithm is the Elliptic Curve Cryptography (ECC).
ECC is a type of public key cryptography based on the mathematics of elliptic curves.
It provides the same level of security as other public key cryptography systems such
as RSA, but with smaller key sizes, making it more efficient for use in resource-
constrained environments such as mobile devices or IoT devices.
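For illustration, the snippet below generates elliptic-curve key pairs and derives a shared secret with ECDH using the Python cryptography package; the curve choice (SECP256R1) is an assumption.

# A minimal sketch of ECC key generation and ECDH key agreement.
from cryptography.hazmat.primitives.asymmetric import ec

# Each party generates its own key pair on the same curve.
alice_private = ec.generate_private_key(ec.SECP256R1())
bob_private = ec.generate_private_key(ec.SECP256R1())

# ECDH: each side combines its private key with the other's public key and
# arrives at the same shared secret, from which a symmetric key can be derived.
alice_shared = alice_private.exchange(ec.ECDH(), bob_private.public_key())
bob_shared = bob_private.exchange(ec.ECDH(), alice_private.public_key())
assert alice_shared == bob_shared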
3.7.5 Data compression

Data compression is the process of reducing the amount of data required to


represent a piece of information. This can be achieved by removing redundant or
unnecessary information from the data, or by using mathematical algorithms to
encode the data in a more efficient manner. Data compression can be used to reduce
the size of data files, making them easier to store and transmit over networks, as well
as to improve the performance of applications that process large amounts of data.
Some common data compression techniques include lossless compression, which
preserves all the original data, and lossy compression, which sacrifices some
information in order to achieve greater compression ratios. The compression
algorithm used in Backup System is the standard GNU zip (gzip) compression algorithm, which is a widely used and efficient compression
algorithm. Gzip works by identifying repetitive patterns in the data and replacing
them with shorter codes, thereby reducing the overall size of the data. Gzip works by
compressing a stream of data using a combination of Huffman coding and Lempel-Ziv (LZ77) compression techniques. It first divides the input data into fixed-size
blocks and then applies the LZ77 algorithm to find repeated sequences of data within
each block. It then uses Huffman coding to assign shorter codes to frequently
occurring symbols, thereby reducing the overall size of the data. Gzip supports
various compression levels, ranging from 1 to 9, with higher levels providing better
compression ratios but slower processing times. It is used to compress files on Unix-
based systems and is also supported by many

programming languages and libraries. LZ77 compression is a lossless data
compression algorithm that works by replacing repeated occurrences of data with
references to a single copy of that data existing earlier in the uncompressed data
stream. This is achieved by using a sliding window of previously encountered data in
the stream, and encoding each subsequent segment of data as either a literal value or
a pointer to a previous instance of that same segment in the window.

Figure 3.7.5 Data compression

Huffman coding is a variable-length prefix coding algorithm that is often used


to further compress the output of LZ77 compression. It works by encoding frequently
occurring symbols with short codes, and less frequently occurring symbols with
longer codes. The codes are constructed based on the frequency of each symbol in
the input data, with more frequent symbols being assigned shorter codes to reduce
the overall length of the encoded output. By combining these two techniques, LZ77
compression can be used to remove redundancy and reduce the size of the data
stream, and Huffman coding can be used to encode the resulting output with shorter
codes, resulting in further compression and reduced storage requirements.
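As a small illustration, the sketch below compresses and decompresses a block of data with Python's built-in gzip module; the compression level shown is only one example of the speed/ratio trade-off discussed above.

# A minimal gzip round trip; gzip is lossless, so the data is recovered exactly.
import gzip

def compress_chunk(data: bytes, level: int = 6) -> bytes:
    return gzip.compress(data, compresslevel=level)

def decompress_chunk(blob: bytes) -> bytes:
    return gzip.decompress(blob)

chunk = b"delta data " * 1000
blob = compress_chunk(chunk)
assert decompress_chunk(blob) == chunk      # lossless round trip
assert len(blob) < len(chunk)               # repetitive data compresses well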

3.7.6 Data Upload

This is the final stage of the backup scenario in the Backup System. The data chunks are
going to be uploaded to the cloud servers. The Backup System authenticates itself
with each CSP automatically and uploads the user data. The Backup System adopts
a round-robin scheme to preserve the balance of storage usage on each CSP account.
Metadata is distributed and replicated in the same manner as the user data. This
scheme has been shown to keep the storage balanced on all the CSPs over the long
term and avoid overwhelming specific CSP.
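A minimal sketch of such a round-robin assignment is shown below; the function and account names are hypothetical and only illustrate how chunks could be spread evenly across CSP accounts.

from itertools import cycle

def distribute_chunks(chunk_ids, csp_accounts):
    # Assign each chunk to the next CSP account in turn, keeping usage balanced
    assignment = {}
    accounts = cycle(csp_accounts)
    for chunk_id in chunk_ids:
        assignment[chunk_id] = next(accounts)
    return assignment

# Example: six chunks spread evenly over three hypothetical accounts
print(distribute_chunks([f"chunk-{i}" for i in range(6)],
                        ["googledrive", "dropbox", "onedrive"]))
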
3.7.7 Data Retrieval

Edge devices can restore their data at any time as long as they have access to
the Backup System. Even if there is no internet connection, an edge device can
download its data from the Backup System in case of a data loss incident. This data
retrieval can only be accomplished by authenticating the edge device with the Backup
System using its username and password. After authentication, the edge device has
immediate and full access to its previously backed-up data.

Figure 3.7.7 Data Retrieval

In the event of a disaster in the Fog layer, data retrieval and recovery of the
entire system can be conveniently performed using the Backup System. Disaster
events in the Fog layer include Backup System software failures, hardware failures,
hard drive crashes, or other natural disasters. In order to recover the system, the
Backup System only needs the master key and the credentials of the used CSPs after
a new Fog node is brought up. Then, the Backup System can search for the
metadata on all the configured CSPs and construct the backup chain. After that, the
data can be downloaded, decrypted, and reconstructed automatically. In the end, the
whole system is recovered again and the edge devices can access their data as usual.
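The recovery flow described above can be summarized with the small, self-contained sketch below; the metadata field names and CSP names are assumptions used only to show how records gathered from every CSP are ordered into a backup chain before download and decryption.

# Metadata records as they might be gathered from the configured CSPs
metadata_from_csps = [
    {"csp": "googledrive", "chunk_id": "c2", "seq": 2},
    {"csp": "dropbox",     "chunk_id": "c1", "seq": 1},
    {"csp": "onedrive",    "chunk_id": "c3", "seq": 3},
]

# Order the records into a backup chain, then process each chunk in sequence
chain = sorted(metadata_from_csps, key=lambda record: record["seq"])
for record in chain:
    print(f"download {record['chunk_id']} from {record['csp']}, "
          f"then decrypt and reassemble it")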

3.8 Advantages
1) Speed
The Backup System is running on the Fog layer on user premises. User devices
and the Fog layer are in the same Local Area Network (LAN). This results in many
advantages. First, the data backup speed from the user devices to the Backup System
will be very high, up to the LAN speed, compared to direct data upload to the cloud.
Second, the backup service is available even if there is no internet access.
2) Security
The Fog layer is a personal device. This means that the connected devices will
be well known and belong to the Fog layer owner. The security threats are more
controllable as the Fog layer and the edge devices are under user control. The Backup
System restricts Fog layer access to the specified user devices only, through
authentication, to allow better management and reduce security threats.
3) Privacy
Data privacy is maintained by isolating the edge devices’ data on the Fog layer
where no edge device can access the data of the other devices. Therefore, the
Backup System can be used by all users on the same LAN without privacy concerns.

CHAPTER 4
IMPLEMENTATION

4.1 SYSTEM REQUIREMENTS


System requirements refer to the minimum hardware and software
specifications needed for a particular computer system or software application to
function properly. The specific system requirements will depend on the particular
software or hardware being used.

4.1.2 Hardware Requirements


Hardware requirements refer to the physical components and specifications
that are necessary for a computer system or other electronic device to operate
properly.

Processors
Processors, also known as central processing units (CPUs), are the brains of a
computer system. They are responsible for carrying out instructions and performing
calculations for the system. In this project, we are using an Intel® Core™ i5-4300M
processor at 2.60 GHz (1 socket, 2 cores, 2 threads per core) with 8 GB of DRAM.

Disk space
Disk space refers to the amount of storage capacity available on a hard drive
or other storage device. It is typically measured in gigabytes (GB) or terabytes (TB),
with one terabyte being equal to 1,000 gigabytes. In this system, we are using 320
GB for backup storage.

Operating systems
An operating system (OS) is software that manages computer hardware
and software resources and provides common services for computer programs. It
acts as an interface between computer hardware and software applications, allowing
them to communicate with each other and operate effectively. In this project, we are
using Windows® 10, macOS, and Linux.

4.1.3 Software Requirements


Software requirements refer to the specific applications or programs that need
to be installed on a computer system to perform a particular task or function. The
software requirements for a computer system can vary widely depending on the
intended use of the system.

Python
Python is used for a wide range of applications, from web development and
data analysis to scientific computing and artificial intelligence. Its syntax is designed
to be easily readable and intuitive, with an emphasis on code readability. Python is
also known for its extensive libraries and frameworks, which make it easy to
accomplish complex tasks with minimal code. Python can be run on various operating
systems, including Windows, Mac OS, and Linux, and can be used for both desktop
and web-based applications. Its popularity has led to a large community of developers
and users who create and share libraries, tools, and resources to help others learn and
use Python effectively.

Cloud SDK
Cloud SDK is a set of tools and libraries that enables developers to easily create
and manage applications on cloud computing platforms such as Google Cloud
Platform (GCP). It includes command-line tools, libraries, and APIs for managing
resources, deploying and testing applications, and accessing cloud services. Cloud
SDK provides a streamlined way to interact with GCP services and resources,
allowing developers to focus on building applications rather than worrying about the
underlying infrastructure. It includes tools for managing authentication and access
control, as well as tools for monitoring and debugging applications. Cloud SDK is
available for Windows, macOS, and Linux operating systems, and supports multiple
programming languages including Python, Java, and Go. It is regularly updated with
new features and improvements to ensure that developers have access to the latest
tools and resources for building and deploying cloud applications.

Pretty Good Privacy (PGP)


Pretty Good Privacy (PGP) is a data encryption and decryption program that
uses a combination of symmetric-key and public-key encryption to secure messages
and data. PGP works by using a public key to encrypt a message or data file, which
can only be decrypted by the corresponding private key. This allows users to
communicate securely with one another without worrying about unauthorized access
or interception of their messages. PGP also includes features such as digital
signatures, which can be used to verify the authenticity and integrity of a message or
file, and compression algorithms, which can reduce the size of files before encryption
to save space and speed up transmission. PGP has become a widely used tool for
secure communications and is commonly used for email encryption, file encryption,
and data encryption. It has also inspired many other encryption tools and
technologies, and has helped to popularize the use of public-key encryption in the
modern era.
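The exact PGP tooling used is not detailed here; as a hedged sketch only, the example below relies on the third-party python-gnupg package and a local GnuPG installation, with a throwaway key pair, home directory, and passphrase chosen purely for illustration.

import os
import gnupg  # third-party python-gnupg package; requires GnuPG to be installed

os.makedirs("/tmp/gnupg-demo", exist_ok=True)
gpg = gnupg.GPG(gnupghome="/tmp/gnupg-demo")

# Generate a demo key pair: the public key encrypts, the private key decrypts
key = gpg.gen_key(gpg.gen_key_input(name_email="backup@example.com",
                                    passphrase="demo-passphrase"))

# Encrypt for the recipient's public key, then decrypt with the private key
encrypted = gpg.encrypt("sensitive backup data", recipients=[key.fingerprint])
decrypted = gpg.decrypt(str(encrypted), passphrase="demo-passphrase")
print(str(decrypted))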

Compression Libraries
Compression libraries are software libraries that provide developers with a set
of tools and functions to compress and decompress data in various formats. These
libraries are designed to make it easy for developers to integrate compression and
decompression functionality into their applications, without having to implement the
algorithms themselves. In this project, we are using GNU zip (gzip), a file format and
associated compression library that uses the DEFLATE algorithm to compress and
decompress data. Gzip is commonly used on Unix-based systems, and is also
supported by many other platforms and operating systems.

Hashing Libraries
Hashing libraries provide a set of functions or classes to implement hash
functions in programming languages. Hash functions take input data of arbitrary size
and produce a fixed-size output, known as a hash value or digest. The output of a
hash function is typically used for data integrity verification, message authentication,
and indexing in hash tables.
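For instance, a minimal use of Python's built-in hashlib module (the sample data here is arbitrary) computes a fixed-size SHA-256 digest that can later be compared to verify that a backed-up chunk has not changed.

import hashlib

chunk = b"example backup chunk"
digest = hashlib.sha256(chunk).hexdigest()
print(digest)  # 64 hexadecimal characters, regardless of the input size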

Other Libraries
Backup System uses various other libraries such as cryptography, boto3. The
"cryptography" library is a popular Python library that provides various
cryptographic functionalities, including encryption, decryption, hashing, and digital
signatures. It is a high-level library that offers easy-to-use interfaces for performing
secure cryptographic operations. The "boto3" library is the Amazon Web Services
(AWS) SDK for Python, which allows developers to easily integrate their Python
applications with AWS services such as Amazon S3 for storing and retrieving backup data.
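As a hedged example of the kind of call boto3 enables (the bucket name, object key, and local file name below are assumptions, and valid AWS credentials must already be configured), an encrypted chunk could be uploaded to Amazon S3 as follows.

import boto3  # AWS SDK for Python

s3 = boto3.client("s3")
# Upload a local file to a hypothetical bucket under a hypothetical key
s3.upload_file("chunk_0001.bin.gz.enc",
               "my-backup-bucket",
               "backups/chunk_0001.bin.gz.enc")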

CHAPTER 5
CONCLUSION AND FUTURE WORK

5.1 Conclusion

Backup System is a highly innovative and efficient backup solution that
utilizes the latest technologies to provide users with a secure and reliable backup
service. By leveraging multiple cloud providers, the system ensures that data is
redundantly stored across multiple geographically dispersed servers, thereby
reducing the risk of data loss due to hardware failures or natural disasters. The use of
fog computing improves the overall performance of backup and restore operations,
while encryption and compression techniques provide additional security and
optimize storage requirements.

5.2 Future Work

There is potential to explore additional compression techniques to further optimize
storage requirements, integration with additional backup sources to provide a more
comprehensive backup solution, and support for additional encryption algorithms to
offer users more flexibility and control over their data security. Overall, Backup
System represents an important step forward in backup technology and provides users
with a reliable and efficient backup solution that addresses many of the challenges
associated with traditional backup solutions.

APPENDIX 1

Source Code

login.py

import argparse
import getpass

def authenticate_user(username, password):
    # Demo check only: credentials are hard-coded for illustration
    if username == 'admin' and password == 'password':
        return True
    else:
        return False

def main():
    parser = argparse.ArgumentParser(description='User Authentication Tool')
    parser.add_argument('username', help='Username')
    args = parser.parse_args()
    username = args.username
    password = getpass.getpass('Enter password: ')
    if authenticate_user(username, password):
        print('Logged in successfully')
    else:
        print('Invalid credentials')

if __name__ == '__main__':
    main()

backup.py

import argparse

def configure_backup(frequency, destinations, data_selection):
    # TODO: Implement backup configuration logic
    print(f'Backup configuration set for frequency {frequency}, destinations '
          f'{destinations}, and data selection {data_selection}')

def main():
    parser = argparse.ArgumentParser(description='Backup Configuration Tool')
    parser.add_argument('frequency', help='Backup frequency')
    parser.add_argument('destinations', help='Backup destinations')
    parser.add_argument('data_selection', help='Data selection')
    args = parser.parse_args()
    configure_backup(args.frequency, args.destinations, args.data_selection)

if __name__ == '__main__':
    main()

encryption.py

import os
import hashlib
from Crypto.Cipher import AES
from Crypto.Util.Padding import pad, unpad

# Generate a secure encryption key
def generate_key(password):
    # Use PBKDF2 to derive a key from a password
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100000)
    return key, salt

# Encrypt backup data using the AES algorithm
def encrypt_data(data, key, salt):
    # Use AES in CBC mode with PKCS7 padding
    iv = os.urandom(16)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    encrypted_data = iv + cipher.encrypt(pad(data.encode('utf-8'), AES.block_size))
    return encrypted_data, salt

# Decrypt encrypted backup data using the AES algorithm
def decrypt_data(encrypted_data, key, salt):
    # Use AES in CBC mode with PKCS7 padding
    iv = encrypted_data[:16]
    cipher = AES.new(key, AES.MODE_CBC, iv)
    decrypted_data = unpad(cipher.decrypt(encrypted_data[16:]), AES.block_size).decode('utf-8')
    return decrypted_data

# Example usage
password = "mysecurepassword"
backup_data = "Some sensitive backup data"
key, salt = generate_key(password)
encrypted_data, salt = encrypt_data(backup_data, key, salt)

# Print encrypted data and salt
print("Encrypted data:", encrypted_data)
print("Salt:", salt.hex())

# Decrypt encrypted data and print original text
decrypted_data = decrypt_data(encrypted_data, key, salt)
print("Original text:", decrypted_data)

compression.py

import gzip

# open the input file in binary mode
with open('input_file.txt', 'rb') as f_in:
    # open the output file in binary mode
    with gzip.open('compressed_file.gz', 'wb') as f_out:
        # copy the input file to the output file using gzip compression
        f_out.writelines(f_in)

multi_cloud_backup.py

import os
import hashlib
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaFileUpload
from google.auth.transport.requests import Request

def multi_cloud_backup(data, cloud_providers):
    # Create a dictionary to hold the data hashes for each cloud provider
    cloud_hashes = {}
    # Loop through each cloud provider
    for provider in cloud_providers:
        if provider == "dropbox":
            # Code to backup data to Dropbox
            pass
        elif provider == "onedrive":
            # Code to backup data to OneDrive
            pass
        elif provider == "googledrive":
            # Get Google Drive credentials
            creds = None
            if os.path.exists('token.json'):
                creds = Credentials.from_authorized_user_file(
                    'token.json', ['https://www.googleapis.com/auth/drive'])
            if not creds or not creds.valid:
                if creds and creds.expired and creds.refresh_token:
                    creds.refresh(Request())
                else:
                    flow = InstalledAppFlow.from_client_secrets_file(
                        'credentials.json', ['https://www.googleapis.com/auth/drive'])
                    creds = flow.run_local_server(port=0)
                with open('token.json', 'w') as token:
                    token.write(creds.to_json())
            # Authenticate and create Drive API client
            service = build('drive', 'v3', credentials=creds)
            # Calculate SHA-256 hash of the data
            hash_value = hashlib.sha256(data.encode('utf-8')).hexdigest()
            # Check if the hash value already exists in the cloud provider
            # If it does, skip the upload and store the hash value
            # If it doesn't, upload the data and store the hash value
            if check_cloud_provider(provider, hash_value):
                cloud_hashes[provider] = hash_value
            else:
                file_metadata = {'name': 'backup.txt'}
                media = MediaFileUpload('backup.txt', mimetype='text/plain')
                file = service.files().create(body=file_metadata, media_body=media,
                                              fields='id').execute()
                cloud_hashes[provider] = hash_value
                store_cloud_provider(provider, hash_value)
    return cloud_hashes

def check_cloud_provider(provider, hash_value):
    # Check if the hash value already exists in the cloud provider
    # Return True if it does, False if it doesn't
    pass

def store_cloud_provider(provider, hash_value):
    # Store the hash value in the cloud provider
    pass
