Vetting Security and Privacy of Global COVID-19 Contact Tracing Applications
Ruoxi Sun Wei Wang Minhui Xue
The University of Adelaide The University of Adelaide The University of Adelaide
Australia Australia Australia
ABSTRACT
The rapid spread of COVID-19 has made traditional manual contact tracing, which identifies persons who have been in close physical proximity to a known infected person, challenging. Hence, a number of public health authorities have experimented with automated contact tracing apps. While the global deployment of contact tracing apps aims to protect the health of citizens, these apps have raised security and privacy concerns. In this paper, we assess the security and privacy of 34 exemplar contact tracing apps using three methodologies: (i) evaluating the design paradigms and the privacy protections provided; (ii) static analysis to discover potential vulnerabilities and data flow analysis to identify potential leaks of private data; and (iii) evaluating the robustness of privacy protection approaches. Based on the results, we propose a venue-access-based contact tracing solution, VenueTrace, which preserves user privacy while enabling proximity contact tracing. We hope that our systematic assessment results and concrete recommendations can contribute to the development and deployment of applications against COVID-19 and help governments and the application development industry build secure and privacy-preserving contact tracing applications.

1 INTRODUCTION
COVID-19 is now a global pandemic affecting over 200 countries, after its first recorded outbreak in China in December 2019. To counter its spread, numerous measures have been undertaken by public health authorities, e.g. quarantining of people, lock-downs, curfews, physical distancing, and mandatory use of face masks. Identifying those who have been in close contact with infected individuals, followed by self-isolation (so-called contact tracing), has proven particularly effective [48]. Consequently, contact tracing has emerged as a key tool to mitigate the spread. However, manual contact tracing, using an army of “detectives”, is not trivial and has proven challenging for many countries, e.g. UK and Italy. Notably, it is difficult due to the rapid and exponential growth patterns of the virus and the increased demands on qualified human resources. After 5 months of the pandemic, the number of daily new cases has increased more than 80-fold.¹ Thus, in many countries it has become extremely difficult to perform manual contact tracing [20, 25, 37].

¹ 1,354 confirmed cases per day in the first 30 days from 11 January to 10 February 2020 and 109,615 confirmed cases per day in the recent 30 days from 13 May to 12 June 2020, using data released by the World Health Organization (WHO).

Government authorities around the world, together with industry, have sought to address the challenge by developing contact tracing applications and services. A plethora of apps and services are currently deployed around the globe. These include the Health Code in China [50], the public COVID-19 website in South Korea [14], and the mobile contact tracing apps released in Singapore [17], Israel [45], and Australia [23, 24]. Contact tracing apps operate by recording prolonged and close proximity interactions between individuals by using proximity sensing methods, e.g. Bluetooth. The data gathered allows notifications to be generated to inform persons of a potential exposure to the virus.

Proponents argue that the low cost and scalable nature of contact tracing apps make them an attractive option for health authorities. Despite this, contact tracing apps are not universally popular, with a number of prominent critics. They have proven particularly controversial due to potential violations of privacy [38], and security consequences from the mass-scale installation of (rapidly developed) apps across entire populations. Despite attempts to alleviate these concerns by both governments and industry, it is well known that the anonymization of individual information is a challenging problem [22]. This study, to the best of our knowledge, performs the first security and privacy vetting of contact tracing apps. We describe our key contributions below:

• We assess the security and privacy of 34 worldwide Android contact tracing applications, listed in Table 3. We discover that about 70% of the apps pose potential security risks due to: (i) employing cryptographic algorithms that are insecure or not part of best practice; and (ii) storing sensitive information in clear text that could be potentially read by attackers. Over 60% of apps contain vulnerabilities through Manifest weaknesses, e.g. allowing permissions for backup (hence, the copying of potentially unencrypted application data). Further, we identify that approximately 75% of the apps contain at least one tracker, potentially causing serious privacy leakage, i.e. data leaks that lead to exposing private information, to third parties. To facilitate further research, we will publicly release the dataset, the scripts developed for analysis, and security assessment reports in due course.
• We analyze user privacy exposure and privacy protections provided by 10 solutions, covering 3 different frameworks (PACT [19], Covid Watch [60], and PEPP-PT [57]), the Coronavirus Disease-19 website [14], and 6 applications, from 7 countries around the world. We establish a threat model and analyze the vulnerabilities of the apps to multiple privacy attacks. The results demonstrate that there is no solution that is able to protect users’ privacy against all of the attacks investigated. The replay attack, in which a malicious user
Ruoxi Sun, Wei Wang, Minhui Xue, Gareth Tyson, Seyit Camtepe, and Damith Ranasinghe
can replay valid identifiers to redirect all the traffic from one place to another to virtually or digitally alter the footprint of the contact, could result in the targeted area being incorrectly locked-down due to false information. Generally, Bluetooth-based decentralized solutions that avoid direct location tracking outperform centralized systems.
• We synthesize the findings from our extensive COVID-19 contact tracing app vetting exercises to: (i) provide best practice security and privacy guidance to governments and app industry; and (ii) recommend a novel decentralized venue-accessing-based contact tracing approach, termed VenueTrace, to overcome potential privacy issues highlighted in the state-of-the-practice solutions. Our VenueTrace proposal has the capability to significantly increase the privacy protections for citizens whilst being securely implemented.

We have disclosed our findings and detailed security and privacy risk reports to the related stakeholders on 23 May 2020, at 11 am, UTC. We have received acknowledgements from numerous vendors, such as MySejahtera (Malaysia), Pakistan’s National Action Plan for COVID-19 (Pakistan), Contact Tracer (USA), and Private Kit (USA). We believe our study can provide useful insights for governments, developers and researchers in the software industry to develop secure and privacy-preserving contact tracing apps. We hope the results and the proposed contact tracing approach will contribute to increasing the trustworthiness of solutions to contain infectious diseases now and in the future.

2 CONTACT TRACING APPLICATIONS
A range of contact tracing applications (or “apps”) are used worldwide. Given the large number of contact tracing apps and proposed solution frameworks, we survey a representative sample to more broadly study the architectures employed, design paradigms and the privacy exposure of the user groups we identify in Section 2.1.

2.1 User Groups and Privacy Exposure
We define model user groups and privacy exposure levels to aid our investigations into app architectures (in Section 2.2) and privacy vetting (in Section 4). We envisage three groups of contact tracing app users, based on their health status. We describe the three user groups below and analyze their privacy exposure in Section 2.2.

• Generic user. A typical user of the contact tracing system, who is healthy or has not been diagnosed yet.
• At-risk user. Alice, who has recently been in contact with an infected user, Bob. Ideally, Alice will receive an at-risk alarm from her application.
• Diagnosed user. As a diagnosed patient, Bob will be asked to reveal his private information as well as the information of at-risk users to the health authorities, e.g. the diagnosis of his infection, his movement history, and the persons he has been in contact with.

We define user group exposures in different apps using five levels:

• Level I: “No data is shared with a server or users”, the most secure level in which there is no user data shared.
• Level II: “Tokens are shared with proximity users”, a medium exposure level with only tokens containing no Personal Identifiable Information (PII) exchanged between users.
• Level III: “Tokens are shared with the server”, a medium exposure level with tokens exposed to the server.
• Level IV: “PII is shared with a server”, a high risk exposure level in which the users’ PII is shared with the server.
• Level V: “PII is published to public”, the highest risk exposure level.

2.2 Analysis of Design Paradigms and User Privacy Exposure
We select 10 well known contact tracing solutions, including both current apps and proposed frameworks. Table 1 presents an overview of the 10 selected solutions. Four of the selected solutions are developments of apps used from the early stages of the pandemic (supported by governments such as China, South Korea, Singapore, and Israel). One service and one proposed framework from Europe were selected because they are the first solutions that enable anonymous identifier exchange. We have also selected three solutions from North America that are about to be or have already been deployed, and one app deployment from Oceania. As summarized in Table 2, all 10 solutions have user privacy exposure to some extent. We next discuss them in the context of two broad categories: (i) centralized architectures; and (ii) decentralized/distributed architectures.

Centralized solutions. Many solutions utilize a centralized system in which the central server is responsible for: (i) collecting the contact records from diagnosed users; and (ii) health status evaluation for users and at-risk user determination. In some East Asian countries, e.g. China and South Korea, where the outbreaks first occurred, contact tracing systems were quickly developed and released. The systems helped health authorities to successfully control the spread of COVID-19, but a huge amount of Personal Identifiable Information (PII) was collected.

In South Korea, the Coronavirus Disease-19 website [14] (#7 in Table 1) is supported by the Ministry of Health and Welfare. Although Alice’s privacy is protected as no data is required from her, the system publishes Bob’s information to the public (marked as Level V in Table 2). The information exposed includes gender, nationality, age, diagnosis date, hospital, and movement history (removed in the latest version). This directly puts Bob at risk of being re-identified, raising serious privacy concerns. For example, as reported by The Washington Post [36], in Cheonan, a city in South Korea, a text alert to residents showed that an infected person visited “Imperial Foot Massage at 13:46 on 24 February”.

In China, QR-code contact tracing apps were developed by the two tech companies, Alibaba and Tencent [50] (#2). The apps use a colour code to present the health condition of an individual: green implies people can travel freely, while yellow or red indicates they must report to the authorities. Users need to provide their name, national ID, and phone number to register and use the app to enter public places, e.g. metro stations, supermarkets, and airports. The apps are mandatory and jointly developed by government departments and supported by data from health and transport
Table 1: Representative state-of-the-practice solutions from seven countries and five continents.
ID Country Name Developer Private/State Technique Architecture
1 Australia COVIDSafe Australian Department of Health State Bluetooth Centralized
2 China Health Code Alibaba Private QR code Centralized
3 Europe PEPP-PT∗ International consortium Private Bluetooth Centralized
4 Europe DP3T International consortium Private Bluetooth Decentralized
5 Israel HaMagen Ministry of Health State Location Decentralized
6 Singapore TraceTogether Government Technology Agency State Bluetooth Centralized
7 South Korea Coronavirus Disease-19 Ministry of Health and Welfare State Location Centralized
8 USA Covid Watch∗ Stanford University Private Bluetooth Decentralized
9 USA Private Kit Massachusetts Institute of Technology Private GPS+Bluetooth Decentralized
10 USA PACT∗ University of Washington Private Bluetooth Decentralized
∗ Note: frameworks with no application implemented
Table 2: User exposure.
Solutions Generic At-risk Diagnosed Architecture
COVIDSafe                 Centralized
Health Code               Centralized
PEPP-PT                   Centralized
DP3T                      Decentralized
HaMagen                   Decentralized
TraceTogether             Centralized
Coronavirus Disease-19    Centralized
Covid Watch               Decentralized
PACT                      Decentralized
Private Kit               Decentralized
Level I: No data is shared with servers or users, Level II: Token shared with proximity users, Level III: Token shared with the server, Level IV: PII shared with the server, Level V: PII is published to public.

authorities. In this solution, for all types of users, their private information will be shared with the central server. Thus, we evaluate the user exposure of Health Code as Level IV in Table 2. However, although such solutions may collect and expose more private information than other solutions we discuss later, a public perceptions survey [41] indicates that users in the USA prefer centralized systems that share diagnosed users’ recent locations in public venues.

TraceTogether [17] from Singapore is the first solution that uses Bluetooth technology. Bluetooth-based solutions rely on proximity tracing via Bluetooth broadcasts from apps. As these occur exclusively between devices in proximity, these methods provide more scope for privacy-preserving computations compared to those that use GPS locations (Coronavirus Disease-19 and Hamagen [45]). In TraceTogether, proximity between two users is measured through the Bluetooth broadcast signals and encrypted user information is stored on mobile devices. Once diagnosed, the user will be asked to upload their local on-device records to the Ministry of Health, which has the authority to decrypt the data and obtain the mobile numbers of the user’s close contacts within a period of time (e.g. 21 days) that covers the incubation period of the virus. Such a centralized BLE-based solution preserves more personal privacy as the data exchanged between users is not related to absolute location information. COVIDSafe [24] from Australia also utilizes a similar technique. However, considering that the PII, e.g. phone numbers, is collected by the government, the at-risk and diagnosed users’ PII is exposed to the central server (Level IV in Table 2), while the exposure of generic users still remains at Level II as the tokens are only shared between users.

The solutions that expose PII may not work well in countries with different societal norms. Thus, many western countries developed solutions with no PII related information exchange, e.g. PEPP-PT [57]. In PEPP-PT, an ephemeral user ID is implemented based on a seed randomly generated by a user device. Users will exchange ephemeral IDs, instead of encrypted PII messages, to record a proximity contact event, thus reducing the privacy exposure of diagnosed users to Level III where tokens are shared with servers.

Decentralized solutions. The second type of solution is decentralized, where: (i) the back-end server is only responsible for collecting the identifiers that were used by diagnosed users, e.g. the broadcast token, from diagnosed patients; and (ii) the health evaluation is conducted on users’ devices, locally. This design prevents the central server from knowing the infected person, and their physical proximity contacts.

Some decentralized systems rely on location information. For example, Hamagen (#5), an app provided by the Israeli Ministry of Health, obtains but does not share location data from the user’s phone and compares it with the information stored in a central server regarding the location histories of confirmed cases. As no data is shared before diagnosis, the exposure level of at-risk users is evaluated as Level I in Table 2. However, the diagnosed users will be notified and given the option of reporting their exposure to the Health Ministry by filling out a form; subsequently, their location trails are released to the public.

Other Bluetooth solutions, e.g. DP3T [58] (#4), Covid Watch [60] (#8), and PACT [19] (#10), implement decentralized designs to allow users to download diagnosed anonymous identifiers from the back-end server and compare them with local records to obtain their risk of exposure to the virus. The design paradigm reduces the exposure level of at-risk users to Level II and the level of diagnosed users to Level III in Table 2 as no PII is shared by users. Apple and Google have released the “privacy-preserving contact tracing” API, which can support building decentralized contact-tracing apps [2].

Another application, Private Kit [8] (#9), a decentralized solution developed by Raskar et al. [52], enables individuals to log their own location information. In particular, Private Kit also allows Bluetooth broadcasts between users to enable direct notification between users. As the sharing of diagnosed users’ location trails and the broadcasts between users are privacy protected, we determine Private Kit’s user exposure levels to be the same as other Bluetooth-based decentralized solutions.
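The on-device comparison used by the decentralized Bluetooth designs above can be illustrated with a minimal sketch. It assumes, purely for illustration, that ephemeral IDs are derived from a device-local seed with HMAC-SHA256 over a counter and that the server publishes the seeds of diagnosed users; the function names, the 16-byte ID length, and the figure of 96 IDs per seed are hypothetical and are not taken from DP3T, Covid Watch, PACT, or PEPP-PT.

```python
import hashlib
import hmac
from typing import Iterable, List, Set


def ephemeral_ids(seed: bytes, count: int) -> List[bytes]:
    """Derive `count` 16-byte ephemeral IDs from a device-local seed.

    Assumption: HMAC-SHA256 over a counter. Real frameworks define
    their own derivation and rotation schedule.
    """
    return [
        hmac.new(seed, index.to_bytes(4, "big"), hashlib.sha256).digest()[:16]
        for index in range(count)
    ]


def count_exposures(
    observed: Set[bytes], diagnosed_seeds: Iterable[bytes], ids_per_seed: int
) -> int:
    """Count locally observed IDs that belong to diagnosed users.

    The comparison runs on the user's own device; the local contact
    log is never uploaded.
    """
    hits = 0
    for seed in diagnosed_seeds:
        hits += sum(1 for eid in ephemeral_ids(seed, ids_per_seed) if eid in observed)
    return hits


# Bob's phone broadcasts IDs derived from his seed; Alice's phone logs two of them.
bob_seed = b"\x01" * 32
carol_seed = b"\x02" * 32
alice_observed = {ephemeral_ids(bob_seed, 96)[5], ephemeral_ids(bob_seed, 96)[6]}
```

In this sketch the server only ever sees the seeds that diagnosed users choose to upload (Level III), while at-risk users share nothing beyond the tokens they broadcast to nearby devices (Level II).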
Figure 1: Overview of our security vetting methods. Importantly, we also augment the analysis with manual inspections.

are used to receive or query specific values but do not contain sensitive information, are stored as constant values. However, MobSF regards all constant string values as potential clear text storage. Some applications, e.g. Coronavirus UY (Uruguay), create temporary files while decompressing and loading multiple dex files in order to avoid the 64K reference limit [44]. Other applications, e.g. CG Covid-19 ePass (India), are able to scan another user’s barcode and save it into temporary files in order to read the content. However, these behaviours are misclassified as temporary file leakage by MobSF. All these false positives were identified through manual inspection and removed from our further analysis.

Figure 2 shows that the most frequent weakness detected by static analysis is the “Risky Cryptography Algorithm”. Over 90% of apps use at least one of the deprecated cryptographic algorithms, e.g. MD5 and SHA-1. For instance, in the app MySejahtera (Malaysia), the parameters in WebSocket requests are combined and hashed with MD5, and the result is compared with the content from requests in the class Draft_76 in order to verify the validity of connections. Although this has been listed in the OWASP [7] top 10 mobile risks of 2016, the results show that it is still a common security issue.

Figure 2: Code analysis results.

Table 5: Trackers identified in contact tracing apps.
Trackers                # Apps   Percentage
Google Firebase         25       71.4%
Google CrashLytics      6        17.1%
Other Google trackers   4        11.4%
Facebook trackers       3        8.6%
Other trackers          9        25.7%
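As an illustration of the “Risky Cryptography Algorithm” finding, the sketch below shows one way request parameters could be authenticated with a keyed SHA-256 digest instead of a bare MD5 hash. It is not taken from any of the vetted apps: the shared secret key, the function names, and the naive `k=v&...` canonical encoding are all assumptions made for the example.

```python
import hashlib
import hmac
from typing import Mapping


def sign_params(key: bytes, params: Mapping[str, str]) -> str:
    """Return a hex HMAC-SHA256 tag over the request parameters.

    Assumption: parameters are serialized as sorted `k=v` pairs joined
    by `&`; values containing `&` or `=` would need proper escaping.
    """
    canonical = "&".join(f"{name}={params[name]}" for name in sorted(params))
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_params(key: bytes, params: Mapping[str, str], tag: str) -> bool:
    """Check a tag in constant time to avoid leaking timing information."""
    return hmac.compare_digest(sign_params(key, params), tag)


# Hypothetical usage: the key would be provisioned out of band, not hard-coded.
demo_key = b"k" * 32
demo_tag = sign_params(demo_key, {"user": "alice", "nonce": "42"})
```

Unlike an unkeyed MD5 digest, the tag cannot be recomputed by a party that does not hold the key, and SHA-256 is not affected by the known collision attacks on MD5 and SHA-1.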
Another frequent weakness is “Clear Text Storage” (files may contain hard-coded sensitive information like usernames, passwords, keys etc.). In the class DataBaseSQL of the COVID-19 (Vietnam) app, the password of the SQLite database is stored in the source code without encryption; CG Covid-19 ePass (India) also hard-codes its encryption key in the class Security.

In total, 20 trackers have also been identified, including Google Firebase Analytics, Google CrashLytics, and Facebook Analytics. Approximately 75% of the apps contain at least one tracker. As shown in Table 5, the most frequent tracker is Google Firebase Analytics, which is identified in more than 70% of the apps. Notably, a research study [39] argues that TraceTogether’s use of Google’s Firebase service to store user information may leak users’ private data to third parties, such as Google. In the most extreme case, a contact tracing app, Contact Tracing (USA), contains 8 trackers.

Data flow analysis. Figure 3 presents the flow of data between sources and sinks. This is counted by the number of source-to-sink paths found in each app. The top sources of sensitive data are methods called from Location and database.Cursor. These may obtain sensitive information from a geographic location sensor or from a database query. Most of the sensitive data will be transferred to sinks, such as Bundle, Service, and OutputStream, which may leak sensitive information out of apps. As discussed previously, sending sensitive information to the Bundle object may reveal sensitive data to other activities. Besides, developers usually utilize Log to print debugging information into the Logcat [4] panel. However, developer errors can lead to sensitive data being printed by mistake. Notably, we discover that some apps transmit location information through SMS messages. Considering Hamagen (Israel) as an example, location information is detected and obtained by a source method initialize(Context,Location,e) and then flows to a sink method where Handler.sendMessage(Message) is called. This is a potential vulnerability as malware could easily intercept the outbox of the Android SMS service [15].

We also manually vet the FlowDroid results for false positives. In total, 60 out of 371 paths (16.17%) are false positives (results
DP3T. According to the static analysis, DP3T’s database is not encrypted, and data is saved in plain text. In contrast to TraceTogether, the app does not implement any root detection capabilities. This means that a malicious app could possibly access the database directly and manipulate the database containing COVID-19 contact records. Potentially, an adversary could spread false positives.

Security guidance 2: To protect the database from being dumped and prevent data breaches, a solution should:
(1) Implement database encryption [11] and
(2) Enable root detection [1] and confidential data protection [10] at application startup.

In addition, as the database records timestamps and contact IDs, the leakage of the database from a rooted device infected by mobile malware can be exploited by adversaries to mount linkage attacks [9]. If enough data in a region were collected by attackers, contact IDs and timestamps in the database could be compared to analyze movements, and device owners may be identified through a linkage attack [54].

Private Kit. Similar to DP3T, Private Kit does not encrypt the database, which contains plaintext data. Besides, the app creates temporary JSON files to store the user’s location data. Without any encryption or root detection, the temporary JSON files can be dumped from rooted devices, thus increasing the risk of privacy leakage.

Security guidance 3: To prevent potential data breaches, tracing records and confidential data must not be stored in temporary files in plain text.

Security guidance 4: To protect the system against false negatives caused by malfunction, thorough and comprehensive testing must be carried out. In particular, situations such as the mobile phone being locked or the app running in the background should be seriously considered.

Besides the aforementioned issues, in accordance with the report released on 14 May 2020 [46], several vulnerabilities, such as CVE-2020-12857 and CVE-2020-12858, have been fixed. In CVE-2020-12857, the COVIDSafe app improperly caches GATT characteristic values, i.e. TempID, for a long time until a successful transaction takes place, instead of clearing the values periodically. As the data could be read by a remote device, if an attacker never completes the transaction, they will always obtain the same TempID from a user, which may enable the long-term tracking of the user. However, this issue has been fixed by removing the cache entry when a device is disconnected. The root cause of CVE-2020-12858 is the generation and use of an unchanging advertising payload, which means that an attacker is able to track a device by identifying its advertising payload. In the latest update, the payload will not be cached.

4 PRIVACY RISK ASSESSMENT
In this section, we describe the privacy analysis we conducted on the 10 selected contact tracing solutions in Table 1 to assess their protection against potential privacy breaches under our threat model.
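The caching problem behind CVE-2020-12857, discussed above, comes down to a per-device cache whose entries never expire. The sketch below illustrates the two mitigations the text mentions, periodic clearing and removal on disconnect. It is a language-neutral illustration in Python, not COVIDSafe's Android code; the class name, the time-to-live parameter, and the device keys are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringCache:
    """Per-device cache whose entries expire after `ttl_seconds`."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so that expiry can be tested
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def put(self, device: str, value: bytes) -> None:
        """Cache a value for a remote device, stamped with the current time."""
        self._entries[device] = (self._clock(), value)

    def get(self, device: str) -> Optional[bytes]:
        """Return the cached value, or None if it is missing or too old."""
        entry = self._entries.get(device)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[device]  # stale: drop instead of re-serving it
            return None
        return value

    def on_disconnect(self, device: str) -> None:
        """Drop the entry as soon as the remote device disconnects."""
        self._entries.pop(device, None)
```

With either mitigation in place, a remote device that never completes a transaction can no longer read the same cached identifier indefinitely.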
scope of their capabilities. They may utilize some devices, such as a Bluetooth broadcaster or receiver, to attack the system or gain extra information. They may also modify the app and impersonate a legitimate user to access the system, which is difficult to prevent unless remote attestation is applied.

4.2 Potential Attacks
As discussed previously, the privacy of users is hard to preserve in a contact tracing system. To introduce potential privacy risks, we will let Alice be an at-risk user, and let Bob be a diagnosed user who has been in contact with Alice. Mallory will be a malicious attacker, and Grace will be the government server (or other authority). Here we discuss four potential attacks. According to our threat model in Section 4.1, if an attacker is not able to re-identify a user or inject fake reports into a contact tracing system through a specific privacy attack, we consider the system well protected against such an attack; otherwise, the system is considered at-risk. The vetting results are summarized in Table 7.

Linkage attacks by servers. In centralized systems, the major privacy concern is metadata leakage by the server. For example, in the Coronavirus Disease-19 website, TraceTogether and COVIDSafe, a central server is used to collect PII and to evaluate at-risk individuals. Consequently, Grace will be able to collect a large amount of PII, such as names, phone numbers, contact lists, post codes, home addresses, and location trails. Therefore Grace is able to deduce the social connections of Alice. Even for PEPP-PT, a centralized Bluetooth system with solutions to avoid PII collection, re-identification risks still exist. For example, from the server side, Grace is able to link ephemeral IDs to the corresponding permanent app identifier and thus trace Alice based on IDs observed in the past, as well as tracing future movements. Thus, no centralized solution in Table 1 can prevent linkage attacks by the server.

In contrast, for decentralized Bluetooth solutions, Alice’s privacy is protected as her PII will not be sent to a central server by a diagnosed user and her health status is evaluated on her own device. Thus, decentralized Bluetooth systems are able to protect users’ privacy against linkage attacks by the server. However, in location-based decentralized systems, e.g. Hamagen, the server learns users’ location trails.

Privacy guidance 1: To protect users’ privacy against linkage attacks by a server, a contact tracing solution should:
(1) Avoid sharing PII with central points or
(2) Implement a decentralized design.

Linkage attacks by users. Linkage attacks, performed by Mallory, try to re-identify Alice or Bob. As discussed previously, in contact tracing systems that directly publish users’ PII, e.g. Coronavirus Disease-19, Hamagen, and Health Code, Bob is at risk of privacy leakage. For example, in Coronavirus Disease-19, Mallory could re-identify Bob from his gender, age, and location history, all of which are public information.

The other 7 systems listed in Table 1 further rely on information exchange between users. In a number of the apps, e.g. DP3T, which implement an ephemeral ID design, Mallory is still able to identify Bob using more advanced attacks. For example, if Mallory places a Bluetooth receiver near Bob’s home or workplace and ensures that the device will only receive Bluetooth broadcasts from Bob, then once Bob is diagnosed, Mallory will receive an at-risk alarm and immediately know that the infected patient is Bob. In addition, Mallory can log the timestamp and the received ephemeral ID when in contact with Bob. Once Bob is diagnosed, Mallory is able to trace back the source of the recording and re-identify Bob and potentially infected users. Similar attacks were described as the Paparazzi Attack and the Nerd Attack in an analysis of DP3T [59]. Note that Mallory is able to extend such attacks to Sybil attacks to enable the identification and the tracing back of multiple targets at the same time. Even worse, if Mallory distributes multiple broadcast receivers, which could also be considered a Sybil attack, over a large area in some layout, e.g. a honeycomb, they could even trace the movement of Bob by tracing the records on each device. Thus, none of the 10 typical solutions can fully protect users’ privacy against linkage attacks by Mallory.

Privacy guidance 2: To protect users’ privacy against linkage attacks by an adversary, a solution should:
(1) Avoid data sharing between users or
(2) Ensure privacy protections exist for any published data.

False positive claims. In some systems, such as Coronavirus Australia, Bob can register as infected and upload data through the contact tracing app to the server, which enables Alice to receive an at-risk alarm. However, if Mallory exploits such a mechanism and registers as a (fake) infected user, Alice will receive a false-positive at-risk alarm, which may cause social panic or negatively impact evidence-driven public health policies. Most solutions mitigate this issue by implementing an authorization process, i.e. Bob is only permitted to upload data after receiving a one-time-use permission code generated by the server. Without the permission code, Mallory is not allowed to claim they are infected and Alice will always receive a true at-risk alarm. Only two solutions, i.e. DP3T and PACT [19], have no authorization process implemented.

Privacy guidance 3: To protect a system against false-positive-claim attacks, a solution should establish an authorization process.

Relay attacks.⁴ To apply such an attack, Mallory could collect existing broadcast messages exchanged between users, then replay them at another time or forward them through proxy devices to a remote location and replay the messages there. Due to the lack of message validation in solutions that utilize information broadcasts, a user will not be able to determine whether a received broadcast is from a valid source or from a malicious device. Any received broadcast will be recorded as a contact event, even though no actual contact exists. A malicious attacker can potentially redirect all the traffic from one place to another, resulting in a targeted area being incorrectly locked-down by replaying fake information.

⁴ The combination of man-in-the-middle and replay attacks is henceforth referred to as a relay attack.

For example, suppose Mallory records the broadcasts from Bob and then replays them hours later, or transmits them to a remote location and replays the messages to Alice. Alice’s device, although not actually being in contact with Bob, will receive and record the replayed broadcast in its local log. Once Bob is diagnosed, Alice will
receive an at-risk alarm even though she has never been in contact with Bob. Such an attack will falsely enlarge the contact range of Bob and create a large number of false-positive alarms, which may cause panic among citizens. Solutions that do not utilize information broadcasts, such as Coronavirus Disease-19 and Hamagen, can avoid relay attacks.

In DP3T, a solution is provided to limit the replay attack by including temporal information in the broadcast ephemeral identifiers. However, it cannot effectively prevent replay attacks occurring at the same moment. Another promising solution is to use an ambient physical sensing approach, e.g. ambient audio. This has been shown to secure proximity detection [28, 56] by comparing the ambient information embedded in the broadcast messages with the local ambient information. It allows a receiver to validate whether the source is nearby, as the range of Bluetooth broadcast is generally within 50 m.

Privacy guidance 4: To protect a system against relay attacks, a solution should:
(1) Either avoid utilizing information broadcast or
(2) Implement a validation approach.

Table 7: Privacy protections against the attacks.
Attacks (columns): Linkage-Server, Linkage-User, False-Claim, Relay
Solutions (rows): CovidSafe, Health Code, PEPP-PT, DP3T, HaMagen, TraceTogether, Coronavirus Disease-19, Covid Watch, Private Kit, PACT
Legend: each cell marks the system as either well protected or at-risk.

5 OUR RECOMMENDATIONS
As discussed in Section 4, a contact tracing application should preserve the privacy of generic and at-risk users. Although diagnosed users may reveal their private information to health authorities, this data should not be released to the public. Furthermore, we argue that a contact tracing solution should focus on tracing anonymous daily routines or occasional contacts, instead of close contacts.

Here we propose a venue-accessing-based solution, VenueTrace, to overcome privacy risks. We will first describe the framework of this solution and assess its privacy performance as well as its limitations.

5.1 Our Privacy-by-Design: VenueTrace
The architecture of VenueTrace contact tracing consists of the following modules, as described in Figure 4.

Bluetooth broadcaster in public places. Instead of broadcasting from each user’s mobile phone, VenueTrace proposes utilizing Bluetooth broadcast devices installed in public places, e.g. restaurants, movie theaters, working places, and public transport hubs and stops. This is in contrast to most existing Bluetooth solutions that rely on human-to-human contact. To facilitate contact tracing, at setup, each broadcast device will register its MAC address with a back-end server and get a unique VenueID, which will be broadcast through Bluetooth at every time interval T. Hence, users are not required to broadcast information.

Applications installed in the user’s phone. For every VenueID broadcast by a device in its proximity at time t, after receiving the broadcast VenueID, a pair (ID, t) will be created and a timer will be started in the user application. If the user receives the same VenueID again after T, the user stores a tuple which satisfies the following:

$$(ID, t_{start}, t_{end}), \quad \text{where } t_{end} - t_{start} \geq T, \tag{1}$$

where $t_{start}$ is the first timestamp at which the broadcast VenueID was received, kept in local storage for 14 days, and $t_{end}$ is the last timestamp of a period over which the user continuously received the VenueID. For example, if T is set to 10 minutes and Bob stays in a public place for more than 30 minutes, he will receive the broadcast at least three times, e.g. at $t_1$, $t_2$, and $t_3$. The application will record the tuple $(ID, t_1, t_3)$ in the local log, where $t_3 - t_1 = 20$ minutes, indicating that he stayed in a public place for at least 20 minutes.

Once Bob is diagnosed, with his consent, he will receive a permission code from the health authority; Bob can then upload his log $(ID_{Bob}, ts_{Bob}, te_{Bob})$ and the permission code to the back-end server. To ensure the security of data transmission, the data will be encrypted with the public key of the back-end server. Every twenty-four hours, Alice will download logs in the format $(ID_{Server}, ts_{Server}, te_{Server})$ from the back-end server, and record matching and evaluation will be conducted locally. A local record, i.e. $(ID_{Alice}, ts_{Alice}, te_{Alice})$, is considered matched if the following conditions are true:

$$ID_{Alice} = ID_{Server}, \qquad (ts_{Alice}, te_{Alice}) \subseteq (ts_{Server}, te_{Server}), \tag{2}$$

which indicates that Alice has been in a public place during the period $(ts_{Server}, te_{Server})$ and may have been in an at-risk environment. If there is a match in Alice’s local log, Alice’s device will generate an at-risk alarm.

Decentralized back-end server. The back-end server supports the activities of (i) registering the Bluetooth broadcaster by storing its MAC addresses and VenueIDs; (ii) generating and authorizing permission codes for health authorities; (iii) publishing the public key and deciphering the data uploaded by diagnosed users with its private key; (iv) validating the received VenueIDs and permission code; and (v) publishing at-risk information to regular users in the format of tuples $(ID_{Server}, ts_{Server}, te_{Server})$,

$$\begin{aligned} ID_{Server} &= ID_{Bob}, \\ ts_{Server} &= ts_{Bob} - T - random, \\ te_{Server} &= te_{Bob} + \delta t + random', \end{aligned} \tag{3}$$

where the at-risk time interval is extended based on Bob’s record by T at the lower limit and δt at the upper limit. A typical δt could be set to 12 hours to ensure that Alice will be informed that she has been in an at-risk venue where she may have touched a virus-contaminated surface or inhaled airborne droplets, even though she has never been in close physical contact with Bob. We could also further extend the visiting duration with random noise to blur the timeline. For example, if Bob visited a public place from 9 am to 10 am,
the released infected duration could be from 8:30 am to 11:15 pm, where T = 5 mins, random = 25 mins, δt = 12 hours, and random′ = 75 mins. Considering the time-related functionality of a public place, this duration could be further capped.

5.2 Defending Against Attacks
A major flaw in many implemented and proposed applications is that they cannot fully guarantee user privacy when using user-provided data. Of the applications discussed in this paper, this issue is particularly prevalent with centralized solutions, e.g. TraceTogether, as they may suffer from linkage attacks not only by users but also by the server. However, any application that requires symptom reporting from its user base could potentially be vulnerable, as discussed in Section 4.

Decentralized computation. Compared to centralized systems, our solution has the inherent advantage of decentralized systems, that is, users’ privacy is not exposed to the server. The back-end server will only receive the timestamp and VenueID of a public place that is visited by diagnosed users. Supposing a malicious attacker successfully extracts data from the back-end server, he is still not able to link the information to any location or user, as there is no location information stored in the server. Thus, user privacy is protected against the linkage attack by a server.

Coarse-grained location. Furthermore, in contrast to location-based solutions (e.g. Hamagen), our solution does not utilize GPS information, as Bluetooth is more advantageous than GPS signals in high-risk indoor environments. However, considering the extension and blurring in timelines, the back-end server will only receive and publish venue IDs with at most coarse-grained location information. In addition, our solution overcomes the limitation of location-based tracing by installing the broadcaster in public venues and transports, in contrast to user devices.

No token exposure. Many Bluetooth-based decentralized systems provide a privacy-preserving solution by only sharing temporary tokens between users. However, such systems are still amenable to linkage attacks by users. Compared to other decentralized solutions, our solution further preserves user privacy as no information is exchanged between users. Consequently, our system is immune to linkage attacks by users. In the worst case, the attackers may physically visit public places and record the venue IDs, and then link venue IDs to locations. After an at-risk alarm, the attackers may perceive where a diagnosed user appeared. However, the attackers still do not have enough information to re-identify the infected individuals, unless they log all persons having appeared in multiple public places for a long period and are able to infer the persons matching the timeline. Thus, without information shared between users, the linkage attack by users and real-time movement tracking are impossible.

False positives. Aside from real people misreporting their symptoms, applications that rely on diagnosed users’ information are at risk of malicious false-positive claims. For location-based applications, these attacks are very effective. For a GPS-based system, an attacker could spoof a series of GPS coordinates to an app. If no authentication measures are implemented, uploading such spoofed GPS data to the server can cause potential havoc across the system. This scenario, in retrospect, underlines the importance of the permission code in our recommendation. A user is allowed to register as infected only after being authorized with a permission code, which prevents false-positive claims.

Further considerations. To protect users’ privacy against relay attacks, the VenueTrace solution can be updated by including blurred location information in broadcasting. As a relay attack will replay a VenueID in another remote place, thereby expanding the broadcast range and causing false-positive alarms, we can distinguish fake broadcasts by including the location of broadcasters in the broadcast message. When the user receives a broadcast, the application can parse out the location of the broadcaster and compare it with the location of the receiver to filter out those beyond a given distance, e.g. greater than 1 km. This allows eliminating such replay forgeries. As the original intention of the VenueTrace solution is to prevent the back-end server from obtaining locations of broadcasters and users, we can only add vague location information to the broadcast information, such as adding a 1 km error, to prevent the relay attack while preserving users’ privacy. However, this, to some extent, weakens the location privacy protections of our solution.
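The blurred-location check described under Further considerations can be illustrated with a minimal sketch. It assumes that the broadcast carries a (latitude, longitude) pair and that the receiver knows its own position; the function names, the haversine distance, and the way the 1 km blur is added as slack on top of the 1 km threshold are illustrative assumptions, not part of the VenueTrace design.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_plausible_broadcast(broadcast_loc, receiver_loc,
                           max_distance_km=1.0, blur_km=1.0):
    """Accept a broadcast only if its blurred venue location is near the receiver.

    max_distance_km is the filtering distance (1 km in the text's example).
    blur_km is slack for the deliberate location error (assumed 1 km here),
    so that genuine broadcasts are not rejected because of the blurring.
    """
    distance = haversine_km(*broadcast_loc, *receiver_loc)
    return distance <= max_distance_km + blur_km


# Hypothetical coordinates: a venue, a nearby receiver, and a remote receiver
# that would only see the VenueID through a relayed broadcast.
venue = (-34.9285, 138.6007)
nearby_receiver = (-34.9335, 138.6007)   # roughly 0.5 km away
remote_receiver = (-33.8688, 151.2093)   # over a thousand km away

print(is_plausible_broadcast(venue, nearby_receiver))
print(is_plausible_broadcast(venue, remote_receiver))
```

A larger blur protects venue locations better but also widens the area in which a relayed broadcast would still be accepted, which reflects the trade-off noted above.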
[21] Hyunghoon Cho, Daphne Ippolito, and Yun William Yu. 2020. Contact tracing mobile apps for COVID-19: Privacy considerations and related trade-offs. arXiv preprint arXiv:2003.11511 (2020).
[22] Chris Culnane and Kobi Leins. 2019. Misconceptions in privacy protection and regulation. Law in Context. A Socio-legal Journal 36, 2 (2019), 1–12.
[23] Australia Department of Health. 2020. Coronavirus Australia app. https://www.health.gov.au/resources/apps-and-tools/coronavirus-australia-app Accessed: 2020-04-23.
[24] Australia Department of Health. 2020. COVIDSafe. https://www.health.gov.au/resources/apps-and-tools/covidsafe-app Accessed: 2020-04-23.
[25] Luca Ferretti, Chris Wymant, Michelle Kendall, Lele Zhao, Anel Nurtay, Lucie Abeler-Dörner, Michael Parker, David Bonsall, and Christophe Fraser. 2020. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science (2020).
[26] Tara Gould, Gage Mele, Parthiban Rajendran, and Rory Gould. [n.d.]. Anomali threat research identifies fake COVID-19 contact tracing apps used to download malware that monitors devices, steals personal data. https://www.anomali.com/blog/anomali-threat-research-identifies-fake-covid-19-contact-tracing-apps-used-to-monitor-devices-steal-personal-data
[27] Yaron Gvili. 2020. Security analysis of the covid-19 contact tracing specifications by apple inc. and google inc. Technical Report. Cryptology ePrint Archive, Report 2020/428.
[28] Tzipora Halevi, Di Ma, Nitesh Saxena, and Tuo Xiang. 2012. Secure proximity detection for NFC devices based on ambient sensor data. In European Symposium on Research in Computer Security. Springer, 379–396.
[29] Zhiyun Qian Hang Zhang, Dongdong She. 2015. Android Root and its Providers: A Double-Edged Sword. (2015). https://dl.acm.org/doi/abs/10.1145/2810103.2813714
[30] Ren He, Haoyu Wang, Pengcheng Xia, Liu Wang, Yuanchun Li, Lei Wu, Yajin Zhou, Xiapu Luo, Yao Guo, and Guoai Xu. 2020. Beyond the virus: A first look at Coronavirus-themed mobile malware. arXiv preprint arXiv:2005.14619 (2020).
[31] Yangyu Hu, Haoyu Wang, Li Li, Yao Guo, Guoai Xu, and Ren He. 2019. Want to earn a few extra bucks? A first look at money-making apps. In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 332–343.
[32] Apple Inc and Google Inc. [n.d.]. Exposure Notification-Bluetooth Specification. https://www.blog.google/documents/58/Contact_Tracing_-_Bluetooth_Specification_v1.1_RYGZbKW.pdf
[33] Apple Inc and Google Inc. [n.d.]. Exposure Notification-Cryptography Specification. https://www.blog.google/documents/56/Contact_Tracing_-_Cryptography_Specification.pdf
[34] Eleanor McMurtry Jim Mussared. [n.d.]. The COVIDSafe App - 4 week update. https://docs.google.com/document/d/17sVyBIG5CqhF9XtuEfeG2MfYsFNXuV4yxp3BERDTJoI/preview#
[35] Meggin Kearney. [n.d.]. Remote Debugging WebViews. https://developers.google.com/web/tools/chrome-devtools/remote-debugging/webviews
[36] Min Joo Kim and Simon Denyer. [n.d.]. A ‘travel log’ of the times in South Korea: Mapping the movements of coronavirus carriers. https://www.washingtonpost.com/world/asia_pacific/coronavirus-south-korea-tracking-apps/2020/03/13/2bed568e-5fac-11ea-ac50-18701e14e06d_story.html
[37] Siddique Latif, Muhammad Usman, Sanaullah Manzoor, Waleed Iqbal, Junaid Qadir, Gareth Tyson, Ignacio Castro, Adeel Razi, Maged N. Kamel Boulos, Adrian Weller, and Jon Crowcroft. 2020. Leveraging Data Science To Combat COVID-19: A Comprehensive Review. (4 2020). https://doi.org/10.36227/techrxiv.12212516.v1
[38] Kobi Leins, Chris Culnane, and Benjamin IP Rubinstein. 2020. Tracking, tracing, trust: Contemplating mitigating the impact of COVID-19 through technological interventions. The Medical Journal of Australia (2020), 1.
[39] Douglas J Leith and Stephen Farrell. 2020. Coronavirus Contact Tracing App Privacy: What Data Is Shared By The Singapore OpenTrace App? (2020).
[40] Jinfeng Li and Xinyi Guo. 2020. COVID-19 Contact-tracing Apps: A Survey on the Global Deployment and Challenges. arXiv preprint arXiv:2005.03599 (2020).
[41] Tianshi Li, Cori Faklaris, Jennifer King, Yuvraj Agarwal, Laura Dabbish, Jason I Hong, et al. 2020. Decentralized is not risk-free: Understanding public perceptions of privacy-utility trade-offs in COVID-19 contact-tracing apps. arXiv preprint arXiv:2005.11957 (2020).
[42] Joseph K Liu, Man Ho Au, Tsz Hon Yuen, Cong Zuo, Jiawei Wang, Amin Sakzad, Xiapu Luo, and Li Li. [n.d.]. Privacy-Preserving COVID-19 contact tracing app: a zero-knowledge proof approach. ([n. d.]).
[43] Tianming Liu, Haoyu Wang, Li Li, Xiapu Luo, Feng Dong, Yao Guo, Liu Wang, Tegawendé Bissyandé, and Jacques Klein. 2020. MadDroid: Characterizing and Detecting Devious Ad Contents for Android Apps. In Proceedings of The Web Conference 2020. 1715–1726.
[44] Caio Milani. [n.d.]. Understanding Multidex in Android. https://blog.mindorks.com/understanding-multidex-in-android
[45] Israel Ministry of Health. 2020. Hamagen. https://govextra.gov.il/ministry-of-health/hamagen-app/ Accessed: 2020-04-23.
[46] Jim Mussared. [n.d.]. Privacy issues discovered in the BLE implementation of the COVIDSafe Android app. https://docs.google.com/document/d/1u5a5ersKBH6eG362atALrzuXo3zuZ70qrGomWVEC27U/edit#heading=h.q4sovraiy4kn
[47] Richard Nelson. [n.d.]. COVIDSafe iOS 1.5 bugs. https://docs.google.com/document/d/1dsSxC48cJ91X17PoOybpun1U163YDxxL0CDk3kmAHvY/mobilebasic
[48] World Health Organization. 2020. Operational considerations for case management of COVID-19 in health facility and community. https://apps.who.int/iris/bitstream/handle/10665/331492/WHO-2019-nCoV-HCF_operations-2020.1-eng.pdf
[49] Charlie Osborne. [n.d.]. New ransomware masquerades as COVID-19 contact-tracing app on your Android device. https://www.zdnet.com/article/new-crycryptor-ransomware-masquerades-as-covid-19-contact-tracing-app-on-your-device/
[50] Raymond Zhong Paul Mozur and Aaron Krolik. 2020. In Coronavirus Fight, China Gives Citizens a Color Code, With Red Flags. https://www.nytimes.com/2020/03/01/business/china-coronavirus-surveillance.html Accessed: 2020-05-07.
[51] Iasonas Polakis, George Argyros, Theofilos Petsios, Suphannee Sivakorn, and Angelos D Keromytis. 2015. Where’s Wally? Precise user discovery attacks in location proximity services. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. 817–828.
[52] Ramesh Raskar, Isabel Schunemann, Rachel Barbar, Kristen Vilcans, Jim Gray, Praneeth Vepakomma, Suraj Kapa, Andrea Nuzzo, Rajiv Gupta, Alex Berke, et al. 2020. Apps gone rogue: Maintaining personal privacy in an epidemic. arXiv preprint arXiv:2003.08567 (2020).
[53] Siegfried Rasthofer, Steven Arzt, and Eric Bodden. 2014. A machine-learning approach for classifying and categorizing Android sources and sinks. In NDSS, Vol. 14. Citeseer, 1125.
[54] Yilin Shen, Fengjiao Wang, and Hongxia Jin. 2014. Defending against user identity linkage attack across multiple online social networks. In Proceedings of the 23rd International Conference on World Wide Web. 375–376.
[55] Milan Stute, Sashank Narain, Alex Mariotto, Alexander Heinrich, David Kreitschmann, Guevara Noubir, and Matthias Hollick. 2019. A billion open interfaces for Eve and Mallory: MitM, DoS, and tracking attacks on iOS and macOS through Apple Wireless Direct Link. In 28th USENIX Security Symposium (USENIX Security 19). 37–54.
[56] Yvonne Taunton. 2020. Smartphone-based automated contact tracing: Is it possible to balance privacy, accuracy and security? https://www.uab.edu/news/research/item/11299-smartphone-based-automated-contact-tracing-is-it-possible-to-balance-privacy-accuracy-and-security Accessed: 2020-05-07.
[57] The PEPP-PT team. 2020. PEPP-PT High Level Overview. https://github.com/pepp-pt/pepp-pt-documentation/blob/master/PEPP-PT-high-level-overview.pdf
[58] Carmela Troncoso, Mathias Payer, Jean-Pierre Hubaux, Marcel Salathé, James Larus, Edouard Bugnion, Wouter Lueks, Theresa Stadler, Apostolos Pyrgelis, Daniele Antonioli, et al. 2020. Decentralized privacy-preserving proximity tracing. arXiv preprint arXiv:2005.12273 (2020).
[59] Serge Vaudenay. 2020. Analysis of DP3T Between Scylla and Charybdis. (2020). https://eprint.iacr.org/2020/399.pdf
[60] Sydney Von Arx and Daniel Blank. 2020. Slowing the Spread of Infectious Diseases Using Crowdsourced Data. Covid Watch (2020).
[61] Gang Wang, Bolun Wang, Tianyi Wang, Ana Nika, Haitao Zheng, and Ben Y Zhao. 2016. Defending against Sybil devices in crowdsourced mapping services. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services. 179–191.
[62] Minhui Xue, Cameron Ballard, Kelvin Liu, Carson Nemelka, Yanqiu Wu, Keith Ross, and Haifeng Qian. 2016. You can yak but you can’t hide: Localizing anonymous social network users. In Proceedings of the 2016 Internet Measurement Conference. 25–31.
[63] Minhui Xue, Yong Liu, Keith W Ross, and Haifeng Qian. 2016. Thwarting location privacy protection in location-based social discovery services. Security and Communication Networks 9, 11 (2016), 1496–1508.