
Vetting Security and Privacy of Global COVID-19 Contact Tracing Applications

This document analyzes the security and privacy of 34 COVID-19 contact tracing applications from around the world. The analysis finds that around 70% of the apps have potential security risks, such as using insecure cryptography or storing sensitive data without encryption. Over 60% have vulnerabilities through excessive permissions. Additionally, around 75% of the apps contain trackers that could leak private user data to third parties. The document proposes a new venue-access based contact tracing solution called VenueTrace that aims to preserve privacy while still enabling proximity tracing.


Vetting Security and Privacy of Global COVID-19 Contact Tracing Applications

Ruoxi Sun, The University of Adelaide, Australia
Wei Wang, The University of Adelaide, Australia
Minhui Xue, The University of Adelaide, Australia
Gareth Tyson, Queen Mary University of London, United Kingdom
Seyit Camtepe, CSIRO-Data61, Australia
Damith Ranasinghe, The University of Adelaide, Australia

arXiv:2006.10933v3 [cs.CR] 22 Jul 2020

ABSTRACT

The rapid spread of COVID-19 has made traditional manual contact tracing, which identifies persons who have been in close physical proximity to a known infected person, challenging. Hence, a number of public health authorities have experimented with automated contact tracing apps. While the global deployment of contact tracing apps aims to protect the health of citizens, these apps have raised security and privacy concerns. In this paper, we assess the security and privacy of 34 exemplar contact tracing apps using three methodologies: (i) evaluate the design paradigms and the privacy protections provided; (ii) static analysis to discover potential vulnerabilities and data flows to identify potential leaks of private data; and (iii) evaluate the robustness of privacy protection approaches. Based on the results, we propose a venue-access-based contact tracing solution, VenueTrace, which preserves user privacy while enabling proximity contact tracing. We hope that our systematic assessment results and concrete recommendations can contribute to the development and deployment of applications against COVID-19 and help governments and the application development industry build secure and privacy-preserving contact tracing applications.

1 INTRODUCTION

COVID-19 is now a global pandemic affecting over 200 countries, after its first recorded outbreak in China in December 2019. To counter its spread, numerous measures have been undertaken by public health authorities, e.g. quarantining of people, lock-downs, curfews, physical distancing, and mandatory use of face masks. Identifying those who have been in close contact with infected individuals, followed by self-isolation (so-called contact tracing), has proven particularly effective [48]. Consequently, contact tracing has emerged as a key tool to mitigate the spread. However, manual contact tracing, using an army of “detectives”, is not trivial and has proven challenging for many countries, e.g. UK and Italy. Notably, it is difficult due to the rapid and exponential growth patterns of the virus and the increased demands on qualified human resources. After 5 months of the pandemic, the number of daily new cases has increased more than 80-fold.¹ Thus, in many countries it has become extremely difficult to perform manual contact tracing [20, 25, 37].

¹ 1,354 confirmed per day in the first 30 days from 11 January to 10 February 2020 and 109,615 confirmed per day in the recent 30 days from 13 May to 12 June 2020, using data released by the World Health Organization (WHO).

Government authorities around the world, together with industry, have sought to address the challenge by developing contact tracing applications and services. A plethora of apps and services are currently deployed around the globe. These include the Health Code in China [50], the public COVID-19 website in South Korea [14], and the mobile contact tracing apps released in Singapore [17], Israel [45], and Australia [23, 24]. Contact tracing apps operate by recording prolonged and close proximity interactions between individuals by using proximity sensing methods, e.g. Bluetooth. The data gathered allows notifications to be generated to inform persons of a potential exposure to the virus.

Proponents argue that the low cost and scalable nature of contact tracing apps make them an attractive option for health authorities. Despite this, contact tracing apps are not universally popular, with a number of prominent critics. They have proven particularly controversial due to potential violations of privacy [38], and security consequences from the mass-scale installation of (rapidly developed) apps across entire populations. Despite attempts to alleviate these concerns by both governments and industry, it is well known that the anonymization of individual information is a challenging problem [22]. This study, to the best of our knowledge, performs the first security and privacy vetting of contact tracing apps. We describe our key contributions below:

• We assess the security and privacy of 34 worldwide Android contact tracing applications, listed in Table 3. We discover that about 70% of the apps pose potential security risks due to: (i) employing cryptographic algorithms that are insecure or not part of best practice; and (ii) storing sensitive information in clear text that could potentially be read by attackers. Over 60% of apps pose vulnerabilities through Manifest weaknesses, e.g. allowing permissions for backup (hence, the copying of potentially unencrypted application data). Further, we identify that approximately 75% of the apps contain at least one tracker, potentially causing serious privacy leakage, i.e. data leaks that expose private information to third parties. To facilitate further research, we will publicly release the dataset, the scripts developed for analysis, and security assessment reports in due course.

• We analyze user privacy exposure and privacy protections provided by 10 solutions, covering 3 different frameworks (PACT [19], Covid Watch [60], and PEPP-PT [57]), the Coronavirus Disease-19 website [14], and 6 applications, from 7 countries around the world. We establish a threat model and analyze the vulnerabilities of the apps to multiple privacy attacks. The results demonstrate that there is no solution that is able to protect users’ privacy against all of the attacks investigated. The replay attack, in which a malicious user
can replay valid identifiers to redirect all the traffic from one place to another to virtually or digitally alter the footprint of the contact, could result in the targeted area being incorrectly locked down due to false information. Generally, Bluetooth-based decentralized solutions that avoid direct location tracking outperform centralized systems.

• We synthesize the findings from our extensive COVID-19 contact tracing app vetting exercises to: (i) provide best practice security and privacy guidance to governments and app industry; and (ii) recommend a novel decentralized venue-accessing-based contact tracing approach, termed VenueTrace, to overcome potential privacy issues highlighted in the state-of-the-practice solutions. Our VenueTrace proposal has the capability to significantly increase the privacy protections for citizens whilst being securely implemented.

We disclosed our findings and detailed security and privacy risk reports to the related stakeholders on 23 May 2020, at 11 am, UTC. We have received acknowledgements from numerous vendors, such as MySejahtera (Malaysia), Pakistan’s National Action Plan for COVID-19 (Pakistan), Contact Tracer (USA), and Private Kit (USA). We believe our study can provide useful insights for governments, developers and researchers in the software industry to develop secure and privacy-preserving contact tracing apps. We hope the results and the proposed contact tracing approach will contribute to increasing the trustworthiness of solutions to contain infectious diseases now and in the future.

2 CONTACT TRACING APPLICATIONS

A range of contact tracing applications (or “apps”) are used worldwide. Given the large number of contact tracing apps and proposed solution frameworks, we survey a representative sample to more broadly study the architectures employed, design paradigms and the privacy exposure of the user groups we identify in Section 2.1.

2.1 User Groups and Privacy Exposure

We define model user groups and privacy exposure levels to aid our investigations into app architectures (in Section 2.2) and privacy vetting (in Section 4). We envisage three groups of contact tracing app users, based on their health status. We describe the three user groups below and analyze their privacy exposure in Section 2.2.

• Generic user. A typical user of the contact tracing system, who is healthy or has not been diagnosed yet.
• At-risk user. Alice, who has recently been in contact with an infected user, Bob. Ideally, Alice will receive an at-risk alarm from her application.
• Diagnosed user. As a diagnosed patient, Bob will be asked to reveal his private information as well as the information of at-risk users to the health authorities, e.g. the diagnosis of his infection, his movement history, the persons he has been in contact with.

We define user group exposures in different apps using five levels:

• Level I: “No data is shared with a server or users”, the most secure level in which there is no user data shared.
• Level II: “Tokens are shared with proximity users”, a medium exposure level with only tokens containing no Personally Identifiable Information (PII) exchanged between users.
• Level III: “Tokens are shared with the server”, a medium exposure level with tokens exposed to the server.
• Level IV: “PII is shared with a server”, a high risk exposure level in which the users’ PII is shared with the server.
• Level V: “PII is published to public”, the highest risk exposure level.

2.2 Analysis of Design Paradigms and User Privacy Exposure

We select 10 well known contact tracing solutions, including both current apps and proposed frameworks. Table 1 presents an overview of the 10 selected solutions. Four of the selected solutions are developments of apps used from the early stages of the pandemic (supported by governments such as China, South Korea, Singapore, and Israel). The one service and a proposed framework from Europe were selected because they are the first solutions that enable anonymous identifier exchange. We have also selected three solutions from North America that are about to be or have already been deployed, and one app deployment from Oceania. As summarized in Table 2, all 10 solutions have user privacy exposure to some extent. We next discuss them in the context of the two broad categories of: (i) centralized architectures; and (ii) decentralized/distributed architectures.

Centralized solutions. Many solutions utilize a centralized system in which the central server is responsible for: (i) collecting the contact records from diagnosed users; and (ii) health status evaluation for users and at-risk user determination. In some East Asian countries, e.g. China and South Korea, where the outbreaks first occurred, contact tracing systems were quickly developed and released. The systems helped health authorities to successfully control the spread of COVID-19, but a huge amount of Personally Identifiable Information (PII) was collected.

In South Korea, the Coronavirus Disease-19 website [14] (#7 in Table 1) is supported by the Ministry of Health and Welfare. Although Alice’s privacy is protected as no data is required from her, the system publishes Bob’s information to the public (marked as Level V in Table 2). The information exposed includes gender, nationality, age, diagnosis date, hospital, and movement history (removed in the latest version). This directly puts Bob at risk of being re-identified, raising serious privacy concerns. For example, as reported by The Washington Post [36], in Cheonan, a city in South Korea, a text alert to residents showed that an infected person visited “Imperial Foot Massage at 13:46 on 24 February”.

In China, QR-code contact tracing apps were developed by the two tech companies, Alibaba and Tencent [50] (#2). The apps use a colour code to present the health condition of an individual: green implies people can travel freely, while yellow or red indicates they must report to the authorities. Users need to provide their name, national ID, and phone number to register and use the app to enter public places, e.g. metro stations, supermarkets, and airports. The apps are mandatory and jointly developed by government departments and supported by data from health and transport
authorities. In this solution, for all types of users, their private information will be shared with the central server. Thus, we evaluate the user exposure of Health Code as Level IV in Table 2. However, although such solutions may collect and expose more private information than other solutions we discuss later, a public perceptions survey [41] indicates that users in the USA prefer centralized systems that share diagnosed users’ recent locations in public venues.

Table 1: Representative state-of-the-practice solutions from seven countries and five continents.

ID  Country      Name                    Developer                              Private/State  Technique      Architecture
1   Australia    COVIDSafe               Australian Department of Health        State          Bluetooth      Centralized
2   China        Health Code             Alibaba                                Private        QR code        Centralized
3   Europe       PEPP-PT∗                International consortium               Private        Bluetooth      Centralized
4   Europe       DP3T                    International consortium               Private        Bluetooth      Decentralized
5   Israel       HaMagen                 Ministry of Health                     State          Location       Decentralized
6   Singapore    TraceTogether           Government Technology Agency           State          Bluetooth      Centralized
7   South Korea  Coronavirus Disease-19  Ministry of Health and Welfare         State          Location       Centralized
8   USA          Covid Watch∗            Stanford University                    Private        Bluetooth      Decentralized
9   USA          Private Kit             Massachusetts Institute of Technology  Private        GPS+Bluetooth  Decentralized
10  USA          PACT∗                   University of Washington               Private        Bluetooth      Decentralized
∗ Note: frameworks with no application implemented

Table 2: User exposure.

Solutions               Architecture
COVIDSafe               Centralized
Health Code             Centralized
PEPP-PT                 Centralized
DP3T                    Decentralized
HaMagen                 Decentralized
TraceTogether           Centralized
Coronavirus Disease-19  Centralized
Covid Watch             Decentralized
PACT                    Decentralized
Private Kit             Decentralized
[The per-solution exposure-level markers in the Generic, At-risk, and Diagnosed columns were graphical and are not recoverable from this copy.]
Level I: No data is shared with servers or users; Level II: Token shared with proximity users; Level III: Token shared with the server; Level IV: PII shared with the server; Level V: PII is published to public.

TraceTogether [17] from Singapore is the first solution that uses Bluetooth technology. Bluetooth-based solutions rely on proximity tracing via Bluetooth broadcasts from apps. As these occur exclusively between devices in proximity, these methods provide more scope for privacy-preserving computations compared to those that use GPS locations (Coronavirus Disease-19 and Hamagen [45]). In TraceTogether, proximity between two users is measured through the Bluetooth broadcast signals, and encrypted user information is stored on mobile devices. Once diagnosed, the user will be asked to upload their local on-device records to the Ministry of Health, which has the authority to decrypt the data and obtain the mobile numbers of the user’s close contacts within a period of time (e.g. 21 days) that covers the incubation period of the virus. Such a centralized BLE-based solution preserves more personal privacy as the data exchanged between users is not related to absolute location information. COVIDSafe [24] from Australia also utilizes a similar technique. However, considering that PII, e.g. phone numbers, is collected by the government, the at-risk and diagnosed users’ PII is exposed to the central server (Level IV in Table 2), while the exposure of generic users remains at Level II as the tokens are only shared between users.

The solutions that expose PII may not work well in countries with different societal norms. Thus, many western countries developed solutions with no PII-related information exchange, e.g. PEPP-PT [57]. In PEPP-PT, an ephemeral user ID is implemented based on a seed randomly generated by a user device. Users will exchange ephemeral IDs, instead of encrypted PII messages, to record a proximity contact event, thus reducing the privacy exposure of diagnosed users to Level III, where tokens are shared with servers.

Decentralized solutions. The second type of solution is decentralized, where: (i) the back-end server is only responsible for collecting the identifiers that were used by diagnosed users, e.g. the broadcast token, from diagnosed patients; and (ii) the health evaluation is conducted on users’ devices, locally. This design prevents the central server from knowing the infected person and their physical proximity contacts.

Some decentralized systems rely on location information. For example, Hamagen (#5), an app provided by the Israeli Ministry of Health, obtains but does not share location data from the user’s phone and compares it with the information stored in a central server regarding the location histories of confirmed cases. As no data is shared before diagnosis, the exposure level of at-risk users is evaluated as Level I in Table 2. However, the diagnosed users will be notified and given the option of reporting their exposure to the Health Ministry by filling out a form; subsequently, their location trails are released to the public.

Other Bluetooth solutions, e.g. DP3T [58] (#4), Covid Watch [60] (#8), and PACT [19] (#10), implement decentralized designs that allow users to download the anonymous identifiers of diagnosed users from the back-end server and compare them with local records to obtain their risk of exposure to the virus. The design paradigm reduces the exposure level of at-risk users to Level II and the level of diagnosed users to Level III in Table 2, as no PII is shared by users. Apple and Google have released the “privacy-preserving contact tracing” API, which can support building decentralized contact-tracing apps [2].

Another application, Private Kit [8] (#9), a decentralized solution developed by Raskar et al. [52], enables individuals to log their own location information. In particular, Private Kit also allows Bluetooth broadcasts between users to enable direct notification between users. As the sharing of diagnosed users’ location trails and the broadcasts between users is privacy protected, we determine Private Kit’s user exposure levels to be the same as other Bluetooth-based decentralized solutions.
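
The decentralized Bluetooth paradigm described above (seed-derived ephemeral IDs, server-side publication of diagnosed users' tokens only, and on-device matching) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the derivation used by DP3T, Covid Watch, PACT, or PEPP-PT: the HMAC-SHA256 construction, the 16-byte ID length, and the helper names `ephemeral_ids` and `at_risk` are all hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import List, Set


def ephemeral_ids(seed: bytes, epochs: int) -> List[bytes]:
    """Derive one 16-byte ephemeral ID per epoch from a device-local seed.

    HMAC-SHA256 is an illustrative choice only; each surveyed protocol
    defines its own derivation.
    """
    return [
        hmac.new(seed, epoch.to_bytes(4, "big"), hashlib.sha256).digest()[:16]
        for epoch in range(epochs)
    ]


def at_risk(heard_ids: Set[bytes], published_ids: Set[bytes]) -> bool:
    """On-device check: was any locally recorded ID published as diagnosed?"""
    return not heard_ids.isdisjoint(published_ids)


# Bob (later diagnosed) and Carol (never diagnosed) broadcast ephemeral IDs.
bob_ids = ephemeral_ids(secrets.token_bytes(32), epochs=4)
carol_ids = ephemeral_ids(secrets.token_bytes(32), epochs=4)

# Alice was near Bob during epoch 2; Dave was only ever near Carol.
alice_heard = {bob_ids[2]}
dave_heard = {carol_ids[1]}

# After diagnosis, only Bob's tokens reach the server (Level III for Bob).
published = set(bob_ids)

# Each device downloads the published set and matches locally, so the
# server never learns who was in contact with whom (Level II for Alice).
alice_flag = at_risk(alice_heard, published)
dave_flag = at_risk(dave_heard, published)
```

In this sketch the server only ever holds Bob's tokens, which mirrors why the text rates diagnosed users at Level III and at-risk users at Level II for this family of designs.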
3 SECURITY VETTING

In this section, we consider the list of 34 contact tracing apps curated and summarised in Table 3. We downloaded the apps from Google Play Store and evaluated their security performance against the four vetting categories: (i) manifest weaknesses; (ii) general vulnerabilities; (iii) data leaks (with a focus on those that violate user privacy); and (iv) malware detection. We detail the list of issues considered in Table 4.

Table 3: Contact tracing apps considered in our analysis.

Applications∗                     Country       Downloads  Version
Coronavirus Algérie               Algeria       100K       1.0.3
Stopp Corona                      Austria       100K       1.1.4.11
CoronaReport                      Austria       10K        2.9.5
Coronavirus Australia             Australia     500K       1.0.2
COVIDSafe                         Australia     1M         1.0.11
BeAware Bahrain                   Bahrain       100K       0.1.4
Coronavirus Bolivia               Bolivia       50K        1.2.7
Coronavirus - SUS                 Brazil        1M         2.0.5
COVID-19!                         Czech         10K        0.9.4
Stop Covid                        Georgia       100K       1.0.461
COVA Punjab                       India         1M         1.2.2
CG Covid-19 ePass                 India         500K       1.0.6
West Bengal Emergency Fund        India         10K        1.3
Hamagen                           Israel        1M         1.1.2
Stop COVID-19 KG                  Kyrgyzstan    10K        0.3.137.325
MySejahtera                       Malaysia      500K       1.0.8
SOS CORONA                        Mali          10K        0.0.6
Nepal COVID-19 Surveillance       Nepal         5K         1.1.1
Hamro Swasthya                    Nepal         50K        1.3.2
COVID Radar                       Netherlands   50K        1.1.2
National Action Plan              Pakistan      50K        1.1
Corona Map                        Saudi Arabia  50K        1.0.0
TraceTogether                     Singapore     500K       1.8.0
NICD COVID-19 Case Investigation  South Africa  10K        1.16
STOP COVID19 CAT                  Spain         500K       1.0.2
StopTheSpread COVID-19            UK            100K       1.0.0
Coronavirus UY                    Uruguay       100K       2.2.3
Private Kit                       USA           10K        0.5.19
Contact Tracing                   USA           10K        1.3.8
Contact Tracer                    USA           10K        2.0.2
COVID-19                          Vietnam       100K       1.0
NCOVI                             Vietnam       1M         1.5.3
Vietnam Health Declaration        Vietnam       100K       1.0.12
Bluezone                          Vietnam       100K       1.0.1
∗ Note: apps were collected on 24 April 2020 and 1 May 2020 from Google Play Store.

3.1 Methodology

An overview of our security vetting methodology is shown in Figure 1, and we describe the method used for selecting the apps for our investigation in Section 3.1.1. We perform: (i) static analysis, including code analysis and data flow analysis; and (ii) dynamic analysis to detect malware.

3.1.1 Apps Selection. In order to curate a list of contact tracing apps, we first searched keywords, e.g. “contact tracing”, “Covid”, “tracing coronavirus”, in Google Play Store. We also started with known official apps from countries, e.g. the COVIDSafe app recommended by the Australian government. After a contact tracing app is found, we assess its functionality by reading the app descriptions and select those with in excess of 10,000 downloads. Subsequently, we include the app in the set and look for new apps through the recommendation links in the app store, until no further contact tracing apps are found. We repeated this procedure one week later and finalised the list of 34 contact tracing apps summarised in Table 3 on 1 May 2020.
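
The selection procedure above amounts to a breadth-first expansion over recommendation links, starting from seed apps and stopping when no new qualifying app appears. A minimal sketch follows, assuming a hypothetical in-memory `CATALOGUE` in place of real store listings (the authors browsed Google Play manually; no store API is implied), and assuming that only accepted apps have their recommendation links followed.

```python
from collections import deque

# Hypothetical stand-in for store listings; names and numbers are invented.
CATALOGUE = {
    "app.a": {"downloads": 500_000, "is_tracing": True, "related": ["app.b", "app.c"]},
    "app.b": {"downloads": 50_000, "is_tracing": True, "related": ["app.d", "app.a"]},
    "app.c": {"downloads": 100_000, "is_tracing": False, "related": ["app.e"]},
    "app.d": {"downloads": 5_000, "is_tracing": True, "related": []},
    "app.e": {"downloads": 1_000_000, "is_tracing": True, "related": []},
}


def curate(seeds, min_downloads=10_000):
    """Expand from seed apps over recommendation links until nothing new is found."""
    selected = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        app = queue.popleft()
        info = CATALOGUE[app]
        # Keep only contact tracing apps above the download threshold.
        if not (info["is_tracing"] and info["downloads"] > min_downloads):
            continue  # rejected apps are not expanded further
        selected.append(app)
        for neighbour in info["related"]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return selected


result = curate(["app.a"])
```

One consequence of this design is visible in the toy data: `app.e` qualifies on its own merits but is reachable only through a rejected app, so a crawl of this shape can miss it. Repeating the search a week later, as the authors did, is one way to reduce that risk.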

3.1.2 Static Analysis. We perform static analysis on the Android Package (APK) binary files. We first de-compile the APK of each app to its corresponding class and xml files. Then, we utilize the Mobile Security Framework (MobSF) [6] to perform code analysis and FlowDroid [15] for data flow analysis. Notably, we augment our static analysis with manual inspections to further increase the robustness of the vetting process. We detail our approach in Appendix A.

3.1.3 Dynamic Analysis. We rely on malware scanners to flag malicious artifacts in contact tracing apps. Concretely, we send the APKs to VirusTotal [12], a free online service that integrates over 70 antivirus scanners and has been widely adopted by the research community [31, 43]. As shown in Table 4, the results of malware detection will identify the detected viruses, worms, Trojans, and other malicious content embedded in the apps.

Table 4: Security vetting categories.

Vetting Category∗    Security Issues
Manifest Weaknesses  Insecure flag settings, e.g. app data backup allowed
                     Non-standard launch mode
                     Clear text traffic
Vulnerabilities      Sensitive data logged
                     SQL injection
                     IP address disclosure
                     Uses hard-coded encryption key
                     Uses improper encryption
                     Uses insecure SecureRandom
                     Uses insecure hash function
                     Remote WebView debugging enabled
Privacy Leaks        Trackers
                     Potential Leakage Paths from Sources to Sinks
Malware Detection    Viruses, worms, Trojans and other kinds of malicious content
∗ Note: security issues are summarized from FlowDroid (Privacy Leaks), VirusTotal (Malware), and MobSF (Manifest Weaknesses, Privacy Leaks, & Vulnerabilities).

3.2 Security Vetting Results
We next inspect the presence of security vulnerabilities among the 34 considered apps. Figure 2 shows the percentage of contact tracing apps that have security weaknesses found in our code analysis.

Code analysis. Figure 2 shows that the most prominent vulnerabilities are extracted from the manifest weaknesses. We observed that 68% of apps do not set the flag allowBackup to false. Consequently, users with USB debugging enabled can copy application data from the device. Other weaknesses detected are related to “Clear Text Traffic”, such as plaintext HTTP, FTP stacks, DownloadManager, and MediaPlayer, which may enable a network attacker to implement man-in-the-middle (MITM) [5] attacks during network transmission.

Notably, during our manual review of the vetting results from MobSF, we found false positives in three results, i.e. Clear Text Storage, Saving Data in Temporary File, and SQL Injection. For example, in the application COVIDSafe, broadcast and channel identifiers, encryption algorithm names, and placeholders, which
are used to receive or query specific values but do not contain sensitive information, are stored as constant values. However, MobSF regards all constant string values as potential clear text storage. Some applications, e.g. Coronavirus UY (Uruguay), create template files while decompressing and loading multiple dex files in order to avoid the 64K reference limit [44]. Other applications, e.g. CG Covid-19 ePass (India), are able to scan another user’s barcode and save it into temporary files in order to read the content. However, these behaviours are mistakenly regarded as temporary file leakage by MobSF. All these false positives were removed from our further analysis through manual inspections.

Figure 1: Overview of our security vetting methods. Importantly, we also augment the analysis with manual inspections.

Figure 2: Code analysis results.
Table 5: Trackers identified in contact tracing apps.

Trackers               # Apps  Percentage
Google Firebase        25      71.4%
Google CrashLytics     6       17.1%
Other Google trackers  4       11.4%
Facebook trackers      3       8.6%
Other trackers         9       25.7%

Figure 2 shows that the most frequent weakness detected by static analysis is the “Risky Cryptography Algorithm”. Over 90% of apps use at least one of the deprecated cryptographic algorithms, e.g. MD5 and SHA-1. For instance, in the app MySejahtera (Malaysia), the parameters in WebSocket requests are combined and hashed with MD5, and the result is compared with the content from requests in the class Draft_76 in order to verify the validity of connections. Although this has been listed in the top 10 OWASP [7] mobile risks 2016, the results show that it is still a common security issue.
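
To make the “Risky Cryptography Algorithm” finding concrete, the sketch below flags requests for deprecated digests in decompiled Java source. It is a simplified illustration, not MobSF's actual rule set: the regular expression, the `DEPRECATED` set, and the function name `find_deprecated_digests` are assumptions made for this example. Only `MessageDigest.getInstance` is a real Java API.

```python
import re

# Algorithm names treated as deprecated in this sketch (the text names MD5 and SHA-1).
DEPRECATED = {"MD5", "SHA-1", "SHA1"}

# Matches Java calls such as MessageDigest.getInstance("MD5").
GET_INSTANCE = re.compile(r'MessageDigest\.getInstance\(\s*"([^"]+)"\s*\)')


def find_deprecated_digests(java_source):
    """Return (line_number, algorithm) pairs for deprecated digest requests."""
    hits = []
    for number, line in enumerate(java_source.splitlines(), start=1):
        for algorithm in GET_INSTANCE.findall(line):
            # Java algorithm names are case-insensitive, so normalise first.
            if algorithm.upper() in DEPRECATED:
                hits.append((number, algorithm))
    return hits


SAMPLE = '''
MessageDigest a = MessageDigest.getInstance("MD5");
MessageDigest b = MessageDigest.getInstance("SHA-256");
MessageDigest c = MessageDigest.getInstance("sha-1");
'''

hits = find_deprecated_digests(SAMPLE)
```

A pattern match of this kind cannot tell whether the digest protects anything sensitive, which is one reason the authors pair tool output with manual inspection.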
Another frequent weakness is “Clear Text Storage” (files may contain hard-coded sensitive information like usernames, passwords, keys, etc.). In class DataBaseSQL of the COVID-19 (Vietnam) app, the password of the SQLite database is stored in the source code without encryption; CG Covid-19 ePass (India) also hard-codes its encryption key in class Security.

In total, 20 trackers have also been identified, including Google Firebase Analytics, Google CrashLytics, and Facebook Analytics. Approximately 75% of the apps contain at least one tracker. As shown in Table 5, the most frequent tracker is Google Firebase Analytics, which is identified in more than 70% of the apps. Notably, a research study [39] argues that TraceTogether’s use of Google’s Firebase service to store user information may leak users’ private data to third parties, such as Google. In the most extreme case, one contact tracing app, Contact Tracing (USA), contains 8 trackers.

Data flow analysis. Figure 3 presents the flow of data between sources and sinks, counted by the number of source-to-sink paths found in each app. The top sources of sensitive data are methods calling from Location and database.Cursor. These may obtain sensitive information from a geographic location sensor or from a database query. Most of the sensitive data will be transferred to sinks, such as Bundle, Service, and OutputStream, which may leak sensitive information out of apps. As discussed previously, sending sensitive information to the Bundle object may reveal sensitive data to other activities. Besides, developers usually utilize Log to print debugging information into the Logcat [4] panel. However, human errors from developers can lead to sensitive data being printed by mistake. Notably, we discover that some apps transmit location information through SMS messages. Considering Hamagen (Israel) as an example, location information is detected and obtained by a source method initialize(Context,Location,e) and then flows to a sink method where Handler.sendMessage(Message) is called. This is a potential vulnerability as malware could easily intercept the outbox of the Android SMS service [15].

We also manually vet the FlowDroid results for false positives. In total, 60 out of 371 paths (16.17%) are false positives (results
presented in Figure 3 are excluding these false positives). There are two main categories of false positives. The first is related to “Log” sinks: FlowDroid marks all log methods as sinks, although some of them are not actually sensitive. For instance, in TraceTogether, error messages such as an SQLiteException from the stack trace raised during data querying are logged by the Log.e method. This matches the keywords and is falsely identified as a sink. As another example, in Private Kit, when the status of LocationProvider changes, geo-location data are read in the function LocationListener.onStatusChanged. According to the app source code, only the status values, including OUT_OF_SERVICE, TEMPORARILY_UNAVAILABLE, and AVAILABLE, are logged by Log.v or Log.d; the geo-location data are not logged. Most of the false positives we find fall into this category. The second type of false positive comes from preference leakage detection. For example, the app STOP COVID19 CAT (Spain) stores the country code (e.g. UK and AU) by invoking the Locale.getCountry method, which is recognized as a source. As the country code is not confidential and does not leak private information, we consider this a false-positive source.

Figure 3: Data flows detected between sources and sinks. Percentages indicate the fraction of flows originating at the sources (left) and terminating at the sinks (right).

Malware detection. We discovered only one application, Stop COVID-19 KG (Kyrgyzstan),² containing malware. Two risks are identified: a variant of Android/DataCollector.Utilcode.A and an Adware (0053e0591). This is aligned with the findings on COVID-19 apps' threats [26]. Considering the limited spread of Stop COVID-19 KG (about 10,000 downloads), we conclude that the vast majority of contact tracing apps we collected from Google Play Store are free of malware. However, the rise of contact tracing apps has also attracted the interest of malicious developers. A recent report [49] disclosed that new ransomware targeted the contact tracing app in Canada even before the app was publicly released.

² https://play.google.com/store/apps/details?id=kg.cdt.stopcovid19

Feedback from developers. We recently re-checked all of these apps through regression testing and found that all potential privacy leakage in three apps, TraceTogether (Singapore), BlueZone (Vietnam), and STOP COVID19 CAT (Spain), has been fixed. Additionally, the trackers in the app Mysejahtera (Malaysia) have been removed, and the vulnerable app Contact Tracer (USA) is no longer available in Google Play Store (see Table 6). Meanwhile, new vulnerabilities are identified in the updated versions of several apps. For example, COVA Punjab (India) enables popup windows in its WebView settings, STOP COVID19 CAT (Spain) allows clear text traffic in its manifest settings, and some apps have more trackers identified.

Table 6: Results of regression testing of apps.

Application            Version∗  Issues Patched
TraceTogether          2.0.15    ✓ Disabled Allow Backup;
                                 ✓ Fixed potential privacy leakage.
STOP COVID19 CAT       2.0.3     ✓ Fixed Insufficient Random issue as they do not use Microsoft package anymore;
                                 ✓ Fixed potential privacy leakage;
                                 ✗ Allows clear text traffic in manifest;
                                 ✗ New tracker detected: Google CrashLytics.
Mysejahtera            1.0.19    ✓ Fixed the incorrect launch mode of an activity;
                                 ✓ Removed three trackers: Google Analytics, Google CrashLytics, and Google Tag Manager.
BlueZone               2.0.2     ✓ Fixed potential privacy leakage.
COVA Punjab            1.3.11    ✗ New WebView weakness is detected, which could enable popup windows;
                                 ✗ New potential privacy leakage path found;
                                 ✗ New tracker detected: Google CrashLytics and Google Ads.
Coronavirus Australia  1.1       ✗ New tracker detected: Google CrashLytics.
Coronavirus UY         4.3.2     ✗ New tracker detected: Google CrashLytics.
Contact Tracer         N/A       ✓ No longer available in Google Play Store
✓: Fixed, ✗: New vulnerabilities found
∗ Note: new versions were collected on 24 June 2020 from Google Play Store.

3.3 Case Studies
From our curated list of 34 apps, we select four typical apps to further highlight key lessons we can learn with respect to security and privacy risks. The case studies are based on TraceTogether, DP3T, Private Kit, and COVIDSafe.
TraceTogether. According to the static analysis results, root detection [29] has been implemented, which potentially prevents SQL injection and data breaches, thereby reducing the risk to a certain extent. For example, in o/C3271ax.java, root detection logic is implemented by detecting the existence of specific root files in the system, e.g. /system/app/Superuser.apk and /system/xbin/su; by assessing their integrity, the application can detect whether a device is rooted and subsequently block users from either logging in or opening the application.
However, TraceTogether also includes a third-party customer feedback library, the zendesk SDK, in which remote WebView debugging is enabled. This potentially allows attackers to dump the content of the WebView [18]. When a user inputs confidential data, including passwords and identity information, in a debug-enabled WebView, attackers may be able to inspect all elements in the web page by using remote debug tools [35]. Fortunately, according to the static analysis, the only WebView with debugging mode enabled is used to display articles; it therefore does not contain confidential data.

Security guidance 1: Never leave WebView with debugging mode enabled in the App release.
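The file-existence root check described in the TraceTogether case study above can be illustrated with a minimal sketch. This is a language-neutral illustration written in Python, not the app's code (the original logic lives in obfuscated Java); the function name `looks_rooted` and the injectable `exists` parameter are hypothetical, and only the two paths named in the text are used.

```python
import os
from typing import Callable, Iterable

# Paths named in the case study as indicators of a rooted device.
ROOT_INDICATOR_PATHS = (
    "/system/app/Superuser.apk",
    "/system/xbin/su",
)


def looks_rooted(
    paths: Iterable[str] = ROOT_INDICATOR_PATHS,
    exists: Callable[[str], bool] = os.path.exists,
) -> bool:
    """Return True if any known root artefact is present.

    `exists` is injectable (hypothetical design choice) so the check
    can be exercised without a real Android file system.
    """
    return any(exists(path) for path in paths)


if __name__ == "__main__":
    # On a desktop machine these paths are normally absent.
    print("rooted" if looks_rooted() else "no root artefacts found")
```

An app following this pattern would run the check at startup and refuse to log in or open when it returns true, as the case study describes.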

Vetting Security and Privacy of Global COVID-19 Contact Tracing Applications

DP3T. According to the static analysis, DP3T's database is not encrypted, and data is saved in plain text. In contrast to TraceTogether, the app does not implement any root detection capabilities. This means that a malicious app could possibly access the database directly and manipulate the database containing COVID-19 contact records. Potentially, an adversary could spread false positives.

Security guidance 2: To protect the database from being dumped and prevent data breaches, a solution should:
(1) Implement database encryption [11] and
(2) Enable root detection [1] and confidential data protection [10] at application startup.

In addition, as the database records timestamps and contact IDs, the leakage of the database from a rooted device infected by a mobile system virus can be exploited by adversaries to mount linkage attacks [9]. If enough data in a region were collected by attackers, contact IDs and timestamps in the database could be used to analyze movements by comparing data, and device owners may be identified through a linkage attack [54].
Private Kit. Similar to DP3T, Private Kit does not encrypt the database, which contains plaintext data. In addition, the app creates temporary JSON files to store the user's location data. Without any encryption or root detection, the temporary JSON files can be dumped from rooted devices, thus increasing the risk of privacy leakage.

Security guidance 3: To prevent potential data breaches, tracing records and confidential data must not be stored in temporary files in plain text.

COVIDSafe. According to our experiments, COVIDSafe 1.0.11 stores all tracing histories, including contacted device IDs and timestamps, in an SQLite database in plain text. Since the application does not implement root detection logic, tracing histories may be leaked from rooted devices and potential linkage attacks can be mounted [54]. However, in the latest version, COVIDSafe fixed this issue by encrypting the local database with a public key.
Mussared and McMurtry [34] discussed long-term device tracking and some other privacy-related attacks, substantiated as CVE-2020-12856. In addition, due to the use of the Generic Attribute Profile (GATT), the phone model name and the device name are shared between users. Although this information may not be considered PII, it could be set by users in the form of “Firstname Lastname's Phone Model”, e.g. “Jim Green's Pixel 2”, which allows an attacker to easily re-identify and track a user as this information will be continually broadcast. A practical demonstration of extracting such information is available online.³ Furthermore, a bug is found when a phone is locked: if its temporary ID has expired, the phone cannot provide a new ID to devices in the proximity [47]. In such a situation, a user will not be recorded by other users and will not receive an at-risk alarm if someone she has been in contact with is diagnosed. Considering that it is usual to keep a phone locked, this may lead to serious false negatives.

³ https://twitter.com/wabzqem/status/1257547477542027270

Security guidance 4: To protect the system against false negatives caused by malfunction, thorough and comprehensive testing must be carried out. In particular, situations such as the mobile phone being locked and the app running in the background should be seriously considered.

Besides the aforementioned issues, in accordance with the report released on 14 May 2020 [46], several vulnerabilities, such as CVE-2020-12857 and CVE-2020-12858, have been fixed. In CVE-2020-12857, the COVIDSafe app improperly caches GATT characteristic values, i.e. the TempID, for a long time until a successful transaction takes place, instead of clearing the values periodically. As the data could be read by a remote device, if an attacker never completes the transaction, he will always obtain the same TempID from a user, which may enable long-term tracking of the user. This issue has been fixed by removing the cached entry when a device is disconnected. The root cause of CVE-2020-12858 is the generation and use of an unchanged advertising payload, which means that an attacker is able to track a device by identifying its advertising payload. In the latest update, the payload will not be cached.

4 PRIVACY RISK ASSESSMENT
In this section, we describe the privacy analysis we conducted on the 10 selected contact tracing solutions in Table 1 to assess their protection against potential privacy breaches under our threat model.

4.1 Threat Model
We consider four attackers in our threat model in addition to the user groups defined in Section 2.1:
Application users. Those who install contact tracing applications on their mobile phones will receive information about COVID-19, e.g. an at-risk alarm. A regular user may reveal their private information, e.g. name, gender, phone number, national ID, home address, and location history, to the contact tracing systems, as well as discover other users' private information from the system through public information or broadcasts from other users.
Health authorities. These actors are responsible for diagnosing infections and collecting health information from application users. They may learn or deduce private information about at-risk users. Health authorities will also help the diagnosed users record or upload information to the contact tracing system.
Governments. These actors work with technology providers and are often responsible for operating the contact tracing system. They may access the data stored in a central server. In our threat model, we suppose the Government (and even the cloud operator) is “untrusted”, that is, they may use the collected data for purposes beyond the pandemic.
Malicious adversaries. These adversaries have access to local app information. They follow the defined algorithms, but wish to learn more than the allowed information. They may have the capability to access the local log of contact tracing applications, but hacking the back-end server or another user's device is out of the
scope of their capabilities. They may utilize some devices, such as a Bluetooth broadcaster or receiver, to attack the system or gain extra information. They may also modify the app and impersonate a legitimate user to access the system, which is difficult to prevent unless remote attestation is applied.

4.2 Potential Attacks
As discussed previously, the privacy of users is hard to preserve in a contact tracing system. To introduce potential privacy risks, we will let Alice be an at-risk user, and let Bob be a diagnosed user who has been in contact with Alice. Mallory will be a malicious attacker, and Grace will be the government server (or other authority). Here we discuss four potential attacks. According to our threat model in Section 4.1, if an attacker is not able to re-identify a user or inject fake reports into a contact tracing system through a specific privacy attack, we deem the system well-protected against such an attack; otherwise, the system is considered at-risk. The vetting results are summarized in Table 7.
Linkage attacks by servers. In centralized systems, the major privacy concern is metadata leakage by the server. For example, in the Coronavirus Disease-19 website, TraceTogether, and COVIDSafe, a central server is used to collect PII and to evaluate at-risk individuals. Consequently, Grace will be able to collect a large amount of PII, such as names, phone numbers, contact lists, post codes, home addresses, and location trails. Grace is therefore able to deduce the social connections of Alice. Even for PEPP-PT, a centralized Bluetooth system with measures to avoid PII collection, re-identification risks still exist. For example, from the server side, Grace is able to link ephemeral IDs to the corresponding permanent app identifier and thus trace Alice based on IDs observed in the past, as well as trace future movements. Thus, no centralized solution in Table 1 can prevent linkage attacks by the server.
In contrast, for decentralized Bluetooth solutions, Alice's privacy is protected as her PII will not be sent to a central server by a diagnosed user and her health status is evaluated on her own device. Thus, decentralized Bluetooth systems are able to protect users' privacy against linkage attacks by the server. However, in location-based decentralized systems, e.g. Hamagen, the server learns users' location trails.

Privacy guidance 1: To protect users' privacy against linkage attacks by a server, a contact tracing solution should:
(1) Avoid sharing PII with central points or
(2) Implement a decentralized design.

Linkage attacks by users. Linkage attacks, performed by Mallory, try to re-identify Alice or Bob. As discussed previously, in contact tracing systems that directly publish users' PII, e.g. Coronavirus Disease-19, Hamagen, and Health Code, Bob is at risk of privacy leakage. For example, in Coronavirus Disease-19, Mallory may be able to re-identify Bob, as Bob's gender, age, and location history are available from public information.
The other 7 systems listed in Table 1 further rely on information exchange between users. In a number of the apps, e.g. DP3T, which implement an ephemeral ID design, Mallory is still able to identify Bob using more advanced attacks. For example, if Mallory places a Bluetooth receiver near Bob's home or workplace and ensures that the device will only receive Bluetooth broadcasts from Bob, then once Bob is diagnosed, Mallory will receive an at-risk alarm and immediately know that the infected patient is Bob. In addition, Mallory can log the timestamp and the received ephemeral ID when in contact with Bob. Once Bob is diagnosed, Mallory is able to trace back the source of the recording and re-identify Bob and potentially infected users. Similar attacks were described as the Paparazzi Attack and the Nerd Attack in an analysis of DP3T [59]. Note that Mallory is able to extend such attacks to Sybil attacks to enable the identification and tracing back of multiple targets at the same time. Even worse, if Mallory distributes multiple broadcast receivers (which could also be considered a Sybil attack) over a large area in some layout, e.g. a honeycomb, they could even trace the movement of Bob by tracing the records on each device. Thus, none of the 10 typical solutions can fully protect users' privacy against linkage attacks by Mallory.

Privacy guidance 2: To protect users' privacy against linkage attacks by an adversary, a solution should:
(1) Avoid data sharing between users or
(2) Ensure privacy protections exist for any published data.

False positive claims. In some systems, such as Coronavirus Australia, Bob can register as infected and upload data through the contact tracing app to the server, which enables Alice to receive an at-risk alarm. However, if Mallory exploits such a mechanism and registers as a (fake) infected user, Alice will receive a false-positive at-risk alarm, which may cause social panic or negatively impact evidence-driven public health policies. Most solutions mitigate this issue by implementing an authorization process, i.e. Bob is only permitted to upload data after receiving a one-time-use permission code generated by the server. Without the permission code, Mallory is not allowed to claim to be infected, and Alice will always receive a true at-risk alarm. Only two solutions, i.e. DP3T and PACT [19], have no authorization process implemented.

Privacy guidance 3: To protect a system against false-positive-claim attacks, a solution should establish an authorisation process.

Relay attacks.⁴ To mount such an attack, Mallory could collect existing broadcast messages exchanged between users, then replay them at another time, or forward them through proxy devices to a remote location and replay them there. Due to the lack of message validation in solutions that utilize information broadcasts, a user will not be able to determine whether a received broadcast is from a valid source or from a malicious device. Any received broadcast will be recorded as a contact event, even though no actual contact exists. A malicious attacker can potentially redirect all the traffic from one place to another by replaying fake information, resulting in a targeted area being incorrectly locked down.

⁴ The combination of man-in-the-middle and replay attacks is henceforth referred to as a relay attack.

For example, suppose Mallory records the broadcasts from Bob and then replays them hours later, or transmits them to a remote location and replays the messages to Alice. Alice's device, although not actually in contact with Bob, will receive and record the replayed broadcast in its local log. Once Bob is diagnosed, Alice will
receive an at-risk alarm even though she has never been in contact with Bob. Such an attack will falsely enlarge the contact range of Bob and create a large number of false-positive alarms, which may cause panic among citizens. Solutions that do not utilize information broadcasts, such as Coronavirus Disease-19 and Hamagen, can avoid relay attacks.
In DP3T, a solution is provided to limit the replay attack by including temporal information in the broadcast ephemeral identifiers. However, it cannot effectively prevent replay attacks occurring at the same moment. Another promising solution is to use an ambient physical sensing approach, e.g. ambient audio. This has been shown to secure proximity detection [28, 56] by comparing the ambient information embedded in the broadcast messages with the local ambient. It allows a receiver to validate whether the source is nearby, as the range of Bluetooth broadcast is generally within 50 m.

Privacy guidance 4: To protect a system against relay attacks, a solution should:
(1) Either avoid utilizing information broadcast or
(2) Implement a validation approach.

Table 7: Privacy protections against the attacks.
Solutions               Linkage-Server  Linkage-User  False-Claim  Relay
CovidSafe
Health Code
PEPP-PT
DP3T
HaMagen
TraceTogether
Coronavirus Disease-19
Covid Watch
Private Kit
PACT
: the system is well protected   : the system is at-risk

5 OUR RECOMMENDATIONS
As discussed in Section 4, a contact tracing application should preserve the privacy of generic and at-risk users. Although the diagnosed user may reveal private information to health authorities, we should not release this data to the public. Furthermore, we argue that a contact tracing solution should focus on tracing anonymous daily routines or occasional contacts, instead of close contacts.
Here we propose a venue-accessing-based solution, VenueTrace, to overcome privacy risks. We will first describe the framework of this solution and then assess its privacy performance as well as its limitations.

5.1 Our Privacy-by-Design: VenueTrace
The architecture of VenueTrace contact tracing consists of the following modules, as described in Figure 4.
Bluetooth broadcaster in public places. Instead of broadcasting from each user's mobile phone, VenueTrace proposes utilizing Bluetooth broadcast devices installed in public places, e.g. restaurants, movie theaters, workplaces, and public transport hubs and stops. This is in contrast to most existing Bluetooth solutions, which rely on human-to-human contact. To facilitate contact tracing, at setup, each broadcast device will register its MAC address with a back-end server and get a unique VenueID, which will be broadcast through Bluetooth at every time interval $T$. Hence, users are not required to broadcast information.
Applications installed in the user's phone. For every VenueID broadcast by a device in its proximity at time $t$, after receiving the broadcast VenueID, a pair $(ID, t)$ will be created and a timer will be started in the user application. If the user receives the VenueID again after $T$, the user stores a tuple which satisfies the following:

$$(ID, t_{start}, t_{end}), \quad \text{where } t_{end} - t_{start} \geq T, \tag{1}$$

where $t_{start}$ is the first timestamp at which the broadcast VenueID was received and $t_{end}$ is the last timestamp of the period over which the user continuously received the VenueID; tuples are kept in local storage for 14 days. For example, if $T$ is set to 10 minutes and Bob stays in a public place for more than 30 minutes, he will receive the broadcast at least three times, e.g. at $t_1$, $t_2$, and $t_3$. The application will record the tuple $(ID, t_1, t_3)$ in its local log, where $t_3 - t_1 = 20$ minutes, indicating that he stayed in a public place for at least 20 minutes.
Once Bob is diagnosed, with his consent, he will receive a permission code from the health authority; Bob can then upload his log $(ID_{Bob}, ts_{Bob}, te_{Bob})$ and the permission code to the back-end server. To ensure the security of data transmission, the data will be encrypted with the public key of the back-end server. Every twenty-four hours, Alice will download logs in the format $(ID_{Server}, ts_{Server}, te_{Server})$ from the back-end server, and record matching and evaluation will be conducted locally. A local record, i.e. $(ID_{Alice}, ts_{Alice}, te_{Alice})$, is considered matched if the following conditions are true:

$$\begin{aligned}
ID_{Alice} &= ID_{Server}, \\
(ts_{Alice}, te_{Alice}) &\subseteq (ts_{Server}, te_{Server}),
\end{aligned} \tag{2}$$

which indicates that Alice has been in a public place during the period $(ts_{Server}, te_{Server})$ and may have been in an at-risk environment. If there is a match in Alice's local log, Alice's device will generate an at-risk alarm.
Decentralized back-end server. The back-end server supports the activities of (i) registering the Bluetooth broadcasters by storing their MAC addresses and VenueIDs; (ii) generating and authorizing permission codes for health authorities; (iii) publishing the public key and deciphering the data uploaded by diagnosed users with its private key; (iv) validating the received VenueIDs and permission codes; and (v) publishing at-risk information to regular users in the format of tuples $(ID_{Server}, ts_{Server}, te_{Server})$,

$$\begin{aligned}
ID_{Server} &= ID_{Bob}, \\
ts_{Server} &= ts_{Bob} - T - random, \\
te_{Server} &= te_{Bob} + \delta t + random',
\end{aligned} \tag{3}$$

where the at-risk time interval is extended relative to Bob's record, backward by $T$ at its start and forward by $\delta t$ at its end. A typical $\delta t$ could be set to 12 hours to ensure that Alice will be informed that she has been in an at-risk venue where she may have touched a virus-contaminated surface or inhaled airborne droplets, even though she has never been in close physical contact with Bob. We could also further extend the visiting duration with random noise to blur the timeline. For example, if Bob visited a public place from 9 am to 10 am,
the released infected duration could be from 8:30 am to 11:15 pm, where $T = 5$ mins, $random = 25$ mins, $\delta t = 12$ hours, $random' = 75$ mins. Considering the time-related functionality of a public place, this duration could be further capped.

Figure 4: Overview of VenueTrace framework.

5.2 Defending Against Attacks
A major flaw in many implemented and proposed applications is that they cannot fully guarantee user privacy when using user-provided data. Of the applications discussed in this paper, this issue is particularly prevalent with centralized solutions, e.g. TraceTogether, as they may suffer from linkage attacks not only by users but also by the server. However, any application that requires symptom reporting from its user base could potentially be vulnerable, as discussed in Section 4.
Decentralized computation. Compared to centralized systems, our solution has the inherent advantage of decentralized systems, that is, users' privacy is not exposed to the server. The back-end server will only receive the timestamp and VenueID of a public place that is visited by diagnosed users. Supposing a malicious attacker successfully extracts data from the back-end server, he is still not able to link the information to any location or user, as there is no location information stored on the server. Thus, user privacy is protected against the linkage attack by a server.
Coarse-grained location. Furthermore, in contrast to location-based solutions (e.g. Hamagen), our solution does not utilize GPS information, as Bluetooth is more advantageous than GPS signals in high-risk indoor environments. Moreover, considering the extension and blurring of timelines, the back-end server will only receive and publish venue IDs with at most coarse-grained location information. In addition, our solution overcomes the limitation of location-based tracing by installing the broadcaster in public venues and transport rather than relying on user devices.
No token exposure. Many Bluetooth-based decentralized systems provide a privacy-preserving solution by only sharing temporary tokens between users. However, such a design is still amenable to linkage attacks by users. Compared to other decentralized solutions, our solution further preserves user privacy as no information is exchanged between users. Consequently, our system is immune to linkage attacks by users. In the worst case, the attackers may physically visit public places and record the venue IDs, and then link venue IDs to locations. After an at-risk alarm, the attacker may perceive where a diagnosed user appeared. However, the attackers still do not have enough information to re-identify the infected individuals, unless they log all persons who have appeared in multiple public places over a long period and are able to infer the persons matching the timeline. Thus, without information shared between users, linkage attacks by users and real-time movement tracking are impossible.
False positives. Aside from real people misreporting their symptoms, applications that rely on diagnosed users' information are at risk of malicious false-positive claims. For location-based applications, these attacks are very effective. For a GPS-based system, an attacker could spoof a series of GPS coordinates to an app. If no authentication measures are implemented, uploading such spoofed GPS data to the server can cause havoc across the system. This scenario emphasizes the role of the permission code in our recommendation. A user is allowed to register as infected only after being authorized with a permission code, which prevents false-positive claims.
Further considerations. To protect users' privacy against relay attacks, the VenueTrace solution can be updated by including blurred location information in broadcasts. As a relay attack replays a VenueID in another remote place, thereby expanding the broadcast range and causing false-positive alarms, we can distinguish fake broadcasts by embedding the location of the broadcaster in the broadcast message. When the user receives a broadcast, the application can parse out the location of the broadcaster and compare it with the location of the receiver to filter out broadcasts whose distance exceeds a bound, e.g. greater than 1 km. This allows such replay forgeries to be eliminated. As the original intention of the VenueTrace solution is to prevent the back-end server from obtaining the locations of broadcasters and users, we can only add vague location information to the broadcast, such as a location with a 1 km error, to prevent the relay attack while preserving users' privacy. However, this, to some extent, weakens the location privacy protections of our solution.
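The distance check outlined under "Further considerations" in Section 5.2 might look like the following minimal sketch. It assumes, hypothetically, that each broadcast carries a (latitude, longitude) pair and that the receiver knows its own position; the function names and the haversine great-circle formula are illustrative choices that the paper does not prescribe.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accept_broadcast(broadcaster: LatLon, receiver: LatLon,
                     max_km: float = 1.0) -> bool:
    """Keep a broadcast only if its claimed origin is within max_km.

    max_km = 1.0 mirrors the "greater than 1 km" example in the text;
    the parameter name and default are assumptions of this sketch.
    """
    return haversine_km(*broadcaster, *receiver) <= max_km


if __name__ == "__main__":
    sydney = (-33.8688, 151.2093)
    nearby = (-33.8643, 151.2093)      # roughly 0.5 km north
    melbourne = (-37.8136, 144.9631)   # several hundred km away
    print(accept_broadcast(sydney, nearby))     # genuine local broadcast
    print(accept_broadcast(melbourne, sydney))  # relayed from far away
```

If the embedded location is deliberately blurred, as the text suggests, the threshold would presumably need to exceed the blur radius so that genuine local broadcasts are not rejected.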
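The local recording and matching steps of Section 5.1 can be sketched as follows. This is one possible reading, not the authors' implementation: timestamps are plain integers in seconds, receptions are treated as continuous when consecutive gaps do not exceed the interval T plus an optional `slack` (a hypothetical tolerance added here), and interval containment is checked inclusively.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Visit:
    venue_id: str
    t_start: int  # seconds
    t_end: int    # seconds


def record_visits(venue_id: str, receive_times: Iterable[int],
                  interval: int, slack: int = 0) -> List[Visit]:
    """Turn reception timestamps of one VenueID into visit tuples.

    A visit is stored only if it spans at least one broadcast
    interval, i.e. t_end - t_start >= interval.
    """
    visits: List[Visit] = []
    times = sorted(receive_times)
    if not times:
        return visits
    start = prev = times[0]
    for t in times[1:]:
        if t - prev > interval + slack:  # reception was interrupted
            if prev - start >= interval:
                visits.append(Visit(venue_id, start, prev))
            start = t
        prev = t
    if prev - start >= interval:
        visits.append(Visit(venue_id, start, prev))
    return visits


def is_match(local: Visit, published: Visit) -> bool:
    """Same venue, and the local visit lies inside the published
    at-risk interval."""
    return (local.venue_id == published.venue_id
            and published.t_start <= local.t_start
            and local.t_end <= published.t_end)


if __name__ == "__main__":
    T = 600  # 10 minutes
    bob = record_visits("venue-42", [0, 600, 1200], T)
    print(bob)  # one visit spanning 20 minutes, as in the text's example
```

Keeping the match on the user's device means the server never learns which published tuples matched, which is the property the decentralized design relies on.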
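The interval extension that the back-end server applies before publishing (Eq. (3) in Section 5.1) can be sketched as below. The function names are hypothetical, and drawing the noise uniformly with the `secrets` module is an assumption of this sketch; the paper does not specify a noise distribution.

```python
import secrets
from datetime import datetime, timedelta
from typing import Tuple


def publish_interval(ts_bob: datetime, te_bob: datetime,
                     T: timedelta, delta_t: timedelta,
                     rand_before: timedelta,
                     rand_after: timedelta) -> Tuple[datetime, datetime]:
    """Extend and blur a diagnosed user's visit:
    start moves earlier by T + rand_before,
    end moves later by delta_t + rand_after."""
    return ts_bob - T - rand_before, te_bob + delta_t + rand_after


def draw_noise(max_minutes: int) -> timedelta:
    """Uniform noise in whole minutes from 0 to max_minutes inclusive
    (assumed distribution)."""
    return timedelta(minutes=secrets.randbelow(max_minutes + 1))


if __name__ == "__main__":
    # Worked example from the text: a visit from 9 am to 10 am.
    ts = datetime(2020, 6, 24, 9, 0)
    te = datetime(2020, 6, 24, 10, 0)
    start, end = publish_interval(
        ts, te,
        T=timedelta(minutes=5),
        delta_t=timedelta(hours=12),
        rand_before=timedelta(minutes=25),
        rand_after=timedelta(minutes=75),
    )
    print(start.time(), end.time())  # 08:30:00 23:15:00
```

With the values from the text, the published interval runs from 8:30 am to 11:15 pm, matching the worked example.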

6 RELATED WORK
Location tracking. Recent works have explored extracting location information from mobile apps [51, 55, 61, 63]. For instance, Xue et al. [62] presented a supervised machine learning methodology to localize users without any reverse engineering of the app. In our vetting, we considered such a situation as a linkage attack by users, which extended the scope to also include tracking via Bluetooth. Although it may be costly to build up a large-scale Bluetooth broadcasting and receiving network, e.g. in a honeycomb layout, to mount a movement tracking attack, it is still a considerable risk of privacy leakage.
Contact tracing tools analysis. Several recent works have focused on the evaluation and analysis of contact tracing applications. Just 10 days after TraceTogether launched, Cho et al. [21] presented a constructive discussion of potential modifications to encourage community efforts to develop solutions with stronger privacy protection and argued that privacy is a central feature of mobile contact tracing apps. One week later, Vaudenay [59] analyzed the DP3T solution and pointed out that some privacy protection measures in DP3T may have the opposite effect. Gvili [27] presented a security analysis of the Bluetooth and cryptography specifications published by Apple and Google [32, 33], arguing that significant risks may be introduced by this solution. Other works [40, 42] conducted a review of the centralized and decentralized solutions and proposed contact tracing using a zero-knowledge protocol. He et al. [30] inspect the expansion of COVID-19 related apps more broadly, identifying the presence of malware.
Our work. In contrast to the aforementioned works, our research does not focus only on the analysis of one solution or the comparison between centralized and decentralized designs, but also conducts a security and privacy vetting of multiple state-of-the-practice approaches. Concretely, we aim to extract security and privacy guidance from various solutions and propose more practical protection of individual security and privacy.

7 CONCLUDING REMARKS
This paper has conducted a security analysis of 34 contact tracing applications and evaluated the privacy performance of 10 solutions. The results show that security risks remain, such as using deprecated cryptographic algorithms, storing sensitive information in clear text, and allowing permissions for backup. Thus, we recommend that the reported vulnerabilities be patched as soon as possible, although we appreciate that developers may prioritize the speed of product release to counter the pandemic. That said, the majority of patches are straightforward. For example, over 70% of developers still use insecure hash functions such as SHA-1 and MD5, or store sensitive information in clear text. Further, to ensure security and remove potential vulnerabilities, code should be released for public review.
Our analysis has shown that protecting privacy is more challenging, particularly as this must be balanced against the urgency of

scope to software vulnerabilities and privacy leakage. Examining Bluetooth Low Energy and network traffic originating from contact tracing apps is worth further exploration. Another future avenue is to examine any vulnerabilities associated with iOS counterparts.
To overcome a number of these issues, we have proposed a privacy-preserving contact tracing design, termed VenueTrace. The proposed recommendation has a decentralized architecture in which no information is exchanged among users and no location or identifiable information will be exposed to the server. However, just as with other apps, it is impossible to address all potential risks, e.g. our solution is similarly vulnerable to the relay attack; a solution requires compromising the privacy of our approach. We hope our study can inform and aid the software industry to design, develop, and deploy more secure (and privacy-preserving) contact tracing apps whilst allowing citizens to use contact tracing apps with more confidence in the capability of the apps to protect their security and privacy.

REFERENCES
[1] [n.d.]. Android Root Detection Techniques. https://blog.netspi.com/android-root-detection-techniques/
[2] [n.d.]. Apple and Google partner on COVID-19 contact tracing technology. https://www.apple.com/au/newsroom/2020/04/apple-and-google-partner-on-covid-19-contact-tracing-technology/
[3] [n.d.]. CWE-330: Use of Insufficiently Random Values. https://cwe.mitre.org/data/definitions/330.html
[4] [n.d.]. Logcat command-line tool. https://developer.android.com/studio/command-line/logcat
[5] [n.d.]. Man-in-the-middle attack. https://en.wikipedia.org/wiki/Man-in-the-middle_attack
[6] [n.d.]. Mobile-Security-Framework-MobSF. https://github.com/MobSF/Mobile-Security-Framework-MobSF
[7] [n.d.]. OWASP. https://owasp.org/www-project-mobile-top-10/
[8] [n.d.]. Private Kit: Safe Paths; Privacy-by-Design COVID-19 Solutions using GPS+Bluetooth for Citizens and Public Health Officials. http://safepaths.mit.edu/
[9] [n.d.]. Protecting Against Linkage Attacks that Use 'Anonymous Data'. https://www.marklogic.com/blog/protecting-linkage-attacks-use-anonymous-data/
[10] [n.d.]. Support Direct Boot mode. https://developer.android.com/training/articles/direct-boot
[11] [n.d.]. Using The SQLite Encryption Extension. https://sqlite.org/android/doc/trunk/www/see.wiki
[12] [n.d.]. Virustotal. https://www.virustotal.com/
[13] [n.d.]. Visual Studio Code. https://code.visualstudio.com/
[14] 2020. Coronavirus Disease-19. http://ncov.mohw.go.kr, 2020, accessed: 2020-03-23
[15] Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel. 2014. Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps. ACM SIGPLAN Notices 49, 6 (2014), 259–269.
[16] Mehrdad Bahrini, Nina Wenig, Marcel Meissner, Karsten Sohr, and Rainer Malaka. 2019. HappyPermi: Presenting Critical Data Flows in Mobile Application to Raise User Security Awareness. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems. 1–6.
[17] Jason Bay, Joel Kek, Alvin Tan, Chai Sheng Hau, Lai Yongquan, Janice Tan, and Tang Anh Quy. 2020. BlueTrace: A privacy-preserving protocol for community-driven contact tracing across borders. Government Technology Agency-Singapore, Tech. Rep (2020).
[18] Ashish Bhatia. 2019. Android Security: Don't leave WebView debugging enabled in production. https://dev.to/ashishb/android-security-don-t-leave-webview-debugging-enabled-in-production-5fo9
[19] Justin Chan, Shyam Gollakota, Eric Horvitz, Joseph Jaeger, Sham Kakade, Ta-
dayoshi Kohno, John Langford, Jonathan Larson, Sudheesh Singanamalla, Jacob
the pandemic. To the best of our knowledge, there are no solutions Sunshine, et al. 2020. PACT: privacy sensitive protocols and mechanisms for
that can protect users’ privacy against all potential attacks. Besides mobile contact tracing. arXiv preprint arXiv:2004.03544 (2020).
the solutions that collect private information, the results of our [20] Bill Chappell. 2020. Coronavirus: sacramento county gives up on auto-
matic 14-Day quarantines. https://www.npr.org/sections/health-
privacy vetting indicates that most of the contact tracing apps are shots/2020/03/10/813990993/coronavirus-sacramento-county-gives-
potentially vulnerable to malicious privacy attacks. We limit our up-on-automatic-14-day-quarantines,2020,accessed:2020-03-23
Ruoxi Sun, Wei Wang, Minhui Xue, Gareth Tyson, Seyit Camtepe, and Damith Ranasinghe

[21] Hyunghoon Cho, Daphne Ippolito, and Yun William Yu. 2020. Contact tracing mobile apps for COVID-19: Privacy considerations and related trade-offs. arXiv preprint arXiv:2003.11511 (2020).
[22] Chris Culnane and Kobi Leins. 2019. Misconceptions in privacy protection and regulation. Law in Context. A Socio-legal Journal 36, 2 (2019), 1–12.
[23] Australia Department of Health. 2020. Coronavirus Australia app. https://www.health.gov.au/resources/apps-and-tools/coronavirus-australia-app, 2020, accessed: 2020-04-23
[24] Australia Department of Health. 2020. COVIDSafe. https://www.health.gov.au/resources/apps-and-tools/covidsafe-app, 2020, accessed: 2020-04-23
[25] Luca Ferretti, Chris Wymant, Michelle Kendall, Lele Zhao, Anel Nurtay, Lucie Abeler-Dörner, Michael Parker, David Bonsall, and Christophe Fraser. 2020. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science (2020).
[26] Tara Gould, Gage Mele, Parthiban Rajendran, and Rory Gould. [n.d.]. Anomali threat research identifies fake COVID-19 contact tracing apps used to download malware that monitors devices, steals personal data. https://www.anomali.com/blog/anomali-threat-research-identifies-fake-covid-19-contact-tracing-apps-used-to-monitor-devices-steal-personal-data
[27] Yaron Gvili. 2020. Security analysis of the covid-19 contact tracing specifications by apple inc. and google inc. Technical Report. Cryptology ePrint Archive, Report 2020/428.
[28] Tzipora Halevi, Di Ma, Nitesh Saxena, and Tuo Xiang. 2012. Secure proximity detection for NFC devices based on ambient sensor data. In European Symposium on Research in Computer Security. Springer, 379–396.
[29] Zhiyun Qian Hang Zhang, Dongdong She. 2015. Android Root and its Providers: A Double-Edged Sword. (2015). https://dl.acm.org/doi/abs/10.1145/2810103.2813714
[30] Ren He, Haoyu Wang, Pengcheng Xia, Liu Wang, Yuanchun Li, Lei Wu, Yajin Zhou, Xiapu Luo, Yao Guo, and Guoai Xu. 2020. Beyond the virus: A first look at Coronavirus-themed mobile malware. arXiv preprint arXiv:2005.14619 (2020).
[31] Yangyu Hu, Haoyu Wang, Li Li, Yao Guo, Guoai Xu, and Ren He. 2019. Want to earn a few extra bucks? A first look at money-making apps. In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 332–343.
[32] Apple Inc and Google Inc. [n.d.]. Exposure Notification-Bluetooth Specification. https://www.blog.google/documents/58/Contact_Tracing_-_Bluetooth_Specification_v1.1_RYGZbKW.pdf
[33] Apple Inc and Google Inc. [n.d.]. Exposure Notification-Cryptography Specification. https://www.blog.google/documents/56/Contact_Tracing_-_Cryptography_Specification.pdf
[34] Eleanor McMurtry Jim Mussared. [n.d.]. The COVIDSafe App - 4 week update. https://docs.google.com/document/d/17sVyBIG5CqhF9XtuEfeG2MfYsFNXuV4yxp3BERDTJoI/preview#
[35] Meggin Kearney. [n.d.]. Remote Debugging WebViews. https://developers.google.com/web/tools/chrome-devtools/remote-debugging/webviews
[36] Min Joo Kim and Simon Denyer. [n.d.]. A ‘travel log’ of the times in South Korea: Mapping the movements of coronavirus carriers. https://www.washingtonpost.com/world/asia_pacific/coronavirus-south-korea-tracking-apps/2020/03/13/2bed568e-5fac-11ea-ac50-18701e14e06d_story.html
[37] Siddique Latif, Muhammad Usman, Sanaullah Manzoor, Waleed Iqbal, Junaid Qadir, Gareth Tyson, Ignacio Castro, Adeel Razi, Maged N. Kamel Boulos, Adrian Weller, and Jon Crowcroft. 2020. Leveraging Data Science To Combat COVID-19: A Comprehensive Review. (4 2020). https://doi.org/10.36227/techrxiv.12212516.v1
[38] Kobi Leins, Chris Culnane, and Benjamin IP Rubinstein. 2020. Tracking, tracing, trust: Contemplating mitigating the impact of COVID-19 through technological interventions. The Medical Journal of Australia (2020), 1.
[39] Douglas J Leith and Stephen Farrell. 2020. Coronavirus Contact Tracing App Privacy: What Data Is Shared By The Singapore OpenTrace App? (2020).
[40] Jinfeng Li and Xinyi Guo. 2020. COVID-19 Contact-tracing Apps: A Survey on the Global Deployment and Challenges. arXiv preprint arXiv:2005.03599 (2020).
[41] Tianshi Li, Cori Faklaris, Jennifer King, Yuvraj Agarwal, Laura Dabbish, Jason I Hong, et al. 2020. Decentralized is not risk-free: Understanding public perceptions of privacy-utility trade-offs in COVID-19 contact-tracing apps. arXiv preprint arXiv:2005.11957 (2020).
[42] Joseph K Liu, Man Ho Au, Tsz Hon Yuen, Cong Zuo, Jiawei Wang, Amin Sakzad, Xiapu Luo, and Li Li. [n.d.]. Privacy-Preserving COVID-19 contact tracing app: a zero-knowledge proof approach. ([n. d.]).
[43] Tianming Liu, Haoyu Wang, Li Li, Xiapu Luo, Feng Dong, Yao Guo, Liu Wang, Tegawendé Bissyandé, and Jacques Klein. 2020. MadDroid: Characterizing and Detecting Devious Ad Contents for Android Apps. In Proceedings of The Web Conference 2020. 1715–1726.
[44] Caio Milani. [n.d.]. Understanding Multidex in Android. https://blog.mindorks.com/understanding-multidex-in-android
[45] Israel Ministry of Health. 2020. Hamagen. https://govextra.gov.il/ministry-of-health/hamagen-app/, 2020, accessed: 2020-04-23
[46] Jim Mussared. [n.d.]. Privacy issues discovered in the BLE implementation of the COVIDSafe Android app. https://docs.google.com/document/d/1u5a5ersKBH6eG362atALrzuXo3zuZ70qrGomWVEC27U/edit#heading=h.q4sovraiy4kn
[47] Richard Nelson. [n.d.]. COVIDSafe iOS 1.5 bugs. https://docs.google.com/document/d/1dsSxC48cJ91X17PoOybpun1U163YDxxL0CDk3kmAHvY/mobilebasic
[48] World Health Organization. 2020. Operational considerations for case management of COVID-19 in health facility and community. https://apps.who.int/iris/bitstream/handle/10665/331492/WHO-2019-nCoV-HCF_operations-2020.1-eng.pdf
[49] Charlie Osborne. [n.d.]. New ransomware masquerades as COVID-19 contact-tracing app on your Android device. https://www.zdnet.com/article/new-crycryptor-ransomware-masquerades-as-covid-19-contact-tracing-app-on-your-device/
[50] Raymond Zhong Paul Mozur and Aaron Krolik. 2020. In Coronavirus Fight, China Gives Citizens a Color Code, With Red Flags. https://www.nytimes.com/2020/03/01/business/china-coronavirus-surveillance.html, 2020, accessed: 2020-05-07
[51] Iasonas Polakis, George Argyros, Theofilos Petsios, Suphannee Sivakorn, and Angelos D Keromytis. 2015. Where’s Wally? Precise user discovery attacks in location proximity services. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. 817–828.
[52] Ramesh Raskar, Isabel Schunemann, Rachel Barbar, Kristen Vilcans, Jim Gray, Praneeth Vepakomma, Suraj Kapa, Andrea Nuzzo, Rajiv Gupta, Alex Berke, et al. 2020. Apps gone rogue: Maintaining personal privacy in an epidemic. arXiv preprint arXiv:2003.08567 (2020).
[53] Siegfried Rasthofer, Steven Arzt, and Eric Bodden. 2014. A machine-learning approach for classifying and categorizing Android sources and sinks. In NDSS, Vol. 14. Citeseer, 1125.
[54] Yilin Shen, Fengjiao Wang, and Hongxia Jin. 2014. Defending against user identity linkage attack across multiple online social networks. In Proceedings of the 23rd International Conference on World Wide Web. 375–376.
[55] Milan Stute, Sashank Narain, Alex Mariotto, Alexander Heinrich, David Kreitschmann, Guevara Noubir, and Matthias Hollick. 2019. A billion open interfaces for Eve and Mallory: MitM, DoS, and tracking attacks on iOS and macOS through Apple Wireless Direct Link. In 28th USENIX Security Symposium (USENIX Security 19). 37–54.
[56] Yvonne Taunton. 2020. Smartphone-based automated contact tracing: Is it possible to balance privacy, accuracy and security? https://www.uab.edu/news/research/item/11299-smartphone-based-automated-contact-tracing-is-it-possible-to-balance-privacy-accuracy-and-security, 2020, accessed: 2020-05-07
[57] The PEPP-PT team. 2020. PEPP-PT High Level Overview. https://github.com/pepp-pt/pepp-pt-documentation/blob/master/PEPP-PT-high-level-overview.pdf
[58] Carmela Troncoso, Mathias Payer, Jean-Pierre Hubaux, Marcel Salathé, James Larus, Edouard Bugnion, Wouter Lueks, Theresa Stadler, Apostolos Pyrgelis, Daniele Antonioli, et al. 2020. Decentralized privacy-preserving proximity tracing. arXiv preprint arXiv:2005.12273 (2020).
[59] Serge Vaudenay. 2020. Analysis of DP3T Between Scylla and Charybdis. (2020). https://eprint.iacr.org/2020/399.pdf
[60] Sydney Von Arx and Daniel Blank. 2020. Slowing the Spread of Infectious Diseases Using Crowdsourced Data. Covid Watch (2020).
[61] Gang Wang, Bolun Wang, Tianyi Wang, Ana Nika, Haitao Zheng, and Ben Y Zhao. 2016. Defending against Sybil devices in crowdsourced mapping services. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services. 179–191.
[62] Minhui Xue, Cameron Ballard, Kelvin Liu, Carson Nemelka, Yanqiu Wu, Keith Ross, and Haifeng Qian. 2016. You can yak but you can’t hide: Localizing anonymous social network users. In Proceedings of the 2016 Internet Measurement Conference. 25–31.
[63] Minhui Xue, Yong Liu, Keith W Ross, and Haifeng Qian. 2016. Thwarting location privacy protection in location-based social discovery services. Security and Communication Networks 9, 11 (2016), 1496–1508.

APPENDICES

A STATIC ANALYSIS
Code analysis. MobSF is a state-of-the-art, open-source pen-testing, malware analysis, and security vetting framework [6] that flags vulnerabilities in code.

For context, MobSF works in the following way. The decompiled AndroidManifest.xml file is first parsed to extract essential information about the application, such as Permission, Components, and Intents. Then, the system assesses the permissions requested by the application and examines whether all Components (e.g. Service, Receiver, Activity, Provider) are protected by at least one permission explicitly requested in the manifest file. Other attribute configurations, such as the allowBackup, debuggable, and networkSecurityConfig flags, are also checked.

The class files are subsequently parsed via a Sensitive Data Match module, which utilizes keyword matching, e.g. “password” and “secret”. The Method Extraction module matches methods in class files against pre-defined rules to extract vulnerable methods. For example, if a method contains the keyword .hashCode(), it will be considered as using Java Hash Code, a weak hash function that should not be used in a secure cryptography implementation. However, as a weakness could be defined in third-party APIs, the vulnerable method may never be executed at run time. To address this, the Determining Vulnerable Calls module vets whether a vulnerable method is actually called and assesses whether the sensitive data is accessed. The system records all the vulnerabilities listed in the Manifest Weaknesses and Vulnerabilities categories in Table 4. Further, the trackers in apps, e.g. Google Firebase Analytics, Facebook Analytics, and Microsoft Appcenter Analytics, are detected by the Tracker Detection module and recorded in the Privacy Leaks category in Table 4.

The vetted vulnerabilities include SQL injection, IP address disclosure, hard-coded encryption keys, improper encryption, use of insufficiently random values (CWE-330) [3], insecure hash functions, and enabled remote WebView debugging. We detail the manual inspections subsequently adopted in the vetting process in Appendix B and the limitations of the code analysis vetting methods in Appendix D.

Data flow analysis. We conduct a data flow analysis using FlowDroid [15] to screen for high-risk privacy leaks. Such data flow analysis extracts the paths from data sources to sinks, and the statements transmitting the data outside of the application. We use the sources and sinks inferred by the SuSi project [53], which defines sources as calls to resource methods, e.g. getLatitude() and database.Cursor.getString(), while sinks are methods that may leak sources, e.g. Log.e() and Bundle.putAll().

FlowDroid searches the application for lifecycle and callback methods and then generates a call graph. Starting at the detected sources, the analysis tracks taints by traversing the call graph. If private data flows from a source to a sink, it indicates a risky privacy leak path. To remove false positives, we conduct a backward flow analysis. If the vulnerable code is reachable, we determine it is a valid privacy leak. For example, if we find sensitive data flowing into a sink (e.g. Bundle, Log output, SMS) that unauthorized users can access, we trace it backwards to its source and confirm whether the source is reachable. If reachable, we consider it a privacy leakage. Similar to the code analysis phase, we also conduct a manual inspection, described in Appendix C. We describe the limitations of the vetting method in Appendix D.

B CODE ANALYSIS: MANUAL INSPECTIONS
To increase the accuracy of the vetting results, we manually verified the testing results of MobSF. First, considering that MobSF mainly relies on keyword and sentence matching in APIs, we checked the rules defined in MobSF’s source code. Then, we collected the rules defined with weak keywords. For example, if a rule only uses Log.v or System.out.print to find “sensitive data logging”, without checking whether the data is sensitive, we consider it a weak rule. We automated the extraction of all Java files identified by these weak rules for manual inspection. Finally, we removed the false-positive cases from the testing results.

C DATA FLOW ANALYSIS: MANUAL INSPECTIONS
For data flow taint analysis, we manually reviewed the FlowDroid results of all 34 apps. Since all APK files had been decompiled by MobSF, sink paths in the XML reports generated by FlowDroid were closely compared with the decompiled source code in order to identify false-positive cases. First, sink paths were picked out to analyze the possibility of potential leakage. If any of these paths was suspected to be a false positive, we loaded the source code decompiled by MobSF in Visual Studio Code [13] and used its global search feature to find the invoked methods mentioned in the suspected sink path. Finally, after analyzing the logic of the source code, false-positive cases were confirmed.

D THREATS TO VALIDITY
Potential limitations to our methodology. Considering that the core mechanisms of both MobSF and FlowDroid rely heavily on keyword matching, a potential cause of false negatives is the limited scope of the keywords. Concretely, in MobSF, there may exist vulnerabilities not defined in the analysis rules. Similarly, in our data flow vetting, as we utilize the sources and sinks extracted by the SuSi project, there may exist sensitive leakage that does not match any sources or sinks, or that is not detected. We aim to reduce false negatives by updating the rules and keyword databases of MobSF and FlowDroid in future work. Currently, our vetting focuses more on the identified vulnerabilities and privacy leakage paths.
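
The source-to-sink tracking described under Data flow analysis in Appendix A can be illustrated with a small sketch. This is not FlowDroid's implementation: the toy call graph, the SOURCES and SINKS sets, and the function names find_leak_paths and reaches_backward are all hypothetical, chosen only to show a forward traversal from sources to sinks followed by a backward reachability check from each sink.

```python
from collections import deque

# Hypothetical call graph: each method maps to the methods it passes data to.
CALL_GRAPH = {
    "getLatitude": ["formatLocation"],
    "formatLocation": ["Log.e", "renderMap"],
    "renderMap": [],
    "Log.e": [],
    "readSettings": ["Bundle.putAll"],
    "Bundle.putAll": [],
}
SOURCES = {"getLatitude"}           # assumed sensitive sources
SINKS = {"Log.e", "Bundle.putAll"}  # assumed sinks


def find_leak_paths(graph, sources, sinks):
    """Forward pass: breadth-first search from each source, keeping paths that end in a sink."""
    paths = []
    for source in sorted(sources):
        queue = deque([[source]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                paths.append(path)
                continue
            for callee in graph.get(node, []):
                if callee not in path:  # skip cycles
                    queue.append(path + [callee])
    return paths


def reaches_backward(graph, sink, sources):
    """Backward pass: follow reversed edges from a sink; True if any source is reachable."""
    reverse = {}
    for caller, callees in graph.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    seen, stack = set(), [sink]
    while stack:
        node = stack.pop()
        if node in sources:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(reverse.get(node, []))
    return False


leaks = find_leak_paths(CALL_GRAPH, SOURCES, SINKS)
confirmed = [p for p in leaks if reaches_backward(CALL_GRAPH, p[-1], SOURCES)]
print(confirmed)
```

In this toy graph only the getLatitude path to Log.e is reported; Bundle.putAll is a sink but is fed solely by readSettings, which is not in the assumed source set, so the backward check returns False for it. A real analysis tracks taint through fields, parameters, and callbacks rather than bare method names.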