Deciphering Malware’s Use of TLS (without Decryption)
Blake Anderson, Subharthi Paul, David McGrew
arXiv:1607.01639v1 [cs.CR] 6 Jul 2016

Abstract: The use of TLS by malware poses new challenges to network threat detection because traditional pattern-matching techniques can no longer be applied to its messages. However, TLS also introduces a complex set of observable data features that allow many inferences to be made about both the client and the server. We show that these features can be used to detect and understand malware communication, while at the same time preserving the privacy of benign uses of encryption. These data features also allow for accurate malware family attribution of network communication, even when restricted to a single, encrypted flow.

To demonstrate this, we performed a detailed study of how TLS is used by malware and enterprise applications. We provide a general analysis on millions of TLS encrypted flows, and a targeted study on 18 malware families composed of thousands of unique malware samples and tens of thousands of malicious TLS flows. Importantly, we identify and accommodate the bias introduced by the use of a malware sandbox. The performance of a malware classifier is correlated with a malware family’s use of TLS, i.e., malware families that actively evolve their use of cryptography are more difficult to classify.

We conclude that malware’s usage of TLS is distinct from benign usage in an enterprise setting, and that these differences can be effectively used in rules and machine learning classifiers.

I. INTRODUCTION

Encryption is necessary to protect the privacy of end users. In a network setting, Transport Layer Security (TLS) is the dominant protocol to provide encryption for network traffic. While TLS obscures the plaintext, it also introduces a complex set of observable parameters that allow many inferences to be made about both the client and the server.

Legitimate traffic has seen a rapid adoption of the TLS standard over the past decade, with some studies stating that as much as 60% of network traffic uses TLS [1]. Unfortunately, malware has also adopted TLS to secure its communication. In our dataset, ∼10% of the malware samples use TLS. This trend makes threat detection more difficult because it renders the use of deep packet inspection (DPI) ineffective. It is important to determine whether encrypted network traffic is benign or malicious, and to do so in a way that preserves the integrity of the encryption. And while 10% of malware samples utilizing TLS seems low, we make the assumption that this number will increase as the level of encryption in network traffic increases. Along these lines, we have seen a slight, but statistically significant, increase in malicious, encrypted traffic over the past 12 months.

To further motivate the need for a study exposing malware’s use of TLS, we considered the limitations of a pattern-matching approach when faced with TLS, and analyzed a popular community Intrusion Protection System (IPS) rule set [32]. As of this writing, there were 3,437 rules in that set, 3,307 of which inspect packet contents. Only 48 rules were TLS specific, and of those, only 6 detected malware, using strings in self-signed certificates. Of the remainder, 19 detect Heartbleed or other overflow attacks against TLS implementations, and 23 detect plaintext over ports typically assigned to TLS. These numbers show that traditional signature-based techniques have not heavily invested in TLS-specific malware signatures to date. However, the rules that match certificate strings hint that it is possible to detect malware through the passive inspection of TLS. Our goal in this paper is to confirm and substantiate this idea, by identifying data features and illustrating methodologies that allow for the creation of rules and machine learning classifiers that can detect malicious, encrypted network communication. For instance, we identify features of both the TLS client and server gathered from unencrypted handshake messages that could be used to create IPS rules.

In this paper, we provide a comprehensive study of malware’s use of TLS by observing the unencrypted TLS handshake messages. We give a high-level overview of malware’s use of TLS compared to what we have observed on an enterprise network. Enterprise traffic typically uses up-to-date cryptographic parameters that are indicative of up-to-date TLS libraries. On the other hand, malware typically uses older and weaker cryptographic parameters. Malware’s usage of TLS is distinct compared to enterprise traffic, and, for most families, this fact can be leveraged to accurately classify malicious traffic patterns. We examine these differences from both a TLS client and a TLS server perspective.

In addition to our in-depth technical analysis, it is interesting to note the general tone that malware authors have towards encryption. There is an FAQ section in the open-sourced Zeus/Zbot malware [3] where the following question and answer occur (content left as is):

Question: Why traffic is encrypted with symmetric encryption method (RC4), but not asymmetric (RSA)?

Answer: Because, in the use of sophisticated algorithms it makes no sense, encryption only needs to hide traffic.

In the current privacy climate, this attitude most certainly does not hold for enterprise network traffic [4], [26]. Again, this
divergence is another tool we can take advantage of to more accurately classify malicious flows.

When applying machine learning classifiers on a per-family basis, it is clear that some families/subfamilies are more difficult to classify. Our goal is not to show optimized machine learning classifiers, but rather to identify what characteristics of the specific family make it difficult to classify. For instance, we can correlate poor classifier performance on encrypted traffic patterns with one family’s use of strong [33] and varied cryptography. We also examine additional features extracted from unencrypted TLS handshake messages that significantly increase the performance of the classifiers. In general, we have found this approach to be fruitful: identify weaknesses in the features used to represent a flow on a per-family basis, and then augment that representation with more informative features.

Finally, we show how we can perform family attribution given only network-based data. This problem is positioned as a multi-class classification problem where each malware family has its own label. We identify families that use identical TLS parameters, but can still be accurately classified because their traffic patterns with respect to other flow-based features are distinct. We also identify subfamilies of malware that cannot be distinguished from one another with only their network data. We are able to achieve an accuracy of 90.3% for the family attribution problem when restricted to a single, encrypted flow, and an accuracy of 93.2% when we make use of all encrypted flows within a 5-minute window.

We use a commercial sandbox environment to collect the first five minutes of a malware sample’s network activity. We collected tens of thousands of unique malware samples and hundreds of thousands of malicious, encrypted flows from these samples. We collected millions of TLS encrypted flows from an enterprise network to compare against the malware data. We used an open source project to collect the data and transform it to a JSON format that contained the typical network 5-tuple, the sequence of packet lengths and inter-arrival times, the byte distribution, and the unencrypted TLS handshake information. All of the analysis done in this paper uses only network data, and does not assume an endpoint presence.

The remainder of the paper is organized as follows: Section II outlines some basic assumptions we make with respect to the data and our methodology, and Section III reviews how we obtained our data, specifies the datasets we use for each experiment, and what features we use to classify the network flows. Section IV gives an overview of how malware’s usage of TLS differs from that of an enterprise network from both the perspective of a TLS client and a TLS server. Section V shows which families are difficult to classify from a network flow point-of-view, and explains why this is the case, and Section VI gives results showing how we can attribute a flow to a particular family. Section VII reviews previous and related work, Section VIII outlines some limitations of our approach, and finally, we conclude in Section IX.

II. PRELIMINARIES AND ASSUMPTIONS

Our primary concern in this paper is to categorize and classify malicious, TLS encrypted flows. While we do use the serverHello and certificate messages to highlight some interesting features about the servers that the malware samples are connecting to, our main focus is client oriented. The classification algorithms we develop are heavily dependent on client-based features, which allows our algorithms to correctly classify a malicious agent connecting to google.com versus a typical enterprise agent connecting to google.com, i.e., we can leverage the client’s cryptographic parameters to differentiate these two events. For this reason, we do not filter the malware’s TLS traffic to only include command and control flows, but also allow other types of TLS-encrypted traffic such as click-fraud.

In this paper, we focus on TLS encrypted flows over port 443 to make the comparisons between enterprise TLS and malicious TLS as unbiased as possible. To further motivate this choice, Table I lists the 5 most used ports for TLS by the malware samples collected between August 2015 and May 2016. To determine if a flow was TLS, we used deep packet inspection and a custom signature based on the TLS versions and message types of the clientHello and serverHello messages. In total, we found 229,364 TLS flows across 203 unique ports, and port 443 was by far the most common port for malicious TLS. Although the diversity of port usage in malware was great, these diverse ports were relatively uncommon.

Port    Percentage of TLS Flows
443     98.4%
9001    1.2%
80      0.1%
9101    0.1%
9002    0.1%

TABLE I: Based on malware data collected between August 2015 and May 2016, we investigated which ports malware used the most for TLS-encrypted communication.

Given that our non-malware data was collected on an enterprise network, it naturally follows that the categorization and classification results presented in this paper are most applicable to the enterprise setting. We do not claim that these results hold for the general class of networks, e.g., service provider data. That being said, we do believe that securing enterprise networks is an important use case and that the conclusions presented in this paper offer enterprise network operators significantly novel and valuable results.

The enterprise network data used in this paper was initially filtered using a well known IP blacklist [10]. This removed ∼0.05% of the initial traffic. After this filtering stage, we take the data “as-is”. We are aware that there is most likely more malicious traffic in this dataset, but this fact is just taken as a base assumption for reasons of practicality.

III. DATA

The data for this paper was collected from a commercial sandbox environment where users can submit suspicious executables. Each submitted sample is allowed to run for 5 minutes. The full packet capture is collected and stored for each sample. Due to constraints of the sandbox environment,
all network traffic observed in the sandbox is considered to be that of the originally submitted sample. For instance, if sample A downloads and installs B and C, then the traffic generated from B and C would be considered A’s.

This method of data collection is straightforward, and while it ignores some details about what is occurring on the endpoint, it is consistent with our goal of understanding each sample based solely on its network communications. Some biases were introduced with this approach. First, to reduce the number of false positives, we only considered samples that were known bad. In this setting, known bad means hitting on four or more antivirus convictions from unique vendors in VirusTotal [2]. Second, due to hardware constraints, the samples are only allowed to run for 5 minutes in a Windows XP-based virtual machine. Any encrypted network traffic that happens after this initial 5-minute window will not be captured. Similarly, any samples that are not compatible with Windows XP will not run in this environment.

The enterprise data was collected from an enterprise network with ∼500 active users and ∼4,000 unique IP addresses. The majority of the machines on this network run Windows 7, with the second most popular operating system being OS X El Capitan.

A. Dataset and Sample Selection

The malware traffic used in this paper was collected from August 2015 to May 2016, and the enterprise traffic was collected during a 4-day period in May 2016 and a 4-day period in June 2016. In this work, we performed several experiments on different subsets of this data.

We first analyze the differences between the TLS parameters typically seen on an enterprise network versus the TLS parameters used by the general malware population. To proceed, we first removed all of the TLS flows that offered an ordered ciphersuite list that matched a list found in the default Windows XP SChannel implementation [23]. This was done to help ensure that the TLS clients we observed were representative of the malware’s behavior and not that of the TLS library provided by the underlying operating system. This removed ∼40% of the malicious TLS flows and ∼0.4% of the enterprise TLS flows. After this filtering stage, we used all of the TLS flows in our dataset. From August 2015 until May 2016, we collected 133,744 TLS flows initiated by malicious programs. During the 4-day periods in May and June 2016, we collected 1,500,005 TLS flows from an enterprise network. All of these TLS flows successfully negotiated the full TLS handshake and sent application data.

To analyze the differences between the TLS parameters used by different malware families, we used the malware samples from October 2015 to May 2016 that had an identifiable family name. Table II gives a summary of the number of samples and flows for each malware family. The family name was generated by a majority vote from the signatures provided by VirusTotal [2]. Malware samples without a clear family name were discarded, i.e., any sample without at least four different antivirus programs using the same name (ignoring common names such as Trojan). Family names with fewer than 100 flows were not used. This process pruned our original set of 20,548 samples that used TLS to 5,623 unique samples across 18 families. It is difficult to determine the family, if any, associated with a malware sample, even with the information provided through dynamic analysis in a sandbox setting. These samples generated 25,793 TLS encrypted flows that successfully negotiated the TLS handshake and sent application data.

Malware Family    Unique Samples    Encrypted Flows
Bergat            192               332
Deshacop          69                129
Dridex            38                103
Dynamer           118               372
Kazy              228               1,152
Parite            111               275
Razy              117               564
Sality            612               1,200
Skeeyah           81                218
Symmi             494               2,618
Tescrypt          137               205
Toga              156               404
Upatre            377               891
Virlock           1,208             12,847
Virtob            115               511
Yakes             100               337
Zbot              1,291             2,902
Zusy              179               733
Total             5,623             25,793

TABLE II: Summary of the malicious families used in our analysis. We collected 18 malicious families, 5,623 malicious samples, and 25,793 encrypted flows that successfully negotiated the TLS handshake and sent application data.

In this paper, we also make use of machine learning classifiers in three experiments. The first is to demonstrate the value of the additional TLS features through 10-fold cross-validation. For this experiment, we use all of the malicious TLS flows collected from August 2015 until May 2016, and a random subset of the May and June 2016 enterprise network’s TLS flows. In total, there were 225,740 malicious and 225,000 enterprise flows for this experiment. To account for the bias that the Windows XP-based sandbox could introduce, we also present results on a dataset composed of only flows that offered an ordered ciphersuite list that did not match a list found in the default Windows XP SChannel implementation: 133,744 malicious and 135,000 enterprise TLS flows.

In the next set of experiments, we analyzed how well a trained classifier is able to detect the TLS flows generated by the different malware families. To train the classifier, we used the same 225,000 enterprise flows as above for the negative class, and 76,760 malicious TLS flows collected during August and September 2015 for the positive class. The testing data consisted of the TLS flows from October 2015 to May 2016 that could be assigned a ground truth family as described above. Again, Table II gives a summary of the number of samples and flows for each malware family. While we do not remove flows that offered an ordered ciphersuite list that matched a list found in the default Windows XP SChannel implementation in this experiment, we do make explicit the
families that have this bias.

Finally, to assess the malware family attribution potential of TLS handshake metadata, we used 10-fold cross-validation and multi-class classification on the data listed in Table II. Again, we do not remove samples that offered an ordered ciphersuite list that matched a list found in the default Windows XP SChannel implementation in this experiment because all of the samples would have the same bias.

B. Feature Extraction

We wrote software tools to extract the data features of interest from live traffic or packet capture files. The open source project exports all of the data in a convenient JSON format. The machine learning classifiers are built using traditional flow features, traditional “side-channel” features, and features collected from the unencrypted TLS handshake messages.

1) Flow Metadata: The first set of features investigated are modeled around traditional flow data that is typically collected in devices configured to export IPFIX/NetFlow. These features include the number of inbound bytes, outbound bytes, inbound packets, outbound packets; the source and destination ports; and the total duration of the flow in seconds. These features were normalized to have zero mean and unit variance.

2) Sequence of Packet Lengths and Times: The sequence of packet lengths and packet inter-arrival times (SPLT) has been well studied [25], [39]. In our open source implementation, the SPLT elements are collected for the first 50 packets of a flow. Zero-length payloads (such as ACKs) and retransmissions are ignored.

A Markov chain representation is used to model the SPLT data. For both the lengths and times, the values are discretized into equally sized bins, e.g., for the length data, 150 byte bins are used where any packet size in the range [0,150) will go into the first bin, any packet size in the range [150,300) will go into the second bin, etc. A matrix A is then constructed where each entry, A[i, j], counts the number of transitions between the i’th and j’th bin. Finally, the rows of A are normalized to ensure a proper Markov chain. The entries of A are then used as features to the machine learning algorithms.

3) Byte Distribution: The byte distribution is a length-256 array that keeps a count for each byte value encountered in the payloads of the packets in the flow. The byte value probabilities can be easily computed by dividing the byte distribution counts by the total number of bytes found in the packets’ payloads. The 256 byte distribution probabilities are used as features by the machine learning algorithms. The full byte distribution provides a lot of information about the encoding of the data. Additionally, the byte distribution can give information about the header-to-payload ratios, the composition of the application headers, and whether any poorly implemented padding is added.

4) Unencrypted TLS Header Information: TLS (Transport Layer Security) is a cryptographic protocol that provides privacy for applications. TLS is commonly used to secure application protocols such as HTTP for web browsing or SMTP for email. HTTPS is the usage of HTTP over TLS, which is the most popular way of securing communication between a web server and client, and is supported by most major web servers. HTTPS typically uses port 443.

The TLS version, the ordered list of offered ciphersuites, and the list of supported TLS extensions are collected from the client hello message. The selected ciphersuite and selected TLS extensions are collected from the server hello message. The server’s certificate is collected from the certificate message. The client’s public key length is collected from the client key exchange message, and is the length of the RSA ciphertext or DH/ECDH public key, depending on the ciphersuite. Similar to the sequence of packet lengths and times, the sequence of record lengths, times, and types is collected from TLS sessions.

In our classification algorithms, the list of offered ciphersuites, the list of advertised extensions, and the client’s public key length were used. 176 offered ciphersuite hex codes were observed in our full dataset, and a binary vector of length 176 was created where a one is assigned to each ciphersuite in the list of offered ciphersuites. Similarly, we observed 21 unique extensions, and a binary vector of length 21 was created where a one is assigned to each extension in the list of advertised extensions. Finally, the client’s public key length was represented as a single integer value. In total, 198 TLS client-based features (176 + 21 + 1) were used in the classification algorithms. In some experiments, we use an additional TLS server-based binary feature: whether the certificate was self-signed or not.

IV. MALWARE FAMILIES AND TLS

Although malware uses TLS to secure its communication, our data suggests that for the majority of the families we analyzed, malware’s use of TLS is quite distinct from that of the enterprise network’s traffic. In this section, we highlight these differences from the perspective of the TLS client and also from the perspective of the TLS server.

For the comparisons between general malware and enterprise traffic, we first removed all of the TLS flows that offered an ordered ciphersuite list that matched a list found in the default Windows XP SChannel implementation [22], [29] from our full dataset. We found that ∼40% of TLS flows from malware samples offered this list. To help ensure that our analysis was capturing trends in the malware’s use of TLS, and not that of the underlying operating system, we removed all of these flows. After this filtering stage, we used all of the TLS flows in our dataset. From August 2015 to May 2016, we collected 133,744 TLS flows initiated by malicious programs that successfully negotiated the full TLS handshake and sent application data. In May and June 2016, we collected 1,500,005 TLS flows from an enterprise network using the same criteria.

The malware data collection process can introduce biases in terms of malware family representation, and the conclusions that can be made from the TLS parameters collected. To account for this, we also analyze the TLS clients that malware uses and the TLS servers that malware connects to on a per-family basis. In this analysis, we highlight the families that use the default Windows TLS library, and the families which include their own TLS client. The data for this experiment is listed in Table II.
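The SPLT Markov chain and byte distribution features described in Section III-B can be sketched as follows. This is a minimal illustration and not the open source implementation used for the paper: only the 150-byte bin width, the 50-packet limit, and the length-256 byte count come from the text. The number of bins (`NUM_BINS = 10`), the clamping of larger packets into the last bin, the choice to drop zero-length payloads before truncating to 50 packets, and all function names are assumptions made for the example.

```python
from typing import List

BIN_SIZE = 150     # bytes per length bin, as stated in the text
NUM_BINS = 10      # assumption: the text does not give the number of bins
MAX_PACKETS = 50   # SPLT is collected for the first 50 packets of a flow


def length_bin(size: int) -> int:
    """Map a payload length to its bin: [0,150) -> 0, [150,300) -> 1, ...

    Assumption: lengths beyond the last bin boundary share the last bin.
    """
    return min(size // BIN_SIZE, NUM_BINS - 1)


def splt_length_features(lengths: List[int]) -> List[float]:
    """Row-normalized bin-transition matrix A, flattened row by row."""
    # Zero-length payloads (e.g. ACKs) are ignored; filtering before
    # truncation is an assumption about the order of the two steps.
    bins = [length_bin(n) for n in lengths if n > 0][:MAX_PACKETS]

    counts = [[0.0] * NUM_BINS for _ in range(NUM_BINS)]
    for i, j in zip(bins, bins[1:]):
        counts[i][j] += 1.0          # A[i][j] counts transitions i -> j

    features: List[float] = []
    for row in counts:
        total = sum(row)
        # Rows with no observed transitions are left as all zeros.
        features.extend(c / total if total else 0.0 for c in row)
    return features


def byte_distribution(payloads: List[bytes]) -> List[float]:
    """Length-256 vector of byte value probabilities over all payloads."""
    counts = [0] * 256
    for payload in payloads:
        for value in payload:
            counts[value] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]
```

The same binning and transition counting would apply to inter-arrival times with a time-based bin width; the text does not state that width, so it is left out here.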
[Figure 1: bar charts of the percentage of flows for benign and malware traffic, in four panels: Offered Ciphersuites, Advertised TLS Extensions, Client’s Public Key Length, and TLS Client.]

Fig. 1: Malware’s use of TLS versus that of enterprise network traffic relative to the TLS client features. Some values and the full ciphersuite names were omitted for clarity of presentation. Ciphersuites and extensions are represented as hex codes, which are given in full in Appendix A.
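The client features compared in Fig. 1 are also the ones fed to the classifiers. Per Section III-B, they are encoded as a binary vector over the observed ciphersuite codes, a binary vector over the observed extensions, and the client’s public key length as a single integer. The sketch below shows that encoding on deliberately tiny, hypothetical vocabularies (the paper uses 176 ciphersuite codes and 21 extensions, and does not specify their ordering). The function names and the decision to ignore values outside the vocabulary are assumptions.

```python
from typing import Dict, List, Sequence


def build_index(vocabulary: Sequence[str]) -> Dict[str, int]:
    """Assign each known ciphersuite code or extension name a vector position."""
    return {item: position for position, item in enumerate(vocabulary)}


def one_hot_set(items: Sequence[str], index: Dict[str, int]) -> List[int]:
    """Binary vector with a one for every item that appears in `items`."""
    bits = [0] * len(index)
    for item in items:
        if item in index:            # assumption: unknown values are ignored
            bits[index[item]] = 1
    return bits


def tls_client_features(
    offered_ciphersuites: Sequence[str],
    advertised_extensions: Sequence[str],
    public_key_length: int,
    suite_index: Dict[str, int],
    extension_index: Dict[str, int],
) -> List[int]:
    """Concatenate ciphersuite bits, extension bits, and the key length."""
    return (
        one_hot_set(offered_ciphersuites, suite_index)
        + one_hot_set(advertised_extensions, extension_index)
        + [public_key_length]
    )


# Hypothetical miniature vocabularies, for illustration only.
SUITE_INDEX = build_index(["0004", "0005", "000a", "002f"])
EXTENSION_INDEX = build_index(
    ["ec_point_formats", "supported_groups", "SessionTicket TLS"]
)

example = tls_client_features(
    ["002f", "0005"], ["supported_groups"], 2048, SUITE_INDEX, EXTENSION_INDEX
)
```

With the full vocabularies the vector length would be 176 + 21 + 1 = 198, matching the feature count given in the text. Note that this encoding discards the order of the offered ciphersuite list.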
Malware   Number    Most Seen      Distinct Ciphersuite  Most Frequently Advertised                              Client’s
Family    of Flows  TLS Client     Offer Vectors         Extension(s)                                            Public Key
Bergat    332       IE 8*          1                     None                                                    2048-bit (RSA)
Deshacop  129       Tor Browser 4  3                     SessionTicket TLS                                       2048-bit (RSA)
Dridex    103       IE 11          5                     ec_point_formats, supported_groups, renegotiation_info  2048-bit (RSA)
Dynamer   372       Tor 0.2.2      10                    SessionTicket TLS                                       512-bit (ECDHE_RSA)
Kazy      1,152     IE 8*          5                     None                                                    2048-bit (RSA)
Parite    275       IE 8*          11                    None                                                    2048-bit (RSA)
Razy      564       Tor Browser 4  8                     None                                                    2048-bit (RSA)
Sality    1,200     IE 8*          133                   None                                                    2048-bit (RSA)
Skeeyah   218       Tor 0.2.7      11                    SessionTicket TLS                                       512-bit (ECDHE_RSA)
Symmi     2,618     Opera 15       19                    ec_point_formats, supported_groups                      512-bit (ECDHE_RSA)
Tescrypt  205       IE 8*          6                     None                                                    2048-bit (RSA)
Toga      404       Tor 0.2.2      2                     SessionTicket TLS, ec_point_formats, supported_groups   2048-bit (RSA)
Upatre    891       IE 8*          3                     None                                                    2048-bit (RSA)
Virlock   12,847    Opera 12       1                     signature_algorithms                                    2048-bit (DHE_RSA)
Virtob    511       IE 8*          4                     None                                                    2048-bit (RSA)
Yakes     337       IE 8*          3                     None                                                    2048-bit (RSA)
Zbot      2,902     IE 8*          12                    None                                                    2048-bit (RSA)
Zusy      733       IE 8*          7                     None                                                    2048-bit (RSA)

TABLE III: The most popular TLS client configurations for the 18 malicious families. The TLS client was estimated using TLS fingerprinting techniques [29]. For TLS extensions, in the case of a tie, all equally probable extensions are listed. (*) indicates the fingerprint of the TLS client provided by the underlying sandbox operating system.

Finally, we mapped the TLS client parameters to well known client programs that use specific TLS libraries and configurations [29]. This information could be spoofed, but we feel that this is still a valuable and compact way to represent a client. As shown in Figure 1, the most popular clients for malware and enterprise TLS connections are quite distinct. In the enterprise setting, we found that the four most common client configurations resembled the most recent releases of the four most popular browsers: Firefox 47, Chrome 51, Internet Explorer 11, and Safari 9. On the other hand, malware most frequently used TLS client parameters that matched those of Opera 12, Firefox 46, and Tor 0.2.x.

2) Malware Families: Table III gives the most popular TLS client parameters for each of the 18 malware families we had access to. The most popular TLS client was Internet Explorer 8, which was used most frequently by 10 of the 18 families. These families and client values are listed for completeness, but should more accurately be read as utilizing the TLS library provided by the underlying Windows environment.

The Tor client and browser were very popular among the malware families, being the most popular with Deshacop, Dynamer, Razy, Skeeyah, and Toga. Dynamer, Skeeyah, and Symmi all used a 512-bit (ECDHE_RSA) public key as opposed to the most popular public key: 2048-bit (RSA), which is most likely an artifact of the underlying Windows environment.

Table III also lists the number of distinct ciphersuite offer vectors observed for each malware family. In this context, a client is taken to be unique if it has a different set of offered ciphersuites and advertised extensions. Some families have very few unique clients, e.g., Bergat. On the other hand, Sality has a large number of distinct ciphersuite offer vectors. And while Sality’s most used TLS client offered parameters similar to Internet Explorer 8, it had hundreds of other unique combinations of offered ciphersuites and advertised extensions.

B. TLS Servers

1) Malware versus Enterprise: Figure 2 illustrates the differences between the servers connected to by the malware and the enterprise TLS clients after filtering clients that used typical Windows XP ciphersuite lists. The filtering was done for the server statistics because those clients have a significant impact on what is sent in the server hello message.

As seen in Figure 2, the selected ciphersuites of the server hello messages are sharply divided for the majority of enterprise and malicious TLS sessions. The following four ciphersuites were selected by ∼90% of the servers that malware communicated with:

• 0x000a (TLS_RSA_WITH_3DES_EDE_CBC_SHA)
[Figure 2: bar charts of the percentage of flows for benign and malware traffic, with panels including Selected Ciphersuites and Selected TLS Extensions.]

Fig. 2: Malware’s use of TLS versus that of enterprise network traffic relative to the TLS server features. Some values and the full ciphersuite names were omitted for clarity of presentation. Ciphersuites and extensions are represented as hex codes, which are given in full in Appendix A.
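The selected ciphersuite shown in Fig. 2 is read from the unencrypted server hello message. As a rough sketch of how that field can be located, the helper below walks the fixed-layout prefix of a ServerHello. It assumes a single TLS record (TLS 1.0 to 1.2 layout) that begins with an unfragmented ServerHello; `parse_server_hello` is a hypothetical name, and this is not the signature or tool used in the paper.

```python
import struct
from typing import Optional, Tuple

HANDSHAKE_RECORD = 0x16   # TLS record content type: handshake
SERVER_HELLO = 0x02       # handshake message type: ServerHello

RECORD_HEADER_LEN = 5     # content type (1) + version (2) + length (2)
HANDSHAKE_HEADER_LEN = 4  # message type (1) + length (3)
RANDOM_LEN = 32


def parse_server_hello(record: bytes) -> Optional[Tuple[int, int]]:
    """Return (server_version, selected_ciphersuite), or None if the bytes
    do not look like a record starting with a ServerHello."""
    minimum = RECORD_HEADER_LEN + HANDSHAKE_HEADER_LEN + 2 + RANDOM_LEN + 1
    if len(record) < minimum:
        return None

    content_type, _record_version, _record_length = struct.unpack(
        "!BHH", record[:RECORD_HEADER_LEN]
    )
    if content_type != HANDSHAKE_RECORD:
        return None
    if record[RECORD_HEADER_LEN] != SERVER_HELLO:
        return None

    offset = RECORD_HEADER_LEN + HANDSHAKE_HEADER_LEN
    (server_version,) = struct.unpack("!H", record[offset:offset + 2])
    offset += 2 + RANDOM_LEN                 # skip version and random

    session_id_length = record[offset]
    offset += 1 + session_id_length          # skip the session id

    if len(record) < offset + 2:
        return None
    (ciphersuite,) = struct.unpack("!H", record[offset:offset + 2])
    return server_version, ciphersuite
```

For a record carrying a TLS 1.2 ServerHello that selects 0x000a, the function returns `(0x0303, 0x000a)`. Extensions follow the compression method byte and would need a further length-prefixed walk, which is omitted here.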
Malware Number Unique Number of Selected Certificate
Family of Flows Server IPs SS Certs Ciphersuite Subject
Bergat 332 12 0 TLS_RSA_WITH_3DES_EDE_CBC_SHA www.dropbox.com
Deshacop 129 38 0 TLS_RSA_WITH_3DES_EDE_CBC_SHA *.onion.to
Dridex 103 10 89 TLS_RSA_WITH_AES_128_CBC_SHA amthonoup.cy
Dynamer 372 155 3 TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 www.dropbox.com
Kazy 1152 225 52 TLS_RSA_WITH_3DES_EDE_CBC_SHA *.onestore.ms
Parite 275 128 0 TLS_RSA_WITH_3DES_EDE_CBC_SHA *.google.com
Razy 564 118 16 TLS_RSA_WITH_RC4_128_SHA baidu.com
Sality 1,200 323 4 TLS_RSA_WITH_3DES_EDE_CBC_SHA vastusdomains.com
Skeeyah 218 90 0 TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 www.dropbox.com
Symmi 2,618 700 22 TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA *.criteo.com
Tescrypt 205 26 0 TLS_RSA_WITH_3DES_EDE_CBC_SHA *.onion.to
Toga 404 138 8 TLS_RSA_WITH_3DES_EDE_CBC_SHA www.dropbox.com
Upatre 891 37 155 TLS_RSA_WITH_RC4_128_MD5 *.b7websites.net
Virlock 12,847 1 0 TLS_DHE_RSA_WITH_AES_256_CBC_SHA256 block.io
Virtob 511 120 0 TLS_RSA_WITH_3DES_EDE_CBC_SHA *.g.doubleclick.net
Yakes 337 51 0 TLS_RSA_WITH_RC4_128_SHA baidu.com
Zbot 2,902 269 507 TLS_RSA_WITH_RC4_128_MD5 tridayacipta.com
Zusy 733 145 14 TLS_RSA_WITH_3DES_EDE_CBC_SHA *.criteo.com
TABLE IV: TLS server configurations for the servers most visited by the 18 malicious families. The certificate subject typically
has a long tail, but only the most frequent is reported. The reported number of self-signed certificates is not necessarily related
to the most popular certificate subject.
It is also interesting to note the frequency of TLS servers using certificates that are self-signed. In the enterprise data, 1,352 out of the 1,500,005 TLS sessions, or ∼0.09%, used a self-signed certificate. In the malware data, 947 out of the 133,744 TLS sessions, or ∼0.7%, used a self-signed certificate, which is roughly an order of magnitude more frequent than the enterprise case.

2) Malware Families: Table IV lists several interesting statistics about the servers that malware most often connects to. Some of the malicious families connect to a large number of unique IP addresses, e.g., Symmi and Dynamer. The family with the most flows, Virlock, only connects to 1 unique IP address owned by block.io.

We used a logistic regression classifier with an l1 penalty [20] for all classification results. For the initial binary-class classification results, we trained four machine learning classifiers using different subsets of data features we collected. The first classifier used the flow metadata (Meta), the sequence of packet lengths and inter-arrival times (SPLT), and the distribution of bytes (BD). The second classifier only used the TLS information (TLS). The third classifier was trained using the same features as the first, with the addition of the TLS client information, specifically, the offered ciphersuites, advertised extensions, and the client’s public key length. The fourth classifier was trained with all data, and an additional, custom feature: whether the server certificate was self-signed (SS).
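To make the role of the l1 penalty concrete, the following is a minimal sketch of l1-regularized logistic regression. It is not the interior-point method of [20]; it swaps in plain proximal gradient descent (a gradient step followed by soft-thresholding) because that fits in a few lines. The toy feature names, the data, and the values of `l1`, `lr`, and `epochs` are assumptions chosen for illustration only.

```python
import math

def soft_threshold(value, amount):
    """Shrink value toward zero by amount; the proximal step for an l1 penalty."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_l1_logreg(rows, labels, l1=0.05, lr=0.5, epochs=500):
    """Fit weights and bias by proximal gradient descent on the mean logistic loss."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err / n
            for k in range(d):
                grad_w[k] += err * x[k] / n
        bias -= lr * grad_b  # the bias is left unpenalized
        # Gradient step on each weight, then soft-thresholding for the l1 term.
        weights = [soft_threshold(w - lr * g, lr * l1)
                   for w, g in zip(weights, grad_w)]
    return weights, bias

def predict_proba(weights, bias, x):
    """Probability that a flow with features x belongs to the positive class."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

# Hypothetical toy flows: [informative_feature, noise_feature]; label 1 = malware.
ROWS = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
LABELS = [1, 1, 0, 0]
WEIGHTS, BIAS = train_l1_logreg(ROWS, LABELS)
```

In this symmetric toy data the second feature carries no signal, so its weight stays at exactly zero, while the first weight grows until the penalty balances the loss gradient. The probabilistic output of `predict_proba` is what allows the decision threshold to be moved.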
                        All Data                     No SChannel
Dataset                 Total Accuracy   0.01% FDR   Total Accuracy   0.01% FDR
Meta+SPLT+BD+TLS+SS     99.6%            92.6%       99.6%            87.4%
Meta+SPLT+BD+TLS        99.6%            92.8%       99.6%            87.2%
TLS                     98.2%            63.8%       96.7%            59.1%
Meta+SPLT+BD            98.9%            1.3%        98.5%            0.9%
TABLE V: Classifier accuracy for different combinations of data features, showing the overall accuracy and the accuracy at a
0.01% FDR.
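The 0.01% FDR column can be read as the fraction of malicious flows still detected once the decision threshold is tightened until at most 1 false positive is allowed per 10,000 true positives. The sketch below computes that quantity from classifier scores. It follows the paper's wording (false positives counted relative to true positives) rather than the more common FP/(FP+TP) convention, and the function name, scores, and labels are hypothetical.

```python
def tpr_at_fdr(scores, labels, max_fp_per_tp=1e-4):
    """Return (threshold, true positive rate) for the loosest score threshold
    at which false positives do not exceed max_fp_per_tp * true positives."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(labels)
    best = (float("inf"), 0.0)  # no admissible threshold found yet
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Flows with identical scores must be accepted or rejected together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp > 0 and fp <= max_fp_per_tp * tp:
            best = (score, tp / total_pos)
    return best

# Hypothetical classifier scores; label 1 = malware flow, 0 = enterprise flow.
SCORES = [0.99, 0.95, 0.90, 0.80, 0.70, 0.60]
LABELS = [1, 1, 1, 0, 1, 0]
STRICT = tpr_at_fdr(SCORES, LABELS, max_fp_per_tp=0.0)
LOOSE = tpr_at_fdr(SCORES, LABELS, max_fp_per_tp=0.25)
```

With no false positives allowed, only the three highest-scoring malware flows are caught; relaxing the budget admits the fourth. This is why a classifier can show high total accuracy yet a low value in the fixed-FDR column.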
TABLE VI: Classifier accuracy when separated by family. Families with an (*) offered an ordered ciphersuite list that matched a
list found in the default Windows XP SChannel implementation. Malware data from August and September 2015, and enterprise
data from May and June 2016 were used for training; these malware samples were collected from October 2015 to May 2016.
Results using unencrypted TLS handshake messages are given in addition to results based on only standard side-channel features.
The two baselines are the first two data columns: side-channel-only and TLS-only.
of the malicious TLS flows collected from August 2015 until May 2016, and a random subset of the May and June 2016 enterprise network’s TLS flows. In total, there were 225,740 malicious and 225,000 enterprise flows for this experiment. To account for the bias that the Windows XP-based sandbox could introduce, we also present results on a dataset composed of only flows that did not offer an ordered ciphersuite list that matched a list found in the default Windows XP SChannel implementation [29]: 133,744 malicious and 135,000 enterprise TLS flows.

The 10-fold cross-validation results for the above problem are shown in Table V. We see that using all available data views significantly improves the results. A 1-in-10,000 false discovery rate (FDR) is defined as the accuracy on the positive class given that only 1 false positive is allowed for every 10,000 true positives. As these results show, not using TLS header information leads to significantly worse performance, especially in the important case of a fixed, 1-in-10,000 FDR.

The removal of the Windows XP SChannel TLS flows had no effect on the total accuracy of the classifiers based on all data views, but did reduce the performance at a 1-in-10,000 FDR by ∼5%.

B. Malware Families

To determine how well a trained classifier is able to detect the TLS flows generated by the different malware families, we first trained the four classifiers from Table V on the same 225,000 enterprise flows as above for the negative class, and 76,760 malicious TLS flows collected during August and September 2015 for the positive class. These binary classifiers were applied to the testing data consisting of the TLS flows from October 2015 to May 2016, summarized in Table II. While we do not remove flows that offered an ordered ciphersuite list that matched a list found in the default Windows XP SChannel implementation in this experiment, we do make
explicit the families that have this bias in the majority of their flows.

[Figure 3: four bar-chart panels (Offered Ciphersuites, Advertised TLS Extensions, Client's Public Key Length, TLS Client), each showing the percentage of flows for Dridex and Virlock.]
Fig. 3: Dridex’s use of TLS versus that of Virlock’s. Some values and the full ciphersuite names were omitted for clarity of presentation. Ciphersuites and extensions are represented as hex codes, which are given in full in Appendix A.

Table VI lists the classification accuracy of the four classifiers for each family. Because only malware data was used to test the trained classifiers, false positives for this experiment are ill-defined and are therefore not reported. In the August and September 2015 malware training data, there was strong representation of the malicious families presented in this paper, but there were not any exact SHA1 matches. There were four families that had no representation in August or September: Bergat, Yakes, Razy, and Dridex.

For the most part, combining traditional flow metadata, typical side-channel information, and the TLS specific features led to the best performing machine learning models. Out of all families, our classifiers with all data views performed the worst on Deshacop with a 96.1% true positive rate. With respect to only the malware families that primarily used ciphersuites similar to those used by Windows XP SChannel-based clients, our classifiers with all data views performed the worst on Tescrypt with a 97.6% true positive rate. Both of these families most often visited servers with a server certificate subject of *.onion.to, and use TLS client configurations that indicate the Tor Browser for some of their TLS connections. This is particularly interesting because a major goal of the Tor Browser is to maintain the privacy of its users, which in this case are the malware families.

The classifier based only on the TLS data was able to perform quite well on the malware families that used TLS client configurations that matched those of Windows XP SChannel-based clients, but this result is not guaranteed to hold if the malware runs on another operating system. The TLS-only classifier performed the worst on most of the families that used TLS client configurations that did not match those of Windows XP SChannel-based clients, with the exception of Toga and Virlock. Both of these families did a poor job at varying the TLS client parameters in our dataset, and they both used TLS client parameters that indicated older versions of clients: Toga → Tor 0.2.2 and Virlock → Opera 12.

The machine learning classifiers were able to perform reasonably on most malware families, with the exception of Dridex. Dridex was one of four families that did not have
any representation in the training data. The classifier on the other three families, Bergat, Yakes, and Razy, had ∼96-100.0% total accuracy. In the case of Bergat and Yakes, this good performance is expected because these families offered an ordered ciphersuite list that matched a list found in the default Windows XP SChannel implementation.

Figure 3 shows Dridex’s use of TLS from a client point-of-view. Unlike most of the other families, Dridex most often selects:
• 0x002f (TLS_RSA_WITH_AES_128_CBC_SHA)

Figure 2 shows that this ciphersuite is not uncommon for enterprise TLS sessions. Dridex also advertises several TLS extensions and offers many current ciphersuites in the client hello message.

Figure 3 also compares Dridex’s TLS usage with that of Virlock’s. Virlock is an example of a malicious family that used the same TLS client for every sample that we observed, and was able to be easily classified, i.e., all four classifiers achieved 100% accuracy. While Dridex offers a variety of strong ciphersuites, Virlock offers a smaller set of outdated ciphersuites. Virlock also only advertises the signature_algorithms TLS extension. Another significant difference between these two families is that Virlock did not alter its TLS client’s behavior once in our entire dataset. Virlock always used the same client parameters that are similar to those of Opera 12. Virlock’s lack of adaptation makes it trivial for a machine learning, or a rule-based, system to classify. Dridex’s use of multiple TLS clients made a significant difference in terms of detection efficacy.

As we now show, awareness of self-signed certificates proved to be crucial. The classification of Dridex using Meta+SPLT+BD+TLS, 78.5%, does not inspire confidence in a system designed to detect malicious, encrypted traffic. Our hypothesis was that, although Dridex varies the behavior of its TLS clients, there might be an invariant with the servers that Dridex communicates with that would allow us to more easily classify these encrypted flows. Upon manual inspection, this hypothesis was confirmed. We included a binary feature indicating whether the server certificate was self-signed (denoted as SS), and retrained our machine learning classifier with this new feature. The 10-fold cross-validation results on the training data were nearly identical. With the self-signed feature, the new classifier with all data sources achieved an accuracy of 97.9% on Dridex, a significant improvement.

VI. FAMILY ATTRIBUTION

Being able to accurately attribute malware samples to a known family is highly valuable. Attribution provides incident responders with actionable prior information before they begin to reverse engineer malware samples. From a network point-of-view, this attribution can help to prioritize the incident responders' time, i.e., available resources should be assigned to investigate more serious infections. In these results, there are no enterprise samples; we only consider malicious samples and their associated families.

To analyze the differences between the TLS parameters used by different malware families, we used the malware samples from October 2015 to May 2016 that had an identifiable family name as described in Section III. This process pruned our original set of 20,548 samples to 5,623 unique samples across 18 families. These samples generated 25,793 TLS encrypted flows that successfully negotiated the full TLS handshake and sent application data.

[Figure 4: heat map titled "Malware Similarity Matrix w.r.t. TLS Usage"; both axes list the 18 malware families (parite, tescrypt, zbot, kazy, zusy, sality, yakes, razy, deshacop, bergat, upatre, virtob, skeeyah, dynamer, symmi, toga, virlock, dridex), with a color scale running up to 1.0.]
Fig. 4: Similarity Matrix for the different malware families with respect to the observed TLS client’s parameters.
[Figure 5: heat map titled "Confusion Matrix (Total Accuracy=90.3%)"; rows are the true label and columns the predicted label, both ordered kazy, symmi, virlock, yakes, razy, zusy, deshacop, zbot, tescrypt, parite, sality, virtob, toga, upatre, bergat, dynamer, skeeyah, dridex, with a color scale from 0.0 to 1.0.]
Fig. 5: Confusion matrix for the 18-class malware family classifier. The total 10-fold accuracy of the machine learning model
was 90.3%.
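A confusion matrix of the kind shown in Fig. 5 can be sketched as follows: each row is normalized by the number of flows whose true label is that family, so the entries of a row are the fractions of that family's flows assigned to each predicted family. The three-family example, its labels, and the function names are hypothetical.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, families):
    """Row-normalized confusion matrix; rows are true families, columns predicted."""
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = []
    for true_family in families:
        row_total = sum(counts[(true_family, p)] for p in families)
        matrix.append([counts[(true_family, p)] / row_total if row_total else 0.0
                       for p in families])
    return matrix

def total_accuracy(true_labels, predicted_labels):
    """Fraction of flows whose predicted family equals the true family."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Hypothetical per-flow labels for three families.
FAMILIES = ["kazy", "symmi", "virlock"]
TRUE = ["kazy", "kazy", "kazy", "kazy", "symmi", "symmi", "virlock", "virlock"]
PRED = ["kazy", "kazy", "kazy", "symmi", "symmi", "kazy", "virlock", "virlock"]
MATRIX = confusion_matrix(TRUE, PRED, FAMILIES)
ACCURACY = total_accuracy(TRUE, PRED)
```

Reporting the full matrix alongside the total accuracy makes it visible whether a classifier is merely predicting the largest class, since that would concentrate all weight in a single column instead of on the diagonal.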
A. Similar TLS Usage

Figure 4 shows a similarity matrix for the 18 malware families with respect to their TLS clients. The offered ciphersuites, advertised extensions, and the client’s public key length were used as features, and a standard squared exponential similarity function was used to compute the similarity values:

$$\exp\left(-\lambda \sum_{i,j} (x_i - x_j)^2\right) \qquad (1)$$

with $\lambda = 1$, and $x_i$ being the mean of the feature vectors for the i’th family. The diagonal of this matrix will be 1.0 because each family will be perfectly self-similar.

There is a lot of structure in Figure 4. The upper left block consists of families that have some number of flows that use the default Windows XP TLS library. The group of Skeeyah, Dynamer, Symmi, and Toga all heavily make use of offered ciphersuite lists and advertised extensions that are indicative of Tor 0.2.x. Dridex and Virlock were the two most dissimilar malware families. And while Dridex was difficult to accurately classify, Virlock was trivial. Uniqueness is not always a desirable quality.

B. Multi-Class Classification

Finally, to assess the malware family attribution potential of TLS flows, we used the data listed in Table II, and did not remove samples that offered an ordered ciphersuite list that matched a list found in the default Windows XP SChannel implementation in this experiment because all of the samples would have the same bias. We position the problem of attributing a malicious TLS flow to a known malware family as a multi-class classification problem. For this analysis, we use all of the malware families and data features described in Section III. Similar to the enterprise versus malware results in Section V, we used 10-fold cross validation and l1-multinomial logistic regression [21]. We not only present our results in terms of overall classification accuracy, but also as a confusion matrix showing the true positives and false positives broken down per-family. This was done to illustrate that we were not simply using a naïve majority-class classifier, but were in fact making useful inferences.

Using all available data features led to the best cross-validated performance, with a total accuracy of 90.3% for the 18-class classification problem using a single, encrypted flow. The confusion matrix for this problem is shown in Figure 5. For a given row (family) in the confusion matrix, the column entries represent the percentage of samples identified as that specific family. A perfect confusion matrix would have all of its weight focused on the diagonal. As an example, most of Kazy’s TLS flows, the first row, were identified as Kazy, the first column. Some of Kazy’s TLS flows were also identified as Symmi (column: 2), Yakes (column: 4), Razy (column: 5), and Zbot (column: 8).

The majority of the TLS flows were attributed to the appropriate family with ∼80-90% accuracy. Again, the two exceptions are Dridex and Virlock. Attribution for these two families is trivial, in large part because of their distinctive use of TLS compared to other malicious families.

There were two sets of two families that the multiclass classification algorithm had problems differentiating. The first of these was Bergat and Dynamer. Interestingly, Bergat used a Windows XP SChannel-like TLS client, but Dynamer used
a Tor 0.2.2-like TLS client. The confusion came from the other data views, specifically the sequence of packet lengths. Both of these families most often connected to servers at www.dropbox.com, and had similar communication patterns.

Finally, Yakes and Razy were another two malicious families that the multi-class classifier could not differentiate. As with Bergat and Dynamer, the two families shared a most-visited server: Yakes and Razy most often connected to servers at baidu.com. In fact, these two families are subfamilies of the Ramnit family. Upon manual inspection, the network behavior of Yakes and Razy looked mostly identical.

Determining the malware family based on a single, encrypted flow is an unnecessarily difficult problem. In our dataset, the malware samples often created many encrypted flows that can be used for attribution. In this framework, one could initially classify all of the flows in a 5 minute sliding window for a given host, and use the suspicious flows to perform family attribution. We first trained an independent flow, multi-class classifier. Then, for each window in the testing set, each flow was classified, and a majority vote was used to classify all flows within the window. This is similar to ensemble methods in machine learning [16]. The confusion matrix resulting from 10-fold cross validation on this problem looked very similar to that shown in Figure 5. The accuracy of the multi-class problem increased from 90.3% using single, encrypted flows to 93.2% using a simplistic multiple flow algorithm. While there were several families that had improved performance, this simple, multi-flow scheme increased the accuracy of Yakes and Razy most notably. This was most likely because Razy was more promiscuous.

VII. RELATED WORK

Identifying threats in encrypted traffic poses significant challenges. Nevertheless, the security community has put forth two solutions to solve this problem. The first involves decrypting all traffic that flows through a security appliance: Man-in-the-Middle (MITM) [9]. Once the traffic has been decrypted, traditional signature-based methods, such as Snort [31], can be applied. While this approach can be successful at finding threats, there are several important shortcomings. First, this method does not respect the privacy of the users on the network. Second, this method is computationally expensive and difficult to deploy and maintain. Third, this method relies on malware clients and servers to not change their behavior when a MITM interposes itself.

The second method of identifying threats in encrypted network traffic leverages flow-based metadata. These methods examine high-level features of a network flow, such as the number of packets and bytes within a flow. This data is typically exported and stored as IPFIX [12] or NetFlow [11]. There have been several papers that push the limits of traditional flow monitoring systems. For instance, [8] uses NetFlow and external reputation scores to classify botnet traffic. This work can also be applied to encrypted network traffic, but does not take advantage of the TLS-specific data features.

In addition to pure flow-based features to detect malware’s network traffic, there have been many papers that augment this data with more detailed features about a flow [14], [18], [24], [34], [35], [36], [37], [39]. This work can be seen as utilizing side-channel attacks, such as analyzing the sizes and inter-arrival times of packets, to learn more information about a flow. In [27], the authors derive features based on the packet sizes to perform website fingerprinting attacks against encrypted traffic. In our work, we are only concerned with identifying malware communication and we use information specific to the TLS protocol.

There has been previous work that uses active probing [17] and passive monitoring to gain visibility into how TLS is used in the wild [19]. Unlike [19], our results specifically highlight malware’s use of the TLS protocol, and show how data features from TLS can be used in rules and classifiers.

Malware clustering and family attribution have had a lot of exposure in the academic literature [5], [7], [28], [30]. This work has taken a variety of data sources, e.g., HTTP or dynamic system call traces, and clustered the samples to attribute a sample to a malicious family. In contrast, our work gives an in-depth analysis of how malware uses TLS, and shows how data features from passive monitoring of TLS can be used for accurate malware identification and family attribution.

VIII. LIMITATIONS AND FUTURE WORK

Our method for collecting malware data was straightforward and allowed us to quickly generate a large volume of network data, but the dependence on Windows XP and 5 minute runs introduced some biases in our presented results. We accounted for these biases by specifically considering the cases in which the TLS features reflected the operating system and not the malware, and either analyzing the data with those cases removed, or clearly labeling and analyzing those cases otherwise. Accounting for the bias caused by the sandbox was essential to understanding the actual malware use of TLS. From a practitioner's point of view, however, it is sometimes worthwhile to consider the raw, biased data. Malware often targets obsolete and unpatched software because it is vulnerable, and thus it is biased in the same direction as the sandbox. We leave running these samples under multiple environments and collecting the additional results for future work.

After family names were associated with our malware samples, the original set of 20,548 samples that used TLS was reduced to a set of 5,623 unique samples across 18 families. It is difficult to reliably determine the family, if any, associated with a malware sample, even in a structured sandbox setting. While our multi-class, malware family classifier can reasonably be criticized for failing to provide attribution for ∼3/4 of the malware samples, this fact reflects the difficulty of family attribution in a dynamic analysis environment, and not a limitation of the underlying approach. In future work, the malware families for the training data can be determined by a robust clustering algorithm [5] instead of relying on a consensus vote from VirusTotal [2].

As with nearly all other methods of threat detection, a motivated threat actor could attempt to evade detection by mimicking the features of enterprise traffic. For instance, in our case, this could take the form of attempting to offer the same TLS parameters as a popular Firefox browser and using a certificate issued by a reputable certificate authority. But, while evasion is always possible in principle, in practice it poses challenges for the malware operator. Mimicking a
popular HTTPS client implementation requires an ongoing and non-trivial software engineering effort; if a client offers a TLS ciphersuite or extension that it cannot actually support, the session is unlikely to complete. On the server side, the certificate must mimic the issuer, subjectAltName, time of issuance, and validity period of the benign server. In either case, the detection methods outlined in this paper are not meant to be exhaustive, and in a robust system, these methods would only be one facet of the final solution. An example of extending this methodology for robustness would be to build a profile for an endpoint based on the user-agent string advertised in the unencrypted HTTP flows. If the TLS parameters indicate a user agent that has not been observed on an endpoint, this could be an interesting indicator of compromise.

All of the classification results presented in this paper used 10-fold cross-validation and l1-logistic regression. We have found this classifier to be very efficient and to perform extremely well for network data feature classification. This model reports a probabilistic output, allowing one to easily change the threshold of the classifier. We did compare l1-logistic regression with a support vector machine (Gaussian kernel, width adjusted through CV), and found no statistically-significant improvement using a 10-fold paired t-test at a 5% significance level [15]. Because of the added computational resources needed to train the SVM and the chosen model’s robustness against overfitting [38], we only reported the l1-logistic regression results. We leave examining alternative models and quantifying their advantages for future work.

IX. CONCLUSIONS

Understanding malware’s use of TLS is imperative for developing appropriate techniques to identify threats and respond to those threats accordingly. In this paper, we reviewed what TLS parameters malware typically uses from both the perspective of the TLS client and the TLS servers that the samples communicated with. Even when we accounted for the bias caused by the underlying sandbox’s operating system, we found that malware generally offers and selects weak ciphersuites and does not offer the variety of extensions that we see in enterprise clients.

We also analyzed the TLS usage of malware on a per family basis. We identified malware families that are most likely to use TLS client parameters that matched the TLS library provided by Windows XP, the underlying operating system of the sandbox, e.g., Bergat and Yakes; malware families that use TLS client parameters that matched the TLS library provided by the underlying operating system in addition to hundreds of other TLS client configurations, e.g., Sality; and families that exclusively used TLS client configurations that do not match the TLS libraries supplied by the underlying operating system, e.g., Virlock. As anticipated, we found that families that actively evolve their usage of TLS are more difficult to classify. We also found a malware family that used TLS parameters that are similar to those found on an enterprise network, and was difficult to classify: Dridex. But, if we leverage additional, domain-specific knowledge such as whether the TLS certificate was self-signed, we can significantly increase the performance of our classifiers.

We showed that the differences in how malware families use TLS can be used to attribute malicious, encrypted network flows to a specific malware family. We also observed some malware families using TLS in exactly the same way, e.g., Yakes and Kazy, which most often offered an ordered ciphersuite list that matched a list found in the default Windows XP SChannel implementation. We demonstrated an accuracy of 90.3% for the family attribution problem when restricted to a single, encrypted flow, and an accuracy of 93.2% when we made use of all encrypted flows within a 5-minute window.

We conclude that data features that are passively observed in TLS provide information about both the client and server software and its configuration. This data can be used to detect malware and perform family attribution, either through rules or classifiers. Malware’s TLS data features obtained from sandboxes are biased, and it is essential to understand and account for this bias when using these features.

REFERENCES

[1] Most Internet Traffic will be Encrypted by Year End. Here’s Why. http://fortune.com/2015/04/30/netflix-internet-traffic-encrypted/, accessed: 2016-03-23
[2] Virus Total. https://www.virustotal.com/ (2016)
[3] Zeus Source Code. https://github.com/Visgean/Zeus (2016)
[4] Adrian, D., Bhargavan, K., Durumeric, Z., Gaudry, P., Green, M., Halderman, J.A., Heninger, N., Springall, D., Thomé, E., Valenta, L., VanderSloot, B., Wustrow, E., Zanella-Béguelin, S., Zimmermann, P.: Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice. In: Proceedings of the Conference on Computer and Communications Security (CCS) (2015)
[5] Anderson, B., Storlie, C., Lane, T.: Multiple Kernel Learning Clustering with an Application to Malware. In: 12th International Conference on Data Mining (ICDM). pp. 804–809. IEEE (2012)
[6] Antonakakis, M., Perdisci, R., Nadji, Y., Vasiloglou, N., Abu-Nimeh, S., Lee, W., Dagon, D.: From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware. In: USENIX Security Symposium. pp. 491–506 (2012)
[7] Bayer, U., Comparetti, P.M., Hlauschek, C., Kruegel, C., Kirda, E.: Scalable, Behavior-Based Malware Clustering. In: Proceedings of the Network and Distributed System Security Symposium (NDSS). vol. 9, pp. 8–11. Citeseer (2009)
[8] Bilge, L., Balzarotti, D., Robertson, W., Kirda, E., Kruegel, C.: Disclosure: Detecting Botnet Command and Control Servers through Large-Scale NetFlow Analysis. In: Proceedings of the 28th Annual Computer Security Applications Conference. pp. 129–138. ACM (2012)
[9] Callegati, F., Cerroni, W., Ramilli, M.: Man-in-the-Middle Attack to the HTTPS Protocol. IEEE Security & Privacy 7(1), 78–81 (2009)
[10] Cisco Talos: IP Blacklist Feed. http://www.talosintel.com/feeds/ip-filter.blf (2016)
[11] Claise, B.: Cisco Systems NetFlow Services Export Version 9 (2013), RFC 3954
[12] Claise, B., Trammell, B., Aitken, P.: Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information (2013), RFC 7011
[13] Dierks, T., Rescorla, E.: The Transport Layer Security (TLS) Protocol Version 1.2 (2008), RFC 5246
[14] Dietrich, C.J., Rossow, C., Pohlmann, N.: Cocospot: Clustering and Recognizing Botnet Command and Control Channels using Traffic Analysis. Computer Networks 57(2), 475–486 (2013)
[15] Dietterich, T.G.: Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural computation 10(7) (1998)
[16] Dietterich, T.G.: Ensemble Methods in Machine Learning. In: Multiple classifier systems, pp. 1–15. Springer (2000)
[17] Durumeric, Z., Wustrow, E., Halderman, J.A.: Zmap: Fast Internet-Wide Scanning and Its Security Applications. In: USENIX Security Symposium. pp. 605–620 (2013)
[18] Gu, G., Perdisci, R., Zhang, J., Lee, W.: BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection. In: USENIX Security Symposium. vol. 5, pp. 139–154 (2008)
[19] Holz, R., Amann, J., Mehani, O., Wachs, M., Kaafar, M.A.: TLS in the Wild: an Internet-Wide Analysis of TLS-Based Protocols for Electronic Communication. In: Proceedings of the Network and Distributed System Security Symposium (NDSS) (2016)
[20] Koh, K., Kim, S.J., Boyd, S.P.: An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression. Journal of Machine Learning Research 8(8), 1519–1555 (2007)
[21] Krishnapuram, B., Carin, L., Figueiredo, M.A., Hartemink, A.J.: Sparse Multinomial Logistic Regression: Fast Algorithms and Generalization Bounds. Pattern Analysis and Machine Intelligence, IEEE Transactions on 27(6), 957–968 (2005)
[22] Microsoft: Choose the right ciphersuites in SChannel. https://www.ssl.com/how-to/choose-the-right-cipher-suites-in-schannel-dll/ (2016)
[23] Microsoft: SChannel. https://msdn.microsoft.com/en-us/library/windows/desktop/ms678421%28v=vs.85%29.aspx (2016)
[24] Moore, A.W., Zuev, D.: Internet Traffic Classification Using Bayesian Analysis Techniques. In: ACM SIGMETRICS Performance Evaluation Review. vol. 33, pp. 50–60. ACM (2005)
[25] Nguyen, T.T., Armitage, G.: A Survey of Techniques for Internet Traffic Classification using Machine Learning. Communications Surveys & Tutorials, IEEE 10(4), 56–76 (2008)
[26] Opderbeck, D.W., Hurwitz, J.G.: Apple v. FBI: Brief in Support of Neither Party in San Bernardino iPhone case. http://ssrn.com/abstract=2746100 (2016)
[27] Panchenko, A., Lanze, F., Zinnen, A., Henze, M., Pennekamp, J., Wehrle, K., Engel, T.: Website Fingerprinting at Internet Scale. In: Proceedings of the Network and Distributed System Security Symposium (NDSS) (2016)
[28] Perdisci, R., Lee, W., Feamster, N.: Behavioral Clustering of HTTP-Based Malware and Signature Generation using Malicious Network Traces. In: NSDI. pp. 391–404 (2010)
[29] Qualys: Qualys SSL Labs. https://www.ssllabs.com/ssltest/clients.html (2016)
[30] Rieck, K., Holz, T., Willems, C., Düssel, P., Laskov, P.: Learning and Classification of Malware Behavior. In: Detection of Intrusions and Malware, and Vulnerability Assessment, pp. 108–125. Springer (2008)
[31] Roesch, M.: Snort - Lightweight Intrusion Detection for Networks. In: Proceedings of the 13th USENIX Conference on System Administration. pp. 229–238. LISA, USENIX Association (1999)
[32] Snort: Community Rules. https://www.snort.org/downloads/community/community-rules.tar.gz (2016)
[33] Vassilev, A.: Annex A: Approved Security Functions for FIPS PUB 140-2, Security Requirements for Cryptographic Modules. http://csrc.nist.gov/publications/fips/fips140-2/fips1402annexa.pdf (2016)
[34] Wang, K., Cretu, G., Stolfo, S.J.: Anomalous Payload-Based Worm Detection and Signature Generation. In: Recent Advances in Intrusion Detection. pp. 227–246. Springer (2006)
[35] Wang, L., Dyer, K.P., Akella, A., Ristenpart, T., Shrimpton, T.: Seeing through Network-Protocol Obfuscation. In: Proceedings of the Conference on Computer and Communications Security (CCS). pp. 57–69. ACM (2015)
[36] Williams, N., Zander, S., Armitage, G.: A Preliminary Performance Comparison of Five Machine Learning Algorithms for Practical IP Traffic Flow Classification. Computer Communication Review 30 (2006)
[37] Wurzinger, P., Bilge, L., Holz, T., Goebel, J., Kruegel, C., Kirda, E.: Automatically Generating Models for Botnet Detection. In: Computer Security–ESORICS 2009, pp. 232–249. Springer (2009)
[38] Yuan, G.X., Ho, C.H., Lin, C.J.: An Improved GLMNET for L1-Regularized Logistic Regression. Journal of Machine Learning Research 13(Jun), 1999–2030 (2012)
[39] Zander, S., Nguyen, T., Armitage, G.: Automated Traffic Classification and Application Identification using Machine Learning. In: The 30th IEEE Conference on Local Computer Networks. pp. 250–257. IEEE (2005)

APPENDIX A
CIPHERSUITE AND EXTENSION HEX CODES

Hex Code   Ciphersuite
0x0004     TLS_RSA_WITH_RC4_128_MD5
0x0005     TLS_RSA_WITH_RC4_128_SHA
0x000a     TLS_RSA_WITH_3DES_EDE_CBC_SHA
0x002f     TLS_RSA_WITH_AES_128_CBC_SHA
0x0033     TLS_DHE_RSA_WITH_AES_128_CBC_SHA
0x0035     TLS_RSA_WITH_AES_256_CBC_SHA
0x0039     TLS_DHE_RSA_WITH_AES_256_CBC_SHA
0x003c     TLS_RSA_WITH_AES_128_CBC_SHA256
0x003d     TLS_RSA_WITH_AES_256_CBC_SHA256
0x0067     TLS_DHE_RSA_WITH_AES_128_CBC_SHA256
0x006b     TLS_DHE_RSA_WITH_AES_256_CBC_SHA256
0x00fd     unassigned
0xc009     TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
0xc00a     TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
0xc013     TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
0xc014     TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
0xc02b     TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
0xc02f     TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
0xc030     TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
TABLE VII: Hex code to ciphersuite mapping for ciphersuites used in figures.

Hex Code   Extension
0x0000     server_name
0x0005     status_request
0x000a     supported_groups
0x000b     ec_point_formats
0x000d     signature_algorithms
0x000f     heartbeat
0x0010     application_layer_protocol_negotiation
0x0012     signed_certificate_timestamp
0x0015     padding
0x0017     extended_master_secret
0x0023     SessionTicket TLS
0x3374     next_protocol_negotiation
0x7550     channel_id
0xff01     renegotiation_info
TABLE VIII: Hex code to extension mapping for extensions used in figures.