Computers & Security 136 (2024) 103518

Contents lists available at ScienceDirect

Computers & Security


journal homepage: www.elsevier.com/locate/cose

CTIMD: Cyber threat intelligence enhanced malware detection using API call sequences with parameters
Tieming Chen, Huan Zeng, Mingqi Lv *, Tiantian Zhu
College of Computer Science and Technology, Zhejiang University of Technology, China

ARTICLE INFO

Keywords: Malware detection; API sequence; Cyber threat intelligence; Deep learning

ABSTRACT

Dynamic malware analysis monitors the API call sequences of a program in a sandbox, and it has been proven effective against code obfuscation and unknown malware. However, most existing works have one of two weaknesses. Either they ignore the run-time parameters by considering only the API names, or they lack an effective way to capture the correlations between parameter values and malicious activities. In this paper, we propose CTIMD, a deep learning based dynamic malware detection method that integrates the threat knowledge from CTIs (Cyber Threat Intelligences) into the learning on API call sequences with run-time parameters. It first extracts IOCs (Indicators of Compromise) from CTIs and uses the IOCs to help identify the security-sensitive levels of API calls. Then, it embeds API calls and the associated security-sensitive levels into a unified feature space. Finally, it feeds the feature vector sequences into deep neural networks to train the malware detection model. We conducted experiments on two datasets. CTIMD significantly outperforms existing methods that depend on raw API call sequences, improving the F1-score by 4.0% to 41.3%. It also has an advantage over existing state-of-the-art methods that consider both API calls and run-time parameters, improving the F1-score by 1.2% to 6.5%.

1. Introduction

Malware is intrusive software developed by hackers to steal data or damage computer systems. Examples of malware include viruses, worms, Trojans, adware, and ransomware. According to recent reports from Trellix¹ and VirusTotal,² there was a 46.12% increase in malware in 2021 compared to the previous year. Malware is also the main weapon for launching cyber-attacks (Kramer and Bradfield, 2009). For example, identified ransomware cyber-attacks in Germany increased by 32% from the second to the third quarter of 2022. The United States saw a 9% increase over the same period. Therefore, automatic malware detection is critical to guaranteeing the security of computer systems and protecting users from cyber threats (Ye et al., 2017).

Malware detection techniques can be divided into two categories, i.e., static detection and dynamic detection (Ye et al., 2017). Static detection inspects the content of a program without executing it. Specifically, the analyzer extracts a variety of features from the code of a program. These include binary snippets (Christodorescu and Jha, 2003), string patterns (Y. Ye et al., 2008), and operation codes (Ye et al., 2010). The analyzer then decides whether the program is malware, either by matching the features against a signature database or by learning a detection model (Masud et al., 2007). The major drawback of static detection is that it can be easily evaded by code obfuscation techniques (Moser et al., 2007). In addition, static detection that relies on feature matching is ineffective at detecting unknown malware.

To address the drawback of static detection, dynamic detection inspects the real behaviors of a program by executing it in a secured virtual environment (e.g., a sandbox). The real behaviors include memory usage, network accesses, registry changes, and API (Application Programming Interface) calls. Among these behaviors, API calls have been exploited the most by previous studies, since API call sequences contain fine-grained semantics of a program (Qiao et al., 2013). Moreover, machine learning techniques can learn the general patterns of API calls and their correlations with different types of malware from large datasets, which gives them the potential to detect unknown malware. Thus, machine learning has become the state-of-the-art technique for API based malware detection (Martín et al., 2018; Tian et al., 2010; Ahmadi et al., 2013; Ye et al., 2008; Uppal et al., 2014). Especially in recent years, deep learning has been widely exploited for building dynamic malware detection models. This is due to its capability of automatically extracting semantic features from API call sequences (Zhang et al., 2020; Agrawal et al., 2018; Rieck et al., 2011; Salehi et al., 2017; Chen et al., 2022; Jha et al., 2020; Amer and Zelinka, 2020; Tobiyama et al., 2016; Li et al., 2022). By embedding the API calls into a latent feature space, deep learning can greatly improve the generalization ability for unknown malware detection.

Despite these promising results, API call sequences alone carry insufficient information. This limits the capability of detecting stealthy malware, which tends to disguise malicious behaviors as normal API calls (Shaid and Maarof, 2015). Several real cases demonstrating this issue can be found in Section 5.5. To address this issue, several studies exploit the run-time parameters passed to API calls to augment the detection capability, since the parameters contain security-sensitive information that complements the API calls (Chen et al., 2022). However, existing studies have not found an effective way to capture the correlations between parameter values and various malicious behaviors. First, some studies try to learn malicious patterns directly from primitive parameter values (e.g., N-gram substrings of a file path) (Zhang et al., 2020; Agrawal et al., 2018). However, relying on primitive parameter values limits the ability to detect unknown malware whose parameter values do not appear in the training set. Second, some studies define a set of empirical rules to capture the patterns of malware parameters (e.g., a program with multiple suffixes, accessing files in system directories, etc.) (Chen et al., 2022). However, such "hard rules" lack generalization ability and tend to cause false alarms.

To address these limitations, this paper proposes CTIMD, a dynamic malware detection method that learns on CTIs (Cyber Threat Intelligences) and API call sequences with parameters. The key insights of CTIMD are twofold. First, run-time parameters containing security-sensitive information can assist in accurate malware detection. Second, CTIs, which are collections of threat information used in industry to defend against cyber-attacks, can provide clues to identify the security-sensitive levels of parameter values. CTIMD implements these key insights in two aspects. The first is CTI enabled security-sensitive level labeling. CTIMD extracts various IOCs (Indicators of Compromise) from CTIs, and then labels the security-sensitive level of each parameter by matching its value against the IOC database. The second is parameter assisted API call sequence learning. CTIMD encodes each API call and the associated security-sensitive levels of its parameters into an embedding. It then learns the representation of each API call sequence with a deep neural network that can model its temporal patterns (e.g., TextCNN, BiLSTM, etc.).

In summary, the main contributions of this paper are as follows.

(1) We propose a dynamic malware detection method, CTIMD, based on data- and knowledge-driven artificial intelligence. Specifically, it exploits both software execution samples and external knowledge to train the malware detection model.
(2) We propose a method for identifying the security-sensitive levels of an API call by matching the run-time parameters with an IOC database extracted from CTIs.
(3) We implement CTIMD based on three deep neural networks and evaluate it on two public datasets. The results demonstrate that CTIMD adapts better to real application scenarios, where the trained model is applied to a new data distribution.

* Corresponding author.
E-mail address: mingqilv@zjut.edu.cn (M. Lv).
¹ https://www.trellix.com/en-us/advanced-research-center/threat-reports/nov-2022.html
² https://assets.virustotal.com/reports/2021trends.pdf

https://doi.org/10.1016/j.cose.2023.103518
Received 6 August 2023; Received in revised form 6 September 2023; Accepted 29 September 2023
Available online 1 October 2023
0167-4048/© 2023 Elsevier Ltd. All rights reserved.

2. Related work

In this section, we review malware detection methods from two perspectives, i.e., static detection methods and dynamic detection methods. In addition, we specifically review the malware detection methods that consider API parameters.

2.1. Static detection methods

Static detection methods detect malware by parsing the program and analyzing its contents without running it. Specifically, they usually extract a variety of features from the code, including opcode sequences (Ye et al., 2010), AST (Abstract Syntax Tree) (Ban et al., 2022), CFG (Control Flow Graph) (Gao et al., 2022), and binary images (Chen et al., 2020). In addition, complementary features can be extracted from the files (e.g., file metadata, file structure) (Weber et al., 2002). Afterwards, rule-based methods (Blount et al., 2011) or learning-based methods (Nataraj et al., 2011) can be applied to these features for malware detection.

Static detection methods are efficient since they do not need to execute the programs. However, their main limitation is that they can be easily evaded by disguise techniques such as obfuscation, encryption, packing, and recompilation (Ye et al., 2017).

2.2. Dynamic detection methods

Dynamic detection methods detect malware by monitoring and analyzing the run-time behaviors of programs in a secured virtual environment (e.g., a sandbox). The run-time behaviors of a program include memory usage, network accesses, registry changes, and API calls (e.g., Windows API calls). Previous studies have exploited API calls the most. Since benign and malicious programs use the same API set, it is usually difficult to determine whether a program is malicious by considering individual API calls. Therefore, most previous works analyze the sequential patterns of API calls for malware detection using rule based or learning based techniques.

Among the rule based techniques, Cho and Im (2015) used a multiple sequence alignment algorithm to extract API call patterns from the API call sequences of a malware family. They then classified malware samples by referring to these patterns. Kim et al. (2019) abstracted the API call sequences into character sequences, clustered the character sequences into malware families, and extracted sequential patterns from each malware family. Malware was then detected by calculating the similarity between samples and sequential patterns. Amer and Zelinka (2020) grouped APIs into clusters, converted API call sequences into cluster index sequences, and then performed malware detection using a Markov chain. However, rule based techniques lack generalization ability, and thus cannot detect unknown malware.

Among the learning based techniques, an API call sequence has much in common with a natural language document, since both are composed of semantic tokens. Most recent studies have therefore applied techniques from the NLP domain, such as word embedding and deep neural networks, to build malware detection models. For example, Jha et al. (2020) tested three API representation schemes, i.e., one-hot encoding feature vectors, random feature vectors, and word2vec feature vectors, and found that word2vec feature vectors achieved the best malware detection performance. Kolsnaji et al. (2016) applied recurrent neural networks to classify malware into families using API call sequences. Li et al. (2022) used CNNs to extract semantic features from API call sequences and fed these features into a BiLSTM to build the malware detection model. Tobiyama et al. (2016) used RNNs to generate features and detected malware using CNNs. In addition, more recent studies utilized GNNs (Graph Neural Networks) to build malware detection models by explicitly modeling the relations between API calls as a graph (Jha et al., 2020; Li et al., 2022). However, the capability of learning based techniques is limited by the amount and diversity of training data. In addition, API call sequences alone cannot fully reflect the malicious behaviors of a program.

2.3. API parameter enhanced methods

Previous studies have shown that considering the run-time
parameters in addition to API calls can improve malware detection performance (Agrawal et al., 2018). However, the run-time parameters of API calls come in various forms, which has led to different approaches to processing them. For example, Tian et al. (2010) treated the API calls and their parameters as separate strings, calculated the frequencies of both API names and API parameters as features, and then trained a malware detection model based on these features. Zhang et al. (2020) used a hash algorithm to map different types of API parameters (e.g., paths, registry keys, URLs, IPs, etc.), together with the API name and API category, into a feature vector, which is then used to train a deep neural network for malware detection. Agrawal et al. (2018) used a feature vector to represent API names and the top-K frequent N-Grams of the API parameter strings, and used a stacked LSTM to detect malware. CruParamer (Chen et al., 2022) is a malware detection model that leverages heuristic rules to measure the sensitivity degrees of run-time parameters and integrates the sensitivity degrees into the learning of the API call sequences.

These works try to capture the correlations between parameters and malicious behaviors by using either primitive parameter values or arbitrary heuristic rules. However, using primitive parameter values significantly limits the generalization ability of the model, while using heuristic rules easily causes false alarms. Finally, we summarize the related works in Table 1.

Table 1
A summary of the related works.

Authors                   Parameters  Features          Model                        Type
Ye et al. (2010)          No          opcode sequences  Clustering                   Static
Ban et al. (2022)         No          string features   Abstract Syntax Tree         Static
Gao et al. (2022)         No          graph features    GNN                          Static
Chen et al. (2020)        No          binary image      DNN                          Static
Blount et al. (2011)      No          rules             Evolutionary Algorithm       Static
Nataraj et al. (2011)     No          malware image     KNN                          Static
Cho and Im (2015)         No          API sequence      Multiple Sequence Alignment  Dynamic
Kim et al. (2019)         No          API sequence      Multiple Sequence Alignment  Dynamic
Amer and Zelinka (2020)   No          API sequence      Markov Chain                 Dynamic
Jha et al. (2020)         No          API sequence      RNN                          Dynamic
Kolsnaji et al. (2016)    No          API sequence      DNN                          Dynamic
C. Li et al. (2022)       No          API sequence      CNN + BiLSTM                 Dynamic
Tobiyama et al. (2016)    No          process behavior  RNN + CNN                    Dynamic
Agrawal et al. (2018)     Yes         API sequence      CNN + LSTM                   Dynamic
Tian et al. (2010)        Yes         API sequence      Random Forest                Dynamic
Zhang et al. (2020)       Yes         API sequence      CNN + LSTM                   Dynamic
Agrawal et al. (2018)     Yes         API sequence      N-Gram + LSTM                Dynamic
Chen et al. (2022)        Yes         API sequence      TextCNN / BiLSTM             Dynamic

3. Preliminary

3.1. Motivation

In this section, we use several cases to demonstrate three facts:
(1) Run-time parameters containing security-sensitive information can assist in accurate malware detection.
(2) CTIs can provide clues to identify the security-sensitive levels of parameter values.
(3) Hints for malware detection can be hidden in the context of API calls and run-time parameters.
These three facts motivate the design of CTIMD.

In the first case, we show that run-time parameters are helpful for malware detection. Consider the Windows API CopyFileExA, which copies an existing file to a new file. It is a common function that any program can use, so it would not be identified as a malicious activity if we only considered the raw API call. However, suppose the destination path parameter of CopyFileExA is "C:…\Windows\Start Menu\Programs\Startup…". This indicates that the program is trying to place a file in the Startup folder so that it starts automatically, which could be a malicious activity. By leveraging this information, we can classify this program more accurately.

In the second case, we show that CTIs are helpful for evaluating parameter values. Consider the Windows API NtCreateFile, which creates a new file or directory. Suppose a program calls NtCreateFile with the file path parameter "C:…\AppData\Local\Temp\vkiusrv.exe". From the call alone, it is difficult to tell whether it is malicious or benign. However, if a CTI has recorded that "vkiusrv.exe" is a virus, we can be confident that the API call is malicious.

In the third case, we show that jointly considering the context of API calls and run-time parameters can provide more hints for malware detection. Table 2 shows a snippet of the API call sequence of a Trojan of the GenKryptik family, where each line gives the API name and its run-time parameters. In lines 1 to 3, the program calls CreateProcessInternalW to create a process (i.e., 962,629.exe). This process in turn creates and launches another process with a hidden interface (i.e., nbveek.exe). In line 4, nbveek.exe calls NtAllocateVirtualMemory to create memory pages with the PAGE_GUARD attribute, which is used to protect critical code and data from modification. Finally, nbveek.exe creates a process in line 5. That process downloads an executable file in line 6 and creates another process (i.e., rundll32.exe) in line 7. The process rundll32.exe executes a command that creates a scheduled task, which is triggered every minute to execute nbveek.exe. Due to the PAGE_GUARD attribute, the scheduled task would become an indefinite loop that cannot be stopped. In this case, no individual API call shows an evident indicator of malicious behavior. The malicious behavior can only be discovered through an association analysis of the contexts of the API calls.

Table 2
A snippet of the API calls of a Trojan of the GenKryptik family.

Line  API name                 Parameters
1     CreateProcessInternalW   file_path: "C:\tmpbpm79a\962,629.exe", process_identifier: 2816
2     CreateProcessInternalW   file_path: "C:\Users\Admin\AppData\Local\Temp\3447aac370\nbveek.exe", process_identifier: 2816
3     ShellExecuteExW          file_path: "C:\Users\Admin\AppData\Local\Temp\3447aac370\nbveek.exe", show_type: 0
4     NtAllocateVirtualMemory  region_size: 8192, process_identifier: 3024, protection: PAGE_GUARD
…
5     CreateProcessInternalW   process_identifier: 3000
6     InternetReadFile         file_path: "C:\Users\Admin\AppData\Roaming\0c1b614d3d0a85\cred.dll"
7     CreateProcessInternalW   file_path: "C:\Windows\System32\rundll32.exe", process_identifier: 2296, command_line: "SCHTASKS /Create /SC MINUTE /MO 1 /TN nbveek.exe /TR "C:\Users\Admin\AppData\Local\Temp\3447aac370\nbveek.exe" /F"

3.2. System architecture

The architecture of CTIMD is shown in Fig. 1, consisting of the
following four modules.

CTI mining: It collects CTIs from TIPs (Threat Intelligence Platforms). From these CTIs, it extracts a variety of IOCs that may appear in parameter values, such as URLs, file paths, and IP addresses.

Parameter labeling: Given a parameter value, it uses several matching functions to find matching IOCs in the IOC database. It then labels the security-sensitive level of the parameter based on the matching results. Since CTIs cannot cover all malicious API calls, it also applies several heuristic rules to compensate for the IOC matching results.

API call embedding: It uses embedding techniques to encode each API call and the security-sensitive levels of its parameters into a feature vector. This vector captures both the similarity semantics of APIs and the security semantics of the parameters.

API call sequence learning: It trains a malware detection model based on deep neural networks that can learn sequential patterns in API call sequences. The model is essentially a binary classifier: it takes an API call embedding sequence as input and predicts whether it is benign or malicious.

The source code of CTIMD is available at https://github.com/zenghuan0620/CTIMD.

Fig. 1. The architecture of CTIMD.

4. Methodology

4.1. CTI mining

CTIs (Cyber Threat Intelligences) are evidence-based knowledge about the context, mechanism, indicators, and impact of cyber-attacks (Abu et al., 2018). Most CTIs are presented and exchanged as natural language reports in technical blogs, white papers, or posts. The most valuable information in CTIs is the machine-digestible IOCs (Indicators of Compromise). IOCs are forensic artifacts such as IPs/domains of C&C (Command and Control) servers, MD5 hashes of malware samples, and security-sensitive file names/paths. Fig. 2 gives a fragment of a CTI report describing the malicious activities of a piece of malware named AuTo Stealer, used by SideCopy (i.e., a Pakistani threat actor).³ In this section, we present an automatic IOC mining method that analyzes textual CTIs in three steps: potential IOC filtering, IOC recognition, and IOC extraction.

³ https://www.malwarebytes.com/blog/threat-intelligence/2021/12/sidecopy-apt-connecting-lures-to-victims-payloads-to-infrastructure

4.1.1. Potential IOC filtering

This step uses regexes to quickly discover tokens that may be IOCs. IOCs can be divided into various categories (e.g., IP, URL, hash, etc.), and each category follows a certain string pattern. For example, a typical URL is composed of a protocol, a path, and query parameters.

There are dozens of IOC categories, but we design regexes for only the six categories shown in Table 3. We found that the other IOC categories barely appear in API parameter values. Specifically, given a CTI report D, we first segment D into a set of sentences SS. Then, for each sentence s_i in SS, we segment s_i into a set of tokens WS_i. For each token w_i^(j) in WS_i, if w_i^(j) matches a pre-defined regex with IOC category c_j, we create a three-tuple t_i = (w_i^(j), c_j, s_i). We add this tuple to the set of potential matched elements PS.

4.1.2. IOC recognition

The regex based IOC filtering method is unreliable, since many potential matched elements (e.g., URLs, file paths, etc.) are not IOCs. For example, the following sentence is from a CTI: "For more information and career opportunities, visit https://www.malwarebytes.com". The URL in this sentence is not an IOC. Actually, it indicates the
website of a company. To demonstrate this issue, we conducted a preliminary experiment on a corpus of CTIs containing 2400 potential matched elements, only half of which are real IOCs. The results show that the recall of detecting IOCs is 0.993, while the precision is only 0.634. The regex based IOC filtering method therefore discovers most of the IOCs, but its results need secondary filtering.

Fig. 2. A fragment of a CTI report.

To distinguish real IOCs from normal elements, we need to consider their contexts in the sentences. For example, the sentence "One of them was a malicious URL associated with other malicious DEX files and APKs: https://dy.kr.wildpettykiwi.info/dykr/update" contains an IOC of the URL category. The context contains words like "malicious" and "file".

To exploit these contexts for real IOC recognition, we train a text classification model that classifies each potential matched element in PS as "IOC" or "non-IOC". Fig. 3 shows the architecture of the proposed model, which integrates a convolutional subnetwork and a feature embedding subnetwork. Given a potential matched element t_i = (w_i^(j), c_j, s_i), we input the sentence s_i into the convolutional subnetwork. We input the potential IOC category c_j into the feature embedding subnetwork. Specifically, the convolutional subnetwork is built on TextCNN (Kim, 2014), which uses multiple parallel convolutional filters of varying sizes to capture different N-Gram features. The convolutional subnetwork accepts s_i as input and outputs a feature vector se_i of size d_s = s_n × f_n. Here, s_n is the number of filter sizes and f_n is the number of filters for each filter size. The feature embedding subnetwork is an MLP (Multi-Layer Perceptron). It accepts the one-hot encoding of c_j as input and maps it to a dense feature vector ce_i of size d_c. Finally, we concatenate se_i and ce_i and stack a fully connected layer with a sigmoid function on top, which outputs the probability that t_i is a true IOC.

Table 3
The regex for different IOC categories.

IP
  Regex: (^((2[0-4]\d.)|(25[0-5].)|(1\d{2}.)|(\d{1,2}.))((2[0-5]{2}.)|(1\d{2}.)|(\d{1,2}.){2})((1\d{2})|(2[0-5]{2})|(\d{1,2})))
  Examples: 192.168.0.1, 86.56.43.53:8080
Domain
  Regex: ^(?:(?=[a-zA-Z0-9-]{1,63}\.)[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*\.)+[a-z]{2,6}$
  Example: baidu.com
URL
  Regex: ^(https?|ftp):\/\/(www\.)?([a-zA-Z0-9-]+\.)*[a-zA-Z0-9]+(\.[a-z]{2,6})(:[0-9]+)?(\/[^\s]*)?$
  Example: https://www.xyz123.com/index.html
File Path
  Regex: ^([a-zA-Z]:(\\[^\\/:*?<>|]+)*\\[^\\/:*?<>|]+\.[^\\/:*?<>|]+,)*[a-zA-Z]:(\\[^\\/:*?<>|]+)*\\[^\\/:*?<>|]+\.[^\\/:*?<>|]+$
  Example: C:\Users\admin\Documents\abc.txt
Registry Path
  Regex: ((HKCR|HKLM|HKCU|HKCC|HKU|HKEY_CLASSES_ROOT|HKEY_CURRENT_USER|HKEY_LOCAL_MACHINE|HKEY_USERS|HKEY_CURRENT_CONFIG)+[_A-Z]*\\[\\{A-Za-z0-9-_\.}]+[\d\w])
  Example: SOFTWARE\Microsoft\Windows\CurrentVersion\Run
Host Name
  Regex: ^(?=.{1,255}$)([A-Za-z0-9_](?:(?:[A-Za-z0-9-_]){0,61}[A-Za-z0-9_])?\.)+(?:[A-Za-z]{2,}\.?|[A-Za-z0-9-]{2,}\.?)$
  Example: w.c1ts.ru

4.1.3. IOC extraction

After the IOC recognition step, we pick out all the potential matched elements t_i = (w_i^(j), c_j, s_i) in PS that are classified as "IOC". We store each IOC instance w_i^(j) and its IOC category c_j in our IOC database.

4.2. Parameter labeling

In this section, we label the security-sensitive level of each API parameter by matching its value with the IOC database using several matching functions. We also use several heuristic rules to compensate for the IOC matching results when IOCs are in short supply.

4.2.1. IOC matching

Given an API parameter value pv_i, we conduct the IOC matching in two steps.

In the first step, we match pv_i against the regex of each IOC category. If a match is found, we record the IOC category of pv_i (denoted as c_i) and go to the second step.

In the second step, we try to find an IOC with category c_i in the IOC database that matches pv_i. IOCs are static forensic artifacts summarized from existing cyber-attacks, so even a slight variant causes exact matching to fail. For example, "https://fareslower.com/login.php" is an IOC of the URL category, indicating a page of a fraud website. Clearly, visiting other pages of this website (e.g., "https://fareslower.com/index.php") also indicates a high security-sensitive level. Therefore, rather than relying only on exact matches, we design a set of matching functions that make fuzzy matches for some IOC categories.

Specifically, for IP, domain, registry path, and host name, we apply an exact matching strategy and compute the matching score based on Eq. (1). For URL and file path, we apply a fuzzy matching strategy and compute the matching score using the Levenshtein distance (Levenshtein, 1966). This distance is the minimum number of single-character edits (insertions, deletions, or substitutions) required to make two strings equal. Given the API parameter value pv_i and an IOC w_j, their matching score is computed based on Eq. (2), where ldist(·) is the Levenshtein distance function. However, the fuzzy matching strategy has a scalability problem. It must compute the matching score between the API parameter value and every IOC string in the IOC database, which becomes expensive as the database grows. To address this problem, we adopt a "query and rank" strategy. First, we use a
keyword in the API parameter value pv_i to quickly query the IOC database through a full-text search engine (e.g., Elasticsearch), which returns a small set of IOC candidates (denoted as ICS_i). We use the host as the keyword for URLs and the file name as the keyword for file paths, since we believe these are the key, invariant parts of such IOCs. Second, we compute the matching score between pv_i and each IOC candidate in ICS_i. The maximum of these scores is the final matching score for pv_i, as given by Eq. (3). Fig. 4 gives a simple example of the "query and rank" strategy, where the bold part of the parameter value and the IOC candidates is the keyword.

$$\mathrm{mscore}_1(pv_i) = \begin{cases} 1 & \text{if a matched IOC is found} \\ 0 & \text{if no matched IOC can be found} \end{cases} \tag{1}$$

$$\mathrm{mscore}_2(pv_i, w_j) = 1 - \frac{\mathrm{ldist}(pv_i, w_j)}{\max\left(|pv_i|, |w_j|\right)} \tag{2}$$

$$\mathrm{mscore}_2(pv_i) = \max_{w_j \in ICS_i} \mathrm{mscore}_2(pv_i, w_j) \tag{3}$$

Fig. 3. The architecture of the IOC recognition model.

Fig. 4. The query and rank strategy for fuzzy IOC matching.

4.2.2. Rule matching

The effectiveness of IOC based security-sensitive level estimation depends on the scale of the IOC database. Cyber-attack strategies evolve continuously, so it is impossible in practice to collect all IOCs. We therefore use several heuristic rules to compensate for the IOC matching results. Following the experience in (Chen et al., 2022), we use three empirical rules (the first to the third rules) and one statistical rule (the fourth rule), as shown in Table 4. For the empirical rules, the matching score of an API parameter value pv_i is computed based on Eq. (4). For the statistical rule, the matching score of pv_i is computed based on Eq. (5). In Eq. (5), TF_m(pv_i) denotes the frequency of malicious programs containing pv_i, and TF_b(pv_i) denotes the frequency of benign programs containing pv_i.

Table 4
The heuristic rules used to compensate for the IOC matching results.

Multiple suffixes
  Description: The malware disguises its file name using two or more suffixes.
  Example: Clay.exe.pif
Sensitive registry path
  Description: The malware modifies system policies by altering the Windows registry.
  Example: SOFTWARE\Microsoft\Windows\CurrentVersion\Run
Sensitive file path
  Description: The malware disguises itself as a system file by placing itself in the system directory.
  Example: C:\Windows\system32\system.exe
Sensitive parameter
  Description: API parameter values that frequently appear in malware but seldom appear in benign software.
  Example: C:\Users\users\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\
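
The two matching stages of Sections 4.2.1 and 4.2.2 can be illustrated with a short Python sketch. It is not the authors' implementation and rests on several assumptions. The regexes are shortened stand-ins for three of the six Table 3 categories (URL, IP, and file path). A Python dictionary keyed by the query keyword replaces the full-text search engine. The sensitive directory and registry-key lists are hypothetical single-entry examples taken from Table 4. The names IocDatabase, query_key, mscore2_pair, and rule_scores are illustrative only.

```python
from __future__ import annotations

import re
from collections import defaultdict
from urllib.parse import urlparse

# Shortened stand-ins for three of the Table 3 regexes (assumption).
CATEGORY_PATTERNS = {
    "url": re.compile(r"^(https?|ftp)://[^\s/]+(/\S*)?$"),
    "ip": re.compile(r"^\d{1,3}(\.\d{1,3}){3}(:\d+)?$"),
    "file_path": re.compile(r"^[a-zA-Z]:(\\[^\\/:*?\"<>|]+)+$"),
}
FUZZY_CATEGORIES = {"url", "file_path"}  # all other categories: exact match


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions that turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or keep)
        prev = curr
    return prev[-1]


def mscore2_pair(pv: str, ioc: str) -> float:
    """Eq. (2): 1 - ldist(pv, ioc) / max(|pv|, |ioc|)."""
    longest = max(len(pv), len(ioc))
    return 1.0 - levenshtein(pv, ioc) / longest if longest else 1.0


def query_key(value: str, category: str) -> str:
    """Query keyword: the host for URLs, the file name for file paths."""
    if category == "url":
        return (urlparse(value).hostname or "").lower()
    return value.rsplit("\\", 1)[-1].lower()


class IocDatabase:
    """In-memory stand-in for the IOC database. A dict keyed by the query
    keyword replaces the full-text search engine used for candidate lookup."""

    def __init__(self, iocs: list[tuple[str, str]]) -> None:
        self.exact = set(iocs)  # (category, value) pairs
        self.index: dict[tuple[str, str], list[str]] = defaultdict(list)
        for category, value in iocs:
            if category in FUZZY_CATEGORIES:
                self.index[(category, query_key(value, category))].append(value)

    def match(self, pv: str) -> tuple[str | None, float]:
        # Step 1: find the IOC category of pv with the regexes.
        category = next((c for c, p in CATEGORY_PATTERNS.items() if p.match(pv)), None)
        if category is None:
            return None, 0.0
        # Step 2, exact strategy (Eq. (1)).
        if category not in FUZZY_CATEGORIES:
            return category, 1.0 if (category, pv) in self.exact else 0.0
        # Step 2, fuzzy strategy: query the candidate set ICS_i by keyword,
        # score each candidate with Eq. (2), and keep the maximum (Eq. (3)).
        candidates = self.index.get((category, query_key(pv, category)), [])
        return category, max((mscore2_pair(pv, w) for w in candidates), default=0.0)


# Hypothetical rule settings: Table 4 gives only one example per rule.
SENSITIVE_DIRS = ("c:\\windows\\system32\\",)
SENSITIVE_REG_KEYS = ("software\\microsoft\\windows\\currentversion\\run",)


def rule_scores(pv: str, tf_mal: dict[str, int], tf_ben: dict[str, int]) -> list[int]:
    """Eq. (4) for the three empirical rules, Eq. (5) for the statistical one.
    tf_mal / tf_ben map a parameter value to the number of malicious / benign
    programs containing it."""
    low = pv.lower()
    file_name = low.rsplit("\\", 1)[-1]
    return [
        int(file_name.count(".") >= 2),                   # multiple suffixes
        int(any(k in low for k in SENSITIVE_REG_KEYS)),   # sensitive registry path
        int(low.startswith(SENSITIVE_DIRS)),              # sensitive file path
        int(tf_mal.get(pv, 0) > tf_ben.get(pv, 0)),       # sensitive parameter
    ]


if __name__ == "__main__":
    db = IocDatabase([("url", "https://fareslower.com/login.php"), ("ip", "86.56.43.53")])
    print(db.match("https://fareslower.com/index.php"))
    print(rule_scores("Clay.exe.pif", {}, {}))
```

Suppose the IOC "https://fareslower.com/login.php" is stored. Then db.match("https://fareslower.com/index.php") returns the category "url" with a score of 1 - 5/32 (about 0.84). Both URLs are 32 characters long and their Levenshtein distance is 5. Keying candidates by host or file name keeps ICS_i small, which is the point of the "query and rank" strategy. The exact-match branch stands in for Eq. (1); in this sketch only IPs use it, but domains, registry paths, and host names would be handled the same way.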


$$\mathrm{mscore}_3(pv_i) = \begin{cases} 1, & \text{if the rule is satisfied} \\ 0, & \text{if the rule is not satisfied} \end{cases} \tag{4}$$

$$\mathrm{mscore}_4(pv_i) = \begin{cases} 1, & \text{if } \mathrm{TF}_m(pv_i) > \mathrm{TF}_b(pv_i) \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

4.2.3. Security-sensitive level labeling
After the IOC matching and rule matching, we obtain one IOC matching score and four rule matching scores for each API parameter value. However, an API call can have a variable number of parameters. Hence, we need a way to integrate the matching scores of all the individual parameters of an API call into a single security-sensitive level label.

To this end, we use a fixed-length matching score vector to label the security-sensitive level of an API call. Specifically, suppose an API call ac_i has n parameters, whose values are denoted as pv_i^1, pv_i^2, ⋯, pv_i^n. First, we compute six IOC matching scores for each API parameter value pv_i^j, each corresponding to an IOC category. Note that if pv_i^j does not match the regex of IOC category c_k, its matching score for c_k is 0. After that, we create a 6-dimensional IOC matching score vector for API call ac_i (denoted as iv_i), where iv_i[k] is the maximum matching score for IOC category c_k among the n parameter values. Second, we compute four rule matching scores for each API parameter value pv_i^j, each corresponding to a heuristic rule. After that, we also create a 4-dimensional rule matching score vector for ac_i (denoted as rv_i), where rv_i[k] is the maximum matching score for the kth heuristic rule among the n parameter values. Finally, we concatenate the IOC matching score vector and the rule matching score vector into the final matching score vector for API call ac_i (denoted as mv_i = iv_i ⊕ rv_i).

In addition, to facilitate the embedding process in Section 4.3, we convert the final matching score vector mv_i into a multi-hot feature vector. The matching scores of all heuristic rules and of most IOC categories are either 0 or 1; the exceptions are URL and file path, whose matching scores lie in the range [0, 1]. Thus, for the IOC categories of URL and file path, we first discretize [0, 1] into m equal bins and then map the continuous matching score into the corresponding bin, so that the continuous matching score of a URL or file path is represented as a one-hot vector. For example, if we discretize [0, 1] into four bins (i.e., [0, 0.25], (0.25, 0.5], (0.5, 0.75], and (0.75, 1]) and the matching score of a URL is 0.69, the matching score will be represented as 〈0, 0, 1, 0〉. Fig. 5 gives a simple example to illustrate the security-sensitive level labeling strategy.

Fig. 5. The strategy for security-sensitive labeling.

4.3. API call embedding
Similar to the text classification task in the NLP domain, using one-hot encoding to represent the APIs would lead to extremely sparse feature vectors and ignore the semantic relations between APIs. Therefore, the APIs should be embedded into dense feature vectors before being fed into the deep neural networks. In addition, after the parameter labeling step, each API call is associated with a matching score vector that characterizes the security-sensitive level of its run-time parameters. Hence, we have to embed the APIs and the matching score vectors jointly into a unified feature space. Specifically, we separate the API call embedding into two parts, i.e., the API embedding and the security-sensitive level embedding, and then fuse the two parts into a full embedding.

API embedding: We treat an API as a “word” and an API sequence generated from a software execution trace as a “document”. Then, we apply word2vec (Mikolov et al., 2013) to train on all the API sequences from a large software dataset. Finally, each API a_i is represented by a dense feature vector (denoted as e_i). The idea of applying word2vec is to place the feature vectors of APIs that are used in similar sequential contexts close to each other in the feature space, since APIs used in similar sequential contexts often have similar functions.

Security-sensitive level embedding: The security-sensitive level of the parameters of an API call ac_i is represented as a matching score vector mv_i, which is a sparse multi-hot feature vector with a large proportion of zero elements. Therefore, we apply the feature embedding scheme of (Grbovic and Cheng, 2018) to process mv_i. Specifically, we use an MLP layer on top of mv_i to convert it into a d-dimensional dense feature vector (denoted as me_i). This parameter embedding alleviates the feature sparsity issue and improves the generalization ability.

Embedding fusion: To embed the APIs and parameters into a unified feature space, for an API call ac_i of API a_j, we first concatenate its API embedding e_j and its parameter security-sensitive level embedding me_i into a full embedding (denoted as fe_i). Then, we feed fe_i into another MLP, which is updated continuously along with the training of the downstream API sequence learning model.

4.4. API call sequence learning
After the API call embedding step, a software execution trace represented by a sequence of API calls (denoted as trace_k = <ac_k1, ac_k2, ⋯, ac_kN>) can be transformed into a sequence of feature vectors (i.e., trace_k = <fe_k1, fe_k2, ⋯, fe_kN>). Therefore, we adopt classifiers with sequential pattern learning ability to build the malware detection model. Specifically, we explore three types of deep neural networks as the backbone classifiers, i.e., TextCNN, ABiLSTM, and Transformer.

TextCNN is a CNN specifically designed for the text classification problem (Kim, 2014). TextCNN learns sequential patterns by sliding multiple convolutional filters of various sizes over contiguous word embeddings.

ABiLSTM is the combination of BiLSTM and an attention mechanism. BiLSTM is a bidirectional version of LSTM, which can learn long-term dependencies in sequences, while the attention mechanism focuses the model on the more important elements in the sequences.

Transformer applies self-attention to learn semantic dependencies between elements regardless of their positions in a sequence. The key advantages of Transformer include parallelizability and the ability to learn adaptive embeddings for elements under different contexts.

Fig. 5 appears above, after Section 4.2.3.
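The labeling step of Section 4.2.3 can be sketched as below: per-parameter scores are max-pooled into iv_i and rv_i, concatenated into mv_i, and the URL and file-path entries are then expanded into one-hot bins. The ordering of the six IOC categories (taken from the IOC types listed in Section 5.1.1), the dictionary input format, and the function names are assumptions of this sketch. The bin boundaries follow the worked example in Section 4.2.3, so a score of 0 lands in the first bin.

```python
import math

# Assumed order of the six IOC categories (the IOC types of Section 5.1.1).
IOC_CATEGORIES = ["ip", "domain", "url", "file_path", "registry_path", "host_name"]
CONTINUOUS = {"url", "file_path"}  # the only categories scored in [0, 1]
N_RULES = 4


def matching_score_vector(ioc_scores, rule_scores):
    """ioc_scores: one {category: score} dict per parameter value;
    rule_scores: one length-4 list per parameter value.
    Returns mv_i = iv_i (+) rv_i, max-pooled over the n parameters."""
    iv = [max((s.get(c, 0.0) for s in ioc_scores), default=0.0) for c in IOC_CATEGORIES]
    rv = [max((r[k] for r in rule_scores), default=0) for k in range(N_RULES)]
    return iv + rv


def score_to_bin(score, m):
    """Bins [0, 1/m], (1/m, 2/m], ..., ((m-1)/m, 1]; returns a 0-based index."""
    return max(0, math.ceil(score * m) - 1)


def to_multi_hot(mv, m):
    """Binary entries stay single bits; URL and file-path scores become one-hot over m bins."""
    out = []
    for k, category in enumerate(IOC_CATEGORIES):
        if category in CONTINUOUS:
            one_hot = [0] * m
            one_hot[score_to_bin(mv[k], m)] = 1
            out.extend(one_hot)
        else:
            out.append(int(mv[k]))
    out.extend(int(x) for x in mv[len(IOC_CATEGORIES):])  # the four rule bits
    return out
```

With m = 4, a URL score of 0.69 lands in the third bin, giving 〈0, 0, 1, 0〉, and the 10-dimensional mv_i becomes a 16-dimensional multi-hot vector (four binary IOC entries, two 4-bin one-hot blocks, and four rule bits).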

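A sketch of the fusion step of Section 4.3 follows: one dense layer with ReLU maps the multi-hot vector mv_i to a d-dimensional me_i, and the result is appended to the API's word2vec vector e_j to form fe_i. The single-layer MLP, the ReLU activation, the random untrained weights, and the dictionary lookup for e_j are all assumptions of this sketch; in CTIMD the weights are learned jointly with the downstream classifier, and the further MLP applied to fe_i is omitted here.

```python
import random


def init_dense(n_in, n_out, seed=0):
    """Random (untrained) weights for one dense layer, stored as a list of rows."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def dense_relu(x, weights, bias):
    """One MLP layer: ReLU(W x + b)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def fuse_api_call(api_name, mv_multi_hot, api_vectors, layer):
    """fe_i = e_j (+) me_i, where me_i = MLP(mv_i)."""
    me = dense_relu(mv_multi_hot, *layer)
    return list(api_vectors[api_name]) + me
```

With the settings of Sections 5.1.3 and 5.2 (a 128-dimensional API embedding and d = 10), each fused vector has 138 entries; the 16-dimensional multi-hot input used in the check is an arbitrary choice.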

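Before the sequences of Section 4.4 can be batched for a classifier, they need a common length; Section 5.1.3 fixes it at 1200 API calls by truncating or padding. A minimal sketch, assuming head truncation (keeping the earliest calls), right padding, and a hypothetical "<PAD>" token, none of which the paper specifies:

```python
PAD = "<PAD>"  # hypothetical padding token


def fix_length(api_calls, length=1200, pad=PAD):
    """Truncate or right-pad an API call sequence to exactly `length` items.

    Keeping the head of the trace matches the observation that the first two
    minutes of execution expose the malicious activity; the paper does not
    state which end it truncates.
    """
    calls = list(api_calls[:length])
    return calls + [pad] * (length - len(calls))
```

Since the traces contain roughly 1000 calls, most sequences are padded rather than truncated at this length.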
5. Experiment

5.1. Experiment setup

5.1.1. Dataset
We use two public datasets⁴,⁵ provided by a third party to evaluate CTIMD, denoted as DS1 and DS2. DS1 contains 30,000 API call sequences obtained by executing Windows PE files in a sandbox, of which 10,000 are malicious and the others are benign. DS2 contains 12,000 API call sequences, of which 8000 are malicious. DS1 involves 94 unique APIs and DS2 involves 98 unique APIs; 90 APIs overlap between DS1 and DS2.

To build the IOC database, we extract 17,341 documents from threat intelligence sources including AT&T,⁶ Symantec,⁷ and Malwarebytes.⁸ In total, our IOC database contains 232,397 IOCs, including 49,167 IPs, 119,732 domains, 34,107 URLs, 2441 file paths, 17 registry paths, and 26,933 host names.

⁴ https://github.com/kericwy1337/Datacon2019-Malicious-Code-DataSet-Stage1
⁵ https://github.com/kericwy1337/Datacon2019-Malicious-Code-DataSet-Stage2
⁶ https://cybersecurity.att.com/blogs/labs-research
⁷ https://www.broadcom.com/products/cybersecurity
⁸ https://www.malwarebytes.com

5.1.2. Evaluation strategies
We use two evaluation strategies in the experiments. The first one is the intra-dataset evaluation strategy (denoted as Intra-Eva), which trains the model on 80 % of the samples of one dataset (DS1 or DS2) and tests it on the remaining 20 % of the samples of the same dataset. The second one is the inter-dataset evaluation strategy (denoted as Inter-Eva), which trains the model on one dataset and tests it on the other (i.e., trains on DS1 and tests on DS2, or vice versa).

We use four evaluation metrics to measure the performance, namely Accuracy, Precision, Recall, and F1-Score, calculated as follows: Accuracy = (TP + TN) / (TP + FP + TN + FN), Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F1-Score = 2 × Precision × Recall / (Precision + Recall), where TP is the number of malicious samples that are correctly classified, TN is the number of benign samples that are correctly classified, FP is the number of benign samples that are incorrectly classified as malicious, and FN is the number of malicious samples that are incorrectly classified as benign.

5.1.3. Default parameter settings
First, recent studies (Chen et al., 2022) have shown that two minutes of execution of a program is sufficient to expose its malicious activities, and the execution traces in the first two minutes contain approximately 1000 API calls in our datasets. Therefore, we set the length of the API call sequence generated from a software execution trace to 1200 by applying truncating or padding operations. Second, we set the dimension of the API embedding to 128, following the experience of previous studies (Chen et al., 2022). Third, as for the backbone classifiers: for the TextCNN model, we use four different kernel sizes (i.e., 3, 4, 5, and 6) and set the number of kernels for each size to 256. For the ABiLSTM model, we use two BiLSTM layers, and set the dimension of the BiLSTM hidden layer to 256 and the attention size to 512. For the Transformer model, we use two encoder layers, and set the number of attention heads to 4 and the dimension of the hidden layer to 128.

5.2. Experiment 1: Parameter tuning experiment
In this section, we investigate the effect of two parameters unique to CTIMD, i.e., d (the dimension of the security-sensitive level embedding) and m (the number of bins used to discretize the continuous matching scores). In this experiment, we use TextCNN as the backbone classifier and adopt the Inter-Eva strategy for evaluation.

In the first experiment, we tune the parameter d; the result is shown in Fig. 6(a), where d = 0 denotes that the original matching score vector mv_i is used in the model without embedding. First, a larger embedding dimension d generally allows the model to capture more complex semantics, but setting d too large makes the model more prone to overfitting. Specifically, the F1-Score improves from 0.938 to 0.997 as d increases from 2 to 10. As d further increases from 10 to 20, the F1-Score shows a slight decrease. Second, the model already has a satisfactory detection performance when d = 0 (the F1-Score is 0.972). The dimension of the original matching score vector is 10, and the best detection performance of the model is also achieved when d = 10. This is because the category space of the knowledge of IOCs and heuristic rules is small, so a large embedding dimension is not needed to accommodate it. In addition, the detection performance is better when d = 10 than when d = 0. Although the two settings have the same feature vector dimension, the embedding technique converts the sparse original feature vector (d = 0) into a compact feature vector (d = 10), which better captures the semantic relations between different aspects of knowledge.

In the second experiment, we tune the parameter m; the result is shown in Fig. 6(b). It exhibits a similar trend to the tuning of parameter d. Specifically, as m increases from 2 to 20, the F1-Score continuously improves and reaches the optimum value of 0.997 when m = 20. However, if we further increase m, the F1-Score starts to decrease slightly. This is because a too small value of m loses too much information by blurring the difference between diverse security-sensitive levels, while a too large value of m fails to capture the similarity between adjacent security-sensitive levels and causes overfitting.

5.3. Experiment 2: Ablation experiment

5.3.1. The evaluation of different classifiers
In this experiment, we evaluate the performance of different backbone classifiers, including traditional machine learning based classifiers such as LR (Logistic Regression), NB (Naive Bayes), DT (Decision Tree), and RF (Random Forest), and deep learning based classifiers such as TextCNN, ABiLSTM, and Transformer. Note that all the above methods use only the raw API call sequences as input, without considering the run-time parameters. In addition, we evaluate the performance of CTIMD with different backbone classifiers, including TextCNN (denoted as CTIMD-TC), ABiLSTM (denoted as CTIMD-AB), and Transformer (denoted as CTIMD-TF). Note that CTIMD-TC, CTIMD-AB, and CTIMD-TF use the run-time parameter enhanced API call sequences.

The experiment results are shown in Table 5. First, the deep learning based classifiers (i.e., TextCNN, ABiLSTM, and Transformer) significantly outperform the traditional machine learning based classifiers (i.e., LR, NB, DT, and RF). This is because the embedding scheme used in deep neural networks can capture the semantic correlations between APIs, which improves the generalization ability on unseen API sequential patterns. Second, CTIMD-TC, CTIMD-AB, and CTIMD-TF all improve on their naïve counterparts, i.e., TextCNN, ABiLSTM, and Transformer, and the improvement is larger under the Inter-Eva strategy than under the Intra-Eva strategy. For example, the F1-Score increases only from 0.973 (TextCNN) to 0.984 (CTIMD-TC) under the Intra-Eva strategy, while it increases from 0.952 (TextCNN) to 0.996 (CTIMD-TC) under the Inter-Eva strategy. This is because CTIMD exploits threat knowledge extracted from CTIs to discover stealthy malware whose API sequential patterns might not be appropriately learnt by the naïve counterparts on new datasets, while the threat knowledge is common across different datasets. This result demonstrates the effectiveness of the CTI enhanced run-time parameter analysis. Third, CTIMD-TC and CTIMD-AB outperform CTIMD-TF, and likewise TextCNN and ABiLSTM outperform Transformer. This might
be because Transformer is designed to learn polysemy in complex contexts by leveraging a great number of parameters, and thus it has to be trained with very large datasets. In contrast, TextCNN and ABiLSTM, with significantly fewer parameters, are more adaptable to smaller datasets.

Fig. 6. The impact of parameters d and m.

Table 5
The evaluation of different classifiers.
| Model | Intra-Eva Accuracy | Intra-Eva Precision | Intra-Eva Recall | Intra-Eva F1-Score | Inter-Eva Accuracy | Inter-Eva Precision | Inter-Eva Recall | Inter-Eva F1-Score |
| LR | 0.884 | 0.842 | 0.808 | 0.824 | 0.641 | 0.642 | 0.637 | 0.639 |
| NB | 0.836 | 0.735 | 0.795 | 0.764 | 0.720 | 0.825 | 0.560 | 0.667 |
| DT | 0.876 | 0.832 | 0.786 | 0.808 | 0.505 | 0.502 | 0.988 | 0.666 |
| RF | 0.936 | 0.908 | 0.900 | 0.903 | 0.732 | 0.809 | 0.609 | 0.695 |
| TextCNN | 0.982 | 0.986 | 0.961 | 0.973 | 0.939 | 0.994 | 0.914 | 0.952 |
| ABiLSTM | 0.982 | 0.977 | 0.968 | 0.972 | 0.984 | 0.994 | 0.982 | 0.988 |
| Transformer | 0.971 | 0.964 | 0.951 | 0.956 | 0.896 | 0.988 | 0.854 | 0.916 |
| CTIMD-TC | 0.983 | 0.978 | 0.970 | 0.984 | 0.995 | 0.994 | 0.998 | 0.996 |
| CTIMD-AB | 0.982 | 0.984 | 0.963 | 0.973 | 0.994 | 0.993 | 0.999 | 0.996 |
| CTIMD-TF | 0.971 | 0.969 | 0.944 | 0.957 | 0.960 | 0.993 | 0.946 | 0.969 |

Fig. 7 shows the ROC (Receiver Operating Characteristic) curves of the above ten models under the Inter-Eva strategy, where the X-axis denotes the FPR (False Positive Rate) and the Y-axis denotes the TPR (True Positive Rate). The larger the AUC (Area Under Curve) value, or the closer the curve is to the top-left corner, the better the performance of the model. It can be seen from the figure that CTIMD-TC achieves the highest AUC value (0.999). It also achieves a high TPR even at a low FPR. For example, when the FPR of CTIMD-TC is 0.0295, the TPR is 0.985. This means that when 98.5 % of the malicious samples are correctly detected, only 2.95 % of the benign samples are falsely flagged. In the rest of the experiments, we use TextCNN as the backbone classifier for CTIMD.

Fig. 7. The ROC curves of models with different classifiers.

5.3.2. The evaluation of different components
In this experiment, we evaluate the benefits of the two key components of CTIMD (i.e., the IOC matching component and the rule matching component). Specifically, we test the following six variants of CTIMD, adopting the Inter-Eva strategy.

(1) API: A variant of CTIMD that does not consider the run-time parameters.
(2) IOC: It detects malware based solely on IOCs. Specifically, it takes the IOC matching score vector iv_i as input and trains an MLP classifier to detect malware.
(3) Rule: It detects malware based solely on heuristic rules. Specifically, it takes the rule matching score vector rv_i as input and trains an MLP classifier to detect malware.
(4) IOC_Rule: It detects malware based solely on IOCs and heuristic rules. Specifically, it takes the final matching score vector mv_i as input and trains an MLP classifier to detect malware.
(5) API_IOC: A variant of CTIMD that does not match API parameter values to the heuristic rules.
(6) API_Rule: A variant of CTIMD that does not match API parameter values to the IOC database.

The experiment results are shown in Table 6. First, API_IOC outperforms API. For example, integrating IOCs into TextCNN improves the F1-Score from 0.952 to 0.995. This further demonstrates the effectiveness of the extracted IOCs in providing supplemental information to the API sequences. Furthermore, API_Rule also outperforms API, which verifies the effectiveness of the heuristic rules. Second, API_IOC slightly outperforms API_Rule. This might be because the heuristic rules
such as multiple suffixes and sensitive file paths are weak indicators, which can only indicate that the activities are sensitive but not necessarily malicious. On the other hand, IOCs are strong indicators that can substantially confirm the maliciousness of the activities. Third, CTIMD slightly outperforms API_IOC. This indicates that the heuristic rules provide additional information when the coverage of IOCs is not sufficient. Fourth, the pure data-driven method (i.e., API) outperforms the pure knowledge-driven method (i.e., IOC_Rule) by about 14 % in F1-Score (0.952 vs. 0.837). This shows that the malicious behaviors of malware are stealthy and complex, and thus difficult to capture effectively with human knowledge alone.

Table 6
The evaluation of different components.
| Metric | API | IOC | Rule | IOC_Rule | API_IOC | API_Rule | CTIMD |
| Accuracy | 0.939 | 0.772 | 0.728 | 0.857 | 0.994 | 0.984 | 0.995 |
| Precision | 0.994 | 0.991 | 0.933 | 0.985 | 0.995 | 0.996 | 0.994 |
| Recall | 0.914 | 0.664 | 0.638 | 0.727 | 0.995 | 0.980 | 0.998 |
| F1-Score | 0.952 | 0.795 | 0.758 | 0.837 | 0.995 | 0.988 | 0.996 |

Fig. 8 shows the ROC curves of the above seven models (CTIMD and its six variants) under the Inter-Eva strategy. CTIMD achieves the highest AUC value, followed by API_IOC and API_Rule, then API, and finally IOC, Rule, and IOC_Rule. In summary, integrating threat knowledge from IOCs and heuristic rules effectively enhances the detection capability of the models.

5.4. Experiment 3: Comparison experiment
To investigate the competitive performance of CTIMD, we compare it with the following malware detection models.

(1) Martín et al. (2018): It constructs a Markov state transition matrix by mining the API call sequences, and then uses the state transition matrix as features to create a malware detection model.
(2) Kim et al. (2019): It discovers API sequential patterns through sequence alignment and clustering algorithms, and then detects malware by aligning real API call sequences to the discovered malicious API sequential patterns.
(3) Amer and Zelinka (2020): It groups APIs into clusters with a clustering algorithm and converts API call sequences into cluster index sequences. Then, it builds benign and malicious cluster index transition matrices. Finally, the malicious probability of a sample is computed by traversing the API call sequence via the transition matrices.
(4) Jha et al. (2020): It uses word2vec to learn the representations of APIs, and then applies an LSTM to train the malware detection model.
(5) Tian et al. (2010): It treats the API calls and the run-time parameters as separate strings, and extracts features for a sample by considering all the strings along with their frequencies in the sample. Then, it applies machine learning techniques to train the malware detection model.
(6) Zhang et al. (2020): It encodes the API name, API category, and API parameters of an API call into a feature vector using a hashing trick, and then inputs the sequences of feature vectors into a deep neural network combining CNN and LSTM to train the malware detection model.
(7) Chen et al. (2022): It employs several heuristic rules to assess the sensitivity of each run-time parameter and labels each API call based on the sensitivity of its run-time parameters. Then, it encodes API calls by concatenating the embeddings of API names and API labels. Finally, it feeds the sequences of API call embeddings into a TextCNN to train the malware detection model.

The experiment results are shown in Table 7 (using the Intra-Eva strategy) and Table 8 (using the Inter-Eva strategy). First, the deep learning based models (Jha et al. (2020), Zhang et al. (2020), Chen et al. (2022), and CTIMD) achieve superior performance compared to the traditional machine learning based models (Martín et al., 2018; Kim et al., 2019; Amer and Zelinka, 2020; and Tian et al., 2010). The main difference between these two types of models is that the deep learning based models apply embedding techniques (e.g., word2vec) to encode APIs into a semantic feature space. Compared with plain feature vectors (e.g., frequencies of APIs, transition probabilities between APIs), semantic feature vectors can capture the latent correlations between APIs and enable higher generalization ability. Second, models considering API parameters generally outperform those ignoring API parameters. For example, under the Intra-Eva strategy, Tian et al. (2010) outperforms Martín et al. (2018), Kim et al. (2019), and Amer and Zelinka (2020) on F1-Score, while Zhang et al. (2020), Chen et al. (2022), and CTIMD outperform Jha et al. (2020) on F1-Score.

Fig. 8. The ROC curves of models with different components.
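To make the first baseline concrete, the sketch below turns an API call sequence into a row-normalized first-order Markov transition matrix, flattened into a feature vector that a standard classifier could consume. It is a generic illustration under stated assumptions (a fixed API vocabulary, calls outside it ignored, maximum-likelihood transition probabilities), not a reimplementation of Martín et al. (2018).

```python
def transition_features(sequence, vocabulary):
    """Flattened transition matrix: entry (a, b) = P(next call is b | current call is a)."""
    index = {api: i for i, api in enumerate(vocabulary)}
    size = len(vocabulary)
    counts = [[0] * size for _ in range(size)]
    for current, nxt in zip(sequence, sequence[1:]):
        if current in index and nxt in index:  # ignore APIs outside the vocabulary
            counts[index[current]][index[nxt]] += 1
    features = []
    for row in counts:
        total = sum(row)
        # Rows for APIs that never precede another call stay all-zero.
        features.extend(c / total if total else 0.0 for c in row)
    return features
```

For the sequence A, B, A, A over the vocabulary [A, B], the features are [0.5, 0.5, 1.0, 0.0]. Such features discard parameters and long-range context, which is consistent with the weak Inter-Eva results of the statistics-based baselines.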

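The four metrics of Section 5.1.2, which Tables 5 to 8 report, are simple ratios of confusion counts with malicious samples as the positive class. A minimal sketch; the zero-denominator fallbacks are a choice of this sketch, not something the paper specifies:

```python
def detection_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall and F1-Score as defined in Section 5.1.2."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, 90 true positives, 95 true negatives, 5 false positives, and 10 false negatives give an accuracy of 0.925 and an F1-Score of 180/195 ≈ 0.923, since F1 can equivalently be written 2TP / (2TP + FP + FN).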
Third, Martín et al. (2018) and Kim et al. (2019) have the worst performance compared to the other models. This might be because the two models are built on simple statistical methods or symbolic patterns, which cannot generalize to unknown distributions and patterns. In particular, their performance is far worse under the Inter-Eva strategy. Fourth, Chen et al. (2022) outperforms Zhang et al. (2020). This indicates that the run-time parameter sensitivity inspired API embedding used by Chen et al. (2022) can better assist deep neural networks in malware detection than the hashing tricks used by Zhang et al. (2020). We believe the reason is that the run-time parameter sensitivity is computed by leveraging the threat knowledge implied in the heuristic rules, while the hashing tricks are purely data driven. Finally, CTIMD achieves the best overall performance. This demonstrates that the combination of API calls, run-time parameters, and multi-source threat knowledge can significantly improve the malware detection performance. Specifically, the threat knowledge from CTIs helps to detect existing malware with high precision, while the sequential patterns of API calls and the threat knowledge from heuristic rules are able to detect unknown malware. Therefore, combining these data-driven and knowledge-driven indicators maximizes the capability of the malware detection model.

Table 7
The comparison with baselines using the Intra-Eva strategy.
| Type | Model | Parameters | Accuracy | Precision | Recall | F1-Score |
| Machine Learning | Martín et al. (2018) | No | 0.813 | 0.683 | 0.822 | 0.746 |
| Machine Learning | Kim et al. (2019) | No | 0.700 | 0.532 | 0.838 | 0.651 |
| Machine Learning | Amer and Zelinka (2020) | No | 0.866 | 0.852 | 0.888 | 0.869 |
| Machine Learning | Tian et al. (2010) | Yes | 0.928 | 0.939 | 0.840 | 0.887 |
| Deep Learning | Jha et al. (2020) | No | 0.928 | 0.998 | 0.922 | 0.958 |
| Deep Learning | Zhang et al. (2020) | Yes | 0.973 | 0.981 | 0.938 | 0.959 |
| Deep Learning | Chen et al. (2022) | Yes | 0.981 | 0.986 | 0.976 | 0.981 |
| Deep Learning | CTIMD | Yes | 0.983 | 0.978 | 0.970 | 0.984 |

Table 8
The comparison with baselines using the Inter-Eva strategy.
| Type | Model | Parameters | Accuracy | Precision | Recall | F1-Score |
| Machine Learning | Martín et al. (2018) | No | 0.567 | 0.631 | 0.326 | 0.429 |
| Machine Learning | Kim et al. (2019) | No | 0.607 | 0.747 | 0.622 | 0.678 |
| Machine Learning | Amer and Zelinka (2020) | No | 0.671 | 0.616 | 0.907 | 0.733 |
| Machine Learning | Tian et al. (2010) | Yes | 0.627 | 0.775 | 0.621 | 0.690 |
| Deep Learning | Jha et al. (2020) | No | 0.928 | 0.998 | 0.922 | 0.958 |
| Deep Learning | Zhang et al. (2020) | Yes | 0.916 | 0.966 | 0.905 | 0.935 |
| Deep Learning | Chen et al. (2022) | Yes | 0.985 | 0.986 | 0.984 | 0.985 |
| Deep Learning | CTIMD | Yes | 0.995 | 0.994 | 0.998 | 0.996 |

5.5. Experiment 4: Case study
In this section, we use several real cases to illustrate why CTIMD is effective.

In the first case, we obtain a snippet of the API call sequence of a ransomware program sample, i.e., “InternetOpenUrl → InternetCloseHandle → OpenSCManager → getaddrinfo → RegCreateKey → RegOpenKey → RegQueryValue → RegCloseKey”. In the experiment, if we do not consider the run-time parameters, this sample cannot be detected by traditional machine learning based classifiers, but can be detected by deep learning based classifiers. To investigate the reason, we use the attention weights in ABiLSTM to conduct a visual analysis (as shown in Fig. 9), where API calls with larger attention weights have a higher impact on the detection result and are shown with a deeper background color. It can be clearly seen that OpenSCManager has the largest attention weight and getaddrinfo also has a large attention weight. The function of OpenSCManager is to establish a connection to the service control manager and open a specific database. It is difficult to tell whether it is a malicious call by looking at OpenSCManager itself, or at a few other API calls around it based on N-Gram features. However, the deep learning based classifiers can exploit the semantics of the context of OpenSCManager over a larger range. For example, by considering other API calls such as RegCreateKey, RegOpenKey, and InternetOpenUrl, we can infer that the ransomware program might try to re-infect the system by following the instructions from a C&C server and setting the Registry to achieve self-startup.

Fig. 9. The visual analysis of the attention weights of the API calls of the first case.

In the second case, we show that IOCs contribute to the detection of malware. Table 9 shows the snippet of the API call sequence of an Injuke Trojan program sample. In the experiment, this sample can only be detected by leveraging IOCs; the methods that only consider API sequences fail to detect it. Analyzing the experiment result, we find that the security-sensitive level of the file path in lines 2 and 3 is extremely high. This is because the file path “C:\Users\Admin\AppData\Local\Temp\is-AJLTT.tmp\_isetup\_iscrypt.dll” matches the IOC “C:\Users\Admin\AppData\Local\Temp\is-IO99A.tmp_isetup_iscrypt.dll”.

Table 9
The snippet of API calls of the second case.
| Line | API name | Parameters |
| 1 | LdrGetProcedureAddress | module: “shfolder” |
| 2 | NtCreateFile | file_path: “C:\Users\Admin\AppData\Local\Temp\is-AJLTT.tmp\_isetup\_iscrypt.dll” |
| 3 | NtWriteFile | file_path: “C:\Users\Admin\AppData\Local\Temp\is-AJLTT.tmp\_isetup\_iscrypt.dll” |
| 4 | NtClose | handle: “0x000000a8” |
| 5 | SetErrorMode | mode: “32,768” |

In the third case, we show that the heuristic rules can compensate for the IOC matching results. Table 10 shows the snippet of the API call sequence of a Trojan program sample. In the experiment, this sample can only be detected by CTIMD, which considers both API calls and heuristic rules. By analyzing the experiment result, we find that the file paths in lines 3 and 4 match the “Sensitive file path” rule. Together with other
suspicious API calls, CTIMD can achieve an accurate detection.

Table 10
The snippet of API calls of the third case.
| Line | API name | Parameters |
| 1 | DeleteFileW | file_path: “C:\ProgramData\Microsoft\RAC\PublishedData\RacWmiDatabase.sdf” |
| 2 | NtSetValueKey | regkey: “HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\RELIABILITY ANALYSIS\RAC” |
| 3 | NtQueryAttributesFile | file_path: “C:\Windows\system32\SQLCEER30CH.DLL” |
| 4 | NtQueryAttributesFile | file_path: “C:\Windows\system32\SQLCEER30EN.DLL” |
| 5 | NtCreateFile | file_path: “C:\ProgramData\Microsoft\RAC\PublishedData\RacWmiDatabase.sdf” |

6. Conclusions and future work
In this paper, we investigate the dynamic malware detection problem based on API call sequences. We propose a malware detection method called CTIMD, which integrates threat knowledge with deep learning techniques. Specifically, the threat knowledge utilized by CTIMD includes IOCs extracted from CTIs and heuristic rules designed by domain experts. CTIMD then creates a malware detection model by jointly learning the threat knowledge and the API call sequences with parameters. Through a series of experiments on public datasets, we demonstrate that CTIMD significantly outperforms methods taking raw API call sequences as input, and outperforms existing methods that consider both API calls and run-time parameters. CTIMD also generalizes well in detecting new malware.

This work can be extended in the following directions. First, the threat knowledge explored in this paper consists of static indicators (e.g., IOCs), which have very weak generalization ability. In the future, we will try to extract dynamic indicators, such as abstract attack patterns, from CTIs to build a more robust malware detection model. Second, the model in this paper only outputs binary detection results. In the future, we will try to perform a deeper analysis of the samples. For example, we can diagnose the TTPs (Tactics, Techniques, and Procedures) used by the malware by exploiting techniques such as threat knowledge graphs and interpretable deep learning. Third, there is usually serious performance degradation when the trained model is applied in a real application environment. In the future, we will study how to build more adaptive models by exploiting techniques such as meta-learning, knowledge graphs, and LLMs.

CRediT authorship contribution statement
Tieming Chen: Conceptualization, Methodology, Writing – original

of China (Nos. 62372410, U22B2028, 62002324), the Zhejiang Provincial Natural Science Foundation of China (Nos. LZ23F020011, LD22F020002, LQ21F020016), the Key R&D Projects in Zhejiang Province (No. 2021C01117), and the Fundamental Research Funds for the Provincial Universities of Zhejiang (No. RF-A2023009).

References
Abu, S., Selamat, S.R., Ariffin, A., Yusof, R., 2018b. Cyber threat intelligence – issue and challenges. Indones. J. Electr. Eng. Comput. Sci. 10 (1), 371. https://doi.org/10.11591/ijeecs.v10.i1.pp371-379.
Agrawal, R., Stokes, J.W., Marinescu, M., Selvaraj, K., 2018b. Neural sequential malware detection with parameters. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp.2018.8461583.
Ahmadi, M., Sami, A., Rahimi, H., Yadegari, B., 2013. Malware detection by behavioural sequential patterns. Comput. Fraud Secur. (8), 11–19. https://doi.org/10.1016/s1361-3723(13)70072-1.
Amer, E., Zelinka, I., 2020. A dynamic Windows malware detection and prediction method based on contextual understanding of API call sequence. Comput. Secur. 92, 101760. https://doi.org/10.1016/j.cose.2020.101760.
Ban, Y., Lee, S., Song, D., Cho, H., Yi, J.H., 2022. FAM: featuring android malware for deep learning-based familial analysis. IEEE Access 10, 20008–20018. https://doi.org/10.1109/access.2022.3151357.
Blount, J.J., Tauritz, D.R., Mulder, S.A., 2011. Adaptive rule-based malware detection employing learning classifier systems: a proof of concept. pp. 110–115. https://doi.org/10.1109/compsacw.2011.28.
Chen, L., Sahita, R., Parikh, J., Marino, M. STAMINA: scalable deep learning approach for malware classification. Intel Labs Whitepaper. https://www.intel.com/content/dam/www/public/us/en/ai/documents/stamina-scalable-deep-learning-whitepaper.pdf.
Chen, X., Hao, Z., Li, L., Cui, L., Zhu, Y., Ding, Z., Liu, Y., 2022b. CruParamer: learning on parameter-augmented API sequences for malware detection. IEEE Trans. Inf. Forensics Secur. 17, 788–803. https://doi.org/10.1109/tifs.20.
Cho, I.K., Im, E.G., 2015. Extracting representative API patterns of malware families using multiple sequence alignments. In: Proceedings of the 2015 Conference on Research in Adaptive and Convergent Systems. https://doi.org/10.1145/2811411.2811543.
Christodorescu, M., Jha, S., 2003. Static analysis of executables to detect malicious patterns. In: Proceedings of the 12th USENIX Security Symposium (USENIX Security 03). https://doi.org/10.21236/ada449067.
Gao, Y., Hasegawa, H., Yamaguchi, Y., Shimada, H., 2022. Malware detection by control-flow graph level representation learning with graph isomorphism network. IEEE Access 10, 111830–111841. https://doi.org/10.1109/access.2022.3215267.
Grbovic, M., Cheng, H., 2018. Real-time personalization using embeddings for search ranking at Airbnb. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. https://doi.org/10.1145/3219819.3219885.
Jha, S., Prashar, D., Long, H.V., Taniar, D., 2020. Recurrent neural network for detecting malware. Comput. Secur. 99, 102037. https://doi.org/10.1016/j.cose.2020.102037.
Kim, H., Kim, J., Kim, Y., Kim, I., Kim, K.J., Kim, H., 2019. Improvement of malware detection and classification using API call sequence alignment and visualization. Cluster Comput. 22 (S1), 921–929. https://doi.org/10.1007/s10586-017-1110-2.
Kim, Y., 2014. Convolutional neural networks for sentence classification. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. https://doi.org/10.3115/v1/d14-1181.
Kolosnjaji, B., Zarras, A., Webster, G.D., Eckert, C., 2016. Deep learning for classification of malware system call sequences. In: Lecture Notes in Computer Science, pp. 137–149. https://doi.org/10.1007/978-3-319-50127-7_11.
Kramer, S., Bradfield, J.C., 2009. A general definition of malware. J. Comput. Virol. 6 (2), 105–114. https://doi.org/10.1007/s11416-009-0137-1.
draft. Huan Zeng: Software, Data curation. Mingqi Lv: Supervision, Levenshtein, V., 1966. Binary codes capable of correcting deletions, insertions, and
reversals. Sov. Phys. 10 (8), 707–710. Doklady. https://ci.nii.ac.jp/naid/10020
Resources. Tiantian Zhu: Validation, Writing – review & editing. 212767.
Li, C., Lv, Q., Li, N., Wang, Y., Sun, D., Qiao, Y., 2022a. A novel deep framework for
dynamic malware detection based on API sequence intrinsic features. Comput.
Declaration of Competing Interest Secur. 116, 102686 https://doi.org/10.1016/j.cose.2022.102686.
Li, C., Cheng, Z., Zhu, H., Wang, L., Lv, Q., Wang, Y., Li, N., Sun, D., 2022b. DMalNet:
dynamic malware analysis based on API feature engineering and graph learning.
The authors declare that they have no known competing financial
Comput. Secur. 122, 102872 https://doi.org/10.1016/j.cose.2022.102872.
interests or personal relationships that could have appeared to influence Martín, A.J., Rodriguez-Fernandez, V., Camacho, D., 2018. CANDYMAN: classifying
the work reported in this paper. android malware families by modelling dynamic traces with Markov chains. Eng.
Appl. Artif. Intell. 74, 121–133. https://doi.org/10.1016/j.engappa.
Masud, M.M., Khan, L., Thuraisingham, B., 2007. A scalable multi-level feature
Data availability extraction technique to detect malicious executables. Inf. Syst. Front. 10 (1), 33–45.
https://doi.org/10.1007/s10796-007-9054-3.
Data will be made available on request. Mikolov, T., Chen, K., Corrado, G., Dean, J.M., 2013. Efficient Estimation of Word
Representations in Vector Space. Cornell University. arXiv. http://export.arxiv.org/
pdf/1301.3781.
Moser, A., Kruegel, C., & Kirda, E., 2007. Limits of static analysis for malware detection.
Acknowledgment In Proceedings of the Twenty-third Annual Computer Security Applications
Conference (ACSAC 2007) , pp. 421–430. doi:10.1109/acsac.2007.21.
Nataraj, L., Karthikeyan, S., Jacob, G., & Manjunath, B.S., 2011. Malware images:
This work was supported by the National Natural Science Foundation visualization and automatic classification. In Proceedings of the 8th International
Symposium on Visualization for Cyber Security, pp. 1–7. https://doi.org/10.1145/2016904.2016908.
Qiao, Y., Yang, Y., He, J., Tang, C., Liu, Z., 2013. CBM: Free, Automatic Malware Analysis Framework Using API Call Sequences. Springer eBooks, pp. 225–236. https://doi.org/10.1007/978-3-642-37832-4_21.
Rieck, K., Trinius, P., Willems, C., Holz, T., 2011. Automatic analysis of malware behavior using machine learning. J. Comput. Secur. 19 (4), 639–668. https://doi.org/10.3233/jcs-2010-0410.
Salehi, Z., Sami, A., Ghiasi, M., 2017. MAAR: robust features to detect malicious activity based on API calls, their arguments and return values. Eng. Appl. Artif. Intell. 59, 93–102. https://doi.org/10.1016/j.engappai.2016.12.
Shaid, S.Z.M., Maarof, M.A., 2015. In memory detection of Windows API call hooking technique. In: Proceedings of the 2015 International Conference on Computer, Communications, and Control Technology (I4CT), pp. 294–298. https://doi.org/10.1109/i4ct.2015.7219584.
Tian, R., Islam, R., Batten, L., Versteeg, S., 2010. Differentiating malware from cleanware using behavioural analysis. In: Proceedings of the 5th International Conference on Malicious and Unwanted Software, pp. 23–30. https://doi.org/10.1109/malware.2010.5665796.
Tobiyama, S., Yamaguchi, Y., Shimada, H., Ikuse, T., Yagi, T., 2016. Malware detection with deep neural network using process behavior. In: Proceedings of the IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), 2, pp. 577–582.
Uppal, D., Sinha, R., Mehra, V., Jain, V., 2014, September. Malware detection and classification based on extraction of API sequences. In: Proceedings of the International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2337–2342. https://doi.org/10.1109/icacci.2014.6968547.
Weber, M., Schmid, M., Schatz, M., Geyer, D., 2002. A toolkit for detecting and analyzing malicious software. In: Proceedings of the 18th Annual Computer Security Applications Conference. https://doi.org/10.1109/csac.2002.1176314.
Ye, Y., Chen, L., Wang, D., Li, T., Jiang, Q., Min, Z., 2008a. SBMDS: an interpretable string based malware detection system using SVM ensemble with bagging. J. Comput. Virol. 5 (4), 283–293. https://doi.org/10.1007/s11416-008-0108-y.
Ye, Y., Wang, D., Li, T., Dong-Yi, Y., Jiang, Q., 2008b. An intelligent PE-malware detection system based on association mining. J. Comput. Virol. 4 (4), 323–334. https://doi.org/10.1007/s11416-008-0082-4.
Ye, Y., Li, T., Chen, Y., Jiang, Q., 2010, July. Automatic malware categorization using cluster ensemble. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 95–104. https://doi.org/10.1145/1835804.1835820.
Ye, Y., Li, T., Adjeroh, D.A., Iyengar, S.S., 2017. A survey on malware detection using data mining techniques. ACM Comput. Surv. 50 (3), 1–40. https://doi.org/10.1145/3073559.
Zhang, Z., Qi, P., Wang, W., 2020. Dynamic malware analysis with feature engineering and feature learning. Proc. AAAI Conf. Artif. Intell. 34 (01), 1210–1217. https://doi.org/10.1609/aaai.v34i01.5474.

Tieming Chen received the Ph.D. degree in software engineering from Beihang University, China. He is currently a professor with the College of Computer Science and Technology, Zhejiang University of Technology, China. His research interests include cyberspace security and data mining.

Huan Zeng received the B.S. degree in Computer Science and Technology from Taizhou University, Taizhou, China, in 2018. He is currently pursuing the M.S. degree with the College of Computer Science and Technology, Zhejiang University of Technology, China. His research interests include malware analysis and applied machine learning.

Mingqi Lv received the Ph.D. degree in Computer Science from Zhejiang University, Hangzhou, China, in 2012. He is currently an associate professor with the College of Computer Science and Technology, Zhejiang University of Technology, China. His research interests include applied machine learning and cyberspace security.

Tiantian Zhu received the Ph.D. degree in computer science from Zhejiang University, Hangzhou, China, in 2019. He is currently an associate professor with the College of Computer Science and Technology, Zhejiang University of Technology, China. His research interests include data mining, artificial intelligence, and information security.