KEMBAR78
Intelligent Malware Detection System: Yanfang Ye Dingding Wang, Tao Li Dwang003, Taoli@cs - Fiu.edu Dongyi Ye | PDF | Antivirus Software | Malware
0% found this document useful (0 votes)
13 views5 pages

Intelligent Malware Detection System: Yanfang Ye Dingding Wang, Tao Li Dwang003, Taoli@cs - Fiu.edu Dongyi Ye

The document presents the Intelligent Malware Detection System (IMDS), which utilizes Objective-Oriented Association (OOA) mining to detect previously unseen and polymorphic malware by analyzing Windows API execution sequences in Portable Executable (PE) files. IMDS consists of three main modules: a PE parser, an OOA rule generator, and a rule-based classifier, and has been shown to outperform traditional signature-based and heuristic-based malware detection methods. Experimental results indicate that IMDS achieves higher accuracy and efficiency compared to popular antivirus software like Norton AntiVirus and McAfee VirusScan.

Uploaded by

rachanar1581
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views5 pages

Intelligent Malware Detection System: Yanfang Ye Dingding Wang, Tao Li Dwang003, Taoli@cs - Fiu.edu Dongyi Ye

The document presents the Intelligent Malware Detection System (IMDS), which utilizes Objective-Oriented Association (OOA) mining to detect previously unseen and polymorphic malware by analyzing Windows API execution sequences in Portable Executable (PE) files. IMDS consists of three main modules: a PE parser, an OOA rule generator, and a rule-based classifier, and has been shown to outperform traditional signature-based and heuristic-based malware detection methods. Experimental results indicate that IMDS achieves higher accuracy and efficiency compared to popular antivirus software like Norton AntiVirus and McAfee VirusScan.

Uploaded by

rachanar1581
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

Industrial and Government Track Short Paper

IMDS: Intelligent Malware Detection System

Yanfang Ye∗ Dingding Wang, Tao Li Dongyi Ye


Department of Computer School of Computer Science
College of Math and Computer
Science Florida International University Science
Xiamen University Miami, FL 33199, USA Fuzhou University
Xiamen, 361005, P.R.China dwang003,taoli@cs.fiu.edu Fuzhou, 350002, P.R.China
yeyanfang@yahoo.com.cn yiedy@fzu.edu.cn

ABSTRACT previously unseen viruses and polymorphic viruses which adopt


The proliferation of malware has presented a serious threat to the obfuscation techniques are more complex and difficult to detect.
security of computer systems. Traditional signature-based anti- Currently, most widely-used malware detection software uses
virus systems fail to detect polymorphic and new, previously un- signature-based and heuristic-based algorithms to recognize threats.
seen malicious executables. In this paper, resting on the analysis of Signatures are short strings of bytes which are unique to the pro-
Windows API execution sequences called by PE files, we develop grams. They can be used to identify particular viruses in executable
the Intelligent Malware Detection System (IMDS) using Objective- files, boot records, or memory with small error rate [8]. However,
Oriented Association (OOA) mining based classification. IMDS is signature based method is expensive and slow. In addition, they
an integrated system consisting of three major modules: PE parser, are not effectively against modified and unknown malicious exe-
OOA rule generator, and rule based classifier. An OOA Fast FP- cutables. Heuristic-based recognition tends to provide protection
Growth algorithm is adapted to efficiently generate OOA rules for against new and unknown threats. However, this kind of virus de-
classification. A comprehensive experimental study on a large col- tection is usually inefficient and inaccurate.
lection of PE files obtained from the anti-virus laboratory of King- Our efforts for detecting polymorphic and new, previously un-
Soft Corporation is performed to compare various malware de- seen malicious executables lead to the Intelligent Malware Detec-
tection approaches. Promising experimental results demonstrate tion System (IMDS), which applies Objective-Oriented Associa-
that the accuracy and efficiency of our IMDS system outperform tion (OOA) mining based classification [10, 14]. The IMDS rests
popular anti-virus software such as Norton AntiVirus and McAfee on the analysis of Windows Application Programming Interface
VirusScan, as well as previous data mining based detection systems (API) execution sequences which reflect the behavior of program
which employed Naive Bayes, Support Vector Machine (SVM) and code pieces. The associations among the APIs capture the underly-
Decision Tree techniques. ing semantics for the data are essential to malware detection. Our
malware detection is carried out directly on Windows Portable Ex-
ecutable (PE) code with three major steps: (1) first constructing the
Categories and Subject Descriptors API execution sequences by developing a PE parser; (2) followed
I.2 [Artificial Intelligence]: Learning by extracting OOA rules using OOA Fast FP-Growth algorithm;
(3) and finally conducting classification based on the association
General Terms rules generated in the second step. So far, we have gathered 29580
executables, of which 12214 are referred to as benign executables
Algorithms, Experimentation, Security and 17366 are malicious ones. These executables called 12964
APIs in total. The collected data in our work is significantly larger
Keywords than those used in previous studies on data mining for malware de-
Malware, PE file, Windows API sequence, OOA mining tection [13, 9, 18]. The experimental results illustrate that the ac-
curacy and efficiency of our IMDS outperform popular anti-virus
software such as Norton AntiVirus and McAfee VirusScan, as well
1. INTRODUCTION as previous systems using data mining approaches such as Naive
Malicious executables are programs designed to infiltrate or dam- Bayes, SVM and Decision Tree. Our approach for malware de-
age a computer system without the owner’s consent, which have tection has been incorporated into the scanning tool of KingSoft’s
become a serious threat to the security of computer systems. New, AntiVirus software.

The work was done while the author was working in the Anti-virus
laboratory at KingSoft Corporation. 2. RELATED WORK
Besides the traditional signature-based malware detection meth-
ods, there is some work to improve the signature-based detection
Permission to make digital or hard copies of all or part of this work for [15, 3, 12] and also a few attempts to apply data mining and ma-
personal or classroom use is granted without fee provided that copies are chine learning techniques to detect new malicious executables.
not made or distributed for profit or commercial advantage and that copies Sung et al. [15] developed a signature based malware detec-
bear this notice and the full citation on the first page. To copy otherwise, to tion system called SAVE (Static Analyzer of Vicious Executables)
republish, to post on servers or to redistribute to lists, requires prior specific which emphasized on detecting polymorphic malware. The basic
permission and/or a fee.
KDD'07, August 12-15, 2007, San Jose, California, USA. idea of this approach is to extract the signatures from the original
Copyright 2007 ACM 978-1-59593-609-7/07/0008 ...$5.00. malware with the hypothesis that all versions of the same malware

1043
Industrial and Government Track Short Paper

share a common core signature. Schultz et al. [13] applied Naive third party binary compress toll, it needs to be decompressed before
Bayes method to detect previously unknown malicious code. De- being passed to the PE parser. Through the API query database,
cision Tree was studied in [18, 9]. Kolter et al. [9] gathered 1971 the API execution sequence generated by the PE parser can be con-
benign executables and 1651 malicious executables in Windows verted to a group of 32-bit global IDs which represents the static
PE format, and examined the performance of different classifiers execution sequence of the corresponding API functions. Then we
such as Naive Bayes, support vector machine (SVM) and Decision use the API calls as the signatures of the PE files and store them in
Tree using 10-fold cross validation and plotting ROC curves [16]. the signature database, which contains 6 fields: record ID, PE file
Their results also showed that the ROC curve of the Decision Tree name, file type (“0” represents benign file while “1” is for mali-
method dominated all others. cious file), called API sequence name, called API ID, and the total
Different from earlier studies, our work is based on a large col- number of called API functions, as shown in Figure 2. After that,
lection of malicious executables collected at KingSoft Anti-Virus an OOA mining algorithm is applied to generate class association
Laboratory. In addition, we apply OOA mining technique to ex- rules which are recorded in the rule database. To finally determine
tract the characterizing frequent patterns to achieve accurate mal- whether a PE file is malicious or not, we pass the selected API calls
ware detection since frequent patterns found by association mining together with the rules generated to the malware detection module
carry the underlying semantics of the data. to perform the association rule based classification.

3. THE SYSTEM ARCHITECTURE


Our IMDS system is performed directly on Windows PE code.
PE is designed as a common file format for all flavors of Windows
operating system, and PE viruses are in the majority of the viruses
rising in recent years. Some famous viruses such as CIH, CodeRed,
CodeBlue, Nimda, Sircam, Killonce, Sobig, and LoveGate all aim
at PE files. The system consists of three major components: PE
parser, OOA rule generator, and malware detection module, as il-
lustrated in Figure 1.

Figure 2: Samples in the signature database

4. CLASSIFICATION BASED ON OOA


MINING
Both classification and association mining play important roles
in data mining techniques. Classification is “the task of learning a
target function that maps each feature set to one of the predefined
class labels” [17]. For association rule mining, there is no pre-
determined target. Given a set of transactions in the database, all
the rules that satisfied the support and confidence thresholds will
be discovered [1]. As a matter of fact, classification and associa-
tion rule mining can be integrated to association rule based classi-
fication [10, 2]. This technique utilizes the properties of frequent
patterns to solve the scalability and overfitting issues in classifica-
tion and achieves excellent accuracy [2]. In our IMDS system, we
adapted OOA mining techniques [14] to generate the rules.
4.1 OOA Definitions
In our IMDS system, the goal is to find out how a set of API calls
supports the specific objectives: Obj1 = (Group = M alicious),
and Obj2 = (Group = Benign).
Figure 1: IMDS system architecture • Definition 1 (Support and confidence) Let I = {I1 , ..., Im }
be an itemset and I → Obj(os%, oc%) be an association
The functionality of the PE parser is to generate the Windows rule in OOA mining. The support and confidence of the rule
API execution sequence for each benign/malicious executable. Since are defined as:
a virus scanner is usually a speed sensitive application, in order to os% = supp(I, Obj) =
count(I ∪ {Obj}, DB)
× 100%
improve the system performance, we developed a PE parser to con- |DB|
struct the API execution sequences of PE files instead of using a count(I ∪ {Obj}, DB)
= conf (I, Obj) = × 100%
third party disassembler. If a PE file is previously compressed by a oc%
count(I, DB)

1044
Industrial and Government Track Short Paper

where the function count(I ∪{Obj}, DB) returns the num- is malicious or not. In the experiments section, we perform a com-
ber of records in the dataset DB where I ∪ {Obj} holds. prehensive experimental study to evaluate the efficiency of different
OOA mining algorithms.
• Definition 2 (OOA frequent itemset) Given mos% as a
user-specified minimum support. I is an OOA frequent item- 4.4 Associative Classifier
set/pattern in DB if os% ≥ mos%. For OOA rule generation, we use the OOA Fast FP-Growth al-
• Definition 3 (OOA rule) Given moc% as a user-specified gorithm to obtain all the association rules with certain support and
confidence. Let I = {I1 , ..., Im } be an OOA frequent item- confidence thresholds, and two objectives: Obj1
set. I → Obj(os%, oc%) is an OOA rule if oc% ≥ moc%. = (Group = M alicious), Obj2 = (Group = Benign). Then
we apply the technique of classification based on association rules
4.2 OOA Fast FP-Growth Algorithm (CBA) to build a CBA classifier [10] as our malware detection mod-
ule. The CBA classifier is built on rules with high support and
Although Apriori algorithm can be extended to OOA mining,
confidence and uses the association between frequent patterns and
it requires many iterations to generate all of the frequent itemsets
the type of files for prediction. So, our malware detection module
before generating the association rules. An alternative OOA mining
takes the input of generated OOA rules and outputs the prediction
algorithm called OOA FP-Growth is designed based on FP-Growth
of whether an executable is malicious or not.
algorithm [6, 5]. In general, OOA FP-Growth algorithm is much
faster than OOA Apriori for mining frequent itemsets. However,
when the minimum support is small, OOA FP-Growth generates a 5. DATA COLLECTION
huge number of conditional FP-trees recursively, which is time and As stated previously, we obtained 29580 Windows PE files of
space consuming. Our malware detection relies on finding frequent which 12214 were recognized as benign executables while 17366
patterns from large collections of data, therefore, the efficiency is were malicious executables. The malicious executables mainly con-
an essential issue to our system. In our IMDS system, we extend a sisted of backdoors, worms, and Trojan horses and all of them
modified FP-Growth algorithm proposed in [4] to conduct the OOA were provided by the Anti-virus laboratory of KingSoft Corpora-
mining. This algorithm greatly reduces the costs of processing time tion. The benign executables were gathered from the system files
and memory space, and we call it OOA Fast FP-Growth algorithm. of Windows 2000/NT and XP operating system.
Similar to OOA FP-Growth algorithm, there are also two steps in
OOA Fast FP-Growth algorithm: constructing an OOA Fast FP- 6. EXPERIMENTAL RESULTS AND
tree and generating frequent patterns from the tree. But the struc-
ture of an OOA Fast FP-tree is different from that of an OOA FP-
ANALYSIS
tree in the following way: (1) The paths of an OOA Fast FP-tree We conduct three sets of experiments using our collected data.
are directed, and there is no path from the root to leaves. Thus, In the first set of experiments, we evaluate the efficiency of differ-
fewer pointers are needed and less memory space is required. (2) ent OOA mining algorithms. The second set of experiments is to
In an OOA FP-tree, each node is the name of an item, but in an compare the abilities to detect polymorphic and unknown malware
OOA Fast FP-tree, each node is the sequence number of an item, of our IMDS system with current widely-used anti-virus software.
which is determined by the support count of the item. The detailed The efficiency and false positives by using different scanners have
description can be referenced in [4]. also been examined. Finally, we compare our IMDS system with
other classification based methods. All the experiments are con-
4.3 An Illustrating Example ducted under the environment of Windows 2000 operating system
At the beginning of this section, we state that frequent patterns plus Intel P4 1Ghz CPU and 1Gb of RAM.
are essential to accurate classification. To demonstrate the effec-
tiveness of the frequent patterns, we show an example rule gener-
6.1 Evaluation of Different OOA Mining
ated by OOA Fast FP-Growth algorithm. We sample 5611 records Algorithms
from our signature database, of which 3394 records are malicious In the first set of experiments, we implement OOA Apriori,
executables and 2217 records are benign executables. One of the OOA FP-Growth, and OOA Fast FP-Growth algorithms under Mi-
rules we generated is: crosoft Visual C++ environment. We sample 5611 records from our
(2230, 398, 145, 138, 115, 77) → Obj1 = (Group = M alicious) signature database, which includes 3394 records of malicious exe-
(os = 0.296739, oc = 0.993437), cutables and 2217 records of benign executables. By using differ-
where os and oc represent the support and confidence, respectively. ent support and confidence thresholds, we compare the efficiency
After converting the API IDs to API names via our API query of the three algorithms. The results are shown in Table 3 and Fig-
database, this rule becomes: ure 3. From Table 3, we observe that the time complexity increases
(KERN EL32.DLL, OpenP rocess; CopyF ileA; CloseHandle; exponentially as the minimum support threshold decreases. How-
GetV ersionExA; GetM oduleF ileN ameA; W riteF ile; ) → ever, it shows obviously that the OOA Fast FP-Growth algorithm
Obj1 = (Group = M alicious) (os = 0.296739, oc = 0.993437). is much more efficient than the other two algorithms, and it even
After analyzing the API sequence in this rule, we know that the pro- doubles the speed of performing OOA FP-Growth algorithm. Fig-
gram actually executes the following functions: (i) returns a handle ure 3 presents a clearer graphical view of the results.
to an existing process object; (ii) copies an existing file to a new
file; (iii) closes the handle of the open object; (iv) obtains extended 6.2 Comparisons of Different Anti-virus
information about the version of the currently running operating Scanners
system; (v) retrieves the complete path of the file that contains the In this section, we examine the abilities of detecting polymor-
specified module of current process; and (vi) writes data to the file. phic malware and unknown malware of our system in comparison
with the os and oc values, we know that this sequence of API ap- with some of the popular software tools such as Norton AntiVirus
pears in 1665 malware, while only in 11 benign files. Obviously, it 2006, Dr.Web, McAfee VirusScan and Kaspersky Anti-Virus. The
is one of the essential rules for determining whether an executable efficiency and the number of false positives are also evaluated.

1045
Industrial and Government Track Short Paper

Experiment 1 2 3 4 Software N M D K IMDS


√ √ √ √ √
mos 0.355 0.35 0.34 0.294 Beagle √ √ √ √
moc 0.9 0.95 0.9 0.98 Beagle V1 √ × √ √
OOA Apriori 25 260.5 ∞ ∞ Beagle V2 × × √ √
OOA FP-Growth 8 16.14 60.5 280.2 Beagle V3 ×
√ ×
√ × √
OOA Fast FP-Growth 4.1 7.99 28.8 143.5 Beagle V4 √ √ ×
√ ×
√ √
Blaster √ √ √ √ √
Table 1: Running time of different OOA mining algorithms Blaster V1 √
(min). mos and moc represent the minimum support and mini- Blaster V2 ×
√ ×
√ ×
√ ×
√ √
mum confidence for each experiment. Lovedoor √ √
Lovedoor V1 × × × √
Lovedoor V2 × ×
√ × ×
√ √
Lovedoor V3 ×
√ √ ×
√ √ √
Mydoom √
Mydoom V1 × × × × √
Mydoom V1 × × × ?

Table 2: Polymorphic malware detection. Remark: N - Norton


AntiVirus,
√ M - MacAfee, D - Dr.Web, K - Kaspersky. In the
table, “ ” indicates successful detection, “×” indicates failure
to detect, and “?” represents only an “alert”; all the scanners
used are of most current and updated version.

Software N M D K IMDS
√ √ √ √
malware1 ×
Figure 3: Comparison on the efficiency of different OOA min- √ √ √
malware2 × ×
ing algorithms √ √
malware3 × × ×
malware4 × ×
√ × × ×

6.2.1 Polymorphic Virus Detection malware5 × ×
√ × √
malware6 × × ×
In this experiment, we sample 3000 malicious files and 2000 be- √ √
malware7 × × ×
nign files in the training data set, then we use 1500 malware and √
malware8 × × × ×
500 benign executables as the test data set. Several recent Win32 √ √ √
malware9 × ×
PE viruses are included in the test data set for analysis such as √
malware10 × × × ×
Lovedoor, My doom, Blaster, and Beagle. For each virus, we ap- √ √
malware11 × × ×
ply the obfuscation techniques described in [15] to create a set of √ √ √
malware12 × ×
polymorphic versions. Then we compare our system with current .. .. .. .. .. ..
most widely-used anti-virus software. The results shown in Table . . . . . .
√ √
4 demonstrate that our IMDS system achieves better accuracy than malware1000 × × ×
other software in polymorphic malware detection. Stat. 633 753 620 780 920
Ratio. 63.3% 75.3% 62% 78% 92%
6.2.2 Unknown Malware Detection
In order to examine the ability of identifying new and previously Table 3: Unknown malware detection
unknown malware of our IMDS system, in the test data set, we use
1000 malware analyzed by the experts in KingSoft Anti-virus labo-
ratory, while the signatures of the malware have not been recorded in the same environment. The number of false positives by using
into the virus signature database. Comparing with other anti-virus different scanners are also examined. By scanning 1500 benign ex-
software, our IMDS system performs most accurate detection. The ecutables whose signatures have not been recorded in the signature
results are listed in Table 5. database, we obtain the results shown in Figure 5. Figure 5 clearly
shows that the false positives by using our IMDS system are much
6.2.3 System Efficiency and False Positives fewer than other scanners.
In malware detection, a false positive occurs when the scanner
marks a benign file as a malicious one by error. False positives 6.3 Comparisons of Different Classification
can be costly nuisances due to the waste of time and resources to Methods
deal with those falsely reported files. In this set of experiments, In this set of experiments, we compare our system with Naive
in order to examine the system efficiency and the number of false Bayes, Support Vector Machine (SVM) and Decision Tree meth-
positives of the IMDS system, we sample 2000 executables in the ods.We randomly select 2843 executables from our data collection,
test data set, which contains 500 malicious executables and 1500 in which 1207 files are benign and 1636 executables are malicious.
benign ones. Then we convert the transactional sample data in our signature
First, we compare the efficiency of our system with different database into a relational table, in which each column corresponds
scanners including the scanner named “SAVE” [15, 20] described to an API and each row is an executable. This transformation makes
in related work and some widely-used anti-virus software. it easy to apply feature selection methods and other classification
The results in Figure 4 illustrate that our IMDS system achieves approaches.
much higher efficiency than other scanners when being executed First, we rank each API using Max-Relevance algorithm [11],

1046
Industrial and Government Track Short Paper

to improve the system effectiveness and efficiency; (3) We evaluate


our system on a large collection of executables including 12214 be-
nign samples and 17366 malicious ones; (4) We provide a compre-
hensive experimental study on various anti-virus software as well
as various data mining techniques for malware detection using our
data collection; (5) Our system has been already incorporated into
the scanning tool of KingSoft’s AntiVirus software.

Acknowledgments
The work of Tao Li is partially supported by NSF IIS-0546280. The
authors would also like to thank the members in the Anti-virus lab-
Figure 4: Efficiency of different scanners
oratory at KingSoft Corporation for their helpful discussions and
suggestions.

8. REFERENCES
[1] R. Agrawal and R. Srikant. Fast algorithms for association rule
mining. In Proceedings of VLDB-94, 1994.
[2] H. Cheng, X. Yan, J. Han, and C. Hsu. Discriminative frequent
pattern analysis for effective classification. In ICDE-07, 2007.
[3] M. Christodorescu and S. Jha. Static analysis of executables to detect
malicious patterns. In Proceedings of the 12th USENIX Security
Symposium, 2003.
[4] M. Fan and C. Li. Mining frequent patterns in an fp-tree without
conditional fp-tree generation. Journal of Computer Research and
Development, 40:1216–1222, 2003.
Figure 5: False Positives by using different scanners [5] J. Han and M. Kamber. Data mining: Concepts and techniques, 2nd
edition. Morgan Kaufmann, 2006.
[6] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate
and then choose top 500 API calls as the features for later clas- generation. In Proceedings of SIGMOD, pages 1–12, May 2000.
sification. In the experiments, we use the Naive Bayes classifier [7] C. Hsu and C. Lin. A comparison of methods for multiclass support
and J4.8 version of Decision Tree implemented in WEKA [19], vector machines. IEEE Trans. Neural Networks, 13:415–425, 2002.
and also the SVM implemented in LIBSVM package [7]. For the [8] J. Kephart and W. Arnold. Automatic extraction of computer virus
OOA Fast FP-Growth mining, we select thresholds based on two signatures. In Proceedings of 4th Virus Bulletin International
criteria: setting moc as close to 1 as possible; and selecting a big Conference, pages 178–184, 1994.
[9] J. Kolter and M. Maloof. Learning to detect malicious executables in
mos without exceeding the maximum support in the data set. Then,
the wild. In Proceedings of KDD’04, 2004.
in the experiment, we set mos to 0.294 and moc to 0.98. Ten-fold [10] B. Liu, W. Hsu, and Y. Ma. Integrating classification and association
cross validation is used to evaluate the accuracy of each classifier. rule mining. In Proceedings of KDD’98, 1998.
Results shown in Table 6 indicate our IMDS system achieve most [11] H. Peng, F. Long, and C. Ding. Feature selection based on mutual
accurate malware detection. information: Criteria of max-dependency, max-relevance, and
min-redundancy. IEEE Trans. Pattern Analysis and Machine
Algorithms TP TN FP FN DR ACY Intelligence, 27, 2005.
Naive Bayes 1340 1044 163 296 81.91% 83.86% [12] J. Rabek, R. Khazan, S. Lewandowski, and R. Cunningham.
SVM 1585 989 218 51 96.88% 90.54% Detection of injected, dynamically generated, and obfuscated
J4.8 1574 1027 609 62 96.21% 91.49% malicious code. In Proceedings of the 2003 ACM workshop on Rapid
malcode, pages 76–82, 2003.
IMDS 1590 1056 151 46 97.19% 93.07%
[13] M. Schultz, E. Eskin, and E. Zadok. Data mining methods for
detection of new malicious executables. In Proceedings of IEEE
Table 4: Results by using different classifiers. TP, TN, FP, International Conference on Data Mining, 2001.
FN, DR, and ACY refer to True Positive, True Negative, False [14] Y. Shen, Q. Yang, and Z. Zhang. Objective-oriented utility-based
Positive, False Negative, Detection Rate, and Accuracy, respec- association mining. In Proceedings of IEEE International
tively. Conference on Data Mining, 2002.
[15] A. Sung, J. Xu, P. Chavez, and S. Mukkamala. Static analyzer of
vicious executables (save). In Proceedings of the 20th Annual
From the comparison, we observe that our IMDS outperforms Computer Security Applications Conference, 2004.
other classification methods in both detection rate and accuracy. [16] J. Swets and R. Pickett. Evaluation of diagnostic system: Methods
This is because “frequent patterns are of statistical significance and from signal detection theory. Acdemic Press, 1982.
a classifier with frequent pattern analysis is generally effective to [17] P. Tan, M. Steinbach, and V. Kumar. Introduction to data mining.
test datasets” [2]. Addison Wesley, 2005.
[18] J. Wang, P. Deng, Y. Fan, L. Jaw, and Y. Liu. Virus detection using
data mining techniques. In Proceedings of IEEE International
7. CONCLUSIONS Conference on Data Mining, 2003.
In this paper, we describe our research effort on malware detec- [19] H. Witten and E. Frank. Data mining: Practical machine learning
tion based on window API sequence. In summary, our main contri- tools with Java implementations. Morgan Kaufmann, 2005.
butions are: (1) We develop an integrated IMDS system based on [20] J. Xu, A. Sung, P. Chavez, and S. Mukkamala. Polymorphic malicous
analysis of Windows API execution sequences. The system con- executable sanner by api sequence analysis. In Proceedings of the
sists of three components: PE parser, rule generator and classifier; International Conference on Hybrid Intelligent Systems, 2004.
(2) We adapt existing association based classification techniques

1047

You might also like