Using Machine Learning in Network Intrusion Detection Systems
OMAR SHAYA
Georg-August-Universität Göttingen
Sections
✤ Introduction
✤ Intrusion Detection Methodologies
✤ A Machine Learning Based IDS (Intrusion Detection System)
✤ Challenges of Using Machine Learning in Intrusion Detection
✤ Summary
✤ References
✤ Appendix
INTRODUCTION
IDS: Intrusion Detection System
Increasing attacks on computer networks and the need
for automated detection
• The Internet and computer systems have raised numerous security and privacy issues
• Network use has grown explosively for many reasons, e.g. the internet, wireless networks, and cloud computing
• Thus, malicious attacks on networks have increased year over year
• There is a need for automated systems that detect these attacks
• Detection is typically based on known attacks
• But what about attacks that have not been seen before?
• Machine learning?
INTRODUCTION
Definition: intrusion & intrusion detection
INTRODUCTION
“Intrusion is an attempt to compromise CIA (Confidentiality, Integrity, Availability), or to bypass the security mechanisms of a computer or network”
“Intrusion detection is the process of monitoring
the events occurring in a computer system or
network, and analyzing them for signs of intrusion”
INTRUSION DETECTION METHODOLOGIES
IDS: Intrusion Detection System
There are three main detection methodologies
• Signature-based Detection (SD)
• A signature is a string or pattern that corresponds to a known attack or threat
• SD compares these patterns against captured events to recognize possible intrusions (a minimal sketch follows this slide)
• Uses knowledge accumulated about specific attacks and system vulnerabilities
• Also known as Knowledge-based Detection or Misuse Detection
• Anomaly-based Detection (AD)
• An anomaly is a deviation from “normal” behavior
• Profiles of normal behavior are derived from monitoring network traffic
• AD compares normal profiles with observed events to recognize attacks
• Stateful Protocol Analysis (SPA)
• SPA depends on vendor-developed generic profiles for specific protocols
• The protocols are based on standards from international standards organizations
• Hybrid IDS use multiple methodologies
• SD and AD are complementary methods: the former targets known attacks and the latter focuses on unknown attacks
INTRUSION DETECTION METHODOLOGIES
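As a minimal illustration of the SD idea, the sketch below matches captured payloads against a small set of known signatures. The signature names, patterns, and the example payload are hypothetical and for illustration only, not rules from any real IDS.

```python
# Hypothetical signature-matching sketch (not rules from any real IDS):
# each signature is a pattern that corresponds to a known attack, and a
# captured event is reported if any pattern matches it.
import re

SIGNATURES = {
    "sql_injection": re.compile(r"union\s+select", re.IGNORECASE),
    "path_traversal": re.compile(r"\.\./\.\./"),
}

def match_signatures(payload: str) -> list[str]:
    """Return the names of all known signatures found in a captured payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]

# Example captured HTTP request line (made up)
print(match_signatures("GET /index.php?id=1 UNION SELECT password FROM users"))
# -> ['sql_injection']
```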
There are three main detection methodologies
• Hybrid IDS use multiple methodologies
• E.g. SD and AD are complementary methods
• SD targets known attacks, while AD focuses on unknown attacks
INTRUSION DETECTION METHODOLOGIES
Signature-based Detection (SD)*
• A signature is a string or pattern that corresponds to a known attack or threat
• SD compares these patterns against captured events to recognize possible intrusions
• Uses knowledge accumulated about specific attacks and system vulnerabilities
Anomaly-based Detection (AD)
• An anomaly is a deviation from “normal” behavior
• AD compares normal profiles with observed events to recognize attacks
• Profiles of normal behavior are derived from monitoring network traffic
Stateful Protocol Analysis (SPA)
• SPA depends on vendor-developed generic profiles for specific protocols
• “Stateful” indicates that the IDS can know and trace protocol states (e.g., pairing requests with replies)
• The protocols are based on standards from international standards organizations
* Also known as Knowledge-based Detection or Misuse Detection
Pros and cons of Intrusion Detection Methods
INTRUSION DETECTION METHODOLOGIES
Table 1: Pros and Cons of intrusion detection methodologies. Source [2]
PROS
• SD: Simplest and effective method to detect known attacks; detailed contextual analysis
• AD: Effective at detecting new and unforeseen vulnerabilities; less dependent on the OS; facilitates detection of privilege abuse
• SPA: Knows and traces protocol states; distinguishes unexpected sequences of commands
CONS
• SD: Ineffective against unknown attacks and variants of known attacks; little understanding of states and protocols; hard to keep signatures/patterns up to date; time consuming to maintain the knowledge
• AD: Weak profile accuracy, since profiles are built only from observed events; unavailable while behavior profiles are being rebuilt; difficult to trigger alerts at the right time
• SPA: Resource consuming protocol state tracing and examination; unable to inspect attacks that look like benign protocol behaviors; might be incompatible with dedicated OSs or APs
A MACHINE LEARNING BASED IDS
IDS: Intrusion Detection System
Machine learning in anomaly detection
• Anomaly-based Detection (AD)
• AD is easy when it is possible to characterize what is normal in the data using a simple mathematical model, e.g. a normal distribution
• Most interesting real-world systems have complex behavior that does not follow such a distribution
• Machine learning is useful for learning the characteristics of the system from observed data
• Feature Selection is the process of selecting a subset of relevant
features (variables, predictors) for use in model construction. Feature
selection techniques are used for three reasons:
• Simplification of models to make them easier to interpret
• Shorter training times
• Enhanced generalization by reducing overfitting
• Outlier Detection: an outlier is an observation point that is distant from
other observations
A MACHINE LEARNING BASED IDS
Robust Feature Selection and Robust PCA for Internet
Traffic Anomaly Detection
• Couples a feature selection algorithm with an outlier detection method
• Uses robust statistics tools in both procedures
• Reliable results even in the presence of outliers
• Feature selection based on a robust mutual information estimator (a simplified sketch follows this slide)
• MI (Mutual Information): an information-theoretic metric that captures both linear and non-linear dependencies
• Outlier detection based on robust PCA (Principal Component Analysis)
• A mathematical procedure used to reduce the dimensionality of a problem
A MACHINE LEARNING BASED IDS
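As a rough illustration of MI-based feature selection, the sketch below ranks synthetic features with scikit-learn. scikit-learn only provides the standard (non-robust) MI estimator, so this is a stand-in for the robust estimator used in [1]; the data and the choice of k are made up.

```python
# Illustrative MI-based feature selection on synthetic data. scikit-learn's
# mutual_info_classif is the standard (non-robust) MI estimator, used here
# only as a stand-in for the robust estimator of [1].
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))            # 500 traffic samples, 20 candidate features
y = (X[:, 3] + X[:, 7] > 0).astype(int)   # labels depend only on features 3 and 7

selector = SelectKBest(score_func=mutual_info_classif, k=5)
X_reduced = selector.fit_transform(X, y)  # keep the 5 highest-MI features
print("selected feature indices:", np.flatnonzero(selector.get_support()))
```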
Robust Feature Selection and Robust PCA for Internet
Traffic Anomaly Detection
• Feature selection
• Important preprocessing step (filter)
• Reduces dimensionality for high-dimensional data
• Removes irrelevant data
• Increases learning accuracy
• Gives significant performance gains
A MACHINE LEARNING BASED IDS
Robust Feature Selection and Robust PCA for Internet
Traffic Anomaly Detection
A MACHINE LEARNING BASED IDS
• Robust statistics
• Reliable results even in the
presence of outliers
Example (a minimal numeric sketch follows this slide):
• In a normal distribution, the inner 95% of values lie within “center ± 1.96 × spread”
• Center: instead of the mean, take the median
• Spread: instead of the SD (standard deviation), take the MAD (median absolute deviation)
Source [1]
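A minimal numeric sketch of the robust center/spread idea, assuming the usual 1.4826 factor that makes the MAD comparable to the standard deviation under normality; the data and the 1.96 cut-off are illustrative only.

```python
# Robust location/scale sketch: median instead of mean, MAD instead of SD.
# The 1.4826 factor makes the MAD consistent with the SD for normal data;
# the 1.96 cut-off mirrors the "inner 95%" rule from the slide.
import numpy as np

def robust_outliers(x, z=1.96):
    x = np.asarray(x, dtype=float)
    center = np.median(x)                              # robust center
    spread = 1.4826 * np.median(np.abs(x - center))    # robust spread (scaled MAD)
    return np.abs(x - center) > z * spread             # True where x is flagged

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 100), [15.0, 20.0]])  # two gross outliers
print(np.flatnonzero(robust_outliers(data)))  # flags the outliers (plus ~5% of regular points)
```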
Dataset creation for training and testing (1/2)
• Dataset collected by mirroring the traffic passing through the switch of:
• A private laboratory network with 17 inter-connected PCs
• 10 for users producing licit traffic
• 1 for a server, 1 for measurements
• 5 for attacks
• Licit traffic
• File sharing (BitTorrent)
• Video streaming (IPTV over TCP)
• Web browsing (HTTP)
• Attacks
• Botnets
• Port-scans: identify other targets vulnerable to infection
• Snapshots: a type of identity theft for stealing personal information
• Other botnet attacks, e.g. spyware, malware, denial of service, and email spam, are not used because they:
• Happen uniquely at the host level
• Can be detected by e.g. anti-virus software, monitoring at routers/firewalls, and email scanning
A MACHINE LEARNING BASED IDS
Dataset creation for training and testing (2/2)
• Customer usage profiles
• (a) Soft browsing (HTTP only)
• (b) File sharing machine (BitTorrent only)
• (c) File sharing user (BitTorrent and HTTP)
• (d) Heavy user (HTTP, BitTorrent, and
Streaming)
• Network scenarios
• (B) Business user
• 100% (a)
• (R) Residential user
• 30% (b), 40% (c), 30% (d)
• Attack intensities
• (1) 6% (5% snapshot, 1% port-scan)
• (2) 20% (15% snapshot, 5% port-scan)
• (3) 35% (30% snapshot, 5% port-scan)
A MACHINE LEARNING BASED IDS
Table 2. Source [1]
Results (1/3)
A MACHINE LEARNING BASED IDS
• 6 types of anomaly detectors, denoted A-B
• A: feature selection method, B: outlier detection method
• R (robust)
• NR (non-robust)
• ∅ (no method)
• Performance measures (computed as in the sketch after this slide)
• Nr Ftrs: number of selected features
• Recall: probability that an observation is classified as an anomaly when in fact it is an anomaly
• False positive rate (FPR): probability that an observation is classified as an anomaly when in fact it is a regular observation
• Precision: probability of having an anomalous observation given that it is classified as an anomaly
Table 3. Source [1]
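For illustration, the sketch below computes these three measures from made-up ground-truth and predicted labels; it is not taken from [1].

```python
# Toy computation of the three measures from made-up labels
# (1 = anomaly, 0 = regular observation).
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 0, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))

recall = tp / (tp + fn)      # P(classified as anomaly | anomaly)
fpr = fp / (fp + tn)         # P(classified as anomaly | regular)
precision = tp / (tp + fp)   # P(anomaly | classified as anomaly)
print(f"recall={recall:.2f}, FPR={fpr:.2f}, precision={precision:.2f}")
```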
Results (2/3)
• The R-R detector achieved the best results
• Recall is always 1
• In B1, B2, B3, and R3 performance is maximal
• FPR and Precision are close to their optimal values
• The improvement over the non-robust version is large
• Low recall means a large percentage of anomalies are not correctly identified
• In B2, B3, and R3, recall improved from 0.167, 0.273, and 0.125 (respectively) to 1
• Feature selection
• Feature selection reduces Nr Ftrs and improves performance
• In B3 and R3, no feature selection is sometimes better than non-robust feature selection
A MACHINE LEARNING BASED IDS
Table 3. Source [1]
Results (3/3)
A MACHINE LEARNING BASED IDS
• Compare R-NR (top) and R-R (bottom)
• Any point with a score or distance larger than a threshold (the lines) is considered an anomaly (a simplified PCA scoring sketch follows this slide)
• In the R-NR case there is confusion around snapshots
• Thus the poor recall value of 0.125
• The proximity in behavior between snapshots and some HTTP and BitTorrent traffic fools the non-robust outlier detector
• All consist of small file uploads
Source [1]
Fig. 2.
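As a simplified stand-in for the robust PCA detector of [1], the sketch below uses classical PCA (which is itself sensitive to outliers) and flags observations whose score distance exceeds a chi-square cut-off; the synthetic data, number of components, and cut-off level are assumptions for illustration.

```python
# Simplified PCA outlier scoring on synthetic data: classical PCA stands in
# for the robust PCA of [1], and points with a large score distance
# (standardized distance within the retained components) are flagged.
import numpy as np
from scipy.stats import chi2
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))   # regular traffic features (synthetic)
X[:5] += 8.0                     # five anomalous observations

pca = PCA(n_components=3).fit(X)
scores = pca.transform(X)
score_dist = np.sqrt(np.sum(scores**2 / pca.explained_variance_, axis=1))

cutoff = np.sqrt(chi2.ppf(0.975, df=3))   # common chi-square cut-off for score distances
print("flagged indices:", np.flatnonzero(score_dist > cutoff))  # should include 0-4
```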
Discussion
• There are clear advantages to using a feature selection step and to using robust statistics for both feature selection and outlier detection
• The system achieves very high performance
• The system’s anomaly detector adapts to different traffic conditions (licit traffic differs significantly in the two scenarios)
• However, the dataset was obtained from a private lab with 17 PCs and is not necessarily representative of a real-world scenario
• The system’s effectiveness still needs to be demonstrated on a larger-scale network traffic dataset
A MACHINE LEARNING BASED IDS
CHALLENGES OF USING MACHINE
LEARNING IN INTRUSION DETECTION
Outliers, cost of error, semantics, and evaluation
• Outlier detection
• It is hard to define normal in network traffic, as usage varies in every session and with new applications (diversity of network traffic)
• High cost of errors
• The cost of misclassification is extremely high
• False positives: waste expensive analyst time
• False negatives: can cause serious damage to an organization
• Errors in other ML applications, e.g. product recommendations, OCR, and spam detection, are far less expensive
• Semantic gap
• Currently only the capability to identify deviations from the normal profile is assessed (a deviation could be good or bad)
• Results need to be interpreted from the operator’s point of view: what does an alert actually mean?
• Difficulties with evaluation
• Designing a sound evaluation scheme can be more difficult than building the detector itself
• Lack of public data sets for assessing anomaly detection
• Real data sets are hard to obtain for many reasons, e.g. the risk of leaking personal data
• Simulated data is not accurate
CHALLENGES OF USING MACHINE LEARNING IN INTRUSION DETECTION
SUMMARY
Summary
• Introduction
• The need for automated Intrusion Detection Systems
• Definition of Intrusion and Intrusion Detection
• Intrusion Detection Methodologies
• Signature-based Detection (SD)
• Anomaly-based Detection (AD)
• Stateful Protocol Analysis (SPA)
• Machine Learning Based IDS
• Using feature selection and robust statistics
• Dataset creation
• Results and evaluation
• Discussion
• Challenges of Using Machine Learning in ID
• Outlier detection, high cost of error, semantic gap, and difficulties with evaluation
SUMMARY
OMAR SHAYA –––––––– omar.shaya@stud.uni-goettingen.de
Thanks!
References
[1] C. Pascoal, M. Oliveira, R. Valadas, P. Filzmoser, P. Salvador and A. Pacheco. Robust Feature Selection and Robust PCA for Internet Traffic Anomaly Detection. In Proceedings of IEEE INFOCOM, pages 1755-1763, 2012
[2] H. Liao, C. Lin, Y. Lin and K. Tung. Intrusion Detection System: A Comprehensive Review. In Journal of Network and Computer Applications, pages 16-24, 2013
[3] R. Sommer and V. Paxson. Outside the Closed World: On Using Machine Learning for Network Intrusion Detection. In IEEE Symposium on Security and Privacy, pages 305-316, 2010
[4] Feature Selection. https://en.wikipedia.org/wiki/Feature_selection (accessed 6 August 2015)
[5] Outlier. https://en.wikipedia.org/wiki/Outlier (accessed 6 August 2015)
[6] Anomaly Detection – Using Machine Learning to Detect Abnormalities in Time Series Data. http://blogs.technet.com/b/machinelearning/archive/2014/11/05/anomaly-detection-using-machine-learning-to-detect-abnormalities-in-time-series-data.aspx (accessed 6 August 2015)
REFERENCES
Precision and Recall
APPENDIX
Source: Dr. Stephan Sigg’s slides from Machine Learning and Pervasive Computing course SoSe 2015
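For reference, the standard formulas behind these measures, written in terms of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN):
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
• False positive rate (FPR) = FP / (FP + TN)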