Network_Intrusion_Detection_System_Team1 | PPTX
Network Analytics: Intrusion Detection using Machine Learning
Intrusion Detection System (IDS)
• A combination of software and hardware that attempts to perform intrusion detection
• Raises an alarm when a possible intrusion or suspicious pattern is observed
[Figure: network diagram. Labels: The Internet, Attacker, Firewall, Internal Network, IDS, IDS]
Why do we need an IDS?
• Unknown weaknesses or bugs
• Complex, unforeseen attacks
• Firewalls, security policies
• Using information detected
• Recover compromised systems
• Understand the attack mechanism
• Detect novel attacks
• Defend our systems
Types of IDS
These are the main types of Intrusion Detection Systems:
• Host Based
• Network Based
• Stack Based
• Signature Based
• Anomaly Based
KDD Cup 99 Data Set
• A modification of the DARPA 1998 data set
• DARPA 1998 data set
  • Managed by Lincoln Lab (under DARPA sponsorship)
  • Simulated nine weeks of raw TCP dump data
• Attacks
  • 38 different attacks against Unix/Linux machines
  • DoS, scan, buffer overflow, and so on
• Normal traffic
  • Thousands of virtual hosts and hundreds of user automata
KDD Cup 99 Data Set
• Each connection ⇒ 41-dimensional vector
• Samples
5,tcp,smtp,SF,959,337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,144,192,0.70,0.02,0.01,0.01,0.00,0.00,0.00,0.00,normal
0,tcp,http,SF,54540,8314,0,0,0,2,0,1,1,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,118,118,1.00,0.00,0.01,0.00,0.00,0.00,0.02,0.02,back.
• Numerical: 34, Categorical: 7
• Basic feature: “duration”, “protocol”…
• Statistical feature: “number of connections to the same host as the current connection in the past two seconds”…
• Label ⇒ “normal” or the name of the attack
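As a rough illustration of the record layout described on this slide, the two sample connections can be split into 41 feature fields plus a label. This is a minimal sketch in Python, not the team's R code; the helper names `parse_record` and `to_binary` are hypothetical, and the two-class mapping is only one plausible reading of "normal" versus "name of attacks".

```python
import csv

# The two sample records from the slide, each on a single line.
SAMPLES = [
    "5,tcp,smtp,SF,959,337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,"
    "0.00,0.00,0.00,0.00,1.00,0.00,0.00,144,192,0.70,0.02,0.01,0.01,"
    "0.00,0.00,0.00,0.00,normal",
    "0,tcp,http,SF,54540,8314,0,0,0,2,0,1,1,0,0,0,0,0,0,0,0,0,2,2,"
    "0.00,0.00,0.00,0.00,1.00,0.00,0.00,118,118,1.00,0.00,0.01,0.00,"
    "0.00,0.00,0.02,0.02,back.",
]

N_FEATURES = 41  # stated on the slide


def parse_record(row):
    """Split one CSV row into (features, label); hypothetical helper."""
    if len(row) != N_FEATURES + 1:
        raise ValueError(f"expected {N_FEATURES + 1} fields, got {len(row)}")
    features = row[:N_FEATURES]
    # Some labels carry a trailing period ("back."); strip it.
    label = row[N_FEATURES].rstrip(".")
    return features, label


def to_binary(label):
    """Collapse every attack name into a single 'attack' class."""
    return "normal" if label == "normal" else "attack"


records = [parse_record(row) for row in csv.reader(SAMPLES)]
for features, label in records:
    print(len(features), label, to_binary(label))
```

Each record yields 41 string fields; converting them to numbers is left to the later pre-processing steps.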
FLOW:
• Pre-processing of data in R
• Pre-processing of data in Azure ML
• Filter-based Feature Selection
• Model Selection
• Tune Model Parameters
• Build system for selected model
• Deploy the selected model
• Build website for ML as a Service
Data pre-processing in R
• Assign column names to the dataset
• Transform the labels into two classes
Data pre-processing in Azure ML
• Store the training and testing data in Azure cloud storage
• Specify the categorical variables by editing the metadata
• Convert the categorical variables into dummy numerical variables
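Converting categorical variables into dummy numerical variables can be sketched as one-hot encoding. The following is a small pure-Python illustration, not the Azure ML module itself; the function names are hypothetical, the rows are a truncated four-column excerpt of the slide's samples, and the third row is invented for illustration.

```python
def fit_categories(rows, column):
    """Collect the sorted distinct values seen in one column."""
    return sorted({row[column] for row in rows})


def one_hot(value, categories):
    """Return a 0/1 indicator list with a 1 at the matching category."""
    return [1 if value == category else 0 for category in categories]


def encode_rows(rows, categorical_columns):
    """Replace each categorical column with dummy columns; cast the rest to float."""
    cats = {col: fit_categories(rows, col) for col in categorical_columns}
    encoded = []
    for row in rows:
        out = []
        for idx, value in enumerate(row):
            if idx in cats:
                out.extend(one_hot(value, cats[idx]))
            else:
                out.append(float(value))
        encoded.append(out)
    return encoded, cats


rows = [
    ["5", "tcp", "smtp", "959"],     # first four fields of sample 1
    ["0", "tcp", "http", "54540"],   # first four fields of sample 2
    ["0", "udp", "domain_u", "44"],  # hypothetical row, for illustration only
]
encoded, cats = encode_rows(rows, categorical_columns=(1, 2))
for vector in encoded:
    print(vector)
```

Each categorical column expands into as many 0/1 columns as it has distinct values, so the encoded vectors are wider than the original rows.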
Filter-based feature selection
• Total number of features = 41
• Selected number of features = 15
• Method used = Pearson Correlation
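A filter-based selection with Pearson correlation can be sketched as: score each feature by its correlation with the label, then keep the top k. The sketch below is an assumption about how such a filter works in general, not a description of the Azure ML module's internals; ranking by absolute correlation, the feature names, and the toy data are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(columns, target, k):
    """Rank features by |correlation with target| and keep the k best."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data: three hypothetical features and a binary label (1 = attack).
columns = {
    "f_a": [0, 0, 1, 1, 1, 0],
    "f_b": [3, 1, 4, 1, 5, 9],
    "f_const": [2, 2, 2, 2, 2, 2],
}
target = [0, 0, 1, 1, 1, 0]

selected, scores = select_top_k(columns, target, k=2)
print(selected)
```

On the real data the same idea would be applied with k = 15 across the 41 features.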
Model Selection
• We need both accuracy and good response time!
• Trained different models on 10% of the data and then evaluated each of them.

Model                      AUC
Logistic Regression        0.995634
Boosted Decision Tree      0.999093
Neural Network             0.996295
Support Vector Machines    0.994526
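For readers unfamiliar with the metric in the table, AUC can be read as the probability that a randomly chosen attack record receives a higher score than a randomly chosen normal record. The sketch below computes it by direct pairwise comparison; the labels and scores are made-up toy values, not results from the project.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = attack, 0 = normal; scores are hypothetical model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(auc(labels, scores))
```

The pairwise loop is quadratic in the number of records, so it is only suitable for small illustrations; a rank-based formulation would be used on the full data set.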
Tune Model Hyperparameters
• The model's hyperparameters are the settings and values you choose when configuring the model; tuning tests different values with the aim of finding the best combination.
• Tuning produces an accuracy report describing the different models that were created and their parameters, plus a trained model that you can save for re-use.
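The tuning idea can be sketched as a plain grid search: try every combination of candidate values, record a score for each, and keep the best. This is a generic illustration rather than the Azure ML tuning module; the parameter names `learning_rate` and `num_trees` and the `toy_evaluate` function are hypothetical stand-ins for training a model and measuring it on validation data.

```python
import itertools


def grid_search(param_grid, evaluate):
    """Evaluate every parameter combination and return the best one."""
    names = sorted(param_grid)
    results = []  # plays the role of the report: (score, params) per model
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((evaluate(params), params))
    best_score, best_params = max(results, key=lambda item: item[0])
    return best_params, best_score, results


def toy_evaluate(params):
    """Hypothetical stand-in for 'train a model and return its validation score'."""
    return (
        1.0
        - abs(params["learning_rate"] - 0.1)
        - abs(params["num_trees"] - 100) / 1000
    )


param_grid = {
    "learning_rate": [0.05, 0.1, 0.2],
    "num_trees": [50, 100, 200],
}
best_params, best_score, results = grid_search(param_grid, toy_evaluate)
print(best_params, best_score)
```

In practice `evaluate` would train the chosen model with the given parameters and return a metric such as AUC.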
Build System for Selected Model
• Boosted Decision Tree: chosen for its high accuracy and good response time
• Train the model on 100% of the training data
• Build and deploy the model as a web service
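Once the model is deployed as a web service, a client sends a connection record and reads back the prediction. The sketch below only shows the general shape of such a call using the Python standard library; the URL, the API key, the `{"features": [...]}` payload, and the bearer-token header are all assumptions for illustration and do not describe the actual request format of the deployed service.

```python
import json
import urllib.request

SCORING_URL = "https://example.invalid/score"  # hypothetical endpoint
API_KEY = "replace-me"                         # hypothetical credential


def build_scoring_request(features, url=SCORING_URL, api_key=API_KEY):
    """Build (but do not send) a JSON POST request carrying one record."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def score(features, **kwargs):
    """Send the request and decode the JSON reply (needs a real endpoint)."""
    request = build_scoring_request(features, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request is safe offline; calling score() needs a live service.
request = build_scoring_request([0, "tcp", "http", "SF", 54540, 8314])
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the payload easy to inspect before anything goes over the network.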
Machine Learning as a Service
• Frontend: HTML5, CSS3, Bootstrap, jQuery
• Backend: Python Flask
• DEMO!
Thank you!!
