ISSN : 0975-4520
A USER-CENTRIC MACHINE LEARNING FRAMEWORK FOR CYBER
SECURITY OPERATIONS CENTER
A. ANJAIAH¹, BUDIME SAI KUSULU², GIRMANNAGARI ANURAGH RAO³, MAHESH CHANDRA⁴
¹ Associate Professor, Department of CSE, ST.PETER'S ENGINEERING COLLEGE, Hyderabad, Telangana, India.
², ³, ⁴ UG Scholars, Department of CSE, ST.PETER'S ENGINEERING COLLEGE, Hyderabad, Telangana, India.
ABSTRACT:
To ensure an enterprise's cyber security, a SIEM (Security Information and Event Management) system is typically in place to consolidate the output of the various preventive technologies and flag alerts for security events. Analysts in the Security Operations Center (SOC) investigate these alerts to determine whether each one is a true positive. However, the majority of alerts are false positives, and their volume exceeds the SOC's capacity to handle them all. Because of this, malicious attacks and compromised hosts may be missed. Machine learning is a possible approach to reducing the false positive rate and improving the productivity of SOC analysts. In this article, we develop a user-centric machine learning framework for the cyber security operations center in a real organizational context. We discuss the typical data sources in a SOC, their work flow, and how to process this data to create an effective machine learning system. This article is aimed at two groups of readers. The first group is data scientists and machine learning researchers who have no knowledge of the cyber security field but need to develop machine learning systems for security operations. The second group is cyber security practitioners who have deep knowledge and expertise in cyber security but no machine learning experience, and who would like to build such a system themselves. At the end of the paper, we use the system built in the SOC production environment of Seyondike as an example to demonstrate the full sequence of steps: data collection, label creation, feature engineering, machine learning algorithm selection, and model performance evaluation.
Key words: GBM, boosting algorithm, Heart performance.
Volume XII, Issue II, June 2021 http://ijte.uk/ 72

I INTRODUCTION

Cyber security incidents cause significant financial and reputational damage to enterprises. To detect malicious activities, companies and government bodies build a SIEM (Security Information and Event Management) system. The system correlates event logs from endpoints, firewalls, IDS/IPS (Intrusion Detection/Prevention System), DLP (Data Loss Protection), DNS (Domain Name System), DHCP (Dynamic Host Configuration Protocol), Windows/Unix security events, VPN logs, etc. The security events can be grouped into different categories [1], and the logs amount to terabytes of data each day.

From the security event logs, the SOC (Security Operation Center) team develops so-called use cases, each with a pre-determined severity based on the analysts' experience. Use cases are typically rule based, correlating one or more indicators from different logs. The rules can be network/host based or time/frequency based. If any pre-defined use case is triggered, the SIEM system generates an alert in real time. SOC analysts then investigate the alerts to decide whether the user related to the alert is risky (a true positive) or not (a false positive). If the analysis shows the alerts to be suspicious, SOC analysts create OTRS (Open Source Ticket Request System) tickets. After initial investigation, certain OTRS tickets are escalated to a tier 2 investigation system (e.g., Co3 System) as severe security incidents for further investigation and remediation by the Incident Response Team.

However, a SIEM typically generates a large number of alerts with a very high false positive rate. The number of alerts per day can reach hundreds of thousands, far more than the SOC has the capacity to investigate. Because of this, the SOC may choose to investigate only the alerts with high severity, or to suppress alerts of the same type, which could cause some severe attacks to be missed. Consequently, a more intelligent and automatic system is required to identify risky users.

The machine learning system sits in the middle of the SOC work flow, incorporates different event logs, SIEM alerts and SOC analysis results, and generates a comprehensive user risk score for the security operation center. Instead of digging directly into a large volume of SIEM alerts and trying to find a needle in a haystack, SOC analysts can use the risk scores from the machine learning system to prioritize their investigations, starting with the users at highest risk. This will greatly improve their efficiency, optimize their job queue management, and ultimately enhance enterprise security. Specifically,
our approach constructs a framework for a user-centric machine learning system that evaluates user risk based on alert information. This approach gives security analysts a comprehensive risk score for each user, so that they can focus on the users with high risk scores. To the best of our knowledge, there is no previous research on building a complete, systematic solution for this application. The main contributions of this paper are as follows:

• An advanced user-centric machine learning system is proposed and evaluated on real industry data to assess user risk. The system can effectively reduce the resources needed to analyze alerts manually while at the same time enhancing enterprise security.
• A novel data engineering process is offered which integrates alert information, security logs, and SOC analysts' investigation notes to generate features and propagate labels for machine learning models.

II EXISTING SYSTEM

Most approaches to security in the enterprise have focused on protecting the network infrastructure, with little or no attention to end users. As a result, traditional security functions and associated devices, such as firewalls and intrusion detection and prevention devices, deal mainly with network-level protection. Although still part of the overall security story, such an approach has limitations in light of the new security challenges described in the previous section.

Data Analysis for Network Cyber-Security focuses on monitoring and analyzing network traffic data, with the intention of preventing, or quickly identifying, malicious activity. Risk values were introduced in an information security management system (ISMS), and a quantitative evaluation was conducted for detailed risk assessment. The quantitative evaluation showed that the proposed countermeasures could reduce risk to some extent. Investigation into the cost-effectiveness of the proposed countermeasures is important future work. It provides users with attack information such as the type of
attack, frequency, and target host ID and source host ID. Ten et al. proposed a cyber-security framework for the SCADA system as a critical infrastructure, using real-time monitoring, anomaly detection, and impact analysis with an attack tree-based methodology, and mitigation strategies.

DISADVANTAGES:

1. Firewalls can be difficult to configure correctly.
2. Incorrectly configured firewalls may block users from performing actions on the Internet until the firewall is configured correctly.
3. Makes the system slower than before.
4. Need to keep updating the new software in order to keep security up to date.
5. Could be costly for the average user.
6. The user is the only constant.

III PROPOSED SYSTEM

User-centric cyber security helps enterprises reduce the risk associated with fast-evolving end-user realities by reinforcing security closer to end users. User-centric cyber security is not the same as user security. User-centric cyber security is about answering people's needs in ways that preserve the integrity of the enterprise network and its assets. User security can almost seem like a matter of protecting the network from the user: securing it against the vulnerabilities that user needs introduce. User-centric security has the greater value for enterprises. Cyber-security systems are real-time, robust, independent systems with high performance requirements. They are used in many application domains, including critical infrastructures such as the national power grid, transportation, medical, and defense.
These applications require the attainment of stability, performance, reliability, efficiency, and robustness, which in turn requires tight integration of computing, communication, and control technologies. Critical infrastructures have always been a target of criminals and are affected by security threats because of their complexity and cyber-security connectivity. These cyber-physical systems (CPSs) face security breaches when people, processes, technology, or other components are attacked, or when risk management systems are missing, inadequate, or fail in any way. The attackers target confidential data. The main scope of this project is to reduce the unwanted data in the dataset.

ADVANTAGES:

1) Protects the system against viruses, worms, spyware and other unwanted programs.
2) Protects data against theft.
3) Protects the computer from being hacked.
4) Minimizes computer freezing and crashes.
5) Gives privacy to users.
6) Secures the user-aware network edge.
7) Secures mobile users' communications.
8) Manages user-centric security.

IV METHODOLOGY:

CYBER ANALYSIS

Cyber threat analysis is a process in which knowledge of the internal and external information vulnerabilities pertinent to a particular organization is matched against real-world cyber-attacks. With respect to cyber security, this threat-oriented approach to combating cyber-attacks represents a smooth transition from a state of reactive security to a state of proactive security. Moreover, the desired result of a threat assessment is a set of best practices on how to maximize the protective instruments with respect to availability, confidentiality and integrity, without compromising
usability and functionality.

A threat could be anything that leads to the interruption, meddling with, or destruction of any valuable service or item in the firm's repertoire. Whether of "human" or "nonhuman" origin, the analysis must scrutinize each element that may bring about a conceivable security risk.

DATASET MODIFICATION

If a dataset in your dashboard contains many dataset objects, you can hide specific dataset objects from display in the Datasets panel. For example, if you import a large amount of data from a file but do not remove every unwanted data column before importing the data into Web, you can hide the unwanted attributes and metrics. The related operations are:

1) To hide dataset objects in the Datasets panel
2) To show hidden objects in the Datasets panel
3) To rename a dataset object
4) To create a metric based on an attribute
5) To create an attribute based on a metric
6) To define the geo role for an attribute
7) To create an attribute with additional time information
8) To replace a dataset object in the dashboard

DATA REDUCTION

Storage efficiency can be improved through data reduction techniques and capacity optimization using data deduplication, compression, snapshots and thin provisioning. Simply deleting unwanted or unneeded data is the most effective way to reduce the amount of data held in storage.

RISKY USER DETECTION

False-alarm immunity to prevent customer embarrassment; a high detection rate to protect all kinds of goods from theft; wide-exit coverage that offers greater flexibility for entrance/exit layouts; a wide range of attractive designs to complement any store décor; and sophisticated digital controller technology for optimum system performance.
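
In the SOC setting described in the introduction, risky user detection means turning per-alert records into per-user features, attaching labels from analyst investigations, and ranking users by risk. The sketch below illustrates that flow under stated assumptions: the `Alert` schema, the feature names, the ticket-based labeling rule and the hand-picked weights are all hypothetical and are not taken from the paper. A trained model would replace the toy `risk_score` function.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """One SIEM alert (hypothetical schema, not taken from the paper)."""
    user: str
    use_case: str  # name of the rule that fired
    severity: int  # pre-determined severity of the use case, 1 (low) to 5 (high)


def user_features(alerts):
    """Aggregate per-alert records into one feature dict per user."""
    grouped = defaultdict(list)
    for alert in alerts:
        grouped[alert.user].append(alert)
    features = {}
    for user, items in grouped.items():
        features[user] = {
            "alert_count": len(items),
            "distinct_use_cases": len({a.use_case for a in items}),
            "max_severity": max(a.severity for a in items),
        }
    return features


def propagate_labels(features, ticketed_users):
    """Assumed rule: label a user 1 (risky) if an analyst ticket names them, else 0."""
    return {user: int(user in ticketed_users) for user in features}


def risk_score(f):
    """Toy hand-weighted score (assumed weights); a trained model would replace it."""
    return 1.0 * f["alert_count"] + 2.0 * f["distinct_use_cases"] + 3.0 * f["max_severity"]


def rank_users(features):
    """Return users ordered from highest to lowest risk score."""
    return sorted(features, key=lambda u: risk_score(features[u]), reverse=True)


# Made-up alerts for illustration only.
alerts = [
    Alert("alice", "dns_tunnel", 4),
    Alert("alice", "vpn_geo_anomaly", 3),
    Alert("bob", "failed_logins", 2),
    Alert("alice", "dns_tunnel", 4),
    Alert("carol", "dlp_upload", 5),
]
features = user_features(alerts)
labels = propagate_labels(features, ticketed_users={"alice"})
ranking = rank_users(features)
print(ranking)
```

With these made-up alerts the ranking places alice first, then carol, then bob, so an analyst working from the top of the list would start with the user who triggered the most alerts across the most use cases. The labels produced by `propagate_labels` are what a supervised model would be trained on in place of the fixed weights.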
ALGORITHM:

SUPPORT VECTOR MACHINE (SVM)

"Support Vector Machine" (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems, although it is mostly used for classification. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. We then perform classification by finding the hyperplane that best separates the two classes. In practice, the SVM algorithm is implemented using a kernel. The learning of the hyperplane in a linear SVM is done by transforming the problem using some linear algebra, which is outside the scope of this introduction to SVM. A powerful insight is that the linear SVM can be rephrased using the inner product of any two given observations, rather than the observations themselves. The inner product of two vectors is the sum of the products of each pair of input values. For example, the inner product of the vectors [2, 3] and [5, 6] is 2*5 + 3*6 = 28. The prediction for a new input is computed from the inner product between the input (x) and each support vector (xi) as follows:

f(x) = B0 + sum(ai * (x, xi))

Here (x, xi) denotes the inner product of x and xi, and the sum runs over all support vectors in the training data. The coefficients B0 and ai (one for each training input) must be estimated from the training data by the learning algorithm.

ARCHITECTURE
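
Assuming the architecture scores users with the SVM described under ALGORITHM, the prediction rule f(x) = B0 + sum(ai * (x, xi)) can be sketched in a few lines of Python. The support vectors, coefficients and intercept below are made-up values chosen only so the arithmetic is easy to follow. In a real system they would be estimated from training data, and nothing here is taken from the authors' implementation. The mapping of the sign of f(x) to "risky" or "not risky" is likewise an assumption.

```python
def inner_product(u, v):
    """Sum of the products of each pair of input values."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def svm_decision(x, support_vectors, coefficients, b0):
    """Evaluate f(x) = B0 + sum(ai * (x, xi)) for a linear SVM."""
    return b0 + sum(
        a_i * inner_product(x, x_i)
        for a_i, x_i in zip(coefficients, support_vectors)
    )


def predict(x, support_vectors, coefficients, b0):
    """Classify by the sign of f(x): +1 (risky) or -1 (not risky); labels are assumed."""
    return 1 if svm_decision(x, support_vectors, coefficients, b0) >= 0 else -1


# Worked example from the text: the inner product of [2, 3] and [5, 6].
example = inner_product([2, 3], [5, 6])

# Hypothetical model: two support vectors with signed coefficients ai.
support_vectors = [[2.0, 3.0], [1.0, 1.0]]
coefficients = [0.5, -1.0]
b0 = -10.0

score = svm_decision([5.0, 6.0], support_vectors, coefficients, b0)
label = predict([5.0, 6.0], support_vectors, coefficients, b0)
print(example, score, label)
```

Here `example` reproduces the value 28 from the text. For the hypothetical model, f(x) = -10 + 0.5*28 - 1.0*11 = -7, so the input falls on the negative side of the hyperplane.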
Fig. 4.1. Home page.

V CONCLUSION

We present a user-centric machine learning system that draws on large volumes of data from various security logs, alert information, and analyst insight. This method provides a complete framework and solution for risky user detection for the enterprise Security Operations Center. We describe how to select machine learning methods in the SOC production environment, evaluate their performance, and correlate IP, host and user data to create user-centric features. Even with simple machine learning algorithms, we show that the learning system can extract more insight from data with highly unbalanced and limited labels. On the top 20% of predictions, the neural network model is more than 5 times better than the current rule-based system. The system covers data acquisition, daily model renewal and real-time scoring, and it enhances organizational risk detection and management. As for future work, we will examine other learning methods to improve detection accuracy.

REFERENCES

[1] SANS Technology Institute. "The 6 Categories of Critical Log Information", 2013.
[2] X. Li and B. Liu. "Learning to classify text using positive and unlabeled data", Proceedings of the 18th International Joint Conference on Artificial Intelligence, 2003.
[3] A. L. Buczak and E. Guven. "A survey of data mining and machine learning methods for cyber security intrusion detection", IEEE Communications Surveys & Tutorials 18.2 (2015): 1153-1176.
[4] S. Choudhury and A. Bhowal. "Comparative analysis of machine learning algorithms along with classifiers for network intrusion detection", Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), 2015.
[5] Prasadu Peddi (2019), "AN EFFICIENT ANALYSIS OF STOCKS DATA USING MapReduce", ISSN: 1320-0682, Vol 6, issue 1, pp: 22-34.
[6] K. Goeschel. "Reducing false positives in intrusion detection systems using data-mining techniques utilizing support vector machines, decision trees, and naive Bayes for off-line analysis", SoutheastCon, 2016.
[7] M. J. Kang and J. W. Kang. "A novel intrusion detection method using deep neural network for in-vehicle network security", Vehicular Technology Conference, 2016.
[8] Prasadu Peddi (2016), "Comparative study on cloud optimized resource and prediction using machine learning algorithm", ISSN: 2455-6300, volume 1, issue 3, pp: 88-94.