
A

Real-time Research Project/Field-Based Research


Project Report on

BEHAVIORAL BIOMETRICS DECIPHERED:

UNVEILING IDENTITY WITH MACHINE LEARNING
TECHNIQUES
Submitted for partial fulfilment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY

in

INFORMATION
TECHNOLOGY

by
N. VAISHNAVI 22K81A1247
J. SHRUTHI 22K81A1227
S. SAI PRANEETH 22K81A1255
K. AJAY KUMAR 22K81A1232
Under the Guidance of
Dr. A.BHEEM RAJ

PROFESSOR

DEPARTMENT OF INFORMATION TECHNOLOGY

St. MARTIN'S ENGINEERING COLLEGE


UGC Autonomous
Affiliated to JNTUH, Approved by AICTE,
Accredited by NBA & NAAC A+, ISO 9001:2008 Certified
Dhulapally, Secunderabad – 500100
www.smec.ac.in
JULY - 2024
St. MARTIN'S ENGINEERING COLLEGE
UGC Autonomous
Affiliated to JNTUH, Approved by AICTE,
Accredited by NBA & NAAC A+, ISO 9001:2008 Certified
Dhulapally, Secunderabad - 500100
www.smec.ac.in

CERTIFICATE

This is to certify that the project entitled “Behavioral Biometrics Deciphered: Unveiling
Identity with Machine Learning Techniques” is being submitted by
N. VAISHNAVI (22K81A1247), J. SHRUTHI (22K81A1227), S. SAI PRANEETH
(22K81A1255), K. AJAY KUMAR (22K81A1232) in fulfilment of the requirement for the
award of the degree of BACHELOR OF TECHNOLOGY in INFORMATION
TECHNOLOGY, and is a record of bonafide work carried out by them. The results embodied
in this report have been verified and found satisfactory.

Internal Guide                                Head of the Department


Dr. A. BHEEM RAJ                              Dr. V. K. Senthil Raghavan
Professor                                     Professor and Head of Department
Department of                                 Department of
INFORMATION TECHNOLOGY                        INFORMATION TECHNOLOGY

Place: Dhulapally, Secunderabad


Date:
St. MARTIN'S ENGINEERING COLLEGE
UGC Autonomous
Affiliated to JNTUH, Approved by AICTE,
Accredited by NBA & NAAC A+, ISO 9001:2008 Certified
Dhulapally, Secunderabad - 500100
www.smec.ac.in

DEPARTMENT OF
INFORMATION TECHNOLOGY

DECLARATION

We, the students of ‘Bachelor of Technology in INFORMATION TECHNOLOGY’,
session: 2023-2024, St. Martin's Engineering College, Dhulapally,
Kompally, Secunderabad, hereby declare that the work presented in this project,
entitled “Behavioral Biometrics Deciphered: Unveiling Identity with Machine Learning
Techniques”, is the outcome of our own bonafide work and is correct to the best of our
knowledge, and that this work has been undertaken taking care of Engineering Ethics. The
results embodied in this project report have not been submitted to any university for the
award of any degree.

N. VAISHNAVI 22K81A1247

J. SHRUTHI 22K81A1227

S. SAI PRANEETH 22K81A1255

K. AJAY KUMAR 22K81A1232


ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the successful completion of
any task would be incomplete without the mention of the people who made it possible,
and whose encouragement and guidance have crowned our efforts with success.

First and foremost, we would like to express our deep sense of gratitude
and indebtedness to our College Management for their kind support and permission to
use the facilities available in the Institute.

We especially would like to express our deep sense of gratitude and indebtedness
to Dr. P. SANTOSH KUMAR PATRA, Professor and Group Director,
St. Martin's Engineering College, Dhulapally, for permitting us to undertake this project.

We wish to record our profound gratitude to Dr. M. SREENIVAS RAO, Principal,
St. Martin's Engineering College, for his motivation and encouragement.

We are also thankful to Dr. V. K. Senthil Raghavan, Head of the Department,
Department of INFORMATION TECHNOLOGY, St. Martin's Engineering
College, Dhulapally, Secunderabad, for his support and guidance throughout our project,
as well as to the Project Coordinator Dr. A. BHEEM RAJ, Professor,
Department of INFORMATION TECHNOLOGY, for his valuable support.
We would like to express our sincere gratitude and indebtedness to our project
supervisor Mr. G. Sathish, Assistant Professor, Department of Information Technology,
St. Martin's Engineering College, Dhulapally, for his support and guidance
throughout our project.
Finally, we express our thanks to all those who have helped us in successfully
completing this project. Furthermore, we would like to thank our family and
friends for their moral support and encouragement.
N. VAISHNAVI 22K81A1247

J. SHRUTHI 22K81A1227

S. SAI PRANEETH 22K81A1255

K. AJAY KUMAR 22K81A1232


ABSTRACT

This project describes the content analysis of text to identify suicidal
tendencies and their types. It also describes how to build a sentence classifier that
uses a neural network, created using various machine-learning libraries available for
the Python programming language. Attention is paid to the problem of teenage suicide
and "groups of death" in social networks, and to the search for ways to stop the
propaganda of suicide among minors. The work also analyzes existing information
about the so-called groups of death and their distribution on the Internet.

LIST OF FIGURES

Figure No. Figure Title Page No.

4.3 Architecture diagram 11

6.1 User registration form 47

6.2 Admin login 47

6.3 Admin login page 47

6.4 Admin home 48

6.5 User details 48

6.6 Home page 48

6.7 Active user 48

6.8 Train model 49

6.9 Prediction form 50

6.10 Result 50

LIST OF TABLES

Table No. Table Name Page No.


4.1 Data Set View 07

LIST OF ACRONYMS AND DEFINITIONS

S.No. Acronym Definition

1. DNN Deep Neural Network

2. LSTM Long Short-Term Memory

3. MDP Markov Decision Process

4. NB Naïve Bayes

5. PTSD Post-traumatic Stress Disorder


6. RF Random Forest

7. RNN Recurrent Neural Network

8. SVM Support Vector Machine

9. UML Unified Modelling Language

CONTENTS

ACKNOWLEDGEMENT I
ABSTRACT II
LIST OF FIGURES III
LIST OF TABLES IV
LIST OF ACRONYMS AND DEFINITIONS V
CHAPTER 1 INTRODUCTION 01
CHAPTER 2 LITERATURE SURVEY 02
CHAPTER 3 SYSTEM ANALYSIS AND DESIGN 05
3.1 Existing System 05
3.2 Proposed System 05
CHAPTER 4 SYSTEM REQUIREMENTS & SPECIFICATIONS 07
4.1 Database 07
4.2 DNN & SVM Algorithm 09
4.3 Design 11
4.3.1 System Architecture 11
4.3.2 Architecture diagrams 11
4.3.3 Input and Output Design 16
4.4 Modules 18
4.4.1 Modules Description 18
4.5 System Requirements 20
4.5.1 Hardware Requirements 20
4.5.2 Software Requirements 20
4.5.3 System study 21
4.6 Testing 23
4.6.1 Unit Testing 23
4.6.2 Integration Testing 23
4.6.3 Functional Testing 24
4.6.4 System Testing 24
4.6.5 White Box Testing 24
4.6.6 Black Box Testing 25
4.6.7 Unit Testing 25
4.6.8 Integration Testing 25
4.6.9 Acceptance Testing 26
CHAPTER 5 SOURCE CODE 28
CHAPTER 6 EXPERIMENTAL RESULTS 47
CHAPTER 7 CONCLUSION & FUTURE ENHANCEMENT 51
7.1 CONCLUSION 51
7.2 FUTURE ENHANCEMENT 51
REFERENCES 52
Patent/Publication
CHAPTER 1

INTRODUCTION

Identification has been a highly valued research topic, as it involves every aspect of
human life, from national security to unlocking personal accounts. At present, the main
methods of identification are passwords and physical tokens, along with physiological
biometrics such as fingerprints and the iris. However, with the development of artificial
intelligence and machine learning, behavioral biometrics has emerged as an alternative
means of identification, and some voices claim that it outweighs the traditional methods.
It is clear that passwords and physical tokens "can easily be stolen or duplicated" and
therefore are not the best way to ensure security. Although fingerprints and the iris "can
sometimes be difficult to use", they do have concrete merits. First, the datasets can be
easily obtained by scanning a fingertip or an iris. Second, this data-collecting process
usually takes less than a minute. Third, the sizes of the data are usually small enough for
normal computation and storage. These advantages together make fingerprints and the
iris the most widely used security measures across the globe. Compared to them,
behavioral biometrics has not gained many advantages, especially since some people may
not wear smartwatches or keep their phones in their pockets. One feature of behavioral
biometrics is still notable, though: it can carry out identification through subjects'
movements without many environmental requirements, and therefore makes the process
more convenient. Comprehensively speaking, to make this identification method stand
out among its peers, we need to simplify the data collected, shorten the training time, and
improve the accuracy of the model. Therefore, this report touches on these topics by
proposing a new model for behavioral biometrics.
CHAPTER 2

LITERATURE SURVEY

1) ‘Smartphone and Smartwatch-Based Biometrics Using Activities of
Daily Living’
AUTHORS: Weiss, G., Yoneda, K. and Hayajneh, T.
Smartphones and smartwatches, which include powerful sensors, provide a readily
available platform for implementing and deploying mobile motion-based behavioral
biometrics. However, the few studies that utilize these commercial devices for motion-
based biometrics are quite limited in terms of the sensors and physical activities that they
evaluate. In many such studies, only the smartwatch accelerometer is utilized and only one
physical activity, walking, is investigated. In this study we consider the accelerometer and
gyroscope sensor on both the smartphone and smartwatch, and determine which
combination of sensors performs best. Furthermore, eighteen diverse activities of daily
living are evaluated for their biometric efficacy and, unlike most other studies, biometric
identification is evaluated in addition to biometric authentication. The results presented in
this article show that motion-based biometrics using smartphones and/or smartwatches
yield good results, and that these results hold for the eighteen activities. This suggests that
zero-effort continuous biometrics based on normal activities of daily living is feasible, and
also demonstrates that certain easy-to-perform activities, such as clapping, may be a viable
alternative (or supplement) to gait-based biometrics.

2) “Sensor-Based Continuous Authentication of Smartphones' Users Using
Behavioral Biometrics”
AUTHORS: Abuhamad, M., Abusnaina, A., Nyang, D. and Mohaisen, D.

Mobile devices and technologies have become increasingly popular, offering comparable storage
and computational capabilities to desktop computers allowing users to store and interact with
sensitive and private information. The security and protection of such personal information are
becoming more and more important since mobile devices are vulnerable to unauthorized access or
theft. User authentication is a task of paramount importance that grants access to legitimate users
at the point-of-entry and continuously through the usage session. This task is made possible with
today's smartphones' embedded sensors that enable continuous and implicit user authentication by
capturing behavioral biometrics and traits. In this paper, we survey more than 140 recent
behavioral biometric-based approaches for continuous user authentication, including motion-based
methods (28 studies), gait-based methods (19 studies), keystroke dynamics-based methods (20
studies), touch gesture-based methods (29 studies), voice-based methods (16 studies), and
multimodal-based methods (34 studies). The survey provides an overview of the current state-of-
the-art approaches for continuous user authentication using behavioral biometrics captured by
smartphones' embedded sensors, including insights and open challenges for adoption, usability,
and performance.

3) ‘A Systematic Review on Gait Based Authentication System’
AUTHORS: Divya, R. and Lavanya, R.
Biometric frameworks are becoming progressively important, since they are more
reliable and proficient for identity confirmation. One such biometric is gait. The pattern by
which an individual walks is referred to as gait; it is a form of locomotion achieved through
the movement of a person's limbs. Unlike several other approaches, gait is a behavioral
biometric that can be taken into consideration for user authentication, as it shows distinct
patterns for every individual. Its low obtrusiveness also makes this biometric method more
advantageous compared to others. In this survey we concentrate on various gait
approaches, applications, and the machine learning techniques that can be used for
classification of gait features and their applications.

4) “Gait Authentication and Identification Using Wearable
Accelerometer Sensor”
AUTHORS: Gafurov, D., Snekkenes, E. and Bours, P.
This paper describes gait recognition using a body worn sensor. An accelerometer sensor
(placed in the trousers pocket) is used for collecting gait features. From the acceleration
signal of the person, cycles have been detected and analysed for recognition. We have
applied four different methods (absolute distance, correlation, histogram, and higher order
moments) to evaluate performance of the system both in authentication and identification
modes. Our data set consists of 300 gait sequences collected from 50 subjects. Absolute
distance metric has shown the best performance in terms of EER, which is equal to 7.3%
(recognition rate is 86.3%). Furthermore, we have also analysed recognition performance
when subjects were carrying a backpack.
5) “Identifying users of portable devices from gait pattern with
accelerometers”
AUTHORS: Mantyjarvi, J., Lindholm, M., Vildjiounaite, E., Makela, S.

Identifying users of portable devices from gait signals acquired with three-dimensional
accelerometers was studied. Three approaches, correlation, frequency domain and data
distribution statistics, were used. Test subjects (N=36) walked at fast, normal and slow
walking speeds in enrolment and test sessions on separate days, wearing the
accelerometer device on their belt, at the back. It was shown to be possible to identify users
with this novel gait recognition method. Best equal error rate (EER=7%) was achieved
with the signal correlation method, while the frequency domain method and two
variations of the data distribution statistics method produced EER of 10%, 18% and
19%, respectively.

CHAPTER 3

SYSTEM ANALYSIS AND DESIGN

3.1 EXISTING SYSTEM:

Related works on privacy-preserving biometric identification are reviewed in this section.


Recently, some efficient biometric identification schemes have been proposed. One scheme
provides privacy-preserving face recognition; specifically, a face recognition method is
designed by measuring the similarity between sorted index-number vectors. Another is a
privacy-preserving biometric matching protocol for iris-code verification; in that protocol,
it is computationally infeasible for a malicious user to impersonate an honest user.

DISADVANTAGES OF EXISTING SYSTEM:


The system does not implement a biometric identification scheme.
There are no effective privacy-preserving encryption techniques in this
system.

3.2 PROPOSED SYSTEM: We use the WISDM datasets posted on the UCI


Machine Learning Repository. These datasets include 18 distinct activities
performed by 51 subjects, recorded by the phone accelerometer, phone
gyroscope, watch accelerometer, and watch gyroscope respectively. The data
takes the form of subject labels, timestamps, activity labels, and three-
dimensional coordinates recorded at a given time. Principal Component
Analysis (PCA) is a well-known method used to reduce the dimensionality of a
large dataset. The general principle of this method is to extract the main
information from the data, so that only part of the original dataset needs to be
analyzed. This can sometimes significantly reduce the time complexity of a
model, since PCA essentially shrinks the dataset into a smaller one and
excludes extraneous variables that may misguide the result. However, it also
leads to an inevitable loss in accuracy, as some important information might be
filtered out during the process. Therefore, choosing a suitable dimension for
PCA means finding a balance between the time complexity and the accuracy of
a model.

ADVANTAGES OF PROPOSED SYSTEM:

1. SVM and DNN show greater capability in the experiment.

2. These parameter settings contribute to the overall accuracy of the DNN model and help it
reach an accuracy of 95.06%.

Keywords: behavioral biometrics; identification; machine learning; phone accelerometer;
SVM.

CHAPTER 4

SYSTEM REQUIREMENTS AND SPECIFICATIONS

4.1 DATABASE:

Unveiling identity with machine learning techniques generally involves using


databases that contain labeled data relevant to identity features. These databases can
include various types of data, such as:

 Biometric Data: This includes fingerprints, facial recognition data, iris scans, and

voice recognition data.


Examples:

Labeled Faces in the Wild (LFW): A database for studying the problem of
unconstrained face recognition.

CASIA Iris Database: Contains iris images for biometric research.

 Textual Data: This can be used for natural language processing tasks like
authorship identification or text-based identity verification.

Examples:

Enron Email Dataset: A large set of emails for NLP tasks.

PAN Author Identification datasets: Used for evaluating author identification


systems.

 Behavioral Data: Data capturing user behavior, such as typing patterns, mouse
movements, and smartphone usage patterns.

Examples:

MIT Mouse Dynamics Challenge Dataset: Contains mouse movement data for
behavioral biometrics research.

Aalto University Dataset for Mobile Behavioral Biometrics: Contains smartphone usage
data.

 Demographic Data: Includes data about individuals' age, gender, location, etc.,
often used in conjunction with other data types for identity verification.
Examples:

UCI Adult Dataset: Contains demographic information for machine learning tasks.

 Multimodal Data: Combines multiple data types, such as video, audio, and text,
to improve the accuracy of identity verification systems.

Examples:

Chalearn LAP Constrained Face Dataset (CFD): Contains multimodal data


including RGB, depth, and thermal images.

DATABASE NAME                        DESCRIPTION                                  DATA TYPE          LINK

Aalto University Smartphone Dataset  Smartphone usage patterns for behavioral     Behavioral Data    Aalto Dataset
                                     biometrics.

BIOMDATA                             Multimodal data including face, voice, and   Face, Voice, Iris  BIOMDATA
                                     iris for multimodal biometric research.

CASIA Iris Dataset                   Iris images for biometric research.          Iris Images        CASIA Iris

Mouse Dynamics Challenge Dataset     Mouse movement data for behavioral           Behavioral Data    Aalto Dataset
                                     biometric research.
4.2 DNN & SVM ALGORITHUM:

Deep Neural Networks (DNNs):


Deep Neural Networks (DNNs) are artificial neural networks with multiple hidden
layers between the input and output layers. They can model complex, non-linear
relationships and are capable of learning hierarchical representations from the data.
Key Components

1. Input Layer:
o The initial layer that receives the input data.
2. Hidden Layers:
o Multiple layers where each neuron applies a weighted sum of its inputs followed
by a non-linear activation function (like ReLU, sigmoid, or tanh).
3. Output Layer:
o The final layer that produces the prediction or classification. In classification
tasks, the softmax activation function is commonly used to output probabilities.

Training Process

1. Forward Propagation:
o Input data is passed through the network, layer by layer, with each layer
transforming the data using weights and activation functions.
2. Loss Calculation:
o The network's output is compared to the true labels using a loss function (e.g.,
cross-entropy for classification tasks).
3. Backpropagation:
o Gradients of the loss with respect to each weight are calculated using the chain
rule, and weights are updated using optimization algorithms like stochastic
gradient descent (SGD).
4. Iteration:
o The process is repeated for many epochs until the model converges to an optimal
set of weights.
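
A minimal sketch of this training loop in Keras follows. The layer sizes are illustrative
assumptions rather than the report's exact settings; the 18 outputs mirror the 18 WISDM
activities, and the 91 inputs mirror the statistical feature set used later in Chapter 5:

# Minimal DNN sketch (Keras): input -> hidden ReLU layers -> softmax output.
# Layer sizes here are illustrative assumptions, not the report's exact settings.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(91,)),               # one unit per input feature
    tf.keras.layers.Dense(128, activation='relu'),    # hidden layer 1
    tf.keras.layers.Dense(64, activation='relu'),     # hidden layer 2
    tf.keras.layers.Dense(18, activation='softmax'),  # probability per activity class
])

# Cross-entropy loss and SGD, matching the training process described above.
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# model.fit(X_train, y_train, epochs=50, validation_split=0.1)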

Support Vector Machines (SVMs):

Support Vector Machines (SVMs) are supervised learning models used for
classification and regression tasks. SVMs work by finding the hyperplane that best
separates the classes in the feature space.
Key Concepts

1. Hyperplane:
o A decision boundary that separates different classes in the feature space. The
optimal hyperplane maximizes the margin between the closest points of the
classes (support vectors).
2. Support Vectors:
o The data points closest to the hyperplane, which are critical in defining the
position and orientation of the hyperplane.
3. Kernel Trick:
o SVMs can be extended to non-linear classification using kernel functions (e.g.,
linear, polynomial, RBF). The kernel trick maps the input features into higher-
dimensional space where a linear separator is possible.

Training Process

1. Define Objective:
o The goal is to find the hyperplane that maximizes the margin between classes.
This involves solving a quadratic optimization problem.
2. Optimization:
o Use optimization techniques to find the support vectors and the optimal
hyperplane.
3. Prediction:
o For a new data point, the SVM predicts the class based on which side of the
hyperplane it falls.
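
The following minimal sketch shows this process with scikit-learn's SVC on synthetic
data; the dataset and hyperparameters are illustrative, not the project's actual settings:

# Minimal SVM sketch (scikit-learn): RBF kernel on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

clf = SVC(kernel='rbf', C=1.0, gamma='scale')  # kernel trick: RBF feature map
clf.fit(X_train, y_train)                      # solves the margin-maximization problem

print(clf.score(X_test, y_test))               # accuracy on held-out data
print(clf.support_vectors_.shape)              # the support vectors found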

Using DNN and SVM for Identity Unveiling

Combining DNNs and SVMs

In some scenarios, combining the strengths of DNNs and SVMs can yield better results.
For example:

1. Feature Extraction with DNNs:


o Use a DNN to extract high-level features from raw data (e.g., images).
2. Classification with SVMs:
o Use the extracted features as input to an SVM for classification. This approach
leverages the representation learning capability of DNNs and the effective
classification performance of SVMs.

Example Workflow for Identity Unveiling

1. Data Preprocessing:
o Collect and preprocess the data (e.g., images, text) to be used for identity
recognition.
2. Feature Extraction:
o Train a DNN to extract meaningful features from the data. For images, this could
involve using convolutional layers followed by fully connected layers.
3. Feature Selection:
o Select the most relevant features from the DNN output, potentially reducing
dimensionality with techniques like PCA.
4. Training SVM:
o Train an SVM using the extracted features to classify identities.
5. Evaluation and Tuning:
o Evaluate the model's performance using metrics like accuracy, precision, recall,
and F1-score. Tune hyperparameters for both the DNN and SVM to improve
performance.
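
A sketch of this combined workflow is given below. It assumes a trained Keras model like
the one in the DNN sketch above, plus X_train/X_test feature matrices and y_train/y_test
labels; all of these names are illustrative assumptions:

# Sketch of the combined workflow: DNN as feature extractor, SVM as classifier.
# Assumes `model`, `X_train`, `X_test`, `y_train`, `y_test` from earlier sketches.
import tensorflow as tf
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Steps 1-2: take the last hidden layer's activations as learned features.
feature_extractor = tf.keras.Model(inputs=model.inputs,
                                   outputs=model.layers[-2].output)
F_train = feature_extractor.predict(X_train)
F_test = feature_extractor.predict(X_test)

# Step 3: optionally reduce dimensionality with PCA.
pca = PCA(n_components=32).fit(F_train)

# Steps 4-5: train and evaluate an SVM on the extracted features.
svm = SVC(kernel='rbf').fit(pca.transform(F_train), y_train)
print(svm.score(pca.transform(F_test), y_test))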

4.3 DESIGN

4.3.1 SYSTEM ARCHITECTURE:

4.3.2 ARCHITECTURE DIAGRAM:

1. The DFD is also called a bubble chart. It is a simple graphical formalism that can
be used to represent a system in terms of the input data to the system, the various
processing carried out on this data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is
used to model the system components: the system processes, the data used by the
processes, the external entities that interact with the system, and the information
flows in the system.
3. The DFD shows how information moves through the system and how it is modified
by a series of transformations. It is a graphical technique that depicts information
flow and the transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction, and may be
partitioned into levels that represent increasing information flow and functional
detail.

UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized, general-
purpose modeling language in the field of object-oriented software engineering. The
standard is managed, and was created by, the Object Management Group.

The goal is for UML to become a common language for creating models of object-
oriented computer software. In its current form, UML comprises two major
components: a meta-model and a notation. In the future, some form of method or process
may also be added to, or associated with, UML.

The Unified Modeling Language is a standard language for specifying, visualizing,
constructing and documenting the artifacts of a software system, as well as for business
modeling and other non-software systems.

The UML represents a collection of best engineering practices that have proven successful
in the modeling of large and complex systems.

The UML is a very important part of developing object-oriented software and the
software development process. The UML uses mostly graphical notations to express the
design of software projects.

GOALS:

The Primary goals in the design of the UML are as follows:


1. Provide users a ready-to-use, expressive visual modeling language so that they
can develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks,
patterns and components.
7. Integrate best practices.
USE CASE DIAGRAM:
A use case diagram in the Unified Modeling Language (UML) is a type of
behavioral diagram defined by and created from a Use-case analysis. Its purpose is to
present a graphical overview of the functionality provided by a system in terms of actors,
their goals (represented as use cases), and any dependencies between those use cases. The
main purpose of a use case diagram is to show what system functions are performed for
which actor. Roles of the actors in the system can be depicted.

CLASS DIAGRAM:
In software engineering, a class diagram in the Unified Modeling Language (UML) is a
type of static structure diagram that describes the structure of a system by showing the
system's classes, their attributes, operations (or methods), and the relationships among the
classes. It explains which class contains information.

SEQUENCE DIAGRAM:
A sequence diagram in Unified Modeling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order. It is a
construct of a Message Sequence Chart. Sequence diagrams are sometimes called event
diagrams, event scenarios, and timing diagrams.

ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise activities and
actions with support for choice, iteration and concurrency. In the Unified Modeling
Language, activity diagrams can be used to describe the business and operational step-by-
step workflows of components in a system. An activity diagram shows the overall flow of
control.

4.3.3 INPUT AND OUTPUT DESIGN:

INPUT DESIGN
The input design is the link between the information system and the user. It
comprises developing the specifications and procedures for data preparation, that is, the
steps necessary to put transaction data into a usable form for processing. This can be
achieved by having the computer read data from a written or printed document, or by
having people key the data directly into the system. The design of input focuses on
controlling the amount of input required, controlling errors, avoiding delay, avoiding
extra steps, and keeping the process simple. The input is designed in such a way that it
provides security and ease of use while retaining privacy. Input design considered the
following things:

 What data should be given as input?
 How should the data be arranged or coded?
 The dialog to guide the operating personnel in providing input.
 Methods for preparing input validations, and the steps to follow when errors occur.

OBJECTIVES
1. Input design is the process of converting a user-oriented description of the
input into a computer-based system. This design is important to avoid errors in the data
input process and to show the correct direction to the management for getting correct
information from the computerized system.
2. It is achieved by creating user-friendly screens for data entry that can handle
large volumes of data. The goal of designing input is to make data entry easier and
error-free. The data entry screen is designed in such a way that all the required data
manipulations can be performed. It also provides record-viewing facilities.
3. When the data is entered, it is checked for validity. Data can be entered
with the help of screens, and appropriate messages are provided as and when needed, so
that the user is never left confused. Thus the objective of input design is to create an input
layout that is easy to follow.

OUTPUT DESIGN
A quality output is one which meets the requirements of the end user and
presents the information clearly. In any system, the results of processing are communicated
to the users and to other systems through outputs. In output design it is determined how the
information is to be displayed for immediate need, as well as the hard-copy output. It is the
most important and direct source of information for the user. Efficient and intelligent output
design improves the system's relationship with the user and helps decision-making.
1. Designing computer output should proceed in an organized, well-thought-
out manner; the right output must be developed while ensuring that each output element is
designed so that people will find the system easy and effective to use. When analysts
design computer output, they should identify the specific output that is needed to meet the
requirements.

2. Select methods for presenting information.

3. Create documents, reports, or other formats that contain information
produced by the system.

The output form of an information system should accomplish one or more of the
following objectives:

 Convey information about past activities, current status, or projections of the
future.
 Signal important events, opportunities, problems, or warnings.
 Trigger an action.
 Confirm an action.

4.4 MODULES:

 User

 Admin
 Data Preprocessing
 Machine Learning
MODULES DESCRIPTION:

User:
The user can register first. While registering, a valid email and mobile number are
required for further communications. Once the user registers, the admin can activate the
user; only then can the user log in to our system. The user can upload a dataset whose
columns match our dataset schema; for algorithm execution, the data must be in int or
float format. Here we took the Adacel Technologies Limited dataset for testing purposes.
The user can also add new data to the existing dataset through our Django application.
When the user clicks Data Preparations on the web page, the data cleaning process
starts, and the cleaned data and its required graphs are displayed.

Admin:
The admin can log in with his login details and activate the registered users; only after
activation can a user log in to our system. The admin can view users, view and load the
overall data in the browser, view the training data list and test data list, and view the
forecast results.

Data Preprocessing:
A dataset can be viewed as a collection of data objects, which here are sentiment-labeled
tweets (positive, negative and neutral). Data objects are described by a number of
features that capture the basic characteristics of an object, such as the mass of a
physical object or the time at which an event occurred.

Features are often also called variables, characteristics, fields, attributes, or dimensions.

The study is based on a pipeline that involves preprocessing, sentiment analysis, topic
modeling, natural language processing, and statistical analysis of Twitter data extracted in
the form of tweets. We use a large amount of tweet and sentiment data.

Machine Learning:
Based on the split criterion, the cleaned data is split into 80% training and 20% test data,
and the dataset is then subjected to a machine learning classifier based on Natural
Language Processing (NLP). Sentiment analysis is performed by fine-tuning auto-encoding
models such as BERT and ALBERT to achieve a comprehensive understanding of public
sentiment. We then analyze the results of our experiment and methodology using the
contextual information and verify the insights.
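
A minimal sketch of this 80/20 split follows; `cleaned_df` is an assumed placeholder name
for the cleaned dataset produced by the preprocessing module:

# Sketch of the 80/20 train/test split described above; `cleaned_df` is assumed.
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(cleaned_df,
                                     train_size=0.80,   # 80% training
                                     test_size=0.20,    # 20% test
                                     shuffle=True,
                                     random_state=42)   # reproducible split
print(len(train_df), len(test_df))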

4.5 SYSTEM REQUIREMENTS

4.5.1 HARDWARE REQUIREMENTS:

 System : Intel i3

 Hard Disk : 1 TB

 Monitor : 14" Colour Monitor

 Mouse : Optical Mouse

 RAM : 4 GB

4.5.2 SOFTWARE REQUIREMENTS:

 Operating System : Windows 10

 Coding Language : Python

 Front-End : HTML, CSS

 Designing : HTML, CSS, JavaScript

 Database : SQLite

4.5.3 SYSTEM STUDY:

FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal is
put forth with a very general plan for the project and some cost estimates. During
system analysis, the feasibility study of the proposed system is to be carried out. This is
to ensure that the proposed system is not a burden to the company. For feasibility
analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are:

 ECONOMICAL FEASIBILITY
 TECHNICAL FEASIBILITY
 SOCIAL FEASIBILITY

ECONOMICAL FEASIBILITY

This study is carried out to check the economic impact that the system will have
on the organization. The amount of funds that the company can pour into the research and
development of the system is limited, so the expenditures must be justified. The developed
system was well within the budget, and this was achieved because most of the
technologies used are freely available. Only the customized products had to be purchased.

TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources, as this would lead to high demands being placed on the
client. The developed system must have modest requirements, as only minimal or null
changes are required for implementing this system.

SOCIAL FEASIBILITY

This aspect of the study is to check the level of acceptance of the system by the user.
This includes the process of training the user to use the system efficiently. The user must
not feel threatened by the system, but must instead accept it as a necessity. The level of
acceptance by the users solely depends on the methods that are employed to educate the
users about the system and to make them familiar with it. Their level of confidence must be
raised so that they are also able to make some constructive criticism, which is welcomed,
as they are the final users of the system.

4.6 TESTING
SYSTEM TESTING:

The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, subassemblies, assemblies and/or a finished product. It is the
process of exercising software with the intent of ensuring that the software system meets
its requirements and user expectations and does not fail in an unacceptable manner. There
are various types of tests, and each test type addresses a specific testing requirement.

TYPES OF TESTS

4.6.1 Unit testing


Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly, and that program inputs produce valid outputs. All
decision branches and internal code flow should be validated. It is the testing of individual
software units of the application, and it is done after the completion of an individual unit
and before integration. This is structural testing that relies on knowledge of the unit's
construction and is invasive. Unit tests perform basic tests at the component level and test
a specific business process, application, and/or system configuration. Unit tests ensure
that each unique path of a business process performs accurately to the documented
specifications and contains clearly defined inputs and expected results.
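
As an illustration, a unit test for the registration view of Chapter 5 might look like the
sketch below, using Django's built-in test client. The URL pattern is an assumption, since
urls.py is not reproduced in this report:

# Illustrative Django unit test; the '/UserRegisterActions/' route is assumed.
from django.test import TestCase

class RegistrationViewTests(TestCase):
    def test_register_page_renders(self):
        # Exercise one path through the view with a defined input and expected result.
        response = self.client.get('/UserRegisterActions/')
        self.assertEqual(response.status_code, 200)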

4.6.2 Integration testing


Integration tests are designed to test integrated software components
to determine if they actually run as one program. Testing is event driven and is more
concerned with the basic outcome of screens or fields. Integration tests demonstrate that
although the components were individually satisfactory, as shown by successful unit
testing, the combination of components is correct and consistent. Integration testing is
specifically aimed at exposing the problems that arise from the combination of
components.

4.6.3 Functional test

Functional tests provide systematic demonstrations that functions tested are


available as specified by the business and technical requirements, system documentation,
and user manuals.

Functional testing is centered on the following items:


Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements,


key functions, or special test cases. In addition, systematic coverage of business process
flows, data fields, predefined processes, and successive processes must be considered for
testing. Before functional testing is complete, additional tests are identified and the
effective value of current tests is determined.

4.6.4 System Test


System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results. An example
of system testing is the configuration oriented system integration test. System testing is
based on process descriptions and flows, emphasizing pre-driven process links and
integration points.

4.6.5 White Box Testing


White box testing is a form of testing in which the software tester has
knowledge of the inner workings, structure and language of the software, or at least its
purpose. It is used to test areas that cannot be reached from a black-box level.

4.6.6 Black Box Testing
Black box testing is testing the software without any knowledge of the
inner workings, structure or language of the module being tested. Black box tests, like most
other kinds of tests, must be written from a definitive source document, such as a
specification or requirements document. It is a form of testing in which the software under
test is treated as a black box: you cannot "see" into it. The test provides inputs and
responds to outputs without considering how the software works.
4.6.7 Unit Testing
Unit testing is usually conducted as part of a combined code and unit test
phase of the software lifecycle, although it is not uncommon for coding and unit testing to
be conducted as two distinct phases.
Test strategy and approach
Field testing will be performed manually and functional tests will be
written in detail.
Test objectives

 All field entries must work properly.


 Pages must be activated from the identified link.
 The entry screen, messages and responses must not be delayed.

Features to be tested

 Verify that the entries are of the correct format


 No duplicate entries should be allowed
 All links should take the user to the correct page.
4.6.8 Integration Testing
Software integration testing is the incremental integration testing of two or
more integrated software components on a single platform to produce failures caused by
interface defects.
The task of the integration test is to check that components or software applications, e.g.
components in a software system or – one step up – software applications at the company
level – interact without error.

4.6.9 Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.
Test Results: All the test cases mentioned above passed successfully. No defects were
encountered.

4.7 SIMPLE TEST CASES

S.No | Test Case                                      | Expected Result                                                        | Result | Remarks (if fails)
-----|------------------------------------------------|------------------------------------------------------------------------|--------|--------------------------------------------------
1    | User Register                                  | User registration completes successfully.                              | Pass   | Fails if the user email already exists.
2    | User Login                                     | If username and password are correct, the user reaches a valid page.   | Pass   | Unregistered users will not be logged in.
3    | SVM                                            | The request will be accepted by the SVM.                               | Pass   | Otherwise it fails.
4    | Naive Bayes                                    | The request will be accepted by the Naive Bayes.                       | Pass   | Otherwise it fails.
5    | View dataset by user                           | The dataset will be displayed to the user.                             | Pass   | Fails if the results are not true.
6    | User classification                            | Displays reviews with true results.                                    | Pass   | Fails if the results are not true.
7    | Calculate accuracy, macro avg and weighted avg | Macro avg and weighted avg are displayed.                              | Pass   | Fails if macro avg and weighted avg are not calculated.
8    | Prediction                                     | The result will be cyberbullying or not cyberbullying.                 | Pass   | Otherwise it fails.
9    | Admin login                                    | Admin logs in with his credentials; on success he reaches his home page. | Pass | Invalid login details are not allowed.
10   | Admin activates the registered users           | Admin can activate the registered user id.                             | Pass   | If the user id is not found, the user won't log in.
CHAPTER 5

SOURCE CODE

User side views:

from django.shortcuts import render, HttpResponse
from .forms import UserRegistrationForm
from django.contrib import messages
from .models import UserRegistrationModel

# Create your views here.

def UserRegisterActions(request):
    if request.method == 'POST':
        form = UserRegistrationForm(request.POST)
        if form.is_valid():
            print('Data is Valid')
            form.save()
            messages.success(request, 'You have been successfully registered')
            form = UserRegistrationForm()
            return render(request, 'UserRegistrations.html', {'form': form})
        else:
            messages.success(request, 'Email or Mobile Already Existed')
            print("Invalid form")
    else:
        form = UserRegistrationForm()
    return render(request, 'UserRegistrations.html', {'form': form})

def UserLoginCheck(request):
    if request.method == "POST":
        loginid = request.POST.get('loginname')
        pswd = request.POST.get('pswd')
        print("Login ID = ", loginid, ' Password = ', pswd)
        try:
            check = UserRegistrationModel.objects.get(loginid=loginid, password=pswd)
            status = check.status
            print('Status is = ', status)
            if status == "activated":
                request.session['id'] = check.id
                request.session['loggeduser'] = check.name
                request.session['loginid'] = loginid
                request.session['email'] = check.email
                print("User id At", check.id, status)
                return render(request, 'users/UserHome.html', {})
            else:
                messages.success(request, 'Your Account has not been activated by Admin.')
                return render(request, 'UserLogin.html')
        except Exception as e:
            print('Exception is ', str(e))
            pass
        messages.success(request, 'Invalid Login id and password')
    return render(request, 'UserLogin.html', {})

def UserHome(request):
    return render(request, 'users/UserHome.html', {})

def TrainModel(request):
    import os
    import tensorflow as tf
    import pandas as pd
    import numpy as np
    from django.conf import settings
    from matplotlib import pyplot as plt

    activity_codes_mapping = {'A': 'walking',
                              'B': 'jogging',
                              'C': 'stairs',
                              'D': 'sitting',
                              'E': 'standing',
                              'F': 'typing',
                              'G': 'brushing teeth',
                              'H': 'eating soup',
                              'I': 'eating chips',
                              'J': 'eating pasta',
                              'K': 'drinking from cup',
                              'L': 'eating sandwich',
                              'M': 'kicking soccer ball',
                              'O': 'playing catch tennis ball',
                              'P': 'dribbling basket ball',
                              'Q': 'writing',
                              'R': 'clapping',
                              'S': 'folding clothes'}

    activity_color_map = {activity_codes_mapping['A']: 'lime',
                          activity_codes_mapping['B']: 'red',
                          activity_codes_mapping['C']: 'blue',
                          activity_codes_mapping['D']: 'orange',
                          activity_codes_mapping['E']: 'yellow',
                          activity_codes_mapping['F']: 'lightgreen',
                          activity_codes_mapping['G']: 'greenyellow',
                          activity_codes_mapping['H']: 'magenta',
                          activity_codes_mapping['I']: 'gold',
                          activity_codes_mapping['J']: 'cyan',
                          activity_codes_mapping['K']: 'purple',
                          activity_codes_mapping['L']: 'lightgreen',
                          activity_codes_mapping['M']: 'violet',
                          activity_codes_mapping['O']: 'limegreen',
                          activity_codes_mapping['P']: 'deepskyblue',
                          activity_codes_mapping['Q']: 'mediumspringgreen',
                          activity_codes_mapping['R']: 'plum',
                          activity_codes_mapping['S']: 'olive'}

    def show_accel_per_activity(device, df, act, interval_in_sec=None):
        ''' Plots acceleration time history per activity '''
        df1 = df.loc[df.activity == act].copy()
        df1.reset_index(drop=True, inplace=True)
        df1['duration'] = (df1['timestamp'] - df1['timestamp'].iloc[0]) / 1000000000  # nanoseconds --> seconds

        if interval_in_sec == None:
            ax = df1[:].plot(kind='line', x='duration', y=['x', 'y', 'z'], figsize=(25, 7), grid=True)  # ,title = act)
        else:
            ax = df1[:interval_in_sec * 20].plot(kind='line', x='duration', y=['x', 'y', 'z'], figsize=(25, 7), grid=True)  # ,title = act)

        ax.set_xlabel('duration (sec)', fontsize=15)
        ax.set_ylabel('acceleration (m/sec^2)', fontsize=15)
        ax.set_title('Acceleration: Device: ' + device + ' Activity: ' + act, fontsize=15)
        # plt.show()

    def show_ang_velocity_per_activity(device, df, act, interval_in_sec=None):
        ''' Plots angular velocity time history per activity '''
        df1 = df.loc[df.activity == act].copy()
        df1.reset_index(drop=True, inplace=True)
        df1['duration'] = (df1['timestamp'] - df1['timestamp'].iloc[0]) / 1000000000  # nanoseconds --> seconds

        if interval_in_sec == None:
            ax = df1[:].plot(kind='line', x='duration', y=['x', 'y', 'z'], figsize=(25, 7), grid=True)  # ,title = act)
        else:
            ax = df1[:interval_in_sec * 20].plot(kind='line', x='duration', y=['x', 'y', 'z'], figsize=(25, 7), grid=True)  # ,title = act)

        ax.set_xlabel('duration (sec)', fontsize=15)
        ax.set_ylabel('angular velocity (rad/sec)', fontsize=15)
        ax.set_title('Angular velocity: Device: ' + device + ' Activity: ' + act, fontsize=15)

    datasetpath = os.path.join(settings.MEDIA_ROOT, 'wisdm-dataset')

    # accel_phone
    raw_par_10_phone_accel = pd.read_csv(datasetpath + '/' + 'raw/phone/accel/data_1610_accel_phone.txt',
                                         names=['participant_id', 'activity_code', 'timestamp', 'x', 'y', 'z'],
                                         index_col=None, header=None)
    print('-' * 100)
    print(raw_par_10_phone_accel)
    raw_par_10_phone_accel.z = raw_par_10_phone_accel.z.str.strip(';')
    raw_par_10_phone_accel.z = pd.to_numeric(raw_par_10_phone_accel.z)
    raw_par_10_phone_accel['activity'] = raw_par_10_phone_accel['activity_code'].map(activity_codes_mapping)
    raw_par_10_phone_accel = raw_par_10_phone_accel[['participant_id', 'activity_code', 'activity', 'timestamp', 'x', 'y', 'z']]
    print(raw_par_10_phone_accel)

    for key in activity_codes_mapping:
        show_accel_per_activity('Phone', raw_par_10_phone_accel, activity_codes_mapping[key], 10)

    # accel_watch
    raw_par_20_watch_accel = pd.read_csv(datasetpath + '/' + 'raw/watch/accel/data_1620_accel_watch.txt',
                                         names=['participant_id', 'activity_code', 'timestamp', 'x', 'y', 'z'],
                                         index_col=None, header=None)
    raw_par_20_watch_accel.z = raw_par_20_watch_accel.z.str.strip(';')
    raw_par_20_watch_accel.z = pd.to_numeric(raw_par_20_watch_accel.z)
    raw_par_20_watch_accel['activity'] = raw_par_20_watch_accel['activity_code'].map(activity_codes_mapping)
    raw_par_20_watch_accel = raw_par_20_watch_accel[['participant_id', 'activity_code', 'activity', 'timestamp', 'x', 'y', 'z']]
    print(raw_par_20_watch_accel)

    for key in activity_codes_mapping:
        show_accel_per_activity('Watch', raw_par_20_watch_accel, activity_codes_mapping[key], 50)

    # gyro_phone
    raw_par_35_phone_ang_vel = pd.read_csv(datasetpath + '/' + 'raw/phone/gyro/data_1635_gyro_phone.txt',
                                           names=['participant_id', 'activity_code', 'timestamp', 'x', 'y', 'z'],
                                           index_col=None, header=None)
    raw_par_35_phone_ang_vel.z = raw_par_35_phone_ang_vel.z.str.strip(';')
    raw_par_35_phone_ang_vel.z = pd.to_numeric(raw_par_35_phone_ang_vel.z)
    raw_par_35_phone_ang_vel['activity'] = raw_par_35_phone_ang_vel['activity_code'].map(activity_codes_mapping)
    raw_par_35_phone_ang_vel = raw_par_35_phone_ang_vel[['participant_id', 'activity_code', 'activity', 'timestamp', 'x', 'y', 'z']]
    print(raw_par_35_phone_ang_vel)

    for key in activity_codes_mapping:
        show_ang_velocity_per_activity('Phone', raw_par_35_phone_ang_vel, activity_codes_mapping[key])

    # gyro_watch
    raw_par_45_watch_ang_vel = pd.read_csv(datasetpath + '/' + 'raw/watch/gyro/data_1635_gyro_watch.txt',
                                           names=['participant_id', 'activity_code', 'timestamp', 'x', 'y', 'z'],
                                           index_col=None, header=None)
    raw_par_45_watch_ang_vel.z = raw_par_45_watch_ang_vel.z.str.strip(';')
    raw_par_45_watch_ang_vel.z = pd.to_numeric(raw_par_45_watch_ang_vel.z)
    raw_par_45_watch_ang_vel['activity'] = raw_par_45_watch_ang_vel['activity_code'].map(activity_codes_mapping)
    raw_par_45_watch_ang_vel = raw_par_45_watch_ang_vel[['participant_id', 'activity_code', 'activity', 'timestamp', 'x', 'y', 'z']]
    print(raw_par_45_watch_ang_vel)

    for key in activity_codes_mapping:
        show_ang_velocity_per_activity('Watch', raw_par_45_watch_ang_vel, activity_codes_mapping[key])

    features = ['ACTIVITY',
                'X0',  # 1st bin fraction of x axis acceleration distribution
                'X1',  # 2nd bin fraction ...
                'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9',
                'Y0',  # 1st bin fraction of y axis acceleration distribution
                'Y1',  # 2nd bin fraction ...
                'Y2', 'Y3', 'Y4', 'Y5', 'Y6', 'Y7', 'Y8', 'Y9',
                'Z0',  # 1st bin fraction of z axis acceleration distribution
                'Z1',  # 2nd bin fraction ...
                'Z2', 'Z3', 'Z4', 'Z5', 'Z6', 'Z7', 'Z8', 'Z9',
                'XAVG',  # average sensor value over the window (per axis)
                'YAVG', 'ZAVG',
                'XPEAK',  # time in milliseconds between the peaks in the wave associated with most activities; heuristically determined (per axis)
                'YPEAK', 'ZPEAK',
                'XABSOLDEV',  # average absolute difference between each of the 200 readings and the mean of those values (per axis)
                'YABSOLDEV', 'ZABSOLDEV',
                'XSTANDDEV',  # standard deviation of the 200 window's values (per axis) ***BUG!***
                'YSTANDDEV', 'ZSTANDDEV',
                'XVAR',  # variance of the 200 window's values (per axis) ***BUG!***
                'YVAR', 'ZVAR',
                'XMFCC0',  # short-term power spectrum of a wave, based on a linear cosine transform of a log power spectrum on a non-linear mel scale of frequency (13 values per axis)
                'XMFCC1', 'XMFCC2', 'XMFCC3', 'XMFCC4', 'XMFCC5', 'XMFCC6',
                'XMFCC7', 'XMFCC8', 'XMFCC9', 'XMFCC10', 'XMFCC11', 'XMFCC12',
                'YMFCC0',  # MFCC values for the y axis (13 values per axis, as above)
                'YMFCC1', 'YMFCC2', 'YMFCC3', 'YMFCC4', 'YMFCC5', 'YMFCC6',
                'YMFCC7', 'YMFCC8', 'YMFCC9', 'YMFCC10', 'YMFCC11', 'YMFCC12',
                'ZMFCC0',  # MFCC values for the z axis (13 values per axis, as above)
                'ZMFCC1', 'ZMFCC2', 'ZMFCC3', 'ZMFCC4', 'ZMFCC5', 'ZMFCC6',
                'ZMFCC7', 'ZMFCC8', 'ZMFCC9', 'ZMFCC10', 'ZMFCC11', 'ZMFCC12',
                'XYCOS',  # the cosine distances between sensor values for pairs of axes (three pairs of axes)
                'XZCOS', 'YZCOS',
                'XYCOR',  # the correlation between sensor values for pairs of axes (three pairs of axes)
                'XZCOR', 'YZCOR',
                'RESULTANT',  # average resultant value: square each matching x, y, and z value, sum them, take the square root, then average over the 200 readings
                'PARTICIPANT']  # categorical: 1600-1650

    import glob

    # the duplicate files to be ignored; all identical to 1600
    duplicate_files = [str(i) for i in range(1611, 1618)]  # '1611', ... '1617'

    # path = r'media/wisdm-dataset/arff_files/phone/accel'
    path = datasetpath + '/' + 'arff_files/phone/accel'
    all_files = glob.glob(path + "/*.arff")

    list_dfs_phone_accel = []

    for filename in all_files:
        if any(dup_fn in filename for dup_fn in duplicate_files):
            continue  # ignore the duplicate files
        df = pd.read_csv(filename, names=features, skiprows=96, index_col=None, header=0)
        list_dfs_phone_accel.append(df)

    all_phone_accel = pd.concat(list_dfs_phone_accel, axis=0, ignore_index=True, sort=False)

    print(all_phone_accel)
    print(all_phone_accel.info())

    all_phone_accel_breakpoint = all_phone_accel.copy()
    # all_phone_accel['ACTIVITY'].map(activity_codes_mapping).value_counts()

    all_phone_accel['ACTIVITY'].map(activity_codes_mapping).value_counts().plot(
        kind='bar', figsize=(15, 5), color='purple', title='row count per activity',
        legend=True, fontsize=15)

    all_phone_accel.drop('PARTICIPANT', axis=1, inplace=True)

    from sklearn.model_selection import train_test_split

    y = all_phone_accel.ACTIVITY
    X = all_phone_accel.drop('ACTIVITY', axis=1)

    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        train_size=0.75,
                                                        test_size=0.25,
                                                        shuffle=True,
                                                        stratify=all_phone_accel.ACTIVITY)
    print('-----X_train-------')
    print(X_train)
    print('-----Y Train-------')
    print(y_train)

    X_train.insert(0, 'Y', y_train)
    print('-----X_train-------')
    print(X_train)

    y_train = X_train['Y']
    print('-----Y Train-------')
    print(y_train)

    X_train.drop(['Y'], axis=1, inplace=True)

    from sklearn.preprocessing import MaxAbsScaler

    scaling_transformer = MaxAbsScaler().fit(X_train[['XAVG', 'YAVG', 'ZAVG',
                                                      'XPEAK', 'YPEAK', 'ZPEAK',
                                                      'XABSOLDEV', 'YABSOLDEV', 'ZABSOLDEV',
                                                      'RESULTANT']])
    X_train[['XAVG', 'YAVG', 'ZAVG', 'XPEAK', 'YPEAK', 'ZPEAK', 'XABSOLDEV',
             'YABSOLDEV', 'ZABSOLDEV', 'RESULTANT']] = scaling_transformer.transform(
        X_train[['XAVG', 'YAVG', 'ZAVG', 'XPEAK', 'YPEAK', 'ZPEAK', 'XABSOLDEV',
                 'YABSOLDEV', 'ZABSOLDEV', 'RESULTANT']])
    X_test = X_test.copy()
    X_test[['XAVG', 'YAVG', 'ZAVG', 'XPEAK', 'YPEAK', 'ZPEAK', 'XABSOLDEV',
            'YABSOLDEV', 'ZABSOLDEV', 'RESULTANT']] = scaling_transformer.transform(
        X_test[['XAVG', 'YAVG', 'ZAVG', 'XPEAK', 'YPEAK', 'ZPEAK', 'XABSOLDEV',
                'YABSOLDEV', 'ZABSOLDEV', 'RESULTANT']])

    print('-------X_test-------')
    print(X_test)

    X_train.reset_index(drop=True, inplace=True)
    print('-------X_train--------')
    print(X_train)

    X_test.reset_index(drop=True, inplace=True)
    print('-------X_test-------')
    print(X_test)
    print('len - ', len(X_test))
    print('type - ', type(X_test))
    X_test.to_csv('TestDataFrame.csv')

    y_train.reset_index(drop=True, inplace=True)
    print('-----Y Train-------')
    print(y_train)

    y_test.reset_index(drop=True, inplace=True)
    print('-----Y test----------')
    print(y_test)

import pandas as pd
import matplotlib.pyplot as plt
import os
import pickle
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedShuffleSplit
from sklearn.metrics import classification_report

_ = y_train.value_counts(sort=False).plot(kind='bar', figsize=(15, 5), color='red',
                                          title='row count per activity',
                                          legend=True, fontsize=15)
# plt.show()

my_cv = StratifiedShuffleSplit(n_splits=5, train_size=0.7, test_size=0.3)

dt_classifier = DecisionTreeClassifier()

my_param_grid = {'min_samples_leaf': [6, 10, 20, 40],
                 'min_weight_fraction_leaf': [0.01, 0.02, 0.05],
                 'criterion': ['entropy'],
                 'min_impurity_decrease': [1e-2, 7e-3]}

dt_model_gs = GridSearchCV(estimator=dt_classifier,
                           param_grid=my_param_grid,
                           cv=my_cv,
                           scoring='accuracy',
                           verbose=0,
                           return_train_score=True)

dt_model_gs.fit(X_train, y_train)
print('-------Fit Done---------')

print(dt_model_gs.best_params_)
dt_best_classifier = dt_model_gs.best_estimator_
pickle.dump(dt_best_classifier, open('Ajmodel.pkl', 'wb'))
print('-------Pickled Model Dumped------')

y_test_pred = dt_best_classifier.predict(X_test)

# Use a new name so the imported classification_report function is not shadowed
report = classification_report(y_true=y_test, y_pred=y_test_pred, output_dict=True)
report_html = pd.DataFrame(report).transpose().to_html()

return render(request, 'users/TrainModel.html',
              {'classification_report': report_html})

def Predict(request):
    if request.method == 'POST':
        activity_codes_mapping = {'A': 'walking',
                                  'B': 'jogging',
                                  'C': 'stairs',
                                  'D': 'sitting',
                                  'E': 'standing',
                                  'F': 'typing',
                                  'G': 'brushing teeth',
                                  'H': 'eating soup',
                                  'I': 'eating chips',
                                  'J': 'eating pasta',
                                  'K': 'drinking from cup',
                                  'L': 'eating sandwich',
                                  'M': 'kicking soccer ball',
                                  'O': 'playing catch tennis ball',
                                  'P': 'dribbling basket ball',
                                  'Q': 'writing',
                                  'R': 'clapping',
                                  'S': 'folding clothes'}

        import os
        from django.conf import settings
        import pickle
        import pandas as pd

        index_no = request.POST.get('index_no')
        print(index_no)
        print('type ----> ', type(index_no))
        modelPath = os.path.join(settings.MEDIA_ROOT, 'Ajmodel.pkl')
        testDataPath = os.path.join(settings.MEDIA_ROOT, 'TestDataFrame.csv')

        pickled_model = pickle.load(open(modelPath, 'rb'))

        # index_col=0 drops the index column written by to_csv(), so the row
        # passed to predict() has exactly the features the model was trained on
        testData = pd.read_csv(testDataPath, index_col=0)
        pred_val = testData.iloc[int(index_no)]
        print(pred_val)
        pred_result = pickled_model.predict([pred_val])
        sample_test_data = testData.head(100).to_html(index=False)
        print('type of pred_result --> ', type(pred_result))
        print(pred_result)

        activity = activity_codes_mapping.get(pred_result[0])

        return render(request, 'users/prediction.html',
                      {'testData': sample_test_data, 'activity': activity})
    else:
        return render(request, 'users/prediction.html', {})
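For reference, the same load-and-predict flow can be exercised outside Django with a short standalone sketch (the file names match those written by the training code above; treat the paths as assumptions for your environment):

import pickle
import pandas as pd

model = pickle.load(open('Ajmodel.pkl', 'rb'))
test_df = pd.read_csv('TestDataFrame.csv', index_col=0)

row = test_df.iloc[0]                    # any saved test row
print(model.predict([row.values])[0])    # prints an activity code such as 'A'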
Base.html:
{% load static %}
<!DOCTYPE html>

<html lang="en">

<head>
<meta charset="utf-8">
<meta content="width=device-width, initial-scale=1.0" name="viewport">

<title>Offensive Language Detection</title>
<meta content="" name="description">
<meta content="" name="keywords">

<!-- Favicons -->


<link href="{% static 'img/favicon.png'%}" rel="icon">
<link href="{% static 'img/apple-touch-icon.png'%}" rel="apple-touch-icon">

<!-- Google Fonts -->


<link href="https://fonts.googleapis.com/css?family=Open+Sans:300,300i,400,400i,600,600i,700,700i|Muli:300,300i,400,400i,500,500i,600,600i,700,700i|Poppins:300,300i,400,400i,500,500i,600,600i,700,700i" rel="stylesheet">

<!-- Vendor CSS Files -->
<link href="{% static 'vendor/animate.css/animate.min.css'%}" rel="stylesheet">
<link href="{% static 'vendor/aos/aos.css'%}" rel="stylesheet">
<link href="{% static 'vendor/bootstrap/css/bootstrap.min.css'%}" rel="stylesheet">
<link href="{% static 'vendor/bootstrap-icons/bootstrap-icons.css'%}" rel="stylesheet">
<link href="{% static 'vendor/boxicons/css/boxicons.min.css'%}" rel="stylesheet">
<link href="{% static 'vendor/glightbox/css/glightbox.min.css'%}" rel="stylesheet">
<link href="{% static 'vendor/swiper/swiper-bundle.min.css'%}" rel="stylesheet">

<!-- Template Main CSS File -->


<link href="{% static 'css/style.css'%}" rel="stylesheet">

<!-- =======================================================
* Template Name: Flattern
* Updated: May 30 2023 with Bootstrap v5.3.0
* Template URL: https://bootstrapmade.com/flattern-multipurpose-bootstrap-template/
* Author: BootstrapMade.com
* License: https://bootstrapmade.com/license/
======================================================== -->
</head>

<body>
<!-- ======= Header ======= -->
<header id="header" class="d-flex align-items-center">
<div class="container d-flex justify-content-between">

<div class="logo">
<h1 class="text-light"><a href="/index">OFFENSIVE LANGUAGE
DETECTION</a></h1>
<!-- Uncomment below if you prefer to use an image logo -->

<!-- <a href="index.html"><img src="assets/img/logo.png" alt="" class="img-fluid"></a> -->
</div>

<nav id="navbar" class="navbar">


<ul>
<li><a href="{% url 'index'%}">Home</a></li>
<li><a href="{% url 'UserLogin' %}">User-Login</a></li>
<li><a href="{% url 'AdminLogin' %}">Admin-Login</a></li>
<li><a href="{% url 'UserRegister' %}">User-Register</a></li>
</ul>
<i class="bi bi-list mobile-nav-toggle"></i>
</nav><!-- .navbar -->

</div>
</header><!-- End Header -->

<!-- ======= Hero Section ======= -->


<section id="hero">
<div id="heroCarousel" data-bs-interval="5000" class="carousel slide carousel-fade"
data-bs-ride="carousel">

<div class="carousel-inner" role="listbox">

<!-- Slide 1 -->


<div class="carousel-item active" style="background-image:
url('../static/img/slide/slide-1.jpg');">
<div class="carousel-container">
<div class="carousel-content animate__animated animate__fadeInUp">
<h2>Offensive Language Detection on Social <span>Media Based on Text
Classification</span></h2>
<p>Detecting offensive language can be a challenging task due to the evolving
nature of language and the context-dependent nature of offensiveness. However, there are
some common approaches and techniques that can be used to identify offensive language.
Here are a few methods.</p>
<div class="text-center"><a href="" class="btn-get-started">Read
More</a></div>
</div>
</div>
</div>

<!-- Slide 2 -->


<div class="carousel-item" style="background-image: url('../static/img/slide/slide-
2.jpg');">
<div class="carousel-container">
<div class="carousel-content animate__animated animate__fadeInUp">
<h2>Offensive language</h2>
<p>It's important to note that offensive language detection is a complex task, and
no single method can be fully effective due to the nuances and context-dependency of
offensive language. Additionally, the definition of offensive language may vary across
cultures and contexts, making it essential to consider these factors when designing an
offensive language detection system.</p>
<div class="text-center"><a href="" class="btn-get-started">Read
More</a></div>
</div>
</div>
</div>

<!-- Slide 3 -->


<div class="carousel-item" style="background-image: url('../static/img/slide/slide-
3.jpg');">

<div class="carousel-container">
<div class="carousel-content animate__animated animate__fadeInUp">
<h2>Machine learning approach</h2>
<p>Train a machine learning model using labeled data to classify text as
offensive or non-offensive. This typically involves extracting features from the text, such
as n-grams, word embeddings, or syntactic features, and using algorithms such as Gaussian
Naive Bayes, Decision Tree (DT), Support Vector Machines (SVM), Random Forest (RF),
Logistic Regression (LR), Multi-Layer Perceptron (MLP), Gradient Boosting (GB), and
AdaBoost.</p>
<div class="text-center"><a href="" class="btn-get-started">Read
More</a></div>
</div>
</div>
</div>

</div>

<a class="carousel-control-prev" href="#heroCarousel" role="button" data-bs-


slide="prev">
<span class="carousel-control-prev-icon bx bx-left-arrow" aria-
hidden="true"></span>
</a>

<a class="carousel-control-next" href="#heroCarousel" role="button" data-bs-


slide="next">
<span class="carousel-control-next-icon bx bx-right-arrow" aria-
hidden="true"></span>
</a>

<ol class="carousel-indicators" id="hero-carousel-indicators"></ol>

</div>

</section><!-- End Hero -->


{% block contents %}

{% endblock %}
<main id="main">

<!-- ======= Services Section ======= -->


<section id="services" class="services">
<div class="container">

<div class="row">
<div class="col-lg-4 col-md-6">
<div class="icon-box" data-aos="fade-up">

<div class="icon"><i class="bi bi-briefcase"></i></div>
<h4 class="title"><a href="">Machine Learning</a></h4>
<p class="description">Machine Learning is a program that analyses data and
learns to predict the outcome.Machine Learning is making the computer learn from
studying data and statistics.</p>
</div>
</div>
<div class="col-lg-4 col-md-6">
<div class="icon-box" data-aos="fade-up" data-aos-delay="100">
<div class="icon"><i class="bi bi-card-checklist"></i></div>
<h4 class="title"><a href="">Data preprocessing</a></h4>
<p class="description">It is a crucial step in data analysis and machine learning
tasks. It involves preparing the raw data to make it suitable for further analysis or model
training.</p>
</div>
</div>
<div class="col-lg-4 col-md-6">
<div class="icon-box" data-aos="fade-up" data-aos-delay="200">
<div class="icon"><i class="bi bi-bar-chart"></i></div>
<h4 class="title"><a href="">Handling Missing Values</a></h4>
<p class="description"> Identify and handle missing values in the dataset. This
can involve techniques such as imputation (replacing missing values with estimated
values) or deletion (removing rows or columns with missing values).</p>
</div>
</div>
<div class="col-lg-4 col-md-6">
<div class="icon-box" data-aos="fade-up" data-aos-delay="200">
<div class="icon"><i class="bi bi-binoculars"></i></div>
<h4 class="title"><a href="">Training</a></h4>
<p class="description">The training set is used to train the machine learning
model. It typically consists of a large portion of the available data, usually around 70-
80%.</p>
</div>
</div>
<div class="col-lg-4 col-md-6">
<div class="icon-box" data-aos="fade-up" data-aos-delay="300">
<div class="icon"><i class="bi bi-brightness-high"></i></div>
<h4 class="title"><a href="">Testing Set</a></h4>
<p class="description">
The testing set is used to evaluate the performance of the trained model. It
should be independent of the training set and should not be used during the training
process.</p>
</div>
</div>

<div class="col-lg-4 col-md-6">
<div class="icon-box" data-aos="fade-up" data-aos-delay="400">
<div class="icon"><i class="bi bi-calendar4-week"></i></div>
<h4 class="title"><a href="">After Traing and Testing</a></h4>
<p class="description">The predicted values are compared with the actual target
values in the testing set to assess the model's accuracy, precision, recall, F1 score, or other
performance metrics, depending on the specific task.</p>
</div>
</div>
</div>

</div>
</section><!-- End Services Section -->

</main><!-- End #main -->

<!-- ======= Footer ======= -->


<footer id="footer">

<div class="footer-top">
<div class="container">
<div class="row">

</div>
</div>
</div>

<div class="container d-md-flex py-4">

<div class="me-md-auto text-center text-md-start">


<div class="copyright">
© Copyright <strong><span>Offensive Language Detection</span></strong>. All
Rights Reserved
</div>
<div class="credits">
<!-- All the links in the footer should remain intact. -->
<!-- You can delete the links only if you purchased the pro version. -->
<!-- Licensing information: https://bootstrapmade.com/license/ -->
<!-- Purchase the pro version with working PHP/AJAX contact form:
https://bootstrapmade.com/flattern-multipurpose-bootstrap-template/ -->
Designed by <a href="https://bootstrapmade.com/">Alex</a>
</div>
</div>
</div>
</footer><!-- End Footer -->

<a href="#" class="back-to-top d-flex align-items-center justify-content-center"><i
class="bi bi-arrow-up-short"></i></a>
<!-- Vendor JS Files -->
<script src="{% static 'vendor/aos/aos.js' %}"></script>
<script src="{% static 'vendor/bootstrap/js/bootstrap.bundle.min.js' %}"></script>
<script src="{% static 'vendor/glightbox/js/glightbox.min.js' %}"></script>
<script src="{% static 'vendor/isotope-layout/isotope.pkgd.min.js' %}"></script>
<script src="{% static 'vendor/swiper/swiper-bundle.min.js' %}"></script>
<script src="{% static 'vendor/waypoints/noframework.waypoints.js' %}"></script>
<script src="{% static 'vendor/php-email-form/validate.js' %}"></script>

<!-- Template Main JS File -->


<script src="{% static 'js/main.js' %}"></script>

</body>

</html>
Admin side views:
from django.shortcuts import render, HttpResponse
from django.contrib import messages
from users.models import UserRegistrationModel

# Create your views here.

def AdminLoginCheck(request):
    if request.method == 'POST':
        usrid = request.POST.get('loginid')
        pswd = request.POST.get('pswd')
        print("User ID is = ", usrid)
        if usrid == 'admin' and pswd == 'admin':
            return render(request, 'admins/AdminHome.html')
        else:
            messages.error(request, 'Please Check Your Login Details')
    return render(request, 'AdminLogin.html', {})

def AdminHome(request):
    return render(request, 'admins/AdminHome.html')

def RegisterUsersView(request):
    data = UserRegistrationModel.objects.all()
    return render(request, 'admins/viewregisterusers.html', {'data': data})

def ActivaUsers(request):
    if request.method == 'GET':
        id = request.GET.get('uid')
        status = 'activated'
        print("PID = ", id, status)
        UserRegistrationModel.objects.filter(id=id).update(status=status)
    data = UserRegistrationModel.objects.all()
    return render(request, 'admins/viewregisterusers.html', {'data': data})

def DeleteUsers(request):
    if request.method == 'GET':
        id = request.GET.get('uid')
        print("PID = ", id)
        UserRegistrationModel.objects.filter(id=id).delete()
    data = UserRegistrationModel.objects.all()
    return render(request, 'admins/viewregisterusers.html', {'data': data})

CHAPTER 6
EXPERIMENTAL RESULTS
REGISTER FORM

ADMIN LOGIN PAGE

ADMIN HOME PAGE

ACTIVE USER

USER LOGIN PAGE

TRAIN MODEL

PREDICTION

PREDICTION RESULTS

CHAPTER 7

CONCLUSION & FUTURE ENHANCEMENT

7.1 CONCLUSION:

In this paper, a multi-machine-learning method is proposed for behavioral biometrics
identification. According to the experimental results, the proposed method achieves
satisfactory performance and shows considerable promise. Since phone accelerometers are
now prevalent and have proven effective for behavior recognition, behavioral biometrics
may emerge as a better alternative to current identification methods.

This research lays a foundation for further field-oriented study. Future work may examine
whether behavioral biometrics can be applied successfully to real-life situations; for
example, research could focus on improving and specializing machine learning models for
medical science, national security, psychology, and related fields.

7.2 FUTURE ENHANCEMENT:


To improve model generalization for behavioral biometric systems, more diverse and
extensive training datasets should be collected and augmented through techniques like
data synthesis and sampling. Robust data preprocessing is essential to clean and
standardize the raw input data. Feature engineering tailored to the specific biometric
modality, such as dynamic time warping for keystroke dynamics, can enhance the learning
process. Deep learning methods like autoencoders and convolutional neural networks
should be explored for automated feature learning. Testing various machine learning
algorithms, including SVM, decision trees, and neural networks, as well as ensemble
methods, can determine the optimal model for the problem. Hyperparameter optimization
through grid search or Bayesian methods further improves performance. For deep
networks, trying different architectures, layer sizes, and activation functions is important.
Regularization techniques like dropout and batch normalization prevent overfitting.
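As a hedged illustration of the hyperparameter-optimization step described above, the sketch below applies randomized search to a multi-layer perceptron; the parameter ranges and iteration counts are illustrative assumptions, not values used in this project.

from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# Illustrative search space (assumed, not tuned for this dataset)
param_distributions = {
    'hidden_layer_sizes': [(64,), (128,), (64, 32)],
    'alpha': loguniform(1e-5, 1e-2),              # L2 regularization strength
    'learning_rate_init': loguniform(1e-4, 1e-2),
}

search = RandomizedSearchCV(MLPClassifier(max_iter=300),
                            param_distributions,
                            n_iter=20, cv=5, scoring='accuracy')
# search.fit(X_train, y_train) would then select the best configuration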

REFERENCES

[1] Weiss, G., Yoneda, K. and Hayajneh, T. (2019) 'Smartphone and Smartwatch-Based Biometrics Using Activities of Daily Living', IEEE Access, 7, pp. 133190-133202.
[2] Abuhamad, M., Abusnaina, A., Nyang, D. and Mohaisen, D. (2021) 'Sensor-Based Continuous Authentication of Smartphones' Users Using Behavioral Biometrics: A Contemporary Survey', IEEE Internet of Things Journal, 8(1), pp. 65-84.
[3] Divya, R. and Lavanya, R. (2020) 'A Systematic Review on Gait Based Authentication System', 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 505-509.
[4] Gafurov, D., Snekkenes, E. and Bours, P. (2007) 'Gait Authentication and Identification Using Wearable Accelerometer Sensor', 2007 IEEE Workshop on Automatic Identification Advanced Technologies, pp. 220-225.
[5] Mantyjarvi, J., Lindholm, M., Vildjiounaite, E., Makela, S. and Ailisto, H. (2005) 'Identifying users of portable devices from gait pattern with accelerometers', IEEE International Conference on Acoustics, Speech, and Signal Processing, 2, pp. 973-976.
