• Aura – Emotion Recognition System
Aura - Emotion Recognition System
Heta Shah1, Kirtan Jhaveri1, Lakshita Shetty1, Mihikaa Debray1, Tanaji Biradar2
1 U. G. Student, Department of Electronics and Telecommunication, D.J.Sanghvi College of Engineering, Vile Parle (W), Mumbai-400056
2 Associate Professor, Electronics and Telecommunication Department, D.J.Sanghvi College of Engineering, Vile Parle (W), Mumbai-400056
E-mail: 1hetashah1345@gmail.com, 1kirtan12kj@gmail.com, 1lakshi276@gmail.com, 1mihikaa.b202@gmail.com, 2tanaji.biradar@djsce.ac.in
Abstract—All our lives we have felt and recognized emotions involuntarily, as something that comes naturally. But the inability to interact or communicate due to a lack of development during the early years of life is a serious issue. Autistic children are often unable to recognize emotions or sentiments, which leads to special needs. In the current day and age, understanding emotions and expressions is a skill that is not a want but a need. Be it a regular conversation at the supermarket or an important meeting, these recognition skills are necessary. At a very tender age, autistic children usually face more difficulty in development than their peers. There are certain red flags that hold them back from engaging with others. Nonverbal cues, such as facial expression, play a vital part in interpersonal relationships. The Facial Expression Recognition system is a method of determining a person's emotional state. In this system, the captured image is compared against a trained dataset stored in the database, and the image's emotional state is presented. An autistic child who is unaware of an individual's emotions can use the software live and make informed decisions. This will not only develop their interpersonal skills but also protect them from unwanted situations. We have designed expression recognition using TensorFlow and Keras, which ensures adaptability and smooth functioning. We also made use of a convolutional neural network (CNN). For the basic function of speech-to-text conversion, we used Google's API, which is widely trusted.
Keywords— facial recognition, emotion recognition, Python, JavaScript, Visual Studio, CSS3, HTML, TensorFlow, OpenCV, Machine Learning, Autism.
I. INTRODUCTION
Facial expression is a visual reflection of an individual's affective state, cognitive activity, intention, personality, and psychopathology, and it is used to communicate in interpersonal relationships. It has been studied for many years and has progressed significantly in recent decades. Despite substantial progress, accurately understanding facial emotions remains difficult because of the complexity and variety of facial expressions. Nonverbal cues like gestures, facial expressions, and instinctive languages can be used to convey intents and emotions in general. This technique has the potential to be a powerful nonverbal communication tool for people. What counts is how well the system recognises or extracts facial expressions from photographs. Because of its potential, the system is gaining traction.
Identifying features presented as part of a facial expression allows humans to perceive emotions on a regular basis. A grin or upward movement of the corners of the lips, for example, is intrinsically tied to happiness. Other emotions are also identified by deformations that are specific to that manifestation. Research on automatic facial emotion identification addresses the problem of representing and categorising static or dynamic characteristics of these deformations or of facial pigmentation. A person's facial expressions are divided into basic emotions such as anger, disgust, fear, happiness, sadness, and surprise using this method. The main purpose of this technology is to use eye gaze, facial expressions, cognitive modelling, and other techniques to enable efficient interaction between humans and robots. In this situation, facial expression recognition and categorization can be employed as a natural form of human-machine interaction. The intensity of an expression varies from person to person, as well as by age, gender, facial size, and shape, and even within the same person, expressions do not remain stable over time. Recognizing expressions is difficult due to the inherent variability of facial images caused by factors such as illumination, position, alignment, and occlusions. Surveys of facial feature representations for face recognition and expression analysis discuss these challenges and possible solutions in detail.
Common methods for extracting speech emotion features can only represent limited context information and cannot fully exploit the fact that human emotion changes slowly and depends on semantic context. Deep learning algorithms, on the other hand, can determine the underlying internal relationship between data and extract intrinsic characteristics from large-scale training data, resulting in more accurate emotion classification. We convert speech to text to detect emotion, since applying machine learning directly to an audio stream to detect emotions would be a laborious task. Our system therefore allows the user to upload a prerecorded audio file to make the procedure easier. The file is then processed using Python tools and Google's speech recognition API. We use Google's API instead of creating our own model because Google has long operated a voice assistant that transcribes speech to text better than other models, and it has a larger dataset and superior accuracy.
II. LITERATURE SURVEY
In everyday life, emotion is important. People can communicate and comprehend each other more easily when they express their emotions. Emotion recognition is a technology that allows machines to understand human emotions and has a wide range of applications. These interpersonal skills are not necessarily well developed in every individual. Autistic children usually face difficulties with interaction and communication. At a very tender age, their development is comparatively slower than that of others. There are certain red flags that hold them back from engaging with others. They have difficulty comprehending what is going on around them, such as what other people are saying or communicating nonverbally. And due to these
• Proceedings for DJS STRIKE 2023-2024 ISBN: 978-93-6076-096-0
developmental delays, they must begin with a steady course of daily therapies. Therapy becomes a daily routine for these children, through which their minds are developed. To help them make practical decisions in day-to-day life, we wanted to come up with a solution that acts as a bridge between therapy and practical execution. When autistic children can manage their own challenging behavior, they can learn and get along better with others. We wanted to develop a model that can not only help children but also be used in multiple domains. In human-computer interaction (HCI), for example, emotion recognition permits the robot to deliver appropriate feedback based on the user's emotional state, thus improving the quality of the interaction. In the medical world, a better emotional understanding of how people with mental illness live their lives helps doctors diagnose and treat the problem more successfully. Because of the rapid advancement of communication technology and the growing popularity of smartphones and social media, Internet users upload a large amount of video data to express their thoughts. Furthermore, different points of view might result in differences in emotional expression. We can gain a more accurate and comprehensive understanding of netizens' attitudes toward network events and products by applying emotion recognition to users' multimedia videos.
Emotion identification has come a long way in recent years. However, most researchers are focused on unimodal emotion identification. Speech and facial expressions, for example, share many thematic and temporal properties and are frequently seen in human emotional interactions, so they receive a lot of attention in the affective computing field. External factors such as occlusion and lighting, on the other hand, may affect the accuracy of facial expression recognition. The outcome of speech emotion identification is influenced by variations in the voices of different participants as well as by surrounding noise. By combining more emotional factors and making full use of complementary emotional information, the recognition result becomes more accurate, and the recognition performance has evident advantages over single-modal recognition.
III. PROPOSED DESIGN
Speech and facial expressions are used to represent human emotions and intentions, and generating an efficient and effective feature is a key component of our system. Face recognition is critical for interpreting facial expressions in applications including intelligent man-machine interfaces and communication, intelligent visual surveillance, teleconferencing, and real-time animation from live motion images. Facial expressions are useful for efficient interaction. The majority of facial expression recognition research and systems are limited to six basic expressions (happy, sad, anger, disgust, fear, surprise). These six have been found insufficient to describe all facial expressions, so expressions are also grouped based on facial actions. Facial expression recognition is a difficult task. As a result, we need more information, derived from spoken conversation, to help with this by enlarging the database. Whenever a person with special needs, or anyone who requires help recognizing emotions, needs a practical aid, they can use Aura. Simply by opening the application, they can use the live video scanning feature, which scans an individual's emotion in real time and shows the result instantly. The person can then make an informed decision.
Our application also provides an audio detection feature, which uses Google's speech-to-text converter. Currently it uses prerecorded audio, but it can later be developed into a real-time feature.
Fig.1 Home screen of Aura’s Interface
IV. IMPLEMENTATION
Facial Expression Recognition
The facial expression recognition system is trained on photos of various facial expressions using a supervised learning approach. Image acquisition, face detection, image preprocessing, feature extraction, and classification are all part of the system's training and testing phases. Face detection and feature extraction are performed on face photos, and the results are then categorised into classes that correspond to the basic expressions. Our system uses Convolutional Neural Networks (CNNs) to process image data for facial expression recognition. A CNN is a type of model designed for working with 2D image data. As in an ordinary neural network, inputs are multiplied by weights; in a CNN this is done by convolution, i.e., an element-wise multiplication between a patch of the input data and a 2D array of weights, followed by a sum. We therefore use a CNN to train our model to detect emotions. We have used the FER2013 dataset, with around 27000 images, to train and validate our model. The accuracy of our model is about 67% on the dataset; however, when we tested it in live conditions it appeared to perform well. When the user clicks the video button on the home page, they are redirected to the video.html web page, where a frame loads and displays whatever is visible to the camera. The system then tries to find a face, and once that is done the image data is sent to Python in the backend, where the model analyses the expression and classifies it as one of the 7 emotions (the six basic expressions plus neutral). The emotion is then sent to the HTML page, where it is displayed near the face of the user.
Image Acquisition
Static images or image sequences are used to recognise facial expressions. Face images can be obtained with a camera.
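The convolution and classification steps described under Facial Expression Recognition can be illustrated with a minimal pure-Python sketch. It is not the authors' TensorFlow/Keras model: the function names `conv2d_valid` and `pick_emotion`, the kernel values, and the example scores are hypothetical, and the label order is assumed to follow the usual FER2013 convention.

```python
import math

# Assumed FER2013 label order (index 0..6); treat this as an assumption.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` and sum the element-wise products.

    Both arguments are lists of lists. As in most CNN libraries, the kernel
    is not flipped, so this is strictly a cross-correlation.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_emotion(scores):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


if __name__ == "__main__":
    patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # toy 3x3 grayscale patch
    kernel = [[1, 0], [0, -1]]                  # hypothetical 2x2 weights
    print(conv2d_valid(patch, kernel))          # [[-4.0, -4.0], [-4.0, -4.0]]
    # Hypothetical raw scores from the last layer of a classifier.
    print(pick_emotion([0.1, 0.0, 0.2, 2.5, 0.3, 0.4, 0.1])[0])  # happy
```

In a real model the kernel weights are learned during training and many such kernels are stacked in layers; the sketch only shows what a single convolution pass and the final label selection compute.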
Fig.2 Flowchart of the proposed API
Face detection
Face detection locates a person's face in an image. The Haar classifier is used to detect faces in the training dataset, and it is implemented using OpenCV. Haar-like features encode the difference in average intensity between different sections of the image. They are made up of connected black and white rectangles, and the value of a feature is the difference between the sums of the pixel values in the black and white regions.
Feature Extraction
The feature vector is the most significant component in a pattern classification task. After pre-processing, the image of the face is used to extract the relevant features. Scale, pose, translation, and variations in illumination level are all fundamental challenges in image classification. The CNN algorithm, which is detailed below, is used to extract the key features.
Convolutional Neural Network
A convolutional neural network (CNN) is a type of artificial neural network that is used in image recognition and processing. It is specifically designed to process pixel input. CNNs are image processing, artificial intelligence (AI) systems that use deep learning to do both generative and descriptive tasks, such as image and video recognition, recommender systems, and natural language processing (NLP). A neural network is a hardware and/or software system modelled after the way neurons in the human brain work. Traditional neural networks aren't designed for image processing and must be fed images in smaller chunks. A CNN's "neurons" are arranged more like those in brain regions such as the frontal and temporal lobes, which process information in humans and other species. Because the layers of neurons are arranged so that they cover the entire visual field, the piecemeal image processing problem of traditional neural networks is avoided. A CNN uses a system similar to a multilayer perceptron that is designed for reduced processing requirements. A CNN's layers consist of an input layer, an output layer, and hidden layers that include multiple convolutional layers, pooling layers, fully connected layers, and normalisation layers. Removing these limitations and improving image processing efficiency results in a system that is considerably more effective and easier to train for image processing and NLP.
Fig.3 System Diagram
Speech Emotion Recognition
To detect emotion from speech, we convert the speech to text, because applying machine learning directly to an audio signal to detect emotions would be a tedious task. Hence, to simplify the process, our system gives the user an option to upload a prerecorded audio file. We then process the file using some Python packages and Google's speech recognition API. We use Google's API instead of designing a model ourselves because Google has a larger dataset and higher accuracy, since it has long operated a voice assistant that transcribes speech to text better than most models. For emotion recognition from text, we have used a Python package called "text2emotion". It processes any textual message and recognizes the underlying emotions in it. It is currently limited to recognizing 5 emotions, namely Happy, Sad, Angry, Surprise, and Fear. It starts by removing unwanted text from the message. Next, it applies NLP techniques to find specific words that best describe emotions or feelings. After that, it matches each word found in the message to its emotion category. It also keeps a count of how many words were found in each emotion category. Lastly, it provides the output in a dictionary format and displays the probability of each emotion category.
V. RESULTS AND CONCLUSION
Various CNN algorithms were utilized in this project to predict a person's real-time expression. The model can
make predictions using only a few parameters and has a 67 percent accuracy rate. However, research is being done to integrate subjective characteristics in the model, such as gestures and speech tone, in order to improve the model's accuracy. The goal of this study is to use external parameters to understand a person's emotion and how our minds interpret that emotion. This is important for people who have trouble comprehending emotions, as well as those who want to analyse expressions as part of market demand analysis. This project provides a method for recognising facial expressions as a category. Robotic vision, video surveillance, digital cameras, security, and human-computer interaction are all applications that benefit from face detection and emotion extraction from facial images. The purpose of this research was to develop a computer vision-based facial expression recognition system while also advancing feature extraction and classification in facial expression recognition.
Fig.4 Accuracy and Loss in Training over 30 Epochs
The experimental demonstration is as follows:
Seven different facial expressions in photos of different people from various datasets were examined in this experiment. This research entails preprocessing the collected facial photos for facial expressions. Facial expression recognition is a difficult problem to solve. For critical applications, more effort should be put into improving classification performance.
OUR WEBSITE INTERFACE IS BELOW
Fig.11 Video Page
Fig.12 Home Page
Fig.13 Audio Page
VI. SOCIAL IMPACT & FUTURE
Over the last decade, emotional expression recognition systems have vastly improved. The emphasis has shifted away from posed expression recognition and toward spontaneous expression recognition. Promising results can be obtained even under face registration errors, with fast processing time and a high correct recognition rate (CRR), and considerable performance improvements are possible in our system. The system is entirely automated and capable of working with image feeds. It can recognize natural expressions. Our method can be utilized in digital cameras to record images only when the subject grins. Security systems can identify a person regardless of the expression he or she shows. When a person enters a room in a house, the lights and television can be adjusted to their preferences. Doctors can utilize the technique to determine the severity of a deaf patient's discomfort or disease. Our technology can be used to identify and track a user's mood, as well as at mini-marts and shopping centers to view client feedback in order to improve business, and so on. It can recognize people in a crowd and track residents' identities, age, gender, and present emotional state to look for suspicious behavior. It can be used to prevent criminals and prospective terrorists from committing crimes. The automated extraction of images of suspects from criminal records, which can help police narrow down possible suspects rapidly, is a valuable application of face recognition. This is a primary application, which uses sketches of the accused and matches them with the photos in the federal database of offenders. Applications in fields like police departments, video monitoring, banking, and security device access authentication are in high demand. In recent times, automatic face recognition has received a lot of attention due to its ease of use and advantages over other measures. The benefits of facial identification over other approaches, such as fingerprint recognition, are largely due to the fact that it does not need the cooperation of those being tested. Face recognition systems are often more user-friendly and cost effective, so recognition outcomes can be corrected in ambiguous situations by people with little to no experience. With the digitization of the world, everything is switching online. In this scenario, it is of utmost importance that a suitable face identification method is incorporated,
particularly in sectors like online banking (for instance, in the digital banking customer onboarding process). With further enhancements and modifications, our system will be suitable for such platforms in the future.
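As an illustration of the text-based emotion scoring described in the Speech Emotion Recognition subsection (clean the text, find emotion words, count them per category, report a proportion for each category), the following is a minimal sketch. It does not reproduce the text2emotion package: the function name `score_emotions`, the toy lexicon, and the stop-word list are hypothetical and far smaller than anything a real system would use.

```python
import re
from collections import Counter

# The five categories named in the paper.
CATEGORIES = ["Happy", "Sad", "Angry", "Surprise", "Fear"]

# Hypothetical toy lexicon mapping emotion words to categories.
LEXICON = {
    "happy": "Happy", "glad": "Happy", "great": "Happy",
    "sad": "Sad", "lonely": "Sad",
    "angry": "Angry", "furious": "Angry",
    "surprised": "Surprise", "unexpected": "Surprise",
    "afraid": "Fear", "scared": "Fear",
}

# Hypothetical stop words removed during cleaning.
STOPWORDS = {"i", "am", "so", "and", "the", "a", "was", "but", "to", "it", "is"}


def score_emotions(text):
    """Return a dict mapping each category to its share of matched words."""
    # Step 1: clean the text (lower-case, keep words only, drop stop words).
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    # Steps 2-4: find emotion words, map them to categories, count per category.
    counts = Counter(LEXICON[w] for w in words if w in LEXICON)
    total = sum(counts.values())
    # Step 5: report proportions in dictionary form (all zeros if nothing matched).
    if total == 0:
        return {c: 0.0 for c in CATEGORIES}
    return {c: round(counts[c] / total, 2) for c in CATEGORIES}


if __name__ == "__main__":
    print(score_emotions("I am so happy and glad, but it was unexpected"))
    # {'Happy': 0.67, 'Sad': 0.0, 'Angry': 0.0, 'Surprise': 0.33, 'Fear': 0.0}
```

In the described system the input text would come from the speech-to-text step; here it is passed in directly so the sketch runs without audio or network access.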
ACKNOWLEDGMENT
This project and technical paper were greatly supported by
Professor Tanaji Biradar. His expert coaching and
comprehensive assistance were invaluable to us at every
level of this project, and this paper would not have been
possible without his contributions. We are grateful to our
colleagues who contributed their knowledge and experience
to the research and assisted us throughout the process.