A Google Glass Based Real-Time Scene Analysis for the Visually Impaired
ABSTRACT Blind and Visually Impaired People (BVIP) are likely to experience difficulties with tasks that
involve scene recognition. Wearable technology has played a significant role in researching and evaluating
systems developed for and with the BVIP community. This paper presents a system based on Google Glass
designed to assist BVIP with scene recognition tasks, thereby using it as a visual assistant. The camera
embedded in the smart glasses is used to capture the image of the surroundings, which is analyzed using
the Custom Vision Application Programming Interface (Vision API) from Azure Cognitive Services by
Microsoft. The output of the Vision API is converted to speech, which is heard by the BVIP user wearing
the Google Glass. A dataset of 5000 newly annotated images is created to improve the performance of the
scene description task in Indian scenarios. The Vision API is trained and tested on this dataset, increasing
the mean Average Precision (mAP) score from 63% to 84%, with an IoU > 0.5. The overall response time of
the proposed application was measured to be less than 1 second, thereby providing accurate results in real-time. A Likert scale analysis was performed with the help of the BVIP teachers and students at the "Roman
Catherine Lobo School for the Visually Impaired" in Mangalore, Karnataka, India. From their responses,
it can be concluded that the application helps the BVIP better recognize their surrounding environment in
real-time, proving the device effective as a potential assistant for the BVIP.
INDEX TERMS Google Glass, Human Computer Interaction, Azure Cognitive Services, Microsoft Vision
API, Ubiquitous computing, Visual assistant
perimented, testing whether a group of sighted individuals and visually impaired individuals experience a difference in physical and mental demands when given directions to specific landmarks. Battaglia et al. [5] developed an integrated, modular, and expandable open-source package called Blind Assistant to show that it is possible to produce effective and affordable aids for the BVIP. Meza-de-Luna et al. [6] designed a social-aware assistant using a pair of smart glasses and a haptic belt to enhance the face-to-face conversations of the BVIP by providing them with vibrational cues from the belt. Chang et al. [7] [8] [9] proposed a wearable smart-glasses-based drug pill recognition system using deep learning for the BVIP to enable them to improve their medication use safety. The system consists of a pair of wearable smart glasses, an artificial intelligence (AI)-based drug pill recognition box, and a mobile phone app. The smart glasses are used to capture images of the drugs to be consumed, and the AI-based drug recognition box is used to identify the drugs in the image. The mobile app is used to track drug consumption and also to provide timely reminders to the user. Zientara et al. [10] proposed a shopping assistant system for the BVIP called the 'Third Eye' that aids in navigation and identification of various products inside a shop. Similarly, Pintado et al. [11] designed a wearable object detection device in eyewear that helps to recognize items from the produce section of a grocery store.

In addition to shopping assistants, researchers have also developed Electronic Travel Aids (ETA) and obstacle detection systems to assist navigation. Quinones et al. [12] performed a needs-finding study to assist in navigation of familiar and unfamiliar routes taken daily among the BVIP. They concluded that a device that can act as an assistant is needed for better navigation. El-taher et al. [13] have done a comprehensive review of research directly in, or relevant to, outdoor assistive navigation for the BVIP. They also provided an overview of commercial and non-commercial navigation applications targeted at assisting the BVIP. Lee et al. [14] implemented a guidance system that uses map-matching algorithms and ultrasonic sensors to guide users to their chosen destination. Tapu et al. [15] implemented an autonomous navigation system for the BVIP based on computer vision algorithms. Similarly, Vyavahare et al. [16] used a combination of ultrasonic sensors and computer vision techniques to build a wearable assistant that can perform obstacle detection and image description. Laubhan et al. [17] and Trent et al. [18] designed a wearable Electronic Travel Aid for the blind, which uses an array of ultrasonic sensors to survey the scene. Bai et al. [19] proposed a depth image and multi-sensor-based algorithm to solve the problem of transparent and small obstacle avoidance. Their system uses three primary audible cues to guide completely blind users to move safely and efficiently. Nguyen et al. [20] developed a way-finding system on a mobile robot helping the BVIP user in an indoor setting. Avila et al. [21] developed a smartphone application that helps in localization within an indoor setting. In this system, 20 Bluetooth beacons were placed inside an indoor environment. When a BVIP user holding the smartphone moves through the building, the user will receive auditory information about the nearest point of interest. A very similar system was developed by Bie et al. [22] for an outdoor setting. Finally, Guerreiro et al. [23] developed a smartphone based virtual-navigation application that helps the BVIP gain route knowledge and familiarize themselves with their surroundings before visiting a particular location. Lupu et al. [24] presented an experimental framework to assess the brain cortex activation and affective reactions of the BVIP to stimuli provided by a sensory substitution device used for navigation in real-world scenarios. The test was done in 5 different types of experimental scenarios. It was focused on evaluating working memory load, visual cortex activation, and emotional experience when visually impaired people perceive audio, haptic, and multimodal stimuli. Chang et al. [25] proposed a wearable assistive system comprising a pair of smart glasses, a waist-mounted intelligent device, and an intelligent cane to help BVIP consumers safely use zebra crossings. They used artificial intelligence (AI) based edge computing techniques to help the BVIP users to utilize the zebra crossings.

Other researchers have focused on the design of assistive systems which help in scene description and analysis. Ye et al. [26] analyzed how different devices can help the BVIP in their daily lives and concluded that smartphones play a significant role in their daily activities. Pégeot et al. [27] proposed a scene text tracking system used for finding and tracking text regions in video frames captured by a wearable camera. González-Delgado et al. [28] proposed a smart gloves system that helps in meeting some of the daily needs of the BVIP, such as face recognition, automatic mail reading, and automatic detection of objects, among other functions. Memo et al. [29] developed a head-mounted gesture recognition system. Their system uses a depth camera and an SVM classifier to identify the different gestures during a human conversation. Barney et al. [30] developed a sensory glass system that detects obstacles and informs the user through 3D sound waves. The glasses were fitted with five ultrasonic sensors placed on the left, upper-left, front, right, and upper-right parts. Shishir et al. [31] designed an Android app that can capture images and analyze them for image and text recognition. B. Jiang et al. [32] designed a wearable assistance system based on binocular sensors for the BVIP. The binocular vision sensors were used to capture images at a fixed frequency, and the informative images were chosen based on stereo image quality assessment (SIQA). Then the informative images were sent to the cloud for further computations. Bogdan et al. [33] proposed a system composed of a pair of smart glasses with an integrated microphone and camera, a smartphone connected with the smart glasses through a host application, and a server that serves the purpose of a computational unit. Their system was capable of detecting obstacles in the nearest surrounding, providing an estimation of the size of an object, face recognition, automatic text recognition, and question answering of a particular input image.
Pei et al. [34] proposed a visual image aid for vocalizing the information of objects near the user.

Some researchers have designed their own smart glasses to develop applications that assist visually impaired people. Chang et al. [35] and Chen et al. [36] proposed an assistive system comprising wearable smart glasses, an intelligent walking stick, a mobile device app, and a cloud-based information management platform used to achieve the goals of aerial obstacle avoidance and fall detection for the BVIP. The intelligent walking stick provides feedback to the user with the help of vibrations to warn the user of obstacles. Furthermore, when the user experiences a fall event, an urgent notification is immediately sent to family members or caregivers. In the realm of wearable intelligent glasses, Chang et al. [37] and Chen et al. [38] have also proposed a drowsiness-fatigue-detection system to increase road safety. The system consists of wearable smart glasses, an in-vehicle infotainment telematics platform, an onboard diagnostics-II-based automotive diagnostic bridge, a rear light alert mechanism in an active vehicle, and a cloud-based management platform. The system is used to detect drowsiness and fatigue in a driver in real-time. When detected, the active vehicle's rear light alert mechanism will automatically be flickered to alert following vehicles, and warning messages will be played to alert the driver.

Although many systems have been proposed and developed to assist the visually impaired, their practical usability is very limited due to the application's wearability and portability. In this era of high-end consumer electronics, where multiple sensors are embedded in light, highly portable smart glasses such as the Google Glass, it is possible to design an application that addresses the usability concerns faced by previous applications while also providing real-time responses to complex problems such as scene recognition and object detection. Therefore, in this paper, a Google Glass based real-time visual assistant is proposed for the BVIP.

The rest of the paper is organized as follows. Section II describes related work done by other researchers on Google Glass to solve real-world social problems. In Section III, the proposed application is presented, along with explaining the different design choices. Here, the merits of the proposed application are explained in detail. The various steps involved in using the application are also provided. In Section IV, the results of the proposed work and the feedback obtained from the BVIP users are presented. Finally, the conclusion is given in Section V.

II. RELATED WORK
Google Glass is a brand of smart glasses with a prism projector for display, a bone conduction transducer, a microphone, accelerometer, gyroscope, magnetometer, ambient light sensor, proximity sensor, a touchpad, and a camera. It can connect to other devices using a Bluetooth connection, a micro USB, or a Wi-Fi connection. Application development for the device can be done using the Android development platform and toolkit available for mobile devices running Android OS.

Since its release, researchers have used the device to design systems to solve many real-life problems. Jiang et al. [39] proposed a Google Glass application that is used for food nutrition information retrieval and visualization. On similar grounds, Li et al. [40] developed a Google Glass application that can be used to assess the uniqueness and aesthetics of a food dish by analyzing its image for visual appeal, color combinations, and appearance. A few researchers have used the device in the medical field to treat children with Autism Spectrum Disorder (ASD). For instance, Washington et al. [41] [42] developed a Google Glass-based system for automatic facial expression recognition, delivering real-time social cues to children with ASD, thus improving their social behavior.

Lv et al. [43] developed a touch-less interactive augmented reality game using Google Glass. Wang et al. [44] presented a navigation strategy for NAO humanoid robots via hand gestures based on global and local live videos displayed on Google Glass. Similarly, Wen et al. [45] developed a Google Glass-based system to achieve hands-free remote control of humanoid robots. Xu et al. [46] used the device to facilitate intelligent substation inspection by using virtual video and real-time data demonstration. Widmer et al. [47] developed a medical information search system on Google Glass by connecting it to a content-based medical image retrieval system. The device takes a photo and sends it along with keywords associated with the image to a medical image retrieval system to retrieve similar cases, thus helping the user make an informed decision.

Devices such as Microsoft Kinect and Google Glass have also been used to help visually impaired people. For instance, Lausegger et al. [48] developed a Google Glass application to help people with color vision deficiency or color blindness. Anam et al. [49] developed a dyadic conversation aid using Google Glass for the visually impaired. Hwang et al. [50] implemented an augmented vision system on Glass, which overlays edge information over the wearer's real-world view, to provide contrast-improved central vision to the user. They used a combination of positive and negative Laplacian filters for edge enhancement. Neto et al. [51] proposed a wearable face recognition system to aid the visually impaired in real-time. Their system uses a Kinect sensor to acquire an RGB-D image and run an efficient face recognition algorithm. Similarly, Takizawa et al. [52] proposed the Kinect cane, an assistive system for the visually impaired based on the concept of object recognition.

Kim et al. [53] performed a systematic review of the applications of smart glasses in various applied sciences, such as healthcare, social science, education, service, industry, and computer science. Their study shows a remarkable increase in the number of published papers on the application of smart glasses since the release of Google Glass. Further, they claimed that the research has been steadily increasing as of 2021. With this, it can be concluded that Google Glass has been extensively used for designing applications to solve
this dataset. Finally, the Vision API's precision and accuracy were compared against other state-of-the-art models run on a cloud-based intelligent server. Based on the performance of the Vision API and the superior usability of Google Glass, the proposed application was designed using them.

A. PROPOSAL
Some of the significant issues that restrict the usability of most wearable assistance systems were identified during the literature survey. Firstly, the size and weight of the sensors used in the system directly impact the long-term wearability, portability, and hence the usability of the system without causing health hazards to the user. El-taher et al. [13] emphasized the importance of portability or weight and wearability of the device used for assisting the visually impaired person in their review of urban navigation systems for the visually impaired. Secondly, one of the most critical factors that must be considered while designing a system for the disabled is an intuitive human-computer interaction interface. The system must be designed such that it is easy to use with minimal user training. Finally, the response time from the source of computation must be close to real-time. Achieving real-time performance on a smart glass is very challenging since the algorithm's complexity directly impacts the device's response time unless the algorithm runs on a powerful machine, which is heavy and bulky and hence not portable. On the other hand, reducing the complexity of the algorithm leads to less accurate results. Therefore, it is essential to consider using cloud computing platforms with a fast response time for such systems. The following design choices are used to address the problems mentioned above, thereby improving the usability of the proposed visual assistant system.

1) Google Glass is selected as the core of the visual assistant system. The camera present in the device captures images of the surroundings, which are sent to a mobile app for further processing. Most of the previous applications that serve the purpose of visual assistants have bulky sensors and cameras attached, which are difficult to wear and are not portable. Given the superior portability, wearability, and flexibility of the device, the use of Google Glass will significantly improve the usability of such systems.
2) The application is designed to have a very intuitive interaction interface. Users can interact with it using a voice command that triggers the camera to capture an image and send it to the mobile app and the Vision API for further processing. The result from the Vision API is sent back to the Google Glass device, which is then converted to sound using the bone conduction transducer. The completely hands-free, voice-activated approach leads to superior user-system interaction and helps keep the user as unrestricted as possible while using the application.
3) The Custom Vision API provided by Azure Cognitive Services is used for performing the necessary computation on the image captured by the device. With the help of a cloud-based API, complex vision algorithms can be run on the image with almost real-time responses since the algorithm runs on powerful machines on the cloud. The use of cloud-based APIs prevents the need to carry a bulky computer for processing, thereby boosting the system's portability. The API can categorize the image into 86 categories and can be trained on custom datasets. It can further assign tags to the image and generate captions describing the contents in human-readable sentences.

A comparison of the proposed approach with existing assistive systems for the BVIP is shown in Table 1. The usability and functionality provided by the various applications are also shown. There are no applications that use Google Glass for scene description tasks in real-time on Indian scenarios. Further, the proposed application provides better portability and wearability in scene description tasks while providing a real-time response and a completely hands-free interaction interface. The key contributions of the proposed work are:
• The development of an augmented reality application for real-time scene description using Google Glass as an edge device and Azure Vision API for the BVIP.
• The creation of an annotated image dataset consisting of objects used by the BVIP in Indian scenarios and environments. The annotations correspond to the 86 class labels supported by the Vision API.
• Optimizing the performance of the Vision API by using the newly created annotated image dataset and the Custom Vision4 option provided by the Vision API.

4 https://azure.microsoft.com/en-us/services/cognitive-services/custom-vision-service/overview

Fig. 2 gives an overview of the proposed approach and the various components involved in it. The BVIP user wearing Google Glass captures the image of his/her surroundings by using the camera present on the device with the help of the voice command "OK, Glass; Describe Scene." The captured image is compressed and sent via a Wi-Fi connection to the smartphone device of the user. Upon receiving the image, the smartphone app decompresses the image and invokes the Vision API to generate captions and identify the various objects in the image. The smartphone app then processes the API's response to extract the captions and the objects identified. This text response is sent back to Google Glass via the same Wi-Fi connection. Finally, Android's text-to-speech API is used to convert the text response into sound using the bone conduction transducer present in the device. In the following subsections, the proposed application development methodology and the user-system interaction design are described in detail.
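To make the data flow above concrete, the short Python sketch below shows how the smartphone-side component could submit the captured image to an Azure Custom Vision prediction endpoint over REST. The endpoint, project ID, published iteration name, and key are placeholders, and the response handling assumes the documented object-detection reply shape, so this is an illustrative sketch under those assumptions rather than the authors' exact implementation.

```python
import requests

# Placeholder values: substitute the resource endpoint, project ID,
# published iteration name, and prediction key of a trained Custom Vision model.
ENDPOINT = "https://<resource-name>.cognitiveservices.azure.com"
PROJECT_ID = "<project-id>"
ITERATION = "<published-iteration-name>"
PREDICTION_KEY = "<prediction-key>"


def detect_objects(image_bytes, threshold=0.5):
    """Send raw image bytes to the Custom Vision object-detection endpoint
    and return (tag, probability) pairs above the given threshold."""
    url = (f"{ENDPOINT}/customvision/v3.0/Prediction/{PROJECT_ID}"
           f"/detect/iterations/{ITERATION}/image")
    headers = {
        "Prediction-Key": PREDICTION_KEY,
        "Content-Type": "application/octet-stream",
    }
    response = requests.post(url, headers=headers, data=image_bytes)
    response.raise_for_status()
    predictions = response.json().get("predictions", [])
    return [(p["tagName"], p["probability"])
            for p in predictions if p["probability"] >= threshold]


if __name__ == "__main__":
    with open("scene.jpg", "rb") as f:  # example image path
        for tag, prob in detect_objects(f.read()):
            print(f"{tag}: {prob:.2f}")
```

In such a layout the prediction key stays on the smartphone; the Glass app only ships the compressed image, which matches the division of work described above.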
TABLE 1. Comparison of the proposed application with existing assistive applications for the BVIP
(Columns: Literature; Source of Compute; Sensors Used; Functionality Provided; Usability: portability, wearability and output interface)

• Mauro et al. 2015 [21]. Compute: Smartphone. Sensors: Bluetooth beacons. Functionality: Auditory information is communicated about the nearest point of interest when the user is close to a Bluetooth beacon placed at different points of interest; helps in navigation and provides spatial awareness in indoor settings. Usability: Highly portable and wearable since the system comprises only a smartphone; auditory information is communicated using earplugs worn by the user.
• Barney et al. 2017 [30]. Compute: Arduino. Sensors: Ultrasound sensors. Functionality: 3D sound is generated to give the user a sense of the distance of the objects around him/her; useful for navigation in indoor settings. Usability: Moderately portable and wearable as the system comprises an ultrasound sensor on a smart glass, an Arduino for computing the distance of the surrounding objects, and a smartphone; earphones are used to render the generated sound.
• Jiang et al. 2019 [32]. Compute: Cloud. Sensors: Two sets of CCD cameras and a semiconductor laser. Functionality: Object detection using convolutional neural networks running on a cloud-based platform. Usability: Moderate level of portability and wearability as the system requires calibration for effective binocular image acquisition; moving the setup around might require re-calibration.
• Bai et al. 2017 [19]. Compute: CPU and Microprogrammed Control Unit (MCU). Sensors: Eyeglasses, depth camera, ultrasonic rangefinder and AR glasses. Functionality: Obstacle avoidance in an indoor environment with the help of depth and ultrasonic sensors. Usability: Low portability and wearability since the user must carry the CPU and MCU everywhere; the user is provided with auditory cues to avoid obstacles.
• Neto et al. 2016 [51]. Compute: Laptop computer. Sensors: Microsoft Kinect, gyroscope, compass sensor, IR depth sensor, stereo headphones. Functionality: Face detection and recognition using an efficient face recognition algorithm based on HOG, PCA, and K-NN; 3D audio is generated on face recognition as the user response; Microsoft Kinect is used to capture the RGB-D image of the person. Usability: Low portability and wearability due to a laptop computer and a Microsoft Kinect, both of which are heavy and bulky; good response interface with the help of 3D sound in the direction of the person identified in the image.
• Pintado et al. 2019 [11]. Compute: Raspberry Pi. Sensors: Raspberry Pi Camera Module V2. Functionality: Shopping assistant with object recognition and price extraction using convolutional neural networks (CNN) running on a Raspberry Pi. Usability: Moderate portability and wearability since the user must carry a Raspberry Pi used as the computing source for running the CNN; it has very high latency, which can significantly reduce the practical usage of the application.
• Pégeot et al. 2012 [27]. Compute: Laptop computer. Sensors: Head-mounted color camera. Functionality: Scene text detection and tracking using Optical Character Recognition. Usability: Low portability and wearability since the user must carry a laptop computer for running the OCR algorithm; the user also needs to wear a head-mounted color camera for capturing images; the identified text is output as an audio signal using a text-to-speech library.
• Takizawa et al. 2019 [52]. Compute: Laptop computer. Sensors: Microsoft Kinect and a tactile feedback device on a cane. Functionality: Recognizes a pre-trained set of fixed 3D objects in the surroundings; the system also provides the user with instructions to find the 3D object. Usability: Low long-term portability and wearability since the user must carry a Microsoft Kinect and a laptop computer for processing; vibratory cues are provided to the user to help with finding the 3D object.
• Chang et al. 2021 [25]. Compute: Intelligent waist-mounted device. Sensors: Camera, time-of-flight laser-ranging module, 6-axis motion sensor, GPS module, LPWAN module. Functionality: Zebra crossing safety for the visually impaired. Usability: Moderate long-term portability and wearability since the user must carry a waist-mounted device everywhere; audio feedback is provided to the user with the help of Bluetooth earphones.
• Chang et al. 2020 [35] and Chen et al. 2019 [36]. Compute: IR sensors, 6-axis gyroscope and accelerometer in smart glasses and intelligent cane. Sensors: IR sensors, vibration motor, LPWAN module, 6-axis gyroscope and accelerometer. Functionality: Aerial object detection using IR sensor data by calculating distance using the triangulation method, fall detection using the six-axis gyroscope and accelerometer in smart glasses and intelligent cane, and a notification system in case of fall detection. Usability: Highly portable and wearable as the compute is performed by sensors on the smart glasses and intelligent cane; high reliability due to the presence of a notification mechanism in case of fall detection; vibratory cues signal the presence of aerial obstacles in front of the user.
• Chang et al. 2019 and 2020 [7] [8] [9]. Compute: AI-based intelligent drug pill recognition box with a pre-trained deep learning model. Sensors: Camera on smart glasses, drug pill recognition box with Wi-Fi capabilities. Functionality: Drug pill recognition for the visually impaired. Usability: Moderately portable as the user must carry the intelligent drug pill recognition box; high wearability as the images are captured with the help of smart glasses; audio signals are generated to provide reminders to the user and inform the correct or incorrect identification of drugs.
• Proposed Method. Compute: Azure Vision API used to generate captions and identify objects in the image. Sensors: Google Glass with a camera, a microphone, a bone conduction transducer, Wi-Fi capability and more. Functionality: Image captioning and object detection in real time. Usability: Highly portable and wearable as the only devices that the user must carry are a smartphone and Google Glass; audio output is produced with the help of a bone conduction transducer which prevents the obstruction of external sound; completely hands-free application with voice command capabilities.
B. PROPOSED APPLICATION DEVELOPMENT METHODOLOGY
According to the official documentation by Google5, the three major design patterns for developing software on Google Glass, also called Glassware, are Ongoing Tasks, Periodic Notifications, and Immersions. Ongoing tasks are long-running applications that remain active even when users switch focus to a different application within the device. A stopwatch app is an excellent example of an ongoing task: users can switch to a different application while running the stopwatch app without stopping the timer. The Periodic Notifications design pattern is used to develop applications where the user is notified of any new information to be displayed. Examples of applications that use the Periodic Notification design pattern include a news app, an SMS reader, or an email reader. The Immersion design pattern is used whenever the application requires complete control of the user experience. These applications stop when the user switches focus to a different app. Any gaming application is an excellent example of an Immersion pattern. The proposed visual assistant requires complete control of the user experience, and hence the Immersion pattern is chosen to design the application.

The system design diagram is shown in Fig. 3. The system can be divided into three major sections: the app on the Google Glass device, the smartphone, and the Vision API. The BVIP user interacts directly with the app on Google Glass. On receiving a user voice command, the camera image handler built into the app uses the camera present on the smart glasses to capture the image of the user's surroundings. This image is compressed and then sent to the smartphone using a socket connection over the internet. The image is compressed to reduce the size of the data to be sent over the internet, thereby reducing the application's response time. Socket programming is a way of connecting two nodes: one node (server) listens on a particular port at an IP address, while the other node (client) reaches out to the server node on the same port to form a connection. In this system, the application on the Google Glass is the client, and the application on the smartphone forms the server side of the socket connection.

Upon receiving the image from Google Glass, the server-side application on the smartphone decompresses the image. The captions of the decompressed image are then generated by using the Vision API. The response from the API is received in JSON format by the Cognitive Services API interface built into the smartphone app. JSON (JavaScript Object Notation) is an open standard data interchange format that uses human-readable text to store and transmit data objects consisting of attribute-value pairs and arrays. The smartphone app processes the JSON response to extract the captions and the objects identified in the image. The processed response is then sent back to the client-side application on Google Glass over the same socket connection. Finally, on receiving the text response from the smartphone, the app on the Google Glass device uses the text-to-speech API provided by Android to convert the text to audio signals, which are rendered as sound using the bone conduction speaker present on the device. The BVIP user hears this sound output.

The version of Glass used in developing the proposed system is the Glass Explorer Edition. It comes with a custom Glass OS and Software Development Kit developed by Google. Glass OS, or Google XE, is a version of Google's Android operating system designed for Google Glass. The operating system version on the Explorer Edition device was upgraded from XE 12 to XE 23, since Android Studio, the integrated development environment (IDE) used for developing the app, supports XE 23, and the SDK documentation available online is also for XE 18+. The OS version was upgraded by flashing the device, which was done by programming the bootloader of the Glass.

Kivy6, an open-source Python library, was used for developing the socket server application on the smartphone. It is a cross-platform library for the rapid development of applications that make use of innovative user interfaces. It can run on Windows, OS X, Android, iOS, and Raspberry Pi. Hence, the server side of the application can be started on any smartphone, laptop computer, or Raspberry Pi. However, to increase portability and ease of use, smartphones were chosen for the proposed system. The Azure Vision API used to identify the various objects and generate captions of the captured image provides excellent results in real-time. It can be used to categorize objects into 86 different categories.

5 https://developers.google.com/glass/design/patterns
6 https://kivy.org/#home
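A minimal sketch of the smartphone-side server logic described above is given below. It uses only the Python standard library (the actual server was built with Kivy, whose UI layer is omitted here), and the port number, the length-prefixed framing, the JSON field names, and the stubbed analyze() call are assumptions made for illustration rather than the exact protocol of the deployed system.

```python
import socket
import struct
import zlib

HOST, PORT = "0.0.0.0", 5000  # assumed port; the Glass client connects to this server


def parse_vision_response(payload: dict):
    """Pull a caption and object names out of a Vision-API-style JSON reply.
    The description/captions/objects fields mirror the Analyze Image response
    shape and are an assumption about the payload, not a guarantee."""
    captions = payload.get("description", {}).get("captions", [])
    caption = captions[0]["text"] if captions else "No description available"
    objects = [o["object"] for o in payload.get("objects", []) if "object" in o]
    return caption, objects


def analyze(image_bytes: bytes) -> dict:
    """Stand-in for the cloud call: the real server posts image_bytes to the
    Vision API (see the earlier prediction sketch) and returns its parsed JSON."""
    return {"description": {"captions": [{"text": "a person standing in a room"}]},
            "objects": [{"object": "person"}, {"object": "chair"}]}


def recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the connection."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("client disconnected")
        buf += chunk
    return buf


def serve():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        while True:
            conn, _ = srv.accept()
            with conn:
                # Assumed framing: 4-byte big-endian length, then a zlib-compressed JPEG.
                (length,) = struct.unpack(">I", recv_exact(conn, 4))
                image_bytes = zlib.decompress(recv_exact(conn, length))
                caption, objects = parse_vision_response(analyze(image_bytes))
                reply = f"{caption}. Objects detected: {', '.join(objects)}"
                conn.sendall(reply.encode("utf-8"))  # Glass converts this text to speech


if __name__ == "__main__":
    serve()
```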
TABLE 2. Evaluation metrics of the Azure Vision API against Flickr8K and Microsoft COCO datasets

... configured such that it automatically turns on whenever the user wears the device. The Home screen shown in Fig. 5 is displayed as soon as the user wears the device. Once the device is worn, the following steps are to be followed.

... image of the surroundings in front of the user. The captured image is shown in Fig. 8.
C. DATA AUGMENTATION
The collected data for training contains 86 different class labels with a minimum of 3 images for each category, and for a few class labels such as keys, remote, medicine, mobile phone, prescription glasses, and umbrella, more than 50 images per class are collected and annotated. Though the total number of images considered is more than 540, the variants of images considered are fewer as the images are taken from cameras with three different angles, i.e., front view, side view, and top view. Only these three angles were considered since all other variants can be generated using data augmentation. So, data augmentation is used to increase the training data tenfold, thereby increasing its robustness [58], [59]. The data augmentation techniques used are given below. Furthermore, the augmentation values used for the data augmentation are given in Table 5. After data augmentation, the total number

TABLE 6. COCO object detection results comparison using different frameworks and network architectures vs Azure Custom Vision API. mAP is reported with the COCO primary challenge metric (AP at IoU=0.50:0.05:0.95)

Vision Framework | Model Used | mAP | Billion Mult-Adds | Million Parameters
Azure Custom Vision API | - | 26.33% | 116 | 37.43
SSD 300 | deeplab-VGG | 21.10% | 34.9 | 33.1
SSD 300 | Inception V2 | 22.00% | 3.8 | 13.7
SSD 300 | MobileNet | 19.30% | 1.2 | 6.8
Faster-RCNN 300 | VGG | 22.90% | 64.3 | 138.5
Faster-RCNN 300 | Inception V2 | 15.40% | 118.2 | 13.3
Faster-RCNN 300 | MobileNet | 16.40% | 25.2 | 6.1
Faster-RCNN 600 | VGG | 25.70% | 149.6 | 138.5
Faster-RCNN 600 | Inception V2 | 21.90% | 129.6 | 13.3
Faster-RCNN 600 | MobileNet | 19.80% | 30.5 | 6.1

TABLE 7. Comparison of accuracy results of different models vs Azure Custom Vision API
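As an illustration of the augmentation step, the Pillow-based sketch below expands every training image into a few flipped, rotated, and brightness-adjusted variants. The folder names and transform parameters are placeholders chosen for the example; the values actually used by the authors are those listed in Table 5.

```python
from pathlib import Path

from PIL import Image, ImageEnhance, ImageOps

# Illustrative augmentation parameters; the actual values used in this work
# are listed in Table 5 and are not reproduced here.
ROTATION_ANGLES = (-15, 15)       # degrees
BRIGHTNESS_FACTORS = (0.7, 1.3)   # <1 darker, >1 brighter


def augment(image: Image.Image):
    """Yield several augmented variants of a single training image."""
    yield ImageOps.mirror(image)                      # horizontal flip
    for angle in ROTATION_ANGLES:
        yield image.rotate(angle, expand=True)        # small rotations
    for factor in BRIGHTNESS_FACTORS:
        yield ImageEnhance.Brightness(image).enhance(factor)


def augment_folder(src_dir: str, dst_dir: str):
    """Write augmented copies of every JPEG in src_dir to dst_dir."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.jpg"):
        image = Image.open(path).convert("RGB")
        for i, variant in enumerate(augment(image)):
            variant.save(out / f"{path.stem}_aug{i}.jpg")


if __name__ == "__main__":
    augment_folder("dataset/train", "dataset/train_augmented")  # example paths
```

For object detection, the bounding-box annotations would have to be transformed consistently with each image, which this sketch leaves out.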
Hypothesis 4:
Null Hypothesis: A visually impaired person would not
prefer to use the application.
Alternate Hypothesis: A visually impaired person would
prefer to use this application every day.
1) What do you use to walk in and around your neighbor-
hood? Cane, Guide, Other.
2) Would you prefer a guide or would you rather walk
alone?
3) Do you have access to the internet in your area?
4) How would you rate the response time of the applica-
tion?
5) How likely are you to use this application every day?
6) How comfortable are you with the audio-based interface?
7) Are you able to hear the output from the device?
8) How well do you think the description given by the
application matches the actual scene (As described by
the guide)?
9) Would you prefer voice-based or touch-based naviga-
tion?
A Likert Scale Analysis on the usability of the application was performed using user feedback and responses to the above questions. The following pie charts depict the responses to some of the questions received from the users.

FIGURE 13. Hypothesis 2 Question 2
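The sketch below shows one way such Likert responses can be tallied into the percentages behind a pie chart; the response values and labels are hypothetical placeholders, not the data collected from the participants.

```python
from collections import Counter

# Hypothetical 5-point Likert responses for one question
# (1 = strongly disagree ... 5 = strongly agree); not the study's actual data.
responses = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]

LABELS = {1: "Strongly disagree", 2: "Disagree", 3: "Neutral",
          4: "Agree", 5: "Strongly agree"}


def likert_summary(values):
    """Return the percentage of answers per Likert level, ready for plotting."""
    counts = Counter(values)
    total = len(values)
    return {LABELS[level]: 100.0 * counts.get(level, 0) / total
            for level in sorted(LABELS)}


if __name__ == "__main__":
    for label, pct in likert_summary(responses).items():
        print(f"{label}: {pct:.0f}%")
```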
use the device, the application's impact in scene understanding, and the inclination to use this application every day. Also, there was no significant difference in severity of blindness for the four hypothesis categories (χ2 = 10.24, d.f. = 2, P = 0.15). However, we found a significant difference for the intellectual level (χ2 = 36.11, d.f. = 3, P = 0.001), as the students with borderline and mental retardation found it difficult to understand and use the application. For the complete statistical analysis, the significance level was set at P < 0.05.
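Given the underlying contingency tables, chi-square comparisons of this kind can be reproduced with SciPy as sketched below; the counts used here are hypothetical placeholders that only demonstrate the procedure, not the study's data.

```python
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows are severity-of-blindness groups,
# columns are agree/neutral/disagree counts. Not the study's actual data.
observed = [
    [12, 5, 3],
    [10, 6, 4],
    [11, 4, 5],
]

# chi2_contingency returns the test statistic, p-value, degrees of freedom,
# and the expected frequencies under the independence assumption.
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, d.f. = {dof}, P = {p_value:.3f}")
print("Significant at P < 0.05" if p_value < 0.05 else "Not significant at P < 0.05")
```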
F. SIGNIFICANCE
The proposed approach was compared with existing state-of-the-art solutions, and the details are shown in Table 1. Previously, devices like smartphones, Bluetooth beacons, and Raspberry Pi were used to develop diverse solutions. The proposed approach attempts to tackle the problem by using Google Glass. Keeping BVIP users in mind, the application was designed to be wholly audio-based, and the user does not require any visual cues to use the device. Another significant improvement is that the user does not need to carry any bulky hardware while using the proposed system. The hardware used here is Google Glass, which is very similar to any regular reading glasses in size and shape, and a smartphone, which makes the application highly portable and easy to use. Hence, the proposed system is highly wearable, portable, and provides accurate results in real-time.

A Likert Scale Analysis on the usability of the application was performed. Positive feedback and response were received from the users, as shown in the charts in Figures 12, 13, and 14. It can be concluded from the response that the application can be used effortlessly on a daily basis to understand the BVIP user's surroundings. It can be further concluded that the BVIP require minimal to no training to use the device, and they prefer to use the application as a visual assistant.

G. LIMITATIONS
While testing, certain limitations of the application were identified. Firstly, the proposed system is highly dependent on a strong internet connection and works if and only if there is an internet connection available in the area. The latency of the application was found to vary significantly due to fluctuations in the network speed. Secondly, the device is relatively expensive in developing countries and is not easily affordable. Finally, the battery on the Google Glass was able to run only for 4 hours per charge while using the application continuously. However, the short runtime problem can be overcome by adding an external power pack.

A few other improvements were identified while collecting feedback from the BVIP students and teachers. For instance, a few users commented on the ability of the device to understand regional accents, stating that the voice command was not recognized in certain instances. The statistics are shown in Fig. 13. Another feedback received on very similar grounds was that the audio output from the device can be personalized such that the voice output has a regional accent. The users explained that this would help make the application feel more personalized to the user, given that Indian accents are now available on various electronic gadgets. One BVIP user commented on the audio output being affected in boisterous environments, such as noisy traffic junctions or construction sites. However, this problem was mitigated by switching to Bluetooth earphones instead of the bone conduction transducer, in which case the audio output was not affected by external sounds. The BVIP users explained that the application on Google Glass provides more comfort and usability when compared with smartphone apps for the visually impaired.

V. CONCLUSION
The use of Google Glass to assist the BVIP community is demonstrated by developing an application that acts as a visual assistant. The system is designed to be highly portable, easy to wear, and works in real-time. The experimental results of the Azure Vision API show a mean Average Precision value (mAP) of 29.33% on the MS COCO dataset and an accuracy of 73.1% on the ImageNet dataset. A dataset of 5000 newly annotated images is created to improve the performance of scene description in Indian scenarios. The Custom Vision API is trained and tested on the newly created dataset, and it is observed that it increases the overall mAP from 63% to 84% with IoU > 0.5 for the created dataset. The overall response time of the proposed application was measured and is less than 1 second, thereby providing accurate results in real-time. The proposed application describes the scene and identifies the various objects present in front of the user. It was tested on the BVIP, and their response and feedback were recorded, and a Likert scale analysis was performed. From the analysis, it can be concluded that the proposed system has an excellent potential to be used as an assistant for the BVIP.

The computer vision API from Azure Cognitive Services can add more functionalities to the proposed application. The capabilities of other APIs can be explored to add more functionalities such as text extraction and reading using Read API and face detection and recognition using Face Service8. The application can be enhanced by adding more features, such as lane detection, fall detection, pit detection, obstacle avoidance, and shopping assistant, thereby creating a one-stop assistant for the BVIP. Google Glass has embedded sensors that can achieve these functionalities with little to no need for external sensors. Further, there exists a possibility of moving the application entirely to Google Glass by removing the dependency on the smartphone. Currently, the smartphone device is used to process the captured image before making the API calls to the Custom Vision API, which can be avoided by using the Android SDK for Vision API9 directly on Google Glass.

8 https://azure.microsoft.com/en-us/services/cognitive-services/face/
9 https://github.com/microsoft/Cognitive-Vision-Android
DECLARATION
The experimental procedure and the entire setup, including the Google Glass device given to the participants, were approved by the Institutional Ethics Committee (IEC) of NITK Surathkal, Mangalore, India. The participants were also informed that they had the right to quit the experiment at any time. The data, i.e., video recordings, audio, and the written feedback of the subjects, were collected only after the participants gave written consent for their use in the research experiment.
D. HYPOTHESIS 4
Null Hypothesis: A visually impaired person would not prefer to use the application.
Alternate Hypothesis: A visually impaired person would prefer to use this application every day.
The final set of questions was asked to determine whether the visually impaired person would prefer to use the application. Various questions were asked to evaluate this hypothesis, as can be seen from Figs. 21 to 29. The questions were framed to capture the current lifestyle of the visually impaired individuals and whether the use of the application would help them in better scene analysis. From their responses, it can be concluded that the majority of the users found the application effective, portable, and easy to use.

FIGURE 23. Hypothesis 4 Question 3
REFERENCES
[1] Eric R. Jensen. Brain-based learning: The new paradigm of teaching. 2008.
[2] R. S. Fixot. American Journal of Ophthalmology. 1957.
[3] Van C. Lansingh. Vision 2020: The right to sight in 7 years? 2013.
[4] Nicholas Bradley and Mark Dunlop. An experimental investigation into wayfinding directions for visually impaired people. Personal and Ubiquitous Computing, 9:395–403, 11 2005.
[5] F. Battaglia and G. Iannizzotto. An open architecture to develop a handheld device for helping visually impaired people. IEEE Transactions on Consumer Electronics, 58(3):1086–1093, 2012.
[6] María Meza-de Luna, Juan Terven, Bogdan Raducanu, and Joaquín Salas. A social-aware assistant to support individuals with visual impairments during social interaction: A systematic requirements analysis. International Journal of Human-Computer Studies, 122, 08 2018.
[7] Wan-Jung Chang, Liang-Bi Chen, Chia-Hao Hsu, Jheng-Hao Chen, Tzu-Chin Yang, and Cheng-Pei Lin. Medglasses: A wearable smart-glasses-based drug pill recognition system using deep learning for visually impaired chronic patients. IEEE Access, 8:17013–17024, 2020.
[8] Wan-Jung Chang, Yue-Xun Yu, Jhen-Hao Chen, Zhi-Yao Zhang, Sung-Jie Ko, Tsung-Han Yang, Chia-Hao Hsu, Liang-Bi Chen, and Ming-Che Chen. A deep learning based wearable medicines recognition system for visually impaired people. In 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), pages 207–208, 2019.
[9] Wan-Jung Chang, Liang-Bi Chen, Chia-Hao Hsu, Cheng-Pei Lin, and Tzu-Chin Yang. A deep learning-based intelligent medicine recognition system for chronic patients. IEEE Access, 7:44441–44458, 2019.
[10] P. A. Zientara, S. Lee, G. H. Smith, R. Brenner, L. Itti, M. B. Rosson, J. M. Carroll, K. M. Irick, and V. Narayanan. Third eye: A shopping assistant for the visually impaired. Computer, 50(2):16–24, 2017.
[11] D. Pintado, V. Sanchez, E. Adarve, M. Mata, Z. Gogebakan, B. Cabuk, C. Chiu, J. Zhan, L. Gewali, and P. Oh. Deep learning based shopping assistant for the visually impaired. In 2019 IEEE International Conference on Consumer Electronics (ICCE), pages 1–6, 2019.
[12] Pablo-Alejandro Quinones, Tammy Greene, Rayoung Yang, and Mark Newman. Supporting visually impaired navigation: A needs-finding study. pages 1645–1650, 05 2011.
[13] Fatma El-zahraa El-taher, Ayman Taha, Jane Courtney, and Susan Mckeever. A systematic review of urban navigation systems for visually impaired people. Sensors, 21(9), 2021.
[14] J.-H. Lee, Dongho Kim, and B.-S. Shin. A wearable guidance system with interactive user interface for persons with visual impairment. Multimedia Tools and Applications, 75, 11 2014.
[15] Ruxandra Tapu, Bogdan Mocanu, and Titus Zaharia. A computer vision-based perception system for visually impaired. Multimedia Tools and Applications, 76, 05 2016.
[16] P. Vyavahare and S. Habeeb. Assistant for visually impaired using computer vision. In 2018 1st International Conference on Advanced Research in Engineering Sciences (ARES), pages 1–7, June 2018.
[17] Kevin Laubhan, Michael Trent, Blain Root, Ahmed Abdelgawad, and Kumar Yelamarthi. A wearable portable electronic travel aid for blind. pages 1999–2003, 03 2016.
[18] Michael Trent, Ahmed Abdelgawad, and Kumar Yelamarthi. A smart wearable navigation system for visually impaired. pages 333–341, 07 2017.
[19] J. Bai, S. Lian, Z. Liu, K. Wang, and D. Liu. Smart guiding glasses for visually impaired people in indoor environment. IEEE Transactions on Consumer Electronics, 63(3):258–266, 2017.
[20] Quoc-Hung Nguyen, Vu Hai, Thanh-Hai Tran, and Quang-Hoan Nguyen. Developing a way-finding system on mobile robot assisting visually impaired people in an indoor environment. Multimedia Tools and Applications, 76, 01 2016.
[21] Mauro Avila and Thomas Kubitza. Assistive wearable technology for visually impaired. pages 940–943. ACM, 04 2014.
[22] Joey van der Bie, Britte Visser, Jordy Matsari, Mijnisha Singh, Timon Van Hasselt, Jan Koopman, and Ben J. A. Kröse. Guiding the visually impaired through the environment with beacons. pages 385–388. ACM, 09 2016.
[23] João Guerreiro, Daisuke Sato, Dragan Ahmetovic, Eshed Ohn-Bar, Kris Kitani, and Chieko Asakawa. Virtual navigation for blind people: Transferring route knowledge to the real-world. International Journal of Human-Computer Studies, page 102369, 10 2019.
[24] Robert-Gabriel Lupu, Oana Mitruț, Andrei Stan, Florina Ungureanu, Kyriaki Kalimeri, and Alin Moldoveanu. Cognitive and affective assessment of navigation and mobility tasks for the visually impaired via electroencephalography and behavioral signals. Sensors, 20(20), 2020.
[25] Wan-Jung Chang, Liang-Bi Chen, Cheng-You Sie, and Ching-Hsiang Yang. An artificial intelligence edge computing-based assistive system for visually impaired pedestrian safety at zebra crossings. IEEE Transactions on Consumer Electronics, 67(1):3–11, 2021.
[26] Hanlu Ye, Meethu Malu, Uran Oh, and Leah Findlater. Current and future mobile and wearable device use by people with visual impairments. pages 3123–3132. ACM, 08 2015.
[27] Faustin Pégeot and Hideaki Goto. Scene text detection and tracking for a camera-equipped wearable reading assistant for the blind. volume 7729, pages 454–463, 11 2012.
[28] Luis González, Luis Serpa, Kevin Calle, A. Guzhnay-Lucero, Vladimir Robles-Bykbaev, and M. Mena-Salcedo. A low-cost wearable support system for visually disabled people. pages 1–5, 11 2016.
[29] Alvise Memo and Pietro Zanuttigh. Head-mounted gesture controlled interface for human-computer interaction. Multimedia Tools and Applications, 77, 12 2016.
[30] Michael Barney, Gilmar Brito, Jonathan Kilner, Aida Araújo, and Meuse Nogueira. Sensory glasses for the visually impaired. pages 1–2. ACM, 04 2017.
[31] Md Shishir, Shahariar Fahim, Fairuz Habib, and Tanjila Farah. Eye assistant: Using mobile application to help the visually impaired. pages 1–4, 05 2019.
[32] B. Jiang, J. Yang, Z. Lv, and H. Song. Wearable vision assistance system based on binocular sensors for visually impaired users. IEEE Internet of Things Journal, 6(2):1375–1383, April 2019.
[33] Oleksandr Bogdan, Oleg Yurchenko, Oleksandr Bailo, François Rameau, Donggeun Yoo, and Inso Kweon. Intelligent Assistant for People with Low Vision Abilities, pages 448–462. 02 2018.
[34] Soo-Chang Pei and Yu-Ying Wang. Census-based vision for auditory depth images and speech navigation of visually impaired users. IEEE Transactions on Consumer Electronics, 57:1883–1890, 11 2011.
[35] Wan-Jung Chang, Liang-Bi Chen, Ming-Che Chen, Jian-Ping Su, Cheng-You Sie, and Ching-Hsiang Yang. Design and implementation of an intelligent assistive system for visually impaired people for aerial obstacle avoidance and fall detection. IEEE Sensors Journal, 20(17):10199–10210, 2020.
[36] Liang-Bi Chen, Jian-Ping Su, Ming-Che Chen, Wan-Jung Chang, Ching-Hsiang Yang, and Cheng-You Sie. An implementation of an intelligent assistance system for visually impaired/blind people. In 2019 IEEE International Conference on Consumer Electronics (ICCE), pages 1–2, 2019.
[37] Wan-Jung Chang, Liang-Bi Chen, and Yu-Zung Chiou. Design and implementation of a drowsiness-fatigue-detection system based on wearable smart glasses to increase road safety. IEEE Transactions on Consumer Electronics, 64(4):461–469, 2018.
[38] Liang-Bi Chen, Wan-Jung Chang, Jian-Ping Su, Ji-Yi Ciou, Yi-Jhan Ciou, Cheng-Chin Kuo, and Katherine Shu-Min Li. A wearable-glasses-based drowsiness-fatigue-detection system for improving road safety. In 2016 IEEE 5th Global Conference on Consumer Electronics, pages 1–2, 2016.
[39] Haotian Jiang, James Starkman, Menghan Liu, and Ming-Chun Huang. Food nutrition visualization on Google Glass: Design tradeoff and field evaluation. IEEE Consumer Electronics Magazine, 7:21–31, 05 2018.
[40] Ying Li and Anshul Sheopuri. Applying image analysis to assess food aesthetics and uniqueness. In 2015 IEEE International Conference on Image Processing (ICIP), pages 311–314. IEEE, 2015.
[41] Peter Washington, Catalin Voss, Nick Haber, Serena Tanaka, Jena Daniels, Carl Feinstein, Terry Winograd, and Dennis Wall. A wearable social interaction aid for children with autism. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pages 2348–2354. ACM, 2016.
[42] Peter Washington, Dennis Wall, Catalin Voss, Aaron Kline, Nick Haber, Jena Daniels, Azar Fazel, Titas De, Carl Feinstein, and Terry Winograd. Superpowerglass: A wearable aid for the at-home therapy of children with autism. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1:1–22, 09 2017.
[43] Zhihan Lv, Alaa Halawani, Shengzhong Fen, Shafiq Réhman, and Haibo Li. Reprint: Touch-less interactive augmented reality game on vision based wearable device. Personal and Ubiquitous Computing, 04 2015.
[44] Zibo Wang, Xi Wen, Song Yu, Xiaoqian Mao, Wei Li, and Genshe Chen. Navigation of a humanoid robot via head gestures based on global and local live videos on Google Glass. pages 1–6, 05 2017.
[45] Xi Wen, Yu Song, Wei Li, Genshe Chen, and Bin Xian. Rotation vector sensor-based remote control of a humanoid robot through a Google Glass. In 2016 IEEE 14th International Workshop on Advanced Motion Control (AMC), pages 203–207. IEEE, 2016.
[46] C. F. Xu, Y. F. Gong, W. Su, J. Cao, and F. B. Tao. Virtual video and real-time data demonstration for smart substation inspection based on Google Glasses. 01 2015.
[47] Antoine Widmer, Roger Schaer, Dimitrios Markonis, and Henning Müller. Facilitating medical information search using Google Glass connected to a content-based medical image retrieval system. volume 2014, 08 2014.
[48] Georg Lausegger, Michael Spitzer, and Martin Ebner. OmniColor – a smart glasses app to support colorblind people. International Journal of Interactive Mobile Technologies (iJIM), 11:161–177, July 2017.
[49] Asm Anam, Shahinur Alam, and M. Yeasin. Expression: A dyadic conversation aid using Google Glass for people with visual impairments. UbiComp 2014 – Adjunct Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 211–214, January 2014.
[50] Alex D. Hwang and Eli Peli. An augmented-reality edge enhancement application for Google Glass. Optometry and Vision Science, 91:1021–1030, 2014.
[51] L. B. Neto, F. Grijalva, V. R. M. L. Maike, L. C. Martini, D. Florencio, M. C. C. Baranauskas, A. Rocha, and S. Goldenstein. A Kinect-based wearable face recognition system to aid visually impaired users. IEEE Transactions on Human-Machine Systems, 47(1):52–64, February 2017.
[52] Hotaka Takizawa, Shotaro Yamaguchi, Mayumi Aoyagi, Nobuo Ezaki, and Shinji Mizuno. Kinect cane: An assistive system for the visually impaired based on three-dimensional object recognition. volume 19, pages 740–745, December 2012.
[53] Dawon Kim and Yosoon Choi. Applications of smart glasses in applied sciences: A systematic review. Applied Sciences, 11(11), 2021.
[54] Micah Hodosh, Peter Young, and Julia Hockenmaier. Flickr8k dataset.
[55] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft COCO: Common objects in context, 2015.
[56] Jacob Whitehill, Zewelanji Serpell, Yi-Ching Lin, Aysha Foster, and Javier R. Movellan. The faces of engagement: Automatic recognition of student engagement from facial expressions. IEEE Transactions on Affective Computing, 5(1):86–98, 2014.
[57] TS Ashwin and Ram Mohana Reddy Guddeti. Affective database for e-learning and classroom environments using Indian students' faces, hand gestures and body postures. Future Generation Computer Systems, 108:334–348, 2020.
[58] TS Ashwin and Ram Mohana Reddy Guddeti. Automatic detection of students' affective states in classroom environment using hybrid convolutional neural networks. Education and Information Technologies, 25(2):1387–1415, 2020.
[59] TS Ashwin and Ram Mohana Reddy Guddeti. Unobtrusive behavioral analysis of students in classroom environment using non-verbal cues. IEEE Access, 7:150693–150709, 2019.
[60] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
[61] Sinan Chen, Sachio Saiki, and Masahide Nakamura. Toward flexible and efficient home context sensing: Capability evaluation and verification of image-based cognitive APIs. Sensors, 20(5):1442, 2020.

SANJEEV U RAO received his B.Tech degree in information technology from National Institute of Technology Karnataka, Surathkal, India, in 2019. He is currently working as a software developer at CitiBank, Pune. His research interests include deep learning, big data, and the Internet of Things.

SWAROOP RANGANATH received his B.Tech degree in information technology from National Institute of Technology Karnataka, Surathkal, India, in 2019. He is currently working at Wipro, Bangalore, as a Data Engineer. His research interests include Deep Learning, Reinforcement Learning, AI applications in IoT systems, advanced analytics in business insights, and Explainable AI.

ASHWIN T S received his B.E. degree from Visvesvaraya Technological University, Belgaum, India, in 2011, and his M.Tech. degree from Manipal University, Manipal, India, in 2013. He received his Ph.D. degree from National Institute of Technology Karnataka, Surathkal, Mangalore, India. He is currently working as a postdoctoral fellow at IIT Bombay, India. He has more than 35 publications in reputed, peer-reviewed international conferences and journals, including 5 book chapters. His research interests include Affective Computing, Human-Computer Interaction, Educational Data Mining, Learning Analytics, and Computer Vision applications. He is a Member of AAAC, ComSoc, IEEE, and ACM.