2024 JETS
NAME: CHIKONDI FAITH KABUKA
AGE: 16 YEARS OLD
DATE: 18th JUNE, 2024.
CATEGORY: COMPUTER SCIENCE
SCHOOL: NCHANGA SECONDARY TRUST SCHOOL
PROJECT TITLE: SIGN LANGUAGE DETECTION SYSTEM
MENTOR: MR MASEBO
Table of Contents
Abstract
Introduction
Hypothesis
Statement of problem
Aims
Methodology
Results
Conclusion
Acknowledgement
Citation and Reference
Interpretations
Implications
Limitations
Recommendations
Sign Language Detection Project Report
(i) Abstract
This project develops a Python-based sign language detection system using the scikit-learn
(sklearn) library. The problem addressed is the need for effective communication tools for individuals
who use sign language; as a first step, the system recognizes the letters A, B, and C.
The study design involves capturing image data of these signs, preprocessing the images, and training a
machine learning model to classify them accurately. Data analysis uses several sklearn models,
including SVM and Random Forest, and compares their performance. The major finding is that the
best model, an SVM, recognizes the specified letters with over 95% accuracy. The implications of this
work include an enhanced communication aid for the deaf community. The project concludes that machine
learning can be applied effectively to sign language detection, with future expansions to include more
letters and signs.
Keywords: Sign Language Detection, Machine Learning, sklearn, Image Classification, Communication
Aid
(ii) Introduction
The advancement of machine learning has opened new avenues for improving communication aids and
promoting inclusiveness for individuals who rely on sign language. This project focuses on developing a
system to detect and classify three specific sign language letters: A, B, and C. Leveraging Python and the
sklearn library, this work aims to lay the groundwork for a more comprehensive sign language
recognition system.
Communication is a fundamental human need, and for individuals who are deaf or hard of hearing, sign
language serves as a primary means of interaction. However, the ability to understand and
communicate using sign language is not universally shared, creating a significant communication barrier.
This project seeks to bridge this gap by employing machine learning techniques to recognize and
translate sign language into readable text, thereby facilitating better communication between sign
language users and non-users.
(iii) Hypothesis/Rationale
The hypothesis is that a machine learning model, trained on a dataset of images representing the letters
A, B, and C in sign language, can accurately classify these signs. The rationale behind this study is to
explore the feasibility and accuracy of using sklearn for such a classification task, providing a foundation
for more complex sign language recognition systems. The underlying assumption is that the distinct
visual patterns in sign language gestures can be captured and interpreted accurately by a well-trained
model, thus enabling effective recognition and translation.
The use of machine learning for sign language recognition is driven by the need for automated systems
that can assist in real-time communication. Traditional methods of learning and interpreting sign
language can be time-consuming and require significant effort. In contrast, machine learning models,
once trained, can offer quick and accurate recognition of signs, making them highly useful in practical
applications such as real-time communication aids and educational tools.
(iv) Statement of the Problem
Individuals who use sign language often face communication barriers, especially in environments where
their language is not understood. Developing an automated system to recognize sign language can
significantly enhance their ability to communicate with non-signers, promoting inclusivity and
understanding. The lack of effective tools for translating sign language into text or speech limits the
interactions of deaf individuals in many social and professional contexts. By creating a reliable sign
language detection system, this project aims to address these challenges and contribute to a more
inclusive society.
The communication gap between sign language users and non-users can lead to misunderstandings,
social isolation, and limited opportunities for deaf individuals. In educational settings, for example,
students who rely on sign language may struggle to keep up with their peers due to the lack of effective
translation tools. Similarly, in the workplace, communication barriers can hinder the professional growth
of deaf employees. An automated sign language detection system can help mitigate these issues by
providing an accessible means of communication that bridges the gap between different language users.
(v) Aims/Objectives
The primary objective is to develop and evaluate a machine learning model capable of recognizing the
sign language letters A, B, and C with high accuracy. Specific aims include:
- Capturing a dataset of images representing the letters A, B, and C.
- Preprocessing the data to ensure it is suitable for training.
- Training various sklearn models and evaluating their performance.
- Identifying the most effective model for this classification task.
The long-term goal of this project is to create a scalable and robust sign language recognition system
that can be expanded to include a broader range of signs and gestures. By focusing initially on a small
set of letters, we can develop and refine our methodologies before applying them to a more
comprehensive dataset. This iterative approach ensures that the system's accuracy and reliability are
maintained as it scales up.
(vi) Process/Methodology
1. Data Collection
Images of hands forming the letters A, B, and C in sign language are collected. The Python code captures
and saves a specified number of images from the webcam for a given number of classes, organizing the
images into separate directories for each class. This step is crucial for creating a diverse and
representative dataset that the machine learning models can learn from.
In practice, data collection involves setting up a controlled environment where participants can form the
specified letters clearly and consistently. High-quality images are essential for the accuracy of the
subsequent steps. Lighting, background, and camera resolution are carefully managed to ensure the
clarity of the captured images. Participants are instructed on how to form each letter, and multiple
images are taken to account for variations in hand positioning and individual differences.
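The capture step described above can be sketched as follows. This is a minimal illustration, not the
project's actual script: the function names collect_images and collect_from_webcam, the "data"
directory, and the default of 100 images per class are assumptions made for the example. The capture
loop receives the frame reader and the image writer as arguments so that it can be tried without a
camera; collect_from_webcam then wires in OpenCV's cv2.VideoCapture and cv2.imwrite.

```python
from pathlib import Path


def collect_images(read_frame, write_image, data_dir, class_names, images_per_class):
    """Save `images_per_class` frames per class under data_dir/<class_name>/.

    read_frame() must return (ok, frame); write_image(path, frame) stores one frame.
    Returns the list of file paths that were written.
    """
    saved = []
    for name in class_names:
        class_dir = Path(data_dir) / name
        class_dir.mkdir(parents=True, exist_ok=True)  # one directory per class
        for index in range(images_per_class):
            ok, frame = read_frame()
            if not ok:
                raise RuntimeError("could not read a frame from the camera")
            path = class_dir / f"{index}.jpg"
            write_image(str(path), frame)
            saved.append(path)
    return saved


def collect_from_webcam(data_dir="data", class_names=("A", "B", "C"), images_per_class=100):
    """Hypothetical wrapper: capture from the default webcam with OpenCV."""
    import cv2  # imported here so the loop above stays usable without OpenCV

    capture = cv2.VideoCapture(0)  # device 0 is assumed to be the webcam
    try:
        return collect_images(capture.read, cv2.imwrite, data_dir, class_names, images_per_class)
    finally:
        capture.release()
```

A real session would also pause between classes so the participant can change hand shape; that
prompt is left out here to keep the sketch short.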
2. Data Preprocessing
Images are resized, normalized, and converted to grayscale to reduce computational complexity. The
script then processes the hand images stored in the data directory, detects hand landmarks using
MediaPipe, and saves the extracted landmark coordinates and their corresponding labels to a pickle
file for later use in model training.
Preprocessing is a critical step that transforms raw image data into a format suitable for machine
learning. Resizing ensures that all images have uniform dimensions, which simplifies the training
process. Normalization adjusts the pixel values to a standard range, improving the model's ability to
learn from the data. Converting images to grayscale reduces the amount of data the model needs to
process, focusing on the essential features of the hand gestures.
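The three image operations named above (grayscale conversion, resizing, normalization) can be
illustrated with NumPy alone. This is a sketch under stated assumptions, not the project's code: the
function name preprocess, the 64 x 64 target size, the luma weights, and the nearest-neighbour
resizing are all choices made for the example.

```python
import numpy as np


def preprocess(image, size=(64, 64)):
    """Convert an RGB uint8 image (H, W, 3) to a grayscale float array of shape `size` in [0, 1]."""
    image = np.asarray(image, dtype=np.float64)
    # Grayscale: weighted sum of the colour channels (ITU-R BT.601 luma weights).
    gray = image @ np.array([0.299, 0.587, 0.114])
    # Resize: nearest-neighbour sampling, used here only to keep the sketch dependency-free.
    rows = np.arange(size[0]) * gray.shape[0] // size[0]
    cols = np.arange(size[1]) * gray.shape[1] // size[1]
    resized = gray[rows][:, cols]
    # Normalise: map pixel values from 0..255 onto 0..1.
    return resized / 255.0


example = preprocess(np.full((120, 160, 3), 255, dtype=np.uint8))
print(example.shape)
```

An image library such as OpenCV would normally do the resizing with interpolation, which gives
smoother results than nearest-neighbour sampling.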
Landmark detection using MediaPipe is particularly important as it identifies key points on the hand that
correspond to joints and finger tips. These landmarks provide a detailed representation of the hand's
shape and orientation, which are crucial for accurately distinguishing between different signs.
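The landmark step can be sketched as below. The helper names, the "data.pickle" file name, and the
idea of shifting each hand so that its smallest x and y become zero are assumptions made for
illustration; the report does not state how the coordinates are arranged. MediaPipe's hand solution
returns 21 landmarks per hand, each with x and y attributes, so the feature function accepts any
objects shaped that way.

```python
import pickle


def landmarks_to_features(landmarks):
    """Flatten hand landmarks into [x0, y0, x1, y1, ...], shifted so the minimum x and y are 0.

    `landmarks` is any sequence of objects with `.x` and `.y` attributes.
    """
    xs = [point.x for point in landmarks]
    ys = [point.y for point in landmarks]
    min_x, min_y = min(xs), min(ys)
    features = []
    for x, y in zip(xs, ys):
        features.extend([x - min_x, y - min_y])  # position-independent coordinates
    return features


def extract_from_image(image_path, hands):
    """Return the feature vector for the first detected hand, or None if no hand is found."""
    import cv2  # lazy import: only needed when real images are processed

    image = cv2.imread(str(image_path))
    result = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))  # MediaPipe expects RGB
    if not result.multi_hand_landmarks:
        return None
    return landmarks_to_features(result.multi_hand_landmarks[0].landmark)


def save_dataset(features, labels, path="data.pickle"):
    """Store features and labels together so the training step can reload them."""
    with open(path, "wb") as handle:
        pickle.dump({"data": features, "labels": labels}, handle)
```

Here hands is assumed to be a detector created with
mediapipe.solutions.hands.Hands(static_image_mode=True, min_detection_confidence=0.3); the
confidence threshold is an illustrative value.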
3. Model Training
Various machine learning models (e.g., SVM, Random Forest) from sklearn are trained on the
preprocessed dataset. Once training is complete, a real-time script captures video from a webcam,
detects hand landmarks in each frame, and uses the trained model to recognize hand gestures. It
displays the detected hand with a bounding box and the predicted gesture on the video feed.
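The bounding box and label shown on the video feed can be derived from the landmarks themselves. The
sketch below assumes landmarks with normalised x and y values between 0 and 1 (as MediaPipe provides)
and a 10-pixel margin; the function names bounding_box and annotate are hypothetical, and the webcam
loop and the model's predict call are omitted.

```python
def bounding_box(landmarks, frame_width, frame_height, margin=10):
    """Pixel box (x1, y1, x2, y2) around normalised landmarks, clamped to the frame."""
    xs = [point.x for point in landmarks]
    ys = [point.y for point in landmarks]
    x1 = max(int(min(xs) * frame_width) - margin, 0)
    y1 = max(int(min(ys) * frame_height) - margin, 0)
    x2 = min(int(max(xs) * frame_width) + margin, frame_width - 1)
    y2 = min(int(max(ys) * frame_height) + margin, frame_height - 1)
    return x1, y1, x2, y2


def annotate(frame, box, label):
    """Draw the box and the predicted letter on an OpenCV frame (modified in place)."""
    import cv2  # lazy import: only needed when drawing on real frames

    x1, y1, x2, y2 = box
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 0), 2)
    # Keep the text inside the frame when the hand is near the top edge.
    cv2.putText(frame, label, (x1, max(y1 - 10, 20)), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 0), 2)
    return frame
```

In a live loop, each frame would pass through landmark detection, the classifier's predict method,
and then annotate before being displayed.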
Model training involves selecting appropriate algorithms and tuning their parameters to achieve the
best performance. The dataset is split into training and testing subsets to evaluate the model's accuracy.
Cross-validation techniques are used to ensure that the model generalizes well to new, unseen data.
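The split and cross-validation described above can be sketched with scikit-learn as follows. The data
here is synthetic (generated with make_classification) and stands in for the real landmark dataset;
the 80/20 split, the 5 folds, and the model settings are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the landmark dataset: 300 samples, 42 features
# (21 landmarks x 2 coordinates), 3 classes for the letters A, B and C.
X, y = make_classification(
    n_samples=300, n_features=42, n_informative=10, n_classes=3, random_state=0
)

# Hold out 20% of the data for the final test; stratify keeps the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 5-fold cross-validation on the training portion only.
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5)
    cv_scores[name] = scores
    print(f"{name}: mean cross-validated accuracy {scores.mean():.3f}")
```

Keeping the test set out of cross-validation means the final accuracy is measured on data that played
no part in choosing the model.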
The training process includes several iterations of adjusting model parameters, retraining, and
evaluating performance. Metrics such as accuracy, precision, recall, and F1-score are used to assess the
effectiveness of each model. These metrics provide a comprehensive view of the model's performance,
highlighting areas for improvement.
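The cycle of adjusting parameters, retraining, and evaluating can be automated with scikit-learn's
GridSearchCV. The sketch below uses synthetic data again, and the parameter grid, the macro-averaged
F1 scoring, and the added StandardScaler step are assumptions made for the example rather than the
settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the landmark dataset (3 classes, 42 features).
X, y = make_classification(
    n_samples=300, n_features=42, n_informative=10, n_classes=3, random_state=0
)

# Scaling and the classifier are tuned together so each fold is scaled on its own training part.
pipeline = make_pipeline(StandardScaler(), SVC())

# Candidate values to try; "svc__" addresses the SVC step inside the pipeline.
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1_macro")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated macro F1:", round(search.best_score_, 3))
```

Each of the six parameter combinations is trained and scored on five folds, and the best combination
is refitted on all the data as search.best_estimator_.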
4. Evaluation
Models are evaluated using accuracy, precision, recall, and F1-score to determine the best performing
classifier. Accuracy measures the proportion of all instances that are classified correctly. For a
given letter, precision is the proportion of predictions of that letter that are correct (few false
positives), and recall is the proportion of actual instances of that letter that are found (few false
negatives). The F1-score, the harmonic mean of precision and recall, balances the two in a single
number.
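A small worked example may make these metrics concrete. The ten true and predicted labels below are
invented for illustration and are not results from this project. The metrics are computed by hand
from the confusion matrix and then checked against scikit-learn's own functions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support

labels = ["A", "B", "C"]
# Invented example: 10 signs, 2 of them misclassified.
y_true = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C"]
y_pred = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]

# Rows are true letters, columns are predicted letters.
matrix = confusion_matrix(y_true, y_pred, labels=labels)
true_positives = np.diag(matrix)
false_positives = matrix.sum(axis=0) - true_positives  # predicted as the letter, but wrong
false_negatives = matrix.sum(axis=1) - true_positives  # the letter was shown, but missed

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = true_positives.sum() / matrix.sum()

# The same per-letter values from scikit-learn, for comparison.
sk_precision, sk_recall, sk_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels
)

for letter, p, r, f in zip(labels, precision, recall, f1):
    print(f"{letter}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.2f}")
```

In this example the letter A has perfect precision but lower recall, because one A was mistaken for a
B, which shows why accuracy alone can hide where the errors occur.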
(vii) Findings/Results
The study found that the SVM model outperformed other classifiers, achieving an accuracy of over 95%
in recognizing the letters A, B, and C. Random Forest and other models also performed well but with
slightly lower accuracy rates. These results demonstrate the potential of sklearn-based models for sign
language detection tasks.
The high accuracy achieved by the SVM model suggests that it is particularly well-suited for this type of
classification problem. SVMs are effective in high-dimensional spaces and remain robust even in cases
where the number of dimensions exceeds the number of samples. This makes them a good fit for
image-based classification tasks, where each pixel or, as in this project, each landmark coordinate is
a dimension.
The Random Forest model also showed promising results, indicating that ensemble methods can
effectively handle the variability in hand gestures. Random Forests combine the predictions of multiple
decision trees, reducing the risk of overfitting and improving generalization.
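How a Random Forest combines its trees can be seen directly in scikit-learn, where the forest's class
probabilities are the average of the probabilities from its individual trees. The sketch below uses
synthetic data and 25 trees, both chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the landmark dataset (3 classes, 42 features).
X, y = make_classification(
    n_samples=200, n_features=42, n_informative=10, n_classes=3, random_state=0
)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Each fitted decision tree is available in forest.estimators_.
# Averaging their class probabilities reproduces the forest's own output.
tree_average = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)
forest_output = forest.predict_proba(X)

print("trees in the forest:", len(forest.estimators_))
print("average of trees matches forest:", np.allclose(tree_average, forest_output))
```

Because each tree is trained on a different random sample of the data, their individual mistakes tend
to differ, and averaging smooths them out.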
Overall, the findings highlight the potential of machine learning for sign language recognition and set the
stage for future work involving larger datasets and more complex models.
(ix) Conclusion
The project successfully developed a machine learning model capable of accurately detecting three sign
language letters. This work represents a step towards creating more comprehensive sign language
recognition systems.
The success of this project demonstrates the feasibility of using machine learning for sign language
detection. The high accuracy achieved with a relatively small dataset suggests that with more data and
further refinement, the model can be extended to recognize a wider range of signs. This would
significantly enhance its practical applications, making it a valuable tool for communication and
education.
Future work will focus on several key areas. Expanding the dataset to include more letters and signs will
improve the model's versatility. Enhancing the preprocessing and feature extraction techniques will
further boost accuracy. Additionally, integrating the system into real-time applications, such as mobile
apps or web-based tools, will make it more accessible to users.
(x) Acknowledgements
We would like to thank the community of sign language users who participated in the data collection
process. Their contributions were invaluable in creating a diverse and representative dataset.
Additionally, appreciation is extended to the developers and contributors of the scikit-learn library for
providing the tools necessary for this research. Their work in developing and maintaining this
open-source library has made advanced machine learning techniques accessible to researchers and
practitioners worldwide.
(xi) Citation and Reference
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, É. (2011).
Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
- American Sign Language University. (n.d.). ASL Manual Alphabet. Retrieved from
https://www.lifeprint.com/asl101/pages-layout/manualalphabet.htm
(xii) Interpretations
The high accuracy achieved suggests that machine learning is a viable approach for sign language
recognition, even with a limited dataset. The success of this project indicates that machine learning
models can effectively learn and generalize from visual patterns in sign language gestures, making them
suitable for practical applications.
(xiii) Implications
This system can significantly aid communication for individuals who use sign language, potentially being
integrated into various applications and devices. By providing an automated translation tool, this
technology can enhance interactions in educational settings, workplaces, and social environments,
promoting inclusivity and understanding.
(xiv) Limitations
The study is limited to three letters, and the dataset size is relatively small. Expanding the dataset and
the number of recognized signs is necessary for broader application. Additionally, the model's
performance may be affected by variations in lighting, background, and individual differences in hand
shapes and sizes. Addressing these limitations will be essential for developing a more robust and
generalizable system.
(xv) Recommendations
Future research should focus on increasing the dataset, including more sign language letters and words,
and optimizing the model for real-time detection in practical applications. This includes exploring more
advanced machine learning techniques, such as deep learning, which may offer improved accuracy and
robustness. Additionally, developing user-friendly interfaces and integrating the system into accessible
devices will enhance its usability and impact.