Voice Recognition System Using Machine Learning Techniques
Article info

Article history:
Received 30 March 2021
Accepted 5 April 2021
Available online xxxx

Keywords:
Machine learning
Voice recognition
Naïve Bayes
k-Nearest Neighbor
MFCC

Abstract

Voice is a special biometric that, in addition to being natural to users, offers similar, if not higher, levels of security compared with some traditional biometric systems. The aim of this paper is to detect impostors using various machine learning techniques and to determine which combination works best for speaker recognition and classification. We present several methods of audio preprocessing, such as noise reduction and vocal enhancement, to improve audio recorded in real environments. Mel Frequency Cepstral Coefficients (MFCC) are extracted for each audio, along with their differentials and accelerations, to evaluate machine learning classification methods such as PART, JRip, Naïve Bayes, RT, J48, Random Forest, and k-Nearest Neighbor classifiers. We examine the seven classifiers on two datasets and report the accuracy achieved by each. The random forest and Naïve Bayes algorithms performed best, while the PART algorithm performed worst.

© 2021 Elsevier Ltd. All rights reserved.
Selection and peer-review under responsibility of the scientific committee of the Emerging Trends in Materials Science, Technology and Engineering.
⇑ Corresponding author.
E-mail address: ashraf88ashraf888@gmail.com (A. Tahseen Ali).
https://doi.org/10.1016/j.matpr.2021.04.075
2214-7853/© 2021 Elsevier Ltd. All rights reserved.
Please cite this article as: A. Tahseen Ali, H.S. Abdullah and M.N. Fadhil, Voice recognition system using machine learning techniques, Materials Today: Proceedings, https://doi.org/10.1016/j.matpr.2021.04.075
A. Tahseen Ali, H.S. Abdullah and M.N. Fadhil    Materials Today: Proceedings xxx (xxxx) xxx

Finger Voice can combine what people say and how they say it into two-factor authentication in a single action. Other forms of identification can support biometrics, but voice identification is needed for safe and unique authentication. Personal voice recognition and telephone recognition are two variables that can be combined with voice [1]. Voice recognition systems are inexpensive and easy to use, and in today's smart world voice recognition is crucial in a variety of ways: voice-activated banking, home automation, and voice-activated devices are only a few of its many uses [2]. The process of recognizing a person based on his or her voice signal is known as speaker recognition. Because of variations in the shape of the vocal tract, the size of the larynx, and other parts of the voice production organs, each person's voice may be unique [3]. Since voice recognition must be conducted in a variety of environments, the extracted features must also be robust to background noise and sensor mismatches [4]. Speaker recognition allows the speaker's voice to be used to verify their identity and to control access to services such as voice dialing, telephone banking, database access services, information services, voice mail, security control for sensitive information areas, and remote device access [5].

For sixty years, research in automatic speech recognition by machines has attracted a lot of interest for a variety of reasons, ranging from scientific curiosity about the mechanical realization of human speech abilities to the desire to automate simple tasks that demand human–machine interaction [6]. In this section, some of the previous work related to this research is reviewed.

In 2017, researchers proposed recognition systems implemented using both spectro-temporal features and voice-source features. Classification was performed with two separate approaches, GMM and i-vector, and the accuracy rates of the two speaker recognition systems were compared. The study showed that GMM performs better than i-vectors for short utterances, with an accuracy of 94.33%, and that accuracy improved substantially when concatenated test signals were used [7].

In 2018, researchers proposed a speech recognition system using SVM. Individual words were separated from continuous speech using Voice Activity Detection (VAD), the features of each isolated word were extracted, and the models were trained. Each individual utterance was modelled using an SVM, with MFCCs used as the collection of features describing the audio content. Experimental results showed that the proposed SVM system achieved a speech recognition score of 95% [8].

In 2019, researchers suggested a system for Arabic speaker recognition and identification. It is divided into two phases (training and testing), each of which uses the audio features mean, standard deviation, zero crossing, and amplitude. Following feature extraction, the recognition stage employs J48, KNN, and LVQ, with the k-nearest neighbor (KNN) classifier used for data training and testing and the LVQ neural network for speech recognition and Arabic language identification. The reported recognition rates were 85%, 93%, and 96.4% [9].

In 2020, researchers tested various pre-processing, feature extraction, and machine learning techniques on audio captured in unconstrained, natural settings to see which combination works best for speaker recognition and classification. That work is divided into three parts: audio preprocessing, feature extraction (in which Mel Frequency Cepstral Coefficients (MFCC) are extracted for each audio), and machine learning classification (using the Random Forest algorithm, with its hyperparameters tuned to obtain the best classification rate). The accuracy of the RF classifier reached 84% [10].

3. The proposed system architecture

The proposed biometric system recognizes voices using machine learning. In general, it consists of voice recording, dataset description, pre-processing, feature extraction, classification, and post-processing stages. The proposed system architecture is shown in Fig. 1.

3.1. Database description

Input is a prerequisite for a voice recognition system. WAV and MP3 are the two most common audio formats currently available. WAV files are preferred by most researchers because they span the full spectrum of frequencies audible to the human ear. MP3 files, on the other hand, are compressed and hence do not contain all of the information that a WAV file of the same audio does. Furthermore, feature extraction from these WAV files is critical, since this step serves as the foundation for the machine learning algorithms that classify the data. As a result, WAV files are often used in audio studies. A consistent sampling rate across audio samples is critical to ensure that the extracted coefficients reflect the same underlying calculations. In this work, two datasets were used. The first, Prominent Leaders' Speeches, includes audio clips of five country leaders; the second, the Speaker Recognition Audio Dataset, contains audio clips of fifty persons. Both were downloaded from the Kaggle website, and their details are shown in Table 1.

Table 1
System's Datasets.
Input file data     Name of dataset                File format   Sampling rate   No. of samples
Voice recognition   Prominent leaders' speeches    .wav          16 kHz          7500
Voice recognition   Speaker Recognition            .wav          16 kHz          2226

3.2. Pre-processing

The main advantage of the preprocessing stage is that it organizes the data, making the recognition task easier. Here, all operations applied to the raw audio are referred to as "preprocessing."

3.2.1. Remove noise using Hamming window

Windowing is a method of analyzing long sound signals by selecting a sufficiently representative segment [11]. This process is used to remove noise from a signal polluted by noise present across a wide frequency spectrum:

$$y(n) = x(n)\,w(n), \qquad 0 \le n \le N-1 \qquad [11] \qquad (1)$$

where
y(n): the product of the input signal and the window function;
x(n): the signal to be multiplied by the window function;
w(n): the window function, usually the Hamming window, which has the form

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1$$

and N is the number of samples in the frame.
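The windowing step of Section 3.2.1 can be illustrated with a short pure-Python sketch. It builds the standard Hamming window w(n) = 0.54 − 0.46 cos(2πn/(N − 1)) and multiplies each frame by it sample by sample, as in Eq. (1). The framing helper `frame_signal` is hypothetical, and so are its frame length and hop: for 16 kHz audio, 400-sample frames with a 160-sample hop (25 ms and 10 ms) are a common choice, but the paper does not state the values it used.

```python
import math

def hamming(N):
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)) for 0 <= n <= N-1."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def apply_window(frame):
    """Eq. (1): y(n) = x(n) * w(n), multiplying each sample of the frame by the window."""
    w = hamming(len(frame))
    return [x * wn for x, wn in zip(frame, w)]

def frame_signal(signal, frame_len, hop):
    """Hypothetical helper: split a signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]
```

For example, `[apply_window(f) for f in frame_signal(samples, 400, 160)]` would turn a 16 kHz recording into tapered frames; the window is 0.08 at both ends and 1.0 in the middle, which reduces the discontinuities at frame edges before the frequency transform.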
3.2.2. Smoothing the spectrum of the speech signal using pre-emphasis

The pre-emphasis filter is needed for speech signal processing. It is based on the time-domain input/output relationship expressed in Eq. (2) [12]. The aim of using this filter is to make the spectral shape of the speech signal smoother; it is applied as follows:

$$y(n) = x(n) - a\,x(n-1) \qquad [12] \qquad (2)$$

where a is the pre-emphasis filter constant, usually 0.9 < a < 1.0.

3.2.3. Signal domain transform based on FFT

A function defined over a finite time interval can be expressed as a Fourier series, and the Fourier transform converts a bounded time-domain signal into a frequency spectrum [13]. This process is used to convert each frame from the time domain to the frequency domain using Eq. (3):

$$X[n] = \sum_{k=0}^{N-1} x_k\, e^{-2\pi j k n / N} \qquad [13] \qquad (3)$$

where
x_k: the k-th sample of a frame of N samples;
X[n]: the n-th frequency component produced by the Fourier transform.

3.3. Feature extraction

Feature extraction is the process of calculating a collection of feature vectors that provides a compact representation of a particular speech signal.

3.3.1. Apply Mel-Frequency Cepstral Coefficients (MFCC)

MFCC is a method modelled on human hearing, whose frequency resolution is roughly linear below 1 kHz and logarithmic above it. The MFCC system focuses on the frequency differences that the human ear can detect [14]. Twelve cepstral coefficients were selected, which increases the complexity of the proposed voice system.

3.3.2. Apply vector quantization

Quantization is an unavoidable phase in the digital representation of signals for computer processing [15]. Here, it was used to convert the matrix produced by MFCC into a one-row matrix before combining it with the output matrices of the other tools (Fig. 2).

3.4. Proposed system classifiers

Machine learning classifiers, together with feature extraction techniques, are critical to the overall effectiveness of the speaker recognition model. This is a classification problem, since we want to classify audio recordings and determine who is speaking in them. The following supervised machine learning classification algorithms were therefore used.

3.4.1. PART

PART is a separate-and-conquer rule learner. The algorithm generates "decision lists," which are ordered sets of rules [16]. A new instance is compared with the rules in the list in order and is assigned to the class of the first rule it matches.

3.4.2. JRip

JRip is one of the most common and widely used rule-learning algorithms. Classes are examined in increasing order of size, and an initial set of rules for each class is generated using incremental reduced-error pruning [17]. The algorithm takes all examples of a given class in the training data and searches for a set of rules that covers all members of that class. It then moves on to the next class and repeats the process until all classes have been examined.

3.4.3. Naïve Bayes (NB)

The Naïve Bayes classifier is a straightforward probabilistic classifier based on Bayes' theorem, shown in Eq. (4), with the strong assumption that all features are conditionally independent given the class [18]. The NB classifier combines the prior probability of each class with the likelihood of the features to obtain the posterior probability that the feature vector belongs to that class, and the predicted class is the one with the highest posterior probability. This classifier predicts the class of test data quickly and accurately, and it also performs well in multiclass prediction.

$$p(c \mid x) = \frac{p(x \mid c)\,p(c)}{p(x)} \qquad [18] \qquad (4)$$

where
p(c | x): the posterior probability of class c (target) given predictor x (attributes);
p(c): the prior probability of the class;
p(x | c): the likelihood, i.e., the probability of the predictor given the class;
p(x): the prior probability of the predictor.

3.4.4. REP tree (RT)

REPTree combines a decision tree (DT) learner with the reduced error pruning (REP) algorithm and is equally effective for classification and regression problems [19]. The algorithm uses information gain to construct a decision tree and prunes it using reduced-error pruning. Since complex decision trees can lead to overfitting and reduced model interpretability, REP reduces complexity by eliminating leaves and branches from the DT structure.
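The pre-emphasis filter of Eq. (2) and the frame transform of Eq. (3) (Sections 3.2.2 and 3.2.3) can be sketched in pure Python as follows. The constant a = 0.97 is an assumed value inside the 0.9 < a < 1.0 range given in the text, and the transform is evaluated directly from Eq. (3) for readability; a practical system would use an FFT routine such as `numpy.fft.rfft` instead. The helper `power_spectrum` is a hypothetical addition that keeps the non-negative frequency bins a mel filter bank would consume.

```python
import cmath

def pre_emphasis(x, a=0.97):
    """Eq. (2): y(n) = x(n) - a*x(n-1). The first sample has no predecessor and is kept as-is."""
    if not x:
        return []
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]

def dft(frame):
    """Eq. (3): X[n] = sum_{k=0}^{N-1} x_k * exp(-2*pi*j*k*n/N), evaluated directly in O(N^2)."""
    N = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

def power_spectrum(frame):
    """|X[n]|^2 / N for the non-negative frequency bins n = 0..N//2."""
    N = len(frame)
    return [abs(X) ** 2 / N for X in dft(frame)[: N // 2 + 1]]
```

Because pre-emphasis subtracts most of the previous sample, slowly varying (low-frequency) content is attenuated relative to fast changes, which flattens the typical downward spectral tilt of voiced speech.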
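Section 3.3.1 does not spell out the MFCC computation, so the following is a minimal sketch of the usual pipeline under stated assumptions, not the authors' implementation: a power spectrum is passed through triangular filters spaced evenly on the mel scale, the log filter energies are taken, and a DCT-II keeps the first 12 coefficients (the number used in this paper). The mel formula 2595·log10(1 + f/700), the 26 filters and the 512-sample frame are assumptions; the 16 kHz rate matches Table 1. The input `power` is assumed to hold the 257 non-negative-frequency values |X[n]|²/N from Eq. (3). The differentials and accelerations mentioned in the abstract are not shown.

```python
import math

def hz_to_mel(f):
    # One common mel-scale formula; other variants exist
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale over FFT bins 0..n_fft//2."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for b in range(left, centre):          # rising edge of the triangle
            row[b] = (b - left) / (centre - left)
        for b in range(centre, right):         # falling edge, peak 1.0 at the centre bin
            row[b] = (right - b) / (right - centre)
        bank.append(row)
    return bank

def mfcc_from_power(power, bank, n_ceps=12):
    """Log mel filter-bank energies followed by an (unnormalised) DCT-II."""
    energies = [max(sum(p * w for p, w in zip(power, row)), 1e-10) for row in bank]
    logs = [math.log(e) for e in energies]
    M = len(logs)
    return [sum(logs[m] * math.cos(math.pi * k * (m + 0.5) / M) for m in range(M))
            for k in range(n_ceps)]
```

The filter bank depends only on the frame length and sampling rate, so in practice it would be built once with `mel_filterbank(26, 512, 16000)` and reused for every frame.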
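The way a decision list is applied (Section 3.4.1) can be shown with a small sketch. The rules and names below (`mfcc_1`, `mfcc_2`, `speaker_A`, and so on) are hypothetical and hand-written; PART would learn such rules from the training data, which this sketch does not attempt.

```python
from typing import Callable, Dict, List, Tuple

# A rule is a (condition, class label) pair; the condition tests one instance.
Rule = Tuple[Callable[[Dict[str, float]], bool], str]

def classify_with_decision_list(rules: List[Rule], default: str,
                                instance: Dict[str, float]) -> str:
    """Apply the rules in order: the first rule whose condition holds decides the class."""
    for condition, label in rules:
        if condition(instance):
            return label
    return default  # no rule fired, so fall back to the default class

# Hypothetical hand-written rules over hypothetical MFCC-derived features.
RULES: List[Rule] = [
    (lambda x: x["mfcc_1"] > 5.0 and x["mfcc_2"] <= 0.0, "speaker_A"),
    (lambda x: x["mfcc_1"] > 5.0, "speaker_B"),
]
```

An instance with mfcc_1 = 6.0 and mfcc_2 = −1.0 satisfies both rules but is labelled speaker_A, because the list is ordered and evaluation stops at the first match.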
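Eq. (4) can be turned into a classifier as sketched below, assuming (this is not stated in the paper) that each continuous feature follows a normal distribution within each class. Because p(x) in Eq. (4) is the same for every class, the sketch compares only log p(c) + Σ log p(x_i | c); summing per-feature terms is exactly the conditional independence assumption. The class name `GaussianNB` and the variance floor are illustrative choices.

```python
import math
from collections import defaultdict

class GaussianNB:
    """Naive Bayes with an assumed normal likelihood for each feature within each class."""

    def fit(self, X, y, var_floor=1e-9):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.stats = {}
        for label, rows in groups.items():
            prior = len(rows) / len(X)                      # p(c)
            cols = list(zip(*rows))
            means = [sum(col) / len(col) for col in cols]
            # Per-feature variance, floored so a constant feature cannot divide by zero
            variances = [max(sum((v - m) ** 2 for v in col) / len(col), var_floor)
                         for col, m in zip(cols, means)]
            self.stats[label] = (prior, means, variances)
        return self

    def log_joint(self, x, label):
        """log p(c) + sum_i log p(x_i | c): the numerator of Eq. (4) in log space."""
        prior, means, variances = self.stats[label]
        total = math.log(prior)
        for xi, m, v in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        return total

    def predict(self, x):
        # p(x) is identical for every class, so it does not change the argmax
        return max(self.stats, key=lambda label: self.log_joint(x, label))
```

Working in log space avoids the numerical underflow that multiplying many small per-feature probabilities would cause.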
3.4.5. J48

J48 supports missing values, rule derivation, continuous attribute value ranges, and decision tree pruning, among other features. Where possible, pruning can be used to counter overfitting and improve precision [20]. The algorithm creates rules that give each class in the data a distinct identity. The aim of using this classifier is to refine the decision tree gradually until it achieves a balance between flexibility and accuracy.
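The split criteria behind the tree learners of Sections 3.4.4 and 3.4.5 can be sketched for a categorical split: REPTree grows its tree using information gain, while C4.5, which J48 implements, uses the gain ratio (information gain divided by the split information). This pure-Python sketch computes both for one candidate split; the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy at the node minus the size-weighted entropy of the branches."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups if g)

def gain_ratio(labels, groups):
    """Information gain divided by the split information of the branch sizes."""
    n = len(labels)
    split_info = -sum((len(g) / n) * math.log2(len(g) / n) for g in groups if g)
    return information_gain(labels, groups) / split_info if split_info > 0 else 0.0
```

Dividing by the split information penalizes attributes that fragment the data into many small branches, which plain information gain tends to favor.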
3.4.6. Random forest

Each tree in the collection is created by choosing, at each node, a small random subset of features to split on, and then determining the best split based on these features in the training set. Each tree casts a vote for a given feature vector, and the forest chooses the class with the most votes. The algorithm is simple to build and to use for prediction, runs quickly on large datasets, easily estimates missing data, and retains accuracy even when a large percentage of the data is missing.

3.4.7. k-nearest neighbor

KNN is a supervised learning method that allows machines to categorize objects, problems, or circumstances based on data that has already been fed into them [22]. k is a user-defined constant in the classification process, and an unlabeled vector (a query or test point) is classified by assigning the label that appears most frequently among the k training samples closest to that query point. Euclidean distance was used as the distance metric for continuous variables, calculated according to Eq. (5):

$$d(x, y) = \sqrt{\sum_{i=1}^{n} \left(a_i(x) - a_i(y)\right)^2} \qquad [22] \qquad (5)$$

where a_i(x) is the value of the i-th feature of x and n is the number of features.

4. The proposed system implementation

This system was implemented in two phases, as follows:

4.1. Training phase

The first step of the proposed system is training on the two datasets using cross-validation, where the largest part of the data enters the training phase and the remaining data are passed to the testing phase (Section 4.2). The dataset is preprocessed using a Hamming window, after which the features are extracted and modeled using MFCC and VQ, and the values of these features are combined with the other extracted features to prepare for the classification algorithms. The combined features are saved as reference models during the training process. These models are then compared with the incoming speech signals.

4.2. Testing phase

The testing phase is the second phase of the proposed system. As mentioned above, the remaining data are tested after applying the same pre-processing steps that were applied to the data in the training phase. The proposed system architecture is shown in Algorithm (1).
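The k-nearest neighbor rule of Section 3.4.7, with the Euclidean distance of Eq. (5), can be sketched in pure Python as follows. The value k = 3 and the toy feature vectors in the usage are assumptions for illustration; the paper does not report the k it used.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Eq. (5): d(x, y) = sqrt(sum_i (a_i(x) - a_i(y))^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train_X, train_y, query, k=3):
    # Sort training samples by distance to the query and keep the k nearest
    nearest = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], query))[:k]
    # Majority vote; on a tie, Counter.most_common keeps the label met first (the nearer one)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With three training vectors near the origin labelled "A" and three near (5, 5) labelled "B", a query at (0.5, 0.5) is labelled "A". Sorting the whole training set costs O(m log m) per query for m training samples, which is acceptable for a sketch but is why larger systems use indexing structures.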
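The cross-validation split described in Section 4.1 can be sketched as a generator of train/test index lists. The number of folds (k = 10 by default here) is an assumption, since the paper does not state it, and the samples are assumed to have been shuffled beforehand so that contiguous folds are not ordered by speaker.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) lists for k contiguous folds whose sizes differ by at most one."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Every sample outside the current fold is used for training
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size
```

Each sample appears in exactly one test fold, so averaging the per-fold scores uses every recording once for testing while the larger remainder is always used for training.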
Table 2
Results of Machine Learning Classifiers.

For evaluating a model's performance, certain parameters are used to determine its behavior. The results are influenced by the size of the training data, the quality of the audio files, and, most importantly, the type of machine learning algorithm used. The following criteria are used to assess the models' efficacy [23]:

• Accuracy: the percentage of examples correctly classified out of all given examples. It is calculated as:

$$\text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \qquad [23] \qquad (6)$$

• Precision: the percentage of true class-x instances among all instances classified as class x. It is calculated as:

$$\text{Precision} = \frac{tp}{tp + fp} \qquad [23] \qquad (7)$$

• Specificity: measures the ability of a test to be negative when the condition is actually absent. It is also known as the true negative rate; its complement, 1 − specificity, is the false-positive rate (the Type I or α error rate, also called the error of commission).

$$\text{Specificity} = \frac{TN}{TN + FP} \times 100\% \qquad [23] \qquad (11)$$

• Kappa: Cohen's kappa coefficient can be used to evaluate agreement between two nominal classifications. When Cohen's kappa is used to quantify agreement between the classifications, the distances between all categories are considered identical, which makes sense if all nominal categories reflect different kinds of "presence" [24]. The kappa coefficient is defined as:

$$K = \frac{O - E}{1 - E} \qquad [24] \qquad (12)$$

where O is the observed proportion of agreement and E is the proportion of agreement expected by chance.
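The criteria of Eqs. (6), (7), (11) and (12) can be computed from confusion-matrix counts as sketched below. The example matrix in the usage note is invented for illustration only and is not a result from this paper. `cohen_kappa` takes O as the observed agreement (the diagonal proportion) and E as the chance agreement obtained from the row and column totals; it is undefined when E = 1.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)          # Eq. (6)

def precision(tp, fp):
    return tp / (tp + fp)                           # Eq. (7)

def specificity(tn, fp):
    return 100.0 * tn / (tn + fp)                   # Eq. (11), in percent

def cohen_kappa(matrix):
    """Eq. (12) for a square confusion matrix (rows: actual class, columns: predicted class)."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total                 # O
    row_tot = [sum(matrix[i]) for i in range(k)]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2  # E
    return (observed - expected) / (1 - expected)
```

For the made-up two-class matrix [[20, 5], [10, 15]], treating the first class as positive gives tp = 20, fn = 5, fp = 10 and tn = 15, so accuracy is 0.7, specificity is 60%, and kappa is (0.7 − 0.5)/(1 − 0.5) = 0.4: the classifier agrees with the true labels noticeably more often than chance, but far from perfectly.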
Fig. 6. Error Rate measured for Classifiers.

lowest error rate among the other classifiers. The second dataset contains 50 persons and 1500 samples for each of them. With this dataset, the RF classifier gives the best accuracy, precision, recall, and F-measure, while the KNN classifier gives the worst. The error rates show the opposite pattern: the KNN classifier produced the highest error rate and RF the lowest among the classifiers (Table 3).

7. Results comparison

Table 3
Results Comparison.
8. Conclusions

Audio preprocessing, feature extraction, and machine learning classification are the three main components of this study. Since the audio used in our study was not captured in controlled environments, audio pre-processing was a critical component of the study. Reducing ambient noise and emphasizing human vocals were the two most critical aspects we focused on in pre-processing. We considered that relying solely on the MFCC coefficients would be sufficient for the analysis. Two datasets and seven machine learning algorithms were used. Our findings showed that using a machine learning classifier in the classification process increased accuracy, with the RF classifier reaching 97.9% accuracy. This is higher than the accuracy reported in the previous work, although the datasets tested differed.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] M. Algabri, H. Mathkour, M.A. Bencherif, M. Alsulaiman, M.A. Mekhtiche,
[14] Z. Aldeneh, E. Provost, Using regional saliency for speech emotion recognition, in: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2017, pp. 2741–2745.
[15] Ranjodh Singh, Hemant Yadav, Mohit Sharma, Sandeep Gosain, Rajiv Ratn Shah, Automatic Speech Recognition for Real Time Systems, IEEE, 2019, pp. 189–198.
[16] Amandeep Kaur Sandhu, Ranbir Singh Batth, Software reuse analytics using integrated random forest and gradient boosting machine learning algorithm, Software: Practice and Experience 51 (4) (2021) 735–747.
[17] Parashar Dhakal, Praveen Damacharla, Ahmad Y. Javaid, Vijay Devabhaktuni, Detection and Identification of Background Sounds to Improvise Voice Interface in Critical Environments, IEEE, 2018, pp. 078–083.
[18] Parashar Dhakal, Praveen Damacharla, Ahmad Y. Javaid, Vijay Devabhaktuni, A near real-time automatic speaker recognition architecture for voice-based user interface, Machine Learning and Knowledge Extraction 1 (1) (2019) 504–520.
[19] Samira Hazmoune, Fateh Bougamouza, Smaine Mazouzi, Mohamed Benmohammed, A new hybrid framework based on Hidden Markov models and K-nearest neighbors for speech recognition, Int. J. Speech Technol. 21 (3) (2018) 689–704.
[20] Peng Song, Wenming Zheng, Feature selection based transfer subspace learning for speech emotion recognition, IEEE Trans. Affective Comput. 11 (3) (2018) 373–382.
[21] Joeky T. Senders, Mark M. Zaki, Aditya V. Karhade, Bliss Chang, William B. Gormley, Marike L. Broekman, Timothy R. Smith, Omar Arnaout, An introduction and overview of machine learning in neurosurgical care, Acta Neurochir. 160 (1) (2018) 29–38.
[22] Mohammad Gohari, Amir Mohammad Eydi, Modelling of shaft unbalance: Modelling a multi discs rotor using K-Nearest Neighbor and Decision Tree Algorithms, Measurement 151 (2020) 107253.
[23] Ali Hamid Meftah, Yousef Ajami Alotaibi, Sid-Ahmed Selouani, Evaluation of an Arabic speech corpus of emotions: A perceptual and statistical analysis, IEEE Access 6 (2018) 72845–72861.
     Automatic speaker recognition for mobile forensic applications, Mobile              [24] Matthijs J. Warrens, Kappa coefficients for dichotomous-nominal
     Information Systems 2017 (2017).                                                         classifications, Adv. Data Anal. Classif. (2020) 1–16.
 [2] Anggraeni, D., W. S. M. Sanjaya, M. Y. S. Nurasyidiek, and M. Munawwaroh.
     ‘‘the implementation of speech recognition using mel-frequency cepstrum             Further reading
     coefficients (MFCC) and support vector machine (SVM) method based on
     python to control robot arm.” In IOP Conference Series: Materials Science and
                                                                                          [25] B. ALhayani, and H. Ilhan, ‘‘Efficient cooperative imge transmission in one-
     Engineering, vol. 288, no. 1, p. 012042. IOP Publishing, 2018.
                                                                                                Way mult-hop sensor network,” International Journal of Electrical
 [3] Hasan, Md Al Mehedi, and Shamim Ahmad. ‘‘PredSucc-site: Lysine
                                                                                                Engineering Education, vol.57, no.2, 321-339. 2020.
     succinylation sites prediction in proteins by using support vector machine
                                                                                          [26] E. Milind, Rane, Umesh S Bhadade, ‘‘Comparative Study of ROI Extraction of
     and resolving data imbalance issue.” International Journal of Computer
                                                                                                Palmprint”, IJCSN International Journal of Computer Science and Network 5
     Applications 182, no. 15 (2018): 8887.
                                                                                                (2) (April 2016).
 [4] ELLaban, Hend Ab, Ahmed A. Ewees, and Abdelrazek E. Elsaeed. ‘‘A real-time
                                                                                          [27] Milind. Rane and Umesh. Bhadade, ‘‘ Multimodal score level fusion for
     system for facial expression recognition using support vector machines and k-
                                                                                                recognition using face and palmprint”, The International Journal of Electrical
     nearest     neighbor    classifier.” International  Journal    of   Computer
                                                                                                Engineering & Education, PP1-19, 2020
     Applications 159, no. 8 (2017): 23-29.
                                                                                          [28] Milind Rane, Tejas Latne, Umesh Bhadade, Biometric Recognition Using
 [5] Saleh Khawatreh, Belal Ayyoub, Ashraf Abu-Ein, Ziad Alqadi, A Novel
                                                                                                Fusion, ICDSMLA 1320–1329 (2019) 2019.
     Methodology to Extract Voice Signal Features, International Journal of
                                                                                          [29] Alhayani, B.S.A., llhan, H. Visual sensor intelligent module based image
     Computer Applications 975 (2018) 8887.
                                                                                                transmission in industrial manufacturing for monitoring and manipulation
 [6] Tayseer MF Taha, Ahsan Adeel, Amir Hussain, A survey on techniques for
                                                                                                problems. J Intell Manuf 32, 597–610 (2021). 10.1007/s10845-020-01590-1
     enhancing speech, International Journal of Computer Applications 179 (17)
                                                                                          [30] B. Alhayani, A.A. Abdallah, Manufacturing intelligent Corvus corone module
     (2018) 1–14.
                                                                                                for a secured two way image transmission under WSN, Engineering
 [7] Suma Paulose, Dominic Mathew, Abraham Thomas, Performance evaluation of
                                                                                                Computations 37 (9) (2020) 1–17.
     different modeling methods and classifiers with MFCC and IHC features for
                                                                                          [31] B. ALhayani and H. Ilhan, ‘‘Image transmission over decode and forward
     speaker recognition, Procedia Comput. Sci. 115 (2017) 55–62.
                                                                                                based cooperative wireless multimedia sensor networks for Rayleigh fading
 [8] R. Thiruvengatanadhan, Speech Recognition using SVM, International Research
                                                                                                channels in medical internet of things (MIoT) for remote health-care and
     Journal of Engineering and Technology (IRJET) 5, no. 09 (2018).
                                                                                                health communication monitoring,” Journal of Medical Imaging And Health
 [9] Nassren A. Alwahed, Talib M. Jawad, ARABIC SPEECH RECOGNITION BASED ON
                                                                                                Informatics, vol. 10, pp. 160-168.2020
     KNN, J48, AND LVQ, Iraqi Journal of Information & Communications
                                                                                          [32] B. .Alhayani and Milind Rane,”face recognition system by image processing”
     Technology 2 (2) (2019) 1–8.
                                                                                                International journal of electronics and communication engineering &
[10] M. Subba, G. Lakshmi, P. Gowri and K. Chowdary ‘‘RANDOM FOREST BASED
                                                                                                technology (IJCIET),vol.5, no.5, 80–90. 2014.
     AUTOMATIC SPEAKER RECOGNITION SYSTEM.” The International Journal of
                                                                                          [33] Bilal Alhayani, Husam Jasim Mohammed, Ibrahim Zeghaiton Chaloob, Jehan
     analytical and experimental modal analysis, pp: 526- 535, April/2020.
                                                                                                Saleh Ahmed, Effectiveness of artificial intelligence techniques against cyber
[11] Nidaa F. Hassan, Sarah Qusay Selah Alden, Gender classification based on
                                                                                                security risks apply of IT industry, Mater. Today:. Proc. (2021), https://doi.
     audio features, Journal of Al-Ma’moon College 31 (2018).
                                                                                                org/10.1016/j.matpr.2021.02.531.
[12] Sullivan, Michael. ‘‘Global markets and technologies for voice recognition.”
                                                                                         [34] Bilal Alhayani, Sara Taher Abbas, Dawood Zahi Khutar, Husam Jasim
     Information Technology Market Research Reports in BCC Research (2017).
                                                                                              Mohammed, Best ways computation intelligent of face cyber attacks,
[13] Seyedmahdad Mirsamadi, Emad Barsoum, Ch.a. Zhang, in: Automatic speech
                                                                                              MaterialsToday          Proceedings        (2021),       https://doi.org/10.1016/
     emotion recognition using recurrent neural networks with local attention,
                                                                                              j.matpr.2021.02.557.
     IEEE, 2017, pp. 2227–2231.