Published By:
Retrieval Number: C5874098319/2019©BEIESP Blue Eyes Intelligence Engineering
DOI:10.35940/ijrte.C5874.098319 6259 & Sciences Publication
Speech Recognition of Isolated Words using a New Speech Database in Sylheti
International Journal of Recent Technology and Engineering (IJRTE)
ISSN: 2277-3878, Volume-8 Issue-3, September 2019
recognition rates. As discussed in Section 1, Sylheti is an unexplored and under-resourced language from both the linguistic and the technological points of view. Speech recognition for Sylheti has not been considered yet, except for an ASR system reported for isolated digits 0 to 9 pronounced in Sylheti [36] as a part of our initiative. Further, there is no speech database available for the Sylheti language in electronic form for applications in speech or speaker recognition. Considering the above observations, we concentrate here on two aspects:

To construct a new speech database of a small vocabulary of isolated Sylheti words. In doing so, the possible future use of the database in an ASR environment is also considered. The Sylheti words (except the digits) are chosen based on the phonetic studies made in [31],[32] such that the words are phonetically rich.

To design ASR systems for isolated Sylheti words using FFNN and RNN types of neural networks.

The following section presents the proposed Sylheti speech database for isolated words.

III. CONSTRUCTION OF NEW SYLHETI SPEECH DATABASE

A speech database (or corpus) is a collection of utterances for a particular language and is an important resource for building a speech recognizer. The samples in such a database are used for training and testing an ASR system. In constructing a speech database, a phonetic/linguistic discussion of the language is relevant. As Sylheti is under-resourced, a study based on its phonetic/linguistic characteristics is also carried out here, and accordingly a brief comparison of the phonemic structure of Sylheti with two major languages, English and Standard Colloquial Bangla (SCB), is presented below. Thereafter, the work on constructing a new database in Sylheti is discussed. Although the ASR systems proposed here do not involve phoneme recognition, the aim in the following is to construct a standard speech database in Sylheti from phonetically rich words.

Each speech utterance is represented by some finite set of symbols known as phonemes, which describe the basic units of speech sound [52]. The phonemic status of a sound is not the same across languages, and the number of phonemes varies from one language to another. The phoneme inventory of Sylheti presented in [32] shows that Sylheti has some specific phonemes which are not present in SCB or in a major language like English. This phonetic study also reveals a significant reduction and reconstruction compared to SCB. Sylheti also exhibits obstruent weakening, which involves de-aspiration, spirantization and deaffrication. Altogether, Sylheti has 22 phonemes as shown in Table 1, of which 5 are vowels and 17 are consonants [32]. On the other hand, SCB has 37 phonemes, aggregated from 7 vowel and 30 consonant phonemes [52]. The five vowel phonemes /i/, /e/, /a/, /u/ and /ɔ/ in Sylheti are common to Bangla. The two other vowel phonemes of Bangla, /o/ and /æ/, are merged with the vowel phonemes /u/ and /e/ respectively in Sylheti due to restructuring in articulation [32]. Again, of the 17 consonant phonemes in Sylheti, 13 are also present in SCB [52]: /b/, /t/, /ɡ/, /m/, /n/, / /, /s/, /h/, /r/, /l/, / /, /t/, /d/. The other 4 consonant phonemes, /z/, /x/, /ɖ/ and /Φ/, are specific to the Sylheti language. The 17 consonant phonemes /p/, /ph/, /bh/, /th/, /d/, /dh/, /t/, /dh/, /c/, /ch/, /k/, /kh/, /ɡh/, /w/, /j/, /Ɉ/, /Ɉh/ available in SCB are not present in Sylheti.

Table 1: Sylheti Phonemes
Vowel phonemes: /i/, /e/, /a/, /u/, /ɔ/
Consonant phonemes: /b/, /t/, /ɡ/, /m/, /n/, / /, /s/, /h/, /r/, /l/, / /, /t/, /d/, /z/, /x/, /ɖ/, /Φ/

When Sylheti is compared with English, it is observed that the English language has a total of 12 vowel phonemes [52]. All 5 vowel phonemes of Sylheti are also present in English; therefore, the remaining 7 vowel phonemes of English (/ə/, /æ/, /I/, /ɒ/, /ɜ/, /ʌ/, /ʊ/) are specific to that language. On the other hand, the 12 consonant phonemes /b/, /t/, /ɡ/, /m/, /n/, / /, /s/, /h/, /r/, /l/, / /, /z/ of Sylheti [32] are also present in English [52]. Hence, Sylheti has 5 language-specific consonant phonemes, /t/, /d/, /x/, /ɖ/ and /Φ/, when compared with English. The English language has 12 specific consonant phonemes: /p/, /d/, /ɵ/, /k/, /f/, /v/, /Ʒ/, /t /, /dƷ/, /w/, /j/, /ð/. Therefore, there is enough scope to study the Sylheti language from the linguistic point of view, and [32],[41] may be consulted for details in this regard. This also implies that Sylheti merits study from the technological viewpoint. This phonetic study will facilitate a database in transcribed form, which may be used in studying phoneme-based speech recognition and speaker recognition problems in Sylheti. The construction of a Sylheti speech database of a small vocabulary of isolated words is presented in the following.

In constructing this new Sylheti speech database of isolated words, the 30 most frequently used monosyllabic words, chosen to be phonetically rich, are considered. Of these words, 10 are the utterances of the digits 0-9 in Sylheti and 20 are other Sylheti words, a few of which are taken from [32]. Table 2 lists the isolated words in Sylheti for the proposed database and shows the meaning in English of each word.

Table 2: Isolated Sylheti words in the proposed database

[suinjɔ] Zero        [ex] One
[dui] Two            [tin] Three
[sair] Four          [Фas] Five
[sɔy] Six            [sat] Seven
[aʈ] Eight           [nɔy] Nine
[ an] Donate         [ an] Paddy
[pua] Boy            [puri] Girl
[ ud] Milk           [bari] Home
[pul] Flower         [bari] Heavy
[bala] Good          [bala] Bracelet
[jamai] Husband      [bɔu] Wife
[ba ] Boiled Rice    [ba ] Arthritis
[ma a] Head          [mux] Face
[ɡa] Body            [ɡa] Wound
[ɡai] Stroke         [ɡai] Cow

The construction of a speech database primarily involves speech acquisition and labeling [53]. The speech utterances may be acquired from read-out speech or from spontaneous speech [53],[54].
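Returning briefly to the inventory comparison above: it is essentially set arithmetic, and the vowel counts can be checked directly. The symbols below are transcribed from the text; this is only an illustrative sketch, not part of the phonetic analysis in [32] or [52].

```python
# Vowel comparison from the text: all 5 Sylheti vowels occur in English,
# and English has 7 vowels of its own, for the reported total of 12.
sylheti_vowels = {"i", "e", "a", "u", "ɔ"}
english_specific = {"ə", "æ", "I", "ɒ", "ɜ", "ʌ", "ʊ"}
english_vowels = sylheti_vowels | english_specific   # union of the two sets

assert sylheti_vowels <= english_vowels              # Sylheti vowels are a subset
assert len(english_vowels) == 12                     # total reported in [52]
print(len(english_vowels - sylheti_vowels))          # -> 7 English-specific vowels
```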
Both of these approaches have their intrinsic advantages and limitations [53]. As this is the first attempt to construct a Sylheti speech database, read-out speech is chosen here. Speakers are therefore asked to read out the Sylheti words shown in Table 2, and the utterances are captured. The following hardware and software are used for recording:

Microphone: iBall Rocky unidirectional microphone (frequency range 20 Hz to 20 kHz)
Laptop: ASUS laptop with Intel Core i3 processor and 2 GB RAM
Operating system: Windows 7
Voice recording software: PRAAT (version praat5367_win64)

Further, the following parameters are set during recording:

Sampling rate: 16 kHz
Channel mode: Mono
Environment: Noise-free closed room
Encoding format: WAV with 16-bit PCM
Distance of microphone from speaker's mouth: 10-12 cm

This speech database consists of data recorded from 10 native speakers, 8 male and 2 female, who were willing to contribute to the construction of the database. None of these speakers has any history of speech disorders. As there is no specific rule about the male-female proportion in constructing a speech database, the literature [51],[56],[58] reports various proportions such as 60%-40%, 70%-30% and 65%-35%. The speakers in this work are chosen from Sylheti-speaking areas in the Karimganj district of the state of Assam and the Kailasahar and Kumarghat districts of the state of Tripura, India, where they have lived since childhood. The ages of the participating speakers range from 25 to 70 years: six speakers are in the age group 25 to 45 years and hold a graduate degree, while the other four are in the age group 46 to 70 years and are undergraduates. By choosing speakers in different age groups, variations in speech characteristics with age are taken care of [55]. Apart from Sylheti, the speakers can also speak English and Hindi.

All the speakers are asked to utter (read out) each of the 30 Sylheti words in Table 2 ten times. The samples are recorded and stored according to the naming convention speakernumber_age_gender_utteredword_utterancenumber.wav. Thereby, a total of 300 utterances are recorded for each speaker, yielding a speech database containing 3000 speech samples of isolated Sylheti words. The total duration of recording for this new Sylheti speech database is approximately 5 hours. In the labeling process [53], the recorded utterances are verified by carefully listening for the target words, and the recorded samples are examined for any irregular noise or quiet segments. For each recorded utterance, the voiced part is manually extracted by selecting its beginning and end points, and the unwanted silent parts are removed. This is done using the PRAAT software. The labeling exercise is rechecked by another verifier to confirm that only the voiced parts are retained from the recorded utterances in the final database.

The following section presents two ASR systems for recognizing isolated Sylheti words taken from the above-presented Sylheti speech database, and also analyzes the performance of the proposed systems.

IV. PROPOSED SPEECH RECOGNITION SYSTEM FOR SYLHETI LANGUAGE

It is observed in Section 2 that many ASR systems [4],[6],[7],[8],[42],[49] developed in recent times for "well-resourced" as well as "under-resourced" languages use MFCC features and neural network classifiers, due to their distinct characteristics as presented in Section 1. In view of the above, an ASR architecture which employs MFCC features and an ANN classifier, as shown in Figure 3, is proposed in this work. We use two different types of ANN classifier to derive two ASR systems for recognizing isolated words in Sylheti.

Figure 3. Architecture for ASR system employing MFCC features and ANN classifier

The function of each block in Figure 3 is described in the following.

A. Signal Pre-processing

Signal pre-processing involves analog-to-digital (A/D) conversion, end point detection, pre-emphasis filtering and windowing. In A/D conversion, the input speech signal is sampled at 8 kHz and quantized with 16 bits/sample to derive a digital signal. The voiced part of the speech signal is extracted from the digital signal by locating the beginning and end points of the utterance (end point detection). One popular method to extract the voiced part is to compute the zero-crossing rate: the rate at which the speech signal crosses the zero-amplitude line by a transition from a positive to a negative value, or vice versa. The voiced part exhibits a low zero-crossing rate. Another method to extract the voiced part is short-time signal energy. After extraction of the voiced part, a pre-emphasis filter is used to emphasize its high-frequency components; this helps either to separate the signal from noise or to restore a distorted signal. Here, a first-order high-pass finite impulse response (FIR) filter is applied to spectrally flatten the signal. The authors consider the following FIR filter for pre-emphasis [5]:

x_p[n] = x_v[n] - 0.95 x_v[n-1]    (3)

where x_v[n] is the input signal (voiced part) to the pre-emphasis filter and x_p[n] is the output.

Due to its time-varying nature, the speech signal is divided into short segments (of duration ranging from 5 to 100 ms) called frames [5]. The frames are assumed to be stationary, and speech analysis is carried out on the frames.
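The pre-processing chain just described (zero-crossing rate, the pre-emphasis filter of Eq. (3), and framing) can be sketched as below. This is a minimal illustration, not the authors' implementation; frame length and shift values are assumptions for the example.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.
    A low value over a segment suggests voiced speech (end point detection)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / max(len(frame) - 1, 1)

def pre_emphasis(x, alpha=0.95):
    """Eq. (3): x_p[n] = x_v[n] - 0.95 * x_v[n-1]; the first sample is passed through."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def frame_signal(x, frame_len, frame_shift):
    """Split the signal into fixed-length, possibly overlapping frames."""
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, frame_shift)]
```

At the 8 kHz analysis rate mentioned above, a 25 ms frame with a 10 ms shift (common choices, not stated in the text) would correspond to frame_signal(x, 200, 80).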
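The architecture of Figure 3 extracts MFCC features from each frame. The exact analysis parameters are not given in this excerpt, so the following is a generic sketch of a conventional MFCC front-end; the 26 mel filters and 13 coefficients are common defaults assumed for illustration, not values from the paper.

```python
import numpy as np

def mel(f):
    """Hertz-to-mel mapping (standard formula)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    pts = np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2)
    hz = 700.0 * (10.0 ** (pts / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)   # falling edge
    return fb

def mfcc(frames, sr, n_filters=26, n_coeffs=13):
    """MFCCs per frame: window -> power spectrum -> mel energies -> log -> DCT-II."""
    frames = np.asarray(frames, dtype=float)
    n_fft = frames.shape[1]
    windowed = frames * np.hamming(n_fft)
    spec = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
    fb = mel_filterbank(n_filters, n_fft, sr)
    energies = np.log(spec @ fb.T + 1e-10)           # floor avoids log(0)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), n + 0.5) / n_filters)
    return energies @ dct.T                          # shape: (num_frames, n_coeffs)
```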
In the present work, the number of neurons in the hidden layer is decided empirically as discussed above. The observed performances of both the FFNN and RNN networks suggest 46 neurons in the hidden layer. A detailed description of these performances is presented in the next section.

Activation function: The non-linear activation (transfer) functions log-sigmoid and tan-sigmoid are used in this study for the output and hidden layers respectively. The basic reasons for using sigmoid functions are their smoothness, continuity and positive derivative. The log-sigmoid function in the output layer produces the network outputs in the interval [0,1], i.e., the output of a class is close to 1 once the word is detected and 0 otherwise. The tan-sigmoid function, in turn, has a zero-centered output between -1 and 1, which makes optimization easier.

Training algorithm: In both ASR systems, the scaled conjugate gradient back-propagation method is used to train the networks due to its better learning speed [4],[5]. Many other authors have also used this training algorithm for the same reason [6],[12]. As a supervised algorithm, this back-propagation method optimizes the weights of the neurons by using a loss/cost function [5] and produces faster convergence than other methods.

The following section presents the experimental setup and the observed results.

To decide the optimum number of neurons in the hidden layer for the proposed ASR systems, we conducted training and testing of the networks with varying numbers of hidden-layer neurons according to the three rules-of-thumb mentioned in Section 4(C). Of the three rules, better performance is observed when the size of the hidden layer is set to 38 as per the third rule (i.e., the number of neurons equals the output layer size plus 2/3 of the input layer size). However, to achieve superior performance, a trial-and-error approach is adopted in the backward and forward directions, taking hidden layer sizes in the range 36 to 50. Figure 5 presents plots of the observed performance (%RR) of the systems using the FFNN and RNN networks. It can be observed from the plots that the maximum performance of 84.5% is obtained for the proposed FFNN-based ASR system when the hidden layer contains 46 neurons. Similarly, for the RNN-based system, a hidden layer with 46 neurons yields the best performance of 86.6%. We therefore use 46 neurons in the hidden layers of the proposed systems.
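The third rule-of-thumb can be stated compactly. The 12-dimensional input below is an assumption (the input layer size is not given in this excerpt), chosen so that the rule reproduces the 38 neurons mentioned above; the subsequent sweep range 36-50 is taken from the text.

```python
def hidden_size_rule(n_inputs, n_outputs):
    """Third rule-of-thumb: output layer size plus 2/3 of the input layer size."""
    return n_outputs + (2 * n_inputs) // 3

# With 30 output classes (one per word) and an assumed 12-dimensional input:
print(hidden_size_rule(12, 30))    # -> 38

# The final size is then tuned by trial and error over 36..50 neurons;
# the paper settles on 46 for both the FFNN and the RNN.
sweep = list(range(36, 51))
assert 46 in sweep
```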
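The two transfer functions named earlier (log-sigmoid at the output, tan-sigmoid in the hidden layer) are standard; a minimal sketch using their usual definitions:

```python
import math

def logsig(x):
    """Log-sigmoid: output in (0, 1), suited to per-word detection scores."""
    return 1.0 / (1.0 + math.exp(-x))

def tansig(x):
    """Tan-sigmoid: zero-centered output in (-1, 1); equivalent to tanh(x)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

print(logsig(0.0))   # -> 0.5 (midpoint of the (0, 1) range)
print(tansig(0.0))   # -> 0.0 (zero-centered)
```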
Figure 6. Convergence plots for the proposed ASR systems

In each group, 750 utterances (25 utterances of each of the 30 words) are considered. Of these four groups, two groups are used for training and the other two for testing. Thereby, a total of 4C2 = 6 different training and testing datasets are used. The corresponding observed recognition rates for both proposed systems are presented in Table 3.

Table 3. Performances of both the ASR systems

ASR system using | Training dataset | Testing dataset | %RR  | Average %RR
FFNN             | G1,G2            | G3,G4           | 83.9 |
                 | G1,G3            | G2,G4           | 84.5 |
                 | G1,G4            | G2,G3           | 85.0 |
                 | G2,G3            | G1,G4           | 85.8 |
                 | G2,G4            | G1,G3           | 84.6 |
                 | G3,G4            | G1,G2           | 83.5 | 84.55
RNN              | G1,G2            | G3,G4           | 85.0 |
                 | G1,G3            | G2,G4           | 88.3 |
                 | G1,G4            | G2,G3           | 87.0 |
                 | G2,G3            | G1,G4           | 86.6 |
                 | G2,G4            | G1,G3           | 86.0 |
                 | G3,G4            | G1,G2           | 85.4 | 86.38

It may be observed from the above experiments that both proposed systems perform more or less consistently when different training and testing Sylheti datasets are used. This implies good robustness of both systems to variations in the data. However, the RNN-based ASR system achieves better recognition accuracy (average %RR of 86.38) than the ASR system using the FFNN (average %RR of 84.55). The better performance with the RNN classifier may be due to its inherent feedback characteristics, as discussed in Section 1. Owing to the speech variability arising from age variation in the constructed Sylheti speech database (which affects the performance of any ASR system), a minor deterioration of recognition results is also noticeable in the presented ASR systems. Thus, from the generated results, it can be concluded that the observed performances of the ASR systems for Sylheti presented above are comparable to the performances of similar systems available for other languages [6],[7],[8],[40],[50], and hence are considered satisfactory.

VI. CONCLUSION

Speech recognition using neural networks has long been an area of research interest, and many ASR systems have been proposed for different languages around the globe. This paper has considered the "under-resourced" Sylheti language. As no speech database for Sylheti is available in electronic form, a new speech database of isolated Sylheti words has been proposed, which can be used by researchers working in the domain of speech processing in Sylheti. This paper has also presented two ASR systems for the Sylheti language to recognize isolated Sylheti words by applying two variants of neural network classifiers, FFNN and RNN. It has been observed that the overall performance of the ASR system using the RNN network (recognition rate: 86.38%) is better than that of the FFNN-based ASR system (84.55%), which is due to the feedback of the RNN. One of our future works will concentrate on updating the constructed Sylheti database to include connected words, and on designing an ASR system for recognizing connected words in Sylheti. Another future work will be to employ DNNs in ASR systems for Sylheti. The problem of speaker identification will also be taken up for the Sylheti language.

REFERENCES

1. C. Kurian, "A Survey on Speech Recognition in Indian Languages", International Journal of Computer Science and Information Technologies, vol. 5, no. 5, 2014, pp. 6169-6175.
2. R. Matarneh, S. Maksymova, V. V. Lyashenko and N. V. Belova, "Speech recognition systems: A comparative review", IOSR Journal of Computer Engineering, vol. 19, no. 5, 2017, pp. 71-79.
3. S. K. Gaikwad, B. W. Gawali and P. Yannawar, "A Review on Speech Recognition Technique", International Journal of Computer Applications, vol. 10, no. 3, Nov. 2010, pp. 16-24.
4. G. Dede and M. H. Sazli, "Speech recognition with artificial neural networks", Digital Signal Processing, Elsevier, vol. 20, no. 3, May 2010, pp. 763-768.
5. M. Sarma, K. Dutta and K. K. Sarma, "Assamese Numeral Corpus for speech recognition using Cooperative ANN architecture", International Journal of Computer, Electrical, Automation, Control and Information Engineering, vol. 3, no. 4, 2009.
6. B. P. Das and R. Parekh, "Recognition of Isolated Words using Features based on LPC, MFCC, ZCR and STE, with Neural Network Classifiers", International Journal of Modern Engineering Research (IJMER), vol. 2, no. 3, May-June 2012, pp. 854-858.
7. Y. A. Khan, S. M. Mostaq Hossain and M. M. Hoque, "Isolated Bangla word recognition and speaker detection by Semantic Modular Time Delay Neural Network (MTDNN)", 18th International Conference on Computer and Information Technology, Dhaka, Bangladesh, 21-23 Dec. 2015.
8. N. Seman, Z. A. Bakar and N. A. Bakar, "Measuring the performance of Isolated Spoken Malay Speech Recognition using Multi-layer Neural Network", International Conference on Science and Social Research (CSSR 2010), Kuala Lumpur, Malaysia, December 2010.
9. A. Mohamed, G. E. Dahl and G. Hinton, "Acoustic Modeling using Deep Belief Networks", IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 1, January 2012, pp. 14-22.
10. M. K. Luka, I. A. Frank and G. Onwodi, "Neural Network Based Hausa Language Speech Recognition", International Journal of Advanced Research in Artificial Intelligence, vol. 1, no. 2, 2012, pp. 39-44.
11. A. Kanagasundaram, "Speaker Verification using I-vector Features", PhD thesis, Queensland University of Technology, 2014.
12. M. Oprea and D. Schiopu, "An Artificial Neural Network-Based Isolated Word Speech Recognition System for the Romanian Language", 16th International Conference on System Theory, Control and Computing (ICSTCC), Sinaia, Romania, 12-14 Oct. 2012.
13. K. R. Ghule and R. R. Deshmukh, "Automatic Speech Recognition of Marathi isolated words using Neural Network", International Journal of Computer Science and Information Technologies, vol. 6, no. 5, 2015, pp. 4296-4298.
14. M. K. Sarma, A. Gajurel, A. Pokhrel and B. Joshi, "HMM based isolated word Nepali speech recognition", Proceedings of the International Conference on Machine Learning and Cybernetics, Ningbo, China, 2017.
15. S. S. Bharali and S. K. Kalita, "A comparative study of different features for isolated spoken word recognition using HMM with reference to Assamese language", International Journal of Speech Technology, Springer, vol. 18, no. 4, 2015, pp. 673-684.
16. S. Xihao and Y. Miyanaga, "Dynamic time warping for speech recognition with training part to reduce the computation", International Symposium on Signals, Circuits and Systems (ISSCS 2013), 11-12 July 2013.
17. B. W. Gawali, S. Gaikwad, P. Yannawar and S. C. Mehrotra, "Marathi isolated word recognition system using MFCC and DTW features", ACEEE International Journal of Information Technology, vol. 1, no. 1, Mar 2011, pp. 21-24.
18. C. Madhu, A. George and L. Mary, "Automatic language identification for seven Indian languages using higher level features", IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), Kollam, India, 2017, pp. 1-6.
19. P. Swietojanski, "Learning Representations for Speech Recognition using Artificial Neural Network", doctoral thesis, 2016.
20. M. Borsky, "Robust recognition of strongly distorted speech", doctoral thesis, 2016.
21. S. G. Surampudi and R. Pal, "Speech Signal Processing using Neural Networks", IEEE International Advance Computing Conference (IACC 2015), Bangalore, India, 12-13 June 2015.
22. A. Zaatri, N. Azzizi and F. L. Rahmani, "Voice Recognition Technology using Neural Networks", Journal of New Technology and Materials, vol. 5, no. 1, 2015, pp. 26-30.
23. O. I. Abiodun, A. Jantan, A. E. Omolara, K. V. Dada, N. A. Mohamed and H. Arshad, "State-of-the-art in artificial neural network applications: A survey", Heliyon, Elsevier, vol. 4, no. 11, 2018.
24. L. Fausett, "Fundamentals of Neural Networks: Architecture, Algorithms and Applications", Prentice-Hall, Inc., New Jersey, 1994.
25. L. Besacier, E. Barnard, A. Karpov and T. Schultz, "Automatic Speech Recognition for Under-Resourced Languages: A Survey", Speech Communication, vol. 56, January 2014, pp. 85-100.
26. V. Berment, "Methods to computerise 'little equipped' languages and groups of languages", PhD thesis, J. Fourier University-Grenoble I, May 2004.
27. "Ethnologue: Languages of the World", https://www.ethnologue.com/statistics/status.
28. H. B. Sailor, M. V. S. Krishna, D. Chhabra, A. T. Patil, M. R. Kamble and H. A. Patil, "DA-IICT/IIITV system for low resource speech recognition challenge 2018", INTERSPEECH 2018, Hyderabad, 2-6 September 2018, pp. 3187-3191.
29. M. A. Hasegawa-Johnson, P. Jyothi, D. McCloy, M. Mirbagheri, G. M. di Liberto, A. Das, B. Ekin, C. Liu, V. Manohar, H. Tang, E. C. Lalor, N. F. Chen, P. Hager, T. Kekona, R. Sloan and A. K. C. Lee, "ASR for Under-Resourced Languages From Probabilistic Transcription", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 1, January 2017.
30. K. Kumar, R. K. Aggarwal and A. Jain, "A Hindi speech recognition system for connected words using HTK", International Journal of Computational Systems Engineering, vol. 1, no. 1, 2012, pp. 25-32.
31. A. Gope and S. Mahanta, "Lexical Tones in Sylheti", 4th International Symposium on Tonal Aspects of Languages, Nijmegen, Netherlands, May 13-16, 2014.
32. A. Gope and S. Mahanta, "An Acoustic Analysis of Sylheti Phonemes", Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow, UK, 2015.
33. A. Gope and S. Mahanta, "Perception of Lexical Tones in Sylheti", Tonal Aspects of Languages 2016, New York, 24-27 May 2016.
34. D. M. Kane, "Puthi-Pora: 'Melodic Reading' and its use in the Islamisation of Bengal", Doctoral Dissertation, University of London, 2008.
35. K. H. Davis, R. Biddulph and S. Balashek, "Automatic Recognition of Spoken Digits", Journal of the Acoustical Society of America, vol. 24, no. 6, 1952, pp. 627-642.
36. G. Chakraborty, M. Sharma, N. Saikia and K. K. Sarma, "Recurrent Neural Network Based Approach To Recognise Isolated Digits In Sylheti Language Using MFCC Features", Proceedings of the International Conference on Telecommunication, Power Analysis and Computing Techniques (ICTPACT-2017), Chennai, India, 6-8 April 2017.
37. H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26, no. 1, Feb 1978, pp. 43-49.
38. X. Lei, A. W. Senior, A. Gruenstein and J. Sorensen, "Accurate and Compact Large Vocabulary Speech Recognition on Mobile Devices", INTERSPEECH 2013, Lyon, France, 25-29 August 2013, pp. 662-665.
39. J. T. Geiger, Z. Zhang, F. Weninger, B. Schuller and G. Rigoll, "Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modeling", INTERSPEECH 2014, Singapore, 14-18 September 2014.
40. P. Sharma and A. Garg, "Feature Extraction and Recognition of Hindi Spoken Words using Neural Networks", International Journal of Computer Applications, vol. 142, no. 7, May 2016, pp. 12-17.
41. A. Goswami, "Simplification of CC sequence of Loan words in Sylheti Bangla", Language in India, vol. 13, no. 6, June 2013.
42. M. MoneyKumar, E. Sherly and W. M. Varghese, "Isolated Word Recognition System for Malayalam Using Machine Learning", Proceedings of the 12th International Conference on Natural Language Processing, Trivandrum, India, December 2015, pp. 158-165.
43. J. Kunze, L. Kirsch, I. Kurenkov, A. Krug, J. Johannsmeier and S. Stober, "Transfer Learning for Speech Recognition on a Budget", Proceedings of the 2nd Workshop on Representation Learning for NLP, Vancouver, Canada, August 3, 2017, pp. 168-177.
44. S. Furui, "Speaker-Independent Isolated Word Recognition Using Dynamic Features of Speech Spectrum", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 34, no. 1, 1986, pp. 52-59.
45. D. Dhanashri and S. B. Dhonde, "Isolated word speech recognition system using Deep Neural Networks", Proceedings of the International Conference on Data Engineering and Communication Technology, vol. 1, 2017, pp. 9-17.
46. T. Hori, C. Hori, S. Watanabe and J. R. Hershey, "Minimum Word Error Training of Long Short-Term Memory Recurrent Neural Network Language Models for Speech Recognition", 41st IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, 2016, pp. 5990-5994.
47. A. Das, P. Jyothi and M. H. Johnson, "Automatic Speech Recognition using Probabilistic Transcriptions in Swahili, Amharic and Dinka", INTERSPEECH 2016, San Francisco, USA, 8-12 September 2016, pp. 3524-3528.
48. N. Zerari, S. Abdelhamid, H. Bouzgou and C. Raymond, "Bidirectional deep architecture for Arabic speech recognition", Open Computer Science, De Gruyter, 2019, pp. 92-102.
49. C. Xu, X. Wang and S. Wang, "Research on Chinese Digit Speech Recognition Based on Multi-weighted Neural Network", IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application, 2008, pp. 400-403.
50. I. Kipyatkova and A. Karpov, "Recurrent Neural Network-based Language Modeling for an Automatic Russian Speech Recognition System", Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT), St. Petersburg, Russia, 9-14 Nov. 2015.
51. D. T. Toledano, M. P. Fernandez-Gallego and A. Lozano-Diez, "Multi-resolution speech analysis for automatic speech recognition using deep neural networks: Experiments on TIMIT", PLoS ONE, vol. 13, no. 10, October 10, 2018.
52. B. Barman, "A contrastive analysis of English and Bangla phonemics", The Dhaka University Journal of Linguistics, vol. 2, no. 4, August 2009.
53. W. Hong and P. Jin'gui, "An undergraduate Mandarin Speech Database for Speaker Recognition Research", Oriental COCOSDA International Conference on Speech Database and Assessments, Urumqi, China, 10-12 August 2009.
54. C. Kurian, "A Review on Speech Corpus Development for Automatic Speech Recognition in Indian Languages", International Journal of Advanced Networking and Applications, vol. 6, no. 6, 2015, pp. 2556-2558.
55. B. Das, S. Mandal, P. Mitra and A. Basu, "Effect of aging on speech features and phoneme recognition: a study on Bengali voicing vowels", International Journal of Speech Technology, Springer, vol. 16, no. 1, March 2013, pp. 19-31.
AUTHORS PROFILE
Gautam Chakraborty is a research scholar in the Department of Electronics & Telecommunication Engineering, Assam Engineering College, Guwahati, India, under Gauhati University, and has been working as an Assistant Professor at NERIM, Guwahati, Assam, since 2010. His research interests include speech processing and cloud computing. He has authored many research papers in national and international conference proceedings.