Kannada Speakers Speech Emotion Recognition Using Deep Learning Technique: Anger, Fear, Sad and Neutral state
DOI: https://doi.org/10.32628/CSEIT2612116
Keywords: Accent, Emotion Recognition, MFCC, CNN, HCI, Librosa
Abstract
In this paper, we experiment with recognizing emotions in Kannada speech signals using a deep learning technique. Speech Emotion Recognition (SER) is an active research topic in the field of Human-Computer Interaction (HCI). For the experiments, we created our own Kannada emotional speech corpus, recorded on different Android mobile phones at a sampling rate of 48 kHz in mono. The speakers are from the Karnataka and Tamil Nadu border region, and their accents are a mixture of Kannada, Tamil, and Telugu. Four emotions are considered: anger, fear, sadness, and neutral. The corpus contains 440 Kannada emotional speech signals in total. To use the data in our experiments, we first downsampled all speech signals to 16 kHz, mono, and stored them as WAV files. Each signal is then passed through a pre-emphasis stage, followed by framing and windowing. From each frame we extract MFCC, Chroma, and Mel features, which are passed to a Convolutional Neural Network (CNN) for training and testing. A CNN can be trained on a small amount of data and with a relatively small number of parameters. An average Kannada emotion recognition accuracy of 93.77% is achieved. All computations are done in Python, using libraries such as Librosa, Keras, pyaudio, soundfile, and sklearn.
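The pre-emphasis, framing, and windowing stages named in the abstract can be illustrated with a minimal sketch. The abstract does not give the parameters, so the pre-emphasis coefficient (0.97), the 25 ms frame length, the 10 ms hop, and the Hamming window are all assumptions chosen because they are common defaults in speech processing. The function names are hypothetical, and the code uses only the Python standard library so that it runs without the libraries listed above; it is not the authors' implementation.

```python
import math


def pre_emphasize(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is kept unchanged.

    alpha=0.97 is an assumed, commonly used value.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, hop_len):
    """Split a signal into overlapping frames, dropping a trailing partial frame."""
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return [signal[i * hop_len:i * hop_len + frame_len] for i in range(n_frames)]


def hamming(frame_len):
    """Return Hamming window coefficients for a frame of the given length."""
    if frame_len == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]


def preprocess(signal, sample_rate=16000, frame_ms=25, hop_ms=10, alpha=0.97):
    """Pre-emphasize, frame, and window a mono signal (assumed parameters)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    window = hamming(frame_len)
    emphasized = pre_emphasize(signal, alpha)
    return [
        [sample * weight for sample, weight in zip(frame, window)]
        for frame in frame_signal(emphasized, frame_len, hop_len)
    ]


# Example: one second of a synthetic 440 Hz tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
frames = preprocess(tone)
```

Each windowed frame produced this way would then be the input to feature extraction (MFCC, Chroma, Mel), which in practice is usually delegated to a library such as Librosa.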
Copyright (c) 2026 International Journal of Scientific Research in Computer Science, Engineering and Information Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.