This book captures the current challenges in automatic recognition of emotion in spontaneous speech and makes an effort to explain, elaborate, and propose possible solutions. Intelligent human-computer interaction (iHCI) systems thrive on several technologies, such as automatic speech recognition (ASR), speaker identification, language identification, image and video recognition, and affect/mood/emotion analysis and recognition, to name a few. Given the importance of spontaneity in any human-machine conversational speech, reliable recognition of emotion from naturally spoken spontaneous speech is crucial. While emotions explicitly demonstrated by an actor are easy for a machine to recognize, the same is not true of day-to-day, naturally spoken spontaneous speech. The book explores several reasons behind this, but one of the main reasons is that people, especially non-actors, do not explicitly demonstrate emotion when they speak, making it difficult for machines to distinguish one emotion embedded in spoken speech from another. This short book, based on some of the authors' previously published work in the area of audio emotion analysis, identifies the practical challenges in analysing emotions in spontaneous speech and puts forward several possible solutions that can assist in robustly determining the emotions expressed in spontaneous speech.

Contents: Introduction.- Literature Survey.- A Framework for Spontaneous Speech Emotion Recognition.- Improving Emotion Classification Accuracies.- Case Studies.- Conclusions.- Appendix.- Index.
Rupayan Chakraborty (Member, IEEE) works as a scientist at TCS Research and Innovation, Mumbai. He has been working in the area of speech and audio signal processing/recognition since 2008 and was involved in academic research prior to joining TCS. He worked as a researcher at the Computer Vision and Pattern Recognition (CVPR) Unit of the Indian Statistical Institute (ISI).