Wavelet Packet Entropy in Speaker-Independent Emotional State Detection from Speech Signal

Article Type: Research Article

Authors

  • Mina Kadkhodaei Elyaderani 1
  • Seyed Hamid Mahmoodian 2
  • Ghazaal Sheikhi 3

1 Department of Electrical Engineering, Bonyan Institute of Higher Education, Shahinshahr, Isfahan, Iran
2 Assistant Professor, Department of Electrical Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Isfahan, Iran
3 PhD Student, Department of Computer Engineering, Eastern Mediterranean University, Turkey

Abstract

In this paper, wavelet packet entropy is proposed for speaker-independent emotion recognition from speech. After pre-processing, a level-4 db3 wavelet packet decomposition is computed for each frame, and the Shannon entropy at its nodes is taken as a feature. In addition, prosodic features of speech, namely the frequencies of the first four formants, jitter (the variation range of the pitch frequency), and shimmer (the variation range of the energy), which are widely used in emotion recognition, are combined with Mel-frequency cepstral coefficients (MFCCs) to complete the feature vector. Classification is performed with a support vector machine (SVM), and different combinations of the feature vector are examined both in a multi-class setting over all emotions and in two-class settings of each emotion versus the neutral state. Forty-six different utterances of a single sentence are selected from the Berlin emotional speech database in German, spoken by 10 different speakers in the emotional states of sadness, happiness, fear, boredom, anger, and the neutral state. The results show that using the entropy coefficients as the feature vector improves the recognition rate in the multi-class setting. Moreover, in combination with the other features, the proposed features improve the detection rate of anger, fear, and happiness versus the neutral state.
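For reference, the Shannon entropy in each wavelet packet node is conventionally computed over the normalized coefficient energies; the abstract does not spell out the normalization, so the following is the standard form rather than a quotation of the paper's own formula:

\[
p_{j,i} = \frac{c_{j,i}^{2}}{\sum_{k} c_{j,k}^{2}}, \qquad
H_{j} = -\sum_{i} p_{j,i} \log p_{j,i},
\]

where \(c_{j,i}\) is the \(i\)-th coefficient of terminal node \(j\). A level-4 decomposition has \(2^{4} = 16\) terminal nodes, so each frame contributes 16 entropy values to the feature vector.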

Keywords

  • Speech emotion recognition
  • Wavelet packet
  • Shannon entropy coefficients
  • Support vector machine
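
To make the described pipeline concrete, below is a minimal Python sketch of the wavelet-packet-entropy features and the SVM classifier, assuming pywt and scikit-learn. The frame length, hop size, per-utterance averaging, and RBF kernel are illustrative assumptions not fixed by the abstract, and the formant, jitter, shimmer, and MFCC extractors are omitted.

```python
import numpy as np
import pywt
from sklearn.svm import SVC

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D speech signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def shannon_entropy(coeffs, eps=1e-12):
    """Shannon entropy over the normalized coefficient energies of one node."""
    energy = coeffs ** 2
    p = energy / (energy.sum() + eps)
    return -np.sum(p * np.log(p + eps))

def wavelet_packet_entropy(frame, wavelet="db3", level=4):
    """Level-4 db3 wavelet packet decomposition: one entropy per terminal node (16 values)."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, mode="symmetric", maxlevel=level)
    return np.array([shannon_entropy(node.data)
                     for node in wp.get_level(level, order="natural")])

def utterance_features(signal):
    """Average the per-frame entropy vectors over an utterance. The paper additionally
    appends formants, jitter, shimmer, and MFCCs, which are omitted here."""
    frames = frame_signal(signal)
    return np.mean([wavelet_packet_entropy(f) for f in frames], axis=0)

# Hypothetical usage: in the paper, X and y would hold one feature vector and one of
# six emotion labels per utterance from the Berlin emotional speech database.
rng = np.random.default_rng(0)
X = np.stack([utterance_features(rng.standard_normal(16000)) for _ in range(12)])
y = np.tile(np.arange(6), 2)  # six emotional states, two dummy utterances each
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X[:3]))
```

Averaging the per-frame entropy vectors is one simple way to obtain a fixed-length utterance-level vector; the exact aggregation over frames used in the paper is not stated in the abstract.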