A New Method for Text-Independent Speaker Recognition in Noisy Environments

Article type: Research paper

Authors

1 MSc, Persian Foolad Company, Isfahan

2 Assistant Professor - Faculty of Electrical Engineering, Islamic Azad University, Najafabad Branch

Abstract

This paper addresses noise-robust speaker recognition in the text-independent setting. The proposed method removes silence from the utterances and segments them into smaller units, each containing a few phones and at least one vowel, from which long-term features such as entropy are extracted. In each speech segment, one high-energy vowel is identified and used to extract the fundamental frequency and the formants. A clustering method is then applied to combine the short-term features, namely the MFCC coefficients, with the long-term features. Experiments with an MLP classifier show that the average speaker recognition rate of the proposed method is 97.33% without noise and 61.33% at a signal-to-noise ratio of -2 dB, an improvement over conventional methods.
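To make the -2 dB operating point concrete, the following minimal sketch scales a noise signal so that the mixture reaches a requested signal-to-noise ratio. The abstract does not say what kind of noise was used or how it was mixed, so the white Gaussian noise, the synthetic 200 Hz stand-in for speech, and the function name `add_noise_at_snr` are all assumptions made for illustration.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so that the mixture has the given SNR in dB.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)  # average speech power
    p_noise = np.mean(noise ** 2)    # average noise power before scaling
    # SNR_dB = 10 * log10(p_speech / (scale**2 * p_noise)), solved for scale
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise


# Usage with synthetic data (assumed 8 kHz sampling rate, 1 second of signal)
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)
noisy = add_noise_at_snr(speech, rng.standard_normal(8000), snr_db=-2.0)
```

At -2 dB the noise power is about 1.58 times the speech power, which gives a sense of how severe the reported test condition is.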


Article title [English]

A Novel Approach in Text-Independent Speaker Recognition in Noisy Environment

Authors [English]

  • Nona Heydari Esfahani 1
  • Hamid Mahmoodian 2
1 MSc – Persian Foolad Company, Isfahan
2 Assistant Professor - Department of Electrical Engineering, Najafabad Branch, Islamic Azad University
Abstract [English]

This paper addresses robust text-independent speaker recognition. The proposed method operates on utterances from which silence has been removed manually and which are segmented into smaller speech units, each containing a few phones and at least one vowel. These segments are the basic units for long-term feature extraction. Sub-band entropy is extracted directly from each segment. A robust vowel detection method is then applied to each segment to isolate a high-energy vowel, which serves as the unit for pitch frequency and formant extraction. A clustering technique is applied to combine the extracted short-term features, namely MFCC coefficients, with the long-term features. Experiments with an MLP classifier show an average speaker recognition accuracy of 97.33% for clean speech and 61.33% in a noisy environment at -2 dB SNR, an improvement over conventional methods.
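As an illustration of the sub-band entropy feature mentioned above, the sketch below computes the Shannon entropy of the normalized power spectrum in each of several equal-width frequency bands of one segment. The abstract does not give the band layout, window, or normalization, so the eight uniform bands, the Hann window, and the function name `subband_entropy` are assumptions rather than the authors' settings.

```python
import numpy as np


def subband_entropy(segment, n_bands=8):
    """Shannon entropy (in bits) of the normalized power spectrum per sub-band.

    Hypothetical helper: uniform bands and a Hann window are assumed.
    """
    windowed = segment * np.hanning(len(segment))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    entropies = []
    for band in np.array_split(power, n_bands):
        total = band.sum()
        if total <= 0:
            # Empty or silent band: define its entropy as zero
            entropies.append(0.0)
            continue
        p = band / total   # treat the band's spectrum as a probability mass
        p = p[p > 0]       # skip zero bins so that log2 stays finite
        entropies.append(float(-(p * np.log2(p)).sum()))
    return np.array(entropies)


# Usage with synthetic 0.2 s segments at an assumed 8 kHz sampling rate
rng = np.random.default_rng(0)
t = np.arange(1600) / 8000
tone_entropy = subband_entropy(np.sin(2 * np.pi * 250 * t))
noise_entropy = subband_entropy(rng.standard_normal(1600))
```

A band dominated by a single spectral peak yields a low entropy, while a noise-like band yields a value close to the logarithm of the number of bins in the band, which is why such a measure can help separate voiced structure from noise.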

Keywords [English]

  • Speaker identification
  • MFCC coefficients
  • Pitch frequency
  • Formants
  • Shannon entropy
  • MLP