Gender and Age Classification Based on Speaker Voice Using Hidden Markov Model

  • Irfan Tri Handoko, Telkom University
  • Suyanto Suyanto, Telkom University

Abstract

Age-gender classification based on speech is very useful in speech recognition and emotion recognition. Gender classification has also been applied in face recognition, video summarization, assigning different permission levels to different age groups, and other areas. Speakers are divided into four age groups based on specific age ranges: children, young, middle-aged, and senior. This study focuses on age-gender classification from a speaker's voice using a combination of the Gaussian Mixture Model and the Hidden Markov Model (GMM-HMM). First, feature vectors are constructed using Mel-Frequency Cepstrum Coefficients (MFCC). Next, training is performed to produce acoustic models for all speakers (male and female, of various ages) in the training database. Finally, the HMM is applied to detect gender and age group. The speech database used in this study was taken from the Common Voice site, whose recordings contain sentences drawn from many blog posts, old books, films, and other public speeches. Experimental results show that the constructed GMM-HMM model classifies age and gender with an accuracy of up to 96.4%. The model could be improved by tuning its parameters more precisely and by using a larger dataset.
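The abstract does not specify the implementation. As a minimal sketch of the final decision step only, assuming one GMM-HMM has already been trained per age-gender class and that each utterance is available as a sequence of MFCC frames, an utterance can be classified by scoring it under every class model with the forward algorithm and picking the model with the highest log-likelihood. All labels, parameters, and function names below are hypothetical, the 2-D frames stand in for real MFCC vectors, and both MFCC extraction and model training (e.g., Baum-Welch) are omitted.

```python
import math


def safe_log(p):
    """log(p), mapping zero probabilities (disallowed transitions) to -inf."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - mu) ** 2 / v
        for xi, mu, v in zip(x, mean, var)
    )


def log_gmm(x, weights, means, variances):
    """Log density of one HMM state's Gaussian mixture emission at frame x."""
    return logsumexp([
        math.log(w) + log_gauss_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ])


def forward_log_likelihood(frames, model):
    """log P(frames | model) computed with the forward algorithm in log space."""
    start, trans, states = model["start"], model["trans"], model["states"]
    n = len(states)
    # Initialisation: alpha_1(j) = log pi_j + log b_j(x_1)
    alpha = [safe_log(start[j]) + log_gmm(frames[0], *states[j]) for j in range(n)]
    # Induction: alpha_t(j) = logsumexp_i(alpha_{t-1}(i) + log a_ij) + log b_j(x_t)
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + log_gmm(x, *states[j])
            for j in range(n)
        ]
    # Termination: combine the log-probabilities of all final states
    return logsumexp(alpha)


def classify(frames, models):
    """Pick the class label whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda label: forward_log_likelihood(frames, models[label]))


def toy_model(offset):
    """Hypothetical 2-state left-to-right HMM; each state is a 2-component GMM."""
    state = ([0.6, 0.4], [[offset, offset], [offset + 0.5, offset]], [[1.0, 1.0], [1.0, 1.0]])
    return {"start": [1.0, 0.0], "trans": [[0.7, 0.3], [0.0, 1.0]], "states": [state, state]}


# Hypothetical labels and parameters; real models would be trained on MFCC frames.
models = {"male-young": toy_model(0.0), "female-young": toy_model(3.0)}
frames = [[0.1, -0.2], [0.3, 0.1], [0.2, 0.0]]  # stand-in for 2-D MFCC frames
print(classify(frames, models))  # prints "male-young"
```

Working in log space keeps the scores from underflowing when an utterance has hundreds of frames; with more age-gender classes, the same `classify` call simply compares more models.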

Keywords: Classification, Mel-Frequency Cepstrum Coefficient, Acoustic Models, Gaussian Mixture Model, Hidden Markov Model

Published
2020-01-07
How to Cite
Handoko, I. T., & Suyanto, S. (2020). Klasifikasi Gender dan Usia berdasarkan Suara Pembicara Menggunakan Hidden Markov Model [Gender and age classification based on speaker voice using Hidden Markov Model]. Indonesia Journal on Computing (Indo-JC), 4(3), 99-106. https://doi.org/10.34818/INDOJC.2019.4.3.375
Section
Computer Science