TWO-STAGE DECISION MAKING ALGORITHM FOR SPEAKER VERIFICATION WITH TRAINING SET OPTIMIZATION

Efe Tankut Yaparoğlu; Yavuz Şenol

doi:10.28948/ngumuh.516805

Research Article

Year 2019, Volume: 8 Issue: 1, 48 - 58, 28.01.2019

Abstract

This paper proposes a two-stage decision-making algorithm for speaker verification. The second stage uses an impostor-resistant structure to reject impostors who pass the first stage. First, a baseline system is built using mel-frequency cepstral coefficients (MFCC) as features and a radial basis function (RBF) neural network for speaker modelling. Next, the training set is optimized with respect to two factors: (1) the ratio of impostor features to genuine-speaker features, and (2) the ratio of same-gender to opposite-gender features (relative to the genuine speaker) within the impostor speakers' set. Finally, the two-stage decision-making algorithm is presented, and the performance improvement it provides is reported with test results.
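
The abstract gives no implementation details, so the following is only a minimal sketch of how the two ideas could fit together: assembling a training set from the two ratios, and accepting a claim only when both stages pass. All names here (`build_training_set`, `two_stage_verify`, `impostor_ratio`, `same_to_opposite_ratio`, the scoring callables and the thresholds) are hypothetical. The scoring callables stand in for trained RBF network outputs on MFCC frames, and averaging frame scores against a threshold is an assumption, not the authors' stated decision rule.

```python
import random
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector, e.g. an MFCC frame


def build_training_set(
    genuine: List[Frame],
    same_gender_impostors: List[Frame],
    opposite_gender_impostors: List[Frame],
    impostor_ratio: float,          # impostor frames per genuine frame (assumed knob)
    same_to_opposite_ratio: float,  # same-gender per opposite-gender impostor frame
    seed: int = 0,
) -> Tuple[List[Frame], List[int]]:
    """Sample impostor frames so the set matches the two requested ratios."""
    rng = random.Random(seed)
    n_impostor = round(impostor_ratio * len(genuine))
    n_same = round(
        n_impostor * same_to_opposite_ratio / (1.0 + same_to_opposite_ratio)
    )
    n_opposite = n_impostor - n_same
    if n_same > len(same_gender_impostors) or n_opposite > len(
        opposite_gender_impostors
    ):
        raise ValueError("not enough impostor frames for the requested ratios")
    impostors = rng.sample(same_gender_impostors, n_same) + rng.sample(
        opposite_gender_impostors, n_opposite
    )
    features = list(genuine) + impostors
    labels = [1] * len(genuine) + [0] * len(impostors)  # 1 = genuine, 0 = impostor
    return features, labels


def two_stage_verify(
    frames: List[Frame],
    stage1_score: Callable[[Frame], float],  # placeholder for the baseline model
    stage2_score: Callable[[Frame], float],  # placeholder for the second-stage model
    threshold1: float,
    threshold2: float,
) -> bool:
    """Accept only if the mean frame score clears both stage thresholds."""
    if not frames:
        raise ValueError("no frames to score")
    mean1 = sum(stage1_score(f) for f in frames) / len(frames)
    if mean1 < threshold1:
        return False  # rejected by the first stage
    mean2 = sum(stage2_score(f) for f in frames) / len(frames)
    return mean2 >= threshold2  # second stage filters first-stage passers
```

In this sketch the second stage is evaluated only for claims that pass the first, which mirrors the stated goal of eliminating first-stage-qualifying impostors; how the second-stage model is actually made impostor-resistant is described in the full text, not here.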

Keywords

Speaker verification, Training set optimization, RBF neural network, MFCC, Cohort

Details

Primary Language	English
Subjects	Electrical Engineering
Journal Section	Electrical and Electronics Engineering
Authors	Efe Tankut Yaparoğlu (ORCID 0000-0003-1537-1237), Yavuz Şenol (ORCID 0000-0002-3686-5597)
Publication Date	January 28, 2019
Submission Date	February 8, 2018
Acceptance Date	September 26, 2018
Published in Issue	Year 2019 Volume: 8 Issue: 1

Cite

APA	Yaparoğlu, E. T., & Şenol, Y. (2019). TWO-STAGE DECISION MAKING ALGORITHM FOR SPEAKER VERIFICATION WITH TRAINING SET OPTIMIZATION. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi, 8(1), 48-58. https://doi.org/10.28948/ngumuh.516805
