Speaker and Language Recognition Techniques / Cumani, Sandro. - (2012). [10.6092/polito/porto/2496928]

Speaker and Language Recognition Techniques

CUMANI, SANDRO
2012

Abstract

This work gives an overview of state-of-the-art speaker and language recognition systems. We analyze techniques for extracting and modeling features from the acoustic signal and for modeling speech content by means of phonetic decoding. We then present state-of-the-art generative systems based on latent variable models and discriminative techniques based on Support Vector Machines (SVMs). We also present the author's contributions to the field, which span the topics covered in this work. First, we propose an improvement to Neural Network training for speech decoding based on the General-Purpose Graphics Processing Unit (GPGPU) computational framework. We also propose adaptations of latent variable models, originally developed for speaker recognition, to language identification. We further present a novel technique that enhances the generation of low-dimensional utterance representations for speaker verification. Finally, we give a detailed analysis of different training algorithms for SVM-based speaker verification and propose a novel discriminative framework for speaker verification, the Pairwise SVM approach, which allows fast utterance testing and achieves very good recognition performance.
Files in this item:
phd_thesis.pdf
Access: open access
Type: Doctoral thesis
License: Creative Commons
Size: 1.43 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2496928