UGR Nancy

HIWIRE MEETING Nancy, July 6-7, 2006: 

José C. Segura, Ángel de la Torre

Schedule: 

- Non-linear feature normalization for mobile platform
    - Integration scheme
    - Results and discussion
- Rapid speaker adaptation
    - Combination of adaptation at signal level and acoustic model level
    - Results and discussion
- Assessment of two non-linear techniques for feature normalization
    - Non-linear parametric equalization
    - Model based feature compensation (VTS)
- New improvements in robust VAD
    - Model based VAD

Non-linear Parametric Equalization: 

Feature normalization

Motivation of PEQ:
- Limitation of linear methods:
    - Cepstral Mean Normalization
    - Cepstral Mean and Variance Normalization
- Limitation of non-linear methods (HEQ, OSEQ):
    - Speech/non-speech ratio
    - Estimation problems

Parametric Equalization (PEQ):
- Two-Gaussian model (speech / non-speech)
- Training of clean Gaussians; estimation of noisy Gaussians
- Non-linear transformation: combination of two linear transformations (one for speech, one for non-speech)
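A minimal sketch of how two linear transformations can combine into one non-linear mapping. The slide does not give the combination rule, so this sketch assumes that each class-wise linear map (mean and variance matching from the noisy to the clean Gaussian) is weighted by the posterior probability of its class given the noisy feature. The function names, the dictionary layout and the `prior_speech` parameter are hypothetical.

```python
import math


def gaussian_pdf(y, mean, std):
    """Value of a one-dimensional Gaussian density at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def peq_equalize(y, clean, noisy, prior_speech=0.5):
    """Equalize one scalar feature y with a two-Gaussian model.

    clean / noisy map "speech" and "nonspeech" to (mean, std).
    ASSUMPTION: the two linear maps are mixed with class posteriors
    computed from the noisy Gaussians; the slide does not state this.
    """
    priors = {"speech": prior_speech, "nonspeech": 1.0 - prior_speech}
    weights = {}
    for c in ("speech", "nonspeech"):
        mu_y, sd_y = noisy[c]
        weights[c] = priors[c] * gaussian_pdf(y, mu_y, sd_y)
    total = sum(weights.values())
    if total == 0.0:
        # Both densities underflowed: fall back to the priors.
        weights, total = priors, 1.0

    x_hat = 0.0
    for c in ("speech", "nonspeech"):
        mu_y, sd_y = noisy[c]
        mu_x, sd_x = clean[c]
        # Linear map that sends the noisy Gaussian onto the clean one.
        linear = mu_x + (y - mu_y) * sd_x / sd_y
        x_hat += (weights[c] / total) * linear
    return x_hat
```

For a feature that sits on the noisy speech mean and far from the noisy non-speech Gaussian, the output is essentially the clean speech mean, because the speech map dominates the mixture.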

Non-linear Parametric Equalization: 

[Figures: Aurora-2 results; Aurora-4 results]

Non-linear Parametric Equalization: 

Additional problem of non-linear transformations:
- Once the transformation is estimated, it is an “instantaneous transformation”
- Temporal correlations are not exploited

Temporal Smoothing (TES):
- Each equalized cepstrum is time-filtered with an ARMA filter that restores the autocorrelation of clean data
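As an illustration of the time-filtering step, here is a generic causal ARMA filter applied to the trajectory of one cepstral coefficient. The slide does not say how the coefficients are obtained from the clean-data autocorrelation, so `ar` and `ma` are hypothetical inputs that would have to be fitted separately.

```python
def arma_filter(frames, ar, ma):
    """Filter a sequence of values of one cepstral coefficient.

    Implements y[t] = sum_j ma[j] * x[t-j] + sum_i ar[i] * y[t-1-i].
    ASSUMPTION: ar and ma are given; fitting them so that the output
    autocorrelation matches clean data is not shown here.
    """
    out = []
    for t in range(len(frames)):
        acc = 0.0
        # Moving-average part: current and past inputs.
        for j, b in enumerate(ma):
            if t - j >= 0:
                acc += b * frames[t - j]
        # Autoregressive part: past outputs.
        for i, a in enumerate(ar):
            if t - 1 - i >= 0:
                acc += a * out[t - 1 - i]
        out.append(acc)
    return out
```

The filter would be run independently on each cepstral coefficient after equalization, one trajectory per coefficient.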

Non-linear Parametric Equalization: 

[Figures: Aurora-2 results and Aurora-4 results, with TES]

Model Based Feature Compensation (VTS): 

VTS feature normalization:
- Performed in the log-FBE domain (prior to the DCT)
- Based on a Gaussian mixture model trained with clean speech
- Allows feature compensation and uncertainty estimation

Summary of VTS (vector Taylor series approach):
- Given the noisy conditions, VTS provides a noisy Gaussian from each clean Gaussian
- The noisy Gaussian mixture model allows the computation of the probabilities P(k|y)
- An estimate of the clean speech x is then possible
- An estimate of the uncertainty is also possible

Model Based Feature Compensation (VTS): 

Step 1: Estimation of a noisy Gaussian from a clean Gaussian (equations not preserved in the transcript), where the functions g0, f0 and h0 are evaluated at the mean of the clean Gaussian and at the mean of the noise.
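Since the equations for Step 1 are missing from the transcript, the following is only a sketch of one common VTS formulation for additive noise in the log-FBE domain, not necessarily the one on the slide. It assumes y = x + g(x, n) with g(x, n) = log(1 + exp(n - x)), takes f and h as the first and second derivatives of g with respect to x, treats speech and noise as independent, and works on one filter-bank channel at a time (diagonal covariances). The function name and argument layout are hypothetical.

```python
import math


def vts_noisy_gaussian(mu_x, var_x, mu_n, var_n):
    """Map a clean Gaussian (mu_x, var_x) to a noisy one for one channel.

    ASSUMED model: y = x + g(x, n), g(x, n) = log(1 + exp(n - x)).
    g0, f0, h0 are g, dg/dx and d2g/dx2 evaluated at (mu_x, mu_n).
    """
    d = mu_n - mu_x
    g0 = math.log1p(math.exp(d))        # mismatch function
    f0 = -1.0 / (1.0 + math.exp(-d))    # first derivative w.r.t. x
    h0 = -f0 * (1.0 + f0)               # second derivative w.r.t. x

    # Mean: second-order expansion around the means.
    mu_y = mu_x + g0 + 0.5 * h0 * (var_x + var_n)
    # Variance: first-order expansion (dy/dx = 1 + f0, dy/dn = -f0).
    var_y = (1.0 + f0) ** 2 * var_x + f0 ** 2 * var_n
    return mu_y, var_y
```

With noise far below the speech level the Gaussian is left almost unchanged, and with equal speech and noise means the mean shifts up by about log 2, which is the expected behaviour of log-domain energy addition.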

Model Based Feature Compensation (VTS): 

Step 2: Estimation of P(k|y):

\[ P(k|y) = \frac{P(k)\, N_k(y)}{\sum_{k'} P(k')\, N_{k'}(y)} \]

where N_k(y) is the k-th noisy Gaussian evaluated at the noisy speech y, and P(k) is the a-priori probability of the Gaussian.

Step 3: Estimation of clean speech: (equation not preserved in the transcript)
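Steps 2 and 3 can be sketched as follows for a single channel. The posterior is plain Bayes' rule over the noisy Gaussians. The clean-speech estimator is an assumption, because the slide's equation is missing: the sketch uses a widely used zeroth-order estimate, which subtracts the posterior-weighted mismatch g(mu_x_k, mu_n) = log(1 + exp(mu_n - mu_x_k)) from y. All function names are hypothetical.

```python
import math


def log_gauss(y, mean, var):
    """Log of a one-dimensional Gaussian density at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def posteriors(y, priors, noisy_means, noisy_vars):
    """P(k|y) for every Gaussian k of the noisy mixture (Bayes' rule)."""
    logs = [math.log(p) + log_gauss(y, m, v)
            for p, m, v in zip(priors, noisy_means, noisy_vars)]
    top = max(logs)  # subtract the maximum to avoid underflow
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]


def estimate_clean(y, post, clean_means, mu_n):
    """ASSUMED estimator: x_hat = y - sum_k P(k|y) * g(mu_x_k, mu_n)."""
    correction = sum(p * math.log1p(math.exp(mu_n - m))
                     for p, m in zip(post, clean_means))
    return y - correction
```

The noisy means and variances passed to `posteriors` would come from Step 1, one pair per Gaussian of the clean mixture.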

Model Based Feature Compensation (VTS): 

Step 4: Estimation of uncertainty:
- The uncertainty of the clean speech can be estimated as: (equation not preserved in the transcript)
- From the estimation of the clean speech: (equation not preserved in the transcript)
- Assuming small values of the variance of the noise: (equation not preserved in the transcript)

Model Based Feature Compensation (VTS): 

Some considerations about VTS:
- Computational load
- Better than HEQ, PEQ, etc., but only valid for additive noise or channel distortion
- Estimation of noise is critical
- There are some approximations in the formulation
- Uncertainty: small improvement (insertions, substitutions, deletions)
- Alternative: model-based compensation based on numerical integration of pdfs

[Figure: Aurora-2 results]

Model-based VAD: 

Fundamentals of model-based VAD:
- Gaussian mixture model in the log-FBE domain
- Gaussian mixture model trained with clean speech
- VTS provides a noisy version of the GMM
- From the noisy GMM, P(k|y) can be estimated for each observation y and each Gaussian k
- The a-priori probability P(V|k) of the k-th Gaussian being speech can be estimated from the training data
- Then, the probability P(V|y) of the noisy observation y being speech is given by:

\[ P(V|y) = \sum_k P(V|k)\, P(k|y) \]
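The speech probability described above is a posterior-weighted sum over the Gaussians, which can be sketched in a few lines. Two parts are assumptions rather than content from the slides: the soft-count estimate of P(V|k) from frames labelled as speech or non-speech, and the decision threshold of 0.5. All names are hypothetical.

```python
def estimate_p_speech_given_k(frame_posteriors, labels):
    """Estimate P(V|k) from training frames.

    frame_posteriors[t][k] is P(k|x_t) for clean training frame t;
    labels[t] is 1 for speech and 0 for non-speech.
    ASSUMPTION: soft counts; the slides do not give the estimator.
    """
    n_gauss = len(frame_posteriors[0])
    speech = [0.0] * n_gauss
    total = [0.0] * n_gauss
    for post, label in zip(frame_posteriors, labels):
        for k, p in enumerate(post):
            total[k] += p
            speech[k] += p * label
    return [s / t if t > 0.0 else 0.0 for s, t in zip(speech, total)]


def speech_probability(post, p_speech_given_k):
    """P(V|y) = sum_k P(V|k) * P(k|y)."""
    return sum(pk * pv for pk, pv in zip(post, p_speech_given_k))


def vad_decision(post, p_speech_given_k, threshold=0.5):
    """Mark the frame as speech when P(V|y) reaches the threshold."""
    return speech_probability(post, p_speech_given_k) >= threshold
```

Here `post` holds the values P(k|y) obtained from the noisy GMM for the current frame, so the only per-frame cost beyond the posteriors is one weighted sum.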

Model-based VAD: 

Some considerations about model-based VAD:
- The VAD decision relies on a Gaussian mixture model trained with clean speech (based on speech events observed in the training database)
- Not based on energy
- Based on observations in the log-FBE domain
- VTS adapts the Gaussian mixture to noisy conditions: the performance of the VAD is expected to be stable for a wide range of SNRs
- Computational load

Model-based VAD: 

[Figure: model-based VAD for different SNRs]

Model-based VAD: 

[Figure: comparison with other VADs; HR1 and HR0 evaluated for AURORA-2]

Model-based VAD: 

Aurora-2 recognition results (WAcc):
- Baseline: 60.5 % (no VAD, no WF, no FD)
