
Noise Suppression Techniques for Speech Enhancement Using Adaptive Filtering


Presentation Transcript


  1. Noise Suppression Techniques for Speech Enhancement Using Adaptive Filtering. Derek Shiell, 03/09/2006. ECE 463: Project Presentation, Professor Michael Honig.

  2. Overview • Objective/Problem Description • Applications • Overview of Noise Reduction Methods • System Description • Filter analysis • Linear methods • Wiener approximation • KLT preprocessing • Signal subspace embedding • Kalman filter based methods • Non-linear methods • Current results • Future work • Implementation/practical considerations • Conclusions

  3. Objective/Problem Description The goal of this project is to research noise reduction techniques for the front-end processing of an automatic speech recognition system, using a single microphone with no independent noise recording and no clean reference signal.

  4. Applications • Cell phone speech enhancement • Automatic speech recognition • Speaker identification • Biomedical signal processing. Image sources: (1) http://images.businessweek.com/mz/04/45/techbuy/images/razr_phone.jpg (2) http://www.nanopac.com/images/smnsbox.jpg (3) http://ldt.stanford.edu/~sgilutz/Shulis_Portfolio/fall/hci/images/sensory.jpg

  5. Overview of Speech Enhancement • Microphone Array Processing • With multiple microphones, blind source separation (BSS) techniques such as independent component analysis (ICA) can be used to distinguish one speaker from other directional or diffuse noises. • Active echo/noise cancellation (ANC) • In this case, the echo or noise is estimated and re-generated with opposite phase to destructively interfere with the original echo or noise. • Blind noise suppression • In this case, there is a single speech signal corrupted by noise, no separate noise recording with which to make noise estimates, and no source signal to reference.

  6. System Descriptions (figures) • BSS/ICA: BSS based on frequency-domain ICA [6] • ANC: active noise cancellation with a single microphone/speaker [4] • Blind noise reduction: blind noise reduction schematic [1]

  7. Filter Analysis (1) Linear MMSE (Wiener approximation). The MMSE cost function reduces, for a frame of length N, to a closed-form expression.
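The slide's equation is an image and does not survive in this transcript; as a hedged reconstruction, a standard frame-based form of the linear MMSE criterion consistent with the slide text is:

```latex
% Assumed standard form of the frame-based linear MMSE criterion
% (the slide's original equation image is not reproduced in the transcript).
% x_k: clean speech sample,
% \mathbf{y}_k = [y_k, y_{k-1}, \dots, y_{k-N+1}]^T: noisy observation frame,
% \mathbf{w}: length-N filter weight vector.
J(\mathbf{w}) \;=\; E\!\left[\,\lvert x_k - \mathbf{w}^{T}\mathbf{y}_k \rvert^{2}\right]
\;\approx\; \frac{1}{N}\sum_{k=1}^{N}\left( x_k - \mathbf{w}^{T}\mathbf{y}_k \right)^{2}
```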

  8. Filter Analysis (2) Linear Estimation (continued). The signal is estimated by linearly filtering the corrupted signal. Minimizing the MMSE cost function with respect to w yields a closed-form solution for the weight vector. This is an approximation to the Wiener solution in which the cross-correlation vector p is estimated by (r_y − r_n), similar in spirit to spectral subtraction.
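A minimal Python sketch of this Wiener approximation, assuming the noise autocorrelation r_n is estimated separately from non-speech frames (the function names and the small regularization term are illustrative, not from the slides):

```python
import numpy as np
from scipy.linalg import toeplitz

def wiener_approx_filter(y_frame, r_n, order=32):
    """Approximate Wiener filter: solve R_y w = (r_y - r_n).

    y_frame : noisy speech frame (1-D array)
    r_n     : noise autocorrelation estimate, lags 0..order-1
    order   : filter length N
    """
    N = len(y_frame)
    # Biased autocorrelation estimate of the noisy frame, lags 0..order-1
    r_y = np.array([np.dot(y_frame[:N - k], y_frame[k:]) / N for k in range(order)])
    R_y = toeplitz(r_y) + 1e-6 * np.eye(order)   # small diagonal load for stability
    p_hat = r_y - r_n[:order]                    # cross-correlation approximated by (r_y - r_n)
    return np.linalg.solve(R_y, p_hat)           # closed-form MMSE weight vector

def apply_filter(y, w):
    """Filter the noisy signal with the estimated weights."""
    return np.convolve(y, w, mode="same")
```

Solving R_y w = (r_y − r_n) directly mirrors the substitution described on the slide for the unavailable cross-correlation vector p.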

  9. Filter Analysis (3) Linear estimation with Karhunen-Loève Transform (KLT). Define the KLT transform U as the matrix of eigenvectors of R_y, the autocorrelation matrix of the noisy signal. Preprocessing the signal with the KLT (or PCA) separates it into its directions of greatest variance; the signal can then be mapped into a lower-dimensional space, which helps decorrelate the signal from the noise. For a changing signal, U must be updated adaptively. Using this transformation, we define the transformed observation vector y_k and obtain a closed-form solution for the weight vector in the transformed domain.
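A minimal sketch of the KLT step, assuming the transform is re-estimated per frame from the noisy-signal autocorrelation matrix (the names and the rank-selection rule are illustrative):

```python
import numpy as np
from scipy.linalg import toeplitz

def klt_basis(y_frame, order=32, rank=None):
    """Estimate the KLT (PCA) basis U from the noisy-signal autocorrelation R_y.

    Returns the eigenvectors of R_y sorted by decreasing eigenvalue,
    optionally truncated to a lower-dimensional signal subspace.
    """
    N = len(y_frame)
    r_y = np.array([np.dot(y_frame[:N - k], y_frame[k:]) / N for k in range(order)])
    R_y = toeplitz(r_y)
    eigvals, U = np.linalg.eigh(R_y)          # eigh: R_y is symmetric
    idx = np.argsort(eigvals)[::-1]           # sort by decreasing variance
    U = U[:, idx]
    if rank is not None:
        U = U[:, :rank]                       # keep directions of greatest variance
    return U

def transform_vector(U, y_vec):
    """Map a length-`order` vector of noisy samples into the KLT domain."""
    return U.T @ y_vec                        # transformed y_k
```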

  10. Filter Analysis (4) Signal subspace embedding. This method allows a matrix of gain factors, W, rather than a single weight vector w (MIMO), so that a simultaneous block estimate of the clean signal can be made. In addition, the matrix Q can be chosen either as I or as a weighting that tapers the tap weights by some factor(s), so that selected components are emphasized more in the minimization. Update equations for the filter matrix and transform basis are found iteratively from the MMSE cost function.
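The slide's cost function and update equations are not reproduced in the transcript; one common way to write a weighted block (MIMO) MMSE criterion of this kind, stated purely as an illustrative assumption, is:

```latex
% Illustrative weighted block MMSE criterion (assumed form, not the slide's
% original equation). x_k: clean speech block, y_k: (transformed) noisy block,
% W: filter matrix, Q: diagonal weighting (Q = I gives the unweighted case).
J(\mathbf{W}) \;=\; E\!\left[\,\bigl\lVert \mathbf{Q}\,(\mathbf{x}_k - \mathbf{W}\,\mathbf{y}_k) \bigr\rVert^{2}\right]
```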

  11. Filter Analysis (5) Kalman Filtering Approaches. Kalman filters are widely used in speech enhancement, and much theoretical work has analyzed them. The Kalman filter is the minimum mean-square estimator of the state of a linear dynamical system and can be used to derive many types of RLS filters. Extended Kalman filters handle nonlinear models through a linearization step. Kalman filters have the advantages that they: • are more robust (stationarity is not assumed) • require only the previous estimate for the next estimation (rather than all past values) • are computationally efficient. The filter is built on the standard linear state-space model.
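The standard linear state-space model referred to on the slide has the usual textbook form (a hedged reconstruction; the symbols follow common notation rather than the slide's own):

```latex
% Standard linear state-space model assumed by the Kalman filter.
% x_k: state vector, y_k: observation,
% w_k: process noise, v_k: measurement noise (zero-mean, white).
\mathbf{x}_{k+1} = \mathbf{A}\,\mathbf{x}_{k} + \mathbf{w}_{k}, \qquad
\mathbf{y}_{k}   = \mathbf{C}\,\mathbf{x}_{k} + \mathbf{v}_{k}
```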

  12. Filter Analysis (6) Nonlinear filtering. Many nonlinear filtering methods exist to suppress noise in noisy speech; examples include filters based on neural networks or on phase space reconstruction. In general they are complex to analyze, but they do not require estimation of the noise or speech spectra and are not characterized by “musical tone” artifacts. Figures: feedforward neural network (image source: http://research.yale.edu/ysm/images/78.2/articles-neural-network.jpg); phase space reconstruction for different speech phonemes [9].
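As a small illustration of the phase space (time-delay) reconstruction mentioned here, with the embedding dimension m and delay tau treated as user-chosen assumptions:

```python
import numpy as np

def phase_space_embed(x, m=3, tau=7):
    """Time-delay embedding of a signal x into an m-dimensional phase space.

    Each row of the result is a reconstructed state vector
    [x[n], x[n - tau], ..., x[n - (m - 1) * tau]].
    """
    n_points = len(x) - (m - 1) * tau
    # Column j holds x[n - j*tau]; stacking columns gives the embedded trajectory.
    return np.column_stack([x[(m - 1 - j) * tau : (m - 1 - j) * tau + n_points]
                            for j in range(m)])
```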

  13. Typical Results • Figures: segmental SNR results (left) and SNR results (below) for various linear and nonlinear noise reduction methods [8]; panels show the noisy speech signal (white noise), the Wiener-filtered output, and the Ephraim-filtered output. • Comparison of segmental SNR performance for different noise sources: white noise (SNR 6.08 dB), pink noise (SNR 4.34 dB), factory noise (SNR 5.16 dB), F16 noise (SNR 4.61 dB). • Methods compared: a) linear estimation, b) linear estimation with KLT preprocessing, c) signal subspace embedding, d) weighted signal subspace embedding, e) NN with KLT, f) linear with clean target, g) nonlinear with clean target, h) standard spectral subtraction (3 dB segmental SNR ~ 5 dB SNR) [1]

  14. Future Work • Perform ASR after noise reduction filtering • AVICAR database • Data collected in a car environment • Time varying SNR • No independent noise recording (detecting speech is difficult) • Experiments • KLT preprocessing + linear estimation (Wiener) • Ephraim filter (ML short time spectral amplitude estimator) • Nonlinear methods

  15. Implementation/Practical Considerations • Real-time processing: applications require computationally efficient algorithms to be feasible. • Determining the noise sample: with a single microphone, detecting speech in order to estimate noise statistics is difficult (a simple energy-based sketch follows below). • Possible remedies: use visual information to detect speech, or use nonlinear noise reduction methods.
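A minimal sketch of one practical way to gather noise statistics from a single microphone, using a simple energy-based speech/non-speech decision (the frame size, filter order, and percentile threshold are illustrative assumptions, not values from the slides):

```python
import numpy as np

def estimate_noise_autocorr(y, frame_len=256, order=32, energy_percentile=20):
    """Estimate the noise autocorrelation r_n from low-energy frames of y.

    Frames whose energy falls below the given percentile are treated as
    noise-only, and their autocorrelation estimates are averaged.
    """
    n_frames = len(y) // frame_len
    frames = y[:n_frames * frame_len].reshape(n_frames, frame_len)
    energies = np.sum(frames ** 2, axis=1)
    threshold = np.percentile(energies, energy_percentile)
    noise_frames = frames[energies <= threshold]

    r_n = np.zeros(order)
    for f in noise_frames:
        r_n += np.array([np.dot(f[:frame_len - k], f[k:]) / frame_len
                         for k in range(order)])
    return r_n / max(len(noise_frames), 1)
```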

  16. Conclusions • Noise suppression methods have become increasingly important due to the proliferation of mobile devices, ASR systems, and biometrics/bioinformatics • Speech enhancement is a very broad field • Array processing for source separation, noise cancellation • Interested in blind noise reduction • Linear, Linear + KLT preprocessing, Signal subspace embedding • Kalman filter based methods, Non-linear methods • Using state-of-the-art noise reduction methods, typical SNR improvements are ~5 dB • Proposed experiments to test ASR improvement

  17. References
[1] Eric A. Wan and Rudolph van der Merwe, "Noise-Regularized Adaptive Filtering for Speech Enhancement," Proc. Eurospeech, pp. 2643-2646, 1999.
[2] Ki Yong Lee, Byung-Gook Lee, Iickho Song, and Souguil Ann, "Robust Estimation of AR Parameters and its Application for Speech Enhancement," Proc. IEEE ICASSP, pp. 309-312, 1992.
[3] Phil S. Whitehead, David V. Anderson, and Mark A. Clements, "Adaptive, Acoustic Noise Suppression for Speech Enhancement," Proc. IEEE ICME, pp. 565-568, 2003.
[4] A. V. Oppenheim, E. Weinstein, K. C. Zangi, M. Feder, and D. Gauger, "Single Sensor Active Noise Cancellation Based on the EM Algorithm," Proc. IEEE ICASSP, pp. 277-280, 1992.
[5] T. Rutkowski, A. Cichocki, and A. K. Barros, "Speech Enhancement Using Adaptive Filters and Independent Component Analysis Approach," Proc. AISAT, 2000.
[6] H. Saruwatari, K. Sawai, A. Lee, K. Shikano, A. Kaminuma, and M. Sakata, "Speech Enhancement and Recognition in Car Environment Using Blind Source Separation and Subband Elimination Processing," Proc. ICA, pp. 367-372, 2003.
[7] Simon Haykin, Adaptive Filter Theory, Prentice-Hall Inc., Upper Saddle River, NJ, pp. 466-501, 2002.
[8] M. T. Johnson, A. C. Lindgren, R. J. Povinelli, and X. Yuan, "Performance of Nonlinear Speech Enhancement using Phase Space Reconstruction," Proc. IEEE ICASSP, pp. 872-875, 2003.
[9] Andrew C. Lindgren, "Speech Recognition Using Features Extracted from Phase Space Reconstructions," Thesis, Marquette University, Milwaukee, WI, May 2003.

  18. END
