Speech Processing

Speech Processing: Applications of Images and Signals in High Schools. AEGIS RET All-Hands Meeting, Florida Institute of Technology, July 6, 2012. Contributors: Dr. Veton Këpuska, Faculty Mentor, FIT (vkepuska@fit.edu); Jacob Zurasky, Graduate Student Mentor, FIT (jzuraksy@my.fit.edu).

Presentation Transcript


  1. Speech Processing
     Applications of Images and Signals in High Schools
     AEGIS RET All-Hands Meeting
     Florida Institute of Technology
     July 6, 2012

  2. Contributors
     • Dr. Veton Këpuska, Faculty Mentor, FIT, vkepuska@fit.edu
     • Jacob Zurasky, Graduate Student Mentor, FIT, jzuraksy@my.fit.edu
     • Becky Dowell, RET Teacher, BPS Titusville High, dowell.jeanie@brevardschools.org

  3. Timeline
     • 1874: Alexander Graham Bell proves that frequency harmonics from an electrical signal can be divided
     • 1952: Bell Labs develops the first effective speech recognizer
     • 1971-1976: DARPA: speech should be understood, not just recognized
     • 1980s: Call center and text-to-speech products become commercially available
     • 1990s: PC processing power allows ordinary users to run speech recognition (SR) software
     Source: Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

  4. Motivation
     • Applications
       • Call center speech recognition
       • Speech-to-text applications (e.g., dictation software)
       • Hands-free user interface (e.g., OnStar, XBOX Kinect, Siri)
     • Science Fiction, 1968: Stanley Kubrick’s 2001: A Space Odyssey http://www.youtube.com/watch?v=6MMmYyIZlC4
     • Science Fact, 2011: Apple iPhone 4S Siri http://www.apple.com/iphone/features/siri.html

  5. Difficulties
     • Continuous speech (word boundaries)
     • Noise
       • Background
       • Other speakers
     • Differences in speakers
       • Dialects/accents
       • Male/female

  6. Motivation
     • Speech recognition requires speech to first be characterized by a set of “features”.
     • Features are used to determine what words are spoken.
     • Our project implements the feature extraction stage of a speech processing application.

  7. Speech Recognition
     Block diagram: Speech → Front End (pre-processing) → Features → Back End (recognition) → Recognized speech
     • Speech: large amount of data (e.g., 256 samples)
     • Features: reduced data size (e.g., 13 features)
     • Front End: reduces the amount of data for the back end while keeping enough to describe the signal accurately. Its output is a feature vector (256 samples → 13 features).
     • Back End: statistical models classify each feature vector as a particular sound in speech.

  8. Front-End Processing of Speech Recognizer
     Stages: Pre-emphasis
     • Pre-emphasis: high-pass filter to compensate for the high-frequency roll-off in human speech
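A pre-emphasis stage is often written as a first-order difference filter. The sketch below is in Python rather than the project's MATLAB, and it is only one plausible form: the function name `pre_emphasis` and the coefficient `alpha = 0.97` are assumptions, since the slides do not give the filter used in SASE.

```python
def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha is an assumed, commonly used value; the slides do not specify one.
    """
    if not samples:
        return []
    out = [samples[0]]  # the first sample has no predecessor, so pass it through
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


# A constant (0 Hz) signal is almost removed after the first sample,
# while a rapidly alternating signal is boosted.
flat = pre_emphasis([1.0, 1.0, 1.0, 1.0])
fast = pre_emphasis([1.0, -1.0, 1.0, -1.0])
```

Comparing `flat` and `fast` shows the high-pass behaviour: slow changes shrink and fast changes grow.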

  9. Front-End Processing of Speech Recognizer
     Stages: Pre-emphasis → Window
     • Pre-emphasis: high-pass filter to compensate for the high-frequency roll-off in human speech
     • Window: separate the speech signal into frames, then apply a window to smooth the edges of each framed segment
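Framing and windowing can be sketched as follows, in Python rather than the project's MATLAB. The 256-sample frame length comes from the earlier slide; the Hamming window, the 128-sample hop, and all function names are assumptions made for illustration.

```python
import math


def hamming(length):
    """Hamming window: tapers toward the frame edges (assumed window type)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def split_into_frames(samples, frame_len=256, hop=128):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def window_frames(samples, frame_len=256, hop=128):
    """Multiply every frame, sample by sample, with the window."""
    win = hamming(frame_len)
    return [[s * w for s, w in zip(frame, win)]
            for frame in split_into_frames(samples, frame_len, hop)]


# Hypothetical input: 1024 samples of constant amplitude.
signal = [1.0] * 1024
frames = window_frames(signal)
```

With a constant input, each windowed frame simply traces the window shape, which makes the edge tapering easy to see when plotted.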

  10. Front-End Processing of Speech Recognizer
     Stages: Pre-emphasis → Window → FFT
     • Pre-emphasis: high-pass filter to compensate for the high-frequency roll-off in human speech
     • Window: separate the speech signal into frames, then apply a window to smooth the edges of each framed segment
     • FFT: transform the signal from the time domain to the frequency domain, because the human ear perceives sound based on frequency content
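To make the time-to-frequency step concrete, here is a small radix-2 FFT in plain Python (the project itself uses MATLAB, whose built-in `fft` would normally be used instead). The test tone, the 8 kHz sample rate, and the helper names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative frequency bins (0 .. N/2)."""
    return [abs(c) for c in fft(frame)[:len(frame) // 2 + 1]]


# Hypothetical test tone: a 1 kHz sine sampled at 8 kHz, one 256-sample frame.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
spectrum = magnitude_spectrum(frame)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
peak_hz = peak_bin * sample_rate / len(frame)
```

The tone falls exactly on a bin here, so the spectrum has a single dominant peak; real speech frames spread energy over many bins.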

  11. Front-End Processing of Speech Recognizer
     Stages: Pre-emphasis → Window → FFT → Mel-Scale
     • Pre-emphasis: high-pass filter to compensate for the high-frequency roll-off in human speech
     • Window: separate the speech signal into frames, then apply a window to smooth the edges of each framed segment
     • FFT: transform the signal from the time domain to the frequency domain, because the human ear perceives sound based on frequency content
     • Mel-Scale: convert linear-scale frequency (Hz) to a logarithmic scale (mel scale)
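The slides do not say which mel formula the project uses. One widely used form is m = 2595 · log10(1 + f / 700), which is close to linear at low frequencies and logarithmic at high ones. The Python sketch below assumes that formula; the function names and the 0 to 4000 Hz range are illustrative only.

```python
import math


def hz_to_mel(hz):
    """Assumed mel formula: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(low_hz, high_hz, num_bands):
    """Band edges spaced evenly in mel, returned in Hz (num_bands + 2 values)."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (num_bands + 1)
    return [mel_to_hz(low + i * step) for i in range(num_bands + 2)]


# Hypothetical setup: 10 bands between 0 Hz and 4000 Hz.
edges = mel_band_edges(0.0, 4000.0, 10)
```

Because the edges are evenly spaced in mel, the bands get wider in Hz as frequency increases, mirroring the ear's coarser resolution at high frequencies.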

  12. Front-End Processing of Speech Recognizer
     Stages: Pre-emphasis → Window → FFT → Mel-Scale → log
     • Pre-emphasis: high-pass filter to compensate for the high-frequency roll-off in human speech
     • Window: separate the speech signal into frames, then apply a window to smooth the edges of each framed segment
     • FFT: transform the signal from the time domain to the frequency domain, because the human ear perceives sound based on frequency content
     • Mel-Scale: convert linear-scale frequency (Hz) to a logarithmic scale (mel scale)
     • log: take the log of the magnitudes (multiplication becomes addition) to allow separation of signals
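One common reading of "multiplication becomes addition" is the source-filter view of speech, which the slides do not name, so treat it as an assumption. In that view a speech frame x[n] is an excitation e[n] convolved with a vocal-tract response h[n]; the symbols e, h, E, and H are introduced here for illustration.

```latex
\begin{align*}
x[n] &= e[n] * h[n] && \text{(convolution in the time domain)} \\
|X(f)| &= |E(f)|\,|H(f)| && \text{(product in the frequency domain)} \\
\log|X(f)| &= \log|E(f)| + \log|H(f)| && \text{(sum after taking the log)}
\end{align*}
```

Once the two components are added rather than multiplied, a further linear transform can pull them apart.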

  13. Front-End Processing of Speech Recognizer
     Stages: Pre-emphasis → Window → FFT → Mel-Scale → log → IFFT
     • Pre-emphasis: high-pass filter to compensate for the high-frequency roll-off in human speech
     • Window: separate the speech signal into frames, then apply a window to smooth the edges of each framed segment
     • FFT: transform the signal from the time domain to the frequency domain, because the human ear perceives sound based on frequency content
     • Mel-Scale: convert linear-scale frequency (Hz) to a logarithmic scale (mel scale)
     • log: take the log of the magnitudes (multiplication becomes addition) to allow separation of signals
     • IFFT: inverse FFT to transform to the cepstral domain; the result is the set of “features”
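The last two stages can be sketched as a real cepstrum: the inverse transform of the log magnitude spectrum, truncated to 13 values to match the "256 samples → 13 features" example. This Python sketch is a simplification under stated assumptions: it skips pre-emphasis, windowing, and the mel step, uses a slow direct DFT to stay self-contained, and the names `dft`, `real_cepstrum`, and the `floor` parameter are hypothetical rather than taken from the SASE code.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform (the inverse scales by 1/N)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame, num_coeffs=13, floor=1e-10):
    """Inverse transform of the log magnitude spectrum, truncated to num_coeffs.

    floor avoids log(0) for empty frequency bins (an assumed safeguard).
    """
    log_mag = [math.log(max(abs(c), floor)) for c in dft(frame)]
    cepstrum = dft(log_mag, inverse=True)
    return [c.real for c in cepstrum[:num_coeffs]]


# Hypothetical 256-sample frame made of two sine components.
frame = [math.sin(2 * math.pi * 5 * n / 256)
         + 0.5 * math.sin(2 * math.pi * 40 * n / 256) for n in range(256)]
features = real_cepstrum(frame)
```

`features` holds 13 numbers for the 256-sample frame, which is the data reduction the front end is meant to deliver.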

  14. Speech Analysis and Sound Effects (SASE) Project
     • Graphical User Interface (GUI)
     • Speech input
       • Record and save audio
       • Sound file (*.wav, *.ulaw, *.au)
     • Graphs the entire audio signal
     • Select a “frame” by clicking on the graph
     • Process the speech frame and display the output for each stage of processing
     • Displays a spectrogram

  15. GUI Components

  16. GUI Components [figure label: Plotting Axes]

  17. GUI Components [figure labels: Buttons, Plotting Axes]

  18. MATLAB Code
     • Graphical User Interface (GUI)
     • GUIDE (GUI Development Environment)
     • Callback functions
     • Work in progress
     • Extendable
     • Stages of speech processing
     • Modular functions for reusability

  19. SASE Lab
     • Interactive teaching tool
     • Demo

  20. Future Work
     • Improve GUI
     • Audio effects
       • Ex: echo, reverberation, chorus, flange
     • Noise filtering

  21. References
     • Ingle, Vinay K., and John G. Proakis. Digital Signal Processing Using MATLAB. 2nd ed. Toronto, Ont.: Nelson, 2007.
     • Oppenheim, Alan V., and Ronald W. Schafer. Discrete-Time Signal Processing. 3rd ed. Upper Saddle River: Pearson, 2010.
     • Weeks, Michael. Digital Signal Processing Using MATLAB and Wavelets. Hingham, Mass.: Infinity Science Press, 2007.
     • Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

  22. Thank you! Questions?

  23. Unit Plan
     • Introduction
     • Lesson #1: The Sound of a Sine Wave
     • Lesson #2: Frequency Analysis
     • Lesson #3: Filtering (work in progress)
     • Lesson #4: SASE Lab (work in progress)
     • Conclusion
