Speech Processing

Speech Processing. Applications of Images and Signals in High Schools. AEGIS RET All-Hands Meeting, University of Central Florida, July 20, 2012. Contributors: Dr. Veton Këpuska, Faculty Mentor, FIT, vkepuska@fit.edu; Jacob Zurasky, Graduate Student Mentor, FIT, jzuraksy@my.fit.edu


Presentation Transcript


  1. Speech Processing: Applications of Images and Signals in High Schools • AEGIS RET All-Hands Meeting • University of Central Florida • July 20, 2012

  2. Contributors • Dr. Veton Këpuska, Faculty Mentor, FIT, vkepuska@fit.edu • Jacob Zurasky, Graduate Student Mentor, FIT, jzuraksy@my.fit.edu • Becky Dowell, RET Teacher, BPS Titusville High, dowell.jeanie@brevardschools.org

  3. Speech Processing Project • Speech recognition requires speech to first be characterized by a set of “features” • Features are used to determine what words are spoken. • Our project implements the feature extraction stage of a speech processing application.

  4. Timeline • 1874: Alexander Graham Bell proves that the frequency harmonics of an electrical signal can be divided • 1952: Bell Labs develops the first effective speech recognizer • 1971-1976, DARPA: speech should be understood, not just recognized • 1980s: Call center and text-to-speech products become commercially available • 1990s: PC processing power allows ordinary users to run speech recognition (SR) software • Source: Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

  5. Applications • Call center speech recognition • Speech-to-text applications (e.g., dictation software) • Hands-free user interface (e.g., OnStar, XBOX Kinect, Siri) • Science Fiction, 1968: Stanley Kubrick’s 2001: A Space Odyssey http://www.youtube.com/watch?v=6MMmYyIZlC4 • Science Fact, 2011: Apple iPhone 4S Siri http://www.apple.com/iphone/features/siri.html • Medical applications (Parkinson’s Voice Initiative, detection of sleep disorders)

  6. Difficulties • Continuous speech (word boundaries) • Noise (background, other speakers) • Differences in speakers (dialects/accents, male/female)

  7. Speech Recognition • Signal flow: Speech (large amount of data, e.g., 256 samples) → Front End: Pre-processing → Features (reduced data size, e.g., 13 features) → Back End: Recognition → Recognized speech • Front End: reduces the amount of data for the back end while keeping enough to describe the signal accurately; its output is a feature vector (256 samples → 13 features) • Back End: statistical models classify feature vectors as particular sounds in speech

  8. Front-End Processing of Speech Recognizer • Pre-emphasis: high-pass filter to compensate for the high-frequency roll-off in human speech
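
The pre-emphasis stage can be sketched in a few lines. This is an illustration only, not the SASE Lab MATLAB code: the first-order filter y[n] = x[n] - alpha * x[n-1] and the coefficient alpha = 0.97 are common textbook choices assumed here, and the function name `pre_emphasis` is hypothetical.

```python
def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed, commonly used value; the slides do not
    state which coefficient the project uses.
    """
    if not samples:
        return []
    # The first sample has no predecessor, so it passes through unchanged.
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


# A constant (0 Hz) signal is almost cancelled after the first sample,
# while a signal that flips sign every sample (highest frequency) is boosted.
constant = pre_emphasis([1.0, 1.0, 1.0, 1.0])
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0])
```

Comparing `constant` with `alternating` shows the high-pass behaviour: low-frequency content shrinks while high-frequency content grows.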

  9. Front-End Processing of Speech Recognizer • Pre-emphasis: high-pass filter to compensate for the high-frequency roll-off in human speech • Window: separate the speech signal into frames, then apply a window to smooth the edges of each frame
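
Framing and windowing might look like the sketch below. The 256-sample frame length follows the example on slide 7; the 128-sample hop, the choice of a Hamming window, and all function names are assumptions made for illustration, since the slides do not say which window the project applies.

```python
import math


def split_into_frames(samples, frame_len=256, hop=128):
    """Return overlapping frames; a trailing partial frame is dropped."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


def hamming(length):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(frame):
    """Multiply each sample by the matching window coefficient."""
    return [s * w for s, w in zip(frame, hamming(len(frame)))]


# A constant signal makes the window shape visible: the frame edges are
# scaled down to 0.08 while the middle stays close to 1.
signal = [1.0] * 1024
frames = split_into_frames(signal)
windowed = apply_window(frames[0])
```

Tapering the frame edges reduces the artificial discontinuity that cutting a frame out of a longer signal would otherwise introduce.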

  10. Front-End Processing of Speech Recognizer • Pre-emphasis: high-pass filter to compensate for the high-frequency roll-off in human speech • Window: separate the speech signal into frames, then apply a window to smooth the edges of each frame • FFT: transform the signal from the time domain to the frequency domain, because the human ear perceives sound based on frequency content

  11. Front-End Processing of Speech Recognizer • Pre-emphasis: high-pass filter to compensate for the high-frequency roll-off in human speech • Window: separate the speech signal into frames, then apply a window to smooth the edges of each frame • FFT: transform the signal from the time domain to the frequency domain, because the human ear perceives sound based on frequency content • Mel-Scale: convert the linear frequency scale (Hz) to the approximately logarithmic mel scale
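
One widely used Hz-to-mel conversion is mel = 2595 * log10(1 + f / 700). The sketch below assumes that formula; the slides do not say which mel formula the project uses, and the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel with the common 2595 * log10(1 + f/700) formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Equal steps in Hz become smaller steps in mel at higher frequencies.
low_step = hz_to_mel(4000) - hz_to_mel(0)
high_step = hz_to_mel(8000) - hz_to_mel(4000)
```

Under this formula 1000 Hz maps to roughly 1000 mel, and `high_step` is much smaller than `low_step`, which reflects the ear's coarser frequency resolution at high frequencies.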

  12. Front-End Processing of Speech Recognizer • Pre-emphasis: high-pass filter to compensate for the high-frequency roll-off in human speech • Window: separate the speech signal into frames, then apply a window to smooth the edges of each frame • FFT: transform the signal from the time domain to the frequency domain, because the human ear perceives sound based on frequency content • Mel-Scale: convert the linear frequency scale (Hz) to the approximately logarithmic mel scale • log: take the log of the magnitudes (multiplication becomes addition) to allow separation of signals

  13. Front-End Processing of Speech Recognizer • Pre-emphasis: high-pass filter to compensate for the high-frequency roll-off in human speech • Window: separate the speech signal into frames, then apply a window to smooth the edges of each frame • FFT: transform the signal from the time domain to the frequency domain, because the human ear perceives sound based on frequency content • Mel-Scale: convert the linear frequency scale (Hz) to the approximately logarithmic mel scale • log: take the log of the magnitudes (multiplication becomes addition) to allow separation of signals • IFFT: apply the inverse FFT to transform to the cepstral domain; the result is the set of “features”
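
The last three stages (transform, log, inverse transform) can be sketched as a plain real cepstrum. This is a simplified illustration under stated assumptions, not the project's implementation: it skips the mel-scale step, uses a slow dependency-free DFT where an FFT would compute the same values faster, and all names plus the `floor` guard against log(0) are hypothetical. Keeping 13 coefficients follows the example on slide 7. Many MFCC implementations use a discrete cosine transform in place of the inverse FFT.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (an FFT gives the same result)."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        result.append(total / n if inverse else total)
    return result


def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of one frame."""
    spectrum = dft(frame)
    # Clamp tiny magnitudes so that log() never receives zero.
    log_magnitude = [math.log(max(abs(c), floor)) for c in spectrum]
    return [c.real for c in dft(log_magnitude, inverse=True)]


# Example frame: two sine components, 64 samples long.
frame = [math.sin(2 * math.pi * 5 * t / 64)
         + 0.5 * math.sin(2 * math.pi * 12 * t / 64)
         for t in range(64)]
features = real_cepstrum(frame)[:13]  # keep the first 13 coefficients
```

A quick sanity check is a unit impulse: its magnitude spectrum is flat at 1, so the log is 0 everywhere and every cepstral coefficient comes out as 0.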

  14. Speech Analysis and Sound Effects (SASE) Project • Graphical User Interface (GUI) • Speech input: record and save audio, or read a sound file (*.wav, *.ulaw, *.au) • Graphs the entire audio signal • Processes a user-selected speech frame and displays the output for each stage of processing • Displays spectrogram • Applies audio effects

  15. MATLAB Code • Graphical User Interface (GUI): GUIDE (GUI Development Environment), callback functions • Front-end speech processing: modular functions for reusability, graphs display output for each stage • Sound effects: Echo, Reverb, Flange, Chorus, Vibrato, Tremolo, Voice Changer

  16. GUI Components

  17. GUI Components • Plotting Axes

  18. GUI Components • Plotting Axes • Buttons

  19. SASE Lab Demo • Record, play, save audio to file, open existing audio files • Select and process a speech frame, display graphs of the stages of front-end processing • Display spectrogram for the entire speech signal or a user-selectable 3-second sample • Play speech: all or the selected 3-second sample • Show how certain sounds (e.g., “a e i o u”) differ in the spectrogram and the features, so the audience understands what these graphs tell us about the sounds • Apply sound effects, show user-configurable parameters • Graph the spectrogram and speech processing stages for the sound effects • Show echo effect in spectrogram • Use as teaching tool

  20. Future Work on SASE Lab • Audio Effects • Ex: Pitch removal • Noise Filtering

  21. Applications of Signal Processing in High Schools • Convey the relevance and importance of math to high school students • Bring knowledge of engineering, technological innovation, and academic research into high school classrooms • Opportunity for students to acquire technical knowledge and analytical skills through hands-on exploration of real-world applications in the field of Signal Processing • Encourage students to pursue higher education and careers in STEM fields

  22. Unit Plan: Speech Processing • Collection of lesson plans that introduce high school students to the fundamentals of speech and sound processing • Connections to Pre-Calculus mathematics standards (NGSSS and Common Core) • Mathematical Modeling • Trigonometric Functions • Complex Numbers in Rectangular and Polar Form • Function Operations • Logarithmic Functions • Sequences and Series • Matrices • Hands-on lessons involving MATLAB projects • Teacher notes

  23. Unit Introduction • Students research, explore, and discuss current applications of speech and audio processing

  24. Lesson 1: The Sound of a Sine Wave • Modeling sound as a sinusoidal function • Concepts covered: • Continuous vs. Discrete Functions • Frequency of Sine Wave • Composite signals • Connections to real-world applications: • Synthesis of digital speech and music

  25. Lesson 1: The Sound of a Sine Wave • Student MATLAB Project • Create discrete sine waves with given frequencies • Create composite signal of the sine waves • Plot graphs and play sounds of the sine waves • Analyze the effect of frequency on the graphs and the sounds of the sine functions • Project Extensions • Play songs using sine waves • Synthesize vowel sounds with sine waves
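
The Lesson 1 project steps can be sketched as follows. The lesson itself uses MATLAB; this Python version is only an illustration, and the 8000 Hz sample rate, the 440 Hz and 880 Hz tones, and the function names are all assumptions. `write_wav` uses the standard-library `wave` module so the result can be played in any audio player.

```python
import math
import struct
import wave


def sine_wave(freq_hz, duration_s, sample_rate=8000, amplitude=1.0):
    """Sample amplitude * sin(2*pi*f*t) at discrete times t = n / sample_rate."""
    count = int(round(duration_s * sample_rate))
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(count)]


def composite(*waves):
    """Add equal-length waves sample by sample."""
    return [sum(samples) for samples in zip(*waves)]


def write_wav(path, samples, sample_rate=8000):
    """Save samples as 16-bit mono WAV, scaled so the peak just fits."""
    peak = max((abs(s) for s in samples), default=0.0) or 1.0
    frames = b"".join(struct.pack("<h", int(32767 * s / peak))
                      for s in samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)


tone_a = sine_wave(440, 0.5)  # half a second at 440 Hz
tone_b = sine_wave(880, 0.5)  # one octave higher
chord = composite(tone_a, tone_b)
```

Calling `write_wav("chord.wav", chord)` would save the composite signal; doubling the frequency raises the pitch by an octave, which students can hear by saving and playing each tone separately.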

  26. Lesson 2: Frequency Analysis • Use of the Fourier transform to convert functions from the time domain to the frequency domain • Concepts covered: • Modeling harmonic signals as a series of sinusoids • Sine wave decomposition • Fourier Transform • Euler’s Formula • Frequency spectrum • Connections to real-world applications: • Speech processing and recognition
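
For reference, the two formulas behind these concepts can be written out. The notation is assumed here rather than taken from the lesson plan: x[n] is a discrete signal of N samples, X[k] is its k-th frequency coefficient, and j is the imaginary unit.

```latex
% Euler's formula: a complex exponential is a cosine plus j times a sine
e^{j\theta} = \cos\theta + j\sin\theta

% Discrete Fourier transform (the FFT is a fast algorithm for computing it)
X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1
```

Substituting Euler's formula into the sum shows that each X[k] measures how strongly the signal correlates with a cosine and a sine at frequency k/N cycles per sample.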

  27. Lesson 2: Frequency Analysis • Student MATLAB Project • Create a composite signal with the sum of harmonic sine waves • Plot graphs and play sounds of the sine waves • Compute the FFT of the composite signal • Plot and analyze the frequency spectrum
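
A minimal sketch of this project, again in Python rather than the lesson's MATLAB: it builds a composite of three harmonics and locates them in the magnitude spectrum. The sample rate, the harmonic frequencies and amplitudes, and the function name are assumptions chosen so that each harmonic falls exactly on a frequency bin; the naive DFT stands in for an FFT, which would return the same values faster.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0 .. N/2 (the non-redundant half for real input)."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        magnitudes.append(abs(total))
    return magnitudes


sample_rate = 800  # Hz (assumed)
n_samples = 200    # bin spacing = 800 / 200 = 4 Hz
harmonics = [(100, 1.0), (200, 0.5), (300, 0.25)]  # (frequency Hz, amplitude)

signal = [sum(a * math.sin(2 * math.pi * f * t / sample_rate)
              for f, a in harmonics)
          for t in range(n_samples)]

magnitudes = dft_magnitudes(signal)

# The three largest bins, converted from bin index back to Hz.
top_bins = sorted(range(len(magnitudes)),
                  key=lambda k: magnitudes[k], reverse=True)[:3]
peak_freqs = sorted(k * sample_rate / n_samples for k in top_bins)
```

Plotting `magnitudes` against frequency would show three spikes whose heights are proportional to the amplitudes of the harmonics, with all other bins close to zero.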

  28. Lesson 3: Sound Effects • Concepts covered: • Connections to real-world applications: • Digital music effects and speech sound effects

  29. Lesson 3: Sound Effects • Student MATLAB Project

  30. Unit Conclusion • Student presentation and report or poster • Summarize and reflect on lessons • Ask research questions • Develop new ideas for applications of speech processing

  31. References • Ingle, Vinay K., and John G. Proakis. Digital signal processing using MATLAB. 2nd ed. Toronto, Ont.: Nelson, 2007. • Oppenheim, Alan V., and Ronald W. Schafer. Discrete-time signal processing. 3rd ed. Upper Saddle River: Pearson, 2010. • Weeks, Michael. Digital signal processing using MATLAB and wavelets. Hingham, Mass.: Infinity Science Press, 2007. • Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

  32. AEGIS Project • AEGIS website: http://research2.fit.edu/aegis-ret/ • Lesson plans available for download ????? • Contacts: • Becky Dowell, dowell.jeanie@brevardschools.org • Dr. Veton Këpuska, vkepuska@fit.edu • Jacob Zurasky, jzuraksy@my.fit.edu

  33. Thank you! Questions?
