This course explores the principles and applications of speech and audio signal processing. Topics include speech synthesis, recognition, audio coding, source separation, and more. Prerequisites include EE123 or equivalent and Stat 200A or equivalent.
EE 225D Audio Signal Processing in Humans and Machines Prof. N. Morgan and friends MW 4:00-5:30 http://www.icsi.berkeley.edu/eecs225d/spr12/overview.html
Textbook: Speech and Audio Signal Processing, Gold, Morgan, and Ellis, Wiley & Sons, 2nd edition, 2011 • Prerequisites: EE123 or equivalent, and Stat 200A or equivalent; or grad standing and consent of instructor
Speech and audio signal processing: why does this material matter? • Speech w/o visual vs visual w/o speech • Requires DSP, machine learning • Multidisciplinary tasks are good training • Many applications!
What should we be able to do (automatically)? • The human example suggests plenty • What was said • Who said it • When they said it • What it meant • How to respond
Why is it hard? • Speaker variability (within and between) • Noise, reverberation, channel • Confusable vocabulary • Meaning and tone
Course Philosophy I • People can do these tasks effortlessly • Include psychoacoustics and physiology • Also some acoustics • But of course, also DSP and machine learning
Course Philosophy II • First part of the course is basic stuff • The rest is applications • Much of the course grade based on an original project • Some practice in oral presentation
Section I: Broad background • Synthesis/vocoding history (chaps 2&3) • Recognition history (chap 4) • Machine recognition basics (chap 5) • Human recognition basics (chap 18)
Section II: Scientific background • Pattern classification (chaps 8 and 9) • Ear physiology (chap 14) • Acoustics (chaps 10 and 13) • Linguistic sound categories (chap 23)
Section IIIa: Engineering Apps • Signal processing “front end” (chaps 19-22) • Perceptual audio coding (chap 35) • Music signal analysis (chap 37) • Source separation (chap 39)
Section IIIb: Engineering Apps • Deterministic sequence recognition (chap 24) • Statistical modeling and inference (chaps 25, 26) • Discriminant methods and adaptation (chaps 27, 28)
Section IIIc: Engineering Apps • Speech synthesis (chap 30) • Spoken dialog systems (chap 29++) • Speaker verification (chap 41) • Speaker diarization (chap 42)
Course grading • Quizzes/assignments (for first half): 30% • Project proposal: 10% • Project oral presentation: 20% • Project write-up & results: 40%
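As a rough illustration of how the grading weights above combine, here is a minimal Python sketch. The weights come from the slide; the component names, the 0-100 scoring scale, the `course_grade` helper, and the example scores are all assumptions made for illustration, not part of the course materials.

```python
# Weights (in percent) taken from the grading slide.
# Key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "quizzes_assignments": 30,
    "project_proposal": 10,
    "project_oral_presentation": 20,
    "project_writeup_results": 40,
}


def course_grade(scores):
    """Return the weighted total on a 0-100 scale.

    `scores` maps each component name in WEIGHTS to a score in [0, 100].
    The 0-100 scale is an assumption; the slide does not specify one.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the graded components")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores for one student.
example_scores = {
    "quizzes_assignments": 80,
    "project_proposal": 90,
    "project_oral_presentation": 70,
    "project_writeup_results": 85,
}

if __name__ == "__main__":
    print(course_grade(example_scores))
```

Under these assumed scores, the three project components together account for 70% of the total, which matches the slide's emphasis on the original project.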
Course location • After today, 6th floor of ICSI • 1947 Center Street, between Milvia and MLK • Class will start at 4:15 instead of 4:10 (15-minute walk from Cory) • Office hour: one hour before each class