Speaker Recognition = Speaker Identification, Speaker Verification. Florian Schiel Venice International University Oct 2007. Agenda. See the Context Speech Recognition vs. Speaker Recognition Speaker Identification vs. Speaker Verification Speaker Recognition: Basics
Speaker Recognition = Speaker Identification, Speaker Verification Florian Schiel Venice International University Oct 2007
Agenda • See the Context • Speech Recognition vs. Speaker Recognition • Speaker Identification vs. Speaker Verification • Speaker Recognition: Basics • Speaker Verification using HMM • Discussion • and then ...
General Approach to Authentication
• Three general ways to perform authentication:
  - proof of knowledge (e.g. password)
  - proof of possession (e.g. chip card)
  - proof of property (biometrics)
  and their combinations
• Biometrics: physiologically based vs. behaviourally based
• Biometric features: fingerprint, iris scan, facial scan, hand geometry, signature, voice
(from U. Türk 2007)
Biometric Features: General Requirements
• universal: can be found in any user
• unique: even for identical twins
• measurable: does not require human evaluation
• robust to short-term and long-term variability
• low dimensionality
• robust to changing environment
• robust to impersonation
Ratings shown beside the list (their assignment to the items is not recoverable from the transcript): ++ + ++ + o oo +
(from U. Türk 2007)
Taxonomy: Speech Processing
[Figure: taxonomy tree of speech processing with two main branches, Natural Language Processing (NLP) and Spoken Language Processing (SLP). Node labels: dialogue systems, speech synthesis, syntax parsing, spellers, speech identification, speech recognition, terminology, lexica, forensics, speaker recognition, thesaurus, search/indexing, semantics.]
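Speech Recognition vs. Speaker Recognition
• Speech Recognition (ASR): "Decode the spoken content from the acoustic signal". Uses speech models; example output: "Sehr geehrter .." (German: "Dear ..")
• Speaker Recognition (SI/SV): "Determine the identity of a speaker from acoustic signal". Uses speaker characteristics and, for verification, a claimed identity; output: an ID, or "Accepted" / "Rejected"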
Speaker Recognition: Speaker Identification vs. Speaker Verification
Speaker Identification
• Identification from a limited number of participants
• Result is the speaker identity
• Scaling: effort increases linearly with the number of participants
• Accuracy: depends on the size of the enrolment data and on the number of participants
Speaker Verification
• Authentication according to a claimed identity
• Result is binary: "accept" / "reject"
• Scaling: effort is independent of the number of participants
• Accuracy: depends on the size of the enrolment data
• Possible outcomes: identity ok and accept = correct; identity ok and reject = false reject; identity wrong and accept = false accept; identity wrong and reject = correct
[Figure: correctness (scale up to 100) plotted against N]
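The scaling difference between the two tasks can be sketched in a few lines of Python. This is only an illustration: the `score` function, the one-number "models" and the speaker names are hypothetical stand-ins, not the models used in the talk.

```python
def score(model_mean, features):
    """Hypothetical similarity score: negative mean squared distance
    between the feature values and a one-number speaker 'model'."""
    return -sum((x - model_mean) ** 2 for x in features) / len(features)

def identify(models, features):
    """Identification: score against every enrolled speaker and return the
    best one, so the effort grows linearly with len(models)."""
    return max(models, key=lambda name: score(models[name], features))

def verify(models, claimed_id, features, threshold):
    """Verification: score only against the claimed speaker's model, so the
    effort does not depend on the number of enrolled speakers."""
    return score(models[claimed_id], features) >= threshold

# Hypothetical enrolment data and test utterance.
models = {"anna": 1.0, "ben": 3.0, "carla": 5.0}
utterance = [2.8, 3.1, 3.3]

who = identify(models, utterance)
ben_ok = verify(models, "ben", utterance, threshold=-0.5)
carla_ok = verify(models, "carla", utterance, threshold=-0.5)
```

Identification returns a name, verification returns a yes/no answer for one claimed name.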
Speaker Identification: Applications
• Forensics
• Police work
• Automatic user settings
• Speaker classification: advertising
Speaker Verification: Applications
• Access control
• Verification of identity via the phone
• Automatic teller machines
• Password resetting
• Banking: identity for new accounts etc.
• Protection against theft (cars...)
Speaker Verification: Doddington's Zoo (1)
User = registered speaker, Impostor = non-registered speaker
• Goats: users that are often wrongly rejected (increasing 'false reject' errors)
• Lambs: users that are easily imitated (increasing 'false accept' errors)
• Sheep: users that 'behave' (neither goats nor lambs)
• Wolves: particularly successful impostors (increasing 'false accept' errors)
(from Doddington 1998)
Speaker Verification: Doddington's Zoo (2)
Wolves may make zero-effort or active impostor attempts to break into an SV system.
Problem: speaker verification databases do not contain data from active impostor attempts by wolves -> most technical evaluations are based on non-realistic data!
Technical Speech Processing
• Processing chain: analog signal -> anti-aliasing filter -> A/D -> digital signal -> highpass -> feature detection -> vectors -> decoder -> symbols
• Vectors: one feature vector m1..mN per time step (time axis t marked 0, 10, 20)
• Symbols: text, action, semantics
• Examples: "Call Richard!", "Radio off!", "216"
Speaker Verification: Basics (1)
• Processing chain: anti-aliasing filter -> A/D -> highpass -> feature detection -> verification -> "Accept" / "Reject"
• Claimed identity (ID): given e.g. via PIN, ASR or fingerprint; selects the matching entry from the speaker models
Speaker Verification: Basics (2)
[Figure: processing chain anti-aliasing filter -> A/D -> highpass -> feature detection -> verification -> "Accept" / "Reject", with the first two stages highlighted]
• Anti-aliasing filter: analog low-pass filter to avoid aliasing effects (cut-off at fsam/2 on the frequency axis f)
• A/D: analog-digital converter
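Why the low-pass filter has to cut off at fsam/2 can be illustrated with a small calculation. The sketch below computes the apparent frequency of a pure tone after sampling; the 8000 Hz sampling rate in the usage lines is an assumption chosen for illustration, not a value from the slides.

```python
def aliased_frequency(f_hz, f_sam_hz):
    """Apparent frequency of a pure tone of f_hz after sampling at f_sam_hz.

    Tones above f_sam_hz / 2 are folded back into the range 0 .. f_sam_hz / 2,
    which is why they must be removed before the A/D converter.
    """
    f = f_hz % f_sam_hz
    return min(f, f_sam_hz - f)

# Assumed sampling rate of 8000 Hz: a 3000 Hz tone is kept as it is,
# while a 5000 Hz tone becomes indistinguishable from a 3000 Hz tone.
below_nyquist = aliased_frequency(3000, 8000)
above_nyquist = aliased_frequency(5000, 8000)
```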
Speaker Verification: Basics (3)
[Figure: processing chain anti-aliasing filter -> A/D -> highpass -> feature detection (feature computation) -> verification -> "Accept" / "Reject", with feature detection highlighted]
• Feature detection: extraction of speaker characteristics
• Window: 0 to 25 ms; one feature vector m1...mN per window position (time axis marked 10, 20, 30, 40)
• Features:
  - speaker specific
  - robust against noise
  - partly long term
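The windowing step can be sketched as follows. The 25 ms window length is taken from the slide; the 10 ms shift is inferred from the axis ticks and is an assumption, as are the function name and the 8000 Hz sampling rate used in the example.

```python
def frame_signal(samples, sample_rate_hz, window_ms=25, shift_ms=10):
    """Cut a sampled signal into overlapping windows.

    Each window would then be turned into one feature vector m1..mN.
    Trailing samples that do not fill a complete window are dropped.
    """
    win = int(sample_rate_hz * window_ms / 1000)   # samples per window
    hop = int(sample_rate_hz * shift_ms / 1000)    # samples per shift
    frames = []
    start = 0
    while start + win <= len(samples):
        frames.append(samples[start:start + win])
        start += hop
    return frames

# One second of dummy signal at an assumed 8000 Hz sampling rate.
signal = list(range(8000))
frames = frame_signal(signal, 8000)
```

With these values each window holds 200 samples and consecutive windows start 80 samples apart, so neighbouring windows overlap by 15 ms.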
Speaker Verification: Basics (4)
[Figure: processing chain anti-aliasing filter -> A/D -> highpass -> feature detection -> verification -> "Accept" / "Reject", with verification highlighted]
• Input to verification: vector sequence S (m1..mN per time step) and the speaker model of the claimed ID
• Decision:
  - p(S | ID) > threshold: "Accept"
  - p(S | ID) < threshold: "Reject"
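A minimal sketch of this decision step follows. As a simplification, a single Gaussian with diagonal covariance stands in for the speaker model (the talk uses HMMs), the score is the average log-likelihood per vector, and the model values and threshold are made up for illustration.

```python
import math

def avg_log_likelihood(vectors, mean, var):
    """Average per-vector log-likelihood of the sequence under a
    diagonal-covariance Gaussian (simplified stand-in for an HMM)."""
    total = 0.0
    for v in vectors:
        for x, m, s2 in zip(v, mean, var):
            total += -0.5 * (math.log(2 * math.pi * s2) + (x - m) ** 2 / s2)
    return total / len(vectors)

def decide(vectors, model, threshold):
    """Return "Accept" if the score exceeds the threshold, else "Reject".
    The slide leaves the equality case open; it is rejected here."""
    mean, var = model
    score = avg_log_likelihood(vectors, mean, var)
    return "Accept" if score > threshold else "Reject"

# Hypothetical model of the claimed speaker and two test sequences.
claimed_model = ([0.0, 0.0], [1.0, 1.0])
close_sequence = [[0.1, -0.1], [0.0, 0.2]]
far_sequence = [[3.0, 3.0], [4.0, 4.0]]

result_close = decide(close_sequence, claimed_model, threshold=-5.0)
result_far = decide(far_sequence, claimed_model, threshold=-5.0)
```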
Speaker Verification: Tuning
• Error types are highly dependent on the threshold:
  - high security -> false accept low, false reject high
  - user friendly -> false reject low, false accept high
[Figure: false accept and false reject rates plotted against the threshold; the curves cross at the Equal Error Rate]
• Both errors are increased by:
  - channel disturbance
  - crosstalk
  - noise
  - room acoustics
• Solution:
  - multiple enrolments
  - adaptive learning
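The trade-off can be made concrete with a small sketch that sweeps the threshold over a set of scores and reports the point where the two error rates are closest, as an estimate of the Equal Error Rate. The score lists are invented for illustration, and the convention that a score at or above the threshold means "accept" is an assumption.

```python
def error_rates(genuine, impostor, threshold):
    """False accept rate and false reject rate at one threshold.
    A trial is accepted when its score is >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Try every observed score as threshold and return (threshold, rate)
    where false accept and false reject rates are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]

# Hypothetical scores of true users and of impostors.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]

eer_threshold, eer = equal_error_rate(genuine_scores, impostor_scores)
strict = error_rates(genuine_scores, impostor_scores, 0.9)    # high security
lenient = error_rates(genuine_scores, impostor_scores, 0.1)   # user friendly
```

A strict threshold drives false accepts to zero at the price of many false rejects; a lenient one does the opposite.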
Speaker Verification: Score Normalisation (1)
Problem: How to set the optimal threshold?
• HMMs generate a priori probabilities [formula not preserved in the transcript]
  - O: observation = sequence of features
  - λ: speaker model
• Bayes: [formula not preserved in the transcript], but [term not preserved] is dependent on various factors
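The formulas on this slide did not survive in the transcript. A plausible reading, offered only as an assumption and not as the author's exact notation, is that the HMM delivers the probability of the observation given the speaker model, that Bayes' rule converts it into the probability of the speaker given the observation, and that the term "dependent on various factors" is the probability of the observation itself. Here λ denotes the speaker model (printed as "l" in the transcript) and O the observation:

```latex
\begin{align}
  \text{HMM output:} \quad & p(O \mid \lambda) \\
  \text{Bayes:} \quad & P(\lambda \mid O) = \frac{p(O \mid \lambda)\, P(\lambda)}{p(O)}
\end{align}
```

Under this reading, p(O) varies with channel, text and noise, so a fixed threshold on p(O | λ) alone would not be comparable across utterances.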
Speaker Verification: Score Normalisation (2)
Solution: Bayesian Decision Rule: [formula not preserved in the transcript]
With Bayes and log applied to both sides this leads to: [formula not preserved in the transcript]
CFR, CFA: cost functions
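The decision rule itself is missing from the transcript. The following is a standard minimum-cost derivation that fits the description on the slide; it is a reconstruction under assumptions, not necessarily the author's formula. Here λ is the model of the claimed speaker, \bar{\lambda} stands for "any other speaker", O is the observation, and C_FR and C_FA are the costs of a false reject and a false accept:

```latex
\begin{align}
  \text{Accept if} \quad
    & C_{FA}\, P(\bar{\lambda} \mid O) < C_{FR}\, P(\lambda \mid O) \\
  \text{Bayes:} \quad
    & C_{FA}\, \frac{p(O \mid \bar{\lambda})\, P(\bar{\lambda})}{p(O)}
      < C_{FR}\, \frac{p(O \mid \lambda)\, P(\lambda)}{p(O)} \\
  \text{log:} \quad
    & \log p(O \mid \lambda) - \log p(O \mid \bar{\lambda})
      > \log \frac{C_{FA}\, P(\bar{\lambda})}{C_{FR}\, P(\lambda)}
\end{align}
```

The factor p(O) cancels in the second step, so the threshold on the right-hand side depends only on costs and prior probabilities.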
Speaker Verification: Score Normalisation (3)
• Often assumed: costs are equal and speakers occur equally distributed
• [term not preserved in the transcript] is estimated using a world or cohort model
  - world model: speaker model trained on all speakers
  - cohort model: speaker model trained on a group of the most competing models (wolves)
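One common way to use a world or cohort model is to subtract its log-likelihood from that of the claimed speaker's model and to threshold the difference. The sketch below assumes the log-likelihoods have already been computed elsewhere; the function names, the averaging of cohort likelihoods and the default threshold of 0 (a typical choice under equal costs and equal priors) are assumptions rather than details given on the slides.

```python
import math

def llr_world(logp_claimed, logp_world):
    """Log-likelihood ratio of the claimed speaker's model against a world model."""
    return logp_claimed - logp_world

def llr_cohort(logp_claimed, cohort_logps):
    """Log-likelihood ratio against a cohort: the cohort likelihoods are
    averaged, using log-sum-exp to stay numerically stable."""
    m = max(cohort_logps)
    mean_exp = sum(math.exp(lp - m) for lp in cohort_logps) / len(cohort_logps)
    return logp_claimed - (m + math.log(mean_exp))

def decide(llr, threshold=0.0):
    """Accept when the normalised score exceeds the threshold."""
    return "Accept" if llr > threshold else "Reject"

# Hypothetical log-likelihoods of one utterance under different models.
genuine_trial = decide(llr_world(-50.0, -55.0))
impostor_trial = decide(llr_world(-60.0, -55.0))
cohort_score = llr_cohort(-50.0, [-52.0, -52.0])
```

Because the same utterance is scored by both models, factors that affect all models alike largely drop out of the difference.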
Speaker Verification: Enrolment
Each entry lists the method, the enrolment procedure and remarks.
• Fixed, pre-specified sentence, e.g. "My voice is my password". Enrolment: speak the sentence 3 to 5 times. Remarks: sentence may be intercepted and played back.
• Fixed, selectable sentence, e.g. maiden name of grandmother. Enrolment: speak the sentence 3 to 5 times. Remarks: additional security by content.
• Changing number triplets, e.g. fifteen, thirty-nine, seventy-three. Enrolment: speak each number 3 to 5 times. Remarks: high security by many possible combinations.
• System generates a new sentence for each verification. Enrolment: speak each phoneme 3 to 5 times. Remarks: elaborate enrolment, high processing effort, very high security.
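Speaker Verification: HMM types
• Linear HMM: pre-specified sentence
• Piecewise linear HMM: recombination of segments taken from enrolment data
• Ergodic HMM: modelling without time structure
(The slide also rated each type for security and accuracy; apart from a single "o", these ratings are not preserved in the transcript.)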
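The difference between a linear and an ergodic HMM shows up in the shape of the state transition matrix. The sketch below builds plain initial matrices for both; the uniform starting probabilities and function names are illustrative assumptions, and the piecewise linear type is not covered.

```python
def linear_transitions(n_states, stay=0.5):
    """Linear (left-to-right) HMM: each state may only repeat itself or
    move on to the next state, so the time order of the text is fixed."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            A[i][i] = stay
            A[i][i + 1] = 1.0 - stay
        else:
            A[i][i] = 1.0   # the last state can only repeat
    return A

def ergodic_transitions(n_states):
    """Ergodic HMM: every state can follow every state, so the model
    carries no time structure."""
    p = 1.0 / n_states
    return [[p] * n_states for _ in range(n_states)]

linear = linear_transitions(3)
ergodic = ergodic_transitions(3)
```

In the linear matrix every entry below the diagonal is zero, so the model can never go back in time; in the ergodic matrix no entry is zero.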
Speaker Verification: Features (1)
Variable signal characteristics:
• often required: telephone band 300 to 3300 Hz (higher resonances cut off)
• changing channel characteristics, caused by transmission line, handset, distance to mouth
• static and intermittent noise
• user: health, intoxication, fatigue
Speaker Verification: Features (2)
Candidates determined by physiology:
• fundamental frequency, average
• wave form of vocal folds, shimmer, jitter, irregularities
• formants: average and dynamics
• places of articulation: fricatives, plosives
• nasal cavity resonance
• sub-glottal resonance
Speaker Verification: Features (3)
Candidates determined by behaviour:
• voiced/unvoiced ratio
• fundamental frequency, dynamics
• syllable rate, pause/speech ratio
• dialectal features: vowel quality
Candidates determined by speech technology:
• Linear Predictor Coefficients (LPC)
• filter bank, Bark filter bank, Mel filter bank
• Cepstrum, Mel-Cepstrum
• (derivatives with respect to time)
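As a small illustration of the Mel filter bank idea, the sketch below converts between Hz and Mel and places filter centre frequencies evenly on the Mel scale. It uses one widely used conversion formula (several variants exist), and the band limits in the example are the telephone band of 300 to 3300 Hz mentioned earlier; the function names and the number of filters are assumptions.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to Mel with a commonly used formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_centres(f_low_hz, f_high_hz, n_filters):
    """Centre frequencies (in Hz) of n_filters filters spaced evenly
    on the Mel scale between the two band limits."""
    lo, hi = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# Ten filters across the telephone band.
centres = mel_centres(300.0, 3300.0, 10)
```

The centres lie close together at low frequencies and further apart at high frequencies, which mirrors the coarser frequency resolution of hearing in the upper range.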
Speaker Verification: Road Map
[Figure: timeline marked 1990, today, 2010, 2020; the position of each item on the timeline is not recoverable from the transcript]
• Access control for security areas
• Authentication over the telephone
• Speaker profiles on chip cards
• Public speaker profiles
• Access control for keyboardless PDAs
• Devices "recognise" their user
• Authentication in the background
• Automatic alcohol test in the vehicle