On TFSR (semi)automatic systems supportability: novel instruments for analysis and compensation Francesco Borchi, Monica Carfagni, Matteo Nunziati
Outline • Main goal • TFSR Systems • LogR estimation • Common test procedures for TFSR systems • System behaviour classification • Supportability evaluation tools • Score compensation tools • Quality assessment logics • Conclusion
Main goal Our goal is to propose a general-purpose set of tools for system compensation and quality assessment. Specific goals: • to build a generic framework for system analysis • to develop a novel generic tool for system compensation • to assess the system quality level on the basis of the amount of compensation required by the system itself
TFSR Systems We define a TFSR system as a black box which receives two or more recordings as inputs and produces one or more scores (LogR) as outputs. [Diagram: voice sample 1 and voice sample 2 feed the TFSR system, which outputs a LogR score]
LogR estimation 1/2 LogR = log10[P(E | H0) / P(E | H1)] The log-likelihood ratio indicates which hypothesis the evidence most supports. Hypothesis 1: the two samples belong to different speakers. Hypothesis 0: the two samples belong to the same speaker. If LogR > 0, support goes to the H0 hypothesis; if LogR < 0, support goes to the H1 hypothesis; if LogR = 0, no support is provided.
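A minimal numerical sketch of this decision rule, assuming hypothetical likelihood values (the numbers below are invented for illustration only, not taken from the presentation):

```python
import math

def log_lr(p_e_given_h0: float, p_e_given_h1: float) -> float:
    """Base-10 log-likelihood ratio of the evidence: P(E|H0) over P(E|H1)."""
    return math.log10(p_e_given_h0 / p_e_given_h1)

# Hypothetical likelihood values, chosen only to illustrate the decision rule
logr = log_lr(0.08, 0.02)            # log10(4) ~ 0.60
if logr > 0:
    print("support for H0 (same speaker)")
elif logr < 0:
    print("support for H1 (different speakers)")
else:
    print("no support either way")
```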
LogR estimation 2/2 The real LogR value is unknown; we can only estimate it using some approximations, so our systems are error-prone. The system goodness depends on a number of factors: • the way the voice samples were retrieved • the kind of parameters employed in the recognition • the algorithms used for parameter extraction • the mathematical model used to estimate LogR Experimentation is the best way to assess system behaviour.
Common test procedures for TFSR systems 1/2 The system is tested against a set of recordings of known origin: two or more recordings for each of Speaker 1 … Speaker N. [Diagram: speakers with their associated recordings]
Common test procedures for TFSR systems 2/2 Recordings are mixed up and grouped in pairs: • Same-speaker pairs (SS): test system behaviour when H0 is true. Is LogR > 0? • Different-speaker pairs (DS): test system behaviour when H1 is true. Is LogR < 0?
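The pairing step can be sketched as follows (the (speaker_id, recording) data layout is an assumption made for illustration, not part of the original procedure):

```python
from itertools import combinations

def build_test_pairs(recordings):
    """Split every possible pair of recordings into same-speaker (SS) and
    different-speaker (DS) sets; `recordings` is assumed to be a list of
    (speaker_id, recording) tuples."""
    ss_pairs, ds_pairs = [], []
    for (spk_a, rec_a), (spk_b, rec_b) in combinations(recordings, 2):
        if spk_a == spk_b:
            ss_pairs.append((rec_a, rec_b))   # H0 is true for these pairs
        else:
            ds_pairs.append((rec_a, rec_b))   # H1 is true for these pairs
    return ss_pairs, ds_pairs
```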
System behaviour classification 1/3 Tippett plot: a common method to show system behaviour. [Plot: cumulative % of SS and DS scores against LogR; SS scores below zero are false negatives, DS scores above zero are false positives]
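A sketch of how the two Tippett curves could be computed from the SS and DS score lists produced by the test (function and variable names are illustrative, not from the original work):

```python
import numpy as np

def tippett_curves(ss_scores, ds_scores, n_points=200):
    """Cumulative proportions plotted in a Tippett plot: for each threshold x,
    the percentage of SS scores above x and of DS scores above x."""
    ss, ds = np.asarray(ss_scores), np.asarray(ds_scores)
    grid = np.linspace(min(ss.min(), ds.min()), max(ss.max(), ds.max()), n_points)
    ss_curve = np.array([(ss > x).mean() * 100 for x in grid])  # % SS above x
    ds_curve = np.array([(ds > x).mean() * 100 for x in grid])  # % DS above x
    return grid, ss_curve, ds_curve
```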
System behaviour classification 2/3 Provide a solution to eliminate the "false scores only" areas (red boxes in the Tippett plot), i.e. the score ranges where only wrong support is given.
System behaviour classification 3/3 Provide a solution to reduce the amount of false scores. [Plot: isoperforming vs. ipoperforming (underperforming) system behaviour]
Supportability evaluation tools 1/3 A quantitative evaluation of false scores has been proposed by P. Rose et al. (2003): LRtest = P(LogR > 0 | H0) / P(LogR > 0 | H1), i.e. the percentage of true positives over the percentage of false positives. • Interpretable via the Evett table • No information is provided about false negatives • No information about the distribution of false scores: do they affect a narrow range of scores, or do they widely perturb the system response?
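An empirical estimate of the index from test scores might look like this (sketch; the division-by-zero guard is our own addition):

```python
import numpy as np

def lr_test(ss_scores, ds_scores):
    """Estimate LRtest = P(LogR > 0 | H0) / P(LogR > 0 | H1) from test scores,
    i.e. the proportion of true positives over the proportion of false positives."""
    true_pos = (np.asarray(ss_scores) > 0).mean()
    false_pos = (np.asarray(ds_scores) > 0).mean()
    return float("inf") if false_pos == 0 else true_pos / false_pos
```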
Supportability evaluation tools 2/3 We propose to generalize the LRtest index using a new tool, the "Supportability of System" function (SoS): SoS(x) = P(LogR > x | H0) / P(LogR > x | H1) if x > 0 SoS(x) = [1 - P(LogR > x | H1)] / [1 - P(LogR > x | H0)] if x < 0 • Interpretable via the Evett table • Defined for both false positives and false negatives • Univocally detects the amount of false scores for each LogR • Provides the accuracy of each score We know how much we can rely on our system, case by case!
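The definition translates almost literally into code; the sketch below estimates the probabilities from the SS/DS test scores (the infinity guards are our own assumption, not part of the original definition):

```python
import numpy as np

def sos(x, ss_scores, ds_scores):
    """Supportability of System at score x, estimated from SS (H0 true)
    and DS (H1 true) test scores."""
    p_h0 = (np.asarray(ss_scores) > x).mean()   # P(LogR > x | H0)
    p_h1 = (np.asarray(ds_scores) > x).mean()   # P(LogR > x | H1)
    if x > 0:
        return p_h0 / p_h1 if p_h1 > 0 else float("inf")
    if x < 0:
        return (1 - p_h1) / (1 - p_h0) if p_h0 < 1 else float("inf")
    raise ValueError("the slide defines SoS only for x > 0 and x < 0")
```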
Supportability evaluation tools 3/3 [Worked example from the plot: at LogR = -13, 90% true scores and 20% false scores, giving SoS = 90/20 = 4.5]
Score compensation tools 1/3 Preliminary operation: eliminate the "false scores only" areas by increasing or reducing all scores by a fixed offset DX. [Plot: original score distribution around 0 and the distribution translated by DX]
Score compensation tools 2/3 New LogR = LogR * tanh(log10(SoS)) [Plot: compensation curves for LogR = 1, 2, 3 and 4 as a function of SoS]
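The compensation formula maps directly to code (sketch; the numeric examples are ours, chosen only to show the effect of a strong versus a weak SoS value):

```python
import math

def compensate(logr, sos_value):
    """New LogR = LogR * tanh(log10(SoS)): scores with weak supportability
    (SoS close to 1) are compressed towards zero, while well supported
    scores keep most of their value."""
    return logr * math.tanh(math.log10(sos_value))

print(compensate(3.0, 10.0))   # ~2.28: strong SoS, score largely preserved
print(compensate(3.0, 1.2))    # ~0.24: weak SoS, score heavily compressed
```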
Score compensation tools 3/3 Compress all scores by a value defined by the SoS function: this reduces the amount of false scores at the cost of a lower discriminative power (true scores are decreased as well). [Plot: original vs. compressed Tippett curves]
Quality assessment logics 1/3 • Score compensation reduces the system's discriminative power • Score compensation is required to prevent unbalanced responses • Compensation increases for decreasing values of SoS • Compensation is intrinsic to the system • A good system must have a strong SoS for each LogR value
Quality assessment logics 2/3 DMTI procedure • Step 1: test the system against a dataset (LogR) • Step 2: calculate the supportability (SoS) • Step 3: calculate the compensated scores (new LogR) • Step 4: calculate the percentage P of new LogR values which have a "strong" SoS score (fixed by our standards) • Step 5: evaluate the Degree of Supportability (DoS): DoS = atanh(2P - 1)
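A sketch of steps 4 and 5 (the value of the "strong" SoS threshold below is a placeholder to be fixed by one's own standards, as the slide says; sos_at can be the sos() sketch shown earlier):

```python
import math

def degree_of_supportability(new_logr_scores, sos_at, strong_threshold=10.0):
    """Steps 4-5 of the DMTI procedure: P is the fraction of compensated scores
    whose supportability reaches the chosen "strong" level, DoS = atanh(2P - 1).
    `sos_at` is a callable returning SoS(x) for a given score x."""
    strong = sum(1 for x in new_logr_scores if sos_at(x) >= strong_threshold)
    p = strong / len(new_logr_scores)
    if p <= 0.0:
        return float("-inf")   # no strongly supported scores at all
    if p >= 1.0:
        return float("inf")    # every score is strongly supported
    return math.atanh(2 * p - 1)
```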
Quality assessment logics 3/3 Regardless of the specific procedure, our DoS score is equivalent to a LogR score!
Conclusion • A general-purpose tool has been developed to score system supportability • An additional mathematical tool has been developed to compensate unbalanced systems • The tools are system-independent and theoretically motivated rather than empirically built • The tools are useful to reduce both false positives and false negatives • False score reduction produces a decrement in discriminative power • Such decrement is intrinsic to the system response and is univocally usable for system quality assessment • The proposed procedure for system quality assessment (degree of supportability) uses the well-known Evett scale to score the system supportability