Improved Spoken Term Detection with Graph-Based Re-Ranking in Feature Space

Improved Spoken Term Detection with Graph-Based Re-Ranking in Feature Space Yun-Nung Chen, Chia-Ping Chen, Hung-Yi Lee , Chun-an Chan, and Lin-shan Lee Query Q Corpus MFCC Extraction ASR Retrieval Lattice Re-Ranking Original Relevance Score SQ(x) v (i) Acoustic Distance 1st-Pass Results XQ Graph-Based Re-Computation d(xi ,xj) Utterance Pair xi, xj White blocks: original spoken term detection Lattice w2: 0.3 Q: 0.5 w1: 0.1 w2: 0.6 Q: 0.7 MFCC sequence of x w1: 0.2 w2: 0.1 x1 x5 DTW Hit Region of the utterance xi d(xi ,xj) Hit Region of the utterance xj Out(vi)= {x3, x5, x6} x4 x3 v(4) x2 x6 National Taiwan University Summary Main Idea Acoustic Feature Space • Using graph-based re-ranking to improve spoken term detection with acoustic similarity. • With MLLR acoustic model, MAP improves from 55.54% to 67.38%. The relative improvement rate is 22.04% relevant irrelevant • Relevant utterances are similar to each other in feature space. • Utterances similar to more utterances with higher scores should be given higher relevance scores. Acoustic Distance • Compute acoustic distance d(xi ,xj) for each utterance pair xi ,xjin first-pass result. • Node: ﬁrst-pass retrieved utterance • Edge: weighted by the similarity between two utterances • Nodes connected to more nodes with higher scores are given higher scores. • Considering global similarity among first-pass retrieved utterances. • “Hit Region”: the corresponding MFCC sequence which is most possible to be the query in the lattice. Experiments • Corpus: 33 hours of course lectures (single instructor) • Language: primarily in Mandarin Chinese • Acoustic Model: SI, MLLR, SD Graph-Based Re-Ranking with Acoustic Feature • Modified Random Walk • PRF: pseudo-relevance feedback in feature space (InterSpeech 2010) • Our approach performs better, specially for the relatively poorer acoustic models (SI and MLLR). • Scores propagated from neighbors of node i • Normalized original relevance score • Node: first-pass retrieved utterance • Edge: weighted by the similarity between the two utterances evaluated in feature space • v(i)is higher when • Higher original relevance score • Similar to more utterances with higher scores (matrix form) • Compute similarity from distance 0 • Solution: dominant eigenvector of P’ • Re-Ranking 1 • Normalized similarity • The performance is optimized at α = 0.9 • Better retrieval relies primarily on global similarity. • The graphic structure provides signiﬁcant information in ranking. • Integrated with original relevance score

Improved Spoken Term Detection with Graph-Based Re-Ranking in Feature Space

Improved Spoken Term Detection with Graph-Based Re-Ranking in Feature Space

Presentation Transcript

The IBM 2006 Spoken Term Detection System

Input Space versus Feature Space in Kernel-Based Methods

Rapid and Accurate Spoken Term Detection

Feature Detection

Using Conversational Word Bursts in Spoken Term Detection

QUALITY ASSESSMENT OF SEARCH TERMS IN SPOKEN TERM DETECTION

The SRI 2006 Spoken Term Detection System

Feature detection

GMDH-based feature ranking and selection for improved classification of medical data

QUALITY ASSESSMENT OF SEARCH TERMS IN SPOKEN TERM DETECTION

Spoken Term Detection Evaluation Overview

Re-ranking Subgroups for Fraud Detection

QUALITY ASSESSMENT OF SEARCH TERMS IN SPOKEN TERM DETECTION

Boosting-based parse re-ranking with subtree features

Effective Latent Space Graph-based Re-ranking Model with Global Consistency

Feature Space Based Watermarking in Multi-Images

Graph-based Iterative Hybrid Feature Selection

Feature Detection

Rapid and Accurate Spoken Term Detection

Input Space versus Feature Space in Kernel-Based Methods

QUALITY ASSESSMENT OF SEARCH TERMS IN SPOKEN TERM DETECTION

Feature Detection