1 / 1

Improved Spoken Term Detection with Graph-Based Re-Ranking in Feature Space

Improved Spoken Term Detection with Graph-Based Re-Ranking in Feature Space Yun-Nung Chen, Chia -Ping Chen, Hung-Yi Lee , Chun-an Chan, and Lin- shan Lee. Query Q. Corpus. MFCC Extraction. ASR. Retrieval. Lattice. Re-Ranking. Original Relevance Score S Q ( x ). v ( i ).

tasha
Download Presentation

Improved Spoken Term Detection with Graph-Based Re-Ranking in Feature Space

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Improved Spoken Term Detection with Graph-Based Re-Ranking in Feature Space Yun-Nung Chen, Chia-Ping Chen, Hung-Yi Lee , Chun-an Chan, and Lin-shan Lee Query Q Corpus MFCC Extraction ASR Retrieval Lattice Re-Ranking Original Relevance Score SQ(x) v (i) Acoustic Distance 1st-Pass Results XQ Graph-Based Re-Computation d(xi ,xj) Utterance Pair xi, xj White blocks: original spoken term detection Lattice w2: 0.3 Q: 0.5 w1: 0.1 w2: 0.6 Q: 0.7 MFCC sequence of x w1: 0.2 w2: 0.1 x1 x5 DTW Hit Region of the utterance xi d(xi ,xj) Hit Region of the utterance xj Out(vi)= {x3, x5, x6} x4 x3 v(4) x2 x6 National Taiwan University Summary Main Idea Acoustic Feature Space • Using graph-based re-ranking to improve spoken term detection with acoustic similarity. • With MLLR acoustic model, MAP improves from 55.54% to 67.38%. The relative improvement rate is 22.04% relevant irrelevant • Relevant utterances are similar to each other in feature space. • Utterances similar to more utterances with higher scores should be given higher relevance scores. Acoustic Distance • Compute acoustic distance d(xi ,xj) for each utterance pair xi ,xjin first-pass result. • Node: first-pass retrieved utterance • Edge: weighted by the similarity between two utterances • Nodes connected to more nodes with higher scores are given higher scores. • Considering global similarity among first-pass retrieved utterances. • “Hit Region”: the corresponding MFCC sequence which is most possible to be the query in the lattice. Experiments • Corpus: 33 hours of course lectures (single instructor) • Language: primarily in Mandarin Chinese • Acoustic Model: SI, MLLR, SD Graph-Based Re-Ranking with Acoustic Feature • Modified Random Walk • PRF: pseudo-relevance feedback in feature space (InterSpeech 2010) • Our approach performs better, specially for the relatively poorer acoustic models (SI and MLLR). • Scores propagated from neighbors of node i • Normalized original relevance score • Node: first-pass retrieved utterance • Edge: weighted by the similarity between the two utterances evaluated in feature space • v(i)is higher when • Higher original relevance score • Similar to more utterances with higher scores (matrix form) • Compute similarity from distance 0 • Solution: dominant eigenvector of P’ • Re-Ranking 1 • Normalized similarity • The performance is optimized at α = 0.9 • Better retrieval relies primarily on global similarity. • The graphic structure provides significant information in ranking. • Integrated with original relevance score

More Related