Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features
Yun-Nung (Vivian) Chen, Yu Huang, Sheng-Yi Kong, Lin-Shan Lee
National Taiwan University, Taiwan
Key Term Extraction, National Taiwan University

Introduction
Definition
• Key term: higher term frequency; core content
• Two types: keyword and key phrase
• Advantages: indexing and retrieval; capturing the relations between key terms and segments of documents
Introduction
• Example key terms: bigram, language model, n-gram, HMM, acoustic model, hidden Markov model, phone
• Target: extract key terms from course lectures
Proposed Approach
Automatic Key Term Extraction
• System flow: archive of spoken documents (speech signal) → ASR → ASR transcriptions → Phrase Identification (Branching Entropy) → Feature Extraction → Learning Methods (K-means Exemplar, AdaBoost, Neural Network) → key terms (e.g., entropy, acoustic model, …)
• First, branching entropy identifies phrases; then learning methods extract key terms using several kinds of features
How to decide the boundary of a phrase? Branching Entropy
Example: "hidden Markov model" may be followed by is / of / in / can / represent / …
• "hidden" is almost always followed by the same word
• "hidden Markov" is almost always followed by the same word
• "hidden Markov model" is followed by many different words → boundary
Define branching entropy to decide the possible boundary
How to decide the boundary of a phrase? Branching Entropy
• Definition of Right Branching Entropy
• Probability of a child xi of X: P(xi | X) = C(X xi) / C(X), where C(·) is a corpus count
• Right branching entropy of X: Hr(X) = −Σi P(xi | X) log P(xi | X)
How to decide the boundary of a phrase? Branching Entropy
• Decision of Right Boundary
• Find the right boundary located between X and xi where the right branching entropy rises, i.e., where many different words can follow
How to decide the boundary of a phrase? Branching Entropy
• Decision of Left Boundary
• Find the left boundary located between X and xi in the same way, computing left branching entropy on the reversed word sequence (X: "model Markov hidden")
• Using a PAT tree to implement
How to decide the boundary of a phrase? Branching Entropy
• Implementation in the PAT tree
• The probability of the children xi of X and the right branching entropy of X are computed directly from the counts stored in the tree
• Example: X: "hidden Markov"; x1: "hidden Markov model"; x2: "hidden Markov chain"
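The boundary decision above can be sketched in a few lines of Python. This toy version recomputes follower counts by scanning a token list instead of storing them in a PAT tree, and the corpus is invented for illustration:

```python
from collections import Counter
from math import log

def right_branching_entropy(corpus_tokens, prefix):
    """H_r(X) = -sum_i P(x_i|X) log P(x_i|X), where the x_i are the words
    observed immediately after the prefix X in the corpus."""
    n = len(prefix)
    followers = Counter(
        corpus_tokens[i + n]
        for i in range(len(corpus_tokens) - n)
        if corpus_tokens[i : i + n] == prefix
    )
    total = sum(followers.values())
    return -sum((c / total) * log(c / total) for c in followers.values())

# Toy corpus: "hidden Markov model" is followed by many different words,
# while "hidden" and "hidden Markov" are almost deterministic.
corpus = ("the hidden Markov model is simple . "
          "a hidden Markov model can represent sequences . "
          "the hidden Markov model of speech is popular .").split()

h1 = right_branching_entropy(corpus, ["hidden"])                    # 0.0
h2 = right_branching_entropy(corpus, ["hidden", "Markov"])          # 0.0
h3 = right_branching_entropy(corpus, ["hidden", "Markov", "model"])  # log 3
# A right boundary is hypothesized after "model", where the entropy rises.
```

The left boundary works symmetrically on the reversed token sequence.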
Automatic Key Term Extraction (feature extraction step)
• Extract prosodic, lexical, and semantic features for each candidate term
Feature Extraction
Speakers tend to use longer duration to emphasize key terms
• Prosodic features (for each candidate term appearing for the first time)
• Duration of each phone, normalized by the average duration of that phone; four values represent the duration of the term
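A minimal sketch of this duration feature, assuming the four values are the minimum, maximum, mean, and range of the normalized phone durations (the slide only says four values are used); the phone labels and durations, in milliseconds, are hypothetical:

```python
from statistics import mean

def duration_features(term_phones, phone_avg):
    """Normalize each phone's duration by the corpus-wide average duration
    of that phone, then summarize the term with four values:
    (min, max, mean, range) of the normalized durations."""
    norm = [dur / phone_avg[ph] for ph, dur in term_phones]
    return (min(norm), max(norm), mean(norm), max(norm) - min(norm))

# Hypothetical average phone durations and one candidate term (ms).
phone_avg = {"a": 100, "k": 50, "u": 80}
term = [("a", 150), ("k", 50), ("u", 120)]
feats = duration_features(term, phone_avg)  # normalized: 1.5, 1.0, 1.5
```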
Feature Extraction
Higher pitch may represent significant information
• Prosodic features (for each candidate term appearing for the first time): pitch-related values
Feature Extraction
Higher energy emphasizes important information
• Prosodic features (for each candidate term appearing for the first time): energy-related values
Feature Extraction
• Lexical features: using some well-known lexical features for each candidate term
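The slide does not enumerate the lexical features, but TF-IDF is a standard example of such a feature; a minimal sketch over a toy document collection:

```python
from math import log

def tf_idf(term, doc_tokens, all_docs):
    """TF-IDF of a term in one document: term frequency times the log of
    the inverse document frequency over the collection."""
    tf = doc_tokens.count(term)
    df = sum(1 for d in all_docs if term in d)
    return tf * log(len(all_docs) / df) if df else 0.0

docs = [["entropy", "is", "high"],
        ["the", "entropy", "of", "entropy"],
        ["acoustic", "model"]]
score = tf_idf("entropy", docs[1], docs)  # tf = 2, df = 2 -> 2 * log(3/2)
```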
Feature Extraction
Key terms tend to focus on limited topics
• Semantic features: Probabilistic Latent Semantic Analysis (PLSA)
• Latent Topic Probability P(Tk | tj), where tj: terms, Di: documents, Tk: latent topics
Feature Extraction
• The Latent Topic Probability describes a probability distribution over latent topics: a key term concentrates on a few topics, while a non-key term spreads over many
• How to use it?
Feature Extraction
• Latent Topic Significance (LTS): the within-topic to out-of-topic ratio (within-topic frequency vs. out-of-topic frequency); a key term scores higher within its topics than a non-key term
Feature Extraction
• Latent Topic Entropy (LTE): the entropy of the term's latent topic distribution, LTE(t) = −Σk P(Tk | t) log P(Tk | t)
• Non-key terms have higher LTE; key terms have lower LTE
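The LTE computation follows directly from its definition; the PLSA posteriors below are invented to illustrate the key/non-key contrast:

```python
from math import log

def latent_topic_entropy(p_topic_given_term):
    """LTE(t) = -sum_k P(T_k|t) log P(T_k|t): low when the term
    concentrates on a few latent topics, high when it spreads out."""
    return -sum(p * log(p) for p in p_topic_given_term if p > 0)

# Hypothetical PLSA posteriors over four latent topics.
key_term     = [0.85, 0.05, 0.05, 0.05]   # concentrated -> low LTE
non_key_term = [0.25, 0.25, 0.25, 0.25]   # uniform      -> high LTE

lte_key = latent_topic_entropy(key_term)
lte_non = latent_topic_entropy(non_key_term)   # log 4, the maximum
# Key terms tend to have lower LTE than non-key terms: lte_key < lte_non
```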
Automatic Key Term Extraction (learning step)
• Using unsupervised and supervised approaches to extract key terms
Learning Methods
• Unsupervised learning: K-means Exemplar
• Transform each term into a vector in LTS (Latent Topic Significance) space
• Run K-means
• Take the term closest to the centroid of each cluster as a key term
The terms in the same cluster focus on a single topic; the candidate terms in the same group are related to the key term, so the key term can represent that topic.
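The K-means Exemplar procedure might look like the following sketch, with hypothetical 2-D LTS vectors and plain K-means; the exemplar of each cluster is the member term nearest its centroid:

```python
import random

def kmeans_exemplars(term_vectors, k, iters=20, seed=0):
    """Cluster term vectors (e.g. in LTS space) with K-means, then return
    the term nearest each centroid as that cluster's exemplar key term."""
    rng = random.Random(seed)
    terms = list(term_vectors)
    centroids = [term_vectors[t] for t in rng.sample(terms, k)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(iters):
        # Assign each term to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for t in terms:
            j = min(range(k), key=lambda j: dist2(term_vectors[t], centroids[j]))
            clusters[j].append(t)
        # Move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:
                dims = zip(*(term_vectors[t] for t in members))
                centroids[j] = [sum(d) / len(members) for d in dims]
    # Exemplar: the member term closest to each centroid.
    return sorted(
        min(members, key=lambda t: dist2(term_vectors[t], centroids[j]))
        for j, members in enumerate(clusters) if members
    )

# Hypothetical 2-D LTS vectors forming two obvious topic clusters.
vecs = {"hmm": (9.0, 1.0), "viterbi": (8.5, 1.2), "acoustic model": (9.2, 0.8),
        "entropy": (1.0, 9.0), "perplexity": (1.2, 8.5), "language model": (2.0, 8.0)}
keys = kmeans_exemplars(vecs, k=2)
```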
Learning Methods
• Supervised learning: Adaptive Boosting (AdaBoost) and Neural Network
• Automatically adjust the weights of the features to produce a classifier
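The slides give no network architecture, so as a minimal stand-in here is a single-layer, logistic-style "network" trained by gradient descent, which shows how the feature weights are adjusted automatically; the feature vectors are hypothetical:

```python
from math import exp

def train_classifier(samples, labels, lr=0.5, epochs=200):
    """Single-layer logistic classifier: gradient descent automatically
    adjusts one weight per feature plus a bias."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + exp(-z))
            g = p - y                       # gradient of log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return lambda x: 1.0 / (1.0 + exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

# Hypothetical [prosodic, lexical, semantic] feature vectors (key term = 1).
X = [(0.9, 0.8, 0.7), (0.8, 0.9, 0.9), (0.2, 0.1, 0.3), (0.1, 0.2, 0.1)]
y = [1, 1, 0, 0]
classify = train_classifier(X, y)
# classify((0.85, 0.9, 0.8)) should score high, (0.1, 0.1, 0.2) low.
```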
Experiments & Evaluation
Experiments
• Corpus: NTU lecture corpus — Mandarin Chinese embedded with English words, single speaker, 45.2 hours
• Example: 我們的solution是viterbi algorithm (Our solution is the Viterbi algorithm)
Experiments
• ASR accuracy
• Acoustic model (AM): bilingual (Chinese/English) AMs with model adaptation, starting from a speaker-independent (SI) model plus some data from the target speaker
• Language model (LM): trigram interpolation of adaptive, background, in-domain, and out-of-domain corpora
Experiments
• Reference key terms
• Annotations from 61 students who had taken the course
• If the k-th annotator labeled Nk key terms, each of them received a score of 1/Nk, and all other terms 0
• Rank the terms by the sum of the scores given by all annotators
• Choose the top N terms from the list (N is the average Nk)
• N = 154 key terms: 59 key phrases and 95 keywords
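The annotation-scoring scheme can be sketched as follows (assuming the score per labeled term is 1/Nk, consistent with ranking by summed scores; the toy annotations are invented):

```python
from collections import defaultdict

def reference_key_terms(annotations):
    """Each annotator labeling N_k terms gives each of them score 1/N_k;
    terms are ranked by total score and the top N kept, where N is the
    average number of labeled terms per annotator."""
    scores = defaultdict(float)
    for labeled in annotations:
        for term in labeled:
            scores[term] += 1.0 / len(labeled)
    n = round(sum(len(a) for a in annotations) / len(annotations))
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:n]

# Toy annotations from three hypothetical annotators.
ann = [{"hmm", "viterbi", "entropy"},
       {"hmm", "entropy"},
       {"hmm", "perplexity", "viterbi"}]
ref = reference_key_terms(ann)  # top N = 3 terms by summed score
```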
Experiments
• Evaluation
• Unsupervised learning: set the number of extracted key terms to N
• Supervised learning: 3-fold cross-validation
Experiments
• Feature effectiveness: neural network for keywords from ASR transcriptions
• F-measures (chart; Pr: prosodic, Lx: lexical, Sm: semantic): 20.78, 35.63, 42.86, 48.15, 56.55
• Each set of these features alone gives an F1 from about 20% to 42%
• Prosodic features and lexical features are additive; all three sets of features are useful
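The F-measure used throughout the evaluation, sketched over toy key-term sets:

```python
def f_measure(extracted, reference):
    """Precision, recall, and F1 between extracted and reference key-term sets."""
    tp = len(set(extracted) & set(reference))
    if tp == 0:
        return 0.0
    precision = tp / len(set(extracted))
    recall = tp / len(set(reference))
    return 2 * precision * recall / (precision + recall)

# Toy example: 3 of 4 extracted terms appear in a 6-term reference list.
ref = {"hmm", "viterbi", "entropy", "perplexity", "acoustic model", "phone"}
ext = {"hmm", "viterbi", "entropy", "bigram"}
f1 = f_measure(ext, ref)  # precision 0.75, recall 0.5 -> F1 = 0.6
```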
Experiments
• Overall performance (AB: AdaBoost, NN: Neural Network)
• Baseline: conventional TF-IDF scores without branching entropy, with stop-word removal and PoS filtering
• F-measures (chart): 23.38, 51.95, 55.84, 62.39, 67.31
• Branching entropy performs well; supervised approaches are better than unsupervised ones; K-means Exemplar outperforms TF-IDF
Experiments
• Overall performance (AB: AdaBoost, NN: Neural Network), manual vs. ASR transcriptions
• F-measures (chart): 20.78, 23.38, 43.51, 51.95, 52.60, 55.84, 57.68, 62.39, 62.70, 67.31
• Supervised learning using a neural network gives the best results
• Performance on ASR transcriptions is slightly worse than on manual transcriptions, but reasonable
Conclusion
• We propose a new approach to extracting key terms from course lectures
• The performance is improved by identifying phrases with branching entropy and by combining prosodic, lexical, and semantic features
• The results are encouraging
Thanks for your attention! Q & A
We thank the reviewers for their valuable comments.
NTU Virtual Instructor: http://speech.ee.ntu.edu.tw/~RA/lecture