Exploiting Word-level Features for Emotion Prediction
Poster by Greg Nicholas. Adapted from the paper by Greg Nicholas, Mihai Rotaru, & Diane Litman.

1. Why predict emotions?
• Affective computing is a direction for improving spoken dialogue systems:
  • Emotion detection (prediction)
  • Emotion handling

2. Feature granularity levels
Detecting emotion: train a classifier on features extracted from user turns. Types of features: lexical, pitch, amplitude, and duration, computed either over the whole turn (turn level) or per word (word level). Previous work mostly uses features computed over the entire turn; [1] uses pitch features computed at the word level.
• Turn level: efficient, but offers only a coarse approximation of the pitch contour.
• Word level: offers a better approximation of the pitch contour (e.g. it captures the big changes in uttering the word "great").

3. We concentrate on pitch features to detect uncertainty.

4. Problems classifying the overall turn emotion, and techniques to solve them
• Turn level is simple: labeling granularity = turn, and one set of features per turn.
• Word level is more complicated: label granularity mismatch (labels at the turn level, features at the word level) and a variable number of features per turn.

Technique 1: Word-level emotion model (WLEM)
• Train: a word-level model, using the turn's emotion label for every word in the turn
• Predict: an emotion label for each word
• Combine: majority voting over the word predictions
• [1] showed that the WLEM method works better than turn-level prediction.
• Issues: the turn-level label is assumed to hold for each word, and majority voting is a very simple scheme.

Technique 2: Predefined subset of sub-turn units (PSSU)
• Combine: concatenate the features of 3 words (first, middle, last) into one conglomerate feature set
• Train & predict: a turn-level model with the turn's emotion label
• Used in [2] at the breath-group level, but not at the word level.
• Issue: might lose details from the discarded words.

Example student turn (diagram): "The force of the truck"
• Turn level: extract one feature set over the whole turn and make one prediction for the turn.
• WLEM: extract five word-level feature sets ("the", "force", "of", "the", "truck"), predict a label for each word (Non-uncertain, Uncertain, Non-uncertain, Uncertain, Non-uncertain), and combine by majority voting into one overall prediction: Non-uncertain (3/5).
• PSSU: concatenate the feature sets of "the", "of", and "truck" (first, middle, and last word) into one PSSU feature set for "the force of the truck", then let the turn-level model make one overall prediction.
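To make the two combination schemes concrete, here is a minimal Python sketch. It is not part of the poster: the feature extraction is not shown, and word_clf / turn_clf are hypothetical classifiers with a scikit-learn-style predict() method, standing in for the models described in [1].

```python
from collections import Counter
from typing import List, Sequence

# A "word feature vector" here is just a list of floats (e.g. pitch statistics);
# the actual features used in the study are not reproduced.

def wlem_predict_turn(word_features: List[Sequence[float]], word_clf) -> str:
    """WLEM: predict a label for every word, then majority-vote over the words.

    word_clf is assumed to be a classifier trained on word-level features
    that were labeled with the emotion of their parent turn.
    """
    word_labels = word_clf.predict(word_features)      # one label per word
    return Counter(word_labels).most_common(1)[0][0]   # e.g. "Non-uncertain" (3/5)

def pssu_features(word_features: List[Sequence[float]]) -> List[float]:
    """PSSU: concatenate the features of the first, middle, and last word
    into one conglomerate feature vector for the whole turn."""
    first = word_features[0]
    middle = word_features[len(word_features) // 2]    # "of" for a 5-word turn
    last = word_features[-1]
    return list(first) + list(middle) + list(last)

# Usage sketch for the example turn "The force of the truck":
#   turn_label_wlem = wlem_predict_turn(word_feats, word_clf)
#   turn_label_pssu = turn_clf.predict([pssu_features(word_feats)])[0]
```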
5. Experimental Results
Corpus:
• ITSPOKE dialogues (corpus comparison with the previous study [1])
• Domain: qualitative physics tutoring
• Backend: WHY2-Atlas, Sphinx2 speech recognition, Cepstral text-to-speech

Overall prediction accuracy (baseline: 77.79%):
• WLEM at the word level slightly improves on turn-level (+0.56%)
• PSSU at the word level shows a much larger improvement (+2.14%)
• Overall, PSSU is best according to this metric as well

Comparison of recall and precision for predicting uncertain turns:
• Turn level: medium recall and precision
• WLEM: best recall, lowest precision (tends to over-generalize)
• PSSU: good recall, best precision (much less over-generalization; overall the best choice)

6. Future work
Many alterations could further improve these techniques:
• Annotate each individual word for certainty, instead of whole turns
• Include the other features pictured above: lexical, amplitude, etc.
• Try predicting in a human-human dialogue context
• Better combination techniques (e.g. confidence weighting)
• More selective choices for the PSSU than the middle word of the turn (e.g. the longest word in the turn, ensuring that the chosen word has domain-specific content)

References
[1] M. Rotaru and D. Litman, "Using Word-level Pitch Features to Better Predict Student Emotions during Spoken Tutoring Dialogues," Proceedings of Interspeech, 2005.
[2] J. Liscombe, J. Hirschberg, and J. J. Venditti, "Detecting Certainness in Spoken Tutorial Dialogues," Proceedings of Interspeech, 2005.