
multimodality, universals, natural interaction…


Presentation Transcript


  1. multimodality, universals, natural interaction… and some other stories… Kostas Karpouzis & Stefanos Kollias ICCS/NTUA HUMAINE WP4

  2. going multimodal • ‘multimodal’ is this decade’s main aspect of ‘affective interaction’ • a plethora of modalities is available to capture and process • visual, aural, haptic… • ‘visual’ can be broken down into ‘facial expressivity’, ‘hand gesturing’, ‘body language’, etc. • ‘aural’ into ‘prosody’, ‘linguistic content’, etc.
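
To make the decomposition concrete, here is a minimal sketch of the channel taxonomy as a data structure; the haptic entries and the function name are illustrative assumptions, not part of the slides.

```python
# Affective cue channels grouped by modality. The visual/aural channels
# follow the slide's breakdown; the haptic entries are assumed examples.
MODALITIES = {
    "visual": ["facial expressivity", "hand gesturing", "body language"],
    "aural":  ["prosody", "linguistic content"],
    "haptic": ["touch pressure", "contact duration"],  # illustrative only
}

def channels(modality: str) -> list[str]:
    """Return the cue channels captured for one modality."""
    return MODALITIES.get(modality, [])

print(channels("visual"))  # ['facial expressivity', 'hand gesturing', 'body language']
```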

  3. why multimodal? • extending unimodality… • recognition from traditional unimodal inputs had serious limitations • multimodal corpora are becoming available • what is there to gain? • have recognition rates actually improved? • or have we just introduced more uncertain features?

  4. essential reading • S. Oviatt, “Ten Myths of Multimodal Interaction”, Communications of the ACM, Nov. 1999, Vol. 42, No. 11, pp. 74-81

  5. putting it all together • myth #6: multimodal integration involves redundancy of content between modes • you have features from a person’s • facial expressions and body language • speech prosody and linguistic content • even their heart rate • so, what do you do when their face tells you something different from their …heart?
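
In the simplest case, ‘putting it all together’ is decision-level (late) fusion: each modality votes with a posterior over the same labels, and a weighting resolves conflicts instead of assuming redundancy. A minimal sketch; the labels, posteriors and weights are invented for illustration.

```python
import numpy as np

LABELS = ["angry", "happy", "neutral"]

def fuse(posteriors: list[np.ndarray], weights: list[float]) -> np.ndarray:
    """Confidence-weighted average of per-modality label posteriors."""
    w = np.asarray(weights, dtype=float)
    fused = np.average(np.vstack(posteriors), axis=0, weights=w / w.sum())
    return fused / fused.sum()

face  = np.array([0.10, 0.80, 0.10])  # the face says 'happy'
voice = np.array([0.70, 0.10, 0.20])  # the prosody says 'angry': a conflict
fused = fuse([face, voice], weights=[0.6, 0.4])
print(LABELS[int(np.argmax(fused))], fused.round(2))  # happy [0.34 0.52 0.14]
```

The point of the example: nothing forces the two channels to agree, so the combination rule, not an assumed redundancy, decides the outcome.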

  6. first, look at this video

  7. and now, listen!

  8. but it can be good • what happens when one of the available modalities is not robust? • better yet, what if the ‘weak’ modality changes over time? • consider the ‘bartender problem’ • very little linguistic content reaches its target • mouth shape is available (visemes) • limited vocabulary
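
One way to exploit the non-robust modality rather than discard it is to let its fusion weight track an estimate of its reliability, so the visual channel (visemes plus a limited vocabulary) takes over as the audio degrades. The SNR-to-weight mapping below is an assumed heuristic, not a method from the slides.

```python
def modality_weights(audio_snr_db: float) -> dict[str, float]:
    """Map estimated audio SNR (clamped to 0..30 dB) to fusion weights."""
    snr = min(max(audio_snr_db, 0.0), 30.0)
    audio_w = 0.1 + 0.8 * (snr / 30.0)   # never fully trust or fully drop audio
    return {"audio": round(audio_w, 2), "visemes": round(1.0 - audio_w, 2)}

print(modality_weights(25.0))  # quiet room: {'audio': 0.77, 'visemes': 0.23}
print(modality_weights(3.0))   # noisy bar:  {'audio': 0.18, 'visemes': 0.82}
```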

  9. but it can be good

  10. again, why multimodal? • holy grail: assigning labels to different parts of human-human or human-computer interaction • yes, labels can be nice! • humans do it all the time • and so do computers (e.g., classification) • OK, but what kind of label?

  11. In the beginning … • based on the claim that ‘there are six facial expressions recognized universally across cultures’… • all video databases used to contain images of sad, angry, happy or fearful people… • thus, more sad, angry, happy or fearful people appear, even when the data involve HCI, and subtle emotions or additional labels are out of the picture • can you really be afraid that often when using your computer?

  12. the Humaine approach • so where is Humaine in all that? • subtle emotions • natural expressivity • alternative emotion representations • discussing dynamics • classification of emotional episodes from life-like HCI and reality TV
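
One of the ‘alternative emotion representations’ explored in HUMAINE is the continuous valence/activation space (as in Feeltrace-style traces), which accommodates the subtle states that the six basic categories miss. A minimal sketch of moving between the two representations; the coordinates are illustrative, not an official mapping.

```python
import math

# Assumed (valence, activation) placements for a few categorical labels.
VA_SPACE = {
    "happy": ( 0.8,  0.5),
    "angry": (-0.6,  0.8),
    "sad":   (-0.7, -0.4),
    "bored": (-0.3, -0.8),
}

def nearest_label(valence: float, activation: float) -> str:
    """Snap a continuous trace point to its closest categorical label."""
    return min(VA_SPACE, key=lambda k: math.dist(VA_SPACE[k], (valence, activation)))

print(nearest_label(0.2, -0.6))  # a subtle low-activation state -> 'bored'
```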

  13. Humaine WP4 results

  14. HUMAINE 2010 • three years from now, in a galaxy (not) far, far away…

  15. a fundamental question

  16. a fundamental question • OK, people may be angry or sad, or express positive/active emotions • face recognition provides an answer to the ‘who?’ question • ‘when?’ and ‘where?’ are usually known or irrelevant • but does anyone know ‘why?’ • context information • semantics
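
A sketch of what carrying context alongside the sensed label could look like, so that ‘why?’ becomes answerable at all; the field names and the toy rule are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class EmotionEvent:
    who: str                  # answered by face recognition
    label: str                # what the classifiers report
    context: dict = field(default_factory=dict)  # semantics of the situation

def why(event: EmotionEvent) -> str:
    """Toy rule: context turns a bare label into a tentative explanation."""
    if event.label == "angry" and event.context.get("app_crashed"):
        return f"{event.who} is likely angry because the application crashed"
    return f"{event.who} appears {event.label}; cause unknown without context"

print(why(EmotionEvent("user42", "angry", {"app_crashed": True})))
```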

  17. a fundamental question (2)

  18. is it me or?...

  19. is it me or?... • some modalities may display no clues or, worse, contradictory clues • the same expression may mean different things coming from different people • can we ‘bridge’ what we know about someone or about the interaction with what we sense? • and can we adapt what we know based on that? • or can we align what we sense with other sources?
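
One reading of ‘adapt what we know based on what we sense’ is to keep a per-person prior over displayed labels and blend it with each new sensed posterior, so the same expression can be scored differently for different people. A minimal sketch; the counting scheme and blend factor are assumptions.

```python
from collections import Counter

class UserModel:
    def __init__(self, labels):
        self.counts = Counter({lab: 1 for lab in labels})  # smoothed start

    def prior(self, label: str) -> float:
        return self.counts[label] / sum(self.counts.values())

    def update(self, observed: str) -> None:
        self.counts[observed] += 1       # what we have sensed so far

    def adapted(self, label: str, sensed: float, blend: float = 0.3) -> float:
        """Blend the sensed probability with this person's prior."""
        return (1 - blend) * sensed + blend * self.prior(label)

u = UserModel(["angry", "happy", "neutral"])
for _ in range(8):
    u.update("neutral")                  # this user is usually neutral
print(round(u.adapted("angry", sensed=0.6), 2))  # 0.45: the claim is discounted
```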

  20. another kind of language

  21. another kind of language • sign language analysis poses a number of interesting problems • image processing and understanding tasks • syntactic analysis • context (e.g. when referring to a third person) • natural language processing • vocabulary limitations
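
The vocabulary limitation cuts both ways: it constrains recognition, but it also lets a noisy observation be snapped to the closest entry in a small sign lexicon. A toy illustration; the lexicon encoding and the similarity measure are invented for the example.

```python
import difflib

# Hypothetical handshape-sequence encodings for a tiny sign lexicon.
LEXICON = {"HELLO": "B-B", "THANKS": "B-flat", "NAME": "H-H"}

def best_sign(observed: str) -> str:
    """Return the lexicon sign whose encoding best matches the observation."""
    return max(LEXICON,
               key=lambda s: difflib.SequenceMatcher(None, LEXICON[s], observed).ratio())

print(best_sign("B-flt"))  # a noisy observation still snaps to 'THANKS'
```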

  22. want answers? Let us try to extend some of the issues already raised!

  23. Semantics – Context (a peek at the future) • [architecture diagram] visual data passes through segmentation, feature extraction and visual analysis; a bank of classifiers C1, C2, … Cn feeds a fusion stage; a Fuzzy Reasoning Engine (FiRE) backed by an ontology infrastructure performs semantic analysis and labelling; a centralised/decentralised knowledge repository, context analysis and an adaptation loop tie the stages together
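
Read as a processing chain, the diagram suggests the skeleton below; the stage names follow the figure, while the stub implementations are placeholders for illustration only.

```python
# Placeholder stubs for each stage named in the diagram.
def segment(frame):            return [frame]                        # visual-data segmentation
def extract_features(regions): return {"regions": len(regions)}     # feature extraction
def classify(features):        return {"happy": 0.7, "angry": 0.3}  # one of C1..Cn
def fuse(scores):              return scores[0]                     # fusion (trivial here)
def fire(scores, kb):          return scores                        # Fuzzy Reasoning Engine (FiRE)
def context_of(kb):            return kb.get("context", {})         # context analysis
def label_of(semantics, ctx):  return max(semantics, key=semantics.get)    # labelling
def adapt(kb, label):          kb.setdefault("history", []).append(label)  # adaptation loop

def analyse(frame, kb):
    features  = extract_features(segment(frame))
    semantics = fire(fuse([classify(features)]), kb)  # classifiers feed fusion, then FiRE
    label     = label_of(semantics, context_of(kb))
    adapt(kb, label)          # feed the result back into the knowledge repository
    return label

kb = {"context": {"setting": "HCI session"}}
print(analyse("frame-0", kb), kb["history"])          # happy ['happy']
```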

  24. Standardisation Activities • W3C Multimedia Semantics Incubator Group • W3C Emotion Incubator Group • provide machine-understandable representations of available emotion modelling, analysis and synthesis theory, cues and results, to be accessed through the Web and used in all types of affective interaction
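
A sketch of what such a machine-understandable representation might look like when serialised; the element names follow the EmotionML drafts that later grew out of this group’s work and are used here only as an illustration.

```python
import xml.etree.ElementTree as ET

def to_markup(category: str, valence: float, arousal: float) -> str:
    """Serialise one recognised state into EmotionML-style XML."""
    emotion = ET.Element("emotion")
    ET.SubElement(emotion, "category", name=category)
    ET.SubElement(emotion, "dimension", name="valence", value=str(valence))
    ET.SubElement(emotion, "dimension", name="arousal", value=str(arousal))
    return ET.tostring(emotion, encoding="unicode")

print(to_markup("happy", 0.8, 0.5))
# <emotion><category name="happy" /><dimension name="valence" value="0.8" />...
```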
