Content-based Video Indexing, Classification & Retrieval

  1. Content-based Video Indexing, Classification & Retrieval Presented by HOI, Chu Hong Nov. 27, 2002

  2. Outline • Motivation • Introduction • Two approaches for semantic analysis • A probabilistic framework (Naphade, Huang ’01) • Object-based abstraction and modeling (Lee, Kim, Hwang ’01) • A multimodal framework for video content interpretation • Conclusion

  3. Motivation • There has been rapid growth in the amount of digital video data in recent years. • Tools for classifying and retrieving video content are lacking. • There exists a gap between low-level features and high-level semantic content. • Making machines understand video is important and challenging.

  4. Introduction • Content-based Video Indexing • the process of attaching content-based labels to video shots • essential for content-based classification and retrieval • Using automatic analysis techniques - shot detection, video segmentation - key frame selection - object segmentation and recognition - visual/audio feature extraction - speech recognition, video text, VOCR
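
Shot detection, the first of these techniques, can be sketched as thresholding frame-to-frame color-histogram differences. A minimal illustration in Python/NumPy, assuming frames are H×W×3 uint8 arrays already decoded from the video; the 16-bin quantization and 0.4 threshold are illustrative choices, not values from the presentation:

```python
import numpy as np

def color_histogram(frame, bins=16):
    """Normalized joint histogram over the three color channels."""
    hist, _ = np.histogramdd(
        frame.reshape(-1, 3), bins=(bins, bins, bins),
        range=((0, 256), (0, 256), (0, 256)))
    return hist.ravel() / hist.sum()

def detect_shot_boundaries(frames, threshold=0.4):
    """Flag frame i as a shot boundary when its histogram differs
    strongly from frame i-1 (half-L1 distance, in [0, 1])."""
    boundaries = []
    prev = color_histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = color_histogram(frame)
        if 0.5 * np.abs(cur - prev).sum() > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries
```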

  5. Introduction • Content-based Video Classification • Segment & classify videos into meaningful categories • Classify videos based on predefined topics • Useful for browsing and searching by topic • Multimodal method • Visual features • Audio features • Motion features • Textual features • Domain-specific knowledge

  6. Introduction • Content-based Video Retrieval • Simple visual feature query • Retrieve video whose key-frame colors are R (80%), G (10%), B (10%) • Feature combination query • Retrieve video with high upward motion (70%), blue (30%) • Query by example (QBE) • Retrieve video similar to an example • Localized feature query • Retrieve video with a car moving toward the right • Object relationship query • Retrieve video with a girl watching the sunset • Concept query (query by keyword) • Retrieve explosion, White Christmas
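
Query by example typically reduces to nearest-neighbor search over the extracted feature vectors. A minimal sketch, assuming the key-frame feature vectors (color, texture, motion, ...) are already computed and stored; cosine similarity is one common choice of ranking measure, not the only one:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def query_by_example(query_vec, index):
    """index: list of (video_id, feature_vector) pairs.
    Returns (video_id, score) pairs, best match first."""
    scored = [(vid, cosine_similarity(query_vec, vec)) for vid, vec in index]
    return sorted(scored, key=lambda s: s[1], reverse=True)
```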

  7. Introduction • Feature Extraction • Color features • Texture features • Shape features • Sketch features • Audio features • Camera motion features • Object motion features
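
As a concrete instance of one of these descriptors, an edge-direction histogram (a simple texture/shape feature built from gradient orientations) can be computed with NumPy alone; the 8-bin quantization is an illustrative assumption:

```python
import numpy as np

def edge_direction_histogram(gray, bins=8):
    """gray: 2-D luminance array. Returns a histogram of gradient
    orientations, weighted by gradient magnitude, normalized to sum to 1."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)                 # edge strength per pixel
    ang = np.arctan2(gy, gx)               # edge direction in (-pi, pi]
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)     # avoid division by zero
```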

  8. Semantic Indexing & Querying • Limitations of QBE • Measures similarity using only low-level features • Fails to reflect the user’s perception • High-level features are difficult to annotate • Syntactic to Semantic • Bridge the gap between low-level features and semantic content • Semantic indexing, Query By Keyword (QBK) • Semantic description scheme – MPEG-7 • Semantic interaction between concepts • no scheme to learn models for individual concepts

  9. Semantic Modeling & Indexing • Two approaches • Probabilistic framework, ‘Multiject’ (Naphade, Huang ’01) • Object-based abstraction and indexing (Lee, Kim, Hwang ’01)

  10. A probabilistic approach (‘Multiject’ & ‘Multinet’) (Naphade, Huang ’01) • a probabilistic multimedia object • 3 categories of semantic concepts • Objects • Face, car, animal, building • Sites • Sky, mountain, outdoor, cityscape • Events • Explosion, waterfall, gunshot, dancing

  11. Multiject for a semantic concept • Example: P(Outdoor = Present | features, other multijects) = 0.7 • [Figure: the ‘Outdoor’ multiject node, with visual, audio, and text features and other multijects as its inputs]
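
One simple way such a posterior could be computed is naive-Bayes-style fusion of per-modality evidence, assuming the modalities are conditionally independent given the concept; the likelihood numbers below are illustrative, not the paper's:

```python
def multiject_posterior(prior, likelihoods):
    """prior: P(concept present), in (0, 1).
    likelihoods: one (P(obs | present), P(obs | absent)) pair per
    modality (visual, audio, text, ...)."""
    p_present, p_absent = prior, 1.0 - prior
    for l_pres, l_abs in likelihoods:
        p_present *= l_pres
        p_absent *= l_abs
    return p_present / (p_present + p_absent)

# Illustrative visual, audio, and text evidence for "outdoor":
p_outdoor = multiject_posterior(0.5, [(0.8, 0.4), (0.6, 0.5), (0.7, 0.6)])
```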

  12. How to create a Multiject • Shot-boundary detection • Spatio-temporal segmentation of within-shot frames • Feature extraction (color, texture, edge direction, etc.) • Modeling • Sites: mixture of Gaussians • Events: hidden Markov models (HMMs) with observation densities as Gaussian mixtures • All audio events: modeled using HMMs • Each segment is tested for each concept and the information is then composed at the frame level
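
For the "sites as Gaussian mixtures" step, a minimal sketch using scikit-learn (assumed available; the 4-component mixtures are an illustrative choice): one GMM is trained per site concept from labeled segment features, and a new segment is assigned to the concept whose model gives the highest log-likelihood:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_site_models(training_sets, n_components=4):
    """training_sets: {concept: (N, D) array of segment features}.
    Returns one fitted GMM per site concept."""
    return {c: GaussianMixture(n_components).fit(X)
            for c, X in training_sets.items()}

def classify_segment(models, feature_vec):
    """Label a segment with the concept whose GMM scores it highest."""
    x = np.asarray(feature_vec).reshape(1, -1)
    return max(models, key=lambda c: models[c].score_samples(x)[0])
```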

  13. Multiject: Hierarchical HMM • [Figure legend] s_s1 … s_sm : state sequence of the supervisor HMM • s_a1 … s_am : state sequence of the audio HMM • x_a1 … x_am : audio observations • s_v1 … s_vm : state sequence of the video HMM • x_v1 … x_vm : video observations

  14. Multinet: Concept Building based on Multijects • A network of multijects modeling the interactions between them • + / − : positive/negative interaction between multijects

  15. Bayesian Multinet • Nodes : binary random variables (presence/absence of a multiject) • Layer 0 : frame-level multiject-based semantic features • Layer 1 : inference from layer 0 • Layer 2 : a higher level for performance improvement
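
A rough sketch of what a layer-1 update over such a network could look like: each multiject's frame-level probability is revised through signed interaction weights with the other multijects, so positively related concepts boost each other and conflicting ones suppress each other. This is a simplified stand-in for the paper's Bayesian inference, with hand-set illustrative weights:

```python
import math

def multinet_update(probs, interactions):
    """probs: {concept: P(present)}, values in (0, 1).
    interactions: {(concept, related_concept): signed weight}."""
    updated = {}
    for c, p in probs.items():
        logit = math.log(p / (1.0 - p))
        for (a, b), w in interactions.items():
            if a == c:
                logit += w * probs[b]   # evidence from related concept b
        updated[c] = 1.0 / (1.0 + math.exp(-logit))
    return updated

# "sky" supports "outdoor" (+) and contradicts "indoor" (-):
p = multinet_update({"sky": 0.8, "outdoor": 0.5, "indoor": 0.5},
                    {("outdoor", "sky"): 2.0, ("indoor", "sky"): -2.0})
```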

  16. Object-based Semantic Video Modeling • [Pipeline: Video Sequence → VO Extraction → Object-based Video Abstraction → Object-based Low-Level Feature Extraction → Semantic Feature Modeling → Indexing/Retrieving]

  17. Object Extraction based on Object Tracking (Kim, Hwang ’00) • [Block diagram: frames I_n, I_n-1 and the delayed previous object VO_n-1 feed Motion Projection → Model Update (Histogram Backprojection) → Object Post-processing → VO_n]
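
The histogram-backprojection step in the diagram can be sketched in a few lines of NumPy: every pixel is scored by the weight of its quantized color in the object's histogram model, yielding a likelihood map for updating the object region; the 16-bin quantization is an illustrative assumption:

```python
import numpy as np

def backproject(frame, obj_hist, bins=16):
    """frame: (H, W, 3) uint8 image; obj_hist: (bins, bins, bins)
    normalized color histogram of the tracked object.
    Returns an (H, W) map of per-pixel object likelihood."""
    idx = (frame.astype(int) * bins) // 256   # quantize each channel
    return obj_hist[idx[..., 0], idx[..., 1], idx[..., 2]]
```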

  18. Semantic Feature Modeling • [Pipeline: Abstracted frame sequence → Pre-processing → Object Features → HMM Training] • Modeling based on the temporal variation of object features • Boundary shape and motion statistics of the object area

  19. HMM Modeling • [Figure: left-right 1-D HMM with states S_1, S_2, …, S_T] 1. Observation sequence O_1 … O_T : object features 2. Left-right 1-D HMM modeling
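
Scoring an observation sequence under such a left-right HMM uses the standard scaled forward algorithm. A minimal sketch with a hypothetical 3-state topology and discretized object-feature symbols; all probabilities here are illustrative:

```python
import numpy as np

A = np.array([[0.7, 0.3, 0.0],    # left-right topology: only self- or
              [0.0, 0.7, 0.3],    # next-state transitions are allowed
              [0.0, 0.0, 1.0]])
pi = np.array([1.0, 0.0, 0.0])    # always start in the first state
B = np.array([[0.6, 0.3, 0.1],    # P(observation symbol | state)
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

def log_likelihood(obs):
    """obs: sequence of observation-symbol indices O_1 ... O_T.
    Returns log P(obs | model) via the scaled forward recursion."""
    alpha = pi * B[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        log_p += np.log(alpha.sum())    # accumulate scale factors
        alpha /= alpha.sum()
    return log_p
```

In practice one such model is trained per semantic event, and a new sequence is assigned to the event whose model scores it highest.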

  20. Video Modeling: Three-Layer Structure • Three-layer structure of video modeling (Video Understanding ↔ Natural Language Processing): • Content Interpretation ↔ Interpretation • Semantic Video Modeling (frame-based & object-based structural modeling) ↔ Sentence Structure & Grammar • Audio-Visual Feature Extraction ↔ Word Recognition

  21. A Multimodal Framework for Video Content Interpretation • Long-term goal • Application: an automatic TV program scout • Allow users to request programs at the topic level • Integrate multiple modalities: visual, audio, and text information • Multi-level concepts • Low: low-level features • Mid: object detection, event modeling • High: classification result of semantic content • Probabilistic model: using a Bayesian network for classification (causal relationships, domain knowledge)

  22. How to work with the framework? • Preprocessing • Story segmentation (shot detection) • VOCR, speech recognition • Key frame selection • Feature Extraction • Visual features based on key-frames • Color, texture, shape, sketch, etc. • Audio features • average energy, bandwidth, pitch, mel-frequency cepstral coefficients, etc. • Textual features (transcript) • Knowledge tree with many keyword categories: politics, entertainment, stock, art, war, etc. • Word spotting, vote histogram • Motion features • Camera operations: panning, tilting, zooming, tracking, booming, dollying • Motion trajectories (moving objects) • Object abstraction, recognition • Building and training the Bayesian network
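
The word-spotting / vote-histogram step for the textual modality can be sketched as keyword counting per topic category; the keyword lists below are illustrative stand-ins for the knowledge tree:

```python
from collections import Counter

# Hypothetical keyword lists; a real knowledge tree would be far larger.
KEYWORDS = {
    "politics": {"election", "senate", "president"},
    "stock":    {"market", "shares", "nasdaq"},
    "war":      {"troops", "missile", "invasion"},
}

def vote_histogram(transcript):
    """Count keyword hits per topic and normalize into a vote histogram
    that can feed the Bayesian network as a textual feature."""
    words = Counter(transcript.lower().split())
    votes = {topic: sum(words[w] for w in kws)
             for topic, kws in KEYWORDS.items()}
    total = sum(votes.values()) or 1       # guard against no hits
    return {topic: v / total for topic, v in votes.items()}
```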

  23. Challenging points • Preprocessing plays a significant role in the framework. • Accuracy of key-frame selection • Accuracy of speech recognition & VOCR • Good feature extraction is important for classification performance. • Modeling semantic video objects and events • How to integrate multiple modalities still needs careful consideration.

  24. Conclusion • Introduced several basic concepts • Semantic video modeling and indexing • Proposed a multimodal framework for topic classification of video • Discussed challenging problems

  25. Q & A Thank you!
