
Linking Video Analysis to Annotation Technologies


Presentation Transcript


  1. Linking Video Analysis to Annotation Technologies • Presentation for the BMVA’s Computer Vision Methods for Ambient Intelligence, 31st May 2006 • Dimitrios Makris (Kingston University) & Bogdan Vrusias (University of Surrey)

  2. REVEAL project • EPSRC-funded project, initiated in 2004 • Academic partners • Kingston University, University of Surrey • Industrial partners/observers • SIRA Ltd, Ipsotek Ltd, CrowdDynamics Ltd, Overview Ltd • End-Users • PITO, PSDB (Home Office), Surrey Police • Aim: “to promote those key technologies which will enable automated extraction of evidence from CCTV archives”

  3. See No Evil, Hear No Evil, Speak No Evil

  4. Scope of REVEAL • See Evil • Computer Vision • Input: Video Streams • Hear Evil • Natural Language Processing • Input: Annotations of Video Streams • Speak Evil • Link together • Output: Automatic Video Annotations

  5. Challenges • Development of Visual Evidence Thesaurus • Automatic Extraction of Surveillance Ontology • Extracting Visual Semantics • Motion Detection & Tracking • Geometric and Colour Constancy • Object Classification, Behaviour Analysis, Semantic Landscape • Analysing Crowds • Development of Surveillance Meta-Data Model • Multimodal Data Fusion • Fusion of Visual Semantics and Annotations • Video Summarisation

  6. reveal@dirc.kingston

  7. Video Analysis Overview • Motion Analysis • Motion Detection • Motion Tracking • Crowd Analysis • Automatic Camera Calibration • Colour • Geometric • Visual Semantics extraction • Object Classification • Behaviour Analysis • Semantic landscape

  8. Motion Analysis (1/2) • Motion Detection • Novel Technique for handling rapid light variations, based on correlating changes in YUV (Renno et al, VS2006) • Motion Tracking • Blob-based Kalman filter for tackling partial occlusion (Xu&Ellis, BMVC2002)
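
A minimal sketch of the tracking side, assuming a constant-velocity Kalman filter over a single blob centroid with made-up noise parameters; the published tracker (Xu & Ellis, BMVC2002) additionally handles partial occlusion and multiple blobs.

```python
import numpy as np

# Constant-velocity Kalman filter over a blob centroid state (x, y, vx, vy).
# Noise parameters are illustrative only.
dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)     # state transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)     # only the centroid is observed
Q = np.eye(4) * 0.01                          # process noise (assumed)
R = np.eye(2) * 4.0                           # measurement noise (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle; z is the measured blob centroid."""
    x = F @ x                                  # predict state
    P = F @ P @ F.T + Q                        # predict covariance
    y = z - H @ x                              # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
    x = x + K @ y                              # update state
    P = (np.eye(4) - K @ H) @ P                # update covariance
    return x, P

# Usage: feed the centroid re-detected in each frame.
x, P = np.array([100.0, 50.0, 0.0, 0.0]), np.eye(4) * 10.0
for z in [np.array([102.0, 51.0]), np.array([104.5, 52.2])]:
    x, P = kalman_step(x, P, z)
```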

  9. Motion Analysis (2/2) Example

  10. Crowd Analysis (1/3) • Problem: Detect and Track Individuals in Crowded situations Original Frame Foreground Mask

  11. Crowd Analysis (2/3) • Combine edges of original image with edges of foreground mask Original Frame Edges Foreground Mask Edges
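
A rough sketch of this edge-combination step, assuming OpenCV and illustrative Canny thresholds: only the image edges that coincide with (slightly dilated) foreground-mask edges are kept, and head candidates can then be sought along the surviving edges.

```python
import cv2

def crowd_edges(frame, fg_mask):
    """Keep image edges that lie on the boundary of the foreground mask."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    image_edges = cv2.Canny(gray, 50, 150)        # edges of the original frame
    mask_edges = cv2.Canny(fg_mask, 50, 150)      # edges of the foreground mask
    mask_edges = cv2.dilate(mask_edges, None, iterations=2)  # tolerate small offsets
    return cv2.bitwise_and(image_edges, mask_edges)
```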

  12. Crowd Analysis (3/3) • Fit a head-shoulder (Omega) model Head Candidates on the boundaries of the foreground Head Candidates in the scene Head Candidates within the foreground

  13. Automatic Geometric Calibration (1/3) • Pedestrian height model • Estimate linear pedestrian height model from observations (Renno et al, ICIP2002) [Figure: person, vehicle and large-vehicle heights (pixels) against image position (pixels), with gradient g and the horizon where the height falls to zero]
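
As a hedged illustration, assuming the linear model has the form h = g · (i − i_horizon), with h the pedestrian height in pixels and i the image row of the feet, the gradient and horizon can be recovered by least squares from tracked detections (the numbers below are invented):

```python
import numpy as np

# Illustrative observations: foot image row (pixels) and blob height (pixels).
positions = np.array([420.0, 380.0, 330.0, 290.0, 250.0])
heights   = np.array([150.0, 128.0, 101.0,  80.0,  59.0])

# Fit h = g * i + c, then the horizon is the row where the height vanishes.
g, c = np.polyfit(positions, heights, 1)
i_horizon = -c / g
print(f"gradient g = {g:.3f}, horizon at image row {i_horizon:.1f}")

# A detection at row i with height far above g * (i - i_horizon) is then a
# candidate large vehicle rather than a pedestrian.
```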

  14. Automatic Geometric Calibration (2/3) • Ground Plane Estimation • Use the pedestrian linear model to estimate ground plane. (Renno et al, BMVC2002)

  15. Automatic Geometric Calibration (3/3) • Scene Depth Map • Use the estimated depths of moving objects to determine the scene depth map (Renno et al, BMVC2004) [Figures: occlusion edges; depth map]

  16. Automatic Colour Calibration (1/3) • Variation of colour responses is significant! • A real-time colour constancy algorithm is required

  17. Automatic Colour Calibration (2/3) • Grey World and Gamut Mapping algorithms were tested. • Automatic method for selecting the reference frame. • Gamut Mapping performs better, but Grey World can operate in real time. (Renno et al, VS-PETS2005)
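
For reference, a minimal grey-world sketch (the generic algorithm, not the exact reference-frame method of Renno et al, VS-PETS2005): each channel is scaled so that its mean matches the overall grey level, which is cheap enough to run in real time.

```python
import numpy as np

def grey_world(image_bgr):
    """Scale each colour channel so its mean equals the overall grey level."""
    img = image_bgr.astype(np.float32)
    channel_means = img.reshape(-1, 3).mean(axis=0)      # per-channel means
    grey = channel_means.mean()                          # target grey level
    gains = grey / np.maximum(channel_means, 1e-6)       # per-channel gains
    return np.clip(img * gains, 0, 255).astype(np.uint8)
```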

  18. Automatic Colour Calibration (3/3) • Real Time Colour Constancy

  19. Visual Semantics (Makris et al, ECOVISION 2004) • Targets: pedestrians, cars, large vehicles • Actions: move, stop, enter/exit, accelerate, turn left/right • Static features: road/corridor, door/gate, ATM, desk, bus stop

  20. Visual Semantics • Object Classification (ongoing work) • Behaviour Analysis (ongoing work) • Semantic landscape • Label static scene by observing activity (Makris&Ellis, AVSS 2003)

  21. Reverse Engineering

  22. Entry/Exit Zones Detected by an EM-based algorithm
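
A hedged sketch of this step, assuming zones are learned from the first (entry) and last (exit) points of observed trajectories by fitting a Gaussian mixture with EM; scikit-learn is used for brevity, and the number of zones is fixed rather than selected automatically.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def learn_zones(trajectories, n_zones=4):
    """trajectories: list of (N_i, 2) arrays of image coordinates."""
    entries = np.array([t[0] for t in trajectories])     # first point of each track
    exits   = np.array([t[-1] for t in trajectories])    # last point of each track
    entry_gmm = GaussianMixture(n_components=n_zones).fit(entries)
    exit_gmm  = GaussianMixture(n_components=n_zones).fit(exits)
    return entry_gmm, exit_gmm   # component means/covariances describe the zones
```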

  23. Detected Routes

  24. Segmentation of Routes to Paths & Junctions

  25. Possible extensions • Use target labels • paths: traffic road or pavements • pedestrian crossing: junction of • pedestrian route • vehicle route • More complicated rules • bus stop • pedestrians stop • vehicle stop • pedestrians merge with vehicle
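
To make the flavour of such rules concrete, a hypothetical sketch in which each observed event is a simple record with a target, an action and a region; the logic is illustrative only, not the project's rule set.

```python
def label_region(events, region):
    """Assign a semantic label to a region from co-occurring events."""
    local = [e for e in events if e["region"] == region]
    ped_stop = any(e["target"] == "pedestrian" and e["action"] == "stop" for e in local)
    veh_stop = any(e["target"] == "vehicle" and e["action"] == "stop" for e in local)
    merge    = any(e["action"] == "merge" for e in local)
    if ped_stop and veh_stop and merge:
        return "bus stop"       # pedestrians and a vehicle stop, then merge
    return "unlabelled"
```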

  26. Data Hierarchy in Video Analysis: Pixels → Blobs → Trajectories → Actor / Scene / Action labels → Textual Summary

  27. Natural Language Processing (Surrey) Hypothesis: Experts are using a common language/keywords to describe crime scene/video evidence. • Visual Evidence Thesaurus • Data acquisition (workshops) • Data analysis • Automatic ontology extraction

  28. Video Annotation Workshops • Two workshops were organised and run to test the hypothesis and construct a domain thesaurus. • Different experts: • Police Forces (Surrey, West Yorkshire). • Forensic Services (London, Birmingham). • Private video evidence expert (Luton). • Several data collection tasks. • Purpose: gather knowledge and feedback from experts in order to understand the way videos are observed and perceived. • Task: validate the hypothesis and extract the common keywords used and the description pattern.

  29. Video Annotation Workshops Selected sample from the Workshop

  30. Video Evidence Thesaurus • Workshop Outputs: • Initial descriptions from experts, for analysis. • Useful feedback and comments. • Workshop Feedback: • Strong interest in the project. • Willing to help (within the legal limits).

  31. Analysis • Same video clip. • Description from 3 different people. • Pattern (Identify, Elaborate, Location)

  32. Video Evidence Thesaurus • Analysis: • Description from different people, for same video clips. • Pattern (Identify <I>, Elaborate <E>, Location <L>) . • Grammar: • <Description>: <I><E|L><L|E|{Φ}><Description|{Φ}>. • <I>: <Single|Group> • <Single> : <{Person}|{Male}|{Female}|….> • <Group> : <2|3|…..|n><{People}|Single>
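
As an illustration only, the top-level production can be checked over the sequence of category tags (I = Identify, E = Elaborate, L = Location) with a small pattern:

```python
import re

# <Description> ::= <I><E|L><L|E|empty><Description|empty>
DESCRIPTION = re.compile(r"^(I(E|L)(L|E)?)+$")

def is_valid(tags):
    return bool(DESCRIPTION.match("".join(tags)))

print(is_valid(["I", "E", "L"]))   # True, e.g. "Man / in blue t-shirt / by the door"
print(is_valid(["E", "I"]))        # False: a description must start with Identify
```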

  33. Thesaurus Construction Methodology • We adopt a text-driven and bottom-up method: starting from a collection of texts in a specialist domain, together with a representative general language corpus • Use a five-step algorithm for identifying discourse patterns with more or less unique meanings, without any overt access to an external knowledge base

  34. Thesaurus Construction Methodology • Select training corpora: CCTV-Related Corpus and a general language corpus. • Extract key words; • Extract key collocates; • Extract local grammar using collocation and relevance feedback; • Assert the grammar as a finite state automaton.

  35. Development of Visual Evidence Thesaurus • Once the single terms, especially the weird terms, are identified, we find candidate compound terms by computing collocation statistics between the single terms and other open-class words in the entire corpus.
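
A hedged sketch of this collocation step, using pointwise mutual information over a fixed word window as an illustrative association score; the statistic actually computed in the project may differ.

```python
import math
from collections import Counter

def collocates(tokens, term, window=5):
    """Rank words that co-occur with `term` within a +/- window of tokens."""
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            for other in tokens[max(0, i - window): i + window + 1]:
                if other != term:
                    pairs[other] += 1
    n = len(tokens)
    scores = {}
    for other, joint in pairs.items():
        # Pointwise mutual information of (term, other)
        scores[other] = math.log((joint / n) /
                                 ((unigrams[term] / n) * (unigrams[other] / n)))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```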

  36. Development of Visual Evidence Thesaurus • Collocates of the weird term EARPRINT, with collocation statistics

  37. Development of Visual Evidence Thesaurus • Collocates of the weird term EARPRINT IDENTIFICATION

  38. Development of Visual Evidence Thesaurus • An inheritance hierarchy of EARPRINT collocates exported to the knowledge representation system PROTEGE:

  39. Development of Visual Evidence Thesaurus • A multiple inheritance hierarchy of EARPRINT collocates, now exported to the knowledge representation workbench PROTEGE: Rubbish!

  40. Experiments and Evaluation • I. Select training corpora • Training corpus • The British National Corpus, comprising 100 million tokens distributed over 4,124 texts (Aston and Burnard 1998); • Crime Alerts Corpus (FBI Crime Alerts, Wanted by the Royal Canadian Mounted Police (RCMP) & Polizei Bayern, journal/conference papers), comprising 109 articles and containing 214,437 words

  41. Experiments and Evaluation • II. Extract key words • The frequencies of individual words in the Crime Alerts Corpus were computed using System Quirk;
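
One way to turn these frequencies into key words, in the spirit of the specialist-versus-general comparison described earlier, is a "weirdness" ratio: the word's relative frequency in the Crime Alerts Corpus divided by its relative frequency in the general corpus. A minimal sketch with toy counts and assumed add-one smoothing follows; the exact statistic computed by System Quirk may differ.

```python
from collections import Counter

def weirdness(word, domain_counts, general_counts, domain_total, general_total):
    """Relative frequency in the specialist corpus over that in the general corpus."""
    domain_rel = domain_counts[word] / domain_total
    general_rel = (general_counts[word] + 1) / general_total   # +1 avoids division by zero
    return domain_rel / general_rel

# Toy counts: a domain-specific term scores far above 1, a common word near 1.
domain = Counter({"earprint": 42, "the": 12000})
general = Counter({"earprint": 0, "the": 6000000})
print(weirdness("earprint", domain, general, 214_437, 100_000_000))
print(weirdness("the", domain, general, 214_437, 100_000_000))
```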

  42. Experiments and Evaluation

  43. Experiments and Evaluation

  44. Experiments and Evaluation • III. Extract key collocates

  45. Experiments and Evaluation • IV. Extract local grammar using collocation and relevance feedback

  46. Experiments and Evaluation • V. Assert the grammar as a finite state automaton • The (re-)collocation patterns can then be asserted as finite state automata, one for each of the movement verbs and spatial-preposition metaphors
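
A toy illustration of asserting one such pattern as an automaton; the verb and preposition sets and the "subject, movement verb, spatial preposition, location" pattern are invented for the example.

```python
MOVEMENT_VERBS = {"runs", "walks", "enters", "leaves"}
SPATIAL_PREPS = {"into", "from", "towards", "across"}

def accepts(tokens):
    """Accept token sequences of the form: subject, movement verb, preposition, location."""
    state = "START"
    for tok in tokens:
        if state == "START":
            state = "SUBJECT"                       # any token can be the subject
        elif state == "SUBJECT" and tok in MOVEMENT_VERBS:
            state = "VERB"
        elif state == "VERB" and tok in SPATIAL_PREPS:
            state = "PREP"
        elif state == "PREP":
            state = "LOCATION"                      # any token can be the location
        else:
            return False
    return state == "LOCATION"

print(accepts(["man", "runs", "towards", "exit"]))   # True
print(accepts(["man", "exit", "runs"]))              # False
```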

  47. Describing Videos • An experiment: • 16 videos in the CAVIAR data set were shown to 4 different surveillance experts; • The experts were asked to describe the videos in their own words – in English surveillance speak • Experts will describe videos in a succinct manner using terminology of their domain and framing the description in a ‘local grammar’ • The interviews were transcribed and sentences and phrases were marked up using a basic ontology: Action, Location, Result, Miscellaneous.

  48. Describing Videos One of our experts described the frame on the left as Man in blue t-shirt, centre of scene, facing camera, raises white card high above his head. Second man wearing a dark top with white stripes down the sleeve enters scene from above. Meets a third individual with a dark top and pale trousers and an altercation occurs in the centre of the open space. Individuals meet briefly and leave scene in opposite directions. The original person with the white sleeves -- white stripes on the sleeves leaves scene below camera. Second person in the altercation leaves scene by the red chairs. That was an assault, I’d say.

  49. Describing Videos One of our experts described the frame on the left as Event 1: Man in blue t-shirt, centre of scene, facing camera, raises white card high above his head. Event 1: Miscellaneous: Man in blue t-shirt, Location: centre of scene, Action: facing camera, Result: raises white card high above his head.

  50. Describing Videos One of our experts described the frame on the left as Event 1: M Man in blue t-shirt, L centre of scene, A facing camera, R raises white card high above his head. Event 2: A Second man wearing a dark top with white stripes down the sleeve enters scene L from above. A Meets a third individual with a dark top and pale trousers and an altercation occurs L in the centre of the open space. A Individuals meet briefly R and leave scene in opposite directions.
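
A small, hypothetical record type for the marked-up events, keeping phrases under the four ontology categories (Miscellaneous, Location, Action, Result) used in the transcripts:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EventAnnotation:
    event_id: int
    miscellaneous: List[str] = field(default_factory=list)
    location: List[str] = field(default_factory=list)
    action: List[str] = field(default_factory=list)
    result: List[str] = field(default_factory=list)

event1 = EventAnnotation(
    event_id=1,
    miscellaneous=["Man in blue t-shirt"],
    location=["centre of scene"],
    action=["facing camera"],
    result=["raises white card high above his head"],
)
```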
