
Visual Grouping and Object Recognition

This study focuses on visual grouping and object recognition, including finding boundaries, recognizing objects, and recognizing actions. It explores techniques such as contour detection, deformable templates, shape matching, and action recognition.

Presentation Transcript


  1. Visual Grouping and Object Recognition. Jitendra Malik, U.C. Berkeley, with S. Belongie, C. Fowlkes, T. Leung, D. Martin, G. Mori, J. Puzicha, J. Shi, X. Ren

  2. From images/video to objects. Labeled sets: tiger, grass, etc.

  3. Consistency (example segmentations A, B, C) • Perceptual organization forms a tree (node labels in the figure: Image; BG, L-bird, R-bird; bush, far, grass; and body, beak, eye, head for each bird) • Two segmentations are consistent when they can be explained by the same segmentation tree, i.e. they could be derived from a single perceptual organization • A, C are refinements of B • A, C are mutual refinements • A, B, C represent the same percept • Attention accounts for differences

  4. Outline • Finding boundaries • Recognizing objects • Recognizing actions

  5. Finding boundaries: Is texture a problem or a solution? (Figure panels: image, orientation energy)

  6. Statistically optimal contour detection • Use humans to segment a large collection of natural images. • Train a classifier for the contour/non-contour classification using orientation energy and texture gradient as features.

  7. Orientation Energy • Gaussian 2nd derivative and its Hilbert pair • Can detect combination of bar and edge features [Perona & Malik 90]
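The slide names the two filters but does not spell out how their responses are combined. As a sketch of the usual formulation, assuming $I$ is the image and $f^{e}_{\theta,\sigma}$, $f^{o}_{\theta,\sigma}$ (symbols chosen here, not taken from the slide) are the even filter (Gaussian second derivative) and the odd filter (its Hilbert pair) at orientation $\theta$ and scale $\sigma$:

```latex
OE_{\theta,\sigma} \;=\; \left(I * f^{e}_{\theta,\sigma}\right)^{2} + \left(I * f^{o}_{\theta,\sigma}\right)^{2}
```

The even filter responds most strongly to bar-like structure and the odd filter to step edges, so the sum of squared responses is large for either kind of feature, which is consistent with the slide's remark about detecting combinations of bars and edges.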

  8. Texture gradient = chi-square distance between texton histograms in half disks across an edge (Figure labels: i, j, k; chi-square values 0.1 and 0.8)
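A minimal sketch of the chi-square comparison named on this slide, assuming each half disk has already been summarized as a histogram of texton counts. The function names and the example counts are hypothetical and chosen only for illustration; the texton assignment itself is not shown.

```python
def normalize(counts):
    """Scale a list of counts so it sums to 1 (left unchanged if empty)."""
    total = float(sum(counts))
    return [c / total for c in counts] if total > 0 else list(counts)


def chi_square_distance(g, h):
    """Chi-square distance between two histograms with the same bins."""
    if len(g) != len(h):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for gk, hk in zip(g, h):
        denom = gk + hk
        if denom > 0:  # bins empty in both histograms contribute nothing
            total += (gk - hk) ** 2 / denom
    return 0.5 * total


# Hypothetical texton counts for the two halves of a disk centred on a pixel.
left_half = normalize([8, 1, 1, 0])
right_half = normalize([1, 1, 0, 8])

same_texture = chi_square_distance(left_half, left_half)   # no texture change
across_edge = chi_square_distance(left_half, right_half)   # strong texture change
```

With normalized histograms the distance lies between 0 (identical texture on both sides) and 1 (no shared textons), so a large value suggests a texture boundary passing through the disk centre.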

  9. ROC curve for local boundary detection

  10. Outline • Finding boundaries • Recognizing objects • Recognizing actions

  11. Biological Shape • D’Arcy Thompson: On Growth and Form, 1917 • studied transformations between shapes of organisms

  12. Deformable Templates: Related Work • Fischler & Elschlager (1973) • Grenander et al. (1991) • von der Malsburg (1993)

  13. Matching Framework (figure labels: model, target) • Find correspondences between points on shape • Fast pruning • Estimate transformation & measure similarity

  14. Comparing Pointsets

  15. Shape Context • Count the number of points inside each bin, e.g. Count = 4 ... Count = 10 • Compact representation of the distribution of points relative to each point
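The bin counting described on this slide can be sketched as a log-polar histogram built around one reference point. The number of bins, the radius limits, and the normalization by mean pairwise distance are assumptions made for this illustration; the slide does not state them.

```python
import math


def mean_pairwise_distance(points):
    """Average distance over all unordered point pairs (used as the scale)."""
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.hypot(points[i][0] - points[j][0],
                                points[i][1] - points[j][1])
    return total / (n * (n - 1) / 2)


def shape_context(points, index, n_radial=5, n_angular=12,
                  r_min=0.125, r_max=2.0):
    """Log-polar histogram of all other points as seen from points[index]."""
    scale = mean_pairwise_distance(points)
    px, py = points[index]
    hist = [0] * (n_radial * n_angular)
    log_span = math.log(r_max) - math.log(r_min)
    for k, (x, y) in enumerate(points):
        if k == index:
            continue
        dx, dy = x - px, y - py
        r = math.hypot(dx, dy) / scale
        if r == 0 or r >= r_max:
            continue  # coincident or too far away: not counted
        # Log-spaced radial bin; anything closer than r_min lands in bin 0.
        r_bin = int(n_radial * (math.log(r) - math.log(r_min)) / log_span)
        r_bin = min(n_radial - 1, max(0, r_bin))
        theta = math.atan2(dy, dx) % (2 * math.pi)
        a_bin = int(theta / (2 * math.pi / n_angular)) % n_angular
        hist[r_bin * n_angular + a_bin] += 1
    return hist


# Hypothetical point set sampled from a shape outline.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]
hist = shape_context(points, 0)
shifted = [(x + 10, y + 20) for x, y in points]
hist_shifted = shape_context(shifted, 0)
```

Because only offsets from the reference point are binned, translating the whole shape leaves the histogram unchanged, and dividing radii by the mean pairwise distance is one way to make it insensitive to uniform scaling.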

  16. Shape Context

  17. Comparing Shape Contexts • Compute matching costs using the chi-squared distance • Recover correspondences by solving the linear assignment problem with costs C_ij [Jonker & Volgenant 1987]
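A small sketch of the two steps on this slide: build the cost matrix C_ij from chi-squared distances, then pick the one-to-one correspondence with the lowest total cost. The brute-force search over permutations below is a stand-in chosen for clarity and is only usable for a handful of points; it is not the Jonker & Volgenant algorithm cited on the slide. The histograms are hypothetical.

```python
from itertools import permutations


def chi_square_cost(g, h):
    """Chi-squared distance between two histograms with the same bins."""
    total = 0.0
    for gk, hk in zip(g, h):
        if gk + hk > 0:
            total += (gk - hk) ** 2 / (gk + hk)
    return 0.5 * total


def cost_matrix(model_hists, target_hists):
    """C[i][j] = cost of matching model point i to target point j."""
    return [[chi_square_cost(g, h) for h in target_hists] for g in model_hists]


def solve_assignment_brute_force(C):
    """Return (assignment, total_cost) minimizing sum of C[i][assignment[i]].

    Exhaustive search: illustration only, factorial time in the number of points.
    """
    n = len(C)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(C[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost


# Hypothetical 3-bin shape contexts for three model and three target points.
model_hists = [[4, 0, 0], [0, 4, 0], [0, 0, 4]]
target_hists = [[0, 3, 1], [0, 0, 4], [4, 0, 0]]

C = cost_matrix(model_hists, target_hists)
assignment, total = solve_assignment_brute_force(C)
```

Here `assignment[i]` is the index of the target point matched to model point `i`. For realistic point counts a polynomial-time assignment solver would replace the exhaustive search.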

  18. Matching Framework (figure labels: model, target) • Find correspondences between points on shape • Fast pruning • Estimate transformation & measure similarity

  19. Fast pruning • Find best match for the shape context at only a few random points and add up cost
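One way to read this slide is as a cheap shortlist step run before full matching: sample a few points on the query shape, find the best-matching shape context on each stored model, and sum those best costs. The sketch below follows that reading; the function names, the sample size, and the example histograms are all assumptions for illustration.

```python
import random


def chi_square_cost(g, h):
    """Chi-squared distance between two histograms with the same bins."""
    total = 0.0
    for gk, hk in zip(g, h):
        if gk + hk > 0:
            total += (gk - hk) ** 2 / (gk + hk)
    return 0.5 * total


def pruning_cost(query_hists, model_hists, n_samples=3, rng=None):
    """Sum of best-match costs for a few randomly chosen query points."""
    if rng is None:
        rng = random.Random(0)
    k = min(n_samples, len(query_hists))
    sampled = rng.sample(range(len(query_hists)), k)
    return sum(min(chi_square_cost(query_hists[i], h) for h in model_hists)
               for i in sampled)


def shortlist(query_hists, models, keep=1, n_samples=3, seed=0):
    """Rank stored models by pruning cost and keep the cheapest few."""
    scored = []
    for name, hists in models.items():
        # Same seed for every model so all are scored on the same query points.
        cost = pruning_cost(query_hists, hists, n_samples, random.Random(seed))
        scored.append((cost, name))
    scored.sort()
    return [name for _, name in scored[:keep]]


# Hypothetical shape contexts: model "a" contains the query's histograms, "b" does not.
query = [[4, 0, 0], [0, 4, 0], [0, 0, 4], [2, 2, 0]]
models = {
    "a": [[0, 0, 4], [2, 2, 0], [4, 0, 0], [0, 4, 0]],
    "b": [[1, 1, 2], [1, 2, 1], [2, 1, 1], [1, 1, 2]],
}
best = shortlist(query, models, keep=1)
```

Only the models that survive this step would then go through the full correspondence and transformation estimation.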

  20. Matching Framework (figure labels: model, target) • Find correspondences between points on shape • Fast pruning • Estimate transformation & measure similarity

  21. Thin Plate Spline Model • 2D counterpart to cubic spline • Minimizes bending energy • Solve by inverting linear system • Can be regularized when data is inexact • References: Duchon (1977), Meinguet (1979), Wahba (1991)
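The slide's equations are not present in this transcript. For orientation, the standard thin plate spline formulation is sketched below; the symbols ($f$, $U$, $w_i$, $a$, $K$, $P$, $v$, $\lambda$) are the conventional ones and are assumed here rather than taken from the slide. Given control points $(x_i, y_i)$ with target values $v_i$, the interpolant is:

```latex
f(x, y) = a_1 + a_x x + a_y y + \sum_{i=1}^{n} w_i \, U\!\left(\lVert (x_i, y_i) - (x, y) \rVert\right),
\qquad U(r) = r^{2} \log r^{2}

I_f = \iint_{\mathbb{R}^2} \left( \frac{\partial^2 f}{\partial x^2} \right)^{2}
      + 2 \left( \frac{\partial^2 f}{\partial x \, \partial y} \right)^{2}
      + \left( \frac{\partial^2 f}{\partial y^2} \right)^{2} \, dx \, dy

\begin{pmatrix} K & P \\ P^{\top} & 0 \end{pmatrix}
\begin{pmatrix} w \\ a \end{pmatrix}
=
\begin{pmatrix} v \\ 0 \end{pmatrix},
\qquad K_{ij} = U\!\left(\lVert (x_i, y_i) - (x_j, y_j) \rVert\right),
\quad P_{i\cdot} = (1, \; x_i, \; y_i)
```

The first line is the interpolant, the second the bending energy it minimizes, and the third the linear system whose solution gives the weights $w$ and affine coefficients $a$. When the correspondences are noisy, the usual regularized variant replaces $K$ with $K + \lambda I$, trading exact interpolation for a smoother map. A 2D warp uses two such functions, one per output coordinate.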

  22. Matching Example (figure labels: model, target)

  23. Outlier Test Example

  24. Object Recognition Experiments • Handwritten digits • COIL 3D objects (Nayar-Murase) • Human body configurations • Trademarks

  25. Terms in Similarity Score • Shape Context difference • Local image appearance difference: orientation, gray-level correlation in Gaussian window, … (many more possible) • Bending energy

  26. Handwritten Digit Recognition (error rates, grouped by training set size) • MNIST 600 000 (distortions): LeNet 5 0.8%, SVM 0.8%, Boosted LeNet 4 0.7% • MNIST 60 000: linear 12.0%, 40 PCA + quad 3.3%, 1000 RBF + linear 3.6%, K-NN 5%, K-NN (deskewed) 2.4%, K-NN (tangent dist.) 1.1%, SVM 1.1%, LeNet 5 0.95% • MNIST 20 000: K-NN with Shape Context matching 0.63%

  27. COIL Object Database

  28. Prototypes Selected for 2 Categories. Details in Belongie, Malik & Puzicha (NIPS 2000)

  29. Error vs. Number of Views

  30. Human body configurations

  31. Deformable Matching • Kinematic chain-based deformation model • Use iterations of correspondence and deformation • Keypoints on exemplars are deformed to locations on query image

  32. Results

  33. Trademark Similarity

  34. Recognizing objects in scenes

  35. Outline • Finding boundaries • Recognizing objects • Recognizing actions

  36. Examples of Actions • Movement and posture change: run, walk, crawl, jump, hop, swim, skate, sit, stand, kneel, lie, dance (various), … • Object manipulation: pick, carry, hold, lift, throw, catch, push, pull, write, type, touch, hit, press, stroke, shake, stir, turn, eat, drink, cut, stab, kick, point, drive, bike, insert, extract, juggle, play musical instrument (various), … • Conversational gesture: point, … • Sign language

  37. Key cues for action recognition • “Morpho-kinesics” of action (shape and movement of the body) • Identity of the object/s • Activity context

  38. Image/Video → Stick figure → Action • Stick figures can be specified in a variety of ways or at various resolutions (degrees of freedom): 2D joint positions, 3D joint positions, joint angles • Complete representation • Evidence that it is effectively computable

  39. Tracking by Repeated Finding

  40. Achievable goals in 3 years • Reasonable competence at object recognition at crude category level (~1000) • Detection/Tracking of humans as kinematic chains, assuming adequate resolution. • Recognition of ~10-100 actions and compositions thereof.
