
16-824: Learning-based Methods in Vision


Presentation Transcript


  1. 16-824: Learning-based Methods in Vision • Instructors: • Alexei (Alyosha) Efros efros@cs.cmu.edu, 225 Smith Hall • Leon Sigal lsigal@disneyresearch.com, Disney Research Pittsburgh • Web Page: • http://www.cs.cmu.edu/~efros/courses/LBMV12/

  2. Today • Introduction • Why This Course? • Administrative stuff • Overview of the course

  3. A bit about Us • Alexei (Alyosha) Efros • Ph.D. 2003, UC Berkeley (signed by Arnie!) • Postdoctoral Fellow, University of Oxford, ’03-’04 • Research interests: • Vision, Graphics, Data-driven “stuff” • Leonid Sigal • Ph.D. 2007, Brown University • Postdoctoral Fellow, University of Toronto, ’07-’09 • Research interests: • Vision, Graphics, Machine Learning

  4. Why this class? • The Old Days™: • 1. Graduate Computer Vision • 2. Advanced Machine Perception

  5. Why this class? • The New and Improved Days: • 1. Graduate Computer Vision • 2. Advanced Machine Perception • Physics-based Methods in Vision • Geometry-based Methods in Vision • Learning-based Methods in Vision

  6. The Hip & Trendy Learning • Describing Visual Scenes using Transformed Dirichlet Processes. E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. NIPS, Dec. 2005.

  7. Learning as Last Resort

  8. Learning as Last Resort • EXAMPLE: • Recovering 3D geometry from single 2D projection • Infinite number of possible solutions! from [Sinha and Adelson 1993]
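
A minimal numerical sketch of why this is ill-posed (my own illustration, not from the slides; the focal length and points are made up): under a pinhole camera, every 3D point along a viewing ray projects to the same image location, so a single image cannot determine depth without extra assumptions or data.

```python
# Hypothetical illustration (not from the lecture): two distinct 3D points on
# the same viewing ray produce identical pinhole projections, so one 2D image
# cannot determine depth by itself.
import numpy as np

f = 1.0  # assumed focal length

def project(point, f=f):
    """Project a 3D point (X, Y, Z) to image coordinates (f*X/Z, f*Y/Z)."""
    X, Y, Z = point
    return np.array([f * X / Z, f * Y / Z])

P1 = np.array([1.0, 2.0, 4.0])
P2 = 3.0 * P1  # a different point, three times farther along the same ray

print(project(P1))  # -> [0.25 0.5]
print(project(P2))  # -> [0.25 0.5]  identical: infinitely many scenes fit the image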

  9. Learning-based Methods in Vision • This class is about trying to solve problems that do not have a solution! • Don’t tell your mathematician friends! • This will be done using Data: • E.g., what happened before is likely to happen again • Google Intelligence (GI): The AI for the post-modern world! • Note: this is not quite statistics • Why is this even worthwhile? • Even a decade ago at ICCV99 Faugeras claimed it wasn’t!

  10. The Vision Story Begins… • “What does it mean, to see? The plain man's answer (and Aristotle's, too) would be, to know what is where by looking.” • -- David Marr, Vision (1982)

  11. Vision: a split personality • “What does it mean, to see? The plain man's answer (and Aristotle's, too) would be, to know what is where by looking. In other words, vision is the process of discovering from images what is present in the world, and where it is.” • Answer #1: pixel of brightness 243 at position (124,54) …and depth 0.7 meters (from a depth map) • Answer #2: looks like the bottom edge of a whiteboard showing at the top of the image • Which do we want? • Is the difference just a matter of scale?

  12. Measurement vs. Perception

  13. Brightness: Measurement vs. Perception

  14. Brightness: Measurement vs. Perception Proof!

  15. Lengths: Measurement vs. Perception • Müller-Lyer Illusion: http://www.michaelbach.de/ot/sze_muelue/index.html

  16. Vision as Measurement Device • Real-time stereo on Mars • Physics-based Vision • Virtualized Reality • Structure from Motion

  17. …but why do Learning for Vision? • “What if I don’t care about this wishy-washy human perception stuff? I just want to make my robot go!” • Small Reason: • For measurement, other sensors are often better (in DARPA Grand Challenge, vision was barely used!) • For navigation, you still need to learn! • Big Reason: • The goals of computer vision (what + where) are in terms of what humans care about.

  18. So what do humans care about? slide by Fei Fei, Fergus & Torralba

  19. Verification: is that a bus? slide by Fei Fei, Fergus & Torralba

  20. Detection: are there cars? slide by Fei Fei, Fergus & Torralba

  21. Identification: is that a picture of Mao? slide by Fei Fei, Fergus & Torralba

  22. Object categorization • Example labels in one image: sky, building, flag, face, banner, wall, street lamp, bus, bus, cars • slide by Fei-Fei, Fergus & Torralba

  23. Scene and context categorization • outdoor • city • traffic • … slide by Fei Fei, Fergus & Torralba

  24. Rough 3D layout, depth ordering slide by Fei Fei, Fergus & Torralba

  25. Challenges 1: viewpoint variation slide by Fei-Fei, Fergus & Torralba Michelangelo 1475-1564

  26. Challenges 2: illumination slide credit: S. Ullman

  27. Challenges 3: occlusion slide by Fei Fei, Fergus & Torralba Magritte, 1957

  28. Challenges 4: scale slide by Fei Fei, Fergus & Torralba

  29. Challenges 5: deformation slide by Fei Fei, Fergus & Torralba Xu, Beihong 1943

  30. Challenges 6: background clutter slide by Fei Fei, Fergus & Torralba Klimt, 1913

  31. Challenges 7: object intra-class variation slide by Fei-Fei, Fergus & Torralba

  32. Challenges 8: local ambiguity slide by Fei-Fei, Fergus & Torralba

  33. Challenges 9: the world behind the image

  34. In this course, we will: Take a few baby steps…

  35. Role of Learning • Data → Features → Learning Algorithm

  36. Role of Learning • Data → Features → Algorithm (Shashua)
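
As a concrete, hypothetical illustration of the Data → Features → Learning Algorithm pipeline, here is a minimal sketch assuming off-the-shelf pieces: scikit-learn's small digits images as the data, HOG descriptors as the hand-designed features, and a linear SVM as the learning algorithm. These particular choices are illustrative assumptions, not what the course prescribes.

```python
# Minimal sketch of the Data -> Features -> Learning Algorithm pipeline.
# Assumed for illustration: scikit-learn's 8x8 digits as "data", HOG as the
# "features", and a linear SVM as the "learning algorithm".
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from skimage.feature import hog

# Data: a set of labeled grayscale images
digits = load_digits()
images, labels = digits.images, digits.target

# Features: a fixed, hand-designed descriptor computed from each image
features = np.array([
    hog(im, pixels_per_cell=(4, 4), cells_per_block=(1, 1)) for im in images
])

# Learning algorithm: a classifier fit to (feature, label) pairs
X_train, X_test, y_train, y_test = train_test_split(features, labels, random_state=0)
classifier = LinearSVC().fit(X_train, y_train)
print("held-out accuracy:", classifier.score(X_test, y_test))
```

Swapping any one box (different data, learned features instead of HOG, or a different classifier) changes the system without changing its shape, which is the point of the slide's diagram.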

  37. Course Outline • Overview of Learning for Vision (1 lecture) • Overview of Data for Vision (1 lecture) • Features • Human Perception and visual neuroscience • Theories of Human Vision • Low-level Vision • Filters, edge detection, interest points, etc. • Mid-level Vision • Segmentation, Occlusions, 2-1/2D, scene layout, etc. • High-Level Vision • Object recognition • Scene Understanding • Action / Motion Understanding • Etc.

  38. Goals • Read some interesting papers together • Learn something new: both you and us! • Get up to speed on a big chunk of vision research • understand 70% of CVPR papers! • Use learning-based vision in your own work • Learn how to speak • Learn how to think critically about papers • Participate in an exciting meta-study!

  39. Course Organization • Requirements: • Class Participation (33%) • Keep an annotated bibliography • Post on the Class Blog before each class • Ask questions / debate / fight / be involved! • Two Projects (66%) • Deconstruction Project • Implement and evaluate a paper and present it in class • Must talk to us AT LEAST 2 weeks beforehand! • Can be done in groups of 2 (but must do 2 projects) • Synthesis Project • Do something worthwhile with what you learned from the Deconstruction Project • Can be done in groups of 2 (1 project)

  40. Class Participation • Keep an annotated bibliography of papers you read (always a good idea!). The format is up to you. At a minimum, it needs to have: • Summary of key points • A few interesting insights, “aha moments”, keen observations, etc. • Weaknesses of the approach. Unanswered questions. Areas of further investigation or improvement. • Before each class: • Submit your summary for the current paper(s) in hard copy (printout/xerox) • Submit a comment on the Class Blog • ask a question, answer a question, post your thoughts, praise, criticism, start a discussion, etc.

  41. Deconstruction Project • Pick a paper / set of papers from the list • Understand it as if you were the author • Re-implement it • If there is code, understand the code completely • Run it on the same data (you can contact authors for data, and sometimes even code) • Understand it better than the author • Run it on two other datasets (e.g., the LabelMe dataset, a Flickr dataset, etc.) • Run it with two other feature representations • Run it with two other learning algorithms • Maybe suggest directions for improvement • Prepare an amazing 45-minute presentation • Discuss with me twice: once when you start the project, and once 3 days before the presentation

  42. Synthesis Project • Hopefully can grow out of the deconstruction project • 2 people can work on one

  43. End of Semester Awards • We will vote for: • Best Deconstruction Project • Best Synthesis Project • Best Blog Comment  • Prize: dinner in a French restaurant in Paris (transportation not included!) or some other worthy prizes
