
Cognitive Computer Vision



  1. Cognitive Computer Vision Kingsley Sage khs20@sussex.ac.uk and Hilary Buxton hilaryb@sussex.ac.uk Prepared under ECVision Specific Action 8-3 http://www.ecvision.org

  2. Lecture 13 • Learning Bayesian Belief Networks • Taxonomy of methods • Learning BBNs for the fully observable data and known structure case

  3. So why are BBNs relevant to Cognitive CV? • Provides a well-founded methodology for reasoning with uncertainty • These methods are the basis for our model of perception guided by expectation • We can develop well-founded methods of learning rather than just being stuck with hand-coded models

  4. Reminder: What is a BBN? [Figure: example Directed Acyclic Graph over nodes A, B, C, N and O] • A compact representation of the joint probability • Each variable is represented as a node • Conditional independence assumptions are encoded using a set of arcs • Different types of graph exist; the one shown is a Directed Acyclic Graph (DAG)
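
As a reminder of what the DAG encodes (a standard identity, not specific to the example figure): for variables X1, …, Xn the joint probability factorises as

P(X1, …, Xn) = ∏_i P(Xi | Pa(Xi))

where Pa(Xi) denotes the parents of Xi in the graph.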

  5. Why is learning important in the context of BBNs? • Knowledge acquisition can be an expensive process • Experts may not be readily available (scarce knowledge) or simply not exist • But you might have a lot of data from (say) case studies • Learning allows us to construct BBN models from the data and in the process gain insight into the nature of the problem domain

  6. The process of learning [Diagram: data (which may be full or partial) and the model structure (if known) feed into the learning process]

  7. What do we mean by “partial” data? • Training data where there are missing values, e.g. for a discrete-valued BBN with 3 nodes (A, B and O) [Figure: example training table with missing entries]
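
A minimal illustration of partial data (these values are made up, not taken from the original slide); each row is one training case and “?” marks a missing value:

  Case   A   B   O
  1      T   F   T
  2      T   ?   F
  3      ?   T   T
  4      F   F   ?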

  8. What do we mean by “known” and “unknown” structure? [Figure: two graphs over the nodes A, O and B — one with the arcs given (known structure), one with the arcs to be determined (unknown structure)]

  9. Taxonomy of learning methods • Methods are classified along two axes: observability of the training data (full vs partial) and model structure (known vs unknown) • In this lecture we will look at the full observability and known model structure case in detail • In the next lecture we will take an overview of the other three cases

  10. Full observability & known structure: getting the notation right • The model parameters (the CPDs) are represented as Θ (example later) • The training data set is D • We want to find the parameters Θ that maximise P(Θ|D) • The likelihood function L(Θ:D) is P(D|Θ)
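
By Bayes’ rule (a standard identity, added here for clarity) the posterior over the parameters and the likelihood are related by

P(Θ|D) ∝ P(D|Θ) · P(Θ) = L(Θ:D) · P(Θ)

so under a uniform prior P(Θ), maximising the posterior is the same as maximising the likelihood. This is the ML vs MAP distinction picked up in slides 15 and 16.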

  11. Full observability & known structure: getting the notation right [Figure: example training data set D as a table of observed values for the nodes A, O and B]

  12. Factorising the likelihood expression [Figure: the three-node network A, O, B used in the worked example]
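
A sketch of the factorisation for the three-node example (the arc directions in the original figure are not recoverable here, so assume A and B are the parents of O): with fully observed cases x[1], …, x[M],

L(Θ:D) = ∏_m P(a[m], b[m], o[m] : Θ) = ∏_m P(a[m] : Θ_A) · P(b[m] : Θ_B) · P(o[m] | a[m], b[m] : Θ_O)

i.e. the likelihood splits into one factor per node.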

  13. Decomposition in general • The likelihood decomposes into one independent term per node: L(Θ:D) = ∏_i L_i(Θ_i : D) • So all the parameters for each node can be estimated separately
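
A minimal Python sketch of what this decomposition buys us, assuming fully observed data and an illustrative structure A → O ← B (the data values, variable names and structure below are hypothetical, not from the slides): each node’s CPT is estimated from counts over that node and its parents, independently of every other node.

```python
from collections import Counter

# Fully observed training cases for a 3-node BBN (hypothetical data).
data = [
    {"A": True,  "B": False, "O": True},
    {"A": True,  "B": True,  "O": True},
    {"A": False, "B": True,  "O": False},
    {"A": True,  "B": False, "O": True},
]

# Known structure: the parents of each node (A -> O <- B, assumed for illustration).
parents = {"A": [], "B": [], "O": ["A", "B"]}

def estimate_cpts(data, parents):
    """Maximum-likelihood CPTs: each node is estimated from counts over
    (parent values, node value), independently of every other node."""
    cpts = {}
    for node, pa in parents.items():
        joint = Counter()          # counts of (parent values, node value)
        parent_totals = Counter()  # counts of parent values alone
        for case in data:
            pa_vals = tuple(case[p] for p in pa)
            joint[(pa_vals, case[node])] += 1
            parent_totals[pa_vals] += 1
        cpts[node] = {key: count / parent_totals[key[0]]
                      for key, count in joint.items()}
    return cpts

print(estimate_cpts(data, parents))
```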

  14. Example: estimating the parameter for a root node • Let’s say our training data D contains these values for A: {T,F,T,T,F,T,T,T} • We represent our single parameter θ as the probability that a=T • The likelihood for the sequence (6 occurrences of T and 2 of F) is: L(θ:D) = P(a[1],…,a[M] | θ) = θ^6 · (1−θ)^2

  15. So what about the prior on θ? • We have an expression for P(a[1],…,a[M] | θ); all we need to do now is say something about P(θ) • If all values of θ were equally likely at the outset, then the value maximising P(θ | a[1],…,a[M]) is the MAXIMUM LIKELIHOOD ESTIMATE (MLE), which for our example is θ = 6/8 = 0.75, i.e. P(a=T) = 0.75

  16. So what about the prior on θ? • If P(θ) is not uniform, we need to take it into account when computing our estimate of the model parameter • In that case the maximiser of P(θ | x[1],…,x[M]) is a MAXIMUM A POSTERIORI (MAP) estimate • There are many different forms of prior; one of the more common ones in this application is the DIRICHLET prior …

  17. Dirichlet(T,F) p()  The Dirichlet prior

  18. Semantic priors • If the training data D is sorted into known classes, the priors can be estimated beforehand. These are called “semantic priors” • This involves an element of hand coding and loses the advantage of gaining some insight into the problem domain • It does give the advantage of mapping onto expert knowledge of the classes in the problem

  19. Summary • Estimation relies on sufficient statistics • For the ML estimate for discrete-valued nodes, the sufficient statistics are the counts #, e.g. θ_a=T = #(a=T) / (#(a=T) + #(a=F)) • For the MAP estimate, we also have to account for the prior

  20. Next time … • Overview of methods for learning BBNs: • Full data and unknown structure • Partial data and known structure • Partial data and unknown structure • Excellent tutorial by Koller and Friedman at www.cs.huji.ac.il/~nir/Nips01-Tutorial/ • Some of today’s slides were adapted from that tutorial
