
Bayesian Hierarchical Model for Learning Natural Scene Categories

This presentation discusses a Bayesian hierarchical model for learning and recognizing natural scene categories without first extracting objects. It also covers techniques from statistical text modeling, such as pLSA and LDA, and their use for scene classification and object discovery.

cgallego

Presentation Transcript


  1. A Bayesian Hierarchical Model for Learning Natural Scene Categories. L. Fei-Fei and P. Perona. CVPR 2005. • Discovering objects and their location in images. J. Sivic, B. Russell, A. Efros, A. Zisserman and B. Freeman. ICCV 2005. • Tomasz Malisiewicz (tomasz@cmu.edu), Advanced Machine Perception, February 2006

  2. Graphical Models: Recent Trend in Machine Learning. Describing Visual Scenes using Transformed Dirichlet Processes. E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. NIPS, Dec. 2005.

  3. Outline • Goals of both vision papers • Techniques from statistical text modeling - pLSA vs LDA • Scene Classification via LDA • Object Discovery via pLSA

  4. Goal: Learn and Recognize Natural Scene Categories • Classify a scene without first extracting objects • Other techniques we know of: global frequency features (Oliva and Torralba); texton histograms (Renninger and Malik)

  5. Goal: Discover Object Categories • Discover what objects are present in a collection of images in an unsupervised way • Find those same objects in novel images • Determine what local image features correspond to what objects; segmenting the image

  6. Enter the world of Statistical Text Modeling • D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003. • Bag-of-words approaches: the order of words in a document can be neglected • Graphical Model Fun

  7. Bag-of-words • A document is a collection of M words • A corpus (collection of documents) is summarized in a term-document matrix
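The bag-of-words summary can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenization and a hypothetical toy corpus; it follows the N×M orientation used later in the deck (rows are documents, columns are vocabulary words).

```python
from collections import Counter

import numpy as np

def term_document_matrix(docs):
    """Build an N x M count matrix: rows are documents, columns are vocabulary words.

    Word order is discarded (bag-of-words); only per-document counts survive.
    """
    vocab = sorted({w for doc in docs for w in doc.split()})
    index = {w: j for j, w in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)), dtype=int)
    for i, doc in enumerate(docs):
        for w, c in Counter(doc.split()).items():
            counts[i, index[w]] = c
    return vocab, counts

# Hypothetical toy corpus: the first two documents differ only in word order.
docs = ["the cat sat on the mat", "on the mat the cat sat", "dogs chase cats"]
vocab, counts = term_document_matrix(docs)
print(vocab)  # ['cat', 'cats', 'chase', 'dogs', 'mat', 'on', 'sat', 'the']
print(counts)
```

The first two rows come out identical, which is exactly the information that a bag-of-words model throws away.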

  8. Object Bag of ‘words’

  9. 1990: Latent Semantic Analysis (LSA) • Goal: map high-dimensional count vectors to a lower-dimensional representation to reveal semantic relations between words • The lower-dimensional space is called the latent semantic space • Dim(latent space) = K

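  10. 1990: Latent Semantic Analysis (LSA) • D = {d1,…,dN}: N documents • W = {w1,…,wM}: M words • Nij = #(di, wj): the N×M term-document co-occurrence matrix • [figure: $\mathbf{N} \approx U \Sigma V^{\top}$, with $\mathbf{N}$ N×M (documents × words), $U$ N×K (documents × topics), $\Sigma$ K×K (topics × topics), $V^{\top}$ K×M (topics × words)]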

  11. What did we just do? Singular Value Decomposition: $\mathbf{N} \approx U \Sigma V^{\top}$ (N×M ≈ N×K · K×K · K×M; documents × topics, topics × topics, topics × words)

  12. LSA summary • SVD on the term-document matrix • Approximate N by setting all but the largest K singular values to zero • This produces the rank-K approximation to N that is optimal in the L2-matrix (spectral) or Frobenius norm sense
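The LSA recipe above can be sketched with NumPy's SVD. This is a minimal sketch on synthetic Poisson counts (an assumption standing in for a real term-document matrix); the function name `lsa` and the choice to scale latent coordinates by the singular values are illustrative, not taken from the slides.

```python
import numpy as np

def lsa(counts, k):
    """Rank-k LSA approximation of a document-by-word count matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(counts, full_matrices=False)  # s is sorted in descending order
    s_k = s.copy()
    s_k[k:] = 0.0                          # keep only the k largest singular values
    approx = u @ np.diag(s_k) @ vt         # rank-k reconstruction of the counts
    doc_coords = u[:, :k] * s[:k]          # documents in the k-dim latent space
    word_coords = vt[:k].T * s[:k]         # words in the k-dim latent space
    return approx, s, doc_coords, word_coords

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(6, 10)).astype(float)  # synthetic counts, 6 docs x 10 words
approx, s, docs_k, words_k = lsa(counts, k=2)
# Eckart-Young: the Frobenius error equals the norm of the discarded singular values.
err = np.linalg.norm(counts - approx, "fro")
print(err, np.sqrt(np.sum(s[2:] ** 2)))
```

The two printed numbers agree, which is the "optimal rank-K approximation" claim made concrete; note that nothing in the reconstruction is constrained to be a non-negative count or a probability, which is the first complaint on the next slides.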

  13. LSA and Polysemy • Polysemy: the ambiguity of an individual word or phrase that can be used (in different contexts) to express two or more different meanings • Under the LSA model, the coordinates of a word in latent space can be written as a linear superposition of the coordinates of the documents that contain the word • According to this superposition principle, LSA is unable to capture multiple senses of a word

  14. Problems with LSA • LSA does not define a properly normalized probability distribution • No obvious interpretation of the directions in the latent space • From statistics, the utilization of L2 norm in LSA corresponds to a Gaussian Error assumption which is hard to justify in the context of count variables • Polysemy problem

  15. pLSA to the rescue • Probabilistic Latent Semantic Analysis • pLSA relies on the likelihood function of multinomial sampling and aims at an explicit maximization of the predictive power of the model

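  16. pLSA to the rescue: Decomposition into Probabilities! $P(w_i \mid d_j) = \sum_{k=1}^{K} P(w_i \mid z_k)\,P(z_k \mid d_j)$: the observed word distributions per document are decomposed into word distributions per topic, $P(w \mid z)$, and topic distributions per document, $P(z \mid d)$. Slide credit: Josef Sivic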

  17. Learning the pLSA parameters • Maximize the likelihood of the data using EM: $\mathcal{L} = \prod_{i=1}^{M} \prod_{j=1}^{N} P(w_i \mid d_j)^{n(w_i, d_j)}$, where $n(w_i, d_j)$ is the observed count of word i in document j • Equivalently, minimize the KL divergence between the empirical distribution and the model • Unlike LSA, pLSA does not minimize any type of ‘squared deviation’; the parameters are estimated in a probabilistically sound way. Slide credit: Josef Sivic

  18. EM for pLSA (training on a corpus) • E-step: compute posterior probabilities for the latent variables • M-step: maximize the expected complete data log-likelihood
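The two EM steps can be sketched directly in NumPy. This is a minimal dense implementation, assuming a small document-by-word count matrix (the D×K×W posterior array would not scale to a real corpus); `plsa_em`, the random initialization, and the small `eps` guard are illustrative choices, not the authors' code.

```python
import numpy as np

def plsa_em(n, k, iters=100, seed=0, eps=1e-12):
    """Fit pLSA by EM on a document-by-word count matrix n (shape D x W).

    Returns P(w|z) as a K x W array, P(z|d) as a D x K array, and the
    log-likelihood after every iteration.
    """
    rng = np.random.default_rng(seed)
    num_docs, num_words = n.shape
    # Random normalized initialization of both conditional tables.
    p_w_z = rng.random((k, num_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((num_docs, k))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    log_lik = []
    for _ in range(iters):
        # E-step: posterior P(z | d, w) for every (d, w) pair, shape D x K x W.
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both tables from expected counts n(d, w) P(z | d, w).
        expected = n[:, None, :] * post
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
        # Log-likelihood sum_{d,w} n(d, w) log P(w | d) under the updated parameters.
        p_w_d = p_z_d @ p_w_z
        log_lik.append(float(np.sum(n * np.log(p_w_d + eps))))
    return p_w_z, p_z_d, log_lik

counts = np.random.default_rng(1).poisson(3.0, size=(8, 12)).astype(float)  # synthetic data
p_w_z, p_z_d, ll = plsa_em(counts, k=3, iters=50)
```

EM guarantees that `ll` never decreases, which is a handy sanity check when debugging an implementation like this one.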

  19. Graphical View of pLSA [graphical model: d → z → w, with plates] • pLSA is a generative model • Select a document d_i with probability P(d_i) • Pick a latent class z_k with probability P(z_k|d_i) • Generate a word w_j with probability P(w_j|z_k) • Legend: observed variables, latent variables, plates
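The three generative steps on this slide translate directly into ancestral sampling. A minimal sketch, assuming the three probability tables are already given (the toy values below are hypothetical); `sample_plsa` is an illustrative name.

```python
import numpy as np

def sample_plsa(p_d, p_z_d, p_w_z, n_words, seed=0):
    """Draw (document, word) pairs from the pLSA generative process."""
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(n_words):
        d = rng.choice(len(p_d), p=p_d)             # select document d_i ~ P(d)
        z = rng.choice(p_z_d.shape[1], p=p_z_d[d])  # pick latent class z_k ~ P(z | d_i)
        w = rng.choice(p_w_z.shape[1], p=p_w_z[z])  # generate word w_j ~ P(w | z_k)
        pairs.append((int(d), int(w)))
    return pairs

# Hypothetical toy model: 2 documents, 2 topics, 3 words.
p_d = np.array([0.5, 0.5])
p_z_d = np.array([[1.0, 0.0], [0.0, 1.0]])
p_w_z = np.array([[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])
pairs = sample_plsa(p_d, p_z_d, p_w_z, n_words=20)
```

Note that `d` can only ever take one of the training-document indices; that is the weakness raised two slides later.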

  20. How does pLSA deal with previously unseen documents? • “Folding-in” heuristic • First train on the corpus to obtain P(w|z) • Now re-run the same training EM algorithm, but don’t re-estimate P(w|z), and let D = {d_unseen}
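Folding-in amounts to running EM for a single new document while keeping P(w|z) frozen. A minimal sketch, assuming P(w|z) comes from an earlier training run (the toy table here is hypothetical) and the new document is given as a word-count vector; `fold_in` is an illustrative name.

```python
import numpy as np

def fold_in(n_new, p_w_z, iters=50, seed=0, eps=1e-12):
    """Estimate P(z | d_unseen) for one new document, holding P(w | z) fixed."""
    rng = np.random.default_rng(seed)
    k = p_w_z.shape[0]
    p_z = rng.random(k)
    p_z /= p_z.sum()
    for _ in range(iters):
        joint = p_z[:, None] * p_w_z                                  # K x W
        post = joint / (joint.sum(axis=0, keepdims=True) + eps)       # E-step: P(z | d_new, w)
        p_z = (post * n_new[None, :]).sum(axis=1)                     # M-step for P(z | d_new) only
        p_z /= p_z.sum()
    return p_z

# Hypothetical trained topics: topic 0 uses words 0-1, topic 1 uses words 2-3.
p_w_z = np.array([[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])
n_new = np.array([3.0, 2.0, 0.0, 0.0])   # unseen document containing only words 0 and 1
p_z_new = fold_in(n_new, p_w_z)          # close to [1, 0]
```

Because the new document still gets its own free parameter vector, folding-in works, but it is a heuristic rather than a proper probability for the unseen document.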

  21. Problems with pLSA • Not a well-defined generative model of documents; d is a dummy index into the list of documents in the training set (as many values as documents) • No natural way to assign probability to a previously unseen document • Number of parameters to be estimated grows with size of training set

  22. LDA to the rescue [figure: pLSA and LDA graphical models] • Latent Dirichlet Allocation treats the topic mixture weights as a k-parameter hidden random variable and places a Dirichlet prior on the multinomial mixing weights • The Dirichlet distribution is conjugate to the multinomial distribution (the most natural prior to choose: the posterior distribution is also a Dirichlet!)
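Conjugacy means the posterior update is just addition: a Dirichlet(α) prior combined with multinomial counts n gives a Dirichlet(α + n) posterior. The sketch below shows the update and a log-density helper that can be used to check that posterior ∝ prior × likelihood; the prior and counts are hypothetical illustration values.

```python
import math

import numpy as np

def dirichlet_posterior(alpha, counts):
    """Dirichlet(alpha) prior + multinomial counts -> Dirichlet(alpha + counts) posterior."""
    return np.asarray(alpha, dtype=float) + np.asarray(counts, dtype=float)

def log_dirichlet_pdf(theta, a):
    """Log density of Dirichlet(a) at a point theta on the probability simplex."""
    return (math.lgamma(a.sum()) - sum(math.lgamma(x) for x in a)
            + float(np.sum((a - 1.0) * np.log(theta))))

alpha = np.array([1.0, 1.0, 1.0])  # symmetric prior over K = 3 topics (illustrative)
counts = np.array([5, 0, 2])       # hypothetical topic counts in one document
post = dirichlet_posterior(alpha, counts)
print(post)               # [6. 1. 3.]
print(post / post.sum())  # posterior mean: [0.6 0.1 0.3]
```

For any θ on the simplex, `log_dirichlet_pdf(theta, post)` minus the log prior and log multinomial likelihood is the same constant, which is exactly the statement that the posterior stays in the Dirichlet family.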

  23. Corpus-Level parameters in LDA • Alpha and beta are corpus-level parameters that are sampled once in the process of generating a corpus (outside of the plates!) • Alpha and beta must be estimated before we can find the topic mixing proportions belonging to a previously unseen document

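  24. Getting rid of plates [figure: the LDA graphical model unrolled over documents; each document has its own topic assignments z1, z2, z3, …, zN generating words w1, w2, w3, …, wN, with β shared by all documents] Thanks to Jonathan Huang for the un-plated LDA graphic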

  25. Inference in LDA • Inference = estimation of document-level parameters • Exact inference is intractable to compute, so we must employ approximate inference
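The source of the intractability can be written out. Following the notation of Blei, Ng and Jordan (2003), with document-level topic proportions θ, topic assignments z and words w, the posterior needed for inference is

```latex
p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta)
  = \frac{p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)}{p(\mathbf{w} \mid \alpha, \beta)},
\qquad
p(\mathbf{w} \mid \alpha, \beta)
  = \int p(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)\, d\theta .
```

The normalizer integrates a product of sums over θ, which couples θ and β and does not simplify in closed form; this is why approximate methods are used.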

  26. Approximate Inference in LDA • Variational methods: use Jensen’s inequality to obtain a lower bound on the log likelihood that is indexed by a set of variational parameters • The optimal (document-specific) variational parameters are obtained by minimizing the KL divergence between the variational distribution and the true posterior • Variational methods are one way of doing this; Gibbs sampling (MCMC) is another [figure: variational distribution]
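For one document, the mean-field updates from Blei et al. (2003) alternate between φ (per-word topic responsibilities, φ_ni ∝ β_{i,w_n} exp(Ψ(γ_i))) and γ (the variational Dirichlet, γ_i = α_i + Σ_n φ_ni), with β held fixed. The sketch below assumes β is already known (the toy values are hypothetical) and implements the digamma function Ψ by hand so that only NumPy is needed; `lda_doc_variational` is an illustrative name.

```python
import math

import numpy as np

def digamma(x):
    """Digamma for x > 0: recurrence psi(x) = psi(x + 1) - 1/x up to x >= 6, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))

def lda_doc_variational(words, alpha, beta, iters=100, tol=1e-8):
    """Mean-field updates of gamma (K,) and phi (N, K) for one document, beta (K, V) fixed."""
    n, k = len(words), len(alpha)
    gamma = alpha + n / k                  # standard initialization
    phi = np.full((n, k), 1.0 / k)
    for _ in range(iters):
        dig = np.array([digamma(g) for g in gamma])
        # phi_{n,i} proportional to beta_{i, w_n} * exp(digamma(gamma_i)); shift for stability.
        phi = beta[:, words].T * np.exp(dig - dig.max())
        phi /= phi.sum(axis=1, keepdims=True)
        new_gamma = alpha + phi.sum(axis=0)   # gamma_i = alpha_i + sum_n phi_{n,i}
        converged = np.max(np.abs(new_gamma - gamma)) < tol
        gamma = new_gamma
        if converged:
            break
    return gamma, phi

# Hypothetical topics: topic 0 emits words 0-1, topic 1 emits words 2-3.
alpha = np.array([0.1, 0.1])
beta = np.array([[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])
gamma, phi = lda_doc_variational([0, 1, 0, 1, 2], alpha, beta)  # gamma close to [4.1, 1.1]
```

A collapsed Gibbs sampler would replace these deterministic updates with repeated resampling of each z_n; the variational version is shown because it is the method the slide describes first.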

  27. Look at some P(w|z) produced by LDA • Show some pLSI and LDA results applied to text • An LDA project by Tomasz Malisiewicz and Jonathan Huang • Search for the word ‘drive’

  28. pLSA and LDA applied to Images • How can one apply these techniques to the images?

  29. Hierarchical Bayesian text models • Probabilistic Latent Semantic Analysis (pLSA), Hofmann, 2001 [graphical model over d, z, w with plates N and D] • Latent Dirichlet Allocation (LDA), Blei et al., 2001 [graphical model over c, z, w with plates N and D]

  30. Hierarchical Bayesian text models • Probabilistic Latent Semantic Analysis (pLSA) applied to images, e.g. “face” [graphical model over d, z, w with plates N and D] • Sivic et al. ICCV 2005

  31. Hierarchical Bayesian text models • Latent Dirichlet Allocation (LDA) applied to images, e.g. “beach” [graphical model over c, z, w with plates N and D] • Fei-Fei and Perona, CVPR 2005

  32. A Bayesian Hierarchical Model for Learning Natural Scene Categories

  33. Flow Chart: Quick Overview

  34. How to Generate an Image? • Choose a scene (mountain, beach, …) • Given the scene, generate an intermediate probability vector over ‘themes’ • For each word: determine the current theme from the mixture of themes, then draw a codeword from that theme
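The generative story on this slide can be sketched as ancestral sampling. This is a minimal illustration, assuming the theme vector is drawn from a category-specific Dirichlet and each codeword from a per-theme categorical distribution; the parameter names `eta`, `theta` and `beta` and all toy values are hypothetical.

```python
import numpy as np

def generate_image(eta, theta, beta, n_patches, seed=0):
    """Sample one 'image' (a bag of codeword indices) from a scene/theme model.

    eta: (C,) prior over scene categories; theta: (C, K) Dirichlet parameters,
    one row per category; beta: (K, V) codeword distribution for each theme.
    """
    rng = np.random.default_rng(seed)
    c = int(rng.choice(len(eta), p=eta))      # choose a scene category
    pi = rng.dirichlet(theta[c])              # theme mixture for this image
    words = []
    for _ in range(n_patches):
        z = rng.choice(len(pi), p=pi)         # current theme for this patch
        words.append(int(rng.choice(beta.shape[1], p=beta[z])))  # codeword from that theme
    return c, pi, words

# Hypothetical toy model: 2 scene categories, 2 themes, 4 codewords.
eta = np.array([1.0, 0.0])                    # always pick category 0 in this toy run
theta = np.array([[5.0, 1.0], [1.0, 5.0]])
beta = np.array([[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])
c, pi, words = generate_image(eta, theta, beta, n_patches=30)
```

Compared with plain LDA, the only structural change in this sketch is the extra category variable c that selects which Dirichlet generates the theme mixture.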

  35. How to Generate an Image?

  36. Inference • How to make a decision on a novel image • Integrate over latent variables to get: • Approximate variational inference (not easy, but Gibbs sampling is supposed to be easier)

  37. Codebook • 174 Local Image Patches • Detection: evenly sampled grid; random sampling; saliency detector; Lowe’s DoG detector • Representation: normalized 11×11 gray values; 128-dim SIFT

  38. Results: Average performance 64% • Confusion matrix: 100 training examples and 50 test examples • Rank statistic test: the probability that a test scene’s correct category is among the top N most probable categories
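The rank statistic is a top-N accuracy. A minimal sketch, assuming each test image comes with a score per category (higher means more probable) and an integer true label; the scores below are hypothetical.

```python
import numpy as np

def rank_statistic(scores, labels, n):
    """Fraction of test items whose true category is among the n highest-scoring categories.

    scores: (num_test, num_categories) array, higher = more probable.
    labels: (num_test,) integer array of true categories.
    """
    top_n = np.argsort(-scores, axis=1)[:, :n]             # indices of the n best categories
    hits = (top_n == np.asarray(labels)[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical scores for 3 test images over 3 categories.
scores = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.5, 0.4, 0.1]])
labels = np.array([0, 1, 1])
top1 = rank_statistic(scores, labels, 1)   # 1/3: only the first image is right at rank 1
top2 = rank_statistic(scores, labels, 2)   # 1.0: every true label is in the top 2
```

With N = 1 this reduces to ordinary classification accuracy, and the curve must reach 1.0 once N equals the number of categories.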

  39. Results: The Distributions Theme distribution Codeword distribution

  40. The peak at 174

  41. Summary of detection and representation choices • SIFT outperforms pixel gray values • Sliding grid, which creates the largest number of patches, does best

  42. Discovering objects and their location in images

  43. Visual Words • Vector-quantized SIFT descriptors computed in regions • Regions come from elliptical shape adaptation around interest points, and from the maximally stable regions of Matas et al. • Both are elliptical regions at twice their detected scale

  44. Building a Vocabulary

  45. Building a Vocabulary • K-means clustering of 300K regions to get about 1K clusters for each of the Shape Adapted and Maximally Stable region types … • Vector quantization • Slide credit: Josef Sivic
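Vocabulary building and vector quantization can be sketched with plain Lloyd's k-means. This is a small-scale illustration only: random 128-dim vectors stand in for SIFT descriptors, k is tiny rather than about 1K, and the dense distance computation would need batching for 300K regions. The function names are hypothetical.

```python
import numpy as np

def quantize(x, centres):
    """Map each descriptor to the index of its nearest centre, i.e. its visual word."""
    d2 = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (k, dim) centres, the visual vocabulary."""
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = quantize(x, centres)
        for j in range(k):
            members = x[labels == j]
            if len(members):               # leave an empty cluster's centre where it is
                centres[j] = members.mean(axis=0)
    return centres

def bag_of_visual_words(descriptors, centres):
    """Histogram of visual-word counts for one image."""
    return np.bincount(quantize(descriptors, centres), minlength=len(centres))

rng = np.random.default_rng(0)
# Hypothetical stand-ins for 128-dim SIFT descriptors: three well-separated blobs.
descriptors = np.concatenate([rng.normal(loc=10.0 * c, size=(50, 128)) for c in range(3)])
vocab = kmeans(descriptors, k=3)
hist = bag_of_visual_words(descriptors[:20], vocab)  # one "image" made of 20 regions
```

The resulting histogram plays the role of a column of the term-document matrix, so the same pLSA machinery used for text applies unchanged.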

  46. pLSA Training • Sanity Check: Remember what quantities must be estimated?

  47. Results #1: Topic Discovery • This is just the training stage • Obtain P(zk|dj) for each image, then classify the image as containing object k according to the max of P(zk|dj) over k • 4 object categories plus background

  48. Results #1: Topic Discovery

  49. Results #2: Classifying New Images • Object categories are learned on a corpus, then found in a new image • Anybody remember how this is done? Remember the index d in the graphical model
