
Efficiently handling discrete structure in machine learning




Presentation Transcript


  1. Efficiently handling discrete structure in machine learning Stefanie Jegelka, MADALGO summer school

  2. (placeholder slide with filler text)

  3. Overview • discrete labeling problems (MAP inference) • (structured) sparse variable selection • finding informative / influential subsets Recurrent themes: • submodularity • convexity • polyhedra

  4. 1. Distributions for discrete labels • a label y_i for each pixel i • Define a distribution P(y_1, …, y_n | x) over labelings • Maximum a posteriori (MAP) inference: find y* = argmax_y P(y | x) • When can we solve this efficiently?

  5. Structure: Graphical models • independent: P(y_1, …, y_n) = ∏_i P(y_i) • Graphical model: • a node for each variable • not connected ⇒ variables independent • shows conditional dependence

  6. Graphical models

  7. Graphical models • hidden label y_i • pixel i • prior: spatial coherence • likelihood: “data fit”

  8. Efficient MAP inference Discrete optimization problem • structure: tree-structured graphical models • functional form of log P: e.g. submodularity, supermodularity • structure + functional form: graph cuts, decomposition
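
A minimal sketch of the graph-cut route (not from the slides; the unary costs, couplings and the use of networkx are illustrative assumptions): MAP inference for a small binary model with submodular pairwise terms, reduced to an s-t minimum cut via the standard construction.

    import networkx as nx

    # energy E(y) = sum_i theta_i(y_i) + sum_{(i,j)} w_ij * [y_i != y_j], with w_ij >= 0
    unary = {0: (0.0, 2.0), 1: (1.5, 0.5), 2: (0.2, 1.8)}   # node -> (cost of label 0, cost of label 1)
    pairwise = {(0, 1): 1.0, (1, 2): 1.0}                   # edge -> coupling weight

    G, s, t = nx.DiGraph(), "s", "t"
    for i, (c0, c1) in unary.items():
        G.add_edge(s, i, capacity=c1)   # cut if i ends on the sink side (y_i = 1): pay theta_i(1)
        G.add_edge(i, t, capacity=c0)   # cut if i ends on the source side (y_i = 0): pay theta_i(0)
    for (i, j), w in pairwise.items():
        G.add_edge(i, j, capacity=w)    # cut if i and j get different labels: pay w_ij
        G.add_edge(j, i, capacity=w)

    energy, (source_side, _) = nx.minimum_cut(G, s, t)
    labeling = {i: 0 if i in source_side else 1 for i in unary}
    print("minimum energy:", energy, "MAP labeling:", labeling)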

  9. 2. Sparsity (figure: example data, variables height and age)

  10. 2. Sparsity • high dimensional? • prior knowledge: w is sparse! • interpretability • statistical benefits: fewer samples needed • computational benefits • generalization: sparsity patterns • min_w loss(w) (“data fit”) + λ · regularizer(w)
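
A minimal sketch of the loss-plus-regularizer template with the most common sparse regularizer, the l1 norm (the synthetic data, the step size and the solver, proximal gradient / ISTA, are illustrative assumptions, not fixed by the slide):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 50, 20
    X = rng.standard_normal((n, d))
    w_true = np.zeros(d); w_true[:3] = (1.0, -2.0, 1.5)            # sparse ground truth
    y = X @ w_true + 0.01 * rng.standard_normal(n)

    lam = 0.1
    step = 1.0 / np.linalg.norm(X, 2) ** 2                         # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(500):                                           # min_w 0.5*||Xw - y||^2 + lam*||w||_1
        z = w - step * (X.T @ (X @ w - y))                         # gradient step on the smooth "data fit" loss
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-thresholding: prox of lam*||.||_1
    print("recovered support:", np.flatnonzero(np.abs(w) > 1e-6))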

  11. 3. Informative subsets Place sensors to monitor temperature

  12. Sensing • Ys: temperature at location s • Xs: sensor value at location s • Xs = Ys + noise • Where to measure to maximize information about Y?

  13. Maximizing influence • probabilistic model of propagation in a social network • maximize expected spread
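
One way to make "expected spread" concrete (a sketch only: the slide does not fix a propagation model, so this assumes the common independent-cascade model on a made-up toy graph) is to estimate it by Monte Carlo simulation:

    import random

    # directed edges with activation probabilities: u -> [(v, p), ...]
    edges = {1: [(2, 0.4), (3, 0.3)], 2: [(4, 0.5)], 3: [(4, 0.2), (5, 0.6)], 4: [], 5: []}

    def expected_spread(seeds, trials=2000):
        """Average number of nodes eventually activated when `seeds` start active."""
        total = 0
        for _ in range(trials):
            active, frontier = set(seeds), list(seeds)
            while frontier:
                u = frontier.pop()
                for v, p in edges[u]:
                    if v not in active and random.random() < p:   # each edge fires at most once
                        active.add(v)
                        frontier.append(v)
            total += len(active)
        return total / trials

    print("spread of {1}:   ", expected_spread({1}))
    print("spread of {1, 3}:", expected_spread({1, 3}))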

  14. Overview • discrete labeling problems (MAP inference) • (structured) sparse variable selection • finding informative / influential subsets Recurrent questions: • how to model prior knowledge / assumptions? ⇒ structure • efficient optimization? Recurrent themes: • convexity • submodularity • polyhedra

  15. Basics: convexity & submodularity

  16. Convex functions • f: R^n → R is convex if for all x, y and all λ ∈ [0, 1]: f(λx + (1 − λ)y) ≤ λ f(x) + (1 − λ) f(y)

  17. Set functions • ground set V = {1, …, n} • a set function F: 2^V → R assigns a value F(A) to every subset A ⊆ V

  18. Submodular set functions • F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B) for all A, B ⊆ V • modular / linear function: satisfies this with equality • we will make the additional assumption: F(∅) = 0

  19. Submodular set functions • Diminishing gains: F(A ∪ {e}) − F(A) ≥ F(B ∪ {e}) − F(B) for all A ⊆ B and e ∉ B • Union-Intersection: F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B) for all A, B ⊆ V

  20. The big picture

  21. The big picture: submodular functions connect to • graph theory (Frank 1993) • electrical networks (Narayanan 1997) • game theory (Shapley 1970) • combinatorial optimization • matroid theory (Whitney 1935) • machine learning • stochastic processes (Macchi 1975, Borodin 2009)

  22. Example: cover
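
A minimal sketch of a cover function (the three sets are made up) together with a brute-force check of the diminishing-gains definition from slide 19; here F(A) counts the distinct elements covered by the sets chosen in A:

    from itertools import combinations

    sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
    V = list(sets)

    def F(A):
        """Coverage: number of distinct elements covered by the sets indexed by A."""
        covered = set()
        for i in A:
            covered |= sets[i]
        return len(covered)

    def is_submodular(F, V):
        """Brute force: F(A+e) - F(A) >= F(B+e) - F(B) for all A subset of B, e not in B."""
        subsets = [set(c) for r in range(len(V) + 1) for c in combinations(V, r)]
        return all(F(A | {e}) - F(A) >= F(B | {e}) - F(B)
                   for A in subsets for B in subsets if A <= B
                   for e in V if e not in B)

    print("coverage is submodular:", is_submodular(F, V))   # True: cover functions are submodular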

  23. Sensing • Ys: temperature at location s • Xs: sensor value at location s • Xs = Ys + noise • Joint probability distribution P(Y1,…,Yn, X1,…,Xn) = P(Y1,…,Yn) P(X1,…,Xn | Y1,…,Yn) = prior × likelihood

  24. Sensing • discrete random variables • F(A) = uncertainty about temperature before sensing − uncertainty about temperature after sensing at locations A = H(Y) − H(Y | X_A) • Claim: If the Xs are conditionally independent given Y, then F is submodular. Proof: discrete entropy is submodular
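
The last step, "discrete entropy is submodular", can be checked numerically; a minimal sketch (random toy joint distribution of three binary variables, used only for illustration) verifying H(A) + H(B) ≥ H(A ∪ B) + H(A ∩ B) over all subsets of variables:

    from itertools import combinations
    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    P = rng.random((2,) * n)
    P /= P.sum()                                   # random joint distribution of 3 binary variables

    def H(A):
        """Joint entropy (in bits) of the variables indexed by A."""
        if not A:
            return 0.0
        rest = tuple(i for i in range(n) if i not in A)
        p = P.sum(axis=rest).reshape(-1) if rest else P.reshape(-1)
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    subsets = [set(c) for r in range(n + 1) for c in combinations(range(n), r)]
    ok = all(H(A) + H(B) >= H(A | B) + H(A & B) - 1e-9 for A in subsets for B in subsets)
    print("joint entropy is submodular on this distribution:", ok)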

  25. Example: costs • cost: time to reach shop + price of items • ground set: items for breakfast, available at Market 1, Market 2, Market 3 (travel times t1, t2, t3; each item 1 $)

  26. Example: economies of scale • cost: time to shop + price of items • e.g. F({item from Market 1, item from Market 2}) = t1 + 1 + t2 + 2 • in general: F(A) = (travel time for the shops visited) + (number of items) = #shops + #items • submodular?
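
A minimal sketch of this shopping cost (shops, travel times and prices are made up): once an item's shop is already on the route, an extra item from that shop only costs its price, which is exactly the diminishing-gains pattern.

    shop_of = {"milk": "M1", "bread": "M1", "eggs": "M2", "jam": "M3"}
    travel = {"M1": 3.0, "M2": 5.0, "M3": 2.0}

    def F(items):
        """Total cost: travel time of the distinct shops needed + one unit of price per item."""
        shops = {shop_of[i] for i in items}
        return sum(travel[s] for s in shops) + len(items)

    A = {"eggs"}
    B = {"eggs", "milk"}
    print(F(A | {"bread"}) - F(A))   # 4.0: bread forces a new trip to M1 (3.0) plus its price (1)
    print(F(B | {"bread"}) - F(B))   # 1.0: M1 is already visited because of milk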

  27. Graph cuts • Graph cut for a single edge is submodular • Graph cut for the entire graph: sum of submodular functions • Closedness property I: a sum of submodular functions is submodular: F1, …, Fm all submodular ⇒ F1 + … + Fm submodular
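
A minimal sketch (tiny made-up weighted graph) of exactly this decomposition: the cut function is a weighted sum of single-edge cut terms, and by closedness property I the whole sum is submodular.

    edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 0.5}   # edge -> weight

    def edge_cut(u, v, S):
        return float((u in S) != (v in S))        # 1 if the edge crosses the cut, else 0 (submodular in S)

    def cut(S):
        return sum(w * edge_cut(u, v, S) for (u, v), w in edges.items())

    print(cut({"a"}), cut({"a", "b"}), cut({"a", "b", "c"}))   # 1.5 2.5 0.0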

  28. Other closedness properties • F submodular on V, W ⊆ V. Then the following is also submodular: • Restriction: F′(S) = F(S ∩ W)

  29. Other closedness properties • F submodular on V, W ⊆ V. Then the following are also submodular: • Restriction: F′(S) = F(S ∩ W) • Conditioning: F′(S) = F(S ∪ W)

  30. Other closedness properties • F submodular on V, W ⊆ V. Then the following are also submodular: • Restriction: F′(S) = F(S ∩ W) • Conditioning: F′(S) = F(S ∪ W) • Reflection: F′(S) = F(V \ S)

  31. Submodularity … discrete convexity … or concavity?

  32. Convex aspects • convex extension • duality • efficient minimization
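
The convex extension on this slide is, in the standard theory, the Lovász extension; a minimal sketch (the small cover-style F is only a stand-in) evaluating it by the sorting / greedy formula f(w) = sum_k w[pi(k)] * (F({pi(1),…,pi(k)}) − F({pi(1),…,pi(k−1)})), where pi orders the coordinates of w decreasingly:

    import numpy as np

    sets = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}}

    def F(A):
        """Coverage function, with F(empty set) = 0."""
        return len(set().union(*(sets[i] for i in A))) if A else 0

    def lovasz_extension(F, w):
        order = np.argsort(-w)                    # sort coordinates of w in decreasing order
        value, prefix, prev = 0.0, set(), 0.0
        for i in order:
            prefix = prefix | {int(i)}
            cur = F(prefix)
            value += w[int(i)] * (cur - prev)     # marginal gain of element i, weighted by w_i
            prev = cur
        return value

    w = np.array([0.8, 0.5, 0.1])
    print(lovasz_extension(F, w))                 # 2.2; on indicator vectors it agrees with F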

  33. Concave aspects • submodularity: F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B) for A ⊆ B • concavity: g(a + s) − g(a) ≥ g(b + s) − g(b) for a ≤ b • “intuitively”: F(A) plotted against |A| looks concave

  34. Submodularity and concavity • suppose F(A) = g(|A|) • then F is submodular if and only if g is concave

  35. Maximum of submodular functions • F1, F2 submodular. What about F(A) = max(F1(A), F2(A))? • max(F1, F2) is not submodular in general!

  36. Minimum of submodular functions • Well, maybe F(A) = min(F1(A), F2(A))? • min(F1, F2) is not submodular in general!
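
Concrete (made-up) counterexamples for slides 35 and 36; both building blocks are modular, hence submodular, yet the min violates the union-intersection inequality and the max violates diminishing gains:

    # min counterexample on ground set {a, b}
    F1 = lambda A: len(A & {"a"})
    F2 = lambda A: len(A & {"b"})
    Gmin = lambda A: min(F1(A), F2(A))
    # union-intersection fails for A = {a}, B = {b}:
    print(Gmin({"a"}) + Gmin({"b"}), "<", Gmin({"a", "b"}) + Gmin(set()))      # 0 < 1

    # max counterexample on ground set {a, b, c}
    H1 = lambda A: 2.0 * ("a" in A)
    H2 = lambda A: 1.5 * len(A & {"b", "c"})
    Gmax = lambda A: max(H1(A), H2(A))
    # diminishing gains fail: adding c helps the larger set more
    A, B, e = {"a"}, {"a", "b"}, "c"
    print(Gmax(A | {e}) - Gmax(A), "<", Gmax(B | {e}) - Gmax(B))               # 0.0 < 1.0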

  37. Overview • discrete labeling problems (MAP inference) • (structured) sparse variable selection • finding informative / influential subsets Recurrent questions: • how to model prior knowledge / assumptions? ⇒ structure • efficient optimization? Recurrent themes: • convexity • submodularity • polyhedra

  38. Efficient MAP inference Discrete optimization problem • structure: tree-structured graphical models • functional form of log P: e.g. submodularity, supermodularity • structure + functional form: graph cuts, decomposition

  39. Set functions and energy functions • any set function F: 2^V → R with V = {a, b, c, d} … is a function on binary vectors, i.e. a pseudo-boolean function: identify A ⊆ V with its indicator vector, e.g. A = {a} ↔ (1, 0, 0, 0) • binary labeling problems = subset selection problems!

  40. Submodularity • set function F: 2^V → R • pseudo-boolean function f: {0,1}^n → R with f(1_A) = F(A) • f is also a (multilinear) polynomial of degree up to n: alternative representation
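
A minimal sketch of that alternative representation (the path-graph cut function is only an example): the polynomial coefficients follow by Möbius inversion, c_S = sum over T ⊆ S of (−1)^{|S \ T|} F(T), and the resulting multilinear polynomial reproduces F on every subset.

    from itertools import combinations

    V = ["a", "b", "c"]
    edges = [("a", "b"), ("b", "c")]                   # cut function of the path a - b - c
    F = lambda S: sum((u in S) != (v in S) for u, v in edges)

    subsets = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]
    coeff = {S: sum((-1) ** (len(S) - len(T)) * F(T)
                    for r in range(len(S) + 1) for T in map(frozenset, combinations(S, r)))
             for S in subsets}

    def poly(x):                                       # evaluate the polynomial at a 0/1 vector x (a dict)
        return sum(c * all(x[i] for i in S) for S, c in coeff.items() if c)

    assert all(poly({i: int(i in S) for i in V}) == F(S) for S in subsets)
    print({tuple(sorted(S)): c for S, c in coeff.items() if c})
    # for this F: x_a + 2*x_b + x_c - 2*x_a*x_b - 2*x_b*x_c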

  41. MAP inference • maximizing P(y | x) is equivalent to minimizing the “energy function” E(y) = −log P(y | x) + const
