Efficiently handling discrete structure in machine learning. Stefanie Jegelka, MADALGO summer school.
Overview • discrete labeling problems (MAP inference) • (structured) sparse variable selection • finding informative / influential subsets Recurrent themes: • submodularity • convexity • polyhedra
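1. Distributions for discrete labels • Define a distribution $P(x)$ over labelings $x = (x_1, \dots, x_n)$, with one label $x_i$ per pixel $i$ • Maximum a posteriori (MAP) inference: find $x^* \in \arg\max_x P(x)$ When can we solve this efficiently?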
Structure: Graphical models • independent: $P(x) = \prod_i P(x_i)$ • Graphical model: • a node for each variable • variables that are not connected are independent • shows conditional dependence
Graphical models • hidden label for each pixel i • prior: spatial coherence • likelihood: “data fit”
Efficient MAP inference Discrete optimization problem • structure: tree-structured graphical models • functional form of log P: e.g. submodularity, supermodularity • structure + functional form: graph cuts, decomposition
2. Sparsity (figure: example with the variables height and age)
2. Sparsity • high dimensional? • prior knowledge: w is sparse! • interpretability • statistical benefits: fewer samples needed • computational benefits • generalization: sparsity patterns Objective: loss (“data fit”) + regularizer
3. Informative subsets Place sensors to monitor temperature
Sensing (figure: six locations, each with a temperature $y_s$ and a sensor $x_s$) • $Y_s$: temperature at location $s$ • $X_s$: sensor value at location $s$ • $X_s = Y_s + \text{noise}$ Where to measure to maximize information about $y$?
Maximizing influence • probabilistic model of propagation in a social network • maximize expected spread
Overview • discrete labeling problems (MAP inference) • (structured) sparse variable selection • finding informative / influential subsets Recurrent questions: • how to model prior knowledge / assumptions? (structure) • efficient optimization? Recurrent themes: • convexity • submodularity • polyhedra
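Convex functions • $f: \mathbb{R}^n \to \mathbb{R}$ is convex if for all $x, y$ and all $\lambda \in [0, 1]$: $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$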
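Set functions • ground set $V$ • set function $F: 2^V \to \mathbb{R}$ assigns a value $F(A)$ to every subset $A \subseteq V$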
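Submodular set functions • $F: 2^V \to \mathbb{R}$ is submodular if $F(A \cup \{e\}) - F(A) \ge F(B \cup \{e\}) - F(B)$ for all $A \subseteq B \subseteq V$ and $e \in V \setminus B$ • modular / linear function: satisfies this with equality • we will make an additional assumption: $F(\emptyset) = 0$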
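Submodular set functions • Diminishing gains: for all $A \subseteq B \subseteq V$ and $e \in V \setminus B$: $F(A \cup \{e\}) - F(A) \ge F(B \cup \{e\}) - F(B)$ • Union-Intersection: for all $A, B \subseteq V$: $F(A) + F(B) \ge F(A \cup B) + F(A \cap B)$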
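Both characterizations can be checked by brute force on a small ground set. The following is a minimal sketch, not an efficient test: it enumerates all subsets, so it is only usable for a handful of elements. The coverage function and the `AREAS` data are hypothetical illustrations, not part of the lecture.

```python
from itertools import combinations

def subsets(ground):
    """Return every subset of `ground` as a frozenset."""
    items = list(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_submodular_union_intersection(F, ground, tol=1e-9):
    """Check F(A) + F(B) >= F(A | B) + F(A & B) for all A, B."""
    sets = subsets(ground)
    return all(F(A) + F(B) + tol >= F(A | B) + F(A & B)
               for A in sets for B in sets)

def is_submodular_diminishing_gains(F, ground, tol=1e-9):
    """Check F(A+e) - F(A) >= F(B+e) - F(B) for all A <= B, e not in B."""
    ground = frozenset(ground)
    for B in subsets(ground):
        for A in subsets(B):
            for e in ground - B:
                if F(A | {e}) - F(A) + tol < F(B | {e}) - F(B):
                    return False
    return True

# Hypothetical example: each element covers a few points;
# F(S) counts the points covered by S (a coverage function).
AREAS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}

def coverage(S):
    return len(set().union(*(AREAS[s] for s in S)))

def squared_size(S):
    # |S|^2 has increasing gains, so it should fail both checks.
    return len(S) ** 2
```

On this toy instance both checks agree: `coverage` passes and `squared_size` fails.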
The big picture: submodular functions connect • graph theory (Frank 1993) • electrical networks (Narayanan 1997) • game theory (Shapley 1970) • combinatorial optimization • matroid theory (Whitney 1935) • machine learning • stochastic processes (Macchi 1975, Borodin 2009)
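Sensing (figure: six locations, each with a temperature $X_s$ and a sensor $Y_s$) • $X_s$: temperature at location $s$ • $Y_s$: sensor value at location $s$ • $Y_s = X_s + \text{noise}$ • Joint probability distribution: $P(X_1, \dots, X_n, Y_1, \dots, Y_n) = P(X_1, \dots, X_n) \, P(Y_1, \dots, Y_n \mid X_1, \dots, X_n)$, i.e. prior times likelihood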
Sensing • $X, Y_1, \dots, Y_n$: discrete random variables • $F(A) = H(X) - H(X \mid Y_A)$: uncertainty about temperature before sensing minus uncertainty about temperature after sensing • Claim: if the $Y_i$ are conditionally independent given $X$, then $F$ is submodular. Proof: discrete entropy is submodular
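The claim can be sanity-checked numerically on a toy model. This sketch assumes one hidden binary temperature X and three binary sensors Y_s that each flip X independently with some probability, so they are conditionally independent given X. The prior `P_X`, the flip probabilities `FLIP`, and all function names are hypothetical.

```python
from itertools import combinations, product
from math import log2

# Hypothetical toy model: hidden binary X, sensors Y_s = X flipped w.p. FLIP[s].
P_X = {0: 0.5, 1: 0.5}
FLIP = {0: 0.1, 1: 0.2, 2: 0.3}

def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def info_gain(A):
    """F(A) = H(X) - H(X | Y_A), computed by exhaustive enumeration."""
    A = sorted(A)
    joint = {}  # (x, readings of the sensors in A) -> probability
    for x, px in P_X.items():
        for ys in product((0, 1), repeat=len(A)):
            p = px
            for s, y in zip(A, ys):
                p *= FLIP[s] if y != x else 1 - FLIP[s]
            joint[(x, ys)] = p
    marginal_y = {}
    for (x, ys), p in joint.items():
        marginal_y[ys] = marginal_y.get(ys, 0.0) + p
    # H(X | Y_A) = H(X, Y_A) - H(Y_A)
    return entropy(P_X) - (entropy(joint) - entropy(marginal_y))

def subsets(ground):
    items = list(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def has_diminishing_gains(F, ground, tol=1e-9):
    """Brute-force check of the diminishing-gains inequality."""
    ground = frozenset(ground)
    return all(F(A | {e}) - F(A) + tol >= F(B | {e}) - F(B)
               for B in subsets(ground) for A in subsets(B)
               for e in ground - B)
```

A passing check on one instance is of course not a proof; it only illustrates the statement.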
Example: costs • ground set: breakfast items, sold at Market 1, Market 2, and Market 3 • cost: time to reach shop (t1, t2, t3) + price of items • each item costs 1 $
Example: economies of scale • cost: time to shop + price of items • e.g. one item from Market 1 and two items from Market 2: F(S) = cost(Market 1 item) + cost(two Market 2 items) = (t1 + 1) + (t2 + 2) • with unit travel times: F(S) = #shops + #items • submodular?
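One way to explore the question is to test a small instance by brute force. In this sketch the item names, the item-to-market assignment `SHOP_OF`, and the unit travel times are all hypothetical; the cost is the travel time to every market visited plus 1 per item.

```python
from itertools import combinations

# Hypothetical instance: which market sells each breakfast item.
SHOP_OF = {"bread": "Market 1", "milk": "Market 2",
           "eggs": "Market 2", "jam": "Market 3"}
TRAVEL = {"Market 1": 1.0, "Market 2": 1.0, "Market 3": 1.0}  # unit times
PRICE = 1.0  # each item costs 1

def cost(S):
    """Travel time to each distinct market needed, plus the item prices."""
    shops = {SHOP_OF[item] for item in S}
    return sum(TRAVEL[m] for m in shops) + PRICE * len(S)

def subsets(ground):
    items = list(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_submodular(F, ground, tol=1e-9):
    """Brute-force union-intersection check."""
    sets = subsets(ground)
    return all(F(A) + F(B) + tol >= F(A | B) + F(A & B)
               for A in sets for B in sets)
```

On this instance the check succeeds: a second item from an already visited market adds only its price, which is the economies-of-scale effect.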
Graph cuts • Graph cut for a single edge is submodular • Graph cut for entire graph: sum of submodular functions Closedness property I • A sum of submodular functions is submodular: all $F_i$ submodular $\Rightarrow$ $\sum_i F_i$ submodular
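The decomposition of a cut into single-edge terms can be written out directly. This is a minimal sketch on a hypothetical weighted undirected graph (`EDGES`, `NODES`) with nonnegative weights; the brute-force checker is only meant for tiny graphs.

```python
from itertools import combinations

# Hypothetical graph: (u, v) -> nonnegative edge weight.
EDGES = {("a", "b"): 2.0, ("b", "c"): 1.0, ("a", "c"): 0.5, ("c", "d"): 3.0}
NODES = {"a", "b", "c", "d"}

def edge_cut(u, v, w):
    """Cut function of one edge: w if exactly one endpoint lies in S, else 0."""
    return lambda S: w if (u in S) != (v in S) else 0.0

def graph_cut(S):
    """Cut of the whole graph, written as a sum of single-edge cut functions."""
    return sum(edge_cut(u, v, w)(S) for (u, v), w in EDGES.items())

def subsets(ground):
    items = list(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_submodular(F, ground, tol=1e-9):
    """Brute-force union-intersection check."""
    sets = subsets(ground)
    return all(F(A) + F(B) + tol >= F(A | B) + F(A & B)
               for A in sets for B in sets)
```

Each `edge_cut` term passes the check on its own, and so does their sum `graph_cut`, in line with the closedness property.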
Other closedness properties • $F$ submodular on $V$ and $W \subseteq V$. Then the following are also submodular: • Restriction: $F'(S) = F(S \cap W)$ • Conditioning: $F'(S) = F(S \cup W)$ • Reflection: $F'(S) = F(V \setminus S)$
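The three operations are one-liners once a set function is a callable on frozensets. The sketch below applies them to a hypothetical coverage function `F` (the `AREAS` data, `V`, and `W` are made up for illustration) and re-checks submodularity by brute force.

```python
from itertools import combinations

# Hypothetical submodular base function: coverage of points by chosen elements.
AREAS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}, "d": {5, 6}}
V = frozenset(AREAS)
W = frozenset({"a", "c"})

def F(S):
    return len(set().union(*(AREAS[s] for s in S)))

def restriction(F, W):
    """F'(S) = F(S & W): only elements inside W count."""
    return lambda S: F(S & W)

def conditioning(F, W):
    """F'(S) = F(S | W): W is always included."""
    return lambda S: F(S | W)

def reflection(F, V):
    """F'(S) = F(V - S): evaluate F on the complement."""
    return lambda S: F(V - S)

def subsets(ground):
    items = list(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_submodular(G, ground, tol=1e-9):
    """Brute-force union-intersection check."""
    sets = subsets(ground)
    return all(G(A) + G(B) + tol >= G(A | B) + G(A & B)
               for A in sets for B in sets)
```

Note that conditioning and reflection generally do not preserve the value 0 on the empty set, so a normalization step may be needed if that assumption is used elsewhere.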
Submodularity … discrete convexity …. … or concavity?
Convex aspects • convex extension • duality • efficient minimization
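Concave aspects • submodularity: for $A \subseteq B$, $s \notin B$: $F(A \cup \{s\}) - F(A) \ge F(B \cup \{s\}) - F(B)$ • concavity: for $a \le b$, $s > 0$: $g(a + s) - g(a) \ge g(b + s) - g(b)$ (figure: $F(A)$ plotted “intuitively” against $|A|$)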
Submodularity and concavity • suppose $g: \mathbb{N} \to \mathbb{R}$ and $F(A) = g(|A|)$ • $F$ is submodular if and only if $g$ is concave
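The equivalence is easy to probe numerically for functions of the cardinality. This sketch compares a discrete concavity test on g (nonincreasing increments) with a brute-force submodularity test on F(A) = g(|A|); the ground set `GROUND` and the helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt

GROUND = list("abcde")  # hypothetical ground set

def cardinality_function(g):
    """Build F(A) = g(|A|) from a function g on the integers."""
    return lambda A: g(len(A))

def is_concave_on_integers(g, n, tol=1e-9):
    """Concave on 0..n: the increments g(k+1) - g(k) never increase."""
    gains = [g(k + 1) - g(k) for k in range(n)]
    return all(gains[k] + tol >= gains[k + 1] for k in range(n - 1))

def subsets(ground):
    items = list(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_submodular(F, ground, tol=1e-9):
    """Brute-force union-intersection check."""
    sets = subsets(ground)
    return all(F(A) + F(B) + tol >= F(A | B) + F(A & B)
               for A in sets for B in sets)

def square(k):
    return k * k
```

With `sqrt` both tests succeed, and with `square` both fail, matching the "if and only if".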
Maximum of submodular functions • $F_1, F_2$ submodular. What about $F(A) = \max(F_1(A), F_2(A))$? (figure: $F_1(A)$ and $F_2(A)$ plotted against $|A|$) $\max(F_1, F_2)$ is not submodular in general!
Minimum of submodular functions • Well, maybe $F(A) = \min(F_1(A), F_2(A))$? $\min(F_1, F_2)$ is not submodular in general!
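Small counterexamples make both negative statements concrete. The functions below are hypothetical choices on a three-element ground set: `F1` and `F2` are concave functions of the cardinality, `G1` and `G2` are indicator (modular) functions.

```python
from itertools import combinations

GROUND = list("abc")  # hypothetical ground set

def F1(A):
    return 2 * min(len(A), 1)   # values 0, 2, 2, 2 by cardinality

def F2(A):
    return len(A)               # values 0, 1, 2, 3 by cardinality

def G1(A):
    return 1 if "a" in A else 0

def G2(A):
    return 1 if "b" in A else 0

def pointwise_max(A):
    return max(F1(A), F2(A))    # values 0, 2, 2, 3: gains 2, 0, 1

def pointwise_min(A):
    return min(G1(A), G2(A))    # 1 only if both "a" and "b" are chosen

def subsets(ground):
    items = list(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_submodular(F, ground, tol=1e-9):
    """Brute-force union-intersection check."""
    sets = subsets(ground)
    return all(F(A) + F(B) + tol >= F(A | B) + F(A & B)
               for A in sets for B in sets)
```

For the maximum, two sets of size 2 give 2 + 2 < 3 + 2; for the minimum, the singletons {a} and {b} give 0 + 0 < 1 + 0.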
Overview • discrete labeling problems (MAP inference) • (structured) sparse variable selection • finding informative / influential subsets Recurrent questions: • how to model prior knowledge / assumptions? (structure) • efficient optimization? Recurrent themes: • convexity • submodularity • polyhedra
Efficient MAP inference Discrete optimization problem • structure: tree-structured graphical models • functional form of log P: e.g. submodularity, supermodularity • structure + functional form: graph cuts, decomposition
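Set functions and energy functions • any set function $F: 2^V \to \mathbb{R}$ with $|V| = n$ … is a function on binary vectors, $f: \{0, 1\}^n \to \mathbb{R}$: a pseudo-boolean function (figure: a subset $A \subseteq \{a, b, c, d\}$ and its binary indicator vector) • binary labeling problems = subset selection problems!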
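The correspondence between subsets and binary vectors is just an indicator encoding. A minimal sketch, assuming a fixed ordering of the ground set `GROUND`; the helper names are hypothetical.

```python
GROUND = ["a", "b", "c", "d"]  # fixed order of the ground set

def to_vector(A):
    """Indicator vector of the subset A: entry i is 1 iff GROUND[i] is in A."""
    return tuple(1 if e in A else 0 for e in GROUND)

def to_set(x):
    """Subset encoded by the binary vector x."""
    return frozenset(e for e, xi in zip(GROUND, x) if xi)

def as_pseudo_boolean(F):
    """Turn a set function F(A) into a function f(x) on binary vectors."""
    return lambda x: F(to_set(x))

def as_set_function(f):
    """Turn a pseudo-boolean function f(x) into a set function F(A)."""
    return lambda A: f(to_vector(A))

# Usage: the cardinality function |A| becomes the sum of the entries of x.
f = as_pseudo_boolean(len)
```

Here `to_vector({"a"})` is `(1, 0, 0, 0)`, and `f((1, 0, 1, 0))` evaluates the cardinality of `{"a", "c"}`.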
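Submodularity • set function: $F(A \cup B) + F(A \cap B) \le F(A) + F(B)$ • pseudo-boolean function: $f(x \vee y) + f(x \wedge y) \le f(x) + f(y)$ • $f$ is also a polynomial, of degree up to $n$: $f(x) = \sum_{S \subseteq V} a_S \prod_{i \in S} x_i$ (alternative representation)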
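MAP inference • $P(x) \propto \exp(-E(x))$, where $E$ is the “energy function” • maximizing $P(x)$ is equivalent to minimizing $E(x)$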
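For the tree-structured case mentioned earlier, energy minimization can be done exactly by dynamic programming. The sketch below handles only the simplest tree, a chain of pixels with binary labels, with a unary "data fit" term per pixel and a Potts-style coherence penalty between neighbors. The numbers in `UNARY`, the penalty `SMOOTH`, and all function names are hypothetical; the brute-force search is included only to cross-check the result.

```python
from itertools import product

# Hypothetical toy problem: 4 pixels in a chain, binary labels.
UNARY = [[0.2, 1.0], [0.9, 0.3], [0.6, 0.5], [1.2, 0.1]]  # UNARY[i][label]
SMOOTH = 0.4          # penalty when neighboring labels disagree
LABELS = (0, 1)

def energy(x):
    """E(x) = data-fit terms + spatial-coherence terms."""
    data = sum(UNARY[i][xi] for i, xi in enumerate(x))
    prior = sum(SMOOTH for a, b in zip(x, x[1:]) if a != b)
    return data + prior

def map_brute_force():
    """Try all 2^n labelings (reference implementation)."""
    return min(product(LABELS, repeat=len(UNARY)), key=energy)

def map_chain_dp():
    """Min-sum dynamic programming along the chain, with backtracking."""
    cost = [UNARY[0][l] for l in LABELS]   # best energy of a prefix ending in l
    back = []                              # back[i-1][l] = best label at i-1
    for i in range(1, len(UNARY)):
        new_cost, ptr = [], []
        for l in LABELS:
            prev = min(LABELS,
                       key=lambda k: cost[k] + (SMOOTH if k != l else 0.0))
            pairwise = SMOOTH if prev != l else 0.0
            new_cost.append(cost[prev] + pairwise + UNARY[i][l])
            ptr.append(prev)
        cost = new_cost
        back.append(ptr)
    labels = [min(LABELS, key=lambda l: cost[l])]
    for ptr in reversed(back):
        labels.append(ptr[labels[-1]])
    return tuple(reversed(labels))
```

The dynamic program touches each pixel once, so its cost grows linearly with the chain length, whereas the brute-force search grows exponentially.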