VISUAL GEOMETRY GROUP. Exploiting Duality (Particularly the dual of SVM). M. Pawan Kumar. PART I : General duality theory. Basics of Mathematical Optimization The algebra The geometry Examples. PART II : Solving the SVM dual. General Decomposition Algorithm Good Working Set
VISUAL GEOMETRY GROUP. Exploiting Duality (Particularly the dual of SVM). M. Pawan Kumar
PART I : General duality theory • Basics of Mathematical Optimization • The algebra • The geometry • Examples PART II : Solving the SVM dual • General Decomposition Algorithm • Good Working Set • Implementation Details
Mathematical Optimization: $\min_x f_0(x)$ (objective function) s.t. $f_i(x) \le 0$ (inequality constraints), $h_i(x) = 0$ (equality constraints). $x$ is a feasible point if $f_i(x) \le 0$ and $h_i(x) = 0$ for all $i$. $x$ is a strictly feasible point if $f_i(x) < 0$ and $h_i(x) = 0$ for all $i$. Feasible region: the set of all feasible points.
Convex Optimization: $\min_x f_0(x)$ (objective function) s.t. $f_i(x) \le 0$ (inequality constraints), $h_i(x) = 0$ (equality constraints). The feasible region is convex and the objective function is convex. What is a convex set? What is a convex function?
Convex Set: the line segment with endpoints $x_1$ and $x_2$ is the set of points $c\,x_1 + (1 - c)\,x_2$ with $c \in [0,1]$.
Convex Set: for all line segments with endpoints $x_1$, $x_2$ in the set, all points on the line segment lie within the set.
Non-Convex Set: for some endpoints $x_1$, $x_2$ in the set, part of the line segment between them lies outside the set.
Examples of Convex Sets: the line segment between $x_1$ and $x_2$.
Examples of Convex Sets: the line through $x_1$ and $x_2$.
Examples of Convex Sets: hyperplane $a^T x - b = 0$.
Examples of Convex Sets: halfspace $a^T x - b \le 0$.
Examples of Convex Sets: second-order cone $\|x\| \le t$ [figure axes: $x_1$, $x_2$, $t$].
Operations that Preserve Convexity: intersection. Example: a polyhedron / polytope, which is an intersection of halfspaces.
Operations that Preserve Convexity: affine transformation $x \mapsto Ax + b$.
Convex Function [figure: plot of $f(x)$ with $x_1$ and $x_2$ marked on the $x$ axis]: the blue point always lies above the red point.
Convex Function: $f(c\,x_1 + (1 - c)\,x_2) \le c\,f(x_1) + (1 - c)\,f(x_2)$ for all $c \in [0,1]$. The domain of $f(\cdot)$ has to be convex.
Convex Function: if $f(c\,x_1 + (1 - c)\,x_2) \le c\,f(x_1) + (1 - c)\,f(x_2)$, then $-f(\cdot)$ is concave.
Convex Function: for once-differentiable functions, $f(y) + \nabla f(y)^T (x - y) \le f(x)$, i.e. the tangent at $(y, f(y))$ lies below the graph of $f$. For twice-differentiable functions, $\nabla^2 f(x) \succeq 0$.
Convex Function and Convex Sets: the epigraph of a convex function is a convex set.
Examples of Convex Functions: linear function $a^T x$; p-norm functions $(|x_1|^p + |x_2|^p + \dots + |x_n|^p)^{1/p}$, $p \ge 1$; quadratic functions $x^T Q x$ with $Q \succeq 0$.
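The quadratic example can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `is_psd` and `chord_gap` and the two matrices are illustrative choices, not part of the slides. It tests $Q \succeq 0$ through the smallest eigenvalue and evaluates the chord inequality $c\,f(x_1) + (1 - c)\,f(x_2) - f(c\,x_1 + (1 - c)\,x_2) \ge 0$ on random pairs.

```python
import numpy as np

def is_psd(Q, tol=1e-10):
    """Return True if the symmetric part of Q is positive semidefinite."""
    S = 0.5 * (Q + Q.T)  # x^T Q x only depends on the symmetric part of Q
    return bool(np.linalg.eigvalsh(S).min() >= -tol)

def chord_gap(f, x1, x2, c):
    """c f(x1) + (1 - c) f(x2) - f(c x1 + (1 - c) x2); non-negative if f is convex."""
    return c * f(x1) + (1 - c) * f(x2) - f(c * x1 + (1 - c) * x2)

# Illustrative matrices (assumptions): one PSD, one indefinite.
Q_convex = np.array([[2.0, 0.5], [0.5, 1.0]])
Q_saddle = np.array([[1.0, 0.0], [0.0, -1.0]])

# Chord inequality for the convex quadratic on 100 random pairs (x1, x2).
rng = np.random.default_rng(0)
pairs = rng.normal(size=(100, 2, 2))
min_gap_convex = min(
    chord_gap(lambda x: x @ Q_convex @ x, a, b, 0.3) for a, b in pairs
)

# The indefinite quadratic violates the inequality along its negative direction.
saddle_gap = chord_gap(
    lambda x: x @ Q_saddle @ x, np.array([0.0, 1.0]), np.array([0.0, -1.0]), 0.5
)
```

A random check like this can only refute convexity, never prove it; the eigenvalue test is the actual criterion for quadratics.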
Operations that Preserve Convexity: non-negative weighted sum $w_1 f_1(x) + w_2 f_2(x) + \dots$ with $w_i \ge 0$. Example: $x^T Q x + a^T x + b$ with $Q \succeq 0$.
Operations that Preserve Convexity: pointwise maximum $\max\{f_1(x), f_2(x)\}$. Likewise, the pointwise minimum of concave functions is concave.
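Convex Optimization: $\min_x f_0(x)$ (objective function) s.t. $f_i(x) \le 0$ (inequality constraints), $h_i(x) = 0$ (equality constraints). The feasible region is convex (which holds when each $f_i$ is convex and each $h_i$ is affine) and the objective function is convex.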
Lagrangian: for $\min_x f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$, define $L(x, \lambda, \nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$ with $\lambda_i \ge 0$.
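Lagrangian Dual: $L(x, \lambda, \nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$, $\lambda_i \ge 0$. The dual function is $g(\lambda, \nu) = \min_{x \in D} L(x, \lambda, \nu)$, where $D$ is the intersection of the domains of $f_0$, $f_i$ and $h_i$.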
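Lagrangian Dual: $g(\lambda, \nu) = \min_x \left( f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right)$, $\lambda_i \ge 0$. This is a pointwise minimum of functions that are affine (hence concave) in $(\lambda, \nu)$, so the dual function is concave.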
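Lagrangian Dual: $p^* = \min_x \{ f_0(x) : f_i(x) \le 0,\ h_i(x) = 0 \} \ge g(\lambda, \nu) = \min_x \left( f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right)$ for all $(\lambda, \nu)$ with $\lambda_i \ge 0$, since for any feasible $x$ we have $\lambda_i f_i(x) \le 0$ and $h_i(x) = 0$.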
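The Dual Problem: the lower bound $g(\lambda, \nu)$ could be far from $p^*$. Best lower bound? $d^* = \max_{\lambda, \nu} \min_x \left( f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right)$ with $\lambda_i \ge 0$. Easy to obtain. Duality gap: $p^* - d^* \ge 0$.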
The Geometric Interpretation: $G = \{ (u, v, t) = (f_i(x), h_i(x), f_0(x)) : x \in D \}$ [figure: the set $G$ with axes $u$ and $t$; $p^*$ marked on the $t$ axis].
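The Geometric Interpretation: $(\lambda, \nu, 1)^T (u, v, t) \ge g(\lambda, \nu)$ for all $(u, v, t) \in G$ [figure: the set $G$ with axes $u$ and $t$; $p^*$, $d^*$ and $g(\lambda)$ marked on the $t$ axis].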
The Duality Gap: $p^* = \min_x \{ f_0(x) : f_i(x) \le 0,\ h_i(x) = 0 \} \ge d^* = \max_{\lambda, \nu} \min_x \left( f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right)$ with $\lambda_i \ge 0$.
The Duality Gap: $p^* - d^*$ is the duality gap. Weak duality: $p^* - d^* \ge 0$. Strong duality: $p^* - d^* = 0$.
Strong Duality: holds if the problem is convex and there exists a strictly feasible point (Slater’s condition). Taken care of by most solvers.
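A small worked instance may help here. The sketch below uses a toy problem that is not from the slides, $\min x^2$ s.t. $1 - x \le 0$, which is convex and has the strictly feasible point $x = 2$. Its Lagrangian $x^2 + \lambda (1 - x)$ is minimized at $x = \lambda / 2$, giving $g(\lambda) = \lambda - \lambda^2 / 4$. The code (assuming NumPy) evaluates this lower bound on a grid of $\lambda \ge 0$ and compares the best bound with $p^* = 1$.

```python
import numpy as np

def g(lam):
    """Dual function of the toy problem  min x^2  s.t.  1 - x <= 0.

    L(x, lam) = x^2 + lam * (1 - x) is minimized over x at x = lam / 2,
    which gives g(lam) = lam - lam^2 / 4.
    """
    return lam - lam ** 2 / 4.0

p_star = 1.0                        # primal optimum, attained at x = 1
lams = np.linspace(0.0, 5.0, 501)   # grid over the dual variable, lam >= 0
bounds = g(lams)                    # every entry is a lower bound on p_star
d_star = bounds.max()               # best lower bound found on the grid
lam_star = lams[bounds.argmax()]    # dual variable attaining it
```

On this grid the best bound is reached at $\lambda = 2$ and matches $p^*$ up to rounding, so the duality gap is zero, as Slater's condition predicts for this problem.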
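At Strong Duality: $f_0(x^*) = g(\lambda^*, \nu^*) = \min_x \left( f_0(x) + \sum_i \lambda_i^* f_i(x) + \sum_i \nu_i^* h_i(x) \right) \le f_0(x^*) + \sum_i \lambda_i^* f_i(x^*) + \sum_i \nu_i^* h_i(x^*) \le f_0(x^*)$. The inequalities hold with equality, so $x^*$ minimizes the Lagrangian at $(\lambda^*, \nu^*)$.
At Strong Duality: $f_0(x^*) = g(\lambda^*, \nu^*) = \min_x \left( f_0(x) + \sum_i \lambda_i^* f_i(x) + \sum_i \nu_i^* h_i(x) \right) \le f_0(x^*) + \sum_i \lambda_i^* f_i(x^*) + \sum_i \nu_i^* h_i(x^*) \le f_0(x^*)$. The inequalities hold with equality, so $\lambda_i^* f_i(x^*) = 0$.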
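KKT Conditions: primal feasible: $f_i(x^*) \le 0$, $h_i(x^*) = 0$. Dual feasible: $\lambda_i^* \ge 0$. Complementary slackness: $\lambda_i^* f_i(x^*) = 0$. Vanishing gradient of the Lagrangian: $\nabla f_0(x^*) + \sum_i \lambda_i^* \nabla f_i(x^*) + \sum_i \nu_i^* \nabla h_i(x^*) = 0$. Necessary conditions for strong duality.
KKT Conditions: primal feasible: $f_i(x^*) \le 0$, $h_i(x^*) = 0$. Dual feasible: $\lambda_i^* \ge 0$. Complementary slackness: $\lambda_i^* f_i(x^*) = 0$. Vanishing gradient of the Lagrangian: $\nabla f_0(x^*) + \sum_i \lambda_i^* \nabla f_i(x^*) + \sum_i \nu_i^* \nabla h_i(x^*) = 0$. Necessary and sufficient for convex problems.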
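The four conditions can be evaluated as residuals. The sketch below does this for a toy problem that is not from the slides, $\min x^2$ s.t. $f_1(x) = 1 - x \le 0$ (no equality constraints); the function name `kkt_residuals` is a hypothetical helper. All residuals vanish at $(x^*, \lambda^*) = (1, 2)$, while the feasible point $x = 2$ with $\lambda = 0$ fails the gradient condition.

```python
def kkt_residuals(x, lam):
    """KKT residuals for the toy problem  min x^2  s.t.  f1(x) = 1 - x <= 0."""
    f1 = 1.0 - x
    return {
        "primal_feasibility": max(f1, 0.0),        # violation of f1(x) <= 0
        "dual_feasibility": max(-lam, 0.0),        # violation of lam >= 0
        "complementary_slackness": abs(lam * f1),  # |lam * f1(x)|
        "gradient": abs(2.0 * x - lam),            # |d/dx (x^2 + lam * (1 - x))|
    }

at_optimum = kkt_residuals(x=1.0, lam=2.0)  # every residual is zero
at_other = kkt_residuals(x=2.0, lam=0.0)    # feasible, but the gradient is 4
```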
Linear Program: $\min_x c^T x$ s.t. $Ax = b$, $x \ge 0$.
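As a worked example of the general recipe, the dual of this linear program can be derived as follows. This is a sketch of the standard derivation, not taken from the slides; it writes $x \ge 0$ as $-x \le 0$ with multiplier $\lambda \ge 0$ and uses $\nu$ for $Ax = b$.

```latex
\begin{align*}
L(x, \lambda, \nu) &= c^T x - \lambda^T x + \nu^T (Ax - b)
                    = -b^T \nu + (c + A^T \nu - \lambda)^T x,
                    \qquad \lambda \ge 0 \\
g(\lambda, \nu) &= \min_x L(x, \lambda, \nu) =
  \begin{cases}
    -b^T \nu & \text{if } c + A^T \nu - \lambda = 0 \\
    -\infty  & \text{otherwise}
  \end{cases} \\
d^* &= \max_{\nu} \; -b^T \nu
      \quad \text{s.t.} \quad A^T \nu + c \ge 0
\end{align*}
```

The last line eliminates $\lambda = c + A^T \nu$, so the constraint $\lambda \ge 0$ becomes $A^T \nu + c \ge 0$.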
QCQP: $\min_x \tfrac{1}{2} x^T P_0 x + q_0^T x + r_0$ s.t. $\tfrac{1}{2} x^T P_i x + q_i^T x + r_i \le 0$.
Entropy Maximization: $\min_x \sum_i x_i \log(x_i)$ s.t. $Ax \le b$, $\sum_i x_i = 1$.
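The SVM Framework: points $X = \{x_i\}$, labels $y = \{y_i\}$, $y_i \in \{-1, +1\}$. Hyperplane $w^T x + b = 0$ with margin $2/\|w\|$. $\min_{w, b, \xi} \tfrac{1}{2} w^T w + C \sum_i \xi_i$ s.t. $y_i (w^T x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$. Convex quadratic program.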
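The SVM Dual: $\min_\alpha \tfrac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$ s.t. $\alpha^T y = 0$, $0 \le \alpha \le C\mathbf{1}$, where $Q_{ij} = y_i y_j x_i^T x_j = y_i y_j k(x_i, x_j)$.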
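The step from the SVM primal to this dual follows the general recipe: form the Lagrangian, minimize over the primal variables, and maximize what remains. The sketch below uses $\alpha_i \ge 0$ for the margin constraints and $\mu_i \ge 0$ for $\xi_i \ge 0$; the symbol $\mu$ is an assumption, since the slides do not name that multiplier.

```latex
\begin{align*}
L(w, b, \xi, \alpha, \mu) &= \tfrac{1}{2} w^T w + C \sum_i \xi_i
  - \sum_i \alpha_i \left[ y_i (w^T x_i + b) - 1 + \xi_i \right]
  - \sum_i \mu_i \xi_i \\
\nabla_w L = 0 &\;\Rightarrow\; w = \sum_i \alpha_i y_i x_i \\
\partial L / \partial b = 0 &\;\Rightarrow\; \sum_i \alpha_i y_i = \alpha^T y = 0 \\
\partial L / \partial \xi_i = 0 &\;\Rightarrow\; C - \alpha_i - \mu_i = 0
  \;\Rightarrow\; 0 \le \alpha_i \le C \\
g(\alpha) &= \sum_i \alpha_i
  - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j
  = \alpha^T \mathbf{1} - \tfrac{1}{2} \alpha^T Q \alpha
\end{align*}
```

Maximizing $g(\alpha)$ under these constraints is the same as minimizing $\tfrac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$.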
The SVM Dual: $\min_\alpha \tfrac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$ s.t. $\alpha^T y = 0$, $0 \le \alpha \le C\mathbf{1}$. Choose $q$ variables and fix the rest. Change the unfixed variables, satisfying the constraints, to decrease the objective function (a small problem). Repeat. Open questions: Best set $B$? Minimum $q$? Until when?
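The loop described above can be sketched for the smallest useful working set, $q = 2$. This is one possible instantiation, not necessarily the one the slides go on to develop: the working set is chosen as a maximal violating pair (an SMO-style rule), the two-variable subproblem is solved in closed form, and iteration stops when the KKT violation drops below a tolerance. The function name, the tolerance, and the four-point data set are assumptions made for illustration, and the sketch assumes both classes are present.

```python
import numpy as np

def solve_svm_dual(K, y, C=1.0, tol=1e-6, max_iter=10_000):
    """Decomposition with q = 2 for min 1/2 a^T Q a - 1^T a, y^T a = 0, 0 <= a <= C."""
    n = len(y)
    Q = (y[:, None] * y[None, :]) * K
    alpha = np.zeros(n)      # feasible starting point
    grad = -np.ones(n)       # gradient Q alpha - 1 at alpha = 0
    for _ in range(max_iter):
        # Indices that can still move along +y (up) or -y (low) inside the box.
        up = ((y > 0) & (alpha < C)) | ((y < 0) & (alpha > 0))
        low = ((y > 0) & (alpha > 0)) | ((y < 0) & (alpha < C))
        score = -y * grad
        i = np.flatnonzero(up)[np.argmax(score[up])]
        j = np.flatnonzero(low)[np.argmin(score[low])]
        violation = score[i] - score[j]
        if violation <= tol:  # KKT conditions hold up to tol
            break
        # Two-variable subproblem along d_i = y_i s, d_j = -y_j s (keeps y^T a = 0).
        eta = max(K[i, i] + K[j, j] - 2.0 * K[i, j], 1e-12)
        s = violation / eta
        s = min(s, C - alpha[i] if y[i] > 0 else alpha[i])  # keep alpha_i in [0, C]
        s = min(s, alpha[j] if y[j] > 0 else C - alpha[j])  # keep alpha_j in [0, C]
        d_i, d_j = y[i] * s, -y[j] * s
        alpha[i] += d_i
        alpha[j] += d_j
        alpha[[i, j]] = np.clip(alpha[[i, j]], 0.0, C)      # remove rounding drift
        grad += Q[:, i] * d_i + Q[:, j] * d_j                # only two columns of Q
    return alpha

# Illustrative data (assumption): four separable points, linear kernel.
X = np.array([[1.0, 0.0], [-1.0, 0.0], [2.0, 1.0], [-2.0, -1.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
alpha_opt = solve_svm_dual(X @ X.T, y, C=10.0)
w = (alpha_opt * y) @ X      # primal weights recovered from the dual solution
```

For this data the two points closest to the boundary end up with $\alpha_i = 0.5$ and the other two with $\alpha_i = 0$, which gives $w = (1, 0)$.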
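KKT Conditions: for $\min_\alpha \tfrac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$ s.t. $\alpha^T y = 0$ (multiplier $\lambda^{eq}$), $0 \le \alpha \le C\mathbf{1}$ (multipliers $\lambda_i^{lo}$ for $\alpha_i \ge 0$ and $\lambda_i^{up}$ for $\alpha_i \le C$): $-\mathbf{1} + Q\alpha + \lambda^{eq} y - \lambda^{lo} + \lambda^{up} = 0$, where $g(\alpha) = Q\alpha$; $\lambda_i^{lo} \alpha_i = 0$; $\lambda_i^{up} (\alpha_i - C) = 0$; $\lambda_i^{lo} \ge 0$; $\lambda_i^{up} \ge 0$.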
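KKT Conditions: $-\mathbf{1} + g(\alpha) + \lambda^{eq} y - \lambda^{lo} + \lambda^{up} = 0$; $\lambda_i^{lo} \alpha_i = 0$; $\lambda_i^{up} (\alpha_i - C) = 0$; $\lambda_i^{lo} \ge 0$; $\lambda_i^{up} \ge 0$. For all $0 < \alpha_i < C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i = 0$. For all $\alpha_i = 0$: $-1 + g_i(\alpha) + \lambda^{eq} y_i - \lambda_i^{lo} = 0$. For all $\alpha_i = C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i + \lambda_i^{up} = 0$.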
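These three cases suggest a direct optimality test for a candidate $\alpha$: recover $\lambda^{eq}$ from the free variables (since $y_i^2 = 1$, the first case gives $\lambda^{eq} = y_i (1 - g_i(\alpha))$), then read off $\lambda_i^{lo}$ and $\lambda_i^{up}$ from the other two cases and check that they are non-negative. The sketch below is a hypothetical helper, assuming NumPy and at least one free variable; the data and the candidate $\alpha$ are illustrative.

```python
import numpy as np

def kkt_multipliers(alpha, Q, y, C, atol=1e-8):
    """Recover (lam_eq, lam_lo, lam_up) from the case-wise KKT conditions."""
    g = Q @ alpha                                   # g(alpha) = Q alpha
    free = (alpha > atol) & (alpha < C - atol)      # 0 < alpha_i < C
    if not free.any():
        raise ValueError("this sketch needs at least one free variable")
    lam_eq = float(np.mean(y[free] * (1.0 - g[free])))
    r = -1.0 + g + lam_eq * y                       # -1 + g_i + lam_eq * y_i
    lam_lo = np.where(alpha <= atol, r, 0.0)        # alpha_i = 0: lam_lo_i = r_i
    lam_up = np.where(alpha >= C - atol, -r, 0.0)   # alpha_i = C: lam_up_i = -r_i
    free_residual = float(np.abs(r[free]).max())    # should vanish on free variables
    return lam_eq, lam_lo, lam_up, free_residual

# Illustrative data (assumption): four separable points, linear kernel.
X = np.array([[1.0, 0.0], [-1.0, 0.0], [2.0, 1.0], [-2.0, -1.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
Q = (y[:, None] * y[None, :]) * (X @ X.T)
alpha = np.array([0.5, 0.5, 0.0, 0.0])              # candidate solution
lam_eq, lam_lo, lam_up, free_residual = kkt_multipliers(alpha, Q, y, C=10.0)
is_optimal = free_residual < 1e-8 and lam_lo.min() >= 0 and lam_up.min() >= 0
```

Here the recovered multipliers are all non-negative and the free residual is zero, so the candidate satisfies the KKT conditions.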
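KKT Conditions: $-\mathbf{1} + g(\alpha) + \lambda^{eq} y - \lambda^{lo} + \lambda^{up} = 0$; $\lambda_i^{lo} \alpha_i = 0$; $\lambda_i^{up} (\alpha_i - C) = 0$; $\lambda_i^{lo} \ge 0$; $\lambda_i^{up} \ge 0$. Here $g_i(\alpha) = y_i \sum_j \alpha_j y_j k(x_i, x_j)$, which can be updated incrementally: $g_i(\alpha^t) = g_i(\alpha^{t-1}) + y_i \sum_{j \in B} (\alpha_j^t - \alpha_j^{t-1}) y_j k(x_i, x_j)$. Best set of $q$ variables (working set)?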
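The incremental formula matters because it touches only the $|B|$ kernel columns of the working set instead of the full kernel matrix. The sketch below, assuming NumPy and random illustrative data, checks that it agrees with recomputing $g(\alpha)$ from scratch. The working set `B` and the step sizes are arbitrary, and the $\alpha$ vectors are not required to be feasible, since the identity holds for any pair of vectors that differ only on $B$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
X = rng.normal(size=(n, 2))                      # illustrative points
y = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])  # illustrative labels
K = X @ X.T                                      # linear kernel k(x_i, x_j)

def g_full(alpha):
    """g_i(alpha) = y_i * sum_j alpha_j y_j k(x_i, x_j), computed from scratch."""
    return y * (K @ (alpha * y))

alpha_prev = rng.uniform(0.0, 1.0, size=n)
B = np.array([1, 4])                             # hypothetical working set, q = 2
alpha_next = alpha_prev.copy()
alpha_next[B] += np.array([0.1, -0.2])           # only variables in B change

# Incremental update: uses only the kernel columns indexed by B.
delta = (alpha_next[B] - alpha_prev[B]) * y[B]
g_incremental = g_full(alpha_prev) + y * (K[:, B] @ delta)
```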