
Understanding Vapnik-Chervonenkis Dimension: Definition and Lower Bound

Explore the Vapnik-Chervonenkis dimension in PAC learning models, its definition, lower bounds, and applications in learning theory, geometry, and more. Learn about the projection concept and practical examples.

szimmerman

Presentation Transcript


  1. Vapnik-Chervonenkis Dimension: Definition and Lower Bound • Adapted from Yishai Mansour

  2. PAC Learning model • There exists a distribution D over domain X • Examples: <x, c(x)> • use c for target function (rather than c_t) • Goal: • With high probability (1 − δ) • find h in H such that • error(h, c) < ε • ε arbitrarily small.

  3. VC: Motivation • Handle infinite classes. • VC-dim “replaces” finite class size. • Previous lecture (on PAC): • specific examples • rectangle. • interval. • Goal: develop a general methodology.

  4. The VC Dimension • C: a collection of subsets of a universe U • VC(C) = VC dimension of C: • size of the largest subset T ⊆ U shattered by C • T is shattered if every subset T’ ⊆ T is expressible as • T ∩ (an element of C) • Example: • C = {{a}, {a, c}, {a, b, c}, {b, c}, {b}} • VC(C) = 2; {b, c} is shattered by C • Plays an important role in learning theory, finite automata, comparability theory, computational geometry
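
The example on this slide is small enough to check by brute force. The following is a minimal sketch, not part of the original slides; the helper names `is_shattered` and `vc_dimension` are hypothetical, and the search is exponential, so it only suits tiny set families.

```python
from itertools import combinations


def is_shattered(T, family):
    """True if every subset of T equals T ∩ c for some set c in family."""
    T = frozenset(T)
    traces = {T & frozenset(c) for c in family}
    return len(traces) == 2 ** len(T)


def vc_dimension(universe, family):
    """Brute-force VC dimension of a finite set family (illustration only)."""
    best = 0
    for k in range(1, len(universe) + 1):
        if any(is_shattered(T, family) for T in combinations(universe, k)):
            best = k
        else:
            break  # if no k-set is shattered, no larger set is shattered
    return best


# The family from the slide.
C = [{"a"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}, {"b"}]
print(is_shattered({"b", "c"}, C))  # True
print(vc_dimension("abc", C))       # 2
```

No 3-element set can be shattered here because that would require 8 distinct intersections, and C has only 5 members.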

  5. Definitions: Projection • Given a concept c over X • associate it with a set (all positive examples) • Projection (sets) • For a concept class C and subset S • P_C(S) = {c ∩ S | c ∈ C} • Projection (vectors) • For a concept class C and S = {x_1, … , x_m} • P_C(S) = {<c(x_1), … , c(x_m)> | c ∈ C}

  6. Definition: VC-dim • Clearly |P_C(S)| ≤ 2^m • C shatters S if |P_C(S)| = 2^m (S is shattered by C) • VC dimension of a class C: • The size d of the largest set S that is shattered by C. • Can be infinite. • For a finite class C • VC-dim(C) ≤ log_2 |C|
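
To make the vector form of the projection concrete, here is a small sketch under stated assumptions: concepts are represented as label tables (Python dicts) over a toy domain {0, 1, 2}, and the class `concepts` is invented purely for illustration. It checks the shattering condition by counting distinct label vectors and confirms the finite-class bound.

```python
from math import log2


def projection(concepts, S):
    """Set of label vectors <c(x_1), ..., c(x_m)> over all concepts c."""
    return {tuple(c[x] for x in S) for c in concepts}


def shatters(concepts, S):
    """C shatters S exactly when the projection has 2^m distinct vectors."""
    return len(projection(concepts, S)) == 2 ** len(S)


# Hypothetical class of 4 concepts over the domain {0, 1, 2}.
concepts = [
    {0: 0, 1: 0, 2: 0},
    {0: 1, 1: 0, 2: 0},
    {0: 0, 1: 1, 2: 0},
    {0: 1, 1: 1, 2: 1},
]

print(sorted(projection(concepts, (0, 1))))  # all four 2-bit vectors
print(shatters(concepts, (0, 1)))            # True
print(shatters(concepts, (0, 1, 2)))         # False
print(log2(len(concepts)))                   # 2.0, so VC-dim <= 2
```

Since a projection can never contain more vectors than there are concepts, a shattered set of size d forces 2^d ≤ |C|, which is the bound on the slide.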

  7. Example: S is shattered by C • VC dimension: a combinatorial measure of the complexity of a function class

  8. Calculating VC dimensionality • The VC dimension is at least d if there exists some sample S with |S| = d that is shattered by C. • This does not mean that all samples of size d are shattered by C (e.g., three points on a single line in 2D). • Conversely, to show that the VC dimension is at most d, one must show that no sample of size d + 1 is shattered. • Naturally, proving an upper bound is more difficult than proving a lower bound on the VC dimension.

  9. Example 1: Interval • C_1 = {c_z | z ∈ [0,1]} • c_z(x) = 1 ⇔ x ≤ z • [Figure: the interval, with labels 1 and 0]

  10. Example 2: Line • C_2 = {c_w | w = (a, b, c)} • c_w(x, y) = 1 ⇔ ax + by ≥ c

  11. Line: Hyperplane • VC dim ≥ 3

  12. VC dim < 4 • 4 points cannot be shattered
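
The two claims for lines in the plane can be illustrated with a tiny search. This is a sketch, not a proof: it assumes the labeling rule ax + by ≥ c and tries only a small hypothetical grid of (a, b, c) values, which happens to suffice for these particular points. The square's XOR labeling fails for every half-plane: it needs 0 ≥ c and a + b ≥ c together with a < c and b < c, which is contradictory.

```python
from itertools import product


def separable(points, labels, grid):
    """True if some (a, b, c) in grid labels each point 1 iff a*x + b*y >= c."""
    for a, b, c in grid:
        if all((a * x + b * y >= c) == bool(lab)
               for (x, y), lab in zip(points, labels)):
            return True
    return False


def shattered(points, grid):
    """True if every 0/1 labeling of points is realized by the grid."""
    return all(separable(points, labels, grid)
               for labels in product((0, 1), repeat=len(points)))


# Small hypothetical parameter grid; enough for the point sets used here.
GRID = list(product((-1, 0, 1), (-1, 0, 1), (-1.5, -0.5, 0.5, 1.5)))

triangle = [(0, 0), (1, 0), (0, 1)]
square = triangle + [(1, 1)]
collinear = [(0, 0), (1, 0), (2, 0)]

print(shattered(triangle, GRID))   # True: three points in general position
print(shattered(square, GRID))     # False: the XOR labeling is not realizable
print(shattered(collinear, GRID))  # False: labeling (1, 0, 1) is not realizable
```

A `False` from a finite grid is only evidence in general; for these two sets the impossibility also follows from the short algebraic argument above.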

  13. Example 3: Axis-Parallel Rectangle

  14. VC Dim of Rectangles
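
For axis-parallel rectangles the realizability test is exact: a labeling is achievable if and only if the bounding box of the positive points contains no negative point. The sketch below uses that fact; the point sets `diamond` and `five` are illustrative choices, not taken from the slides.

```python
from itertools import product


def realizable(points, labels):
    """True if some axis-parallel rectangle contains exactly the positives."""
    pos = [p for p, lab in zip(points, labels) if lab]
    if not pos:
        return True  # a rectangle away from all points labels everything 0
    x_lo, x_hi = min(x for x, _ in pos), max(x for x, _ in pos)
    y_lo, y_hi = min(y for _, y in pos), max(y for _, y in pos)
    # The tightest rectangle around the positives must exclude every negative.
    return not any(
        lab == 0 and x_lo <= x <= x_hi and y_lo <= y <= y_hi
        for (x, y), lab in zip(points, labels)
    )


def shattered(points):
    return all(realizable(points, labels)
               for labels in product((0, 1), repeat=len(points)))


diamond = [(0, 1), (0, -1), (1, 0), (-1, 0)]
five = diamond + [(0, 0)]

print(shattered(diamond))  # True: four points in a diamond are shattered
print(shattered(five))     # False: the centre cannot be excluded
```

The five-point set here is only one example; the general upper bound of 4 needs the argument that among any five points, one lies inside the bounding box of the other four.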

  15. Example 4: Finite union of intervals • Any set of points can be covered • Thus VC dim = ∞
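
The covering argument can be made explicit: for any labeling of distinct points on the line, group each maximal run of positive points into one closed interval. The sketch below assumes distinct points and closed intervals; `covering_intervals` and `predict` are hypothetical helper names.

```python
from itertools import product


def covering_intervals(points, labels):
    """Closed intervals whose union contains exactly the positive points."""
    intervals = []
    run_start = prev = None
    for x, lab in sorted(zip(points, labels)):
        if lab:
            if run_start is None:
                run_start = x  # a new run of positives begins
            prev = x
        elif run_start is not None:
            intervals.append((run_start, prev))  # a negative closes the run
            run_start = None
    if run_start is not None:
        intervals.append((run_start, prev))
    return intervals


def predict(intervals, x):
    return int(any(lo <= x <= hi for lo, hi in intervals))


pts = [0.1, 0.4, 0.5, 0.7, 0.9]
ok = all(
    tuple(predict(covering_intervals(pts, labels), x) for x in pts) == labels
    for labels in product((0, 1), repeat=len(pts))
)
print(ok)  # True: every labeling of these points is realized
print(covering_intervals([1, 2, 3, 4, 5], [1, 0, 1, 0, 1]))
```

The alternating labeling needs one interval per positive point, which is why the number of intervals must be allowed to grow with the sample for the class to shatter arbitrarily large sets.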

  16. Example 5: Parity • n Boolean input variables • T ⊆ {1, …, n} • f_T(x) = ⊕_{i∈T} x_i • Lower bound: n unit vectors • Upper bound • Number of concepts • Linear dependency
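
The lower bound via unit vectors is easy to verify numerically: on the unit vector e_i, the parity f_T outputs 1 exactly when i ∈ T, so distinct subsets T give distinct labelings. A minimal sketch for n = 4 (indices are 0-based in the code, and n = 4 is an arbitrary choice):

```python
from itertools import combinations


def parity(T, x):
    """f_T(x): XOR of the coordinates of x indexed by T."""
    return sum(x[i] for i in T) % 2


n = 4
unit_vectors = [tuple(int(i == j) for j in range(n)) for i in range(n)]
subsets = [T for k in range(n + 1) for T in combinations(range(n), k)]

# Project the whole parity class onto the n unit vectors.
labelings = {tuple(parity(T, e) for e in unit_vectors) for T in subsets}

print(len(subsets))               # 16 concepts, so VC-dim <= log2(16) = 4
print(len(labelings) == 2 ** n)   # True: the unit vectors are shattered
```

Together with the counting bound (there are only 2^n parity functions), this gives VC dimension exactly n.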

  17. Example 6: OR • n Boolean input variables • P and N subsets of {1, …, n} • f_{P,N}(x) = (∨_{i∈P} x_i) ∨ (∨_{i∈N} ¬x_i) • Lower bound: n unit vectors • Upper bound • Trivial 2n • Use ELIM (get n+1) • Show second vector removes 2 (get n)

  18. Example 7: Convex polygons

  19. Example 7: Convex polygons

  20. Example 8: Hyper-plane • C_8 = {c_{w,c} | w ∈ R^d} • c_{w,c}(x) = 1 ⇔ <w, x> ≥ c • VC-dim(C_8) = d+1 • Lower bound • unit vectors and zero vector • Upper bound!
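
The lower bound can be spelled out constructively. The sketch below assumes the rule <w, x> ≥ c and uses one possible choice of weights (w_i = ±1, c = ±0.5); the slides do not give the construction, so treat `halfspace_for` as a hypothetical illustration that the zero vector plus the d unit vectors are shattered.

```python
from itertools import product


def halfspace_for(labels):
    """Return (w, c) realizing labels; labels[0] is for the zero vector,
    labels[i] for the unit vector e_i."""
    c = -0.5 if labels[0] else 0.5           # <w, 0> = 0 >= c iff c < 0
    w = [1.0 if lab else -1.0 for lab in labels[1:]]  # <w, e_i> = w_i
    return w, c


def classify(w, c, x):
    return int(sum(wi * xi for wi, xi in zip(w, x)) >= c)


d = 3
points = [tuple([0] * d)] + [
    tuple(int(i == j) for j in range(d)) for i in range(d)
]

ok = all(
    tuple(classify(*halfspace_for(labels), x) for x in points) == labels
    for labels in product((0, 1), repeat=d + 1)
)
print(ok)  # True: all 2^(d+1) labelings of the d+1 points are realized
```

This only establishes VC-dim ≥ d+1; the matching upper bound is the part that needs Radon's theorem.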

  21. Complexity Questions • Given C, compute VC(C) • since VC(C) ≤ log |C|, can compute in O(n^(log n)) time (Linial-Mansour-Rivest 88) • probably can’t do better: problem is LOGNP-complete (Papadimitriou-Yannakakis 96) • Often C has a small implicit representation: • C(i, x) is a polynomial-size circuit such that • C(i, x) = 1 iff x belongs to set i • implicit version is Σ_3^p-complete (Schaefer 99) • (as hard as ∃a ∀b ∃c φ(a, b, c) for CNF formula φ)

  22. Sampling Lemma • Lemma: Let W ⊆ X satisfy |W| ≥ ε|X|. A set of O((1/ε) ln(1/δ)) points sampled independently and uniformly at random from X intersects W with probability at least 1 − δ. • Proof: Any sample x is in W with probability at least ε. Thus, the probability that all m samples miss W is at most (1 − ε)^m ≤ e^(−εm) ≤ δ once m ≥ (1/ε) ln(1/δ).
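
A quick numeric check of the lemma, as a sketch: it takes m = ⌈(1/ε) ln(1/δ)⌉ (constant 1 inside the O(·), which is what the standard bound (1 − ε)^m ≤ e^(−εm) gives) and confirms that the miss probability falls below δ. The (ε, δ) pairs are arbitrary illustrative values.

```python
from math import ceil, log


def sample_size(eps, delta):
    """Smallest integer m with m >= (1/eps) * ln(1/delta)."""
    return ceil(log(1 / delta) / eps)


def miss_probability(eps, m):
    """Upper bound on P(all m uniform samples miss W) when |W| >= eps*|X|."""
    return (1 - eps) ** m


for eps, delta in [(0.1, 0.05), (0.01, 0.01), (0.25, 0.001)]:
    m = sample_size(eps, delta)
    print(eps, delta, m, miss_probability(eps, m) <= delta)
```

For ε = 0.1 and δ = 0.05 this gives m = 30 samples.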

  23. ε-Net Theorem • Theorem: Let the VC dimension of (X, C) be d ≥ 2 and 0 < ε ≤ ½. Then there exists an ε-net for (X, C) of size at most O((d/ε) ln(1/ε)). If we choose O((d/ε) ln(d/ε) + (1/ε) ln(1/δ)) points at random from X, then the resulting set N is an ε-net with probability at least 1 − δ. • Exercise 3, Submission next week • A polynomial bound on the sample size for PAC learning

  24. Radon Theorem • Definitions: • Convex set. • Convex hull: conv(S) • Theorem: • Let T be a set of d+2 points in R^d • There exists a subset S of T such that • conv(S) ∩ conv(T \ S) ≠ ∅ • Proof!

  25. Hyper-plane: Finishing the proof • Assume d+2 points T can be shattered. • Use Radon Theorem to find S such that • conv(S) ∩ conv(T \ S) ≠ ∅ • Assign points in S label 1 • points not in S label 0 • There is a separating hyper-plane • How will it label a point of conv(S) ∩ conv(T \ S)?
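
A Radon partition can be computed from an affine dependency: find a nonzero vector a with Σ a_i p_i = 0 and Σ a_i = 0, then split the points by the sign of a_i. The sketch below is one standard way to do this, using exact rational Gauss-Jordan elimination; the function names are hypothetical and the slides leave the proof as an exercise.

```python
from fractions import Fraction


def null_vector(rows):
    """One nonzero a with rows @ a = 0; needs more columns than rows."""
    A = [[Fraction(v) for v in r] for r in rows]
    n_cols, pivots, r = len(A[0]), [], 0
    for col in range(n_cols):
        piv = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        head = A[r][col]
        A[r] = [v / head for v in A[r]]
        for i in range(len(A)):
            factor = A[i][col]
            if i != r and factor != 0:
                A[i] = [vi - factor * vr for vi, vr in zip(A[i], A[r])]
        pivots.append(col)
        r += 1
        if r == len(A):
            break
    free = next(c for c in range(n_cols) if c not in pivots)
    a = [Fraction(0)] * n_cols
    a[free] = Fraction(1)
    for row, col in zip(A, pivots):
        a[col] = -row[free]  # back-substitute with the free variable set to 1
    return a


def radon_partition(points):
    """Split d+2 points in R^d into S, T\\S and a point in both hulls."""
    d = len(points[0])
    rows = [[p[k] for p in points] for k in range(d)] + [[1] * len(points)]
    a = null_vector(rows)
    S = [p for p, ai in zip(points, a) if ai > 0]
    rest = [p for p, ai in zip(points, a) if ai <= 0]
    total = sum(ai for ai in a if ai > 0)
    common = tuple(
        sum(ai * p[k] for p, ai in zip(points, a) if ai > 0) / total
        for k in range(d)
    )
    return S, rest, common


S, rest, common = radon_partition([(0, 0), (1, 0), (0, 1), (1, 1)])
print(S, rest, common)
```

For the unit square the partition is the two diagonals, which meet at the centre (1/2, 1/2). Any half-space containing S also contains that point, and so cannot exclude all of T \ S, which is the contradiction the slide is pointing at.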

  26. Lower bounds: Setting • Static learning algorithm: • asks for a sample S of size m(ε, δ) • Based on S selects a hypothesis

  27. Lower bounds: Setting • Theorem: • if VC-dim(C) = ∞ then C is not learnable. • Proof: • Let m = m(0.1, 0.1) • Find 2m points which are shattered (set T) • Let D be the uniform distribution on T • Set c_t(x_i) = 1 with probability ½. • Expected error ¼. • Finish proof!
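
The "expected error ¼" step can be checked numerically. This sketch fills in one way to see it, under the assumption that the m sample points are drawn independently from the uniform distribution on the 2m shattered points: a fixed point is never drawn with probability (1 − 1/(2m))^m ≥ ½, and on an unseen point the random label makes any prediction wrong with probability ½.

```python
def error_lower_bound(m):
    """Lower bound on the expected error of any learner that sees m i.i.d.
    uniform draws from 2m shattered points with fair-coin labels."""
    unseen = (1 - 1 / (2 * m)) ** m  # P(a fixed point is never sampled)
    return 0.5 * unseen              # wrong with prob. 1/2 on unseen points


for m in (1, 2, 10, 100, 1000):
    print(m, round(error_lower_bound(m), 4), error_lower_bound(m) >= 0.25)
```

Since the bound stays at or above ¼ for every m, no sample size m(0.1, 0.1) can guarantee error below 0.1 with probability 0.9, which is what the slide leaves to finish.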

  28. Lower Bound: Feasible • Theorem • If VC-dim(C) = d+1, then m(ε, δ) = Ω(d/ε) • Proof: • Let T be a set of d+1 points which is shattered. • D samples: • z_0 with prob. 1 − 8ε • z_i with prob. 8ε/d

  29. Continue • Set c_t(z_0) = 1 and c_t(z_i) = 1 with probability ½ • Expected error 2ε • Bound confidence • for accuracy ε

  30. Lower Bound: Non-Feasible • Theorem • For two hypotheses, m(ε, δ) = Ω((log 1/δ)/ε²) • Proof: • Let H = {h_0, h_1}, where h_b(x) = b • Two distributions: • D_0: Prob. <x,1> is ½ − γ and <y,0> is ½ + γ • D_1: Prob. <x,1> is ½ + γ and <y,0> is ½ − γ
