
Support Vector Machine (SVM)


Presentation Transcript


  1. Support Vector Machine (SVM) Based on Nello Cristianini's presentation: http://www.support-vector.net/tutorial.html

  2. Basic Idea • Use a Linear Learning Machine (LLM). • Overcome the linearity constraint: • Map the data non-linearly to a higher dimension. • Select between hyperplanes • Use the margin as a test • Generalization depends on the margin.

  3. General idea • (Figure: the Original Problem mapped into a Transformed Problem)

  4. Kernel Based Algorithms • Two separate functions: • Learning Algorithm: works in an embedded space • Kernel function: performs the embedding

  5. Basic Example: Kernel Perceptron • Hyperplane classification • f(x) = <w,x> + b = <w',x'>, with w' = (w,b) and x' = (x,1) • h(x) = sign(f(x)) • Perceptron Algorithm: • Sample: (xi, ti), ti ∈ {-1,+1} • IF ti <wk, xi> < 0 THEN /* error */ • wk+1 = wk + ti xi • k = k+1
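
A minimal sketch of the primal perceptron update from slide 5, in Python/NumPy (the function names and the fixed number of passes are my choices; labels are assumed to be in {-1,+1}, and the bias b is folded into w via a constant feature, matching f(x) = <w,x>+b = <w',x'>):

```python
import numpy as np

def perceptron(X, t, epochs=10):
    """Primal perceptron: X is (n_samples, n_features), labels t are in {-1, +1}."""
    # Fold the bias b into w by appending a constant 1 feature to every sample.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):                      # fixed number of passes (sketch only)
        for xi, ti in zip(Xb, t):
            if ti * np.dot(w, xi) <= 0:          # mistake (the slide uses strict <)
                w = w + ti * xi                  # w_{k+1} = w_k + t_i x_i
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(Xb @ w)
```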

  6. Recall • Margin of a hyperplane w • Mistake bound
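
The two items on this slide were formulas on the original; a standard statement of both, assuming the sample is linearly separable and ||xi|| ≤ R for all i (Novikoff's perceptron convergence theorem):

$$
\gamma \;=\; \min_i \frac{t_i\,\langle w, x_i\rangle}{\lVert w\rVert},
\qquad
\#\text{mistakes} \;\le\; \left(\frac{R}{\gamma}\right)^{2}.
$$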

  7. Observations • The solution is a linear combination of inputs • w = Σi ai ti xi • where ai > 0 • Mistake driven • Only points on which we make a mistake have influence! • Support vectors: the points with non-zero ai

  8. Dual representation • Rewrite the basic function: • f(x) = <w,x> + b = Σi ai ti <xi, x> + b • w = Σi ai ti xi • Change the update rule: • IF tj (Σi ai ti <xi, xj> + b) < 0 • THEN aj = aj + 1 • Observation: • The data appear only inside inner products!
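
A sketch of the dual (kernel) perceptron of slide 8, keeping one coefficient a_i per training point (the kernel argument k and the bias update are my assumptions; the slide only shows the a_j update):

```python
import numpy as np

def kernel_perceptron(X, t, k=np.dot, epochs=10):
    """Dual perceptron: f(x) = sum_i a_i t_i k(x_i, x) + b."""
    t = np.asarray(t, dtype=float)
    n = len(X)
    a = np.zeros(n)
    b = 0.0
    # Precompute the Gram matrix K_ij = k(x_i, x_j).
    K = np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])
    for _ in range(epochs):
        for j in range(n):
            f_j = np.sum(a * t * K[:, j]) + b
            if t[j] * f_j <= 0:          # mistake on example j
                a[j] += 1.0              # a_j = a_j + 1 (the slide's update)
                b += t[j]                # companion bias update, not on the slide
    return a, b

def decision(a, b, X, t, x, k=np.dot):
    # Data appear only inside inner products / kernel evaluations.
    return np.sum(a * np.asarray(t) * np.array([k(xi, x) for xi in X])) + b
```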

  9. Limitation of Perceptron • Only linear separations • Only converges for linearly separable data • Only defined on vectorial data

  10. The idea of a Kernel • Embed the data in a different space • Possibly of higher dimension • Linearly separable in the new space. • (Figure: Original Problem vs. Transformed Problem)

  11. Kernel Mapping • Need only to compute inner-products. • Mapping: M(x) • Kernel: K(x,y) = < M(x) , M(y)> • Dimensionality of M(x): unimportant! • Need only to compute K(x,y) • Using it in the embedded space: • Replace <x,y> by K(x,y)

  12. Example • x = (x1, x2); z = (z1, z2); K(x,z) = (<x,z>)^2
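
The point of the example is that this kernel is an ordinary inner product in a higher-dimensional space; the expansion (standard, not spelled out in the transcript) is:

$$
K(x,z) \;=\; (\langle x, z\rangle)^{2} \;=\; (x_1 z_1 + x_2 z_2)^{2}
\;=\; x_1^{2} z_1^{2} + 2\,x_1 x_2\, z_1 z_2 + x_2^{2} z_2^{2}
\;=\; \langle M(x), M(z)\rangle,
\qquad
M(x) = \big(x_1^{2},\ \sqrt{2}\,x_1 x_2,\ x_2^{2}\big).
$$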

  13. Polynomial Kernel • (Figure: Original Problem vs. Transformed Problem under the polynomial kernel)

  14. Kernel Matrix • The matrix K with entries Kij = K(xi, xj), over all pairs of training points.

  15. Example of Basic Kernels • Polynomial • K(x,z) = (<x,z>)^d • Gaussian • K(x,z) = exp{ -||x-z||^2 / (2σ^2) }
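
A small sketch of these two kernels and of the kernel matrix of slide 14 (NumPy; the function names and the default degree d and width σ are my choices):

```python
import numpy as np

def polynomial_kernel(x, z, d=2):
    # K(x, z) = (<x, z>)^d
    return np.dot(x, z) ** d

def gaussian_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((np.asarray(x) - np.asarray(z)) ** 2) / (2 * sigma ** 2))

def kernel_matrix(X, k):
    # Gram matrix K_ij = k(x_i, x_j) over all pairs of training points.
    n = len(X)
    return np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])
```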

  16. Kernel: Closure Properties • K(x,z) = K1(x,z) + c, for c ≥ 0 • K(x,z) = c*K1(x,z), for c ≥ 0 • K(x,z) = K1(x,z) * K2(x,z) • K(x,z) = K1(x,z) + K2(x,z) • Create new kernels using basic ones!
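
These closure rules translate directly into small combinators that build new kernels from basic ones (a sketch; the combinator names are mine, and polynomial_kernel / gaussian_kernel refer to the previous sketch):

```python
def scale(k1, c):
    # c * K1(x, z), a valid kernel for c >= 0
    return lambda x, z: c * k1(x, z)

def add_const(k1, c):
    # K1(x, z) + c, a valid kernel for c >= 0
    return lambda x, z: k1(x, z) + c

def add(k1, k2):
    # K1(x, z) + K2(x, z)
    return lambda x, z: k1(x, z) + k2(x, z)

def mul(k1, k2):
    # K1(x, z) * K2(x, z)
    return lambda x, z: k1(x, z) * k2(x, z)

# Example: a polynomial kernel plus a scaled Gaussian kernel is again a kernel.
combined = add(polynomial_kernel, scale(gaussian_kernel, 0.5))
```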

  17. Support Vector Machines • Linear Learning Machines (LLM) • Use the dual representation • Work in the kernel-induced feature space • f(x) = Σi ai ti K(xi, x) + b • Which hyperplane to select?

  18. Generalization of SVM • PAC theory: • error = O(VC-dim / m) • Problem: VC-dim >> m • No preference between consistent hyperplanes

  19. Margin based bounds • H: Basic hypothesis class • conv(H): finite convex combinations of H • D: Distribution over X × {+1,-1} • S: Sample of size m drawn from D

  20. Margin based bounds • THEOREM: for every f in conv(H)
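
The bound itself did not survive the transcript. The margin-based bound slides like this one usually refer to (Schapire, Freund, Bartlett & Lee, 1998) has, roughly, the form: with probability at least 1 − δ over the draw of S, for every f ∈ conv(H) and every margin θ > 0,

$$
P_{D}\big[\,t\,f(x) \le 0\,\big]
\;\le\;
P_{S}\big[\,t\,f(x) \le \theta\,\big]
\;+\;
O\!\left(\sqrt{\frac{d\,\log^{2}(m/d)}{m\,\theta^{2}} + \frac{\log(1/\delta)}{m}}\right),
$$

where d is the VC dimension of H and the first term on the right counts sample points with margin at most θ.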

  21. Maximal Margin Classifier • Maximizes the margin • Minimizes the overfitting due to margin selection. • Increases the margin • rather than reducing the dimensionality

  22. SVM: Support Vectors

  23. Margins • Geometric margin: mini ti f(xi) / ||w|| • Functional margin: mini ti f(xi)

  24. Main trick in SVM • Insist on a functional margin of at least 1. • Support vectors have functional margin exactly 1. • Geometric margin = 1 / ||w|| • Proof.
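
The proof referred to on the slide is short; a sketch using the margin definitions of slide 23: if (w, b) is rescaled so that the functional margin is exactly 1, i.e. min_i ti(<w,xi>+b) = 1, then

$$
\text{geometric margin}
\;=\; \min_i \frac{t_i\,(\langle w, x_i\rangle + b)}{\lVert w\rVert}
\;=\; \frac{1}{\lVert w\rVert},
$$

so maximizing the geometric margin is the same as minimizing ||w||, and the points attaining the minimum (functional margin exactly 1) are the support vectors.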

  25. SVM criteria • Find a hyperplane (w,b) • that minimizes: ||w||^2 = <w,w> • Subject to: • for all i • ti (<w,xi> + b) ≥ 1

  26. Quadratic Programming • Quadratic goal function. • Linear constraints. • Unique optimum. • Polynomial-time algorithms.

  27. Dual Problem • Maximize • W(a) = Σi ai − 1/2 Σi,j ai aj ti tj K(xi, xj) • Subject to: • Σi ai ti = 0 • ai ≥ 0
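
A sketch of solving this dual with a general-purpose optimizer (SciPy's SLSQP). It only mirrors the formulation above on small toy problems; real implementations use specialized solvers such as SMO, and the function and variable names here are mine:

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual(X, t, k, C=None):
    """Maximize W(a) = sum_i a_i - 1/2 sum_ij a_i a_j t_i t_j K(x_i, x_j)
    subject to sum_i a_i t_i = 0 and a_i >= 0 (and a_i <= C for a soft margin)."""
    t = np.asarray(t, dtype=float)
    n = len(X)
    K = np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])
    P = (t[:, None] * t[None, :]) * K                 # P_ij = t_i t_j K(x_i, x_j)

    def neg_W(a):                                     # minimize -W(a)
        return 0.5 * a @ P @ a - np.sum(a)

    constraints = [{"type": "eq", "fun": lambda a: a @ t}]   # sum_i a_i t_i = 0
    bounds = [(0.0, C)] * n                                   # 0 <= a_i (<= C)
    res = minimize(neg_W, np.zeros(n), method="SLSQP",
                   bounds=bounds, constraints=constraints)
    a = res.x

    # Recover b from a support vector (a_i > 0): t_i (sum_j a_j t_j K_ij + b) = 1.
    sv = int(np.argmax(a))
    b = t[sv] - np.sum(a * t * K[:, sv])
    return a, b
```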

  28. Applications: Text • Classify a text into given categories • Sports, news, business, science, … • Feature space: • Bag of words • Huge sparse vector!

  29. Applications: Text • Practicalities: • Mw(x) = tfw log(idfw) / K • tfw = term frequency of w • idfw = inverse document frequency • idfw = # documents / # documents with w • Inner product <M(x), M(z)> • over sparse vectors • SVM: finds a hyperplane in "document space"
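
A sketch of this pipeline with scikit-learn's text utilities (TfidfVectorizer's weighting is close to, but not exactly, the tf·log(idf)/K formula above; the toy corpus and labels are invented for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus; in practice these are documents labeled sports / news / business / ...
texts  = ["the team won the match", "stocks fell sharply today",
          "the striker scored twice", "the market rallied after earnings"]
labels = ["sports", "business", "sports", "business"]

# Bag of words -> huge sparse tf-idf vectors -> linear SVM in "document space".
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["the goalkeeper saved a penalty"]))  # likely 'sports' on this toy data
```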
