
Fuzzy Support Vector Machines (FSVMs)



  1. Fuzzy Support Vector Machines (FSVMs) Weijia Wang, Huanren Zhang, Vijendra Purohit, Aditi Gupta

  2. Outline • Review of SVMs • Formalization of FSVMs • Training algorithm for FSVMs • Noisy distribution model • Determination of heuristic function • Experiment results

  3. SVM – brief review • Classification technique • Method: maps points into a high-dimensional feature space, then finds a separating hyperplane that maximizes the margin

  4. Each point $x_i$ belongs to one of the two classes, with label $y_i \in \{-1, +1\}$. Let $z = \varphi(x)$ be the feature-space vector, with mapping $\varphi$ from the input space to feature space $Z$. Set $S$ of labeled training points: $S = \{(x_1, y_1), \ldots, (x_l, y_l)\}$. Equation of the hyperplane: $w \cdot z + b = 0$. For linearly separable data, the optimization problem is: minimize $\frac{1}{2}\|w\|^2$ subject to $y_i (w \cdot z_i + b) \ge 1$, $i = 1, \ldots, l$.

  5. For non-linearly separable data (soft margin), introduce slack variables $\xi_i \ge 0$, a measure of the amount of misclassification. Optimization problem: minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i$ subject to $y_i (w \cdot z_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$. Limitation: all training points are treated equally.
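
As a concrete aside (not from the slides), here is a minimal scikit-learn sketch of the soft-margin trade-off; the toy data and the specific C values are illustrative assumptions. A larger C penalizes slack more heavily, giving a narrower margin:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs: not linearly separable, so slack is required.
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.1, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Larger C -> heavier slack penalty -> narrower margin, fewer violations.
    print(f"C={C:>5}: {clf.n_support_.sum()} support vectors, "
          f"train accuracy={clf.score(X, y):.2f}")
```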

  6. FSVM – Fuzzy SVM • Fuzzy membership $s_i$: how much point $x_i$ belongs to one class (the amount of meaningful information in the data point); $1 - s_i$: the amount of noise in the data point • With fuzzy memberships, a training point no longer has to belong exactly to one class • Some training points are more important than others: these meaningful data points must be classified correctly, even if some noisy, less important points are misclassified.

  7. Set $S$ of labeled training points: $S = \{(x_1, y_1, s_1), \ldots, (x_l, y_l, s_l)\}$, with $0 < \sigma \le s_i \le 1$. Optimization problem: minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} s_i \xi_i$ subject to $y_i (w \cdot z_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$. $C$ is the regularization constant; a large $C$ gives a narrower margin and fewer misclassifications. A small $s_i$ shrinks the penalty on $\xi_i$, so noisy points influence the solution less (see the sketch below).
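
A minimal sketch of this objective in practice, assuming scikit-learn (not the paper's implementation): SVC.fit accepts a per-sample weight that multiplies $C$ for each point, which reproduces the $C \sum_i s_i \xi_i$ penalty above.

```python
import numpy as np
from sklearn.svm import SVC

def fit_fsvm(X, y, s, C=1.0, gamma=1.0):
    """Fit an SVM whose slack penalty for point i is C * s_i.

    In scikit-learn, sample_weight multiplies the per-sample C, matching
    the FSVM objective 0.5 * ||w||^2 + C * sum_i s_i * xi_i.
    """
    clf = SVC(C=C, kernel="rbf", gamma=gamma)
    clf.fit(X, y, sample_weight=np.asarray(s, dtype=float))
    return clf
```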

  8. Lagrange function: $L = \frac{1}{2}\|w\|^2 + C \sum_i s_i \xi_i - \sum_i \alpha_i \left[ y_i (w \cdot z_i + b) - 1 + \xi_i \right] - \sum_i \beta_i \xi_i$. Taking derivatives and setting them to zero: $\partial L / \partial w = 0 \Rightarrow w = \sum_i \alpha_i y_i z_i$; $\partial L / \partial b = 0 \Rightarrow \sum_i \alpha_i y_i = 0$; $\partial L / \partial \xi_i = 0 \Rightarrow s_i C - \alpha_i - \beta_i = 0$.

  9. Dual optimization problem: maximize $\sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$ subject to $\sum_i y_i \alpha_i = 0$ and $0 \le \alpha_i \le s_i C$. Kuhn-Tucker conditions: for each inequality constraint $g(x) \le 0$ with Lagrange multiplier $\lambda$, complementary slackness requires $\lambda \, g(x) = 0$ at the optimum; here, $\alpha_i \left[ y_i (w \cdot z_i + b) - 1 + \xi_i \right] = 0$ and $(s_i C - \alpha_i) \xi_i = 0$.

  10. Points with $\alpha_i > 0$ are support vectors. Two types of support vectors: a point with $0 < \alpha_i < s_i C$ lies on the margin of the hyperplane ($\xi_i = 0$); a point with $\alpha_i = s_i C$ has $\xi_i > 0$ and is misclassified if $\xi_i > 1$. Points with the same $\alpha_i$ can be different types of support vectors in FSVM due to the factor $s_i C$. SVM has one free parameter ($C$); FSVM's number of free parameters is $C$ plus the $s_i$ (roughly the number of training points).
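
A sketch of how the two types can be separated numerically for a plain SVM (in FSVM the per-point bound would be $s_i C$ instead of $C$); the helper name and tolerance are assumptions:

```python
import numpy as np
from sklearn.svm import SVC

def split_support_vectors(clf, tol=1e-8):
    """Split a fitted binary SVC's support vectors into margin SVs
    (0 < alpha_i < C, so xi_i = 0) and bound SVs (alpha_i = C, so
    xi_i > 0; misclassified when xi_i > 1)."""
    alpha = np.abs(clf.dual_coef_).ravel()        # |y_i * alpha_i| = alpha_i
    on_margin = clf.support_[alpha < clf.C - tol]
    at_bound = clf.support_[alpha >= clf.C - tol]
    return on_margin, at_bound
```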

  11. Training algorithm for FSVMs • Objective function for optimization • Minimization of the error function • Maximization of the margin • The balance is controlled by tuning C

  12. Selection of error function • Least absolute value in SVMs • Least square value in LS-SVMs (Suykens and Vandewalle, 1999) • the QP is transformed into solving a linear system • the support values are mostly nonzero (see the sketch below)
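
For contrast, a minimal NumPy sketch of the LS-SVM classifier idea from Suykens and Vandewalle (1999): the QP is replaced by a single linear system, and the resulting support values $\alpha_i$ are typically all nonzero. The RBF kernel and the parameter names are assumptions:

```python
import numpy as np

def lssvm_train(X, y, reg_gamma=1.0, kernel_gamma=1.0):
    """Solve the LS-SVM classifier's KKT linear system (y_i in {-1, +1}):
        [ 0     y^T            ] [ b     ]   [ 0 ]
        [ y     Omega + I/reg  ] [ alpha ] = [ 1 ]
    where Omega[i, j] = y_i * y_j * K(x_i, x_j)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    sq = np.sum(X**2, axis=1)
    K = np.exp(-kernel_gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = (y[:, None] * y[None, :]) * K + np.eye(n) / reg_gamma
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    b, alpha = sol[0], sol[1:]
    # Unlike the sparse SVM dual, alpha is generally dense (all nonzero).
    return b, alpha
```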

  13. Selection of error function • maximum likelihood method • applicable when the underlying error probability density can be estimated • the optimization problem becomes, roughly, minimizing $\frac{1}{2}\|w\|^2 - C \sum_i \log p(\xi_i)$, where $p$ is the estimated error density

  14. Maximum likelihood error • limitation • the precision of the hyperplane estimate depends on the estimate of the error function • yet the error estimate is reliable only when the underlying hyperplane is already well estimated

  15. Selection of error function • Weighted least absolute value • each data point is associated with a cost or importance factor • applicable when a noise distribution model of the data is given • $p_x(x)$ is the probability that point $x$ is not noise • the optimization becomes: minimize $\frac{1}{2}\|w\|^2 + C \sum_i p_x(x_i) \, \xi_i$

  16. Weighted least absolute value • Relation with FSVMs: take $p_x(x)$ as the fuzzy membership, i.e., $p_x(x_i) = s_i$

  17. Selection of max margin term • Generalized optimal plane (GOP) • Robust linear programming (RLP)

  18. Implementation of NDM (noise distribution model) • Goal • build a probability distribution model for the data • Ingredients • a heuristic function $h(x)$: highly relevant to $p_x(x)$ • confident factor: $h_C$ • trashy factor: $h_T$

  19. Density function for data (figure)

  20. Density function for data (figure)

  21. Heuristic function • Kernel-target alignment • K-nearest neighbors • Basic idea: outliers have a higher probability of being noise

  22. Kernel-target alignment: a measurement of how likely the point $x_i$ is to be noise.

  23. Kernel-target alignment: example. The Gaussian kernel can be written as the cosine of the angle between two vectors in the feature space, since $K(x, x) = 1$ implies every mapped vector has unit norm: $K(x_i, x_j) = \langle z_i, z_j \rangle = \cos \theta(z_i, z_j)$.

  24. An outlier data point $x_i$ will have a smaller value of $f_K(x_i, y_i)$ • Use $f_K(x, y)$ as the heuristic function $h(x)$, as sketched below
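
A sketch of the kernel-target alignment heuristic, assuming one common per-point form, $f_K(x_i, y_i) = \frac{1}{l} \sum_j y_i y_j K(x_i, x_j)$ (the paper's exact definition may differ): points whose kernel neighborhood mostly carries the opposite label score low.

```python
import numpy as np

def gaussian_kernel_matrix(X, gamma=1.0):
    """Pairwise Gaussian (RBF) kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kta_scores(X, y, gamma=1.0):
    """Per-point alignment f_K(x_i, y_i); outliers get small/negative values."""
    K = gaussian_kernel_matrix(X, gamma)
    y = np.asarray(y, dtype=float)               # labels in {-1, +1}
    return y * (K @ y) / len(y)
```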

  25. K-nearest neighbors (k-NN) • For each $x_i$, the set $S_i^k$ consists of the $k$ nearest neighbors of $x_i$ • $n_i$ is the number of data points in $S_i^k$ whose class label is the same as that of $x_i$ • Heuristic function: $h(x_i) = n_i$
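
A minimal sketch of this heuristic exactly as stated above, using scikit-learn's NearestNeighbors (the implementation choice is an assumption):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_heuristic(X, y, k=10):
    """h(x_i) = n_i: how many of x_i's k nearest neighbors share its label."""
    # Ask for k + 1 neighbors because each point's nearest neighbor is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    neighbor_labels = np.asarray(y)[idx[:, 1:]]  # drop the self column
    return (neighbor_labels == np.asarray(y)[:, None]).sum(axis=1)
```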

  26. Comparison of the two heuristic functions • Kernel-target alignment: operates in the feature space; uses the information of all data points to determine the heuristic for one point • k-NN: operates in the original space; uses the information of only k data points to determine the heuristic for one point • What about combining the two?

  27. Overall procedure for FSVMs • 1. Use the SVM algorithm to get the optimal kernel parameters and the regularization parameter C • 2. Fix the kernel parameters and the regularization parameter C; determine the heuristic function h(x); then use exhaustive search to choose the confident factor $h_C$, the trashy factor $h_T$, the mapping degree $d$, and the fuzzy membership lower bound $\sigma$ (one plausible mapping is sketched below)
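
One plausible mapping from a heuristic score to a fuzzy membership, sketched under the assumption of a piecewise form with polynomial interpolation between the trashy and confident factors (the paper's exact mapping may differ):

```python
import numpy as np

def fuzzy_membership(h, h_T, h_C, sigma=0.1, d=1.0):
    """Map heuristic scores h(x) to memberships s(x) in [sigma, 1].

    Assumed piecewise mapping (requires h_C > h_T):
      h >= h_C        -> s = 1      (confident point)
      h <= h_T        -> s = sigma  (trashy point, lower bound)
      h_T < h < h_C   -> interpolate with mapping degree d
    """
    t = np.clip((np.asarray(h, dtype=float) - h_T) / (h_C - h_T), 0.0, 1.0)
    return sigma + (1.0 - sigma) * t**d
```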

  28. Experiments: data with a time property

  29. FSVM results for data with time property (figure); SVM results for data with time property (figure)

  30. Experiments: two classes with different weighting

  31. Results from FSVM (figure); results from SVM (figure)

  32. Experiments: using the class center to reduce the effect of outliers

  33. Results from FSVM (figure); results from SVM (figure)

  34. Experiments (setting fuzzy membership): kernel-target alignment. Two-step strategy: (1) fix $f^K_{UB}$ and $f^K_{LB}$ as follows: $f^K_{UB} = \max_i f_K(x_i, y_i)$ and $f^K_{LB} = \min_i f_K(x_i, y_i)$, then find $\sigma$ and $d$ using a two-dimensional search; (2) with $\sigma$ and $d$ fixed, find $f^K_{UB}$ and $f^K_{LB}$.

  35. Experiments (setting fuzzy membership): k-nearest neighbors. Perform a two-dimensional search for the parameters $\sigma$ and $k$; $k_{UB} = k/2$ and $d = 1$ are fixed. (A generic search sketch follows.)
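
A generic sketch of that two-dimensional search, reusing knn_heuristic and fuzzy_membership from the earlier sketches; the held-out scoring, h_T = 0, and treating $k_{UB} = k/2$ as the confident factor are all assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def search_sigma_k(X, y, sigmas, ks, C=1.0, gamma=1.0, seed=0):
    """Grid search over (sigma, k), scoring each membership assignment by
    validation accuracy of an SVC whose sample_weight mimics s_i * C."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    best, best_score = None, -np.inf
    for sigma in sigmas:
        for k in ks:
            h = knn_heuristic(X_tr, y_tr, k=k)             # earlier sketch
            s = fuzzy_membership(h, h_T=0.0, h_C=k / 2.0,  # k_UB = k/2
                                 sigma=sigma, d=1.0)       # d = 1 fixed
            clf = SVC(C=C, gamma=gamma).fit(X_tr, y_tr, sample_weight=s)
            score = clf.score(X_va, y_va)
            if score > best_score:
                best, best_score = (sigma, k), score
    return best, best_score
```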

  36. Experiments: comparison of test errors from KTA and k-NN against other classifiers (table)

  37. Conclusion • FSVMs work well when the average training error is high, meaning they can improve the performance of SVMs on noisy data • The number of free parameters for FSVMs is very high: $C$ plus one $s_i$ per data point • Results using KTA and k-NN are similar, but KTA is more complicated and takes more time to find optimal parameter values • This paper studies FSVMs only for two classes; multi-class scenarios are not explored
