Semi-Supervised Classification by Low Density Separation



  1. Semi-Supervised Classification by Low Density Separation Olivier Chapelle, Alexander Zien Student: Ran Chang

  2. Introduction • Goal of semi-supervised classification: use unlabeled data to improve generalization • Cluster assumption: the decision boundary should not cross high-density regions, but instead lie in low-density regions

  3. Algorithm • n labeled data points x_1, ..., x_n • m unlabeled data points x_{n+1}, ..., x_{n+m} • Labels y_1, ..., y_n ∈ {±1}

  4. Algorithm (cont): Graph-based similarities

  5. Graph-based similarities (cont) • Principle: assign low similarities to pairs of points that lie in different clusters • If two points are in the same cluster, there exists a continuous connecting curve that only goes through regions of high density • If two points are in different clusters, any such curve has to traverse a density valley • Definition of the similarity of two points: maximize, over all continuous connecting curves, the minimum density along the connection (formalized below)
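A hedged formalization of this definition (the notation is mine, not the slides': p(x) is the data density and \Gamma(x, x') the set of continuous curves from x to x'):

    s(x, x') = \max_{\gamma \in \Gamma(x, x')} \; \min_{t \in [0, 1]} p(\gamma(t))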

  6. Graph-based similarities (cont) 1. Build a nearest-neighbor graph G from all (labeled and unlabeled) data points 2. Compute the n x (n + m) matrix D of minimal ρ-path distances on G from all labeled points to all points (see the formula after step 4)

  7. Graph-based similarities (cont) 3. Perform a non-linear transformation on D to get the kernel K 4. Train an SVM with K and predict labels for the unlabeled points
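A hedged reconstruction of the ρ-path distance from step 2, following Chapelle and Zien's definition up to constants (\mathcal{P}_{ij} denotes the set of paths from x_i to x_j in G):

    D^{\rho}_{ij} = \frac{1}{\rho} \log\!\left( 1 + \min_{p \in \mathcal{P}_{ij}} \sum_{k=1}^{|p|-1} \left( e^{\rho \lVert x_{p_k} - x_{p_{k+1}} \rVert} - 1 \right) \right)

For ρ -> 0 this reduces to the ordinary shortest-path (direct Euclidean) distance, while for ρ -> infinity it is dominated by the longest edge on the best path, which matches the limiting behaviour discussed on the next slide.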

  8. Graph-based similarities (cont) • Role of ρ: the accuracy of this approximation depends on the value of the softening parameter ρ. For ρ -> 0, the direct connection is always the shortest path, so every deletion of an edge can cause the corresponding distance to increase; for ρ -> infinity, shortest paths almost never contain any long edge, so edges can safely be deleted (a code sketch follows). • For large values of ρ, the distances between points in the same cluster are decreased; in contrast, the distances between points from different clusters are still dominated by the gaps between the clusters.
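To make the construction concrete, a minimal Python sketch assuming the ρ-path formula above (the function name, the value of k, and the use of SciPy are illustrative choices, not the paper's code):

    # Hedged sketch: rho-path distances from labeled points on a kNN graph.
    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import shortest_path

    def rho_path_distances(X, n_labeled, rho=1.0, k=10):
        """Return the n_labeled x (n + m) matrix of rho-path distances."""
        d = cdist(X, X)                          # pairwise Euclidean distances
        idx = np.argsort(d, axis=1)[:, 1:k + 1]  # k nearest neighbors per point
        w = np.zeros_like(d)
        rows = np.repeat(np.arange(len(X)), k)
        w[rows, idx.ravel()] = np.exp(rho * d[rows, idx.ravel()]) - 1.0
        w = np.maximum(w, w.T)                   # symmetrize the graph
        sp = shortest_path(csr_matrix(w), method='D',
                           indices=np.arange(n_labeled))
        return np.log1p(sp) / rho                # undo the edge softening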

  9. Transductive Support Vector Machine (TSVM)
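A hedged sketch of the TSVM training problem this slide refers to, in unconstrained form (the exact formulation on the slide is assumed):

    \min_{w, b} \; \frac{1}{2} \lVert w \rVert^2
        + C \sum_{i=1}^{n} L\bigl( y_i (w \cdot x_i + b) \bigr)
        + C^{*} \sum_{i=n+1}^{n+m} L\bigl( \lvert w \cdot x_i + b \rvert \bigr)

with hinge loss L(t) = max(0, 1 - t); C weights the labeled points and C* the unlabeled ones. The unlabeled term pushes the decision boundary away from unlabeled points, i.e. into low-density regions.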

  10. Gradient TSVM • The last term (the loss on unlabeled points) makes this problem non-convex, and it is not differentiable. So we replace it by a smooth surrogate (sketched below).
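A hedged sketch of the replacement, assuming a Gaussian-shaped surrogate of the form exp(-s t^2) (the value s = 3 is my recollection of the paper's choice and should be treated as an assumption):

    L^{*}(t) = \exp(-3 t^{2}), \qquad t = w \cdot x_i + b

This surrogate penalizes unlabeled points close to the decision boundary and decays smoothly to zero far from it, making gradient descent applicable.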

  11. Gradient TSVM (cont)

  12. Gradient TSVM (cont) • Initially set C* to a small value and increase it exponentially to C (a schedule sketch follows) • The choice of setting the final value of C* to C is somewhat arbitrary. Ideally, it would be preferable to treat this value as a free parameter of the algorithm.
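A minimal annealing-loop sketch; the number of steps and the starting value C/1000 are illustrative assumptions, and train_tsvm is a hypothetical training routine, not the paper's code:

    # Hedged sketch of the exponential C* annealing schedule.
    import numpy as np

    def anneal_cstar(train_tsvm, C, n_steps=10):
        """Re-train while C* grows exponentially from a small value up to C."""
        params = None
        for c_star in np.geomspace(C / 1000.0, C, n_steps):
            # warm-start each run from the previous solution
            params = train_tsvm(C=C, C_star=c_star, init=params)
        return params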

  13. Multidimensional Scaling (MDS) • Reason: the kernel derived from the graph distances is not positive definite. • Goal: find a Euclidean embedding of the ρ-path distance matrix before applying Gradient TSVM (a sketch follows).
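A minimal classical-MDS sketch, assuming the full (n + m) x (n + m) distance matrix is available; discarding the non-positive eigenvalues is what repairs the indefiniteness:

    # Hedged sketch: classical MDS on a possibly non-Euclidean distance matrix.
    import numpy as np

    def classical_mds(D, n_components=None):
        """Embed a symmetric distance matrix D into Euclidean space."""
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
        B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
        evals, evecs = np.linalg.eigh(B)
        order = np.argsort(evals)[::-1]       # largest eigenvalues first
        evals, evecs = evals[order], evecs[:, order]
        pos = evals > 1e-10                   # keep positive directions only
        X = evecs[:, pos] * np.sqrt(evals[pos])
        return X if n_components is None else X[:, :n_components]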

  14. Parameters

  15. Low Density Separation (LDS) • LDS chains the previous components: compute the ρ-path graph distances, embed them in Euclidean space with MDS, and train Gradient TSVM on the resulting representation.

  16. Experiment • Data sets: • g50c and g10n are generated from two standard multivariate normal Gaussians. • g50c: the labels correspond to the Gaussians, and the means are located in 50-dimensional space such that the Bayes error is 5%. • Similarly, g10n is in 10 dimensions. • Coil20: gray-scale images of 20 different objects, each photographed from different angles in steps of 5 degrees. • Text: the classes mac and mswindows of the Newsgroup20 dataset, preprocessed. • Uspst: the test part of the well-known USPS dataset on handwritten digit recognition.

  17. Experiment parameters and results

  18. Appendix (Dijkstra's algorithm) • Dijkstra's algorithm is a standard algorithm for finding shortest paths (a runnable sketch follows). 1. Set i = 0, S_0 = {u_0 = s}, L(u_0) = 0, and L(v) = infinity for v ≠ u_0. If |V| = 1 then stop, otherwise go to step 2. 2. For each v in V \ S_i, replace L(v) by min{L(v), L(u_i) + d(u_i, v)}. If L(v) is replaced, put a label (L(v), u_i) on v. 3. Find a vertex v which minimizes {L(v) : v in V \ S_i}; call it u_{i+1}. 4. Let S_{i+1} = S_i ∪ {u_{i+1}}. 5. Replace i by i + 1. If i = |V| - 1 then stop, otherwise go to step 2.
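A self-contained Python sketch of the same algorithm, heap-based rather than the label-setting formulation above but computing the same L values (the adjacency-dict representation is an illustrative choice):

    # Hedged Dijkstra sketch: shortest distances from a single source.
    import heapq

    def dijkstra(graph, source):
        """graph: {u: [(v, weight), ...]}; returns distances L(v) from source."""
        dist = {u: float('inf') for u in graph}
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:              # stale queue entry, skip it
                continue
            for v, w in graph[u]:
                if d + w < dist[v]:      # relax edge (u, v)
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        return dist

    # Example: dijkstra({'a': [('b', 2.0)], 'b': [('a', 2.0)]}, 'a')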
