Learning from Labeled and Unlabeled Data using Graph Mincuts
Avrim Blum and Shuchi Chawla
May 24, 2001
Utilizing unlabeled data
• Cheap and available in large amounts
• Gives no obvious information about classification
• Gives information about the distribution of examples
• Useful with a prior
• Our prior: ‘close’ examples have similar classifications
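Classification using Graph Mincut
[Figure: labeled positive (+) and negative (−) examples and unlabeled examples in a graph, separated by a mincut]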
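Classification by graph mincut can be sketched as follows, assuming the graph is already given as undirected edge weights and assuming the usual construction: every labeled positive is joined to an auxiliary source and every labeled negative to an auxiliary sink with infinite capacity, an s-t minimum cut is computed through max flow (Edmonds-Karp here, standing in for any max-flow routine), and each unlabeled example takes the label of the side it lands on. The names `mincut_classify`, `SOURCE`, and `SINK` are hypothetical, and the positive and negative sets are assumed disjoint.

```python
from collections import deque

INF = float("inf")
SOURCE, SINK = object(), object()  # hypothetical auxiliary terminals (v+ and v-)


def _min_cut(cap, s, t):
    """Edmonds-Karp max flow on residual capacities cap[u][v] (mutated in place).
    Returns (cut value, nodes reachable from s once no augmenting path remains)."""
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # Max-flow/min-cut: the reachable set is the source side of a minimum cut
            return flow, set(parent)
        # Bottleneck residual capacity along the path
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Augment: shrink forward residuals, grow reverse residuals
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] = cap[v].get(u, 0.0) + bottleneck
            v = u
        flow += bottleneck


def mincut_classify(weights, positives, negatives):
    """weights: {(u, v): w} for undirected edges with w >= 0.
    Returns (cut value, {node: +1 or -1}) for every node in the graph."""
    cap = {SOURCE: {}, SINK: {}}
    for (u, v), w in weights.items():
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        # An undirected edge becomes two arcs, each with capacity w
        cap[u][v] = cap[u].get(v, 0.0) + w
        cap[v][u] = cap[v].get(u, 0.0) + w
    # Infinite terminal arcs: labeled examples can never be cut from their label
    for p in positives:
        cap.setdefault(p, {})
        cap[SOURCE][p] = INF
    for q in negatives:
        cap.setdefault(q, {})
        cap[q][SINK] = INF
    cut_value, source_side = _min_cut(cap, SOURCE, SINK)
    labels = {x: (1 if x in source_side else -1)
              for x in cap if x is not SOURCE and x is not SINK}
    return cut_value, labels


# Usage: a 5-node chain whose weakest link (0.1) sits between nodes 2 and 3
cut, labels = mincut_classify({(0, 1): 1.0, (1, 2): 1.0, (2, 3): 0.1, (3, 4): 1.0},
                              positives=[0], negatives=[4])
print(cut, labels)  # 0.1 {0: 1, 1: 1, 2: 1, 3: -1, 4: -1}
```

The infinite capacities on the terminal arcs are what keep every labeled example on its own side; only data edges can be cut, so the cut value is the total weight of edges joining differently labeled examples.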
Why not nearest neighbor?
[Figure: classification by 1-nearest neighbor]
[Figure: classification by graph mincut]
Self-consistent classification
• On a nearest-neighbor graph, mincut finds the labeling that minimizes the leave-one-out cross-validation error of nearest neighbor on the combined labeled and unlabeled data
• May not be the best classification
• But theoretically interesting!
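The leave-one-out claim can be checked on a small example. The sketch below assumes Euclidean distance (ties broken toward the lower index) and a nearest-neighbor graph in which every example adds weight 1 to the edge joining it to its nearest other example, so mutual nearest neighbors share an edge of weight 2. Under that assumption, for any full labeling the weight of the induced cut equals the leave-one-out error of 1-nearest neighbor, so minimizing one minimizes the other. The function names are hypothetical.

```python
import math


def nearest_neighbor(points, i):
    """Index of the closest other point to points[i] (Euclidean; ties go to the lower index)."""
    best, best_d = None, math.inf
    for j, q in enumerate(points):
        if j != i:
            d = math.dist(points[i], q)
            if d < best_d:
                best, best_d = j, d
    return best


def loo_1nn_error(points, labels):
    """Leave-one-out error of 1-NN: hold out each point and predict its neighbor's label."""
    return sum(labels[nearest_neighbor(points, i)] != labels[i] for i in range(len(points)))


def nn_graph_cut(points, labels):
    """Weight of the cut induced by `labels` on the nearest-neighbor graph,
    where each point adds weight 1 to the edge joining it to its nearest neighbor."""
    weights = {}
    for i in range(len(points)):
        j = nearest_neighbor(points, i)
        edge = (min(i, j), max(i, j))
        weights[edge] = weights.get(edge, 0) + 1
    return sum(w for (u, v), w in weights.items() if labels[u] != labels[v])


points = [(0, 0), (1, 0), (3, 0), (6, 0), (10, 0)]
labels = [1, 1, -1, -1, -1]
print(loo_1nn_error(points, labels), nn_graph_cut(points, labels))  # 1 1
```

Both functions count the same thing from different directions: a held-out point is misclassified exactly when the arc to its nearest neighbor crosses the cut.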
Assigning edge weights
• Several approaches:
  • Decreasing function of distance, e.g. exponential decrease with an appropriate slope
  • Unit weights, but connect only ‘nearby’ nodes. How near is ‘near’?
  • Connect every node to its k nearest neighbors. What is a good value of k?
• Need an appropriate distance metric
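The three weighting schemes can be sketched as edge dictionaries over a list of points. Euclidean distance stands in for the ‘appropriate distance metric’, and `slope`, `delta`, and `k` are the free parameters the slide asks about. The function names, the strict `< delta` threshold, and the rule that keeps a k-nearest-neighbor edge when either endpoint selects it are assumptions for illustration.

```python
import math


def exp_weights(points, slope):
    """Fully connected graph; weight decays exponentially with distance."""
    n = len(points)
    return {(i, j): math.exp(-slope * math.dist(points[i], points[j]))
            for i in range(n) for j in range(i + 1, n)}


def delta_graph(points, delta):
    """Unit weights, connecting only pairs at distance strictly less than delta."""
    n = len(points)
    return {(i, j): 1.0 for i in range(n) for j in range(i + 1, n)
            if math.dist(points[i], points[j]) < delta}


def knn_graph(points, k):
    """Unit weights; each node is joined to its k nearest neighbors.
    An undirected edge is kept if either endpoint selects the other."""
    n = len(points)
    edges = {}
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            edges[(min(i, j), max(i, j))] = 1.0
    return edges


points = [(0, 0), (1, 0), (5, 0)]
print(delta_graph(points, 2))  # {(0, 1): 1.0}
print(knn_graph(points, 1))    # {(0, 1): 1.0, (1, 2): 1.0}
```

The example shows why the choice matters: with δ = 2 the far point is isolated, while with k = 1 it is always attached to something.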
How near is ‘near’?
• All pairs within distance δ are connected
• Need a method of finding a ‘good’ δ
• As δ increases, edges are only added, so the cut value can only increase
• Cut value = 0 ⇒ supposedly a no-error situation (Mincut-δ0)
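Because the δ-graph only gains edges as δ grows, δ0 can be found with one pass over the sorted pairwise distances and a union-find structure. This sketch assumes Euclidean distance, unit-weight edges for pairs at distance strictly less than δ, and reads Mincut-δ0 as the largest δ whose mincut is still 0, i.e. the first distance at which some component would contain both a positive and a negative labeled example. `find_delta0` is a hypothetical name.

```python
import math


def find_delta0(points, labels):
    """labels: {index: +1 or -1} for the labeled points only.
    Returns the largest delta whose graph (edges for distance < delta) has mincut 0."""
    n = len(points)
    parent = list(range(n))
    # Labels present in each component, stored at the component's root
    seen = [{labels[i]} if i in labels else set() for i in range(n)]

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    for d, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        merged = seen[ri] | seen[rj]
        if len(merged) == 2:
            # Adding this edge would join a positive and a negative example
            return d
        parent[ri] = rj
        seen[rj] = merged
    return math.inf  # positives and negatives never meet


points = [(0,), (1,), (2,), (10,), (11,)]
print(find_delta0(points, {0: 1, 4: -1}))  # 8.0
```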
Mincut-δ0 does not allow for noise in the dataset
• Allow longer-distance dependencies
• Grow δ until the graph becomes sufficiently well connected
• Growing δ until the largest component contains half the nodes seems to work well (Mincut-δ½)
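The δ for Mincut-δ½ can be located the same way: merge components in order of increasing pairwise distance and stop once the largest component holds at least half of the nodes. A sketch, assuming Euclidean distance and that the returned distance d means every pair at distance at most d is connected; `find_delta_half` is a hypothetical name.

```python
import math


def find_delta_half(points):
    """Smallest pairwise distance d such that connecting every pair at distance <= d
    yields a connected component containing at least half of the nodes."""
    n = len(points)
    target = n / 2
    if 1 >= target:
        return 0.0  # a single node is already half of a graph with n <= 2
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    for d, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            size[rj] += size[ri]
            if size[rj] >= target:
                return d
    return math.inf  # not reached for n >= 3: all nodes eventually merge


print(find_delta_half([(0,), (1,), (3,), (10,), (11,), (30,)]))  # 2.0
```

Unlike the δ0 search, this stopping rule ignores the labels, so a few noisy labeled examples cannot force δ to stay small.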
Other ‘hacks’
• Weigh edges to labeled and unlabeled examples differently
• Weigh different attributes differently, e.g. using information gain as in decision trees
• Weigh edges to positive and negative examples differently, for a more balanced cut
• Use the mincut value as an indicator of performance
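Weighting attributes by information gain can be sketched as follows, assuming discrete attribute values, gains computed from the labeled examples only, and gains used directly as per-attribute scale factors in a weighted Euclidean distance. The scaling rule and the function names are hypothetical, not a recipe taken from the slides.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Drop in label entropy from splitting on one discrete attribute (decision-tree style)."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def attribute_weights(X, y):
    """Information gain of each attribute, computed on the labeled examples only."""
    return [information_gain([row[k] for row in X], y) for k in range(len(X[0]))]


def gain_weighted_distance(a, b, weights):
    """Euclidean distance with each squared attribute difference scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [1, 1, -1, -1]
w = attribute_weights(X, y)  # attribute 0 predicts y perfectly, attribute 1 not at all
print(w)
```

With these weights, examples that differ only in the uninformative attribute end up at distance 0, so edges built from this distance follow the attribute that actually separates the classes.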