An overview of leveraging the distribution of unlabeled examples for self-consistent classification in a graph mincut framework, covering different edge-weighting strategies and approaches to choosing the cut for improved classification performance.
Learning from Labeled and Unlabeled Data using Graph Mincuts
Avrim Blum and Shuchi Chawla
May 24, 2001
Utilizing unlabeled data
• Cheap and available in large amounts
• Gives no obvious information about classification
• Gives information about the distribution of examples
• Useful with a prior
• Our prior: 'close' examples have a similar classification
Classification using Graph Mincut
[Figure: an example graph with '+' and '-' labeled nodes; the mincut separates the positive and negative sides]
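To make the construction concrete, here is a minimal Python sketch of mincut classification, assuming a networkx-style undirected graph whose edge weights are stored in a 'capacity' attribute. The function name, the node names '+' and '-', and the use of networkx are illustrative assumptions, not the authors' original implementation.

import networkx as nx

def mincut_classify(G, pos_labeled, neg_labeled):
    # Copy the graph and attach artificial source (+) and sink (-) nodes.
    H = G.copy()
    H.add_node('+')
    H.add_node('-')
    # Tie each labeled example to its source/sink with infinite capacity so
    # the cut can never separate an example from its known label.
    for v in pos_labeled:
        H.add_edge('+', v, capacity=float('inf'))
    for v in neg_labeled:
        H.add_edge('-', v, capacity=float('inf'))
    # The minimum (+, -) cut partitions every node onto one of the two sides.
    cut_value, (pos_side, neg_side) = nx.minimum_cut(H, '+', '-')
    return {v: (+1 if v in pos_side else -1) for v in G.nodes}

Every unlabeled example then receives whichever label sits on its side of the cut.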
Why not nearest neighbor?
[Figures: the same dataset classified by 1-nearest neighbor versus by graph mincut]
Self-consistent classification
• Mincut minimizes the leave-one-out cross-validation error of nearest neighbor (the objective is sketched below)
• May not be the best classification
• But it is theoretically interesting!
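To make the self-consistency objective concrete: over all labelings consistent with the labeled examples, the mincut minimizes the total weight of edges whose endpoints disagree. A hedged sketch of that quantity in Python (the function name and the 'capacity' attribute are illustrative assumptions):

def disagreement_weight(G, labels):
    # Total weight of edges joining oppositely labeled endpoints; this is the
    # quantity the mincut minimizes over labelings of the unlabeled nodes.
    return sum(d.get('capacity', 1.0)
               for u, v, d in G.edges(data=True)
               if labels[u] != labels[v])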
Assigning edge weights
Several approaches (sketched below):
• A decreasing function of distance, e.g. an exponential decrease with an appropriate slope
• Unit weights, but connect only 'nearby' nodes: how near is 'near'?
• Connect every node to its k nearest nodes: what is a good value of k?
• Need an appropriate distance metric
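The three weighting schemes above might be sketched as follows, using scipy's pairwise distances; the function names, parameter defaults, and Euclidean metric are assumptions for illustration, not the authors' settings.

import numpy as np
from scipy.spatial.distance import cdist

def exponential_weights(X, alpha=1.0):
    # w_ij = exp(-alpha * d(x_i, x_j)); alpha controls the slope of the decrease.
    D = cdist(X, X)
    return np.exp(-alpha * D)

def delta_graph(X, delta):
    # Unit-weight edges only between pairs at distance <= delta.
    D = cdist(X, X)
    W = (D <= delta).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def knn_graph(X, k=3):
    # Unit-weight edges from each node to its k nearest nodes, then symmetrized.
    D = cdist(X, X)
    W = np.zeros_like(D)
    for i in range(len(D)):
        nearest = np.argsort(D[i])[1:k + 1]   # index 0 is the node itself
        W[i, nearest] = 1.0
    return np.maximum(W, W.T)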
How near is 'near'?
• All pairs within distance δ are connected
• Need a method of finding a 'good' δ
• As δ increases, the cut value increases
• Cut value = 0 corresponds to a supposedly no-error situation (Mincut-δ0)
Mincut-δ0 does not allow for noise in the dataset
• Allow longer-distance dependencies
• Grow δ until the graph becomes sufficiently well connected
• Growing δ until the largest component contains half the nodes seems to work well (Mincut-δ½); see the sketch below
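A hedged sketch of the Mincut-δ½ heuristic: grow δ over the sorted pairwise distances until the largest connected component of the resulting graph covers at least half the nodes. Searching over every pairwise distance is an assumption made here for simplicity, not necessarily the authors' exact procedure.

import numpy as np
import networkx as nx
from scipy.spatial.distance import cdist

def delta_half(X):
    # Return the smallest delta whose graph has a component with >= n/2 nodes.
    D = cdist(X, X)
    n = len(X)
    for delta in np.unique(D[np.triu_indices(n, k=1)]):
        A = (D <= delta).astype(float)
        np.fill_diagonal(A, 0.0)
        G = nx.from_numpy_array(A)
        if len(max(nx.connected_components(G), key=len)) >= n / 2:
            return delta
    return D.max()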
Other 'hacks'
• Weigh edges to labeled and unlabeled examples differently
• Weigh different attributes differently, e.g. using information gain as in decision trees
• Weigh edges to positive and negative examples differently, for a more balanced cut (see the sketch below)
• Use the mincut value as an indicator of performance
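As one illustration of the third hack, edges incident to positively and negatively labeled examples could be scaled by different factors to nudge the mincut toward a more balanced split. The function below is a hypothetical sketch, not the authors' method, and the factor values are arbitrary.

def reweight_labeled_edges(G, labels, pos_factor=2.0, neg_factor=1.0):
    # labels maps labeled nodes to +1 or -1; unlabeled nodes are absent.
    # Edges touching a positively (negatively) labeled example are scaled
    # by pos_factor (neg_factor).
    H = G.copy()
    for u, v, d in H.edges(data=True):
        w = d.get('capacity', 1.0)
        if labels.get(u) == 1 or labels.get(v) == 1:
            w *= pos_factor
        elif labels.get(u) == -1 or labels.get(v) == -1:
            w *= neg_factor
        d['capacity'] = w
    return H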