
EM Algorithm: Expectation Maximization Clustering Algorithm. Book: "Data Mining", Witten & Frank, Morgan Kaufmann






Presentation Transcript


  1. EM Algorithm: Expectation Maximization Clustering Algorithm. Book: "Data Mining", Witten & Frank, Morgan Kaufmann, pp. 218-227. Mining Lab. 김완섭, October 27, 2004

  2. Content • Clustering • K-Means via EM • Mixture Model • EM Algorithm • Simple examples of EM • EM Application; WEKA • References

  3. Clustering (1/2) • Clustering? • Clustering algorithms divide a data set into natural groups (clusters). • Instances in the same cluster are similar to each other; they share certain properties. • e.g. customer segmentation. • Clustering vs. Classification • Classification: supervised learning. • Clustering: unsupervised learning. • There is no target variable to be predicted.

  4. Clustering (2/2) • Categorization of Clustering Methods • Partitioning methods • K-Means / K-medoids / PAM / CLARA / CLARANS • Hierarchical methods • CURE / CHAMELEON / BIRCH • Density-based methods • DBSCAN / OPTICS • Grid-based methods • STING / CLIQUE / Wave-Cluster • Model-based methods • EM / COBWEB / Bayesian / Neural (also called model-based, probability-based, or statistical clustering)

  5. K-Means (1) Algorithm • Step 0: • Select K objects as initial centroids. • Step 1 (Assignment): • For each object, compute its distance to the k centroids. • Assign each object to the cluster whose centroid is closest. • Step 2 (New Centroids): • Compute a new centroid for each cluster. • Step 3 (Convergence): • Stop if the change in the centroids is less than the selected convergence criterion. • Otherwise repeat from Step 1.
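
A minimal sketch of Steps 0-3 in Java (class and variable names are hypothetical; it uses squared Euclidean distance, k = 2, and omits the empty-cluster edge case):

    import java.util.Arrays;

    public class SimpleKMeans {
        static double dist2(double[] a, double[] b) {       // squared Euclidean distance
            return (a[0]-b[0])*(a[0]-b[0]) + (a[1]-b[1])*(a[1]-b[1]);
        }

        public static void main(String[] args) {
            double[][] pts = {{4,4},{3,4},{4,2},{0,2},{1,1},{1,0}}; // data from the slide-8 example
            double[][] c = {{4,4},{3,4}};                           // Step 0: initial centroids
            int[] assign = new int[pts.length];
            while (true) {
                // Step 1: assign each object to its closest centroid
                for (int i = 0; i < pts.length; i++)
                    assign[i] = dist2(pts[i], c[0]) <= dist2(pts[i], c[1]) ? 0 : 1;
                // Step 2: recompute each centroid as the mean of its cluster
                double[][] nc = new double[2][2];
                int[] cnt = new int[2];
                for (int i = 0; i < pts.length; i++) {
                    nc[assign[i]][0] += pts[i][0];
                    nc[assign[i]][1] += pts[i][1];
                    cnt[assign[i]]++;
                }
                for (int k = 0; k < 2; k++) { nc[k][0] /= cnt[k]; nc[k][1] /= cnt[k]; }
                // Step 3: stop once the centroids no longer change
                if (Arrays.deepEquals(nc, c)) break;
                c = nc;
            }
            System.out.println(Arrays.deepToString(c) + " " + Arrays.toString(assign));
        }
    }

With this data the loop converges to the centroids <3.67, 3.33> and <0.67, 1>, matching the hand calculation on slide 8.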

  6. K-Means (2) simple example • [Figure: input data → random centroids → assignment → new centroids (check) → assignment → new centroids (check) → assignment, repeated until the centroids stop changing]

  7. K-Means (3) weakness with outliers (noise)

  8. K-Means (4) Calculation • Data: (4,4), (3,4), (4,2), (0,2), (1,1), (1,0); initial clusters {(4,4), (3,4)} and {(4,2), (0,2), (1,1), (1,0)}. • Without the outlier: • Iteration 1: centroids <3.5, 4> and <1.5, 1.25>; assignment: <3.5, 4> ← (3,4), (4,4), (4,2); <1.5, 1.25> ← (0,2), (1,1), (1,0). • Iteration 2: centroids <3.67, 3.33> and <0.67, 1>; the assignment no longer changes, so the algorithm stops. • With the outlier (100, 0) added: • Iteration 1: centroids <3.5, 4> and <21, 1>; assignment: <3.5, 4> ← (0,2), (1,1), (1,0), (3,4), (4,4), (4,2); <21, 1> ← (100, 0). • Iteration 2: centroids <2.17, 2.17> and <100, 0>; the outlier captures a cluster by itself and drags the other centroid into the middle of all six remaining points.

  9. K-Means (5) comparison with EM • K-Means • Hard clustering: an instance belongs to exactly one cluster. • Based on Euclidean distance. • Not robust to outliers or differing value ranges. • EM • Soft clustering: an instance belongs to several clusters, each with a membership probability. • Based on a probability density. • Can handle both numeric and nominal attributes. • [Diagram: K-Means assigns instance I wholly to cluster C1; EM assigns I to C1 with weight 0.7 and to C2 with weight 0.3]

  10. Mixture Model (1) • A mixture is a set of k probability distributions, representing k clusters. • Each probability distribution has a mean and a variance. • The mixture model combines several normal distributions.

  11. Mixture Model (2) • Only one numeric attribute • Five parameters: the two means, the two standard deviations, and the sampling probability of cluster A.
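
Written out, the two-cluster, one-attribute mixture has the density below (a reconstruction of the slide's lost formula; f(x; μ, σ) is the normal density given on slide 13):

    f(x) = p_A \, f(x;\mu_A,\sigma_A) + p_B \, f(x;\mu_B,\sigma_B), \qquad p_A + p_B = 1

so the five parameters are \mu_A, \sigma_A, \mu_B, \sigma_B, and p_A.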

  12. Mixture Model (3) Simple Example • Probability that an instance x belongs to cluster A, computed from the probability density function
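
Following the book's formulation, the membership probability on this slide is the usual Bayes-rule posterior (reconstructed, since the slide's formula image is lost):

    \Pr[A \mid x] = \frac{f(x;\mu_A,\sigma_A)\, p_A}{\Pr[x]}

where the denominator \Pr[x] = p_A f(x;\mu_A,\sigma_A) + p_B f(x;\mu_B,\sigma_B) simply normalizes the cluster memberships so they sum to 1.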

  13. Mixture Model (4) Probability Density Function • Normal Distribution • Gaussian Density Function • Poisson Distribution
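
The two density functions named above, reconstructed in standard form (the Gaussian is the one used in the examples that follow):

    f(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)

    P(x;\lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots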

  14. Mixture Model (5) Probability Density Function • [Figure: the fitted density functions change from iteration to iteration]

  15. EM Algorithm (1) • Step 1 (Initialization): • Assign random cluster probabilities (weights) to each instance. • Step 2 (Maximization Step: parameter adjustment): • Re-create the cluster model • Re-compute the parameters Θ (mean, variance) of each normal distribution. • Step 3 (Expectation Step: weight adjustment): • Update each record's weights. • Step 4: • Calculate the log-likelihood • If the value saturates, exit • If not, go to Step 2.
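
The four steps above map almost line-for-line onto code. A minimal one-dimensional, two-cluster sketch in Java (the class name, data values, and fixed iteration cap are hypothetical choices, not from the slides):

    public class SimpleEM {
        // Gaussian density f(x; mean, sd)
        static double gauss(double x, double mean, double sd) {
            double z = (x - mean) / sd;
            return Math.exp(-0.5 * z * z) / (Math.sqrt(2 * Math.PI) * sd);
        }

        public static void main(String[] args) {
            double[] x = {51, 43, 62, 64, 45, 42, 46, 45, 45, 62, 47, 52}; // made-up sample
            int n = x.length;
            double[] w = new double[n];                // w[i] = Pr[A | x_i]
            for (int i = 0; i < n; i++) w[i] = (i % 2 == 0) ? 0.9 : 0.1;   // Step 1: random-ish weights

            double muA = 0, sdA = 1, muB = 0, sdB = 1, pA = 0.5;
            for (int iter = 0; iter < 50; iter++) {    // stand-in for the saturation test
                // Step 2 (M-step): re-estimate parameters from the weighted instances
                double sumA = 0, sumB = 0, sA = 0, sB = 0;
                for (int i = 0; i < n; i++) { sumA += w[i]; sumB += 1 - w[i]; sA += w[i]*x[i]; sB += (1-w[i])*x[i]; }
                muA = sA / sumA; muB = sB / sumB;
                double vA = 0, vB = 0;
                for (int i = 0; i < n; i++) { vA += w[i]*(x[i]-muA)*(x[i]-muA); vB += (1-w[i])*(x[i]-muB)*(x[i]-muB); }
                sdA = Math.sqrt(vA / sumA); sdB = Math.sqrt(vB / sumB);
                pA = sumA / n;
                // Step 3 (E-step): update each record's weight
                double logLik = 0;
                for (int i = 0; i < n; i++) {
                    double a = pA * gauss(x[i], muA, sdA), b = (1 - pA) * gauss(x[i], muB, sdB);
                    w[i] = a / (a + b);
                    logLik += Math.log(a + b);         // Step 4: watch this value saturate
                }
                System.out.printf("iter %d: muA=%.2f muB=%.2f pA=%.2f logLik=%.4f%n", iter, muA, muB, pA, logLik);
            }
        }
    }

A production version would stop when the change in log-likelihood drops below a threshold rather than after a fixed number of iterations.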

  16. EM Algorithm (2) Initialization • Random probability • M-Step • Example

  17. EM Algorithm (3) M-Step: Parameter (Mean, Dev) • Estimating parameters from weighted instances • Parameters: means, deviations.

  18. EM Algorithm (3) M-Step: Parameter (Mean, Dev)
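
The estimates the two slides above refer to are ordinary weighted means and variances (reconstructed from the book's presentation; w_i is instance i's current weight for cluster A):

    \mu_A = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}

    \sigma_A^2 = \frac{w_1 (x_1 - \mu_A)^2 + \cdots + w_n (x_n - \mu_A)^2}{w_1 + w_2 + \cdots + w_n}

Cluster B uses the same formulas with the complementary weights 1 - w_i.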

  19. EM Algorithm (4) E-Step: Weight • Compute each instance's weight

  20. EM Algorithm (5) E-Step: Weight • Compute each instance's weight, as reconstructed below
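
The weight computation on these two slides is the posterior from slide 12, applied to each instance (reconstructed formula):

    w_i = \Pr[A \mid x_i] = \frac{p_A\, f(x_i;\mu_A,\sigma_A)}{p_A\, f(x_i;\mu_A,\sigma_A) + p_B\, f(x_i;\mu_B,\sigma_B)}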

  21. EM Algorithm (6) Objective Function (check) • Log-likelihood function • The likelihood is the product, over all instances, of their probability under the mixture; take logs to turn the product into a sum for analysis. • 1-dimensional data, 2 clusters A, B • N-dimensional data, K clusters: each cluster has a mean vector and a covariance matrix
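
For the 1-dimensional, 2-cluster case the objective is (reconstructed from the book's formulation):

    \log L = \sum_{i=1}^{n} \log\bigl( p_A\, f(x_i;\mu_A,\sigma_A) + p_B\, f(x_i;\mu_B,\sigma_B) \bigr)

and for N-dimensional data with K clusters the components become multivariate normals f(\mathbf{x}; \boldsymbol{\mu}_k, \Sigma_k) with mean vector \boldsymbol{\mu}_k and covariance matrix \Sigma_k:

    \log L = \sum_{i=1}^{n} \log \sum_{k=1}^{K} p_k\, f(\mathbf{x}_i; \boldsymbol{\mu}_k, \Sigma_k)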

  22. EM Algorithm (7) Objective Function (check) • Covariance matrix • Mean vector

  23. EM Algorithm (8) Termination • Termination • The procedure stops when the log-likelihood saturates. • [Figure: log-likelihood values Q0, Q1, Q2, Q3, Q4 plotted against the number of iterations, leveling off]

  24. EM Algorithm (1) Simple Data • EM example • 6 data points (3 samples per class) • 2 classes (circle, rectangle)

  25. EM Algorithm (2) Likelihood function of the two component means Θ1, Θ2

  26. EM Algorithm (3)

  27. EM Example (1) • Example dataset • 2 columns (Math, English), 6 records

  28. EM Example (2) • Distribution of Math • mean: 56.67 • variance: 776.73 • Distribution of English • mean: 82.5 • variance: 197.50

  29. EM Example (3) • Random Cluster Weight

  30. EM Example (4) Iteration 1 • Maximization Step (parameter adjustment)

  31. EM Example (4)

  32. EM Example (5) Iteration 2 • Expectation Step (weight adjustment) • Maximization Step (parameter adjustment)

  33. EM Example (6) Iteration 3 • Expectation Step (weight adjustment) • Maximization Step (parameter adjustment)

  34. EM Example (6) Iteration 3 (continued) • Expectation Step (weight adjustment) • Maximization Step (parameter adjustment)

  35. EM Application (1) Weka • Weka • University of Waikato, New Zealand • Open-source mining tool • http://www.cs.waikato.ac.nz/ml/weka • Experiment Data • Iris data • Real Data • Department customer data • Modified customer data

  36. EM Application (2) IRIS Data • Data Info • Attribute Information: • sepal length / sepal width / petal length / petal width (all in cm) • class: Iris Setosa / Iris Versicolour / Iris Virginica

  37. EM Application (3) IRIS Data

  38. EM Application (4) Weka Usage • Weka clustering package: weka.clusterers • Command-line execution:
  java weka.clusterers.EM -t iris.arff -N 2
  java weka.clusterers.EM -t iris.arff -N 2 -V
  • GUI execution:
  java -jar weka.jar
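
The same run can also be driven from Java code. A minimal sketch, assuming the classic Weka 3 API (weka.core.Instances, weka.clusterers.EM) and a local iris.arff file:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.clusterers.EM;
    import weka.core.Instances;

    public class WekaEMDemo {
        public static void main(String[] args) throws Exception {
            // Load the ARFF file shown on slide 41
            Instances data = new Instances(new BufferedReader(new FileReader("iris.arff")));
            // Clustering is unsupervised, so drop the class attribute (the last one)
            data.deleteAttributeAt(data.numAttributes() - 1);

            EM em = new EM();
            em.setNumClusters(2);      // corresponds to the -N 2 option above
            em.buildClusterer(data);
            System.out.println(em);    // prints the fitted mixture, as on slide 42
        }
    }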

  39. EM Application (4) Weka Usage • Options for clustering in Weka

  40. EM Application (5) Weka usage

  41. EM Application (5) Weka usage – input file format

  % Summary Statistics:
  %                Min  Max  Mean  SD    Class Correlation
  % sepal length:  4.3  7.9  5.84  0.83   0.7826
  % sepal width:   2.0  4.4  3.05  0.43  -0.4194
  % petal length:  1.0  6.9  3.76  1.76   0.9490 (high!)
  % petal width:   0.1  2.5  1.20  0.76   0.9565 (high!)

  @RELATION iris
  @ATTRIBUTE sepallength REAL
  @ATTRIBUTE sepalwidth REAL
  @ATTRIBUTE petallength REAL
  @ATTRIBUTE petalwidth REAL
  @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
  @DATA
  5.1,3.5,1.4,0.2,Iris-setosa
  4.9,3.0,1.4,0.2,Iris-setosa
  4.7,3.2,1.3,0.2,Iris-setosa

  42. EM Application (6) Weka usage – output format

  Number of clusters: 3

  Cluster: 0  Prior probability: 0.3333
  Attribute: sepallength
  Normal Distribution. Mean = 5.006 StdDev = 0.3489
  Attribute: sepalwidth
  Normal Distribution. Mean = 3.418 StdDev = 0.3772
  Attribute: petallength
  Normal Distribution. Mean = 1.464 StdDev = 0.1718
  Attribute: petalwidth
  Normal Distribution. Mean = 0.244 StdDev = 0.1061
  Attribute: class
  Discrete Estimator. Counts = 51 1 1 (Total = 53)

  0  50 ( 33%)
  1  48 ( 32%)
  2  52 ( 35%)

  Log likelihood: -2.21138

  43. EM Application (6) Result Visualization

  44. References • Data Mining: Practical Machine Learning Tools and Techniques. Ian H. Witten and Eibe Frank. Morgan Kaufmann, pp. 218-255. • Data Mining: Concepts and Techniques. Jiawei Han. Chapter 8. • The Expectation Maximization Algorithm. Frank Dellaert, February 2002. • A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Jeff A. Bilmes, 1998.
