Expectation Maximization for GMM
Comp344 Tutorial, Kai Zhang
GMM
• Model the data distribution as a weighted combination (mixture) of Gaussian density functions
• Given a set of sample points, how do we estimate the parameters of the GMM?
EM Basic Idea
• Given data X and an initial parameter estimate Θt
• Assume a hidden variable Y
• 1. Study how Y is distributed based on the current knowledge (X and Θt), i.e., p(Y | X, Θt); compute the expectation of the joint data likelihood under this distribution (called the Q function)
• 2. Maximize this expectation w.r.t. the to-be-determined parameter Θt+1
• Iterate steps 1 and 2 until convergence (a minimal sketch of this loop follows the list)
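To make the iteration concrete, here is a minimal Python sketch of the generic EM loop. The helper functions `e_step` and `m_step` are hypothetical placeholders for the model-specific computations (they are made concrete for GMM further below), not part of the tutorial itself.

```python
# A minimal sketch of the generic EM loop; e_step and m_step are
# hypothetical model-specific helpers, defined for GMM later on.
def em(X, theta, max_iter=100, tol=1e-6):
    prev_ll = float("-inf")
    for _ in range(max_iter):
        posterior, log_lik = e_step(X, theta)  # p(Y | X, theta) and current log-likelihood
        theta = m_step(X, posterior)           # maximize the Q function over theta
        if log_lik - prev_ll < tol:            # EM never decreases the likelihood
            break
        prev_ll = log_lik
    return theta
```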
EM with GMM
• In the context of GMM:
• X: the data points
• Y: which Gaussian component generated which data point
• Θ: the parameters of the mixture model (means μk, covariances Σk, mixture weights pk)
• Constraint: the pk's must sum up to 1, so that p(x) is a valid pdf
How to write the Q function under the GMM setting
• The likelihood of a data set is the product of all the individual sample likelihoods, so
L(Θ | X) = ∏i p(xi | Θ) = ∏i ∑k pk N(xi | μk, Σk)
(a small log-likelihood sketch follows)
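In practice the product is evaluated as a sum of logs for numerical stability. Below is a small sketch of the data log-likelihood under the mixture, assuming NumPy/SciPy; the function name and parameter layout are my own, not from the tutorial.

```python
# A small sketch of the GMM data log-likelihood; names are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, mu, cov, pk):
    # p(x_i | Theta) = sum_k p_k N(x_i | mu_k, Sigma_k), for every sample i
    dens = sum(pk[k] * multivariate_normal.pdf(X, mu[k], cov[k])
               for k in range(len(pk)))
    return np.log(dens).sum()  # log of the product of sample likelihoods
```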
The Q function specific to GMM is
Q(Θ, Θt) = ∑i ∑k p(k | xi, Θt) [ log pk + log N(xi | μk, Σk) ]
• Plugging in the definition of N(x | μk, Σk) and computing derivatives w.r.t. the parameters, we obtain the iteration procedure (see the NumPy sketch below)
• E step: compute the posterior responsibilities
γik = p(k | xi, Θt) = pk N(xi | μk, Σk) / ∑j pj N(xi | μj, Σj)
• M step: re-estimate the parameters
μk = ∑i γik xi / ∑i γik
Σk = ∑i γik (xi − μk)(xi − μk)T / ∑i γik
pk = (1/n) ∑i γik
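These updates translate directly into code. Below is a minimal NumPy/SciPy sketch of the two steps; the function name `em_gmm` and variable names (`gamma`, `Nk`, etc.) are my own, not from the tutorial.

```python
# A minimal NumPy/SciPy sketch of the E and M steps derived above.
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, mu, cov, pk, n_iter=100):
    """X: (n, d) data; mu: (m, d) means; cov: (m, d, d) covariances; pk: (m,) weights."""
    n = X.shape[0]
    m = len(pk)
    for _ in range(n_iter):
        # E step: weighted densities p_k * N(x_i | mu_k, Sigma_k), shape (n, m)
        dens = np.stack([pk[k] * multivariate_normal.pdf(X, mu[k], cov[k])
                         for k in range(m)], axis=1)
        gamma = dens / dens.sum(axis=1, keepdims=True)   # posteriors p(k | x_i)
        # M step: re-estimate parameters from the soft memberships
        Nk = gamma.sum(axis=0)                           # strength of each component
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(m):
            diff = X - mu[k]
            cov[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        pk = Nk / n
    return mu, cov, pk, gamma
```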
Posteriors
• Intuitive meaning of γik = p(k | xi, Θ)
• The posterior probability that xi was created by the kth Gaussian component (a soft membership)
• The meaning of ∑i γik
• Note that it is the summation of all the γik having the same k
• So it measures the strength of the kth Gaussian component (the effective number of points it accounts for)
Comments
• GMM can be viewed as performing either
• density estimation, in the form of a combination of a number of Gaussian functions, or
• clustering, where clusters correspond to the Gaussian components and cluster assignment can be achieved through the Bayes rule
• GMM produces exactly what is needed in the Bayes decision rule: the prior probability (pk) and the class-conditional probability (N(x | μk, Σk))
• So GMM + Bayes rule can compute the posterior probability, hence solving the clustering problem (see the snippet after this list)
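As an illustration, a hard cluster assignment follows directly from the posteriors returned by the `em_gmm` sketch above; `X`, `mu0`, `cov0`, and `pk0` are hypothetical data and initial parameters.

```python
# Hard clustering via the Bayes rule: assign each point to the component
# with the largest posterior p(k | x_i). Reuses em_gmm from the sketch above;
# mu0, cov0, pk0 are hypothetical initial parameters.
mu, cov, pk, gamma = em_gmm(X, mu0, cov0, pk0)
labels = gamma.argmax(axis=1)   # gamma[i, k] = p(k | x_i)
```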
Initialization
• Perform an initial clustering and divide the data into m clusters (e.g., simply cut one dimension into m segments)
• For the kth cluster:
• its mean is the initial kth Gaussian component mean (μk)
• its covariance is the initial kth Gaussian component covariance (Σk)
• its proportion of the samples is the initial prior for the kth Gaussian component (pk)
• A sketch of this scheme follows the list
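Here is a minimal sketch of the initialization the slide describes, cutting the first dimension into m equal-size segments; the function name `init_gmm` and the small covariance regularization term are my own additions.

```python
# A sketch of the initialization above: sort along one dimension, split the
# data into m segments, and estimate per-segment mean, covariance, and prior.
import numpy as np

def init_gmm(X, m):
    n, d = X.shape
    order = np.argsort(X[:, 0])                 # cut the first dimension
    segments = np.array_split(order, m)         # m initial "clusters"
    mu = np.array([X[idx].mean(axis=0) for idx in segments])
    cov = np.array([np.cov(X[idx].T) + 1e-6 * np.eye(d)  # regularize for stability
                    for idx in segments])
    pk = np.array([len(idx) / n for idx in segments])
    return mu, cov, pk
```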