
Independent Component Analysis: The Fast ICA algorithm


Presentation Transcript


  1. Independent Component Analysis: The Fast ICA algorithm Jonathan Kam EE 645

  2. Overview • The Problem • Definition of ICA • Restrictions • Ways to solve ICA • Nongaussianity • Mutual Information • Maximum Likelihood • Fast ICA algorithm • Simulations • Conclusion

  3. The Problem • Cocktail party problem • Several sources • Several sensors • Ex: humans hear the mixed signals but are able to unmix them and concentrate on a single source • Recover the source signals given only the mixtures • No prior knowledge of the sources or the mixing matrix • aka Blind Source Separation (BSS)

  4. Assumptions • Source signals are statistically independent • Knowing the value of one component gives no information about the others • ICs have nongaussian distributions • Initial distributions unknown • At most one Gaussian source • Recovered sources may be permuted and scaled

  5. Definition of ICA • Observe n linear mixtures x1,…,xn of n independent components: xj = aj1s1 + aj2s2 + … + ajnsn, for all j • The coefficients aji form the mixing matrix A • Assume each mixture xj and each IC sk is a random variable • Time delays between mixtures are ignored (instantaneous mixing) • Independent components are latent variables • They cannot be directly observed

  6. Definition of ICA • ICA mixture model: x = As • A is the mixing matrix; s is the vector of source signals • Goal: find a matrix W such that s = Wx • Ideally W = A−1 (up to the scaling and permutation ambiguities below)
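To make the model concrete, here is a minimal NumPy sketch of x = As (not from the slides; the 2×2 mixing matrix is invented for illustration). It also shows that if A were known, W = A−1 would recover s exactly; the whole point of ICA is to find W from x alone:

```python
import numpy as np

# Two independent, nongaussian sources (uniform noise), 1000 samples each.
rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=(2, 1000))

# A hypothetical 2x2 mixing matrix A: each row mixes the sources into one sensor.
A = np.array([[1.0, 0.5],
              [0.7, 1.2]])
x = A @ s                      # observed mixtures, x = As

# If A were known, unmixing would just be W = A^-1.
# ICA has to estimate W without knowing A.
W = np.linalg.inv(A)
s_hat = W @ x
assert np.allclose(s_hat, s)   # exact recovery when A is known
```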

  7. Definition: Independence • y1 and y2 are independent if E{h1(y1)h2(y2)} = E{h1(y1)} E{h2(y2)} for any functions h1, h2 • If variables are independent, they are uncorrelated • Uncorrelated variables • Defined: E{y1y2} − E{y1} E{y2} = 0 • Uncorrelatedness does not imply independence • Ex: the four equiprobable points (0,1), (0,−1), (1,0), (−1,0) • E{y1²y2²} = 0 ≠ 1/4 = E{y1²} E{y2²} • ICA must therefore enforce full independence, not just uncorrelatedness
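The slide's four-point example can be checked numerically; this small sketch (illustrative only) treats the four equiprobable points as the whole sample space:

```python
import numpy as np

# The four points (0,1), (0,-1), (1,0), (-1,0), each with probability 1/4.
pts = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]], dtype=float)
y1, y2 = pts[:, 0], pts[:, 1]

# Uncorrelated: E{y1 y2} - E{y1}E{y2} = 0
print(np.mean(y1 * y2) - np.mean(y1) * np.mean(y2))   # 0.0

# But not independent: E{y1^2 y2^2} = 0, while E{y1^2} E{y2^2} = 1/4
print(np.mean(y1**2 * y2**2))                         # 0.0
print(np.mean(y1**2) * np.mean(y2**2))                # 0.25
```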

  8. ICA restrictions • Cannot determine variances • s and A are both unknown • A scalar multiplier on s could be canceled out by dividing the corresponding column of A • The multiplier could even be −1 • Cannot determine order • The order of the terms can be changed

  9. ICA restrictions • At most one Gaussian source • If x1 and x2 are Gaussian, uncorrelated, and of unit variance • Their joint density is completely symmetric • It contains no information on the direction of the columns of the mixing matrix A

  10. ICA estimation • Nongaussianity indicates independence • Estimate y = wᵀx • Let z = Aᵀw, so y = wᵀAs = zᵀs • y is a linear combination of the si; by the central limit theorem, a sum zᵀs of independent variables is more gaussian than any single si • zᵀs becomes least gaussian when it equals one of the si • Then wᵀx = zᵀs equals an independent component • Maximizing the nongaussianity of wᵀx therefore yields one of the independent components • Ways to do this: measuring nongaussianity directly (kurtosis, negentropy) • Minimizing mutual information • Maximum likelihood

  11. Measuring nongaussianity • Kurtosis • Fourth-order cumulant • Classical measure of nongaussianity • kurt(y) = E{y⁴} − 3(E{y²})² • For gaussian y, the fourth moment equals 3(E{y²})² • Kurtosis is therefore zero for Gaussian random variables • Con: not a robust measure of nongaussianity • Sensitive to outliers
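A small sketch of this kurtosis estimate on sample data (the test distributions here are my choice, not the presentation's):

```python
import numpy as np

def kurt(y):
    """Classical kurtosis kurt(y) = E{y^4} - 3 (E{y^2})^2; zero for Gaussian y."""
    y = y - np.mean(y)
    return np.mean(y**4) - 3 * np.mean(y**2)**2

rng = np.random.default_rng(0)
print(kurt(rng.normal(size=100_000)))        # ~0  (Gaussian)
print(kurt(rng.uniform(-1, 1, 100_000)))     # <0  (sub-Gaussian)
print(kurt(rng.laplace(size=100_000)))       # >0  (super-Gaussian)
# A single large outlier can dominate the y^4 term, which is why
# kurtosis is not a robust measure of nongaussianity.
```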

  12. Measuring nongaussianity • Entropy H: the degree of information that observing a variable gives • A Gaussian variable has the largest entropy among all random variables of equal variance • Negentropy J: J(y) = H(ygauss) − H(y), where ygauss is a Gaussian variable with the same covariance as y • Based on the information-theoretic quantity of differential entropy • Computationally difficult to evaluate exactly

  13. Negentropy approximations • Classical method using higher-order moments: J(y) ≈ (1/12) E{y³}² + (1/48) kurt(y)² • Its validity is limited by the nonrobustness of kurtosis

  14. Negentropy approximations • Hyvärinen 1998b: maximum-entropy principle • J(y) ∝ [E{G(y)} − E{G(v)}]² • G is some contrast function • v is a Gaussian variable of zero mean and unit variance • Taking G(y) = y⁴ recovers the kurtosis-based approximation

  15. Negentropy approximations • Instead of kurtosis, choose a contrast function G that does not grow too fast, e.g. • G1(u) = (1/a1) log cosh(a1u), where 1 ≤ a1 ≤ 2 • G2(u) = −exp(−u²/2)
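A Monte Carlo sketch of this negentropy approximation using G1 with a1 = 1 (the constant of proportionality is dropped, and the Gaussian reference term E{G(v)} is estimated by sampling rather than evaluated in closed form):

```python
import numpy as np

def negentropy_approx(y, a1=1.0, n_mc=100_000, seed=0):
    """J(y) ~ (E{G(y)} - E{G(v)})^2 with G(u) = (1/a1) log cosh(a1 u),
    v a standard Gaussian approximated here by Monte Carlo sampling."""
    y = (y - np.mean(y)) / np.std(y)   # J assumes zero mean, unit variance
    G = lambda u: np.log(np.cosh(a1 * u)) / a1
    v = np.random.default_rng(seed).normal(size=n_mc)
    return (np.mean(G(y)) - np.mean(G(v))) ** 2

rng = np.random.default_rng(1)
print(negentropy_approx(rng.normal(size=50_000)))    # ~0 for Gaussian data
print(negentropy_approx(rng.laplace(size=50_000)))   # >0 for nongaussian data
```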

  16. Minimizing mutual information • Mutual information is defined as I(y1, …, yn) = Σi H(yi) − H(y) • It measures the dependence between random variables • I = 0 if the variables are statistically independent • Minimizing I is equivalent (up to a constant) to maximizing negentropy

  17. Maximum Likelihood Estimation • Closely related to the infomax principle • Infomax (Bell and Sejnowski, 1995): maximizing the output entropy of a neural network with nonlinear outputs • The densities of the ICs must be estimated properly • If the estimate is wrong, ML gives wrong results

  18. Fast ICA • Preprocessing • Fast ICA algorithm • Maximize nongaussianity • Unmix the signals

  19. Fast ICA: Preprocessing • Centering • Subtract the mean vector m = E{x} to make x a zero-mean variable • The ICA algorithm then does not need to estimate the mean • The mean of s can afterwards be estimated as A−1m, where m is the mean that was subtracted
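Centering is one line in practice; a sketch (the data layout of n mixtures × T samples is my assumption):

```python
import numpy as np

def center(x):
    """Subtract the row-wise mean so each mixture in x (n x T) is zero-mean.
    The mean m is returned so the mean of s could later be estimated
    as A^-1 m, as the slide notes."""
    m = x.mean(axis=1, keepdims=True)
    return x - m, m
```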

  20. Fast ICA: Preprocessing • Whitening • Transform x so that its components are uncorrelated and their variances equal unity • Use the eigenvalue decomposition (EVD) of the covariance matrix: E{xxᵀ} = EDEᵀ • The whitened data is x̃ = E D^(−1/2) Eᵀ x • D is the diagonal matrix of eigenvalues • E is the orthogonal matrix of eigenvectors
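A sketch of EVD-based whitening under the same layout assumption (x already centered):

```python
import numpy as np

def whiten(x):
    """Whiten zero-mean data x (n x T) via the EVD of its covariance:
    E{x x^T} = E D E^T, whitened data x~ = E D^(-1/2) E^T x."""
    d, E = np.linalg.eigh(np.cov(x))          # d: eigenvalues, E: orthonormal eigenvectors
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T   # whitening matrix
    x_white = V @ x
    return x_white, V    # cov(x_white) is (approximately) the identity
```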

  21. Fast ICA: Preprocessing • Whitening • Transforms the mixing matrix into Ã • Makes à orthogonal • Lessens the number of parameters that have to be estimated from n² to n(n−1)/2 • In large dimensions an orthogonal matrix contains approximately half the number of parameters

  22. Fast ICA Algorithm • One-unit (one component) version 1. Choose an initial weight vector w 2. Let w+ = E{xg(wᵀx)} − E{g′(wᵀx)}w • g is the derivative of the contrast function G: g1(u) = tanh(a1u), g2(u) = u exp(−u²/2) 3. w = w+/||w+|| (normalization step) 4. If not converged, go back to 2 • Converged if norm(wnew − wold) < ξ or norm(wnew + wold) < ξ (w and −w define the same component) • ξ is typically around 0.0001
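A sketch of this one-unit iteration using g(u) = tanh(u) (i.e. a1 = 1); the tolerance and iteration cap are arbitrary choices, and the input is assumed to be whitened:

```python
import numpy as np

def fastica_one_unit(x, tol=1e-4, max_iter=200, seed=0):
    """One-unit FastICA on whitened data x (n x T), with g = tanh
    and g'(u) = 1 - tanh(u)^2."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=x.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = w @ x
        g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
        # w+ = E{x g(w^T x)} - E{g'(w^T x)} w
        w_new = (x * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)          # normalization step
        # w and -w give the same component, so check both signs.
        if min(np.linalg.norm(w_new - w), np.linalg.norm(w_new + w)) < tol:
            return w_new
        w = w_new
    return w
```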

  23. Fast ICA Algorithm • Several-unit algorithm • Define B as the matrix whose columns are the estimated weight vectors, and B′ as the matrix containing the previously found columns of B • To keep different units from converging to the same component, add a projection step before the normalization in step 3 • Step 3 becomes: Let w = w − B′B′ᵀw, then w = w/||w||
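A sketch of that projection step; W_found is assumed to hold the weight vectors already estimated (e.g. by repeated runs of the one-unit iteration above), and the projection runs before each renormalization:

```python
import numpy as np

def deflate(w, W_found):
    """Projection step inserted before the normalization in step 3:
    w <- w - B'B'^T w, where the columns of B' are the previously
    found weight vectors; then renormalize. This keeps a new unit
    from converging to an already-extracted component."""
    if W_found:                          # nothing to project out for the first unit
        B = np.column_stack(W_found)     # B': n x k matrix of found vectors
        w = w - B @ (B.T @ w)
    return w / np.linalg.norm(w)
```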

  24. Simple Simulation • Separation of 2 components • Figure 1: Two independent nongaussian wav samples

  25. Simple Simulation • Figure 2: Mixed signals

  26. Simple Simulation • Recovered signals vs original signals Figure 3: Recovered signals Figure 4: Original signals

  27. Simulation Results • IC 1 was recovered in 6 steps and IC 2 in 2 steps • Retested with 20,000 samples • Required approximately the same number of steps

  28. Gaussian Simulation Figure 5: 2 wav samples and noise signal

  29. Gaussian Simulation Figure 6: 3 mixed signals

  30. Gaussian Simulation • Comparison of recovered signals vs original signals Figure 7: Recovered signals Figure 8: Original signal

  31. Gaussian Simulation 2 • Tried with 2 Gaussian components • The components were not estimated properly because more than one source was Gaussian Figure 10: Original signals Figure 11: Recovered signals

  32. Conclusion • Fast ICA properties • No step size, unlike gradient-based ICA algorithms • Can separate sources with any nongaussian distribution using a nonlinear contrast function g • Components can be estimated one by one • Other applications • Separation of artifacts in image data • Finding hidden factors in financial data • Reducing noise in natural images • Medical signal processing – fMRI, ECG, EEG (Makeig)

  33. References • [1] Aapo Hyvärinen and Erkki Oja, "Independent Component Analysis: Algorithms and Applications." Neural Networks Research Centre, Helsinki University of Technology; Neural Networks, 13(4–5):411–430, 2000. • [2] Aapo Hyvärinen and Erkki Oja, "A Fast Fixed-Point Algorithm for Independent Component Analysis." Helsinki University of Technology, Laboratory of Computer and Information Science; Neural Computation, 9(7):1483–1492, 1997. • [3] Anthony J. Bell and Terrence J. Sejnowski, "The 'Independent Components' of Natural Scenes are Edge Filters." Howard Hughes Medical Institute, Computational Neurobiology Laboratory. • [4] Te-Won Lee, Mark Girolami, and Terrence J. Sejnowski, "Independent Component Analysis Using an Extended Infomax Algorithm for Mixed Subgaussian and Supergaussian Sources." 1997. • [5] Antti Leino, "Independent Component Analysis: An Overview." 2004. • [6] Erik G. Learned-Miller and John W. Fisher III, "ICA Using Spacings Estimates of Entropy." Journal of Machine Learning Research, 4:1271–1295, 2003.
