
Presentation Transcript


  1. Machine Learning for Signal Processing: Principal Component Analysis & Independent Component Analysis. Class 8. 24 Feb 2015. Instructor: Bhiksha Raj. 11-755/18-797

  2. Recall: Representing images
  • The most common element in the image: the background
  • Or rather, large regions of relatively featureless shading
  • Uniform sequences of numbers

  3. “Bases”
  • “Bases” are the “standard” units such that all instances can be expressed as weighted combinations of these units
  • Ideal requirement: bases must be orthogonal
  • Checkerboards are one choice of bases: orthogonal, but not “smooth”
  • Other choices of bases: complex exponentials, wavelets, etc.
  [Figure: example basis images B1–B6]

  4. Recap: Data-specific bases?
  • Checkerboards, complex exponentials, and wavelets are data-agnostic: we use the same bases regardless of the data we analyze
  • Image of a face vs. image of a forest; segment of speech vs. seismic rumble
  • How about data-specific bases, i.e. bases that consider the underlying data?
  • E.g. is there something better than checkerboards to describe faces? Something better than complex exponentials to describe music?

  5. Recap: The Energy Compaction Property
  • How do we define “better”?
  • The description: approximate the data as a weighted combination of bases
  • The ideal: if the description is terminated at any point, we should still get most of the information about the data
  • The error should be small

  6. A collection of least-squares typical faces
  • Assumption: there is a set of K “typical” faces that captures most of all faces
  • Approximate every face f as f = wf,1 V1 + wf,2 V2 + wf,3 V3 + … + wf,K VK
  • V = [V1 V2 V3 …]
  • Estimate V to minimize the squared error
  • How? What is V?

  7. Abstracting the problem: Finding the FIRST typical face
  • The problem of finding the first typical face V1: find the V for which the total projection error is minimum!
  • This “minimum squared error” V is our “best” first typical face V1
  [Figure: data scatter in the Pixel 1–Pixel 2 plane]

  8. Formalizing the Problem: Error from approximating a single vector
  • Projection of a vector x onto a vector v, assuming v is of unit length: v vT x
  • Approximating: x ≈ w v
  • Error vector: x − v vT x
  [Figure: x, its projection v vT x onto v, and the error x − v vT x]
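
A minimal NumPy sketch of this single-vector case (not from the slides; the vectors and variable names are made up for illustration):

    import numpy as np

    x = np.array([3.0, 1.0])              # an example data vector (made-up values)
    v = np.array([1.0, 1.0])
    v = v / np.linalg.norm(v)             # make v unit length, as the slide assumes

    proj = v * (v @ x)                    # projection of x onto v: v v^T x
    err = x - proj                        # error vector x - v v^T x
    print(proj, err, np.sum(err ** 2))    # squared length of the error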

  9. With multiple bases
  • V = [v1 v2 … vK] represents a K-dimensional subspace
  • Projection of a vector x onto the subspace: V VT x
  • Error vector = x − V VT x
  • Error length = ||x − V VT x||
  [Figure: x, its projection V VT x onto the subspace, and the error x − V VT x]

  10. With multiple bases
  • Error for one vector: ||x − V VT x||²
  • Error for many vectors: the sum of ||x − V VT x||² over all vectors x
  • Goal: estimate V = [v1 v2 … vK] to minimize this error!
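
A hedged NumPy sketch of the multi-basis error (the data and the orthonormal basis V here are random stand-ins):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((10, 100))                  # 100 made-up 10-dimensional vectors, one per column
    V, _ = np.linalg.qr(rng.standard_normal((10, 3)))   # an arbitrary orthonormal basis: V^T V = I, K = 3

    residual = X - V @ (V.T @ X)                        # error vectors x - V V^T x for every column of X
    total_error = np.sum(residual ** 2)                 # the quantity to be minimized over V
    print(total_error)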

  11. The correlation matrix
  • Expanding the total error, the term that appears (encircled on the slide) is Σx x xT = X XT: the correlation matrix
  • X = data matrix (one data vector per column); XT = transposed data matrix; Correlation = X XT

  12. The best “basis”
  • The minimum-error basis v is found by solving R v = λ v
  • v is an eigenvector of the correlation matrix R
  • λ is the corresponding eigenvalue

  13. Minimizing error
  • With the constraint VT V = I, the objective to minimize includes a Lagrange term with a diagonal matrix Λ
  • The constraint simply ensures that each vT v = 1
  • Differentiating w.r.t. V and equating to 0 yields the eigenvalue problem R V = V Λ
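
For completeness, a compact version of the derivation this slide sketches — standard PCA/KLT algebra, written here in LaTeX for a single unit-norm basis vector v, with R = Σx x xT:

\[ E(v) = \sum_x \|x - v v^T x\|^2 = \sum_x \left( x^T x - x^T v v^T x \right) = \operatorname{trace}(R) - v^T R v . \]

Minimizing subject to v^T v = 1 with a Lagrange multiplier λ:

\[ \frac{\partial}{\partial v}\left[ -v^T R v + \lambda\,(v^T v - 1) \right] = -2 R v + 2 \lambda v = 0 \quad\Rightarrow\quad R v = \lambda v , \]

so the optimal v is the eigenvector of R with the largest eigenvalue (the remaining error is trace(R) − λ). With K bases the same argument gives R V = V Λ.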

  14. Finding the optimal K bases
  • Total error = the sum of the eigenvalues corresponding to the discarded eigenvectors
  • Select the K eigenvectors corresponding to the K largest eigenvalues

  15. Eigen Faces!
  • Arrange your input data into a matrix X
  • Compute the correlation R = X XT
  • Solve the eigen decomposition R V = V Λ
  • The eigenvectors corresponding to the K largest eigenvalues are our optimal bases
  • We will refer to these as eigenfaces.
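
A minimal NumPy sketch of this recipe (illustrative only: the data matrix is random rather than real face images, and the names X, R, V, K are mine):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1024, 500))    # stand-in for 500 face images of 32x32 = 1024 pixels, one per column

    R = X @ X.T                             # correlation matrix R = X X^T
    eigvals, eigvecs = np.linalg.eigh(R)    # R is symmetric; eigh returns eigenvalues in ascending order

    K = 25
    V = eigvecs[:, np.argsort(eigvals)[::-1][:K]]   # the K eigenvectors with the largest eigenvalues: the "eigenfaces"

    weights = V.T @ X                       # weights of every face in the eigenface basis
    X_hat = V @ weights                     # least-squares rank-K approximation of the faces

When there are far more pixels than images, the same vectors are usually obtained from the SVD of X (as slide 34 notes) rather than by forming the full pixel-by-pixel correlation matrix.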

  16. Energy Compaction: Principle
  • Find the directions that capture most of the energy
  [Figure: 2-D data scatter with X and Y axes]

  17–19. What about Prediction?
  • Does X predict Y?
  [Figure: data scatter in the X–Y plane, with a candidate direction X1]

  20. What about Prediction?
  • Does X predict Y? Linear or affine?
  [Figure: the same scatter with a candidate line fit]

  21. Linear vs. Affine
  • The model we saw: approximate every face f as f = wf,1 V1 + wf,2 V2 + … + wf,K VK
  • A linear combination of bases
  • If you add a constant: f = wf,1 V1 + wf,2 V2 + … + wf,K VK + m
  • An affine combination of bases

  22. Estimation with the Constant
  • Estimate all parameters of f = wf,1 V1 + wf,2 V2 + … + wf,K VK + m
  • Parameters: wf,1, V1, wf,2, V2, …, wf,K, VK, and m

  23. Problem
  • f = wf,1 V1 + wf,2 V2 + … + wf,K VK + m
  • Find the slope of the line
  • Find the projection of the points onto the line
  • Find the intercept m
  • Problem: any “m” on the line will work (the w’s vary with m)
  [Figure: the scatter with a candidate line; linear or affine?]

  24. Proof by assertion
  • Estimate all parameters of f = wf,1 V1 + wf,2 V2 + … + wf,K VK + m
  • The mean of all the vectors “f” will lie on the plane!

  25. Estimating the remaining parameters

  26. Estimating the Affine model

  27. Estimating the Affine model

  28. Properties of the affine model
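
A hedged NumPy sketch of the affine estimation outlined in slides 24–28, assuming the standard procedure of fixing m at the data mean (which slide 24 asserts lies on the plane) and then estimating the bases from the mean-subtracted data; the data here is synthetic:

    import numpy as np

    rng = np.random.default_rng(1)
    F = rng.standard_normal((1024, 500)) + 3.0     # synthetic "faces", one per column, with a non-zero mean

    m = F.mean(axis=1, keepdims=True)              # the constant term: the mean face, which lies on the plane
    Fc = F - m                                     # subtract the mean before estimating the bases

    C = (Fc @ Fc.T) / Fc.shape[1]                  # covariance of the data
    eigvals, eigvecs = np.linalg.eigh(C)
    V = eigvecs[:, np.argsort(eigvals)[::-1][:25]] # K = 25 bases

    W = V.T @ Fc                                   # per-face weights wf,i
    F_hat = V @ W + m                              # affine reconstruction: weighted bases plus m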

  29. Linear vs. Affine
  • The model we saw: approximate every face f as f = wf,1 V1 + wf,2 V2 + … + wf,K VK
  • This is the Karhunen–Loève Transform: it retains maximum energy for any order k
  • If you add a constant: f = wf,1 V1 + wf,2 V2 + … + wf,K VK + m
  • This is Principal Component Analysis: it retains maximum variance for any order k

  30. How do they relate?
  • Relationship between the correlation matrix and the covariance matrix: R = C + m mT
  • Karhunen–Loève bases are eigenvectors of R
  • PCA bases are eigenvectors of C
  • How do they relate? Not easy to say…
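
A quick numerical sanity check of the identity R = C + m mT (my own illustration, using the normalized definitions R = E[x xT], C = E[(x − m)(x − m)T], m = E[x]):

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((5, 10000)) + np.arange(5).reshape(-1, 1)  # data with a non-zero mean

    N = X.shape[1]
    m = X.mean(axis=1, keepdims=True)
    R = (X @ X.T) / N                            # correlation matrix E[x x^T]
    C = ((X - m) @ (X - m).T) / N                # covariance matrix

    print(np.allclose(R, C + m @ m.T))           # True: R = C + m m^T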

  31. The Eigenvectors
  • The eigenvectors of C are the major axes of the ellipsoid Cv, where v ranges over the vectors on the unit sphere

  32. The Eigenvectors
  • The eigenvectors of R are the major axes of the ellipsoid Cv + m mT v, where v ranges over the unit sphere
  • Note that m mT has rank 1, so m mT v traces out a line

  33. The Eigenvectors
  • The principal eigenvector of R lies between the principal eigenvector of C and m
  • Similarly for the principal eigenvalue
  • Similar logic is not easily extendable to the other eigenvectors, however

  34. Eigenvectors
  • It turns out that the eigenvectors of the correlation matrix are the major and minor axes of the ellipse, centered at the origin, that encloses the data most compactly
  • The SVD of the data matrix X uncovers these vectors
  • This is the KLT
  [Figure: data scatter in the Pixel 1–Pixel 2 plane with an origin-centered ellipse]

  35. Eigenvectors
  • Likewise, the eigenvectors of the covariance matrix are the major and minor axes of the ellipse, centered at the mean, that encloses the data most compactly
  • PCA uncovers these vectors
  • In practice, “eigenfaces” refers to PCA faces, not KLT faces
  [Figure: the same scatter with a mean-centered ellipse]
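
A brief NumPy sketch of how the SVD uncovers these axes (synthetic 2-D data of my own): the left singular vectors of the raw data matrix give the KLT/correlation axes, and those of the mean-centered data give the PCA/covariance axes.

    import numpy as np

    rng = np.random.default_rng(3)
    # synthetic correlated 2-D data ("Pixel 1", "Pixel 2") with a non-zero mean
    X = np.array([[2.0, 0.5], [0.5, 1.0]]) @ rng.standard_normal((2, 1000)) + np.array([[5.0], [2.0]])

    U_klt, _, _ = np.linalg.svd(X, full_matrices=False)    # left singular vectors = eigenvectors of X X^T (KLT axes)

    Xc = X - X.mean(axis=1, keepdims=True)                 # mean-center the data
    U_pca, _, _ = np.linalg.svd(Xc, full_matrices=False)   # left singular vectors = eigenvectors of the covariance (PCA axes)

    print(U_klt)
    print(U_pca)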

  36. What about sound?
  • Finding eigenbases for speech signals
  • They look like the DFT/DCT, or like wavelets
  • DFTs are pretty good most of the time

  37. Eigen Analysis
  • Can often find surprising features in your data
  • Trends, relationships, and more
  • Commonly used in recommender systems
  • An interesting example…

  38. Eigen Analysis
  • Cheng Liu’s research on pipes…
  • SVD automatically separates useful and uninformative features

  39. Correlation vs. Causation
  • The consumption of burgers has gone up steadily in the past decade
  • In the same period, the penguin population of Antarctica has gone down
  • Correlation, not causation (unless McDonald’s has a top-secret Antarctica division)

  40. The concept of correlation
  • Two variables are correlated if knowing the value of one gives you information about the expected value of the other
  [Figure: penguin population and burger consumption plotted against time]

  41. The statistical concept of correlatedness
  • Two variables X and Y are correlated if knowing X gives you information about the expected value of Y
  • X and Y are uncorrelated if knowing X tells you nothing about the expected value of Y
  • Although it could give you other information
  • How?

  42. A brief review of basic probability
  • Uncorrelated: two random variables X and Y are uncorrelated iff the average value of their product equals the product of their individual averages
  • Setup: each draw produces one instance of X and one instance of Y, i.e. one instance of (X, Y)
  • E[XY] = E[X]E[Y]
  • The average value of Y is the same regardless of the value of X
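
A small numerical illustration (mine, with made-up distributions): for an independent pair E[XY] ≈ E[X]E[Y], while for a correlated pair the two differ.

    import numpy as np

    rng = np.random.default_rng(4)
    N = 200_000

    x = rng.standard_normal(N) + 1.0                  # E[X] = 1
    y_ind = rng.standard_normal(N) + 2.0              # drawn independently of X, E[Y] = 2
    y_cor = 0.5 * x + rng.standard_normal(N)          # depends on X

    print(np.mean(x * y_ind), np.mean(x) * np.mean(y_ind))   # both ~2.0: E[XY] = E[X]E[Y], uncorrelated
    print(np.mean(x * y_cor), np.mean(x) * np.mean(y_cor))   # ~1.0 vs ~0.5: E[XY] != E[X]E[Y], correlated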

  43. Correlated Variables
  • Expected value of Y given X: find the average of the Y values of all samples at (or close to) the given X
  • If this is a function of X, then X and Y are correlated
  [Figure: penguin population vs. burger consumption, with average populations P1, P2 at consumption levels b1, b2]
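
A hedged sketch of the estimate described here — average the Y values of samples whose X is close to a chosen value — on synthetic burger/penguin data; the function name and window width are my own:

    import numpy as np

    rng = np.random.default_rng(5)
    burgers = rng.uniform(0, 10, 5000)                             # X: burger consumption (synthetic)
    penguins = 100 - 5 * burgers + 4 * rng.standard_normal(5000)   # Y: penguin population (synthetic, correlated)

    def expected_y_given_x(x0, width=0.5):
        # average of Y over all samples whose X lies within `width` of x0
        near = np.abs(burgers - x0) < width
        return penguins[near].mean()

    print(expected_y_given_x(2.0), expected_y_given_x(8.0))        # changes with x0, so X and Y are correlated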

  44. Uncorrelatedness
  • Knowing X does not tell you what the average value of Y is, and vice versa
  [Figure: average income vs. burger consumption at levels b1 and b2]

  45. Uncorrelated Variables
  • The average value of Y is the same regardless of the value of X, and vice versa
  [Figure: average income vs. burger consumption, shown both as X as a function of Y and as Y as a function of X]

  46. Uncorrelatedness
  • Which of the above represent uncorrelated RVs?

  47. The notion of decorrelation
  • So how does one transform the correlated variables (X, Y) into the uncorrelated (X’, Y’)?
  [Figure: original axes X, Y and rotated, decorrelated axes X’, Y’]

  48. What does “uncorrelated” mean?
  • Assuming zero mean: E[X’] = constant (0), E[Y’] = constant (0)
  • E[X’|Y’] = 0
  • E[X’Y’] = EY’[ E[X’|Y’] ] = 0
  • If Y is a matrix of vectors, Y YT is diagonal

  49. Decorrelation
  • Let X be the matrix of correlated data vectors
  • Each component of X informs us of the mean trend of the other components
  • We need a transform M such that, if Y = MX, the covariance of Y is diagonal
  • Y YT is the covariance if Y is zero mean
  • Y YT = diagonal
  • M X XT MT = diagonal
  • M · Cov(X) · MT = diagonal

  50. Decorrelation
  • Easy solution: the eigen decomposition of Cov(X): Cov(X) = E Λ ET, with E ET = I
  • Let M = ET
  • M Cov(X) MT = ET E Λ ET E = Λ = diagonal
  • PCA: Y = MX
  • Diagonalizes the covariance matrix: “decorrelates” the data
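
A minimal NumPy sketch of this decorrelating transform on synthetic data (zero mean is enforced explicitly, matching slide 48's assumption):

    import numpy as np

    rng = np.random.default_rng(6)
    X = np.array([[2.0, 1.0], [1.0, 1.5]]) @ rng.standard_normal((2, 5000))   # correlated 2-D data
    X = X - X.mean(axis=1, keepdims=True)                                     # enforce zero mean

    C = (X @ X.T) / X.shape[1]              # Cov(X)
    eigvals, E = np.linalg.eigh(C)          # Cov(X) = E diag(eigvals) E^T, with E E^T = I

    M = E.T                                 # the decorrelating transform
    Y = M @ X                               # PCA: Y = M X

    print(np.round((Y @ Y.T) / Y.shape[1], 3))   # approximately diagonal: the data has been decorrelated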
