
Multivariate Methods



Presentation Transcript


  1. Multivariate Methods Nels Johnson and Matt Williams Laboratory for Interdisciplinary Statistical Analysis

  2. Outline • Principal Component Analysis • Factor Analysis • Multivariate T Tests • MANOVA • Multidimensional Scaling • Correspondence Analysis

  3. PCA – Motivating Examples • You have measured a number of variables concerning the size of aphids. You’d like to reduce the number of variables used for classification. • You have a bunch of football statistics for teams and would like to organize related teams based on these statistics.

  4. What is it? • Based on an eigenvalue decomposition of the covariance matrix S (or correlation matrix R) of the variables. • Goal: maximize the variance of linear combinations of the variables. • Obtained by transforming the variables so that the covariance matrix of the new variables is diagonal. • These new variables are called the principal components (PCs) and their covariance matrix contains the eigenvalues along the diagonal. • This transformation can be thought of as a rotation of the axes. • Note: No variables are designated as dependent.
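
A minimal sketch of the decomposition described on this slide, assuming NumPy and synthetic data (the function name `pca_eig` and the data are illustrative, not from the talk). It rotates centered data onto the eigenvectors of S and shows that the covariance matrix of the resulting scores is diagonal, with the eigenvalues on the diagonal.

```python
import numpy as np

def pca_eig(X):
    """PCA via eigen-decomposition of the sample covariance matrix S."""
    Xc = X - X.mean(axis=0)               # center each variable
    S = np.cov(Xc, rowvar=False)          # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]     # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                 # rotated data = principal components
    return eigvals, eigvecs, scores

# Synthetic correlated data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])
eigvals, eigvecs, scores = pca_eig(X)
score_cov = np.cov(scores, rowvar=False)  # diagonal up to rounding error
```

Passing standardized variables to the same function would give the correlation-matrix (R) version of the analysis.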

  5. What do we get out of it? • We can form an index measure (i.e. a score) or a weighted average of variables based on a subset of the PCs. • This reduces the number of variables we have to work with. • With some subject matter area knowledge we might be able to interpret the meaning of some of the PCs based on correlations.

  6. How to reduce the number of PCs? • Pick a proportion of variation you want to explain ahead of time, then keep the smallest number of PCs whose eigenvalues, divided by the sum of all eigenvalues (the proportion of variation explained by those PCs), add up to at least that amount. • Scree plots • Keep all PCs with eigenvalue > 1 (Kaiser’s Rule; intended for the correlation matrix R, where the average eigenvalue is 1) • Broken stick method
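
The three numeric rules on this slide can be sketched in plain Python. The eigenvalues below are hypothetical (chosen to sum to 5, as they would for a correlation matrix of 5 variables), and the function names are illustrative. The broken stick rule here keeps leading PCs only while their share of variance exceeds the expected share of the corresponding piece of a randomly broken stick, which is one common formulation.

```python
def n_by_proportion(eigvals, target):
    """Smallest number of PCs whose cumulative share of variance reaches target."""
    total = sum(eigvals)
    running = 0.0
    for k, lam in enumerate(eigvals, start=1):
        running += lam / total
        if running >= target:
            return k
    return len(eigvals)

def n_by_kaiser(eigvals):
    """Kaiser's rule: count eigenvalues greater than 1 (correlation-matrix PCA)."""
    return sum(1 for lam in eigvals if lam > 1.0)

def n_by_broken_stick(eigvals):
    """Keep leading PCs whose share exceeds the broken-stick expectation."""
    p = len(eigvals)
    total = sum(eigvals)
    kept = 0
    for j, lam in enumerate(eigvals, start=1):
        expected = sum(1.0 / i for i in range(j, p + 1)) / p
        if lam / total <= expected:
            break
        kept = j
    return kept

eigvals = [2.5, 1.2, 0.7, 0.4, 0.2]       # hypothetical, sorted in decreasing order
k_prop = n_by_proportion(eigvals, 0.80)   # 3 PCs explain 88% of the variation
k_kaiser = n_by_kaiser(eigvals)           # 2 eigenvalues exceed 1
k_stick = n_by_broken_stick(eigvals)      # only the first PC beats the broken stick
```

The rules disagree on this example (3, 2, and 1 components), which is typical and is one reason a scree plot is usually inspected as well.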

  7. What are some issues? • The scale on which the variables are measured matters: standardize the variables so they are all on the same scale. • Variables with a high amount of variability (i.e. large variance) will naturally steer the decomposition: again, standardize the variables. • When separation occurs perpendicular to an axis (i.e. a PC) it might not be picked up without looking at other axes: plot the pairwise scores for each pair of PCs, although this may require looking at too many graphs to be feasible.
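
A small sketch of the scale issue, assuming NumPy and two synthetic, independent variables recorded in very different units (the variable names are made up for illustration). With the covariance matrix S the large-variance variable takes over the first PC; with the correlation matrix R (i.e. standardized variables) neither variable dominates.

```python
import numpy as np

rng = np.random.default_rng(1)
height_m = rng.normal(1.7, 0.1, size=500)        # synthetic, metres
weight_g = rng.normal(70000, 10000, size=500)    # synthetic, grams
X = np.column_stack([height_m, weight_g])

S = np.cov(X, rowvar=False)        # covariance matrix: depends on the units
R = np.corrcoef(X, rowvar=False)   # correlation matrix: unit-free

vals_S, vecs_S = np.linalg.eigh(S)  # eigh returns eigenvalues in ascending order
vals_R, vecs_R = np.linalg.eigh(R)

first_pc_S = vecs_S[:, -1]                 # weights of the first PC from S
share_S = vals_S[-1] / vals_S.sum()        # share of variance on PC1 using S
share_R = vals_R[-1] / vals_R.sum()        # share of variance on PC1 using R
```

Here `first_pc_S` puts almost all of its weight on `weight_g` simply because grams produce large numbers, while `share_R` stays near one half for these unrelated variables.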

  8. Scree Plot

  9. Biplot of Scores

  10. Factor Analysis – Some Motivating Examples • You have the ratings people give to their family members in areas such as Kindness, Intelligence, Happiness, etc. You want to associate family members with some sort of overall construct underlying these words. • You have conducted a survey and want to group questions based on the topic they address.

  11. What is it? • We assume the variables Y can be summarized by some underlying, unobserved, and reduced set of variables called factors (you must pick how many factors). • Goal is to estimate the factors. • After the factors are estimated, the next goal is to orthogonally rotate the solution to get simpler factors. • For Principal Factor Solution (more later): • Model: Y − μ = loadings × factors + error • var(Y − μ) or corr(Y − μ) = V = loadings × loadingsᵀ + Ψ • The diagonal entries of H = V − Ψ are called the communalities. They are R²-like numbers. • Ψ is called the specific variance.
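
To make the decomposition V = loadings × loadingsᵀ + Ψ concrete, here is a minimal sketch assuming NumPy and a hypothetical loading matrix for four standardized variables and two factors (the numbers are invented for illustration). It builds the implied correlation matrix and reads the communalities off the diagonal of H = V − Ψ.

```python
import numpy as np

# Hypothetical loadings: 4 variables (rows) on 2 factors (columns).
loadings = np.array([[0.9, 0.1],
                     [0.8, 0.2],
                     [0.2, 0.7],
                     [0.1, 0.8]])

# Communality of each variable = sum of its squared loadings.
communalities = (loadings ** 2).sum(axis=1)

# For standardized variables, specific variance = 1 - communality.
psi = np.diag(1.0 - communalities)

V = loadings @ loadings.T + psi   # implied correlation matrix
H = V - psi                       # diagonal of H holds the communalities
```

For the first variable the communality is 0.9² + 0.1² = 0.82, so the two factors account for 82% of its variance and the remaining 18% is specific variance.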

  12. How to Estimate the Factors? • Three main ways: • Principal Component Solution (Not PCA!) • Focuses on the diagonal of V (the variance). • Does poorly on the off diagonal (the covariance). • Principal Factor Solution • Focuses on the off diagonals of V and pretty much ignores the diagonal. • Maximum Likelihood Method • Assume normality of error and estimate the factors and loadings using an iterative MLE method. • May give nonsensical answers (i.e. Heywood case). • Can adjust iterative method so this doesn’t happen. • Rotations are unique.

  13. More On Rotations • If the rotation is orthogonal then • loadings × loadingsᵀ = loadings × rotation × rotationᵀ × loadingsᵀ = (loadings × rotation) × (loadings × rotation)ᵀ • So we can redistribute the total variance and the variation explained for each variable differently among the factors without actually changing them. • Lots of methods to pick rotations.
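
The identity on this slide can be checked numerically. The sketch below assumes NumPy, a hypothetical loading matrix, and an arbitrary 30 degree rotation (none of these come from the talk). The product loadings × loadingsᵀ and the communalities are unchanged, while the variance attributed to each individual factor is redistributed.

```python
import numpy as np

# Hypothetical loadings: 4 variables on 2 factors.
loadings = np.array([[0.9, 0.1],
                     [0.8, 0.2],
                     [0.2, 0.7],
                     [0.1, 0.8]])

theta = np.deg2rad(30)                              # arbitrary rotation angle
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])  # orthogonal matrix

rotated = loadings @ rotation

common_before = loadings @ loadings.T               # loadings x loadings^T
common_after = rotated @ rotated.T                  # identical after rotation

per_factor_before = (loadings ** 2).sum(axis=0)     # variance explained by each factor
per_factor_after = (rotated ** 2).sum(axis=0)       # redistributed, same total
```

Rotation methods such as varimax choose the rotation matrix by optimizing a simplicity criterion instead of fixing an angle by hand.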

  14. Interpreting Analysis • Loadings represent the covariance (or correlation) between factors and variables. • So we look for high loadings to see which underlying factors influence which variables. • With some subject matter knowledge we can name factors based on these loadings (when they make sense).

  15. Some Issues • Results can change depending on model choices (This is a big deal)! • Number of factors • Estimation method • Rotation method • Heywood cases when using MLE. • Existence of actual factors is suspect.

  16. Example

  17. Multivariate T Tests • Univariate t-test • Normal data, with unknown mean and variance • Hotelling’s T² Test • Multivariate normal data with unknown mean and covariance

  18. One Sample Test • Assumptions • Observations are independent and multivariate normal • Testing • Null Hypothesis: μ = μ0 (vectors) • Alternative: μ ≠ μ0 (vectors)

  19. Example: One Sample Test • We are interested in 3 different types of calcium in the soil • We wish to test whether the true mean vector equals the hypothesized means μ0 = (15, 6, 2.85)
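
A sketch of the one-sample test, assuming NumPy. It uses the standard statistic T² = n (ȳ − μ0)ᵀ S⁻¹ (ȳ − μ0) and the conversion F = (n − p) / (p (n − 1)) × T², which follows an F distribution with (p, n − p) degrees of freedom under the null hypothesis. The hypothesized means are the ones on the slide, but the observations are synthetic: the soil data are not reproduced in this transcript, and the function name is illustrative.

```python
import numpy as np

def hotelling_one_sample(Y, mu0):
    """One-sample Hotelling T^2 for H0: mu = mu0. Returns (T^2, F, (df1, df2))."""
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    diff = Y.mean(axis=0) - np.asarray(mu0, dtype=float)
    S = np.atleast_2d(np.cov(Y, rowvar=False))    # p x p sample covariance
    t2 = n * diff @ np.linalg.solve(S, diff)      # n * d^T S^{-1} d
    f_stat = (n - p) / (p * (n - 1)) * t2         # ~ F(p, n - p) under H0
    return t2, f_stat, (p, n - p)

mu0 = np.array([15.0, 6.0, 2.85])                 # hypothesized means from the slide
rng = np.random.default_rng(2)
# Synthetic observations (NOT the soil data from the talk).
Y = mu0 + rng.normal(scale=[3.0, 1.0, 0.5], size=(10, 3))
t2, f_stat, df = hotelling_one_sample(Y, mu0)
```

With a single variable (p = 1) the statistic reduces to the square of the usual one-sample t statistic. A p-value could be obtained by comparing `f_stat` with an F(p, n − p) distribution.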

  20. Two Sample Test • Assumptions • Two groups of multivariate normal data • Observations are independent • Means may be different but covariance is the same for both groups • Testing • Null Hypothesis: μ1 = μ2 (vectors) • Alternative: μ1 ≠ μ2 (vectors)
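
A sketch of the two-sample test under the equal-covariance assumption listed above, assuming NumPy. It pools the two sample covariance matrices, computes T² = (n1 n2 / (n1 + n2)) (ȳ1 − ȳ2)ᵀ S_pooled⁻¹ (ȳ1 − ȳ2), and converts it with F = (n1 + n2 − p − 1) / ((n1 + n2 − 2) p) × T², which has (p, n1 + n2 − p − 1) degrees of freedom under the null hypothesis. The data are synthetic and the names are illustrative.

```python
import numpy as np

def hotelling_two_sample(Y1, Y2):
    """Two-sample Hotelling T^2 with a pooled covariance matrix."""
    Y1, Y2 = np.asarray(Y1, dtype=float), np.asarray(Y2, dtype=float)
    n1, p = Y1.shape
    n2 = Y2.shape[0]
    diff = Y1.mean(axis=0) - Y2.mean(axis=0)
    S1 = np.cov(Y1, rowvar=False)
    S2 = np.cov(Y2, rowvar=False)
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(S_pooled, diff)
    df2 = n1 + n2 - p - 1
    f_stat = df2 / ((n1 + n2 - 2) * p) * t2       # ~ F(p, df2) under H0
    return t2, f_stat, (p, df2)

# Synthetic scores on 4 tests for two groups of 32 (not data from the talk).
rng = np.random.default_rng(3)
group_a = rng.normal(loc=[15, 15, 20, 14], scale=3.0, size=(32, 4))
group_b = rng.normal(loc=[12, 14, 16, 13], scale=3.0, size=(32, 4))
t2, f_stat, df = hotelling_two_sample(group_a, group_b)
```

For two groups of 32 and four variables the F statistic has (4, 59) degrees of freedom.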

  21. Example: Two Sample Test • Four psychological tests were given to 32 men and 32 women • We are interested in seeing if the mean vectors are the same

  22. Other Tests • Two sample paired test • Use difference vector D = X1 – X2 • Partial Tests • Testing μi = μi0 in the presence of the other (p-1) means • What about more than 2 groups? • We had ANOVA instead of a t-test • Now we have MANOVA instead of a T² test

  23. Multivariate Analysis of Variance MANOVA • Suppose we have data organized into several groups, with each observation giving a vector of responses • We would like to test the hypothesis that all the means for each of the groups are equal • We can do this in a manner very similar to the univariate Analysis of Variance (ANOVA)

  24. MANOVA • In ANOVA • We compare Sums of Squares within groups to Sums of Squares between groups • Sums of Squares are the sums of the squared differences between the observed values and the means • In MANOVA • We compare Sums of Squares matrices from within the groups to those between the groups • E is the “within” Sums of Squares matrix • H is the “between” Sums of Squares matrix
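
The two matrices can be computed directly from grouped data. The sketch below assumes NumPy and synthetic groups (the function name `manova_sscp` is illustrative). E accumulates the cross-products of deviations from each group mean, H accumulates the weighted cross-products of group means about the grand mean, and together they add up to the total sums of squares and cross-products matrix, just as the sums of squares do in ANOVA.

```python
import numpy as np

def manova_sscp(groups):
    """Return (E, H): within- and between-group sums of squares matrices."""
    all_obs = np.vstack(groups)
    grand_mean = all_obs.mean(axis=0)
    p = all_obs.shape[1]
    E = np.zeros((p, p))
    H = np.zeros((p, p))
    for Y in groups:
        mean_i = Y.mean(axis=0)
        resid = Y - mean_i                       # deviations from the group mean
        E += resid.T @ resid
        d = (mean_i - grand_mean)[:, None]       # group mean vs. grand mean
        H += Y.shape[0] * (d @ d.T)
    return E, H

# Three synthetic groups of 8 observations on 2 responses.
rng = np.random.default_rng(4)
groups = [rng.normal(loc=m, size=(8, 2)) for m in (0.0, 0.5, 2.0)]
E, H = manova_sscp(groups)

# Eigenvalues of E^{-1} H, largest first; the MANOVA test statistics use these.
eigvals = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1]
```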

  25. Four Tests • There are four tests based on the eigenvalues of E⁻¹H: λ1 > λ2 > … > λs with s ≤ pd • Pillai: V(s) = Σ λi / (1 + λi) • Lawley-Hotelling: U(s) = Σ λi • Wilks’ Lambda: Λ = Π 1 / (1 + λi) (reject for small values) • Roy’s Largest Root: θ = λ1 / (1 + λ1)
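
The formulas for the four statistics did not survive in this transcript, so the sketch below uses their standard definitions in terms of the eigenvalues λi of E⁻¹H, with the symbols used on the later slides: V(s) = Σ λi / (1 + λi), U(s) = Σ λi, Λ = Π 1 / (1 + λi), and θ = λ1 / (1 + λ1). Some software reports Roy's statistic as λ1 itself, so the definition should be checked before comparing with tables. The eigenvalues and the function name are hypothetical.

```python
import math

def manova_statistics(eigvals):
    """Four MANOVA test statistics from the eigenvalues of E^{-1} H."""
    lam = sorted(eigvals, reverse=True)
    return {
        "pillai": sum(l / (1 + l) for l in lam),          # V(s)
        "lawley_hotelling": sum(lam),                     # U(s)
        "wilks": math.prod(1 / (1 + l) for l in lam),     # Lambda, small values reject
        "roy": lam[0] / (1 + lam[0]),                     # theta, largest root only
    }

stats = manova_statistics([1.0, 0.25])   # hypothetical eigenvalues
# pillai = 0.5 + 0.2 = 0.7, lawley_hotelling = 1.25,
# wilks = 0.5 * 0.8 = 0.4, roy = 0.5
```

Wilks' Lambda is the only one of the four for which small values are evidence against the null hypothesis; the other three reject for large values.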

  26. Comparison of the Four Tests • In the collinear case • The groups have means that lie on a line in space (approximately) • θ ≥ U(s) ≥ Λ ≥ V(s) in terms of power • In the diffuse case • The group means are spread out in a higher dimensional space (not a line) • θ ≤ U(s) ≤ Λ ≤ V(s) in terms of power

  27. Post-Test Analysis • Just like with ANOVA, after the test we can • Do pair-wise comparisons or contrasts • In MANOVA we can also • Do tests for the p individual variables • F tests to identify which variables are different

  28. Example: Rootstock Data • We wish to compare apple trees of different rootstocks • We have 8 trees from each of 6 rootstocks • Our four measurements are • Trunk girth at 4 years (y1) • Extension growth at 4 years (y2) • Trunk girth at 15 years (y3) • Extension growth at 15 years (y4)

  29. Rootstock Data • Test Results (compared with .05 critical values) • Λ = .154 < Λ(.05; 4, 5, 40) = .455 • V(s) = 1.305 > V(s)(.05) = .645 • U(s) = 2.921 > U(s)(.05) • θ = .652 > θ(.05) = .377 • Follow-up tests for individual variables • y1 : F = 1.93, p = .1094 • y2 : F = 2.91, p = .024 • y3 : F = 11.97, p < .0001 • y4 : F = 12.16, p < .0001

  30. Extensions • Two-way MANOVA • Multivariate Contrasts • Mixed Models • Split plot designs • Profile Analysis • Different R²-like numbers

  31. Multidimensional Scaling (MDS) • Data is a distance or similarity matrix • Many ways to generate • Goal is to reduce dimension and visualize • Often look at only 2 or 3 dimensions • Motivating Examples • Number of teeth for different species of mammals • Discriminating between colors (red vs. orange) • Distances between cities

  32. Two Kinds of MDS • Metric scaling (principal coordinates analysis) • Distances (Euclidean) in the reduced dimension are close to those measured in the full dimension • Non-metric scaling • Rank order of distances in the reduced dimension are close to those measured in the full dimension

  33. Types of Measures • There are MANY measures that can be used • Depends on type of data • Depends on interest in observations vs. variables • Properties • 1. Minimum of 0, D(x,y) = 0 if x = y • 2. Positive otherwise, D(x,y) > 0 if x ≠ y • 3. Symmetric, D(x,y) = D(y,x) • 4. Triangle Inequality, D(x,y) + D(y,z) ≥ D(x,z)

  34. Types of Measures • Measures that satisfy 1-4 are called Metrics • Measures satisfying 1-3 are Semi-metrics • Some measures have negative values and are called Non-metrics • Certain measures can be plotted or visualized in a Euclidean space • Distances and relationships plotted are meaningful • This is a stronger property than the triangle inequality
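
The distinction between metrics and semi-metrics can be illustrated with a brute-force check in plain Python. The points below are hypothetical integer vectors, and the check only covers those points, so it can show a property failing but cannot prove it holds in general. Manhattan (city block) distance passes all four properties, while squared Euclidean distance passes properties 1-3 but breaks the triangle inequality.

```python
from itertools import product

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def is_semi_metric(dist, points):
    """Properties 1-3: zero for identical points, positive otherwise, symmetric."""
    for x, y in product(points, repeat=2):
        d = dist(x, y)
        if (x == y and d != 0) or (x != y and d <= 0) or d != dist(y, x):
            return False
    return True

def is_metric(dist, points):
    """Properties 1-4: a semi-metric that also obeys the triangle inequality."""
    if not is_semi_metric(dist, points):
        return False
    return all(dist(x, y) + dist(y, z) >= dist(x, z)
               for x, y, z in product(points, repeat=3))

points = [(0, 0), (1, 0), (2, 0), (1, 3)]   # hypothetical count vectors

manhattan_is_metric = is_metric(manhattan, points)                # True
sq_euclid_is_semi = is_semi_metric(squared_euclidean, points)     # True
sq_euclid_is_metric = is_metric(squared_euclidean, points)        # False
```

The failure comes from the three collinear points: the squared distance from (0, 0) to (2, 0) is 4, which exceeds 1 + 1 via (1, 0).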

  35. Measures for our Examples • Mammal teeth - counts of teeth types • Manhattan (city block) distance • Total teeth different between two species • Difference between colors (Ekman) • Similarity measure – converted to distance • How well people distinguish between colors • We use the Kruskal measure (non-metric) • Distances between cities • Euclidean distance • Miles between cities

  36. Basic Procedure for MDS • Metric Scaling • Eigenvalue/eigenvector decomposition • Choose a reduced number of components that still preserves distances • Create new coordinates based on reduced components • Non-metric scaling • Reduce dimensions but preserve rank order • Done using Isotonic regression and iterative algorithms
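
The metric scaling steps above can be sketched with NumPy as classical (Torgerson) scaling: square the distances, double-center them, take the eigen-decomposition, and build coordinates from the leading components. The points are synthetic and the function names are illustrative. Because the synthetic points really live in 2 dimensions, two components reproduce the distances exactly; real distance matrices are usually only approximated.

```python
import numpy as np

def classical_mds(D, k=2):
    """Metric (classical) scaling of an n x n distance matrix D into k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)       # B is symmetric
    order = np.argsort(eigvals)[::-1][:k]      # keep the k largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)   # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)    # new coordinates

def pairwise_distances(P):
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Ten synthetic locations in the plane (not the city data from the talk).
rng = np.random.default_rng(5)
points = rng.uniform(0, 100, size=(10, 2))
D = pairwise_distances(points)

coords = classical_mds(D, k=2)
D_hat = pairwise_distances(coords)             # distances in the reduced space
```

The recovered coordinates match the original configuration only up to rotation, reflection, and translation, since distances carry no information about orientation. Non-metric scaling needs an iterative algorithm and is not covered by this sketch.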

  37. Example: Teeth Data • 32 mammals and 8 categories of teeth • We are interested in how “close” these mammals are based on their teeth counts • We use city block distance and want to reduce things to 2 dimensions (from 8)

  38. Teeth Data

  39. Example: Ekman Color Study • 14 different wavelengths • 31 subjects asked to rate how well they could distinguish between different pairs • Ratings were averaged and scaled to get a similarity index between 0 and 1 • We use non-metric scaling and look at a reduction to 2 dimensions (from 14)

  40. Color Study

  41. Example: Distances between cities • We have 10 U.S. cities and distances between all pairs • Can we reduce this distance matrix to a lower dimension like 2 (from 10)?

  42. City Distances

  43. Comments on MDS • There are MANY measures we can use • Some make more sense than others • It depends on the data and what you are interested in • Different measures can lead to different results • How many dimensions should you use? • It’s easiest to explain 2 or 3 dimensions • There are different criteria or guidelines for metric and non-metric scaling

  44. One More Example • Suppose we have data that can be organized into a two-way table of binary or count values. • For a small table we can do some contingency table analyses like tests for homogeneity or independence. • For large tables we might like to reduce or summarize the table. • One method is called Correspondence Analysis

  45. Correspondence Analysis • Our distance measure is the Pearson chi-square measure between the observed cell value and its expected value. • As before, we need to decide if we are interested in our subjects or our variables • Similar or analogous to PCA and MDS in terms of dimension reduction and interpretation. • Unfortunately, the terminology is a little different. So be careful.
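
A minimal sketch of the computation behind correspondence analysis, assuming NumPy and a hypothetical table of counts (not the postal data). Each cell is turned into a standardized residual between its observed and expected proportion, and a singular value decomposition of that residual matrix gives the dimensions. The squared singular values are the inertias, and their total equals the Pearson chi-square statistic divided by the table total.

```python
import numpy as np

# Hypothetical two-way table of counts (rows: groups, columns: categories).
N = np.array([[20.0, 5.0, 2.0],
              [10.0, 10.0, 4.0],
              [3.0, 6.0, 15.0]])

n = N.sum()
P = N / n                              # correspondence matrix (proportions)
r = P.sum(axis=1)                      # row masses
c = P.sum(axis=0)                      # column masses
expected = np.outer(r, c)              # expected proportions under independence

Z = (P - expected) / np.sqrt(expected)             # standardized residuals
U, sing, Vt = np.linalg.svd(Z, full_matrices=False)

inertia = sing ** 2                                # inertia per dimension
row_coords = (U * sing) / np.sqrt(r)[:, None]      # row principal coordinates
col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]   # column principal coordinates

chi2 = ((N - n * expected) ** 2 / (n * expected)).sum()   # Pearson chi-square
```

Plotting the first two columns of `row_coords` and `col_coords` gives the usual correspondence analysis map. The last singular value is zero because a table with 3 rows and 3 columns has at most 2 non-trivial dimensions.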

  46. Example: Postal Employees • Postal employees for 6 positions were drug tested • Results include negative, marijuana, cocaine, and other • We are interested in identifying any patterns or trends

  47. Postal Employees

  48. Sources • We compiled the information for this talk from Methods of Multivariate Analysis, 2nd ed., by Alvin C. Rencher and from our notes from STAT 5504 compiled by Dr. Eric Smith, Dept. of Statistics. • Thanks! Any questions?
