Class 3 Relationship Between Variables

SKEMA Ph.D programme 2010-2011. Class 3 Relationship Between Variables. Lionel Nesta Observatoire Français des Conjonctures Economiques Lionel.nesta@ofce.sciences-po.fr. Relationship Between Variables. Which variables are we looking at ?. Qualitative × Qualitative. Qualitative ×

Presentation Transcript


  1. SKEMA Ph.D programme 2010-2011 Class 3: Relationship Between Variables Lionel Nesta, Observatoire Français des Conjonctures Economiques, Lionel.nesta@ofce.sciences-po.fr

  2. Relationship Between Variables Which variables are we looking at? • Qualitative × Qualitative • Qualitative × Quantitative • Quantitative × Quantitative This part: ANOVA

  3. ANOVA

  4. ANOVA: ANalysis Of VAriance • ANOVA is a generalization of Student's t test • Student's t test applies to two categories only: • H0: μ1 = μ2 • H1: μ1 ≠ μ2 • ANOVA is a method to test whether the means of k groups are equal or not. • H0: μ1 = μ2 = μ3 = ... = μk • H1: At least one mean differs from the others

  5. ANOVA The method takes its name from the fact that it is based on measures of variance. The F statistic is a ratio comparing the variance due to group differences (explained variance) with the variance due to other phenomena (unexplained variance). A higher F means that the groups explain a larger share of the variance, and thus that the differences between group means are more significant.

  6. Revenues (in millions of US$)

  7. Revenues (in millions of US$)

  8. Revenues (in millions of US$) Do sectors differ significantly in their revenues? • H0: μ1 = μ2 = μ3 = ... = μk • H1: At least one mean differs from the others.

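  9. ANOVA The total variation is decomposed into a part explained by group differences and a residual part: • Total: df = n – 1 • Explained (between groups): df = k – 1 • Residual (within groups): df = n – k This decomposition produces Fisher's statistic as follows: $F = \frac{SS_{between} / (k - 1)}{SS_{residual} / (n - k)}$, where k is the number of groups, n the number of observations and SS a sum of squares.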
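
As an illustration of the decomposition, here is a minimal Python sketch (not Stata output) that computes the explained and residual sums of squares and the resulting F statistic. The function name `one_way_anova` and the revenue figures are made up for the example; they are not the course data.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_residual) for a list of groups of observations."""
    k = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total number of observations
    grand_mean = mean([x for g in groups for x in g])
    # Explained variation: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residual variation: observations around their own group mean
    ss_residual = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_residual = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_residual / df_residual)
    return f_stat, df_between, df_residual

# Made-up revenues for three hypothetical sectors (illustration only)
sectors = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_r = one_way_anova(sectors)
print(f_stat, df_b, df_r)
```

The F value is then compared with the critical value of the F distribution with (k – 1, n – k) degrees of freedom, which is what the p-value reported by statistical software summarises.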

  10. The ANOVA decomposition on Revenues The result tells me that I can reject the null hypothesis H0 with a 0.03% chance of rejecting H0 while H0 holds true (that is, of being wrong). I WILL TAKE THE CHANCE!!!

  11. Comparison of Means Using Student t with STATA • We still use the same command: ttest • ttest var1, by(varcat) • For example: • ttest lnassets, by(type) • ttest lnrd, by(year) • ttest lnrdi, by(type) • Beware! Unlike ANOVA, Student's t test can only be performed to compare two categories.
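
To make the link with ANOVA concrete, the following Python sketch computes a two-sample Student t statistic in its pooled-variance form, which assumes equal variances in the two groups (an assumption of this sketch). The function name `pooled_t_test` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_test(sample1, sample2):
    """Return (t, df) for a two-sample t test assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled estimate of the common variance (variance() divides by n - 1)
    pooled_var = ((n1 - 1) * variance(sample1)
                  + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (mean(sample1) - mean(sample2)) / std_error
    return t_stat, n1 + n2 - 2

# Made-up values for two hypothetical categories (illustration only)
group_a = [1.0, 2.0, 3.0]
group_b = [5.0, 6.0, 7.0]
t_stat, df = pooled_t_test(group_a, group_b)
print(t_stat, df)
```

With exactly two groups, the square of this t statistic equals the one-way ANOVA F statistic, which is the sense in which ANOVA generalizes the t test.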

  12. ANOVA under STATA • We use the command: anova • anova var1 varcat • For example: • anova lnassets isic • anova lnrd isic • anova lnrdi isic • anova cours titype

  13. STATA Application: ANOVA [Annotated Stata output: Stata instruction, sum of squares, F statistic, p-value]

  14. ANOVA Example in a Published Paper

  15. SPSS Application: ANOVA • Verify that US companies are larger than those from the rest of the world with an ANOVA • Are there systematic sectoral differences in terms of labour, R&D and sales? • Write out H0 and H1 for each variable • Analyse → Comparer les moyennes → ANOVA à un facteur (English interface: Analyze → Compare Means → One-Way ANOVA) • What do you conclude at the 5% level? • What do you conclude at the 1% level?

  16. SPSS Application: t test comparing means

  17. SPSS Application: t test comparing means

  18. Relationship Between Variables Which variables are we looking at? • Qualitative × Qualitative • Qualitative × Quantitative • Quantitative × Quantitative This part: Chi-Square Independence Test

  19. Chi-Square Independence Test

  20. Introduction to Chi-Square • This part is devoted to the study of whether two qualitative (categorical) variables are independent: • H0: Independent: the two qualitative variables do not exhibit any systematic association. • H1: Dependent: the category of one qualitative variable is associated with the category of another qualitative variable in some systematic way which departs significantly from randomness.

  21. The Four Steps Towards The Test • Build the cross tabulation to compute observed joint frequencies • Compute expected joint frequencies under the assumption of independence • Compute the Chi-square (χ²) distance between observed and expected joint frequencies • Compute the significance of the χ² distance and conclude on H0 and H1

  22. 1. Cross Tabulation • A cross tabulation displays the joint distribution of two or more variables. Cross tabulations are usually referred to as contingency tables. • A contingency table describes the distribution of two (or more) variables simultaneously. Each cell shows the number of respondents that gave a specific combination of responses, that is, each cell contains a single cross tabulation.

  23. 1. Cross Tabulation • We have data on two qualitative and categorical dimensions and we wish to know whether they are related • Region (AM, ASIA, EUR) • Type of company (DBF, LDF)

  26. 1. Cross Tabulation • Crossing Region (AM, ASIA, EUR) × Type of company (DBF, LDF) • tabulate continent type

  27. 2. Expected Joint Frequencies • In order to say something about the relationship between two categorical variables, we need the expected (also called theoretical) frequencies, that is, the frequencies we would observe if the two variables were independent.
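
Under independence, the expected frequency of a cell is its row total times its column total, divided by the grand total. The Python sketch below illustrates this rule; the function name `expected_frequencies` and the counts are made up for the example and are not the course data.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count = row total * column total / grand total
    return [[r * c / total for c in col_totals] for r in row_totals]

# Made-up counts: rows = regions (AM, ASIA, EUR), columns = types (DBF, LDF)
observed = [[30, 10],
            [5, 15],
            [15, 25]]
expected = expected_frequencies(observed)
print(expected)
```

Each expected row sums to the same total as the corresponding observed row; only the split across columns changes.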

  28. 2. Expected Joint Frequencies • tabulate continent type , expected

  29. 3. Computing the χ² statistic • We can now compare what we observe with what we would observe if the two variables were independent. The larger the difference, the stronger the evidence against independence. This difference is termed a Chi-Square distance: $\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, where O and E are the observed and expected joint frequencies. With a contingency table of n rows and m columns, the statistic follows a χ² distribution with (n-1)×(m-1) degrees of freedom, provided that the lowest expected frequency is at least 5.
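
The computation can be sketched in a few lines of Python. The function name `chi_square` and the counts are hypothetical. The p-value line relies on the fact that, for 2 degrees of freedom only, the upper tail of the χ² distribution is exp(-χ²/2); other degrees of freedom need a statistical table or library.

```python
from math import exp

def chi_square(observed):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp_count = row_totals[i] * col_totals[j] / total  # expected under H0
            chi2 += (obs - exp_count) ** 2 / exp_count
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Made-up counts: rows = regions (AM, ASIA, EUR), columns = types (DBF, LDF)
observed = [[30, 10],
            [5, 15],
            [15, 25]]
chi2, df = chi_square(observed)
# Closed form valid for df == 2 only (a 3 x 2 table, as here)
p_value = exp(-chi2 / 2)
print(chi2, df, p_value)
```

A small p-value means that a distance this large would rarely arise if the two variables were independent.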

  30. 3. Computing the χ² statistics • tabulate continent type , expected chi2

  31. 4. Conclusion on H0 versus H1 • We reject H0 with a reported 0.00% chance of being wrong • I will take the chance, and I tentatively conclude that the type of company and the region of origin are not independent. • Using our appreciative knowledge of biotechnology, it makes sense: biotechnology was born in the USA, with European companies following and Asian (i.e. Japanese) companies being mainly large pharmaceutical companies. • Most DBFs are found in the US, then in Europe. This is less true now.

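  32. 2. SPSS: Expected Joint Frequencies • Analyse → Statistiques descriptives → Tableaux croisés → Cellules → Observé & Théorique (English interface: Analyze → Descriptive Statistics → Crosstabs → Cells → Observed & Expected)

  33. 3. SPSS: Computing the χ² statistic • Analyse → Statistiques descriptives → Tableaux croisés → Statistiques → Chi-deux (English interface: Analyze → Descriptive Statistics → Crosstabs → Statistics → Chi-square)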

  34. Relationship Between Variables Which variables are we looking at? • Qualitative × Qualitative • Qualitative × Quantitative • Quantitative × Quantitative This part: Correlations

  35. Correlations

  36. Introduction to Correlations • This part is devoted to the study of whether, and the extent to which, two or more quantitative variables are related: • Positively correlated: the values of one continuous variable “varying somewhat in step” with the values of another variable • Negatively correlated: the values of one continuous variable “varying somewhat in opposite step” with the values of another variable • Not correlated: the values of one continuous variable “varying randomly” when the values of another variable vary.

  37. Scatter Plot of R&D and Patents (log)

  38. Scatter Plot of R&D and Patents (log)

  39. Pearson’s Linear Correlation Coefficient r • The Pearson product-moment correlation coefficient is a measure of the co-relation between two variables x and y. • Pearson's r reflects the intensity of the linear relationship between two variables. It ranges from -1 to +1. • r near 1: Positive correlation • r near -1: Negative correlation • r near 0: No or poor correlation

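  40. Pearson’s Linear Correlation Coefficient r $r = \frac{Cov(x,y)}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$ • Cov(x,y): covariance between x and y • σx and σy: standard deviation of x and standard deviation of y • n: number of observations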
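
As a small worked example of the formula, the Python sketch below computes r from sums of cross-products and squared deviations; the normalising factor of the covariance and of the standard deviations cancels out, so it is omitted. The function name `pearson_r` and the data are made up.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson's linear correlation coefficient between two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    # Sum of cross-products of deviations (numerator of the covariance)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (numerators of the two variances)
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / (sqrt(ss_x) * sqrt(ss_y))

# Made-up paired observations (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(r)
```

Here the cross-product sum is 6 and the sums of squares are 10 and 6, so r = 6 / √60, a fairly strong positive correlation.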

  41. Pearson’s Linear Correlation Coefficient r • corr lpat lassets lrd lrdi lpat_assets

  42. Pearson’s Linear Correlation Coefficient r • Is r significantly different from 0? • H0: r_{x,y} = 0 • H1: r_{x,y} ≠ 0 • Test statistic: $t^* = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$ • If |t*| > t with (n – 2) degrees of freedom and critical probability α (5%), we reject H0 and conclude that r is significantly different from 0.
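
The test can be sketched in Python as follows. The values of r and n are hypothetical, and the critical value 2.060 is the two-sided 5% value of Student's t with 25 degrees of freedom, taken from a standard table rather than computed.

```python
from math import sqrt

def correlation_t(r, n):
    """t* statistic for H0: r = 0, to be compared with Student's t at n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical correlation and sample size (illustration only)
r, n = 0.5, 27
t_star = correlation_t(r, n)

# Two-sided 5% critical value for 25 degrees of freedom (standard t table)
T_CRITICAL_25_DF = 2.060
reject_h0 = abs(t_star) > T_CRITICAL_25_DF
print(t_star, reject_h0)
```

The same r can be significant or not depending on n: with few observations, even a sizeable correlation may fail to exceed the critical value.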

  43. Pearson’s Linear Correlation Coefficient r • pwcorr lpat lassets lrd lrdi lpat_assets, sig

  44. Pearson’s Linear Correlation Coefficient r • Assumptions of Pearson’s r • There is a linear relationship between x and y • Both x and y are continuous random variables • Both variables are normally distributed • Equal differences between measurements represent equivalent intervals • We may want to relax one or more of these assumptions

  45. Spearman’s Rank Correlation Coefficient ρ • Spearman's rank correlation is a non-parametric measure of the intensity of a correlation between two variables, without making any assumptions about the distribution of the variables, i.e. about the linearity, normality or scale of the relationship. • ρ near 1: Positive correlation • ρ near -1: Negative correlation • ρ near 0: No or poor correlation

  46. Spearman’s Rank Correlation Coefficient ρ $\rho = 1 - \frac{6 \sum d^2}{n (n^2 - 1)}$ (exact when there are no tied ranks) • d: the difference between the ranks of paired values of x and y • n: number of observations • ρ is simply a special case of the Pearson product-moment coefficient in which the data are converted to ranks before calculating the coefficient.
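
The rank-based formula can be illustrated with a short Python sketch. It assumes there are no tied values (ties would need average ranks, which this sketch does not handle). The function names `ranks` and `spearman_rho` and the data are made up.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rho from the sum of squared rank differences (no ties)."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Made-up paired observations (illustration only)
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rho = spearman_rho(x, y)
print(rho)
```

Because only the ranks enter the computation, any monotonic transformation of x or y (taking logs, for instance) leaves ρ unchanged.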

  47. Spearman’s Rank Correlation Coefficient ρ • spearman lpat lassets lrd lrdi lpat_assets

  48. Spearman’s Rank Correlation Coefficient ρ • spearman lpat lassets lrd lrdi lpat_assets, stats(rho p)

  49. Pearson’s r or Spearman’s ρ? • Relationship between tastes and levels of consumption on a large sample? (ρ) • Relationship between income and Consumption on a large sample? (r) • Relationship between income and Consumption on a small sample? Both (ρ) and (r)

  50. Pearson’s Linear Correlation Coefficient r • Analyse → Corrélation → Bivariée (English interface: Analyze → Correlate → Bivariate) • Click on Pearson
