320 likes | 456 Views
Doing Quantitative Research 26E02900, 6 ECTS Cr. Olli-Pekka Kauppila Daria Volchek. Lecture II - May 14, 2014. Today’s lecture. AM session Descriptive statistics, assumptions for regression analyses PM session Introduction to regression analysis. Learning objectives – AM session.
E N D
Doing Quantitative Research26E02900, 6 ECTS Cr. Olli-Pekka Kauppila Daria Volchek Lecture II - May 14, 2014
Today’slecture AM session • Descriptive statistics, assumptions for regression analyses PM session • Introduction to regression analysis
Learning objectives – AM session Deepen the understanding of research design and measures Improve skills at using SPSS Understand different ways of dealing with missing values Learn more about computing variables Learn the assumptions for multivariate analyses Learn to make graphs to examine and illustrate your data Learn to identify and deal with outlier observations Learn more about interpreting correlations
Opening an excelfile in SPSS Open SPSS software File → Open → Data Find and open your excel file When the dataset is open, save it in .sav form • File → Save as
How to deal with missingvalues? No matter where and how you collect your data, you are likely to have missing values Missing values are not a problem, provided that there are not too many of them, and you deal with them appropriately • Remove any cases with a high number of missing values • Do not use variables with a high number of missing values • Common remedies • Deletion: if missing values are few and randomly distributed • Replace with means: if missing values are relatively few • Multiple imputation: use software to estimate the missing values based on what is known about the case
Computing variables to make them analyzable Before analyses, you usually need to modify your variables Examples: • Transform reversely coded items: item 4B on job satisfaction scale is reversely coded. Thus, 4B_re: 5→1; 4→2; 3→3; 2→4; 1→5 • Compute summated scales: Job satisfaction = (4A + 4B_re + 4C + 4D) / 4 • Create dummies: e.g. Firm 1: Firm 1 = 1; others = 0 • Transform other variables: e.g. Employee age = data collection year - year of birth SPSS: Transform → Compute variable
Focus on variables that you use in your analyses We need to use the following primary variables • Job satisfaction (4A, 4B*, 4C, 4D) • Risk avoidance (11A*, 11B*, 11C*, 11D*, 11F) • Perceived managerial support (16A, 16B, 16C, 16D, 16E, 16F) * = Reverse-coded items • And the following background variables • Age (= Data collection year - Birth year) • Gender (1 = Female, 0 = others) • Firm membership (three different firms → Create 3 dummy variables)
Analyze reliability of the perceptual measures Analyze → Scale → Reliability analysis Go to “statistics;” select all items from “descriptives for” • Job satisfaction (4A, 4B_re, 4C, 4D): Alpha = .79 → ok • Risk avoidance (11A_re, 11B_re, 11C_re, 11D_re, 11F): Alpha = .65; but .82 ifwedropitem 11F → dropit→ ok • Perceived managerial support (16A, 16B, 16C, 16D, 16E, 16F): Alpha = .96 → ok • Transform → Computevariable • E.g. Target variable: Jobsat; Numericexpression: (@4B_re+@4A+@4C+@4D)/4
Assumptions for multivariate analyses Normal distribution Homoscedasticity • I.e. equal levels of variance across the range of predictor variables Linearity Absence of uncorrelated errors • I.e. relevant but unmeasured variables do not bias the results
Graphical examination of data SPSS: Graphs → Chart builder Like in Excel, you find bar, line, area, and pie charts that you may use to depict your data Particularly useful graphs: • Scatter plots • Histograms • Boxplots • Correlation analysis gives you an overview of how different variables are related one another
Scatterplot Linear regression line Locally weighted regression line How employee age relates to role clarity?
How job satisfaction is related to managerial support? Or, perhaps the effect is not linear after all…
Histograms How values for employee role clarity are distributed?
Boxplots Are distributions of role clarity any different for male and female employees?
Skewness and kurtosis Analyze → Descriptive statistics → Descriptives → Options → Skewness and kurtosis When there is no skewness or kurtosis, the variable is normally distributed
Skewness Positively skewed distribution Negatively skewed distribution Common remedies; transform the variable by taking: Logarithm or squared term Squared or cubed terms
Kurtosis Peaked distribution - positive value Flat distribution - negative value Common remedies; transform the variable by taking: Try all transformations Inverse of the variable (1 / X or Y)
How risk avoidance is related to employee age?What can you tell about the distribution of job satisfaction?Does the level of perceived managerial support vary between firms? Classroom exercise
How risk avoidance is related to employee age? Older employees tend to be more risk averse than younger employees
What can you tell about the distribution of job satisfaction?
Does the level of perceived managerial support vary between firms?
Outliers Outliers are observations that deviate substantially from other observations The key question is: why is it that the outlier observation is so different? In general, if the outlier observation seems to be caused by a mistake, then it should be deleted • I.e. a respondent’s birth year is marked as 1776 If the outlier observation is substantially different from other observations, but nevertheless a “legitimate member” of the sample, it should be retained • I.e. annual salaries of some (very few) individuals are millions of euros
Outliers Why these three individuals have such a low level of role clarity? Should we remove these outliers from the analysis?
Correlationanalysis Correlation analysis shows you how different variables are related to one another When the sample size increases, even relatively weak correlations become statistically significant Because of multicollinearity, you do not want to include strongly correlated independent variables into the same model Note: correlation does not imply causation! SPSS: Analyze → Correlate → Bivariate
Model Dependent variable: • Job satisfaction Independent variables: • Risk avoidance • Perceived managerial support • Control variables: • Gender • Age • Firm affiliation
Correlationtable - output In most datasets, correlations above .2 are statistically significant. Correlations above .5 are very strong Usual cutoff values: p < .001 p < .01 p < .05
This correlation is not significant. Will that be a problem? Correlationtable - output What do these correlations tell us? Should we exclude a firm dummy from the regression model?
Correlationtable - output Will risk avoidance cause job satisfaction? What is the role of employee age?
Key descriptives Besides correlations, researchers usually report means and standard deviations of the variables • Mean is informative as it gives you a fairly good understanding of the overall level of variables. I.e. It is quite different if the mean value of job satisfaction is 2.2, rather than 3.9 (on a 5-point Likert scale) • Standard deviation is informative, because it helps you interpret high and low values. Values one standard deviation above the mean value are usually considered as “high,” and values one standard deviation below the mean value are “low” I.e. If job satisfaction’s mean value is 3.9 and standard deviation 0.4. Thus, job satisfaction of 4.3 is “high” and 3.5 is “low.”