Introduction to Logistic Regression
Simple linear regression Table 1 Age and systolic blood pressure (SBP) among 33 adult women
Columns: SBP (mm Hg), Age (years)
• Adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974
Simple linear regression
• Relation between 2 continuous variables (SBP and age)
• Regression coefficient b1
  • Measures association between y and x
  • Amount by which y changes on average when x changes by one unit
• Least squares method
[Plot: y against x, with the slope of the fitted line marked]
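The least squares method named above can be sketched in a few lines. This is a minimal illustration, not the analysis behind Table 1: the age and SBP values are made up, and the intercept name `b0` and the helper `least_squares_line` are assumptions introduced here.

```python
import numpy as np

# Hypothetical (made-up) values, NOT the data of Table 1.
age = np.array([25.0, 35.0, 45.0, 55.0, 65.0, 75.0])
sbp = np.array([118.0, 124.0, 131.0, 140.0, 149.0, 158.0])

def least_squares_line(x, y):
    """Return (b0, b1) minimising sum((y - b0 - b1 * x) ** 2)."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1

b0, b1 = least_squares_line(age, sbp)
print(f"SBP = {b0:.1f} + {b1:.2f} * age")
```

Here `b1` is the amount by which the fitted SBP changes, on average, for each additional year of age.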
Multiple linear regression
• Relation between a continuous variable and a set of i continuous variables
• Partial regression coefficients bi
  • Amount by which y changes on average when xi changes by one unit and all the other xi remain constant
  • Measures association between xi and y adjusted for all other xi
• Example
  • SBP versus age, weight, height, etc.
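A rough sketch of how partial regression coefficients could be estimated, assuming simulated data: the "true" coefficients (0.6 per year of age, 0.2 per kg of weight), the noise level, and all variable names are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Simulated predictors (hypothetical ranges).
age = rng.uniform(20, 80, n)       # years
weight = rng.uniform(50, 100, n)   # kg

# Simulated outcome with assumed coefficients plus random noise.
sbp = 90.0 + 0.6 * age + 0.2 * weight + rng.normal(0, 5, n)

# Design matrix: a column of ones for the intercept, then one column per xi.
X = np.column_stack([np.ones(n), age, weight])

# Least squares solution of X @ coef ~ sbp.
coef, *_ = np.linalg.lstsq(X, sbp, rcond=None)
b0, b_age, b_weight = coef

print(f"SBP = {b0:.1f} + {b_age:.2f} * age + {b_weight:.2f} * weight")
```

`b_age` estimates the average change in SBP per year of age with weight held constant, which is the "adjusted for all other xi" reading given above.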
Multiple linear regression: equivalent terms
y                    xi
Predicted            Predictor variables
Response variable    Explanatory variables
Outcome variable     Covariables
Dependent            Independent variables
General linear models
• Family of regression models
• Outcome variable determines choice of model
Outcome       Model
Continuous    Linear regression
Counts        Poisson regression
Survival      Cox model
Binomial      Logistic regression
Logistic regression
• Models the relationship between a set of variables xi
  • dichotomous (yes/no)
  • categorical (social class, ...)
  • continuous (age, ...)
  and a dichotomous (binary) variable Y
• Dichotomous outcome is a very common situation in many fields
Logistic regression (1) Table 2 Age and signs of coronary heart disease (CD)
How can we analyse these data?
• Compare mean age of diseased and non-diseased
  • Non-diseased: 38.6 years
  • Diseased: 58.7 years (p<0.0001)
• Linear regression?
Logistic regression (2) Table 3 Prevalence (%) of signs of CD according to age group
Dot-plot: Data from Table 3
[Plot: Diseased % (y-axis) against Age group (x-axis)]
Logistic function (1)
[Plot: Probability of disease (y-axis) against x]
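The formula for this curve did not survive on the slide. Assuming a single predictor x and the coefficients a and b used in the logit transformation, the standard logistic function would read:

```latex
P(y \mid x) = \frac{e^{a + bx}}{1 + e^{a + bx}} = \frac{1}{1 + e^{-(a + bx)}}
```

The value always lies between 0 and 1, which makes it a natural choice for modelling a probability.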
Transformation
$\ln\left[\frac{P(y|x)}{1 - P(y|x)}\right] = a + bx$, where the left-hand side is the logit of P(y|x)
• a = log odds of disease in unexposed
• b = log odds ratio associated with being exposed
• e^b = odds ratio
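The three bullet points can be checked numerically. The sketch below assumes a binary exposure coded x = 0 (unexposed) and x = 1 (exposed); the coefficient values and the helper names `logistic` and `logit` are hypothetical.

```python
import math

def logistic(a, b, x):
    """P(y = 1 | x) under the model logit(P) = a + b * x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, chosen only for illustration.
a, b = -1.5, 0.8

p_unexposed = logistic(a, b, 0)   # x = 0
p_exposed = logistic(a, b, 1)     # x = 1

odds_unexposed = p_unexposed / (1.0 - p_unexposed)
odds_exposed = p_exposed / (1.0 - p_exposed)
odds_ratio = odds_exposed / odds_unexposed

print("log odds in unexposed:", logit(p_unexposed), "vs a =", a)
print("odds ratio:", odds_ratio, "vs exp(b) =", math.exp(b))
```

The log odds in the unexposed group reproduces a, and the ratio of the two odds reproduces e^b, up to floating-point rounding.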
Fitting equation to the data
• Linear regression: Least squares
• Logistic regression: Maximum likelihood
• Likelihood function
  • Estimates parameters a and b
  • Practically easier to work with log-likelihood
Maximum likelihood
• Iterative computing
  • Choice of an arbitrary value for the coefficients (usually 0)
  • Computing of log-likelihood
  • Variation of coefficients' values
  • Reiteration until maximisation (plateau)
• Results
  • Maximum Likelihood Estimates (MLE) for a and b
  • Estimates of P(y) for a given value of x
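The iterative steps listed above can be sketched as follows. The slide does not say how the coefficients are varied; this sketch assumes Newton-Raphson updates, one common choice. The data are made up, and the function names and the `tol` and `max_iter` parameters are hypothetical.

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Log-likelihood of a logistic model with linear predictor X @ beta."""
    z = X @ beta
    # sum of y*z - log(1 + exp(z)); logaddexp keeps this numerically stable.
    return np.sum(y * z - np.logaddexp(0.0, z))

def fit_logistic(X, y, tol=1e-10, max_iter=50):
    beta = np.zeros(X.shape[1])                # arbitrary start: all zeros
    ll_old = log_likelihood(beta, X, y)        # log-likelihood at the start
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))  # current fitted probabilities
        gradient = X.T @ (y - p)
        information = X.T @ (X * (p * (1.0 - p))[:, None])
        # Newton-Raphson step (an assumption; other update rules exist).
        beta = beta + np.linalg.solve(information, gradient)
        ll_new = log_likelihood(beta, X, y)
        if abs(ll_new - ll_old) < tol:         # plateau reached
            break
        ll_old = ll_new
    return beta, ll_new

# Hypothetical data: age in years and disease status (1 = diseased).
age = np.array([25, 30, 35, 40, 45, 50, 55, 60, 65, 70], dtype=float)
y = np.array([0, 0, 1, 0, 0, 1, 0, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(age), age])  # intercept a, slope b

beta, ll = fit_logistic(X, y)
a_hat, b_hat = beta
print("MLE of a and b:", a_hat, b_hat, "log-likelihood:", ll)
print("Estimated P(y) at age 50:", 1.0 / (1.0 + np.exp(-(a_hat + b_hat * 50))))
```

The loop stops when the log-likelihood no longer improves by more than `tol`, which corresponds to the "plateau" in the list above.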
Multiple logistic regression
• More than one independent variable
  • Dichotomous, ordinal, nominal, continuous, ...
• Interpretation of bi
  • Increase in log-odds for a one unit increase in xi with all the other xi constant
  • Measures association between xi and log-odds adjusted for all other xi
Interpreting Odds
A single unit increase in x1, holding x2, ..., x12 constant, multiplies the odds that a customer accepts the offer by a factor of exp(β1).
This odds interpretation applies to both continuous and dummy variables.
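The factor exp(β1) follows directly from the model. As a short derivation, assuming an intercept written β0 (a symbol not used on the slide) and the twelve predictors mentioned above:

```latex
\frac{\mathrm{odds}(x_1 + 1, x_2, \dots, x_{12})}{\mathrm{odds}(x_1, x_2, \dots, x_{12})}
  = \frac{\exp\left(\beta_0 + \beta_1 (x_1 + 1) + \sum_{j=2}^{12} \beta_j x_j\right)}
         {\exp\left(\beta_0 + \beta_1 x_1 + \sum_{j=2}^{12} \beta_j x_j\right)}
  = \exp(\beta_1)
```

All terms other than β1 cancel, so the factor does not depend on the values at which the other variables are held. For a dummy variable, the one unit increase is the change from 0 to 1.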