Multivariate analysis: Introduction
Third training Module EpiSouth, Madrid, 15th to 19th June, 2009
Dr D. Hannoun, National Institute of Public Health, Algeria
Introduction: Generality
Stratification allows us to:
• Control confounding
• Reveal effect modification
Limits of stratification:
• Only a small number of confounders can be controlled simultaneously
• The joint effect of confounders cannot be analysed correctly +++
• Quantitative variables must be grouped into classes, which requires choosing the cut-offs
Other tools: MULTIVARIATE ANALYSIS, to assess the reality of the effect of exposure on the disease
Introduction: Joint effect
Example: Hepatitis B SEP
Potential confounders: age (children/adults), immunity (good/deficient)
• Joint effect: the effect of two or more factors combined together
• Marginal effect: the effect of one confounder alone, without taking into consideration the other potential confounders
[Table: effect measures for Age (F1): 2.0, 2.0, 2.0; Immunity (F2): 2.0, 2.0, 2.0; Factors 1+2: values of 1.0 (exact layout not recoverable)]
Multivariate analysis: Definition
Definition: procedures, used at the analysis phase, that:
• Simultaneously adjust for several variables
• Simultaneously control for several potential confounders
Several models:
• Multiple linear regression
• Logistic regression
• Cox regression
• …
Vocabulary:
• Disease Y = dependent variable
• Risk factors = independent variables or predictors
Multivariate analysis: Definition
How: by modelling the relationship studied
• Representation of the disease Y as a function of other variables:
  – Risk factors
  – Potential confounders
• The best subset of variables describes the relationship between the risk factors (RF) and the disease
Statistical procedures:
• Multivariate analysis describes the disease via an equation:
  – Set of variables
  – Measure of the relationship: parameters
  – The best model fitting the data
Multivariate analysis: Definition
Writing the model:
• $E(Y \mid E, X_1, X_2, \dots, X_p) = f(E, X_1, X_2, \dots, X_p)$
• Y: a given disease
• E: exposure
• $X_1, X_2, \dots$: other variables
Example:
• f = linear function: $E(Y \mid E, X_1, X_2, \dots, X_p) = \alpha + \beta E + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
• $\beta, \beta_1, \beta_2, \dots$ measure the relationship between the exposure E, the other risk factors $X_1, X_2, \dots$ and the disease Y, each controlled on the other variables
• If $\beta = 0$: no relationship between the exposure and the disease, after controlling for the other variables
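To make the notation concrete, here is a minimal Python sketch of the linear case. The coefficient and covariate values are hypothetical, chosen only to show that, with $X_1$ and $X_2$ held fixed, moving E from 0 to 1 changes E(Y) by exactly β, and that β = 0 means the exposure changes nothing.

```python
def expected_disease(alpha, beta, betas, e, xs):
    """Linear model: E(Y | E, X1..Xp) = alpha + beta*E + sum(beta_j * X_j)."""
    if len(betas) != len(xs):
        raise ValueError("need exactly one coefficient per covariate")
    return alpha + beta * e + sum(b * x for b, x in zip(betas, xs))


# Hypothetical coefficients and covariate values, for illustration only
alpha, beta, betas = 10.0, 2.5, [0.3, -1.0]
covariates = [40.0, 1.0]  # X1, X2 held fixed

# Exposed (E=1) minus unexposed (E=0), at the same covariate values
effect_of_exposure = (expected_disease(alpha, beta, betas, 1, covariates)
                      - expected_disease(alpha, beta, betas, 0, covariates))
print(effect_of_exposure)  # equals beta
```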
Multivariate analysis: Definition
The adjusted measures of association obtained from multivariable analysis:
• For each variable in the model, we obtain the effect measure of the relationship between this variable and the disease, controlled on the other variables
• These are direct effects and not total effects
Multivariate analysis: Advantages
Advantages/techniques:
• Estimating effects while controlling for more than one confounder simultaneously
• Studying the joint effect of several risk factors and quantifying the intensity of interaction
• Possibility of including continuous risk factors
• Studying the dose-response relationship: of interest for causality and for the specific risk at intermediate levels
• Studying the trend of the effect according to the level of the risk factor
• Predicting the disease
Multivariate analysis: Steps
Several steps:
• Choose the appropriate model to summarize the data
• Define the variable selection strategy
• Estimate the model coefficients:
  – Least squares (LS) estimation
  – Maximum likelihood (ML) estimation
• Write and interpret the model
• Assess the adequacy (goodness of fit) of the model
Multivariate analysis: Choice of the model
Depends on the form of the function f:
• Nature of the outcome variable:
  – Continuous outcome: multiple linear regression
  – Categorical outcome: logistic regression (LR)
  – Time-to-event outcome: Cox regression
• Nature of the joint effect:
  – Additive: multiple linear regression
  – Multiplicative: logistic regression, Cox regression
• Form of the variable distribution:
  – Normally distributed…
  – Assumptions
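The difference between an additive and a multiplicative joint effect can be illustrated with a small arithmetic sketch. It assumes two binary factors whose single-factor relative risks are hypothetical values of 2.0, and computes the joint relative risk expected if there is no interaction on each scale. Logistic and Cox regression are additive on the log-odds and log-hazard scales, which is why they correspond to a multiplicative joint effect on the odds ratio and hazard ratio scales.

```python
def expected_joint_rr(rr_f1, rr_f2, scale):
    """Joint relative risk expected under NO interaction on the given scale.

    rr_f1: relative risk for factor 1 alone (factor 2 absent)
    rr_f2: relative risk for factor 2 alone (factor 1 absent)
    """
    if scale == "additive":
        # Excess risks add: RR11 - 1 = (RR10 - 1) + (RR01 - 1)
        return rr_f1 + rr_f2 - 1.0
    if scale == "multiplicative":
        # Relative risks multiply: RR11 = RR10 * RR01
        return rr_f1 * rr_f2
    raise ValueError("scale must be 'additive' or 'multiplicative'")


# Hypothetical single-factor relative risks of 2.0 each
print(expected_joint_rr(2.0, 2.0, "additive"))        # 3.0
print(expected_joint_rr(2.0, 2.0, "multiplicative"))  # 4.0
```

An observed joint effect that departs from the value expected on the chosen scale is what the model would capture as an interaction term.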
Multivariate analysis: Variables selection
The final model depends on the variables that will be selected:
• At the study design:
  – Decide which variables to adjust or control for
  – How each variable will be coded
  – Which interactions should be considered
• At the analytical phase:
  – Which variables must be entered in the model
  – Which variables must be forced
  – P value
• E.g.: 7 variables coded 0/1 with all interaction terms give $2^7 = 128$ coefficients to estimate in the final model!
Necessity of a STRATEGY
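The figure of 128 can be checked by counting: with all interactions, every subset of the 7 variables (including the empty subset, which corresponds to the intercept) gives one coefficient. A short Python sketch of that count:

```python
from itertools import combinations
from math import comb


def n_coefficients_full_interaction(p):
    """Count the terms of a model with p binary (0/1) variables and ALL interactions:
    the intercept, p main effects, and every 2-way, 3-way, ..., p-way product."""
    return sum(1 for k in range(p + 1) for _ in combinations(range(p), k))


p = 7
print(n_coefficients_full_interaction(p))  # 128, i.e. 2**7
# Breakdown by order of the term (0 = intercept, 1 = main effects, ...)
print([comb(p, k) for k in range(p + 1)])  # [1, 7, 21, 35, 35, 21, 7, 1]
```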
Multivariate analysis: Parameters estimation
Purpose of multivariate analysis:
• To obtain a measure of effect that describes the exposure-outcome relationship adjusted for relevant extraneous factors
Parameters estimation depends on the model used:
• In MLR: regression coefficients β
• In LR: odds ratios ($OR = e^{\beta}$)
• In Cox regression: hazard ratios ($HR = e^{\beta}$)
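In logistic and Cox regression the fitted coefficient β is on the log scale, so the odds ratio or hazard ratio is obtained as exp(β); in multiple linear regression β is used directly. A minimal sketch of the conversion, using a hypothetical coefficient and a hypothetical helper name:

```python
import math


def ratio_from_coefficient(beta, units=1.0):
    """Convert a log-scale coefficient (logistic or Cox model) to a ratio.

    Returns exp(beta * units): the odds ratio (logistic) or hazard ratio (Cox)
    for an increase of `units` in the variable, other variables held fixed.
    """
    return math.exp(beta * units)


beta_hypothetical = math.log(2.0)  # hypothetical fitted coefficient
print(ratio_from_coefficient(beta_hypothetical))       # ~2.0 for a 1-unit increase
print(ratio_from_coefficient(beta_hypothetical, 3.0))  # ~8.0 for a 3-unit increase
```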
Multivariate analysis: Model adequacy
Verify the adequacy of the model:
• Capacity of the model to correctly represent the value of the disease given the values of a subset of risk factors
Steps:
• Adequacy of the model:
  – Graphical methods +++
  – Statistical tests
• Interpreting the tests: beware of outliers
• The best model is not necessarily the best statistical model: choose the model that gives the best understanding of the disease
A well-fitting model can be used for prediction
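As a small illustration of why outliers deserve attention, the sketch below fits a least squares line to made-up data with and without one hypothetical aberrant point. That single point more than doubles the slope, the kind of distortion a residual plot would reveal.

```python
def ls_slope_intercept(xs, ys):
    """Least squares intercept and slope for a simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return y_bar - beta * x_bar, beta


# Made-up data; the last point is a hypothetical outlier
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.0, 30.0]

_, slope_all = ls_slope_intercept(xs, ys)
_, slope_without = ls_slope_intercept(xs[:-1], ys[:-1])
print(slope_all, slope_without)  # about 4.55 vs 1.97
```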
MLR: Introduction
MLR = multivariate model used when the outcome is continuous
Principle:
• Describe one variable as a linear function of one or more other variables
• Form: $E(Y) = f(E, X_1, X_2, \dots)$, f = linear function
• $E(Y \mid X) = \alpha + \beta X$: simple linear regression model
• $E(Y \mid X_1, \dots, X_p) = \alpha + \beta_1 X_1 + \dots + \beta_p X_p$: multiple linear regression model
[Figure: scatter plot of the disease against X with the fitted line $E(Y) = \alpha + \beta X$]
MLR: Introduction
In simple linear regression:
• β = slope of the straight line
  – Estimates the change in Y for a one-unit change in X
  – E.g. when atmospheric pollution increases by 1%, the incidence rate of ARI increases by 2 cases/100,000 persons
• α = intercept, which corresponds to the value of the disease when the exposure equals 0, or more generally describes the baseline
• ε = error term in the model
• Statistical model: $Y = \alpha + \beta X + \varepsilon$
• Fitted line: $\hat{Y} = \hat{\alpha} + \hat{\beta} X$
[Figure: incidence rate of ARI against atmospheric pollution (density of PM10), with the fitted line]
MLR: Introduction
In multiple linear regression:
• Statistical model: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$
• E.g.: variation of the incidence rate of ARI with atmospheric pollution
  – Potential confounders: age and smoking
  – $X_1$ = density of PM10
  – $X_2$ = age of the person
  – $X_3$ = smoking
$\text{ARI Inc. Rate} = \alpha + \beta_1\,\text{density of PM10} + \beta_2\,\text{Age} + \beta_3\,\text{smoking} + \varepsilon$
MLR: Introduction
$\text{ARI Inc. Rate} = \alpha + \beta_1\,\text{density of PM10} + \beta_2\,\text{Age} + \beta_3\,\text{smoking} + \varepsilon$
In multiple linear regression:
• $\beta_1$ = slope along the $X_1$ dimension: variation of ARI with a change of 1 unit of PM10 density, controlled on the other variables
• $\beta_2$ = slope along the $X_2$ dimension: variation of ARI with a change of one unit of age, controlled on the other variables
• $\beta_3$ = slope along the $X_3$ dimension: variation of ARI with a change of one unit of smoking (person/year), controlled on the other variables
• α = intercept: value of the disease when there is no risk factor (all $X_j = 0$)
• ε = error term in the model
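The phrase "controlled on the other variables" can be illustrated with a pure-Python sketch on made-up data. Here PM10 density and age are correlated, and Y is generated without noise as 1 + 2·PM10 + 0.5·age (hypothetical values), so the crude slope of Y on PM10 is distorted by age while the multiple regression recovers $\beta_1 = 2$. The coefficients are computed by least squares through the normal equations X'X b = X'y; dedicated statistical software would normally use a more numerically stable decomposition.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """Least squares fit of y on an intercept plus the given predictor columns.
    Returns [alpha, beta_1, ..., beta_p] via the normal equations X'X b = X'y."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data: PM10 density and age are correlated, and Y is generated
# exactly as 1 + 2*PM10 + 0.5*age (no noise), so the true beta_1 is 2.
pm10 = [1, 2, 3, 4, 5, 6]
age = [20, 25, 24, 35, 33, 40]
y = [1 + 2 * p + 0.5 * a for p, a in zip(pm10, age)]

crude = ols([pm10], y)          # Y on PM10 only
adjusted = ols([pm10, age], y)  # Y on PM10 and age
print(crude[1])     # crude PM10 slope, confounded by age (about 3.93)
print(adjusted[1])  # PM10 slope controlled on age (about 2.0)
```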
MLR: Parameters estimation
Method used: least squares estimation
Principle:
• Identify the straight line that minimizes the sum of squared residuals: $SSR = \sum_i (Y_i - \hat{Y}_i)^2 = \sum_i (Y_i - \alpha - \beta X_i)^2$
[Figure: least squares line fit, showing an observed point $(X_i, Y_i)$ and the corresponding fitted point $(X_i, \hat{Y}_i)$ on the line]
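The least squares principle can be sketched for simple linear regression, where the minimizing line has the closed form $\hat\beta = \sum_i (X_i - \bar X)(Y_i - \bar Y) / \sum_i (X_i - \bar X)^2$ and $\hat\alpha = \bar Y - \hat\beta \bar X$. The data below are made up; the last line checks that nudging the slope away from $\hat\beta$ increases the SSR.

```python
def least_squares_line(xs, ys):
    """Closed-form least squares estimates for Y = alpha + beta*X + error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


def ssr(xs, ys, alpha, beta):
    """Sum of squared residuals: sum of (Y_i - alpha - beta*X_i)^2."""
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))


# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

alpha_hat, beta_hat = least_squares_line(xs, ys)
best = ssr(xs, ys, alpha_hat, beta_hat)
print(alpha_hat, beta_hat, best)  # about 0.09, 1.97, 0.091

# Any other line has a larger SSR, e.g. a slightly steeper one
print(ssr(xs, ys, alpha_hat, beta_hat + 0.01) > best)  # True
```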
MLR: Variables selection
Decide which variables to control for:
• Prediction of the risk of the disease:
  – We do not need to include all confounders, only the best group of predictors
  – Importance in terms of public health +++
  – E.g.: incidence rate of ARI. Exposure: atmospheric pollution. Predictors: age and smoking
• Estimation of the relationship between exposure and disease:
  – We must include ALL confounders to control confounding
  – Importance in terms of causal association
  – E.g.: incidence rate of ARI. Exposure: atmospheric pollution. Predictors: age, smoking, breastfeeding, ROR…
MLR: Variables selection
Which variables must be entered in the initial model: 2 situations
• Some are obligatory in the model because they are recognized risk factors: the exposure
• Others show a significant relationship with the disease in the bivariate analysis
• All of them are candidate variables for modelling
MLR: Variables selection
Which interactions should be considered:
• The problem of interaction must be approached in a manner that facilitates understanding of the nature of the causal effect
• Statistical considerations should serve rather than determine our objectives
• Adding an interaction term:
  – Adds another regression coefficient to the equation
  – Makes the model more difficult to interpret
• For a given interaction, you must ensure that the variables in the interaction term are also contained in the model
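MLR: Variables selection
Example: incidence rate of ARI
• Model WITH an interaction term:
  – Interaction BETWEEN smoking and age: $\beta_{2,3} X_2 X_3$
$\text{ARI Inc. Rate} = \alpha + \beta_1\,\text{density of PM10} + \beta_2\,\text{Age} + \beta_3\,\text{smoking} + \beta_{2,3}\,\text{Age} \times \text{smoking} + \beta_4\,\text{breastfeeding} + \beta_5\,\text{ROR} + \varepsilon$
$\text{ARI Inc. Rate} = \alpha + \beta_1\,\text{density of PM10} + \beta_2\,\text{Age} + \beta_3\,\text{smoking} + \beta_{2,3}\,\text{Age} \times \text{smoking} + \varepsilon$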
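With an interaction term, the effect of age is no longer a single number: the slope for age becomes $\beta_2 + \beta_{2,3} \times \text{smoking}$. The sketch below uses hypothetical coefficients and codes smoking as 0/1 for simplicity (an assumption; the slides measure smoking in person/year units). It keeps both main effects in the model, as required when an interaction term is included.

```python
def ari_rate(pm10, age, smoking, coef):
    """Model with an Age x smoking interaction term (hypothetical coefficients)."""
    return (coef["alpha"] + coef["b1"] * pm10 + coef["b2"] * age
            + coef["b3"] * smoking + coef["b23"] * age * smoking)


# Hypothetical coefficient values, for illustration only
coef = {"alpha": 5.0, "b1": 2.0, "b2": 0.5, "b3": 1.0, "b23": 0.25}


def age_slope(smoking):
    # Change in the predicted rate for +1 year of age, PM10 and smoking held fixed
    return ari_rate(10.0, 41.0, smoking, coef) - ari_rate(10.0, 40.0, smoking, coef)


print(age_slope(0))  # b2 = 0.5 among non-smokers
print(age_slope(1))  # b2 + b23 = 0.75 among smokers
```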
MLR: Variables selection
Which variables must be entered in the initial model: 2 situations
• …
How the variables must be entered in the initial model: a strategy must be defined
• Start with ALL variables: backward elimination
• Start with NO variable: forward selection
• Mix the two previous methods: stepwise selection
MLR: Variables selection
[Diagram: variable-selection workflow. At the study design: candidate variables for modelling (names shown include pollution, sex, age, ROR, smoking, Age*smoking, profession, breastfeeding, region). First part of the analytical phase: bivariate analysis and stratification, giving the significant variables (pollution, age, smoking, breastfeeding, ROR) and the variables that must be forced. Second part of the analytical phase: multivariate analysis, starting from the largest possible model, with rules defining how variables are entered in the model (backward, forward, stepwise). Final model: pollution, age, smoking.]
MLR: Backward strategy
Principle:
• Begins with ALL candidate variables in the model: the largest POSSIBLE model
• At each step, drop one variable; the choice of this variable is based on statistical rules (a remaining variable that is not significant)
• Continue until no more variables can be dropped, meaning all remaining variables are relevant
Advantages: evaluates the joint confounding effects of all variables
Limits: with many risk factors, strata can provide no information (sparse cells)
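A minimal sketch of backward elimination, assuming OLS fits and the partial F statistic as the "statistical rule" (for a single variable this equals the square of its t statistic). The cut-off of 4.0 is a hypothetical simplification: a real analysis would compare each statistic with the F distribution for the model's degrees of freedom, or use p-values. The exposure (PM10) is forced into the model, and the data are made up so that the hypothetical variable "region" is unrelated to Y.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ssr(columns, y):
    """Least squares fit of y on an intercept plus the columns; returns the SSR."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    coef = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2 for row, yi in zip(X, y))


def backward_elimination(data, y, forced, f_cut=4.0):
    """Start from the largest possible model; repeatedly drop the least
    significant non-forced variable (smallest partial F) while its F < f_cut."""
    kept = list(data)
    while True:
        candidates = [v for v in kept if v not in forced]
        if not candidates:
            return kept
        full = fit_ssr([data[v] for v in kept], y)
        df_resid = len(y) - (len(kept) + 1)
        if full <= 0 or df_resid <= 0:
            raise ValueError("partial F undefined (perfect fit or too few observations)")
        sigma2 = full / df_resid
        partial_f = {}
        for v in candidates:
            reduced = fit_ssr([data[u] for u in kept if u != v], y)
            partial_f[v] = (reduced - full) / sigma2
        weakest = min(partial_f, key=partial_f.get)
        if partial_f[weakest] >= f_cut:
            return kept  # every remaining variable passes the rule
        kept.remove(weakest)


# Made-up data (n = 8): Y depends on PM10 and age; "region" is unrelated to Y
data = {
    "pm10": [4, 2, 4, 2, 4, 2, 4, 2],
    "age": [37, 33, 27, 23, 37, 33, 27, 23],
    "region": [1, 0, 0, 1, 1, 0, 0, 1],
}
y = [28.0, 21.6, 23.0, 16.6, 27.0, 21.4, 22.0, 16.4]

print(backward_elimination(data, y, forced={"pm10"}))  # ['pm10', 'age']
```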
MLR: Forward strategy
Principle:
• Begins with NO variable in the model: the smallest POSSIBLE model
• At each step, add one variable to the model; the choice of this variable is based on statistical rules:
  – Start with the variable that has the biggest change-in-estimate impact when evaluated individually
  – Keep the variable that meaningfully changes the adjusted estimate
• Continue until no other variables can be added
Advantages: avoids the initial sparse-cell problem of the backward approach
Limits: does not evaluate the joint confounding effects of many variables
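Forward selection by change-in-estimate can be sketched the same way. This is a minimal version under stated assumptions: the exposure is always kept (forced), the starting point is the crude exposure coefficient, and a variable is added only if it changes that coefficient by at least 10%, a common rule of thumb rather than a value fixed by the slides. The data are made up so that age confounds the PM10 effect and the hypothetical variable "region" does not.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """Least squares coefficients [intercept, b1, ...] of y on the given columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def forward_change_in_estimate(exposure, candidates, y, threshold=0.10):
    """Try every remaining candidate, pick the one that changes the exposure
    coefficient the most, and keep it only if the relative change reaches
    `threshold`. Assumes the current estimate is never exactly zero."""
    selected = []
    current = ols([exposure], y)[1]  # crude estimate (index 0 is the intercept)
    remaining = list(candidates)
    while remaining:
        changes = {}
        for name in remaining:
            cols = [exposure] + [candidates[s] for s in selected] + [candidates[name]]
            changes[name] = abs(ols(cols, y)[1] - current) / abs(current)
        best = max(changes, key=changes.get)
        if changes[best] < threshold:
            break  # no remaining variable changes the estimate meaningfully
        selected.append(best)
        remaining.remove(best)
        current = ols([exposure] + [candidates[s] for s in selected], y)[1]
    return selected, current


# Made-up data (n = 8): age is correlated with PM10 and affects Y; region does not
pm10 = [4, 2, 4, 2, 4, 2, 4, 2]
age = [39, 31, 29, 21, 39, 31, 29, 21]
region = [1, 0, 0, 1, 1, 0, 0, 1]
y = [29.0, 20.6, 24.0, 15.6, 28.0, 20.4, 23.0, 15.4]

selected, estimate = forward_change_in_estimate(pm10, {"age": age, "region": region}, y)
print(ols([pm10], y)[1])   # crude PM10 coefficient, about 4.0
print(selected, estimate)  # ['age'], about 2.0
```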
MLR: Conclusion
Goal of modelling: to obtain the smallest subset of relevant risk factors that describes the disease, with the best understanding of the disease
As with stratification, you must identify:
• First, significant interaction terms (statistical significance + biological considerations); don't forget to verify that the variables in the interaction term are contained in the model
• Secondly, the confounding effect (no statistical test)
Retain the significant risk factors, the confounders and the interaction terms that help us understand and explain the occurrence of the disease
Conclusion
• Multivariate analysis allows us to control and adjust the effect of the exposure for several extraneous factors simultaneously
• The adjusted measures of association are direct effects and not total effects
• Multivariate analysis is a useful tool, but it can be very dangerous if the strategy has not been defined beforehand:
  – Purpose of the study
  – Method of variable selection
  – Assumptions
  – Adequacy of the model…
Conclusion
• As with the stratification method, statistical considerations should serve rather than determine our objectives
• Multivariate analysis requires a computer to run the statistical programme
• The choice of the model depends on many factors: the outcome variable, the form of the relationship between exposure and disease…