Polygenic and multifactorial diseases

Polygenic and multifactorial diseases -key features and isolation of responsible genes Newcastle; 13th December 2007

Early genetics • Two camps • Followers of Mendel • Dichotomous traits • Believed that traits presented in predictable patterns of inheritance (AD, AR, XLD, XLR & Y linked) • Biometricians • Continuous/qualitative traits eg height, weight • Not amenable to Mendel’s laws as variable from individual to individual • Two camps married by demonstration that these continuous traits could be governed by a large number of independent genes each governed by Mendel’s law

Polygenic/Multifactorial inheritance • These findings form the basis of polygenic and multifactorial inheritance and are now known to account for qualitative traits such as height, weight, blood pressure ….. • …..and also for the common diseases that do not follow Mendelian inheritance patterns and represent major health problems such as heart disease, obesity, asthma, diabetes and cancer • Estimated that lifetime risk of genetically influenced common disorder in Western population is 60%

Mendelian and complex diseases Mendelian No of disease proteins In utero to puberty Puberty of age 50 Over 50 Lung complex Diabetes Affecteds per 1000 popn Heart/circulatory Athritis/musculoskeletal 18-44 55-64 45-54 Age range Bjornnson et al 2004; TIGS Vol 20(8):350-358

Polygenic/Multifactorial diseases • Following success in identification of the majority of single gene disorders eg CF, HD, DMD etc • New interest in identifying genes responsible for these common complex diseases • Improved technologies • Pharmaceutical company interest • Hope for designer medicine

Key concept of complex diseases • Multiple distinct loci interact with/without other factors including the environment to result in end stage phenotype • Expressed in population as a continuously variable susceptibility that follows Gaussian distribution • Effectively creates a gradient of susceptibility - phenotype presents beyond a certain threshold population mean Threshold of liability Affected individuals

Key concepts of complex disease • Familial concentration of disease without specific pattern of inheritance • Absence of clear biochemical defect resulting from single abnormal gene • Considerable variation in severity and expression of phenotype (between and within families) • Most affected individuals have unaffected parents • Often sex differences

Familial clustering popn mean General population Threshold of liability Affected individuals popn mean sib mean Siblings of affected individuals Threshold of liability Affected sibs

Recurrence risks • Affected individuals have inherited combination of high susceptibility alleles. • Relatives share these alleles • Thus cousins, aunts, uncles etc also at higher risk than general population • Parents with affected child have higher than average number of high risk alleles • Recurrence risk is higher if >1 family member affected • Greater the severity of the disease the higher the recurrence risk

Different thresholds between the sexes eg congenital pyloric stenosis 5x more common in boys than girls

Sex differences • Recurrence risk is greater if proband is of less commonly affected sex • Eg Congenital pyloric stenosis • Male probands • Recurrence in brothers 3.8% • Recurrence in sisters 2.7% • Female proband • Recurrence in brothers 9.2% • Recurrence in sisters 3.8%

Cleft lip and palate

Measurement of risk • The level of risk is measured by relative risk lR - R is the risk to a relative of affected individual compared to population risk eg sibling risk in diabetes; lS is 15 sibling risk in autsim; lS is 145 (91-93% genetic influence) The higher the value of l the stronger the genetic effect in disease • Relative risk for 1st degree relatives is approximately equal to square root of population incidence • Risk decreases rapidly in remotely related individuals • Eg in schizophrenia • siblings - 10.6 • first cousins - 3.6 • aunts/uncles - 2.5

Genetic basis of complex disease • Classic scale of human disease Single gene disorder polygenic multifactorial (single major locus (several loci (many loci and environment >phenotype) >phenotype) >phenotype Now seen as an oversimplification

Single gene disorders • “single gene disorders” are now known to represent disorders in which a single major locus is necessary and sufficient to result in the phenotype • Usually extensive allelic and non allelic heterogeneity • but where the phenotype can differ in intra and inter familial manner due to modifier genes and environment • eg in CF • many mutations identified in the CFTR gene • good degree of correlation with CF mutation and pancreatic disease • severity of pulmonary disease shows no such correlation • now known that expression of CFTR gene is modified by a number of other factors including variants at NOS1, HLAIII, TNFA, TGBF1, CFM etc

Polygenic diseases • digenic inheritance where defects at two loci are necessary to result in phenotype • eg retinitis pigmentosa; peripherin-RDS, ROM1 • trigenic inheritance, where defects in 3 loci/ alleles are necessary ro result in phenotype • eg Bardet Biedl; BBS2/BBS6 • oligogenic inheritance where several loci are involved in resulting phenotype

Multifactorial diseases • Increasing no of loci involved • ~18 different loci identified as being involved in diabetes • Approximately 40% familial clustering due to the HLA loci (lS =3); other involved include INS (1.9), CTL4 (1.2) • Each locus involved is neither sufficient or necessary to result in the phenotype • Also environmental factors

Monogenic and complex disease

Models to explain multifactorial disease • 2 basic models • Common disease - common variant(restricted polymorphism model) Proposed that there is a small number of loci with risk alleles that are common in the population (>1%) and each exerts a considerable genetic effecteg. APOEe4 allele in Alzheimer disease; Factor V Leiden in deep venous thrombosis • Common disease - rare variant Suggested that there are a large number of loci with risk alleles that are rare in the population (<1%) and each exerts little or moderate effect • In both models, different alleles at these loci may increase or decrease risk • leads to complex patterns of susceptibility

Other factors • Environmental influence - diet, exposure to toxins, exercise etc • Epistasis - interaction between different loci;where one particular allele at locus 1 prevents particular allele at locus 2 from manifesting its effect • Somatic changes • Epigenetics - methylation, imprinting etc All combine to produce disease state in individual

An example of epistasis

Identification of responsible genes Linkage analysis and association analysis are the 2 main strategies Linkage refers to the physical coupling between 2 loci (marker locus and disease locus indicates close proximity of marker to disease locus Association studies refers to the co occurrence of two variants (particular allele of marker and phenotype) indicates particular allele of marker (or one very close to it) is causative in disease

Linkage vs Association

Linkage studies • Classic linkage studies require • Large mulitgenerational single family (or multiple smaller families in clear homogeneous disease) • Defined mode of inheritance • Single locus responsible • Known penetrance • Genetic homogeneity • Clearly not the case in complex diseases • however can still be used in non parametric models under different modes of inheritance, allowing for heterogeneity etc • thus likelihood of detecting causative locus less than in single major locus disorders • alleles of low or moderate genetic effect unlikely to be identified

Association studies • Several methods available including • Case control series • Affected sib pair • Affected pedigree member • TDT • Each has different advantages, disadvantages and limitations • Complex statistical analysis • Can be influenced by • Sample size, selection of controls • Population stratification, admixture • Epistasis, age of disease • Problems in multiple testing • Informativeness, density of markers • Level of risk alleles effect in disease

Association studies- population based • Case-control study • Most widely applied strategy • Series of affected patients vs series of matched controls • Cases readily obtained; genotyping easy • Most prone to producing false positive results - usually due to incorrect control population selection. Any difference in allele frequency between groups may be due to differences between populations (independent of disease) • Require significant numbers to adequately power study (1000s vs 100s); especially important in study of multiple variables • Case -cohort study • Cases and controls drawn from selected population under study to investigate broad spectrum of diseases and factors • Prospective; takes longer to select sufficient numbers

Association studies in diabetes type1

Association studies - family based methods • Affected sib pair method • Tests for increased inheritance of particular allele in sibs vs controls • Identity by descent more powerful than identity by state • IBD requires parental testing (IBS does not) • Affected pedigree member method • Additional family members tested; similar to ASP • Discordant trios • Useful if parental samples unavailable

TDT • Transmission disequilibrium testing • Requires testing of parental and affected offspring and classifies alleles transmitted to affected children and not transmitted • Test for Requires that parents are heterozygous • Power lost if parents are homozygous • Provides a joint test of linkage and association • Eliminates stratification effects • Can be modified to account for multi-allelic markers, multiple siblings, missing parental data and quantitative straits

Factors influencing success of studies Control populations (stratification) family based controls vs matched controls Study population inbred populations reduce number of segregating loci and non allelic heterogeneity admixture; eg Latin American, African American; creates disequilibrium that breaks down rapidly for unlinked markers; utilised in MALD (mapping by admixture linkage disequilibrium) Epistasis; interaction between alleles can be accounted for by statisitical models (Markov chain Monte Carlo based methodology)

Factors influencing success of studies • Age of disease • old ancient diseases (restricted polymorphism model) have low range of linkage disequilibrium (~3kb). Requires high density map of markers to detect association • new diseases have high range of disequilibrium ( 10kb). Low density scans required but low power to detect • Genetic effects of risk allele • Few loci exerting considerable effect • Power to detect reduces with increasing no of loci • Informativeness of markers • power to detect decreases with reduced heterozygosity • Inference of linear distance • Distance between marker and disease not easy to predict due to non linear relationship between LD and distance below 60kb

Identification of risk alleles • Linkage/association studies to locate regions of interest • Finer mapping of regions • Sequence analysis of candidate genes within interval • Numerous sequence variants likely to be present • Role of identified variants? Combination of variants?? Logistical challenge • Functional tests of candidates; cellular testing, knock out/in models etc

Polygenic/Multifactorial diseases • Encompass latter end of spectrum of human disease • Result from combination of numerous loci each of which is neither sufficient nor necessary to result in disease • Identified by combination of linkage and association studies • Statistical and logistical issues

Selected references • Bjornsson et al (2004) TIGS Vol 20(8):350-358 • Weeks and Lathrop (1995) TIGS Vol 11(12); 513-519 • Smith & O’Brien (2005) Nature Review Genetics online pub 12 July; 1-10 • Cardon & Bell (2001) Nature Review Genetics Vol 2; 91-99 • Laird & Lange (2006) Nature Review Genetics Vol 7; 385-394 • Wright et al (1999) Nature Genetics Vol 23; 397-404 • Badano & Katsanis (2002) Nature Review Genetics Vol 3; 779-789 • Wright et al (2003) TIGS Vol 19 (2); 97-106 • Antonarakis & Beckman (2006) Vol 7; 277-282 • Lanpher et al (2006) Nature Review Genetics Vol 7; 449-460 • Concannon et al (2005) Diabetes Vol 54; 2995-3001 • & Strachan & Read

Polygenic and multifactorial diseases