Learn about non-synonymous SNPs and their impact on molecular functions, evolution, and disease. Explore how to predict mutation effects in proteins, understand selection forces, and identify disease variants.
Human non-synonymous SNP: molecular function, evolution and disease
Shamil Sunyaev
Genetics Division, Brigham & Women’s Hospital, Harvard Medical School
Harvard-M.I.T. Division of HST
Effect on molecular function Structural Biology Biochemistry Medical Genetics Evolutionary Genetics Phenotype Natural selection
Why is this useful? • Understanding variation in molecular function and structure • Evolutionary genetics: comparison of polymorphism and divergence rates between different functional categories is a robust way to detect selection
Linkage analysis Rare
Classical association studies Common Disease Control
Why is this useful? • Rare human developmental disorders / mouse mutagenesis screens: linkage studies are impossible • Genetics of complex disease: SNP prioritization • Genetics of complex disease: Rare variants
Quantitative trait: Mendelians and Biometricians • Forces that maintain variation: selection, mutation
Common disease / Common variant • Trade-off (antagonistic pleiotropy) • Balancing selection • Recent positive selection • Reversal in direction of selection • Examples: APOE (Alzheimer’s disease), AGT (hypertension), CYP3A (hypertension), CAPN10 (type 2 diabetes)
An individual human genome is a target for deleterious mutations! • The frequency of deleterious variants is directly proportional to the mutation rate (q = m/s, with m the mutation rate and s the selection coefficient) • ~40% of human Mendelian diseases are due to hypermutable sites
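The proportionality q = m/s can be made concrete with a small calculation. This is only a sketch: the function name and the numeric values are hypothetical, chosen for illustration rather than taken from the talk, and the formula is used exactly as the slide states it.

```python
def equilibrium_frequency(mutation_rate: float, selection_coefficient: float) -> float:
    """Equilibrium frequency q = m / s of a deleterious allele, as on the slide."""
    if selection_coefficient <= 0:
        raise ValueError("the selection coefficient must be positive")
    return mutation_rate / selection_coefficient


# Hypothetical values: an ordinary site and a site mutating ten times faster,
# both under the same selection coefficient.
typical_site = equilibrium_frequency(mutation_rate=1e-8, selection_coefficient=0.01)
hypermutable_site = equilibrium_frequency(mutation_rate=1e-7, selection_coefficient=0.01)

print(typical_site, hypermutable_site)
print(hypermutable_site / typical_site)  # a tenfold higher m gives a tenfold higher q
```

Under this formula, a hypermutable site carries proportionally more deleterious variation at equilibrium, which fits the slide's point that such sites account for a large share of Mendelian disease.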
Multiple, mostly rare variants • Many deleterious alleles in mutation-selection balance • Examples: plasma level of HDL-C, plasma level of LDL-C, colorectal adenomas
Harmful mutations • Function: damaging • Evolution: deleterious • Phenotype: detrimental • Advantageous pseudogenization (Zhang et al. 2006) • Gain-of-function disease mutations • Sickle cell anemia
protein multiple alignment profile
Prediction rate of damaging substitutions
                    possibly   probably
Disease mutations   82%        57%
Divergence          9%         3%
Polymorphism        27%        15%
10% of PolyPhen false-positives are due to compensatory substitutions
Estimate of selection coefficient (Williamson et al., PNAS 2005): -6.072*, -11.732* (PolyPhen)
NO DELETERIOUS POLYMORPHISM LOTS OF DELETERIOUS POLYMORPHISM de novo mutation effect spectrum Effect of new mutation may range from lethal, to neutral, to slightly beneficial
NO DELETERIOUS POLYMORPHISM LOTS OF DELETERIOUS POLYMORPHISM Mutation effect spectrum ?
Neutral mutation model
Human      ACCTTGCAAAT
Chimpanzee ACCTTACAAAT
Baboon     ACCTTACAAAT
Prob(TAC->TGC), Prob(TGC->TAC)
Prob(XY1Z->XY2Z): 64x3 matrix
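One way to fill such a 64x3 matrix is to count substitutions in their trinucleotide context, using the baboon sequence as an outgroup to decide which base is ancestral. The sketch below is an assumption about how this could be done, not the author's actual procedure: the function names, the parsimony rule (the base shared by at least two species is ancestral) and the requirement that flanking bases be conserved are all illustrative choices.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def count_context_substitutions(human, chimp, baboon):
    """Count substitutions XY1Z -> XY2Z on the human and chimpanzee lineages.

    Assumes the three sequences are already aligned and of equal length.
    """
    substitutions = Counter()  # (ancestral triplet, derived middle base) -> count
    opportunities = Counter()  # ancestral triplet -> lineage-sites examined
    for i in range(1, len(human) - 1):
        triplets = [seq[i - 1:i + 2] for seq in (human, chimp, baboon)]
        if any(base not in BASES for t in triplets for base in t):
            continue  # skip gaps and ambiguous bases
        h, c, b = triplets
        # Hypothetical simplification: use only sites whose flanks are conserved.
        if not (h[0] == c[0] == b[0] and h[2] == c[2] == b[2]):
            continue
        ancestral_base, support = Counter(t[1] for t in triplets).most_common(1)[0]
        if support < 2:
            continue  # all three species differ: ancestral state is unclear
        ancestral = h[0] + ancestral_base + h[2]
        for lineage in (h, c):
            opportunities[ancestral] += 1
            if lineage[1] != ancestral_base:
                substitutions[(ancestral, lineage[1])] += 1
    return substitutions, opportunities


def substitution_matrix(substitutions, opportunities):
    """Return the 64 x 3 table of Prob(XY1Z -> XY2Z) as a dictionary."""
    matrix = {}
    for triplet in map("".join, product(BASES, repeat=3)):
        for derived in BASES:
            if derived == triplet[1]:
                continue
            seen = opportunities[triplet]
            matrix[(triplet, derived)] = substitutions[(triplet, derived)] / seen if seen else 0.0
    return matrix


subs, opps = count_context_substitutions("ACCTTGCAAAT", "ACCTTACAAAT", "ACCTTACAAAT")
matrix = substitution_matrix(subs, opps)
print(len(matrix))           # 192 entries: 64 contexts x 3 alternative bases
print(matrix[("TAC", "G")])  # 0.5 here, from a single site, so not a meaningful rate
```

On the slide's toy alignment the only event found is TAC->TGC on the human lineage. Realistic estimates would need genome-scale alignments.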
Mildly deleterious mutations • 54 genes, 757 individuals • Inflammatory response: 236 genes, 46-47 individuals • DNA repair and cell cycle pathways: 518 genes, 90-95 individuals
Frequency itself is a reliable predictor of function! The majority of missense mutations observed at frequency below 1% are deleterious
Fitness and selection coefficient • Wild type: N1 = 4 • New mutation: N2 = 3 • Fitness: N2/N1 = 1 - s, where s is the selection coefficient
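The slide's numbers can be worked through directly. In this sketch N1 and N2 are read as the numbers of descendants left by the wild type and by the new mutation, which is an assumption about the figure; the function name is hypothetical.

```python
def selection_coefficient(n_wild_type: float, n_mutant: float) -> float:
    """Solve N2 / N1 = 1 - s for s."""
    relative_fitness = n_mutant / n_wild_type
    return 1.0 - relative_fitness


s = selection_coefficient(n_wild_type=4, n_mutant=3)
print(s)  # 0.25
```

With N1 = 4 and N2 = 3 the relative fitness of the mutant is 0.75, so s = 0.25.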
Estimation of selection coefficient - simulation [Figure: simulated sequence data, from past to present, under the human effective population size]
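Estimation of selection coefficient - simulation [Figure: simulation from past to present under the human effective population size; probability of a SNP being observed as a singleton, Fsingl(s), or at MAF > 25%, FMAF>25%(s), plotted against the selection coefficient, -log(s)]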
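The idea behind such a simulation can be sketched with a very small Wright-Fisher model. This is not the simulation used in the talk. It assumes a constant haploid population of only 100 chromosomes instead of a model of human effective population size, and the function names and parameter values are hypothetical. It follows new mutations with relative fitness 1 - s until loss or fixation, then asks how often a sample of 20 chromosomes would show the variant as a singleton or at MAF > 25%.

```python
import random
from collections import Counter
from math import comb


def simulate_visited_counts(pop_size, s, n_mutations, rng):
    """Allele counts visited, generation by generation, by new mutations."""
    visited = []
    for _ in range(n_mutations):
        count = 1  # each mutation starts as a single copy
        while 0 < count < pop_size:
            visited.append(count)
            p = count / pop_size
            # Selection against the mutant (fitness 1 - s), then binomial drift.
            p_selected = p * (1 - s) / (p * (1 - s) + (1 - p))
            count = sum(rng.random() < p_selected for _ in range(pop_size))
    return visited


def singleton_fraction(visited, pop_size, sample_size):
    """Singleton weight divided by (singleton + common) weight in a sample."""
    singleton = common = 0.0
    for count, generations in Counter(visited).items():
        p = count / pop_size
        for k in range(1, sample_size):
            prob = comb(sample_size, k) * p**k * (1 - p) ** (sample_size - k)
            if k == 1:
                singleton += generations * prob
            if min(k, sample_size - k) / sample_size > 0.25:
                common += generations * prob
    return singleton / (singleton + common)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
fractions = {}
for s in (0.0, 0.02, 0.1):
    visited = simulate_visited_counts(pop_size=100, s=s, n_mutations=2000, rng=rng)
    fractions[s] = singleton_fraction(visited, pop_size=100, sample_size=20)
    print(s, round(fractions[s], 3))
```

In this toy setting, stronger selection shifts observed variants toward singletons and away from common frequencies. That dependence is what lets frequency classes carry information about s.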
Classical association studies Common Disease Control
“Mutation enrichment” association studies Rare Disease Control
“Mutation enrichment” association studies • Rare missense variants in the NPC1L1 gene contribute to variability in cholesterol absorption and plasma levels of low-density lipoproteins (LDLs). Cohen J et al., PNAS 2006, in press • Nonsynonymous sequence variants in the ABCA1 gene were significantly more common in individuals with low HDL-C (<5th percentile) than in those with high HDL-C (>95th percentile). Cohen J et al., Science 2004 • Multiple rare variants in different genes account for multifactorial inherited susceptibility to colorectal adenomas. Fearnhead NS et al., PNAS 2004
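A comparison of this kind, carriers of rare variants in one extreme group versus the other, can be tested with a one-sided Fisher's exact test. The sketch below is a generic illustration, not the analysis from the cited papers, and the counts are hypothetical.

```python
from math import comb


def fisher_exact_one_sided(carriers_a, n_a, carriers_b, n_b):
    """P-value for seeing at least `carriers_a` carriers in group A.

    Under the null hypothesis, carriers are distributed between the two
    groups at random (hypergeometric distribution).
    """
    total = n_a + n_b
    total_carriers = carriers_a + carriers_b
    favourable = 0
    for k in range(carriers_a, min(total_carriers, n_a) + 1):
        favourable += comb(total_carriers, k) * comb(total - total_carriers, n_a - k)
    return favourable / comb(total, n_a)


# Hypothetical counts: 20 of 128 carriers in the low-trait group
# versus 2 of 128 in the high-trait group.
p_value = fisher_exact_one_sided(carriers_a=20, n_a=128, carriers_b=2, n_b=128)
print(p_value)
```

Pooling all rare nonsynonymous variants in a gene into a single carrier count is what gives this design power even when each individual variant is too rare to test alone.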
What about common alleles of smaller effect? • Population of 3500 individuals with known plasma levels of HDL-C • Population includes both genders and three ethnic groups • 839 SNPs genotyped • Independent population of 800 individuals for validation
What about common alleles of smaller effect? • Introduce a linear model (ANCOVA) • Subsequently add SNPs to the linear model • Include SNPs based on the likelihood ratio test • Prioritizing SNPs based on conservation did not help
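The procedure in the bullets above can be sketched as forward selection in a Gaussian linear model, where a SNP is added only if the likelihood ratio statistic n·ln(RSS_old/RSS_new) exceeds a chi-square threshold. Everything specific here is an assumption for illustration: the function names, the 5% threshold with one degree of freedom, the additive 0/1/2 genotype coding, and the simulated data, which stand in for the real cohort.

```python
import math
import random

CHI2_1DF_5PCT = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def residual_sum_of_squares(columns, y):
    """Ordinary least squares via the normal equations; returns the RSS.

    Assumes the columns are not collinear (no check is made).
    """
    k = len(columns)
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)] for i in range(k)]
    b = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        beta[i] = (b[i] - sum(a[i][j] * beta[j] for j in range(i + 1, k))) / a[i][i]
    fitted = [sum(beta[j] * columns[j][n] for j in range(k)) for n in range(len(y))]
    return sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))


def forward_select(y, covariates, snps, threshold=CHI2_1DF_5PCT):
    """Add SNPs one at a time while the best likelihood ratio test passes."""
    n = len(y)
    design = [[1.0] * n] + [list(c) for c in covariates]  # intercept + covariates
    rss_current = residual_sum_of_squares(design, y)
    selected, remaining = [], dict(snps)
    while remaining:
        scores = {}
        for name, genotypes in remaining.items():
            rss_new = residual_sum_of_squares(design + [genotypes], y)
            scores[name] = (n * math.log(rss_current / rss_new), rss_new)
        best = max(scores, key=lambda name: scores[name][0])
        statistic, rss_best = scores[best]
        if statistic < threshold:
            break
        selected.append(best)
        design.append(remaining.pop(best))
        rss_current = rss_best
    return selected


# Simulated data (hypothetical): only snp_a truly affects the trait.
rng = random.Random(0)
n = 200
sex = [float(rng.random() < 0.5) for _ in range(n)]
snps = {name: [float((rng.random() < 0.3) + (rng.random() < 0.3)) for _ in range(n)]
        for name in ("snp_a", "snp_b", "snp_c")}
hdl = [50 + 5 * sex[i] + 4 * snps["snp_a"][i] + rng.gauss(0, 3) for i in range(n)]
selected = forward_select(hdl, [sex], snps)
print(selected)
```

With 839 SNPs tested in sequence, a fixed 5% threshold would admit many false positives. That makes the independent validation population mentioned on the previous slide essential.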
Acknowledgements • The lab: Gregory Kryukov, Steffen Schmidt, Saurabh Asthana, Victor Spirin, Ivan Adzhubey • Bioinformatics: Vasily Ramensky • Human genetics: Jonathan Cohen