Proteogenomics
Protein Identification by Mass Spectrometry: Samples → Peptides → MS/MS; MS/MS and Protein DB → Compare, score, test significance → Identified peptides and proteins
Tumor Specific Databases: Next-generation sequencing of the genome and transcriptome → Sample-specific Protein DB; Samples → Peptides → MS/MS; MS/MS and Sample-specific Protein DB → Compare, score, test significance → Identified peptides and proteins
Sequencing by Synthesis (Illumina)
Genomics Data Analysis: Images → Intensities → Reads → Alignments
RNA-Seq Data Analysis: Paired-end short reads → De novo assembly or Alignment to genome [Figure: Transcript X aligned to the reference genome, spanning Exon 1 and Exon 2]
Tumor Specific Databases Ruggles KV et al., MCP 2015
Example of variant peptide
Protein: NP_001138550 zinc finger protein 805 isoform 2 [Homo sapiens]
Genome location: chr19:57764586+ 1485 0
DNA Variant: G183A
Protein Variant: V62I
MQGERLRPGLDSQKEKLPGKMSPKHDGLGTADSVCSRIIQDRVSLGDDVHDCDSHGSGKNPVIQEEENIFKCNECEKVFNKKRLLARHERIHSGVKPYECTECGKTFSKSTYLLQHHMVHTGEKPYKCMECGKAFNRKSHLTQHQRIHSGEKPYKCSECGKAFTHRSTFVLHNRSHTGEKPFVCKECGKAFRDRPGFIRHYIIHSGENPYECFECGKVFKHRSYLMWHQQTHTGEKPYECSECGKAFCESAALIHHYVIHTGEKPFECLECGKAFNHRSYLKRHQRIHTGEKPYVCSECGKAFTHCSTFILHKRAHTGEKPFECKECGKAFSNRADLIRHFSIHTGEKPYECMECGKAFNRRSGLTRHQRIHSGEKPYECIECGKTFCWSTNLIRHSIIHTGEKPYECSECGKAFSRSSSLTQHQRMHTGRNPISVTDVGRPFTSGQTSVNIQELLLGKNFLNVTTEENLLQEEASYMASDRTYQRETPQVSSL
Variant peptide: NPIIQEEENIFK
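The variant peptide on this slide can be reproduced in silico: apply V62I to the reference sequence, digest with trypsin, and keep the peptide that spans position 62. The following is a minimal sketch, assuming simplified trypsin rules (cleave after K or R, not before P, no missed cleavages). The function names are illustrative, and only the first 77 residues of NP_001138550 are used to keep the example short.

```python
import re

# First 77 residues of the NP_001138550 reference sequence, copied from the slide.
REFERENCE_PREFIX = (
    "MQGERLRPGLDSQKEKLPGKMSPKHDGLGTADSVCSRIIQDRVSLGDDVHDCDSHGSGK"
    "NPVIQEEENIFKCNECEK"
)


def apply_missense(protein: str, variant: str) -> str:
    """Apply a substitution such as 'V62I' (1-based protein position)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"not a simple substitution: {variant}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if protein[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}")
    return protein[: pos - 1] + alt + protein[pos:]


def tryptic_peptides(protein: str):
    """Yield (start, peptide) pairs; start is the 1-based first residue."""
    start = 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        # Simplified rule (assumption): cleave after K/R unless followed by P.
        cleave = residue in "KR" and not is_last and protein[i + 1] != "P"
        if cleave or is_last:
            yield start + 1, protein[start : i + 1]
            start = i + 1


def peptide_covering(protein: str, position: int) -> str:
    """Return the tryptic peptide that contains the given 1-based position."""
    for start, peptide in tryptic_peptides(protein):
        if start <= position < start + len(peptide):
            return peptide
    raise ValueError("position outside protein")


variant_protein = apply_missense(REFERENCE_PREFIX, "V62I")
print(peptide_covering(variant_protein, 62))  # NPIIQEEENIFK
```

Adding such variant peptides to the search database is what allows the MS/MS search to identify them; a reference-only database would contain NPVIQEEENIFK instead.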
Example of introduced stop codon
Protein: NP_003499 frizzled-9 precursor [Homo sapiens]
Genome location: chr7:72848337+ 1776 0
DNA Variant: C155A
Protein Variant: Y52*
MAVAPLRGALLLWQLLAAGGAALEIGRFDPERGRGAAPCQAVEIPMCRGIGYNLTRMPNLLGHTSQGEAAAELAEFAPLVQYGCHSHLRFFLCSLYAPMCTDQVSTPIPACRPMCEQARLRCAPIMEQFNFGWPDSLDCARLPTRNDPHALCMEAPENATAGPAEPHKGLGMLPVAPRPARPPGDLGPGAGGSGTCENPEKFQYVEKSRSCAPRCGPGVEVFWSRRDKDFALVWMAVWSALCFFSTAFTVLTFLLEPHRFQYPERPIIFLSMCYNVYSLAFLIRAVAGAQSVACDQEAGALYVIQEGLENTGCTLVFLLLYYFGMASSLWWVVLTLTWFLAAGKKWGHEAIEAHGSYFHMAAWGLPALKTIVILTLRKVAGDELTGLCYVASTDAAALTGFVLVPLSGYLVLGSSFLLTGFVALFHIRKIMKTGGTNTEKLEKLMVKIGVFSILYTVPATCVIVCYVYERLNMDFWRLRATEQPCAAAAGPGGRRDCSLPGGSVPTVAVFMLKIFMSLVVGITSGVWVWSSKTFQTWQSLCYRKIAAGRARAKACRAPGSYGRGTHCHYKAPTVVLHMTKTDPSLENPTHL
The region underlined on the slide, from residue 52 to the C-terminus, is lost in the variant protein.
Example of removed stop codon
Protein: NP_899231 serine protease 48 precursor [Homo sapiens].
Genome location: chr4:152198324+ 52,163,266,170,390 0,2623,4975,5944,13945
DNA Variant: T984G
Protein Variant: *329E
Variant sequence (stop codon removed, C-terminal extension):
MGPAGCAFTLLLLLGISVCGQPVYSSRVVGGQDAAAGRWPWQVSLHFDHNFIYGGSLVSERLILTAAHCIQPTWTTFSYTVWLGSITVGDSRKRVKYYVSKIVIHPKYQDTTADVALLKLSSQVTFTSAILPICLPSVTKQLAIPPFCWVTGWGKVKESSDRDYHSALQEAEVPIIDRQACEQLYNPIGIFLPALEPVIKEDKICAGDTQNMKDSCKGDSGGPLSCHIDGVWIQTGVVSWGLECGKSLPGVYTNVIYYQKWINATISRANNLDFSDFLFPIVLLSLALLCPSCAFGPNTIHRVGTVAEAVACIQGWEENAWRFSPRGRELTGEPLLTLGDFIYNLK
Reference sequence (328 residues):
MGPAGCAFTLLLLLGISVCGQPVYSSRVVGGQDAAAGRWPWQVSLHFDHNFIYGGSLVSERLILTAAHCIQPTWTTFSYTVWLGSITVGDSRKRVKYYVSKIVIHPKYQDTTADVALLKLSSQVTFTSAILPICLPSVTKQLAIPPFCWVTGWGKVKESSDRDYHSALQEAEVPIIDRQACEQLYNPIGIFLPALEPVIKEDKICAGDTQNMKDSCKGDSGGPLSCHIDGVWIQTGVVSWGLECGKSLPGVYTNVIYYQKWINATISRANNLDFSDFLFPIVLLSLALLCPSCAFGPNTIHRVGTVAEAVACIQGWEENAWRFSPRGR
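The three example slides follow one pattern: a single-base DNA variant changes one codon, and the protein variant is either a substitution, an introduced stop, or a removed stop. The sketch below relates the two coordinates and names the variant type. It assumes the DNA position is a 0-based offset into the coding sequence and the protein position is 1-based; that convention is inferred from the three examples (G183A/V62I, C155A/Y52*, T984G/*329E) and is not stated on the slides. All function names are illustrative.

```python
import re


def parse_variant(variant: str, alphabet: str):
    """Split a variant such as 'V62I' or 'G183A' into (ref, position, alt)."""
    match = re.fullmatch(rf"([{alphabet}])(\d+)([{alphabet}])", variant)
    if match is None:
        raise ValueError(f"cannot parse variant: {variant}")
    return match.group(1), int(match.group(2)), match.group(3)


def codon_number(dna_variant: str) -> int:
    """1-based codon hit by a DNA variant (0-based CDS offset is an assumption)."""
    _, offset, _ = parse_variant(dna_variant, "ACGT")
    return offset // 3 + 1


def classify(protein_variant: str) -> str:
    """Name the effect of a protein variant; '*' denotes a stop codon."""
    ref, _, alt = parse_variant(protein_variant, "A-Z*")
    if ref == alt:
        return "no change"
    if alt == "*":
        return "introduced stop codon"
    if ref == "*":
        return "removed stop codon"
    return "amino acid substitution"


examples = [("G183A", "V62I"), ("C155A", "Y52*"), ("T984G", "*329E")]
for dna, protein in examples:
    _, protein_position, _ = parse_variant(protein, "A-Z*")
    consistent = codon_number(dna) == protein_position
    print(dna, protein, classify(protein), "coordinates consistent:", consistent)
```

An introduced stop shortens the database entry, while a removed stop extends it until the next in-frame stop codon, so both change which tryptic peptides can be observed.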
Predicted and observed SNV peptides in two breast PDXs. Ruggles KV et al., Mol Cell Proteomics 15 (2016) 1060-71
Predicted and observed junction peptides in two breast PDXs. Ruggles KV et al., Mol Cell Proteomics 15 (2016) 1060-71
Variant peptides in 105 Breast Tumors
• TP53: Tumor suppressor
  • 273 Arg→Cys (rs121913343)
  • AAs 273-280 involved in DNA interaction
  • Somatic in 3 tumors
• KRAS: Cell proliferation regulating GTPase
  • 12 Gly→Val
  • Variant shown to cause constitutive activation
  • Somatic in 2 tumors
• MYO1C: Unconventional myosin IC
  • 826 Gln→Arg (rs9905106)
  • Somatic in 1 tumor, germline in 83
Mertins P et al., Nature 2016
Effects of Sequence Variation on the Proteome
• Protein sequence changes
• A modification site is changed
• Protein sequence does not change but the protein level increases or decreases
How do we utilize cancer-specific variants?
Antibodies
VDJ Recombination: gene segments V1, V2, … Vn; D1 … Dn; J1, J2, … Jn are recombined to form the variable heavy-chain domain, which contains CDR1, CDR2, and CDR3 (Fingerprint)
• Somatic hypermutation
HIV Antibodies J.F. Scheid et al, “Sequence and structural convergence of broad and potent HIV antibodies that mimic CD4 binding”, Science, 333 (2011) 1633-1637
Antibodies: A Functional IgG Requires Paired Light and Heavy Chains [Figure: standard IgG; light chain = VL + CL; heavy chain = VH + CH1 + CH2 + CH3]
Single-Chain Llama Antibodies
• Atypical single-chain IgG antibody produced in the camelid family (e.g. llama)
• Retains high affinity for antigen without a light chain
• Antigen-binding domain can be cloned and expressed to make “Nanobodies”:
  - Extremely cheap and available in unlimited amounts
  - Tiny (~15 kDa), fold well and stable in solution
  - Easily engineered for special needs
[Figure: standard IgG compared with single-chain IgG (VHH, CH2, CH3) and the VHH nanobody]
DNA Library Construction: Read 1: 301 bp; Read 2: 301 bp; Overlap: ~200 bp; both reads are trimmed [Figure: Read 1 quality and Read 2 quality as a function of position in the read]
DNA Library Construction: Read 1: 301 bp; Read 2: 301 bp; Overlap: ~200 bp; trimming followed by merging of reads [Figure: merged read length; merged read quality as a function of position in the read]
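Merging a read pair means reverse-complementing Read 2 and joining it to Read 1 where the two overlap; with two 301 bp reads and an overlap of ~200 bp the merged read is about 301 + 301 - 200 = 402 bp. The sketch below is a toy exact-match merger with hypothetical function names. It is not the tool used for these slides: real mergers also use base qualities and tolerate mismatches in the overlap.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10):
    """Merge a read pair on the longest exact overlap, or return None.

    read2 is given as sequenced (opposite strand), so it is
    reverse-complemented before the overlap search.
    """
    mate = reverse_complement(read2)
    max_overlap = min(len(read1), len(mate))
    # Try the longest overlap first so the shortest merged read wins ties.
    for overlap in range(max_overlap, min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


# Toy fragment (made-up sequence): two 20-base reads overlapping by 10 bases.
fragment = "ACGTTGCAAGGCTTACGATCGGATCCTAGG"
read1 = fragment[:20]
read2 = reverse_complement(fragment[10:])
merged = merge_pair(read1, read2, min_overlap=5)
print(merged == fragment, len(merged))
```

A large overlap relative to the read length means most merged positions are covered by both reads, which is where quality-aware mergers can improve the base calls.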
Identifying full-length sequences from peptides
Nanobody Primary Sequences with CDR Regions Annotated + Identified Peptides → Mapping → Annotated Nanobody Sequences with MS coverage → Ranking → Ranked Nanobody Lists → Grouping → Ranked Nanobody Groups
• CDR regions are identified based on their approximate position in the sequence and the presence of specific leading and trailing amino acids.
• Nanobody sequences are ranked based on: MS coverage and length of the individual CDR regions, with CDR3 carrying the highest weight; overall coverage including the scaffold region; HT-Seq counts.
• Nanobody sequences are grouped by CDR3. A sequence is assigned to a group when its Hamming distance to an existing member is 1.
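The mapping and grouping steps can be sketched as follows. This is an illustration under stated assumptions, not the pipeline behind these slides: peptides are mapped by exact substring match, identical CDR3s (distance 0) are also placed in the same group, CDR3s of different length are never grouped, and a sequence joins the first matching group without merging groups afterwards. The function names and the CDR3 strings are hypothetical.

```python
def coverage(sequence: str, peptides, start: int = 0, end=None) -> float:
    """Fraction of sequence[start:end] covered by at least one peptide."""
    end = len(sequence) if end is None else end
    covered = [False] * len(sequence)
    for peptide in peptides:
        pos = sequence.find(peptide)
        while pos != -1:  # mark every occurrence of the peptide
            for i in range(pos, pos + len(peptide)):
                covered[i] = True
            pos = sequence.find(peptide, pos + 1)
    region = covered[start:end]
    return sum(region) / len(region) if region else 0.0


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def group_by_cdr3(cdr3_list, max_distance: int = 1):
    """Greedy grouping: join the first group holding a close-enough member."""
    groups = []
    for cdr3 in cdr3_list:
        for group in groups:
            if any(len(cdr3) == len(m) and hamming(cdr3, m) <= max_distance
                   for m in group):
                group.append(cdr3)
                break
        else:
            groups.append([cdr3])
    return groups


# Toy inputs (hypothetical peptides and CDR3 strings).
sequence = "MAQVQLVESGGGLVQAGGSLR"
peptides = ["QVQLVESG", "AGGSLR"]
print(coverage(sequence, peptides))          # whole-sequence coverage
print(coverage(sequence, peptides, 15, 21))  # coverage of one region
print(group_by_cdr3(["ARDYW", "ARDFW", "TRGSSW", "ARDFY", "TRGSSW"]))
```

The region arguments of `coverage` are what would let a ranking weight CDR3 coverage more heavily than scaffold coverage; the weighting itself is not specified on the slide and is left out here.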
Nanobody Production Scheme
Sequence of Discovered Nanobody Candidates → Gene synthesis & Codon optimization (~$100 / sequence) → Expression Vector Cloning → Transformation → E. coli Expression → One-Step Purification (~2 mg / 1 L)
Example sequence: MAQVQLVESGGGLVQAGGSLRLSCVASGRTFSGYAMGWFRQTPGREREAVAAITWSAHSTYYSDSVKDRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS
Using Anti-GFP Nanobodies [Figure panels: GFP; Homemade Nanobody]
Creating Super-high-affinity Reagent Against GFP [Figure: GFP with Clone A and Clone B, labeled KD = 0.7 nM and KD = 16 nM, and their overlay; GFP with Nano-Nano, Super-high-affinity, KD = 0.03 nM]
Central Dogma of Molecular Biology [Figure labels: Replication, Transcription, Translation, Modification (P), Functional Gene Products]