Accurate Reconstruction of Molecular Phylogenies Using United Codon-aa Sequence Alignments: The molecular clock has two hands

Xiaolong Wang, Ocean University of China. Email: Xiaolong@ouc.edu.cn. Website: www.DNAPlusPro.com. From the Cell to Protein Machines.


Presentation Transcript


  1. Accurate Reconstruction of Molecular Phylogenies Using United Codon-aa Sequence Alignments: The molecular clock has two hands. Xiaolong Wang, Ocean University of China. Email: Xiaolong@ouc.edu.cn. Website: www.DNAPlusPro.com

  2. From the Cell to Protein Machines

  3. NATURE | VOL 409 | 15 FEBRUARY 2001 • SCIENCE | VOL 291 | 16 FEBRUARY 2001

  4. Margulies et al., pyrosequencing, Nature, 2005.

  5. Species with completed genome sequences (partial list): Rickettsia prowazekii, Ureaplasma urealyticum, Drosophila melanogaster, Helicobacter pylori, Bacillus subtilis, human, Buchnera sp. APS, Escherichia coli, Arabidopsis, Thermotoga maritima, Caenorhabditis elegans, Thermoplasma acidophilum, mouse, rat, Borrelia burgdorferi, Neisseria meningitidis Z2491, Plasmodium falciparum, Mycobacterium tuberculosis, Aquifex aeolicus

  6. How many characters are in the “Heaven Book”? 3 × 10^9 characters = 10,000 books • 1 book = 100 pages • 1 page = 3,000 characters • CCGGTCTCCCCGCCCGCGCGCGAAGTAAAGGCCCAGCGCAGCCCGCGCTCCTGCCCTGGGGCCTCGTCTTTCTCCAGGAAAACGTGGACCGCTCTCCGCCGACAGTCTCTTCCACAGACCCCTGTCGCCTTCGCCCCCCGGTCTCTTCCGGTTCTGTCTTTTCGCTGGCTCGATACGAACAAGGAAGTCGCCCCCAGCGAGCCCCGGCTCCCCCAGGCAGAGGCGGCCCCGGGGGCGGAGTCAACGGCGGAGGCACGCCCTCTGTGAAAGGGCGGGGCATGCAAATTCGAAATGAAAGCCCGGGAACGCCGAAGAAGCACGGGTGTAAGATTTCCCTTTTCAAAGGCGGGAGAATAAGAAATCAGCCCGAGAGTGTAAGGGCGTCAATAGCGCTGTGGACGAGACAGAGGGAATGGGGCAAGGAGCGAGGCTGGGGCTCTCACCGCGACTTGAATGTGGATGAGAGTGGGACGGTGACGGCGGGCGCGAAGGCGAGCGCATCGCTTCTCGGCCTTTTGGCTAAGATCAAGTGTAGTATCTGTTCTTATCAGTTTAATATCTGATACGTCCTCTATCCGAGGACAATATATTAAATGGATTGATCAATCCGCTTCAGCCTCCCGAGTAGCTGGGACTACAGACGGTGCCATCACGCCCAGCTCATTGTTGATTCCCGCCCCCTTGGTAGAGACGGGATTCCGCTATATTGCCTGGGCTGGTGTCGAACTCATAGAACAAAGGATCCTCCCTCCTGGGCCTGGGCGTGGGCTCGCAAAACGCTGGGATTCCCGGATTACAGGCGGGCGCACCACACCAGGAGCAAACACTTCCGGTTTTAAAAATTCAGTTTGTGATTGGCTGTCATTCAGTATTATGCTAATTAAGCATGCCCGGTTTTAAACCTCTTAAAACAACTTTTAAAATTACCTTTCCACCTAAAACGTTAAAATTTGTCAAGTGATAATATTCGACAAGCTGTTATTGCCAAACTATTTTCCTATTTGTTTCCTAATGGCATCGGAACTAGCGAAAGTTTCTCGCCATCAGTTAAAAGTTTGCGGCAGATGTAGACCTAGCAGAGGTGTGCGAGGAGGCCGTTAAGACTATACTTTCAGGGATCATTTCTATAGTGTGTTACTAGAGAAGTTTCTCTGAACGTGTAGAGCACCGAAAACCACGAGGAAGAGAGGTAGCGTTTTCATCGGGTTACCTAAGTGCAGTGTCCCCCCTGGCGCGCAATTGGGAACCCCACACGCGGTGTAGAAATATATTTTAAGGGCGCG

  7. Bioinformatics: bioinformaticians face mountains of DNA fragments

  8. Genome sequences: • What to do? • Comparative genomics • Functional genomics • Structural biology • How useful? • Drug design • Personal genetics • Molecular breeding • Gene prediction and annotation • Non-coding RNA discovery • Molecular phylogeny reconstruction • …

  9. I. Drug Design • Understanding How Structures Bind Other Molecules (Function) • Designing Inhibitors • Docking, Structure Modeling • Three-dimensional molecular structure is one of the foundations of structure-based drug design. Often, data are available for the shape of a protein and a drug separately, but not for the two together. Docking is the process by which two molecules fit together in 3D space.

  10. Many diseases are caused by genes

  11. Bioinformatics and new drug development • Gene sequences and expression data • Data processing • Association analysis • Target identification • Molecular design • Drug • Modern drug research is a process of knowledge mining based on biological information

  12. From Finding Homologs to Drug Design

  13. Protein inhibitors (viruses as an example) • attachment, entry and fusion inhibitors • DNA polymerase inhibitors • integrase inhibitors • interferons • maturation inhibitors • monoclonal antibodies • neuraminidase inhibitors • NS3 protease inhibitors • nucleoside reverse transcriptase inhibitors • protease inhibitors • reverse transcriptase inhibitors • RNA polymerase inhibitors

  14. Nucleic acid inhibitors (antisense oligonucleotides or RNAi) • Targeting mRNA • Targeting microRNA • Targeting genomic DNA • Interfering with RNA processing • Aptamers: oligonucleotide or peptide molecules that bind to a specific target molecule
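An antisense oligonucleotide that targets an mRNA is, at its simplest, the reverse complement of the target region. The sketch below illustrates only that base-pairing rule; the function name `antisense_oligo` and the example target are hypothetical, and real oligo design also weighs chemistry, secondary structure, and off-target binding, none of which is modeled here.

```python
# Minimal sketch: derive an antisense DNA oligo from an mRNA target region.
# `antisense_oligo` and the example target are illustrative assumptions.

# Watson-Crick partner (as a DNA base) for each RNA or DNA base in the target.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def antisense_oligo(mrna_target: str) -> str:
    """Return the DNA oligo (5'->3') that base-pairs with an mRNA target (5'->3')."""
    target = mrna_target.upper()
    # Reverse the target, then complement each base.
    return "".join(COMPLEMENT[base] for base in reversed(target))


if __name__ == "__main__":
    print(antisense_oligo("AUGGGGAUAAAU"))  # prints ATTTATCCCCAT
```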

  15. Fomivirsen (Vitravene, ISIS 2922): the first and only antisense antiviral drug approved by the FDA • $63.87 USD per injection

  16. SCIENCE VOL 327 8 JANUARY 2010

  17. Wang X, Gou D, Xu S-y (2010) Polymerase-Endonuclease Amplification Reaction (PEAR) for Large-Scale Enzymatic Production of Antisense Oligonucleotides. PLoS ONE 5(1): e8430. doi:10.1371/journal.pone.0008430

  18. Polymerase-Endonuclease Amplification Reaction (PEAR) for Enzymatic Production of Antisense Oligonucleotides

  19. [Figure: PEAR reaction scheme. Labels: target X; probe X'R'X'; steps of denaturation, annealing, elongation, annealing (slipping), and cleaving; reagents dNTPs, Taq polymerase, and PspGI.]

  20. Other potential applications of PEAR • (1) PEAR is a minimal DNA replication system that can be used to study the origin and evolution of repetitive DNA in genomes, as well as the origin and evolution of genetic material and life. • (2) The repeat PEAR product DNA can be transferred into cells or organisms to study: • Function of repeat DNA sequences • Repeat gene sponges • Repeat gene missiles • Repeat gene probes

  21. Genome sequences: • What to do? • Comparative genomics • Functional genomics • Structural biology • How useful? • Drug design • Personal genetics • Molecular breeding • Gene prediction and annotation • Non-coding RNA discovery • Molecular phylogeny reconstruction • …

  22. About 6 million years ago, humans and chimpanzees, descended from a common ancestor, set out on different evolutionary paths. Today, 6 million years later, scientists are taking a new approach: analyzing the genome sequence of our relative, the chimpanzee, and comparing it with the human genome sequence to answer questions about human origins and evolution.

  23. Use ClustalW to do a progressive MSA http://www2.ebi.ac.uk/clustalw/

  24. Use ClustalW to do a progressive MSA http://www.clustal.org/

  25. Use ClustalW or ClustalX to do a progressive MSA

  26. ClustalW

  27. Praline

  28. MUSCLE

  29. ProbCons

  30. T-Coffee

  31. CLUSTAL • MUSCLE • MAFFT • ProbCons • Praline • T-Coffee

  32. United Codon-aa Sequence Alignment • Coding DNA sequences: atggggataaat … tga; atgataaatagt … tga • Translate • Peptide sequences: M G I N … *; M I N S … * • Combine • Combined DNA-protein sequences: atgMgggGataIaatN … tga*; atgMataIaatNagtS … tga* • Align • Combined alignment: atgMgggGataIaatN---- … tga*; atgMgggGataI ---- agtS…tga*
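The translate and combine steps on this slide can be sketched in a few lines of Python. This is an illustrative reading of the slide, not the DNA+PRO implementation: the function names `translate` and `combine` are hypothetical, the standard genetic code is assumed, and the final alignment step (which would need a scoring scheme over the combined alphabet) is not shown. Following the slide, nucleotides are lowercase and amino acids uppercase, which keeps the two alphabets distinguishable in the combined string.

```python
# Minimal sketch of the "Translate" and "Combine" steps.
# Assumptions: standard genetic code; hypothetical function names.

BASES = "tcag"
# Standard genetic code in the conventional TCAG codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding DNA sequence; a trailing partial codon is ignored."""
    cds = cds.lower()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def combine(cds: str) -> str:
    """Interleave each codon (lowercase) with its amino acid (uppercase)."""
    cds = cds.lower()
    protein = translate(cds)
    return "".join(cds[3 * i:3 * i + 3] + aa for i, aa in enumerate(protein))


if __name__ == "__main__":
    for seq in ("atggggataaattga", "atgataaatagttga"):
        print(translate(seq), combine(seq))
```

Each combined unit is four characters long (three nucleotides plus one residue), so gaps in a combined alignment would be expected in multiples of four, as in the `----` shown on the slide.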

  33. 2A

  34. 2B

  35. S1A ClustalW • Thompson J.D., Higgins D.G. and Gibson T.J. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673–4680.

  36. S1B MAFFT • Katoh K, Kuma K, Toh H, Miyata T. 2005. Nucleic Acids Res. • Katoh K, Misawa K, Kuma K, Miyata T. 2002. Nucleic Acids Res.

  37. S1C MUSCLE • Robert C. Edgar, 2004. MUSCLE. Nucleic Acids Res. 32(5): 1792–1797. • Robert C. Edgar, 2004. MUSCLE. BMC Bioinformatics. 5: 113.

  38. S1D T-Coffee • Poirot O, O'Toole E, Notredame C. 2003. Tcoffee@igs, Nucleic Acids Res. 31(13): 3503–6. • Notredame C, Higgins DG, Heringa J. 2000. T-Coffee: J Mol Biol. 302(1): 205–17.

  39. S1E PRANK • Löytynoja A, Goldman N. 2005. Proc Natl Acad Sci USA. 102(30): 10557–10562. • Löytynoja A, Goldman N. 2008. Science. 320(5883): 1632–1635.

  40. S1G • Codon alignment (back-translated)
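A back-translated codon alignment is commonly built by threading each sequence's codons onto its row of a protein alignment, writing three gap characters for every protein gap. The sketch below shows that idea under stated assumptions: the function name `back_translate` and the two example rows are hypothetical, and the code checks only that the codon count matches the residue count, not that each codon actually encodes its residue.

```python
# Minimal sketch: back-translate one row of a protein alignment to codons.
# `back_translate` and the example rows are illustrative assumptions.


def back_translate(aligned_protein: str, cds: str) -> str:
    """Thread the codons of `cds` onto a gapped protein row ('-' marks a gap)."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = sum(1 for ch in aligned_protein if ch != "-")
    if residues != len(codons) or len(cds) % 3 != 0:
        raise ValueError("codon count does not match ungapped protein length")
    remaining = iter(codons)
    # One protein gap becomes three nucleotide gaps; each residue consumes a codon.
    return "".join("---" if ch == "-" else next(remaining) for ch in aligned_protein)


if __name__ == "__main__":
    print(back_translate("MGIN-*", "atggggataaattga"))  # atggggataaat---tga
    print(back_translate("M-INS*", "atgataaatagttga"))  # atg---ataaatagttga
```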

  41. S1F DNA+PRO

  42. ENV • S2A ClustalW • S2B MAFFT • S2C MUSCLE • S2D T-Coffee • S2E PRANK • S2F DNA+PRO

  43. gag and env • 3A Protein • 3B DNA+PRO

  44. gag and env • 3C Back translate (combined) • 3D Codon alignment

  45. gag and env • S3A Protein • S3B DNA+PRO

  46. BamHI homologs • 4A Protein • 4B Codon alignment • 4C DNA+PRO

  47. SAUSA300_2431 homologs • 5A Protein • 5B Codon alignment • 5C DNA+PRO • A robust multi-gene phylogenetic tree