
Introduction of Genome Research

Introduction of Genome Research. Bioinformatics Research Center, Institute of Biomedical Sciences, Academia Sinica. 莊樹諄. www.sinica.edu.tw/~trees/bioinformatics. E-mail: trees@gate.sinica.edu.tw. Outline: Introduction; Some Research Topics; Related Links and Resources.


Presentation Transcript


  1. Introduction of Genome Research. Bioinformatics Research Center, Institute of Biomedical Sciences, Academia Sinica. 莊樹諄. www.sinica.edu.tw/~trees/bioinformatics. E-mail: trees@gate.sinica.edu.tw. Academia Sinica Bioinformatics Research Center (BRC)

  2. Introduction. Outline • Introduction • Some Research Topics • Related Links and Resources • Bioinformatics Research Center (BRC). (Slide footer: Academia Sinica Bioinformatics Research Center (BRC), 90/4/9 pm)

  3. Chromosome

  4. Introduction: DNA → RNA → Protein → Function. [Figure: a gene on a DNA sequence, 5' to 3', with exons (coding regions) and introns; the mRNA with its 5' UTR, ORF, and 3' UTR; and cDNA (complementary DNA).]

  5. Introduction: DNA → nucleotide. Each nucleotide consists of: • Phosphoric acid • Deoxyribose • Nitrogenous base • Purines: Adenine (A), Guanine (G) • Pyrimidines: Cytosine (C), Thymine (T) • DNA sequence: A, C, G, T (4 letters) • RNA sequence: A, C, G, U (Uracil) (4 letters)

  6. Codon → Amino acid: 4^3 = 64 codons → 20 amino acids
     5' TGGCACACCGTCACGTGTCCATAAACCGGTATCTGT 3'
     3' ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA 5'
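
The two strands and the codon count on this slide can be checked with a short Python sketch. The function name `complement` and the variable names are placeholders chosen for illustration, not taken from the slides.

```python
from itertools import product

# Watson-Crick pairing for DNA bases.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand.upper())

top = "TGGCACACCGTCACGTGTCCATAAACCGGTATCTGT"  # first strand on the slide
bottom = complement(top)                       # should equal the second strand

# Every ordered triple of the 4 bases is a codon, so there are 4**3 of them.
codons = ["".join(triple) for triple in product("ACGT", repeat=3)]
```

Because 64 codons map onto only 20 amino acids (plus stop signals), several codons must encode the same amino acid.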

  7. Introduction: the genetic code • DNA sequence: A, C, G, T (4 letters) • RNA sequence: A, C, G, U (4 letters) • Amino acid sequence: 20 letters • Highlighted on the slide: the Stop codons and Met (M)

     First (5') | Second: U | Second: C | Second: A | Second: G | Third (3')
     U          | Phe (F)   | Ser (S)   | Tyr (Y)   | Cys (C)   | U
     U          | Phe (F)   | Ser (S)   | Tyr (Y)   | Cys (C)   | C
     U          | Leu (L)   | Ser (S)   | Stop      | Stop      | A
     U          | Leu (L)   | Ser (S)   | Stop      | Trp (W)   | G
     C          | Leu (L)   | Pro (P)   | His (H)   | Arg (R)   | U
     C          | Leu (L)   | Pro (P)   | His (H)   | Arg (R)   | C
     C          | Leu (L)   | Pro (P)   | Gln (Q)   | Arg (R)   | A
     C          | Leu (L)   | Pro (P)   | Gln (Q)   | Arg (R)   | G
     A          | Ile (I)   | Thr (T)   | Asn (N)   | Ser (S)   | U
     A          | Ile (I)   | Thr (T)   | Asn (N)   | Ser (S)   | C
     A          | Ile (I)   | Thr (T)   | Lys (K)   | Arg (R)   | A
     A          | Met (M)   | Thr (T)   | Lys (K)   | Arg (R)   | G
     G          | Val (V)   | Ala (A)   | Asp (D)   | Gly (G)   | U
     G          | Val (V)   | Ala (A)   | Asp (D)   | Gly (G)   | C
     G          | Val (V)   | Ala (A)   | Glu (E)   | Gly (G)   | A
     G          | Val (V)   | Ala (A)   | Glu (E)   | Gly (G)   | G

  8. Introduction • 6-frame translations ("-" marks a stop codon)
     5'3' Frame 1: aagctgatcgatcgattttagatagagaaaaaact → K L I D R F - I E K K
     5'3' Frame 2: aagctgatcgatcgattttagatagagaaaaaact → S - S I D F R - R K N
     5'3' Frame 3: aagctgatcgatcgattttagatagagaaaaaact → A D R S I L D R E K T
     3'5' Frame 1: agttttttctctatctaaaatcgatcgatcagctt → S F F S I - N R S I S
     3'5' Frame 2: agttttttctctatctaaaatcgatcgatcagctt → V F S L S K I D R S A
     3'5' Frame 3: agttttttctctatctaaaatcgatcgatcagctt → F F L Y L K S I D Q L
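
A minimal Python sketch of six-frame translation follows, assuming the standard genetic code and using "-" for stop codons as the slide does. The helper names (`reverse_complement`, `translate`, `six_frames`) are illustrative and not from the presentation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; "-" marks a stop codon.
AMINO = "FFLLSSSSYY--CC-WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse, giving the opposite strand 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq: str, offset: int = 0) -> str:
    """Translate complete codons starting at `offset`; trailing bases are ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))

def six_frames(seq: str) -> dict:
    """Return the three forward and three reverse-strand translations."""
    rc = reverse_complement(seq)
    frames = {}
    for k in range(3):
        frames[f"5'3' Frame {k + 1}"] = translate(seq, k)
        frames[f"3'5' Frame {k + 1}"] = translate(rc, k)
    return frames

frames = six_frames("aagctgatcgatcgattttagatagagaaaaaact")
for name, protein in frames.items():
    print(name, " ".join(protein))
```

Running it on the slide's sequence should reproduce the six amino acid strings listed above.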

  9. Introduction • Gene: Exon & Intron • cDNA Database • EST (Expressed Sequence Tags) DB • HGI (Human Gene Index) DB • UniGene DB

  10. Introduction: Human Genome Sequencing (2/11/2001) • Draft: 61.0% • Finished: 32.5% • Total: 93.5%

  11. Chromosome [Figure: chromosome with a gap marked]

  12. Introduction • Genome Database: 3×10^9 bp • HTGS (High Throughput Genomic Sequences) • Phase 0: Single-few pass reads of a single clone (not contigs) • Phase 1: Unfinished, may be unordered, unoriented contigs, with gaps • Phase 2: Unfinished, ordered, oriented contigs, with or without gaps • Phase 3: Finished, no gaps (with or without annotations)

  13. Introduction: contigs by size range

      Size range (kb) | Contigs | Aggregate size (kb) | Percent of total
      <30             |      44 |                 666 |             0.1%
      30-100          |     479 |               32172 |             4.9%
      100-250         |    1628 |              260933 |            39.9%
      250-500         |     421 |              144518 |            22.1%
      500-1000        |     145 |               98623 |            15.1%
      >1000           |      43 |              116557 |            17.8%
      Total           |    2760 |              653471 |           100.0%

  14. Some Research Topics. Outline • Introduction • Some Research Topics • Related Links and Resources • Bioinformatics Research Center (BRC)

  15. Number of human genes • Early estimate: 60,000~100,000 • By Ch22: ~45,000 • By EST: ~140,000 • By Ch22 & HGI-5.0: ~120,000 (Ch22 is 1.38-fold gene-rich; after an extensive cleaning and assembly process) • By 2/16/2001 Science: ~30,000 • There are many more genes awaiting discovery within the sequence

  16. Some Research Topics • Alternative Splicing • Human Diversity • Genome Annotation • Gene Signature

  17. Genomic Sequence Variations • Human Genome: 3×10^9 bp • Single Nucleotide Polymorphism (SNP): 10^6-10^7 • Regions shown: Gene, Coding Region, Non-coding Region, Inter-genic Region • SNP classes shown: gSNP, cSNP, rSNP, iSNP, nSNP • Functional Variants (5%)

  18. Gene-based SNPs [Figure: Gene 1 and Gene 2 with the labels P1, P2, exon, and Intron; positions marked rSNP, cSNP, iSNP, and nSNP]

  19. Human Diversity • SNP (Single Nucleotide Polymorphism) • cSNP (Coding SNP)
      Example: acccgctcgtcgct tgt cggctaattgcgcgaat, where the codon tgt encodes C
      • Synonymous (silent): tgt → tgc, both C
      • Non-synonymous: tgt (C) → tgg (W); C: polar, W: nonpolar (non-conservative)
      • Non-synonymous: tgt (C) → tat (Y); Y: polar (conservative)
      [Other labels on the slide: c, g, C, H]

  20. Human Diversity • SNP (Single Nucleotide Polymorphism) • cSNP (Coding SNP) • Purines (A/G) & Pyrimidines (C/T) • Transition: A↔G, C↔T • Transversion: A/G ↔ C/T • CD-CV: common diseases - common variants
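
The two SNP classifications on the preceding slides (transition vs. transversion, synonymous vs. non-synonymous) can be sketched in Python as below. The sketch assumes the standard genetic code; the function names `substitution_type` and `coding_effect` are hypothetical, introduced only for this example.

```python
from itertools import product

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Standard genetic code in TCAG order; "-" marks a stop codon.
AMINO = "FFLLSSSSYY--CC-WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

def substitution_type(ref: str, alt: str) -> str:
    """Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def coding_effect(ref_codon: str, alt_codon: str) -> str:
    """Synonymous if both codons encode the same amino acid, else non-synonymous."""
    same = CODON_TABLE[ref_codon.upper()] == CODON_TABLE[alt_codon.upper()]
    return "synonymous" if same else "non-synonymous"

# The cSNP example codon tgt (C) and its three variants from the slide.
for alt in ("tgc", "tgg", "tat"):
    print("tgt ->", alt, coding_effect("tgt", alt))
```

Whether a non-synonymous change is conservative (for example polar to polar) depends on amino acid properties that this sketch does not model.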

  21. Pseudogene • Ch22: 134 pseudogenes (134/679 ≈ 19%) • Processed pseudogene (cDNA → genebank, 82% of 134 pseudogenes): single block; lacks characteristic intron-exon structure • Spliced pseudogene: segments of duplicated gene families

  22. Repetitive Sequence • Tandem Repeats: Mini Satellite (Variable Number Tandem Repeats (VNTR)): 15~100 bp; Micro Satellite (Short Tandem Repeats (STR)): 2~5 bp; α-Satellite: at centromere; Telomere Repeats • Interspersed Repeats: SINEs (Short Interspersed Elements): Alu, MIR, MER, LTR, PTR, ...; LINEs (Long Interspersed Elements): LINE1, LINE2, ... [Figure labels: Telomere, Centromere]

  23. Related Links and Resources. Outline • Introduction • Some Research Topics • Related Links and Resources • Bioinformatics Research Center (BRC)

  24. Related Links and Resources • TIGR (The Institute for Genomic Research): http://www.tigr.org/ • NCBI (National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/ • Sanger: http://www.ensembl.org/ • Japan Science and Technology Corporation, Advanced Lifescience Information System (JST - ALIS): http://www-alis.tokyo.jst.go.jp/HGS/top.pl

  25. Related Links and Resources • Gene Prediction Programs: http://www.bork.embl-heidelberg.de/genepredict.html and http://linkage.rockefeller.edu/wli/gene/programs.html • ExPASy Translate Tool: http://expasy.nhri.org.tw/tools/dna.html • Bioinformatics Research Center, Academia Sinica: http://www.sinica.edu.tw/~trees/bioinformatics/bioinformatics.html

  26. Bioinformatics Research Center (BRC). Outline • Introduction • Some Research Topics • Related Links and Resources • Bioinformatics Research Center (BRC)

  27. Bioinformatics Research Center [Figure: network diagram showing Lab. 1, Lab. 2, Lab. 3, a Firewall, and a Local Server]

  28. CRASA: Complexity Reduction Algorithm for Sequence Analysis • cDNA database • Genome Sequences: Chromosome 1~22, X, Y • Genome Annotation • Alternative Splicing • SNP (Single Nucleotide Polymorphism)

  29. CRASA: Complexity Reduction Algorithm for Sequence Analysis • Environment: PC cluster of 10 PCs (PIII-667) and 1 server; Win2000 (NT); HD: IDE with RAID support; DB2 • Algorithm: Progressive Processing (Pyramid Structure); Pattern Match; Direct Search; Parallel Processing

  30. Parallel Processing • Sorting & assembling: CPU bound • Network I/O bound • HD I/O bound [Figure labels: Server, p1, p2, p3, query]
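
The slides do not describe how CRASA distributes work, so the following is only a generic Python sketch of the idea suggested by the figure: one query sent to several nodes, each searching its own share of the data. The shard names mirror the figure's p1, p2, p3; the sequences and the functions `find_all` and `parallel_search` are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def find_all(sequence: str, query: str) -> list:
    """Return every start index of `query` in `sequence` (overlaps included)."""
    hits = []
    start = sequence.find(query)
    while start != -1:
        hits.append(start)
        start = sequence.find(query, start + 1)
    return hits

def parallel_search(shards: dict, query: str, workers: int = 3) -> dict:
    """Submit the same query against every shard and gather hits per shard."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(find_all, seq, query)
                   for name, seq in shards.items()}
        return {name: future.result() for name, future in futures.items()}

# Hypothetical data: three small shards standing in for p1, p2, p3.
shards = {"p1": "acgtacgt", "p2": "ttttacgt", "p3": "gggg"}
result = parallel_search(shards, "acgt")
print(result)
```

Threads keep the example short; for CPU-bound steps such as sorting and assembling, separate processes or machines (as in the PC cluster on the previous slide) would be the more realistic choice.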

  31. Bioinformatics: Computer Science and Biology
