
Genome databases and webtools for genome analysis


lorand

Presentation Transcript


  1. Genome databases and webtools for genome analysis • Become familiar with microbial genome databases • Use some of the tools for analyzing genomes • Visit sites used in lab exercise #2

  2. Major components of NCBI • GenBank • PubMed • Entrez • BLAST • Conserved Domain Database (CDD) • Clusters of Orthologous Groups (COGs) • OMIM

  3. GenBank • Database of DNA and protein sequences • Searchable • Caution: Sequences deposited by the community, not curated for accuracy. • RefSeq - verified by NCBI.

  4. Example of a GenBank record
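A GenBank record like the one on this slide can also be retrieved as plain text through NCBI's Entrez E-utilities. The sketch below only builds the `efetch` request URL and does not contact the network. The helper name `genbank_record_url` is hypothetical, and the accession U00096 (E. coli K-12 MG1655) is used purely as an illustration; neither appears in the slides.

```python
from urllib.parse import urlencode

# Base address of the Entrez E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_record_url(accession: str, db: str = "nuccore") -> str:
    """Build an efetch URL asking for a GenBank flat-file record as text.

    `genbank_record_url` is an illustrative helper, not an NCBI API.
    """
    params = {
        "db": db,           # nucleotide database
        "id": accession,    # accession number of the record
        "rettype": "gb",    # GenBank flat-file format
        "retmode": "text",  # plain text rather than XML
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


# Illustrative accession only; substitute the record of interest.
print(genbank_record_url("U00096"))
```

Opening the printed URL in a browser (or with any HTTP client) should return the flat-file record, with its LOCUS, FEATURES and ORIGIN sections.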

  5. BLAST • Basic Local Alignment Search Tool • Compares a nucleotide or protein sequence against sequence databases • Microbe-specific BLAST page • Focus of a future lab

  6. OMIM • Online Mendelian Inheritance in Man. • Database that links diseases and genes

  7. TIGR • Comprehensive microbial resource (CMR). • Many genomes. • Tools to analyze genomes.

  8. SubtiList • Website for B. subtilis genome. • Features • Annotated genes • Gene region display • Updated similarity searches for every protein • BLAST and pattern search capabilities • Links to journal articles and protein databases

  9. RDP • Ribosomal Database Project • Curated at MSU • Contains a compilation of all ribosomal DNA sequences (currently over 100,000). • A second database records the copy number of ribosomal RNA genes.

  10. KEGG • Kyoto Encyclopedia of Genes and Genomes • Frequently updated database of gene content, metabolic pathways, etc. • Excellent resource for reconstructing pathways in an organism of interest.

  11. Genome sequencing and annotation • Week 2 reading assignments: pages 65-79, 110-122; Boxes 2.1, 2.2 and 2.3. • Don’t worry about the details of HMM. • Hughes Functional Genomics Review.

  12. Sequencing - dideoxy method for DNA sequencing. • Methods for sequencing genomes. • Methods for finding and annotating genes in microbial genomes.

  13. Dideoxy sequencing (Sanger method) • Developed by Frederick Sanger (for which he won his second Nobel Prize in 1980).

  14. Two types of labeling • Radioactive • 32P, 35S • Run each dideoxy base in a separate reaction and a separate lane on a gel. • No longer used • Fluorescent • Four different fluorophores, one for each base • Reactions can be mixed and run together. • Chromatograms - GTSF

  15. Cycle sequencing

  16. Phred • Method for automated quality assessment of DNA sequence traces. • Variance in peak spacing in a 7-peak window • Ratio of largest uncalled peak to smallest called peak in 7- and 3-peak windows. • Number of bases between current base and nearest unresolved base. • Phred score = -10 x log10(P), where P is the estimated probability that the base call is wrong. • Phred scores of 20 or higher are considered good calls. Why?
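The Phred relationship Q = -10 x log10(P), with P the estimated probability that a base call is wrong, can be inverted to see what a given score means. A minimal Python sketch follows; the function names are illustrative and are not part of the Phred program itself.

```python
import math


def phred_score(p_error: float) -> float:
    """Quality score Q = -10 * log10(P) for an error probability P."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def error_probability(q: float) -> float:
    """Invert the score: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


# Tabulate the error probability implied by some common scores.
for q in (10, 20, 30, 40):
    print(f"Q{q}: P(error) = {error_probability(q):g}")
```

A score of 20 corresponds to P = 0.01, roughly one wrong call per 100 bases, which is one way to answer the slide's question about why Q20 is treated as the threshold for a good call.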

  17. Sequencing of genomes • Hierarchical (contig-based) sequencing • Clone smaller segments of the genome. • Labor intensive, slow • Not needed for sequencing microbial genomes • Shotgun method • Randomly clone and sequence 1.5-2 kb fragments of DNA to 5-10-fold coverage. • Computationally intensive.
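Fold coverage in a shotgun project is the total number of sequenced bases divided by the genome size: c = N x L / G for N reads of length L. The sketch below estimates how many reads the 5-10-fold target implies. It also uses the idealized Lander-Waterman model of uniformly random reads, which is an assumption brought in here and not taken from the slides, to estimate the fraction of the genome left unsequenced (about e^-c). The 4.2 Mb genome size and 600 bp read length are illustrative values only.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Number of reads N such that N * L / G reaches the requested coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def expected_uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman estimate: a given base is missed with probability e^-c."""
    return math.exp(-coverage)


# Hypothetical project: values chosen for illustration, not from the slides.
genome = 4_200_000   # genome size in bp
read_len = 600       # usable bases per sequencing read

for c in (5, 10):
    n = reads_needed(genome, read_len, c)
    missed_bp = expected_uncovered_fraction(c) * genome
    print(f"{c}x coverage: {n} reads, ~{missed_bp:.0f} bp expected uncovered")
```

Under this model the uncovered fraction falls exponentially with coverage, which is why going from 5-fold to 10-fold closes most of the remaining gaps.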

  18. Finding genes in a genome sequence • What to look for? • Glimmer - gene finder based on interpolated Markov models (TIGR). • ORF Finder - NCBI. • Most automated annotation engines have ORF-finding capabilities. • Much more difficult in eukaryotic genomes.
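One thing to look for is an open reading frame: a start codon followed in the same frame by a long run of codons ending at a stop codon. The sketch below is a naive six-frame scan written for illustration. It is not the algorithm used by Glimmer or by NCBI's ORF Finder, it considers only ATG as a start codon (bacteria also use alternatives such as GTG and TTG), and the `find_orfs` name and `min_codons` threshold are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Scan all six reading frames for ATG...stop ORFs.

    Returns (strand, start, end, orf_sequence) tuples. Coordinates are
    0-based and end-exclusive on the strand that was scanned; the codon
    count includes both the start and the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG in the current open frame
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # resume searching after this stop codon
    return orfs


# Toy sequence with one short ORF; a low threshold is used so it is reported.
toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
for orf in find_orfs(toy, min_codons=3):
    print(orf)
```

Real gene finders go further than this by scoring codon usage and sequence composition, which is what lets them separate genuine genes from ORFs that occur by chance.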
