
Tools for BioInformatics

Tools for BioInformatics. Eileen Kraemer, Computer Science Dept., The University of Georgia. Sequence data. Types of Tools. Lab samples. Production Sequencing Software. Databases, Database Search Tools.



  1. Tools for BioInformatics Eileen Kraemer Computer Science Dept. The University of Georgia

  2. Types of Tools (diagram): Lab samples; Production Sequencing Software; Sequence data; Databases, Database Search Tools

  3. Production Sequencing Software • used throughout the sequencing procedure from preparation of the DNA through to the finishing of clones.

  4. Example: Sanger Centre, Shotgun Sequencing of a typical human clone • Data collection • Transfer to UNIX • Gel image processing • Sequence pre-processing • DNA Fragment Assembly • Editing • Finishing Services • Quality Control and Assessment

  5. Databases • Swiss-Prot • EMBL • Entrez • GDB • GenBank • GSDB • PDB • & more -- see links at: http://www.public.iastate.edu/~pedro/rt_1.html

  6. Species-specific Databases • See http://genetics.about.com for both non-human and human genome projects • Examples: • PomBase is a compilation of data relating to the organism Schizosaccharomyces pombe • Wormpep contains predicted proteins from the C. elegans genome sequencing project.

  7. Annotation Tools • Annotation of sequences with information such as homologies to known genes, possible gene locations, and gene signals such as promoters. • Example: Genotator (Nomi Harris) -- a workbench under development for automatic sequence annotation and for viewing and editing annotations. The goal is to run a series of sequence analysis tools and display the results so that the various predictions can be compared; the researcher then decides what to include.

  8. Database Software • ACEDB is an acronym for "A Caenorhabditis elegans DataBase". It can refer to a database and data concerning the nematode C. elegans, or to the database software alone. • Other groups may adapt existing software or create their own. For example, David Hall’s workflow project at UGA for Neurospora.

  9. Types of Tools: Sequence, Function, Structure

  10. Gene Prediction • Caution: accuracy is at most about 70% • Good review: Snyder and Stormo (chapter 11 of the book Nucleic Acid and Protein Sequence Analysis: A Practical Approach, second edition, 1994).

  11. Gene Prediction • GRAIL (Xgrail, JavaGrail, etc.) • Geneid • Netgene • GeneMark • Fexon, Hexon • GENSCAN • xpound • Genefinder (University of Washington)

  12. GRAIL • Predicts coding regions • Uses a neural network that combines a series of coding prediction algorithms • Recognizes coding potential within a fixed-size (100-base) window; evaluates coding potential without looking for additional features • Later versions incorporate additional information • Available for human and other species

  13. GeneMark • Based on inhomogeneous Markov models • Predicts coding and non-coding regions based on statistical patterns in dinucleotide frequencies … more next week from Mark B.
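
The idea of a position-dependent Markov model can be sketched in a few lines. This is a minimal illustration and not GeneMark itself: it assumes a first-order chain whose transition probabilities depend on codon phase, scored against a homogeneous background chain. The function names `train_markov` and `coding_log_odds`, the pseudocount, and the toy training strings are all hypothetical.

```python
import math

BASES = "ACGT"

def train_markov(sequences, period, pseudocount=1.0):
    """Estimate P(next base | previous base, position mod period).

    period=3 gives a codon-phase-dependent (inhomogeneous) chain;
    period=1 gives an ordinary homogeneous first-order chain.
    Training sequences are assumed to start at the first base of a codon.
    """
    counts = [{p: {n: pseudocount for n in BASES} for p in BASES}
              for _ in range(period)]
    for seq in sequences:
        seq = seq.upper()
        for i in range(1, len(seq)):
            prev, nxt = seq[i - 1], seq[i]
            if prev in BASES and nxt in BASES:  # skip ambiguity codes such as N
                counts[i % period][prev][nxt] += 1
    model = []
    for table in counts:
        # Normalize each row of dinucleotide counts into probabilities.
        model.append({p: {n: c / sum(row.values()) for n, c in row.items()}
                      for p, row in table.items()})
    return model

def coding_log_odds(seq, coding_model, background_model, frame=0):
    """Log-odds of seq under the coding model versus the background model.

    frame is the index (0, 1 or 2) at which a codon is assumed to start.
    """
    seq = seq.upper()
    score = 0.0
    for i in range(1, len(seq)):
        prev, nxt = seq[i - 1], seq[i]
        if prev not in BASES or nxt not in BASES:
            continue
        phase = (i - frame) % len(coding_model)
        p_coding = coding_model[phase][prev][nxt]
        p_background = background_model[i % len(background_model)][prev][nxt]
        score += math.log(p_coding / p_background)
    return score

if __name__ == "__main__":
    coding = train_markov(["GCT" * 30], period=3)                 # toy "coding" data
    background = train_markov(["AACAGATCCGCTGGTTA"], period=1)    # toy background
    for frame in range(3):
        print(frame, coding_log_odds("GCTGCTGCTGCT", coding, background, frame))
```

A positive score suggests the window looks more like the coding training data than the background; scoring all three frames gives a crude reading-frame guess. Real gene finders use higher-order models and both strands.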

  14. Sequence Alignment • Pairwise alignments • Multiple sequence alignments

  15. Pairwise Alignments • SIM (Protein only) - k best non-intersecting alignments (EXPASY) • ALIGN - optimal global alignment with no short-cuts (EERIE) • LALIGN - calculates the N-best local alignments (EERIE) • LFASTA - local similarity searches showing local alignments (EERIE) • BLAST 2 - local alignment using BLAST (NCBI) • LAP2 - local DNA to protein alignment with LAP2 (MTU)
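
The difference between global and local alignment can be shown with one dynamic-programming routine. This is a minimal sketch, not the code behind any tool listed above: it assumes a simple match/mismatch score and a linear gap penalty (the default values are arbitrary), and returns only the optimal score, not the alignment itself. The function name `alignment_score` is hypothetical.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Optimal pairwise alignment score by dynamic programming.

    local=False: global alignment (Needleman-Wunsch style), both sequences
                 are aligned end to end.
    local=True:  local alignment (Smith-Waterman style), the best-scoring
                 pair of substrings is reported.
    """
    # prev holds the previous row of the DP matrix; only two rows are kept.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                cell = max(cell, 0)       # a local alignment may restart anywhere
                best = max(best, cell)    # and may end anywhere
            curr.append(cell)
        prev = curr
    return best if local else prev[-1]

if __name__ == "__main__":
    print(alignment_score("GATTACA", "GATACA"))                          # global
    print(alignment_score("TTTGATTACATTT", "CCGATTACACC", local=True))   # local
```

Keeping only two rows makes the memory use linear in the length of the shorter input, at the cost of not being able to trace back the alignment.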

  16. Multiple Sequence Alignments • ClustalW 1.7 (DNA/Protein) - Global progressive (BCM) • CAP Sequence Assembly (DNA) - Contig Assembly • MAP (DNA/Protein) - Global progressive in linear space • PIMA 1.4 (Protein only) - Pattern-Induced (local) Multiple Alignment (BCM) • MSA 2.1 (Protein only) - Near-optimal sum-of-pairs global (WashU) • BLOCK MAKER (Protein only) - Finds conserved blocks in seq sets (FHCRC) • MEME 2.2 (DNA/Protein) - Multiple EM for Motif Elicitation (SDSC)

  17. Similarity Searching • BLAST -- (BLASTP, TBLASTN, etc.) • A nucleotide or protein sequence sent to the BLAST server is compared against the selected database, and a summary of matches is returned to the user. • Allows all combinations of DNA or protein query sequences with searches against DNA or protein databases:

  18. BLAST variations • blastp compares an amino acid query sequence against a protein sequence database. • blastn compares a nucleotide query sequence against a nucleotide sequence database. • blastx compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database. • tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands). • tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
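
The five variants differ only in the query type, the database type, and which side is translated. The sketch below restates that table as a lookup and shows what "six-frame conceptual translation" means, assuming the standard genetic code. It is illustrative only and does not call BLAST; the names `choose_blast_program` and `six_frame_translations` are hypothetical helpers.

```python
# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}

BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}

def choose_blast_program(query_type, db_type, translated_comparison=False):
    """Pick a BLAST variant; tblastx is nucleotide vs nucleotide, both translated."""
    if translated_comparison:
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, db_type)]

def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frame_translations(dna):
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

if __name__ == "__main__":
    print(choose_blast_program("nucleotide", "protein"))
    for name, protein in six_frame_translations("ATGGCCTAA").items():
        print(name, protein)
```

Stop codons are written as `*`, so an open reading frame shows up as a stretch without `*` in one of the six frames.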

  19. Types of Tools: Sequence, Function, Structure

  20. Protein Structure Prediction • Ab initio -- based on energy minimization • fold recognition -- sequence -> secondary structure, then align secondary structures with corresponding secondary structures in related proteins, etc. • statistical -- based on “hidden patterns”; similar patterns -> similar structure

  21. Protein Secondary Structure Prediction • Coils - prediction of coiled coil regions • nnPredict - uses a two-layer neural network • PSSP / SSP - segment-oriented prediction • PSSP / NNSSP - nearest-neighbor prediction • SAPS - statistical analysis of protein sequences • Paircoil - prediction of coiled coil regions from pairwise residue correlations • Protein Hydrophilicity / Hydrophobicity • SOPM - self optimized prediction method
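
A hydrophobicity profile is the simplest of these analyses to sketch. The example below assumes the Kyte-Doolittle hydropathy scale and a sliding-window average; the slide does not say which scale or window the listed tool uses, so both are assumptions, and `hydropathy_profile` is a hypothetical name.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=9):
    """Mean hydropathy of every window of the given length along the protein.

    Returns len(protein) - window + 1 values; entry k covers residues
    k .. k + window - 1. Raises KeyError on non-standard residue codes.
    """
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the protein length")
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    running = sum(scores[:window])
    profile = [running / window]
    for i in range(window, len(scores)):
        running += scores[i] - scores[i - window]  # slide the window by one residue
        profile.append(running / window)
    return profile

if __name__ == "__main__":
    for value in hydropathy_profile("MKTLLLAVAVVAFSRKDEEDNQ", window=7):
        print(round(value, 2))
```

Sustained positive stretches in the profile are the usual hint of buried or membrane-spanning segments; the window length controls how much the profile is smoothed.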

  22. Types of Tools: Sequence, Function, Structure

  23. Protein Function Prediction • Pfam: • Proteins of similar function are grouped and aligned, and an HMM is generated for each “cluster” • An HMM is generated for a protein of unknown function and compared to the HMMs of known proteins to predict its functional classification

  24. Pfam components • PROTEIN HMM SEARCH - Analyze a protein query sequence to find Pfam domain matches. • DNA HMM SEARCH - Analyze a DNA query sequence to find Pfam domain matches. (Uses the GeneWise server at the Sanger Centre.) • BROWSE PFAM - View Pfam annotation and alignments. • TEXT SEARCH - Query Pfam by keywords. • BROWSE SWISSPFAM - View the domain organization of any SWISSPROT/TrEMBL sequence according to Pfam.

  25. Types of Tools: Across organisms … Phylogeny Reconstruction (diagram: four boxes labeled Sequence)

  26. Phylogeny Reconstruction • Construct evolutionary trees based on divergences that occur in related sequences • parsimony, minimum distance, etc. • parsimony -- construct tree so that number of mutation events is minimized • PHYLIP, PAUP, others, some interactive
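
The parsimony criterion can be made concrete with Fitch's algorithm, which counts the minimum number of state changes a fixed tree needs to explain one alignment column. This is a minimal sketch, not how PHYLIP or PAUP are implemented: it assumes rooted binary trees written as nested tuples of leaf names and an ungapped alignment, and the function names are hypothetical.

```python
def fitch_site_score(tree, leaf_states):
    """Minimum number of changes on `tree` for one alignment column.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    leaf_states: dict mapping leaf name -> character at this column.
    """
    def visit(node):
        if isinstance(node, str):
            return {leaf_states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:                       # children can agree: no extra change
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]

def parsimony_score(tree, alignment):
    """Sum of per-column Fitch scores; alignment maps leaf name -> sequence."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(length)
    )

def most_parsimonious(trees, alignment):
    """Pick the candidate tree that requires the fewest changes."""
    return min(trees, key=lambda tree: parsimony_score(tree, alignment))

if __name__ == "__main__":
    alignment = {"A": "AAG", "B": "AAG", "C": "CTG", "D": "CTG"}
    candidates = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))]
    for tree in candidates:
        print(tree, parsimony_score(tree, alignment))
    print("best:", most_parsimonious(candidates, alignment))
```

Scoring a given tree is the easy part; the hard part, which full packages address with heuristics, is searching the very large space of possible trees.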

  27. Visualization Tools • Database viewers • Sequence viewers • Molecular viewers

  28. Physical Mapping Software • Used to physically locate genetic markers. • FPC: software for FingerPrinting Contigs. • Image 3.x: software for processing fingerprint gel images. • RHServer: a web interface that positions one or more markers on the 1998 International Gene Map (GB4). • SAM (System for Assembling Markers): takes as input a set of clones and their associated markers, and outputs a partially ordered marker map. • Z-RHMAPPER: extensions to the RHMAPPER (Whitehead) Radiation Hybrid Mapping Package.

  29. Good Resources • Pedro’s BioMolecular Research: http://www.public.iastate.edu/~pedro/rt_1.html • BCM pages: www.hgsc.bcm.tmc.edu/SearchLauncher/index.html • Sanger Centre: www.sanger.ac.uk/Software/Sequencing/overview.shtml • Mining Co. Web Site: genetics/miningco.com • & many others
