BCB 444/544 Finish: Lecture 1- What is Bioinformatics? Lecture 2 Biological Databases & ISU Resources #2_Aug22 BCB 444/544 F07 ISU Dobbs #2 - Biological Databases
BCB 444/544 - Website • http://bindr.gdcb.iastate.edu/bcb544 • Updated Syllabus • Lecture & Lab Schedules (with Homework Assignments) • Lecture PPTs & PDFs • Lab Exercises • Practice Exams • Grading Policy • Project Guidelines, etc. • Links • Check regularly for updates!
BCB 444/544 - Computer Lab • Meets in 1304 MBB every week EXCEPT this week: 1st Lab meets in Library Rm 32 • Current schedule: Thurs 1-3 PM • Conflicts? See Drena
Assignment #1: Tell us about you. Due: Today - Wed, Aug 22 1- Complete HW1_Aug20 for Drena
Required Reading (must read before lecture) • Wed Aug 22 - for Lecture #2 • Xiong Textbook: • Chp 1 - Introduction • Chp 2 - Biological Databases • Thurs Aug 23 - for Lab #1: • Literature Resources for Bioinformatics, Andrea Dinkelman, see Lab Schedule for URL • Fri Aug 24 • Genomics & Its Impact on Science & Society: Genomics & Human Genome Project Primer • see Lecture Schedule for URL
Assignment #2 (& for Fun): DNA Interactive "Genomes" http://www.dnai.org/c/index.html A tutorial on genomic sequencing, gene structure, gene prediction. Howard Hughes Medical Institute (HHMI), Cold Spring Harbor Laboratory (CSHL) • Take the Tour • Read about the Project • Do some Genome Mining with: • Nothing to turn in - just do it!
#1- What is Bioinformatics? (cont.) Xiong: Chp 1 1 Introduction What Is Bioinformatics? Goal Scope Applications Limitations New Themes Further Reading
1st Draft Human Genome: "Finished" in 2001 Modified from Eric Green
Human Genome Sequencing • Two approaches: • Public (government) - International Consortium (mainly 6 countries, NIH-funded in US) • Hierarchical cloning & BAC-to-BAC sequencing • Map-based assembly • Private (industry) - Celera, Craig Venter, CEO • Whole genome random "shotgun" sequencing • Computational assembly (took advantage of public maps & sequences, too) • Guess which human genome they sequenced? Craig's • How many genes? ~20,000 (Science, May 2007)
Public Sequencing: International Consortium Modified from Eric Green
Comparison of Sequenced Genome Sizes Plants? Many have much larger genomes than human! Modified from Eric Green
"Complete" Human Genome Sequence: What next? from Eric Green
Next Step after the Complete Sequence? Understanding Gene Function on a Genomic Scale • Expression Analysis • Structural Genomics • Protein Interactions • Network Analysis • Systems Biology • Evolutionary Implications of: • Intergenic Regions as "Gene Graveyard" • Introns & Exons Modified from Mark Gerstein
How can we begin to understand the complete Human Genome Sequence? from Eric Green
Comparative Genomics: Compare entire genomes from Eric Green
Comparing Genomes: Identifying functional elements from Eric Green
Gene Expression Data: the Transcriptome MicroArray Data • Yeast Expression Data: • Levels for all 6,000 genes! • Investigate how all genes respond to changes in environment or, in humans, e.g., how patterns of RNA expression change in normal vs cancerous tissue ISU's Biotechnology Facilities include state-of-the-art Microarray Instrumentation Modified from Mark Gerstein
Other "Omes": Proteome, Metabolome, Glycome, etc. ISU has state-of-the-art Proteomics Instrumentation ISU has state-of-the-art Metabolomics Instrumentation
How are "Omes" related? Systems Biology seeks to integrate all of these to explain the complex behaviors of whole systems (cells, organisms, ecosystems)
Molecular Biology Information: Integrating Data Understanding the function of genomes requires integration of many diverse and complex types of information: • Metabolic pathways • Regulatory networks • Whole organism physiology • Evolution, phylogeny • Environment, ecology • Literature (MEDLINE) Modified from Mark Gerstein
Other Genome-Scale Experiments Systematic Knockouts: Make "knockout" (null) mutations in every gene - one at a time - and analyze the resulting phenotypes! For yeast: 6,000 KO mutants! 2-hybrid Experiments: For each (and every) protein, identify every other protein with which it interacts! For yeast: 6000 x 6000 / 2 ~ 18M interactions!! Modified from Mark Gerstein
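The 2-hybrid estimate on this slide is a back-of-envelope count of protein pairs. A minimal sketch of that arithmetic, assuming the slide's round figure of 6,000 yeast proteins (the variable names are illustrative only):

```python
from math import comb

n_proteins = 6000  # round figure for yeast taken from the slide

# Slide's rough estimate: n * n / 2
rough_pairs = n_proteins * n_proteins // 2

# Exact number of unordered pairs of two different proteins: "n choose 2"
distinct_pairs = comb(n_proteins, 2)

# If each protein is also tested against itself (self-interaction), add n
pairs_with_self = distinct_pairs + n_proteins

print(rough_pairs)      # 18000000
print(distinct_pairs)   # 17997000
print(pairs_with_self)  # 18003000
```

All three counts round to the same ~18M, so the slide's shortcut is a fair approximation.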
Storing & Analyzing Genomic Information: Exponential Growth of Data Coupled with Development of Fast Computer Technology • Increases in computer speed & storage capacity have been dramatic • Improved computing resources & more efficient algorithms have been driving forces in Bioinformatics & Computational Biology ISU's supercomputer "CyBlue" is among the 100 most powerful computers in the world! Modified from Mark Gerstein
Bioinformatics is born! & more Bioinformaticists are needed! (Internet picture adapted from D Brutlag, Stanford) Modified from Mark Gerstein
"Informatics" techniques used in Bioinformatics
• Databases: building & querying object-oriented & relational DBs
• String Comparison: text search, alignment, significance statistics
• Finding Patterns: machine learning, data mining, statistics, linguistics
• Computational Geometry: robotics, graphics (surfaces, volumes), comparison & 3D matching
• Simulation & Modeling: Newtonian mechanics, electrostatics, numerical algorithms, simulation, network modeling, population modeling
Challenges in Organizing Information: Redundancy and Multiplicity • Different protein sequences can assume the same 3-D structure • Organisms have many similar genes with redundant functions • A single gene may have several different functions • Genes & proteins function in complex genetic & regulatory pathways • How do we organize all this information so that we can make sense of it? Functional Genomics & Systems Biology: sequences <> motifs <> genes <> RNAs <> proteins <> structures <> functions <> expression levels <> pathways <> regulatory networks <> functional systems Modified from Mark Gerstein
One Strategy: Molecular Parts = Conserved Domains Modified from Mark Gerstein
"Parts List" approach to bike maintenance: Where are the parts located? Which are the common parts (bolt, nut, washer, spring, bearing)? Which are unique parts (cogs, levers)? How flexible and adaptable are parts mechanically? Modified from Mark Gerstein
World of macromolecular structures is also finite, providing a valuable simplification • Global surveys of a finite set of parts from different perspectives • Same logic for pathways, functions, sequence families, blocks, motifs.... • H. sapiens: ~20,000 genes • ~2,000 folds • T. pallidum: ~2,000 genes Modified from Mark Gerstein
BUT, what actually happens inside cells or within whole organisms is very complex - providing a challenging complication! Exploring the Virtual Cell at ISU Virtual Cell projects elsewhere... NCBI's Bookshelf - a great resource!
So, having a list of parts is not enough! BIG QUESTION? How do parts work together to form a functional system? SYSTEMS BIOLOGY What is a system? Macromolecular complex, pathway, network, cell, tissue, organism, ecosystem…
So, this is Bioinformatics What is it good for? Just a few examples…
Designing drugs • Understanding how proteins bind other molecules • Structural modeling & ligand docking • Designing inhibitors or modulators of key proteins (Figures adapted from Olsen Group Docking Page at Scripps, Dyson NMR Group Web page at Scripps, and from Computational Chemistry Page at Cornell Theory Center). Modified from Mark Gerstein
Finding homologs of "new" human genes Modified from Mark Gerstein
Finding WHAT? Homologs - "same genes" in different organisms • Human vs Mouse vs Yeast • Much easier to do experiments on yeast to determine function • Often, function of an ortholog in at least one organism is known
Best Sequence Similarity Matches to Date Between Positionally Cloned Human Genes and S. cerevisiae Proteins
Human Disease | MIM # | Human Gene | GenBank Acc# for Human cDNA | BLASTX P-value | Yeast Gene | GenBank Acc# for Yeast cDNA | Yeast Gene Description
Hereditary Non-polyposis Colon Cancer | 120436 | MSH2 | U03911 | 9.2e-261 | MSH2 | M84170 | DNA repair protein
Hereditary Non-polyposis Colon Cancer | 120436 | MLH1 | U07418 | 6.3e-196 | MLH1 | U07187 | DNA repair protein
Cystic Fibrosis | 219700 | CFTR | M28668 | 1.3e-167 | YCF1 | L35237 | Metal resistance protein
Wilson Disease | 277900 | WND | U11700 | 5.9e-161 | CCC2 | L36317 | Probable copper transporter
Glycerol Kinase Deficiency | 307030 | GK | L13943 | 1.8e-129 | GUT1 | X69049 | Glycerol kinase
Bloom Syndrome | 210900 | BLM | U39817 | 2.6e-119 | SGS1 | U22341 | Helicase
Adrenoleukodystrophy, X-linked | 300100 | ALD | Z21876 | 3.4e-107 | PXA1 | U17065 | Peroxisomal ABC transporter
Ataxia Telangiectasia | 208900 | ATM | U26455 | 2.8e-90 | TEL1 | U31331 | PI3 kinase
Amyotrophic Lateral Sclerosis | 105400 | SOD1 | K00065 | 2.0e-58 | SOD1 | J03279 | Superoxide dismutase
Myotonic Dystrophy | 160900 | DM | L19268 | 5.4e-53 | YPK1 | M21307 | Serine/threonine protein kinase
Lowe Syndrome | 309000 | OCRL | M88162 | 1.2e-47 | YIL002C | Z47047 | Putative IPP-5-phosphatase
Neurofibromatosis, Type 1 | 162200 | NF1 | M89914 | 2.0e-46 | IRA2 | M33779 | Inhibitory regulator protein
Choroideremia | 303100 | CHM | X78121 | 2.1e-42 | GDI1 | S69371 | GDP dissociation inhibitor
Diastrophic Dysplasia | 222600 | DTD | U14528 | 7.2e-38 | SUL1 | X82013 | Sulfate permease
Lissencephaly | 247200 | LIS1 | L13385 | 1.7e-34 | MET30 | L26505 | Methionine metabolism
Thomsen Disease | 160800 | CLC1 | Z25884 | 7.9e-31 | GEF1 | Z23117 | Voltage-gated chloride channel
Wilms Tumor | 194070 | WT1 | X51630 | 1.1e-20 | FZF1 | X67787 | Sulphite resistance protein
Achondroplasia | 100800 | FGFR3 | M58051 | 2.0e-18 | IPL1 | U07163 | Serine/threonine protein kinase
Menkes Syndrome | 309400 | MNK | X69208 | 2.1e-17 | CCC2 | L36317 | Probable copper transporter
Modified from Mark Gerstein
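The BLASTX P-values in this table are what rank the human-yeast matches: the smaller the P-value, the less likely the similarity arose by chance. A minimal sketch of filtering and ranking a few rows copied from the table; the cutoff of 1e-50 is a hypothetical threshold chosen only for illustration, not one given in the lecture.

```python
# (human gene, yeast gene, BLASTX P-value) rows copied from the slide's table
hits = [
    ("FGFR3", "IPL1", 2.0e-18),
    ("MSH2", "MSH2", 9.2e-261),
    ("SOD1", "SOD1", 2.0e-58),
    ("CFTR", "YCF1", 1.3e-167),
    ("ATM", "TEL1", 2.8e-90),
]

P_CUTOFF = 1e-50  # hypothetical threshold, for illustration only


def strong_hits(rows, cutoff):
    """Return rows with a P-value below the cutoff, best (smallest) first."""
    return sorted((r for r in rows if r[2] < cutoff), key=lambda r: r[2])


for human, yeast, p_value in strong_hits(hits, P_CUTOFF):
    print(f"{human:6s} -> {yeast:6s} P = {p_value:.1e}")
```

With this cutoff the FGFR3/IPL1 match is dropped and the rest are listed from MSH2 down to SOD1.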
Comparative Genomics: Genome/Transcriptome/Proteome/Metabolome Databases, statistics • Occurrence of specific genes or features in a genome • How many kinases in yeast? • Compare Tissues • Which proteins are expressed in cancer vs normal tissues? • Diagnostic tools • Drug target discovery Modified from Mark Gerstein
Molecular Recognition: Analyzing & Predicting Macromolecular Interfaces (in DNA, RNA & protein complexes) • Drena Dobbs, GDCB: Jae-Hyung Lee, Michael Terribilini, Jeff Sander, Pete Zaback • Vasant Honavar, Com S: Feihong Wu, Cornelia Caragea, Fadi Towfic, Jivo Sinapov • Robert Jernigan, BBMB: Taner Sen, Andrzej Kloczkowski • Kai-Ming Ho, Physics
Designing Zinc Finger DNA-binding Proteins to Recognize Specific Sites in Genomic DNA • Drena Dobbs, GDCB: Jeff Sander, Pete Zaback • Dan Voytas, GDCB: Fengli Fu • Les Miller, ComS • Vasant Honavar, ComS • Keith Joung, Harvard
Structure & Function of Human Telomerase: Predicting structure & functional sites in a clinically important but "recalcitrant" RNP Cell Biologist: Biochemist: Imagined structure: www.intl-pag.org/ www.chemicon.com Lingner et al (1997) Science 276: 561-567. How would a systems biologist study telomerase?
SUMMARY: #1- What is Bioinformatics?
#2- Biological Databases Xiong: Chp 2 2 Introduction to Biological Databases What Is a Database? Types of Databases Biological Databases Pitfalls of Biological Databases Information Retrieval from Biological Databases Summary Further Reading
What is a Database? OK: we'll skip that! Duh!!
Types of Databases
3 major types of electronic databases:
1- Flat files - simple text files; no organization to facilitate retrieval
2- Relational - data organized as tables ("relations"); shared features among tables allow rapid search
3- Object-oriented - data organized as "objects"; objects associated hierarchically
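To make the three database types concrete, here is a minimal sketch that stores the same toy records in each style. The records reuse gene names from the homolog table, but the field layout, function names, and schema are assumptions made up for this illustration; real biological databases are far richer.

```python
import sqlite3
from dataclasses import dataclass, field

# Toy records: (gene, organism, description). Illustrative only.
RECORDS = [
    ("MSH2", "human", "DNA repair protein"),
    ("MLH1", "human", "DNA repair protein"),
    ("SOD1", "yeast", "Superoxide dismutase"),
]

# 1- Flat file: plain text, one record per line; lookup scans every line.
FLAT_FILE = "\n".join("|".join(record) for record in RECORDS)


def flat_lookup(gene):
    for line in FLAT_FILE.splitlines():
        name, organism, description = line.split("|")
        if name == gene:
            return organism, description
    return None


# 2- Relational: two tables linked by a shared key (organism id).
def relational_lookup(gene):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    con.execute("CREATE TABLE gene (name TEXT, organism_id INTEGER, description TEXT)")
    for name, organism, description in RECORDS:
        con.execute("INSERT OR IGNORE INTO organism (name) VALUES (?)", (organism,))
        (org_id,) = con.execute(
            "SELECT id FROM organism WHERE name = ?", (organism,)
        ).fetchone()
        con.execute("INSERT INTO gene VALUES (?, ?, ?)", (name, org_id, description))
    row = con.execute(
        "SELECT organism.name, gene.description FROM gene "
        "JOIN organism ON gene.organism_id = organism.id WHERE gene.name = ?",
        (gene,),
    ).fetchone()
    con.close()
    return row


# 3- Object-oriented: objects arranged hierarchically (organism -> genes).
@dataclass
class Gene:
    name: str
    description: str


@dataclass
class Organism:
    name: str
    genes: list = field(default_factory=list)


def build_objects():
    organisms = {}
    for name, organism, description in RECORDS:
        parent = organisms.setdefault(organism, Organism(organism))
        parent.genes.append(Gene(name, description))
    return organisms


print(flat_lookup("SOD1"))
print(relational_lookup("MSH2"))
print([g.name for g in build_objects()["human"].genes])
```

The flat-file lookup has to read line by line, while the relational version answers the same question with a join on the shared organism key.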
Biological Databases Currently - all 3 types, but MANY flat files What are goals of biological databases? 1- Information retrieval 2- Knowledge discovery Important issue: Interconnectivity
Types of Biological Databases • 1- Primary • "simple" archives of sequences, structures, images, etc. • raw data, minimal annotations, not always well curated! • 2- Secondary • enhanced with more complete annotation of sequences, structures, images, etc. • usually curated! • 3- Specialized • focused on a particular research interest or organism • usually - not always - highly curated
Examples of Biological Databases • 1- Primary • DNA sequences • GenBank - US • European Molecular Biology Lab - EMBL • DNA Data Bank of Japan - DDBJ • Structures (Protein, DNA, RNA) • PDB - Protein Data Bank • NDB - Nucleic Acid Database
Examples of Biological Databases • 2- Secondary • Protein sequences • Swiss-Prot, TrEMBL, PIR • these recently combined into UniProt • 3- Specialized • Species-specific (or "taxonomic" specific) • FlyBase, WormBase, AceDB, PlantDB • Molecule-specific, disease-specific
Pitfalls of Biological Databases • Errors! & lack of documentation re: quality or reliability of data • Limited mechanisms for "data checking" or preventing propagation of errors (esp. annotation errors!!) • Redundancy • Inconsistency • Incompatibility (format, terminology, data types, etc.)
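Redundancy is the easiest of these pitfalls to demonstrate: the same sequence deposited under different identifiers. A minimal sketch that groups FASTA-style records by identical sequence; the three records and their IDs are invented for this example, and real redundancy checks usually also need to catch near-identical sequences, which this does not attempt.

```python
from collections import defaultdict

# Invented FASTA-style records, for illustration only.
TOY_FASTA = """\
>rec1 example entry
ATGGCC
TTAA
>rec2 same sequence, submitted in lower case
atggccttaa
>rec3 differs by one base
ATGCCCTTAA
"""


def parse_fasta(text):
    """Map record ID -> upper-cased sequence (multi-line sequences joined)."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line.upper()
    return records


def find_redundant(records):
    """Return groups of record IDs that share exactly the same sequence."""
    ids_by_sequence = defaultdict(list)
    for record_id, sequence in records.items():
        ids_by_sequence[sequence].append(record_id)
    return [sorted(ids) for ids in ids_by_sequence.values() if len(ids) > 1]


print(find_redundant(parse_fasta(TOY_FASTA)))
```

Here rec1 and rec2 are flagged as duplicates even though they differ in case and line wrapping, while rec3 is kept separate.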
Information Retrieval from Biological Databases • 2 most popular retrieval systems: • ENTREZ - NCBI • will use a LOT - Introduced in Lab 1 • SRS - Sequence Retrieval System - EBI • will use less, similar to ENTREZ • Both: • Provide access to multiple databases • Allow complex queries
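As an illustration of a "complex query", Entrez searches can combine fielded terms with Boolean operators, and NCBI exposes the same search through its E-utilities web interface. The sketch below only builds the request URL and makes no network call; the example query, the helper name `build_esearch_url`, and the choice of the `nucleotide` database are assumptions for illustration, and the course labs may well use the Entrez web pages instead.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Base URL of NCBI's E-utilities "esearch" endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, query, max_results=5):
    """Build (but do not send) an esearch request URL."""
    params = {"db": database, "term": query, "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)


# Hypothetical example: a gene name restricted to one organism.
query = "MSH2[Gene Name] AND Homo sapiens[Organism]"
url = build_esearch_url("nucleotide", query)
print(url)

# Decode the query string again to confirm the term survived URL-encoding.
decoded = parse_qs(urlparse(url).query)
print(decoded["term"][0])
```

The bracketed tags restrict each term to one field, and `AND` combines them, which is the kind of multi-condition search both ENTREZ and SRS support.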
Web Resources: Bioinformatics & Computational Biology • Wikipedia: Bioinformatics • NCBI - National Center for Biotechnology Information • ISCB - International Society for Computational Biology • JCB - Jena Center for Bioinformatics • UBC - Bioinformatics Links Directory • UWa - BioMolecules • Pitt - OBRC Online Bioinformatics Resources Collection • ISU - Bioinformatics Resources - Andrea Dinkelman • ISU - YABI = "Yet Another Bioinformatics Index" (from BCB Lab at ISU)
ISU Resources & Experts ISU Research Centers & Graduate Training Programs: • BCB Lab - (Student-Led Consulting & Resources) • BCB - Bioinformatics & Computational Biology • LH Baker Center - Bioinformatics & Biological Statistics • CIAG - Center for Integrated Animal Genomics • CILD - Computational Intelligence, Learning & Discovery • NSF IGERT Training Grant - Computational Molecular Biology ISU Facilities: • Biotechnology - Instrumentation Facilities • PSI - Plant Sciences Institute • PSI Centers