Part II: GO-Vocabulary of Genome
C. elegans: in cells that normally survive, CED-3 and CED-4 are OFF and CED-9 is ON; in cells that normally die, CED-3 and CED-4 are ON and CED-9 is OFF.
Comparison of sequences from 4 organisms (S. cerevisiae, D. melanogaster, C. elegans, M. musculus): MCM3, MCM2, CDC46/MCM5, CDC47/MCM7, CDC54/MCM4, MCM6. These proteins form a hexamer in the species that have been examined.
The Gene Ontologies: A Common Language for Annotation of Genes from Yeast, Flies and Mice… and Plants and Worms… and Humans… and anything else!
Gene Ontology, 1998:
• FlyBase (Drosophila): Cambridge, EBI, Harvard, Berkeley & Bloomington
• SGD (Saccharomyces): Stanford
• MGI (Mus): Jackson Labs., Bar Harbor
Gene Ontology now:
• Fruit fly: FlyBase
• Budding yeast: Saccharomyces Genome Database (SGD)
• Mouse: Mouse Genome Database (MGD & GXD)
• Rat: Rat Genome Database (RGD)
• Weed (Arabidopsis): The Arabidopsis Information Resource (TAIR)
• Worm: WormBase
• Dictyostelium discoideum: dictyBase
• InterPro/UniProt at EBI: InterPro
• Fission yeast: PomBase
• Human: UniProt, Ensembl, NCBI, Incyte, Celera, Compugen
• Parasites (Plasmodium, Trypanosoma, Leishmania): GeneDB, Sanger
• Microbes (Vibrio, Shewanella, B. anthracis, …): TIGR
• Grasses (rice & maize): Gramene database
• Zebrafish: ZFIN
To provide structured controlled vocabularies for the representation of biological knowledge in biological databases.
Be open source • Use open standards • Make data & code available without constraint • Involve your community
Outline • Introduction to the Gene Ontologies (GO) • Annotations to GO terms • GO Tools • Applications of GO
Gene Ontology Objectives • GO represents concepts used to classify specific parts of our biological knowledge: • Biological Process • Molecular Function • Cellular Component • GO develops a common language applicable to any organism • GO terms can be used to annotate gene products from any species, allowing comparison of information across species
GO: Three ontologies describing a gene product • What does it do? Molecular Function • What processes is it involved in? Biological Process • Where does it act? Cellular Component
Example: Gene Product = hammer
Function (what) | Process (why)
Drive nail (into wood) | Carpentry
Drive stake (into soil) | Gardening
Smash roach | Pest Control
Clown's juggling object | Entertainment
Biological Examples (figure: example terms for Biological Process, Molecular Function, and Cellular Component)
The 3 Gene Ontologies • Molecular Function = elemental activity/task • the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity • Biological Process = biological goal or objective • broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions • Cellular Component = location or complex • subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
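The three ontologies can be sketched as a small data model in which every term carries the ontology (aspect) it belongs to. This is a minimal illustration, assuming the one-letter aspect codes F, P and C used in GO annotation files; the `Aspect` enum and the `group_by_aspect` helper are hypothetical names, and the terms are just the examples from the slide above (no GO IDs are assumed).

```python
from collections import defaultdict
from enum import Enum


class Aspect(Enum):
    """The three GO ontologies, keyed by one-letter aspect codes."""
    MOLECULAR_FUNCTION = "F"
    BIOLOGICAL_PROCESS = "P"
    CELLULAR_COMPONENT = "C"


# Example terms from the slide, each tagged with the ontology it belongs to.
EXAMPLE_TERMS = {
    "carbohydrate binding": Aspect.MOLECULAR_FUNCTION,
    "ATPase activity": Aspect.MOLECULAR_FUNCTION,
    "mitosis": Aspect.BIOLOGICAL_PROCESS,
    "purine metabolism": Aspect.BIOLOGICAL_PROCESS,
    "nucleus": Aspect.CELLULAR_COMPONENT,
    "telomere": Aspect.CELLULAR_COMPONENT,
    "RNA polymerase II holoenzyme": Aspect.CELLULAR_COMPONENT,
}


def group_by_aspect(terms):
    """Return {Aspect: [term names]} so each ontology can be queried separately."""
    groups = defaultdict(list)
    for name, aspect in terms.items():
        groups[aspect].append(name)
    return dict(groups)


if __name__ == "__main__":
    for aspect, names in group_by_aspect(EXAMPLE_TERMS).items():
        print(f"{aspect.value} ({aspect.name}): {', '.join(names)}")
```

Keeping the aspect on the term rather than on the gene product fits the point that one gene product can be annotated in all three ontologies at once.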
Molecular Function • A single reaction or activity, not a gene product • A gene product may have several functions • Sets of functions make up a biological process
Cellular Component • where a gene product acts
What's in a GO term?
term: gluconeogenesis
id: GO:0006094
definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
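GO terms are distributed in the OBO flat-file format, where each term is a `[Term]` stanza of `tag: value` lines and the term label sits under the `name` tag. The sketch below parses one such stanza into a record. It is a simplified illustration, assuming single-valued `id`, `name`, `namespace` and `def` tags only; the stanza is written by hand for gluconeogenesis using the slide's definition (its reference list is left empty), not copied from a GO release, and `GOTerm`/`parse_term` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class GOTerm:
    go_id: str
    name: str
    namespace: str
    definition: str


# Hand-written stanza for illustration; the def reference list is left empty.
STANZA = """[Term]
id: GO:0006094
name: gluconeogenesis
namespace: biological_process
def: "The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol." []
"""

# A def value is a quoted string followed by a bracketed list of references.
DEF_PATTERN = re.compile(r'^"(?P<text>.*)"\s*\[.*\]$')


def parse_term(stanza: str) -> GOTerm:
    """Parse a single [Term] stanza, keeping only the first value of each tag."""
    tags = {}
    for line in stanza.strip().splitlines():
        if line.startswith("[") or ":" not in line:
            continue  # skip the stanza header and blank or malformed lines
        tag, value = line.split(":", 1)  # split only at the first colon ("GO:..." stays whole)
        tags.setdefault(tag.strip(), value.strip())
    raw_def = tags.get("def", "")
    match = DEF_PATTERN.match(raw_def)
    definition = match.group("text") if match else raw_def
    return GOTerm(
        go_id=tags["id"],
        name=tags["name"],
        namespace=tags["namespace"],
        definition=definition,
    )
```

For example, `parse_term(STANZA).go_id` gives `"GO:0006094"` and `.name` gives `"gluconeogenesis"`. A production parser would also need to handle repeated tags such as `is_a` and `synonym`, which this sketch deliberately ignores.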
Content of GO • Molecular Function: 7,309 terms • Biological Process: 10,041 terms • Cellular Component: 1,629 terms • Total: 18,975 terms • Definitions: 94.9% • Obsolete terms: 992 (as of October 2005)
What’s in a name? • Glucose synthesis • Glucose biosynthesis • Glucose formation • Glucose anabolism • Gluconeogenesis • All refer to the process of making glucose from simpler components
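Because one process goes by several names, a database can map every free-text label onto a single term ID so that searches agree. The sketch below uses a small, hypothetical synonym table that follows the slide in treating all five names as the same process, GO:0006094 (gluconeogenesis); real GO synonyms are curated per term and carry a scope (for example exact or related), which this sketch does not model. `to_term_id` is an illustrative helper name.

```python
from typing import Optional

# Hypothetical synonym table: every name listed on the slide maps to one term ID.
SYNONYMS = {
    "gluconeogenesis": "GO:0006094",
    "glucose synthesis": "GO:0006094",
    "glucose biosynthesis": "GO:0006094",
    "glucose formation": "GO:0006094",
    "glucose anabolism": "GO:0006094",
}


def to_term_id(label: str) -> Optional[str]:
    """Return the term ID for a free-text label, ignoring case and extra spaces."""
    key = " ".join(label.lower().split())
    return SYNONYMS.get(key)
```

With this table, `to_term_id("Glucose  Anabolism")` and `to_term_id("gluconeogenesis")` both return `"GO:0006094"`, while an unknown label returns `None`.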
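Tree vs. directed acyclic graph (DAG): in a tree each child term has exactly one parent; in a DAG a child term can have more than one parent.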
Parent-Child Relationships • The cellular component term Nucleus has 5 children: nuclear envelope, nucleoplasm, nucleolus, chromosome, and perinuclear space • A child is a subset of a parent's elements
Ontology Relationships: Directed Acyclic Graph
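Parent-child links can be stored as a mapping from each term to a list of parents; allowing more than one parent per term is what makes the structure a DAG rather than a tree. Because a child is a subset of its parent, a gene product annotated to a child is implicitly annotated to every ancestor. The sketch below assumes only the nucleus edges from the slide plus a single simplified edge from nucleus to the `cellular_component` root (the real ontology has intermediate terms there); `ancestors`, `children` and `propagate` are hypothetical helper names, not a GO library API.

```python
from collections import deque

# Term -> list of parent terms. A list (not a single value) allows several
# parents per term, which is what makes the structure a DAG rather than a tree.
PARENTS = {
    "nuclear envelope": ["nucleus"],
    "nucleoplasm": ["nucleus"],
    "nucleolus": ["nucleus"],
    "chromosome": ["nucleus"],
    "perinuclear space": ["nucleus"],
    # Simplification: real GO has intermediate terms between nucleus and the root.
    "nucleus": ["cellular_component"],
    "cellular_component": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links (breadth-first)."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(parents.get(parent, []))
    return seen


def children(term, parents=PARENTS):
    """Return the direct children of a term by inverting the parent map."""
    return sorted(child for child, ps in parents.items() if term in ps)


def propagate(annotated_terms, parents=PARENTS):
    """Apply the subset rule: an annotation to a term implies all its ancestors."""
    implied = set(annotated_terms)
    for term in annotated_terms:
        implied |= ancestors(term, parents)
    return implied
```

Here `children("nucleus")` lists the five children from the slide, and `propagate({"nucleolus"})` adds `nucleus` and `cellular_component`. The `seen` set means a term reached through two different parents is visited only once.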
Evidence Codes for GO Annotations http://www.geneontology.org/doc/GO.evidence.html
• IEA: Inferred from Electronic Annotation
• ISS: Inferred from Sequence or Structural Similarity
• IEP: Inferred from Expression Pattern
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• IPI: Inferred from Physical Interaction
• IDA: Inferred from Direct Assay
• RCA: Inferred from Reviewed Computational Analysis
• TAS: Traceable Author Statement
• NAS: Non-traceable Author Statement
• IC: Inferred by Curator
• ND: No biological data available
IEA: Inferred from Electronic Annotation • Sequence similarity (BLAST) • Automatic transfer from mappings (InterPro2GO, EC2GO, etc.) -> Not manually reviewed
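Since IEA annotations are not manually reviewed, a user may want to separate them from curated ones before analysis. The sketch below encodes the twelve evidence codes listed above and keeps only non-IEA annotations with a recognized code. The `Annotation` record layout, the `reviewed_only` helper and the gene product names `geneA`/`geneB` are hypothetical, chosen only for illustration.

```python
from typing import List, NamedTuple

# Evidence codes and their meanings, as listed in the evidence code table.
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence or Structural Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological data available",
}


class Annotation(NamedTuple):
    gene_product: str  # hypothetical identifier
    go_id: str
    evidence: str


def reviewed_only(annotations: List[Annotation]) -> List[Annotation]:
    """Drop IEA annotations (not manually reviewed) and any unrecognized code."""
    return [a for a in annotations if a.evidence in EVIDENCE_CODES and a.evidence != "IEA"]


ANNOTATIONS = [
    Annotation("geneA", "GO:0006094", "IDA"),
    Annotation("geneA", "GO:0006094", "IEA"),
    Annotation("geneB", "GO:0006094", "XYZ"),  # not a valid evidence code
]
```

Applied to `ANNOTATIONS`, `reviewed_only` keeps only the IDA record. Silently dropping unknown codes is a design choice for the sketch; a stricter version could raise an error instead.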
ISS: Inferred from Sequence or Structural Similarity • Sequence similarity • Recognized domains • Structural similarity -> Use of 'with' column recommended
IEP: Inferred from Expression Pattern • Transcript levels (Northern blots, microarrays) • Protein levels (Western blots) -> Based on the timing or localization of expression -> Used for biological process annotations
IMP: Inferred from Mutant Phenotype • Gene mutation/knockout • Overexpression/ectopic expression • Anti-sense experiments • RNAi experiments • Specific protein inhibitors
IGI: Inferred from Genetic Interaction • Suppressors, synthetic lethals… • Functional complementation • Rescue experiments -> Use of 'with' column recommended
IPI: Inferred from Physical Interaction • 2-hybrid interactions • Co-purification • Co-immunoprecipitation • Ion/complex/protein binding experiments -> Use of 'with' column recommended
IDA: Inferred from Direct Assay • Enzyme assays • In vitro reconstitution (e.g. transcription) • Immunofluorescence (for cellular component) • Cell fractionation (for cellular component) • Physical interaction/binding assay
RCA: Inferred from Reviewed Computational Analysis • Non-sequence-based computational methods • Genome-wide analyses (e.g. 2-hybrid) • Combinations of large-scale experiments
TAS: Traceable Author Statement • Support from review article • Textbook 'common knowledge' -> Data that can be 'traced' back
NAS: Non-traceable Author Statement • Database entries that don't cite a paper -> Data that cannot be 'traced' back
IC: Inferred by Curator • Not supported by any direct evidence • Inferred from other GO annotations -> GO term in 'with/from' column required
ND: No biological data available • Curator found no information supporting any annotation; used with the 'unknown' terms: • molecular function unknown GO:0005554 • biological process unknown GO:0000004 • cellular component unknown GO:0008372
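The rules on the evidence code slides about the 'with/from' column and the ND 'unknown' terms lend themselves to a simple consistency check. This is a hypothetical validator, not a GO tool: it assumes each annotation is a dict with `evidence`, `go_id` and an optional `with_from` value. It reports an error when IC lacks a with/from entry, a warning when ISS, IGI or IPI lacks one, and (assuming, as the ND slide suggests, that ND goes only with the three 'unknown' terms) an error when ND is paired with any other term.

```python
from typing import Dict, List, Optional

WITH_REQUIRED = {"IC"}                 # 'with/from' column required
WITH_RECOMMENDED = {"ISS", "IGI", "IPI"}  # 'with' column recommended
ND_TERMS = {"GO:0005554", "GO:0000004", "GO:0008372"}  # the three 'unknown' terms


def check(annotation: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of problems found in one annotation (empty if none)."""
    problems = []
    code = annotation["evidence"]
    has_with = bool(annotation.get("with_from"))
    if code in WITH_REQUIRED and not has_with:
        problems.append(f"error: {code} requires a with/from entry")
    if code in WITH_RECOMMENDED and not has_with:
        problems.append(f"warning: {code} should cite a with/from entry")
    if code == "ND" and annotation["go_id"] not in ND_TERMS:
        problems.append("error: ND should be used only with an 'unknown' term")
    return problems
```

Separating errors from warnings mirrors the slides' distinction between "required" (IC) and "recommended" (ISS, IGI, IPI).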
Term Hierarchy (evidence codes, from strongest to weakest support): TAS/IDA > IMP/IGI/IPI > ISS/IEP > NAS > IEA
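The hierarchy can be used to pick, for each GO term, the annotation with the strongest support. The sketch below turns the tiers above into numeric ranks, with codes in the same tier sharing a rank. Codes that do not appear in the hierarchy (RCA, IC, ND) are deliberately treated as unranked and skipped, and `best_supported` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Tuple

# Tiers from the hierarchy, strongest support first; codes in one tier tie.
TIERS = [("TAS", "IDA"), ("IMP", "IGI", "IPI"), ("ISS", "IEP"), ("NAS",), ("IEA",)]
RANK = {code: tier for tier, codes in enumerate(TIERS) for code in codes}


def best_supported(annotations: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each GO ID to its best-ranked evidence code, given (go_id, code) pairs.

    Unranked codes are skipped; on a tie, the first code seen is kept.
    """
    best: Dict[str, str] = {}
    for go_id, code in annotations:
        if code not in RANK:
            continue
        if go_id not in best or RANK[code] < RANK[best[go_id]]:
            best[go_id] = code
    return best
```

For example, given IEA, ISS and IMP annotations to the same term, `best_supported` keeps IMP, since it sits in a higher tier than ISS and IEA.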
Annotation summaries (figure): Meloidogyne incognita, McCarter et al. 2003