
Functional Annotation


garren




Presentation Transcript


  1. Functional Annotation: Background + Strategy. The Group

  2. Outline • What is Functional Annotation • The Importance of Functional Annotation • The Biology of H. haemolyticus • Background for Functional Annotation • Pros/Cons of Available Approaches • Planned Approach • Breadth • Depth


  4. Functional Annotation: The ‘What?’

  5. Genome Assembly: Assemble the Pieces Right

  6. Gene Prediction: Identify the words. “When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers.” (Darwin, On the Origin of Species, 1859, Introduction)
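In a genome, the “words” to identify are genes. A minimal sketch of the first step many prokaryotic gene finders build on, scanning for open reading frames (ORFs), is shown below. It assumes the standard start codon ATG and stop codons TAA/TAG/TGA, scans only the three forward frames, and uses a hypothetical `min_codons` cut-off; dedicated gene finders such as Glimmer or Prodigal add statistical models, alternative start codons, and the reverse strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, orf) for ATG...stop ORFs on the forward strand.

    `start` is 0-based, `end` is exclusive and includes the stop codon.
    Only the three forward frames are scanned (no reverse complement).
    `min_codons` is an illustrative length cut-off, not a standard value.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # stop codon closes the ORF
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 'ATGAAATTTTAG')]
```

Scanning frame by frame mirrors the idea of the slide: the same letters mean different things depending on where the reading starts.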

  7. Functional Annotation: Identify the function (i.e., meaning) of each word in the same passage. Sources: DATABASES, PROFILES. • naturalist: nat·u·ral·ist [nach-er-uh-list, nach-ruh-] noun 1. a person who studies or is an expert in natural history, especially a zoologist or botanist. 2. an adherent of naturalism in literature or art. Origin: 1580–90; natural + -ist • Origin of Species, The (noun): On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life, a treatise (1859) by Charles Darwin setting forth his theory of evolution.


  9. Not Just Newtonian: The Gravity of the Annotation Process

  10. Function: “Ultimately, one wishes to determine how genes—and the proteins they encode—function in the intact organism.” (Alberts B, et al. (2002). Molecular Biology of the Cell. New York: Garland Science.)

  11. Function? What Is It? • To a cell biologist, function might refer to the network of interactions in which the protein participates, or to its localization in a particular cellular compartment. • To a biochemist, function refers to the metabolic process in which a protein is involved, or to the reaction catalyzed by an enzyme.

  12. Functional Annotation: Functional annotation consists of attaching biological information to genomic elements, such as: • Biochemical function • Biological function • Regulation and interactions involved • Expression

  13. Whatever Happened to the Wet Lab? “Experimentally annotating one complete bacterial genome varies from organism to organism. Roughly speaking, it could take as much as $25,000 and a period of 6-12 months for completing the process” - Alejandro Caro

  14. The Naked Truth: Number of genomes in KEGG [figure; source: KEGG Genome, release update of Jan 2012]

  15. How Do Genes Perform Their Functions? Operons • Operon: a cluster of genes with related functions that are transcribed together into a single mRNA and are therefore regulated together. • Polycistronic mRNA, i.e., mRNA coding for more than one polypeptide, is found mainly in prokaryotes.
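Because genes in an operon sit next to each other on the same strand, a commonly used first-pass heuristic groups adjacent same-strand genes separated by a short intergenic gap. The sketch below implements only that heuristic; the `Gene` record, the gene names, and the 50 bp `max_gap` threshold are hypothetical choices for illustration, and real operon predictors combine this with further evidence.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def predict_operons(genes: list[Gene], max_gap: int = 50) -> list[list[str]]:
    """Group consecutive genes into candidate operons.

    Neighbouring genes join the same candidate operon when they lie on the
    same strand and the intergenic gap (negative if they overlap) is at
    most `max_gap` bp. The threshold is an assumed, illustrative value.
    """
    operons: list[list[str]] = []
    prev = None
    for gene in sorted(genes, key=lambda g: g.start):
        same_unit = (
            prev is not None
            and gene.strand == prev.strand
            and gene.start - prev.end - 1 <= max_gap
        )
        if same_unit:
            operons[-1].append(gene.name)
        else:
            operons.append([gene.name])
        prev = gene
    return operons


genes = [
    Gene("geneA", 1, 900, "+"),
    Gene("geneB", 920, 1500, "+"),   # 19 bp gap, same strand
    Gene("geneC", 1700, 2000, "+"),  # 199 bp gap
    Gene("geneD", 2010, 2500, "-"),  # strand switch
]
print(predict_operons(genes))  # [['geneA', 'geneB'], ['geneC'], ['geneD']]
```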

  16. Coding and Non-coding RNAs • Protein-coding: enzymes, structural, regulatory, signal transduction, receptors, toxins, virulence factors, membrane/transmembrane • Non-coding: riboswitches, CRISPR, sRNAs • Pathway prediction

  17. Domain/Motif • Domain: a discrete structural unit that is assumed to fold independently of the rest of the protein and to have its own function (~20-100 aa). • Motif: a short, conserved region, frequently the most conserved part of a domain. Motifs are critical for the domain to function.
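Short motifs are often written as PROSITE-style patterns, for example the P-loop (Walker A) pattern `[AG]-x(4)-G-K-[ST]`. A minimal sketch that translates a simple pattern into a Python regular expression and scans a sequence is shown below. It handles only `x`, `[..]`, `{..}`, `(n)`/`(n,m)` repeats and the `<`/`>` anchors; the test sequence is made up for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Supported: x (any residue), [..] (allowed residues), {..} (forbidden
    residues), (n) or (n,m) repeats, and the < / > terminal anchors.
    Anything else in the PROSITE syntax is not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Split e.g. "x(4)" into core "x" and repeat "4".
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


p_loop = prosite_to_regex("[AG]-x(4)-G-K-[ST]")  # '[AG].{4}GK[ST]'
hit = re.search(p_loop, "MAKGESGSGKSTL")          # made-up sequence
print(hit.start(), hit.group())                   # 3 GESGSGKS
```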


  19. Understanding the Target: Haemophilus haemolyticus, The Biography

  20. Haemophilus haemolyticus • Gram-negative • Facultative anaerobe • Known to colonize the human respiratory tract • Of the 8 Haemophilus species found to colonize the respiratory tract, H. influenzae and H. haemolyticus are the most prevalent • H. haemolyticus is an emerging pathogen • 5 cases of invasive disease reported between 2009 and 2010

  21. Strains of H. haemolyticus • fucK: encoding fuculose kinase; fucK deletion has been observed in some H. influenzae (Hi) isolates • hpd: encoding the lipoprotein protein D

  22. Phylogeny (Nørskov-Lauritsen N, et al. (2005). Multilocus sequence phylogenetic study of the genus Haemophilus with description of Haemophilus pittmaniae sp. nov. International Journal of Systematic and Evolutionary Microbiology, 55, 449–456.)


  24. View from 300 ft and a Brief Time Travel

  25. Ontology • An ontology is a “formal, explicit specification of a shared conceptualization” • Two major formal ontology schemes: • EC: Enzyme Commission number • GO: Gene Ontology

  26. Enzyme Commission (EC) • A large-scale, comprehensive attempt to organize and classify enzymes according to their function • For inclusion in the list, direct experimental evidence must be provided for the claimed activity • Organizes enzymes into four levels of hierarchy, starting with 6 top-level classes (a seventh class, Translocases, was added in 2018): • Oxidoreductases • Transferases • Hydrolases • Lyases • Isomerases • Ligases
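Each EC number encodes its four-level hierarchy directly in its dotted form, so the chain of ancestors can be read off by truncation. The sketch below, assuming well-formed input such as `EC 3.4.21.4` (trypsin) or a partial number like `2.7.1.-`, returns the top-level class name and the ancestor chain. The class table includes Translocases (EC 7), which postdates the slide.

```python
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",  # added in 2018
}


def ec_lineage(ec: str) -> tuple[str, list[str]]:
    """Return the top-level class name and the chain of ancestor EC numbers.

    Partially specified numbers such as '2.7.1.-' stop at the first '-'.
    """
    parts = ec.removeprefix("EC").strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {ec!r}")
    lineage = []
    for depth in range(1, 5):
        if parts[depth - 1] == "-":
            break
        lineage.append(".".join(parts[:depth]))
    return EC_CLASSES[int(parts[0])], lineage


print(ec_lineage("EC 3.4.21.4"))
# ('Hydrolases', ['3', '3.4', '3.4.21', '3.4.21.4'])
```

Note that the result is a single chain: every EC number has exactly one parent, which is the parent-to-child limitation discussed in the next slide.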

  27. Chronology: Enzyme Commission (EC) • Cons of EC: • The hierarchy provides only parent-to-child relationships • Specific to enzymes only (does not cover all proteins)

  28. Chronology: Gene Ontology (GO). Or, in other words, “give this protein a name and stick to it!!”

  29. What Is the GO? • Three aspects: Molecular Function, Biological Process, Cellular Component • Relations between the terms: ‘is_a’, ‘part_of’/‘has_part’, ‘regulates’
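Because GO terms form a graph rather than a single chain, annotating a gene to one term also implies annotation to that term's ancestors along `is_a` and `part_of` edges (the GO “true path rule”). The sketch below propagates annotations over a hypothetical toy graph; the term names are placeholders, not real GO IDs, and `regulates` edges are deliberately not followed.

```python
from collections import deque

# Hypothetical toy fragment: each term maps to its (relation, parent) edges.
ONTOLOGY: dict[str, list[tuple[str, str]]] = {
    "term_D": [("is_a", "term_B"), ("regulates", "term_E")],
    "term_C": [("part_of", "term_B")],
    "term_B": [("is_a", "term_A")],
    "term_A": [],
    "term_E": [],
}

PROPAGATING = {"is_a", "part_of"}  # relations annotations propagate over


def ancestors(term: str) -> set[str]:
    """All terms reachable from `term` via is_a / part_of edges (BFS)."""
    found: set[str] = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in ONTOLOGY.get(current, []):
            if relation in PROPAGATING and parent not in found:
                found.add(parent)
                queue.append(parent)
    return found


def propagate(direct_terms: set[str]) -> set[str]:
    """Direct annotations plus every ancestor implied by them."""
    implied = set(direct_terms)
    for term in direct_terms:
        implied |= ancestors(term)
    return implied


print(sorted(propagate({"term_D"})))  # ['term_A', 'term_B', 'term_D']
```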

  30. Structure of GO (du Plessis L, Skunca N, Dessimoz C (2011). The what, where, how and why of gene ontology–a primer for bioinformaticians. Brief Bioinform. doi:10.1093/bib/bbr002)

  31. General Rule for Applying Evidence Codes

  32. Where Do Annotations Come From? • Inferred from experiment: the most reliable, and the basis for computational methods • Inferred from computational methods: sequence similarity, structural similarity, etc. • Inferred from author statement • Curator statement and obsolete evidence codes
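In practice the evidence code attached to each GO annotation is what lets a pipeline decide how much to trust it. The sketch below maps a subset of GO evidence codes onto the categories listed on this slide (folding the phylogenetic codes into “computational” and keeping electronic IEA separate) and filters annotations by category. The gene and term names in the example are hypothetical.

```python
# Subset of GO evidence codes grouped by category. Phylogenetic codes
# (IBA, IBD, IKR, IRD) are folded into "computational" for simplicity.
EVIDENCE_CATEGORY: dict[str, str] = {
    **dict.fromkeys(["EXP", "IDA", "IPI", "IMP", "IGI", "IEP"], "experimental"),
    **dict.fromkeys(["ISS", "ISO", "ISA", "ISM", "IGC", "RCA",
                     "IBA", "IBD", "IKR", "IRD"], "computational"),
    **dict.fromkeys(["TAS", "NAS"], "author statement"),
    **dict.fromkeys(["IC", "ND"], "curator statement"),
    "IEA": "electronic",
    "NR": "obsolete",
}


def filter_annotations(
    annotations: list[tuple[str, str, str]],
    keep: frozenset[str] = frozenset({"experimental"}),
) -> list[tuple[str, str, str]]:
    """Keep (gene, term, evidence_code) triples whose code falls in `keep`."""
    return [a for a in annotations if EVIDENCE_CATEGORY.get(a[2]) in keep]


annotations = [  # hypothetical annotations
    ("geneA", "term_1", "IDA"),
    ("geneA", "term_2", "IEA"),
    ("geneB", "term_3", "ISS"),
    ("geneC", "term_4", "TAS"),
]
print(filter_annotations(annotations))  # [('geneA', 'term_1', 'IDA')]
```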

  33. Why Use the GO? • The GO Consortium consists of a number of large databases working together to define standardized ontologies and provide annotations to the GO. • Search for interacting genes • Reason across the relations • Analyze the results of high-throughput experiments • Infer the functions of unannotated genes and infer protein-protein interactions
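One common way to analyze high-throughput results with the GO is term enrichment: asking whether a study set (for example, up-regulated genes) contains more genes annotated with a term than expected by chance. A minimal sketch of the one-sided hypergeometric test used for this is below; the numbers are hypothetical, and a real analysis would also correct for testing many terms at once.

```python
from math import comb


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: annotated genes in the background (e.g. the whole genome)
    K: background genes annotated with the GO term
    n: genes in the study set
    k: study-set genes annotated with the term
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical numbers: 2000 genes, 40 with the term, 50 in the study set, 8 hits.
print(enrichment_p_value(2000, 40, 50, 8))
```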


  35. Choosing the Right Function Prediction Tool. Caution! Pros and Cons of Conventional Approaches

  36. “Perutz et al. showed in 1960 that myoglobin and hemoglobin, the first two protein structures to be solved at atomic resolution using X-ray crystallography, have similar structures even though their sequences differ.”

  37. Pros and Cons: There are no free lunches! • Homology is useful, but it is not the same as “same function”: it simply implies common ancestry

  38. Pros and Cons: There are no free lunches!

  39. Pros and Cons: There are no free lunches! • A prediction is only as good as the quality of the annotations in the reference database • Eukaryotic function predictors cannot be used for prokaryotes, and vice versa


  41. A Snapshot of the Iceberg Named Functional Annotation: Breadth and Depth of the Analysis

  42. Spectrum of Methods Selected: Breadth

  43. Criteria for Selecting Methods • Currently maintained • Applicable to prokaryotic sequences • Can be installed locally (supporting batch jobs, if GUI-based) OR can be included in a pipeline, i.e., has a command-line interface

  44. Categories of Approaches • Sequence similarity-based • Phylogenomics-based • Domain/pattern/profile-based: domain-based, pattern-based, profile-based • Sequence clustering-based • Machine learning-based • Network-based
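To make “profile-based” concrete: a profile summarizes an aligned protein family as a position-specific scoring matrix (PSSM), and a query is scored by sliding that matrix along it. The sketch below builds a log-odds PSSM from a hypothetical ungapped mini-alignment, assuming a uniform background of 1/20 per amino acid and a pseudocount of 1; profile tools used in practice (e.g. PSI-BLAST or HMM-based methods) handle gaps, sequence weighting, and realistic background frequencies.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PSSM (bits) from ungapped, equal-length aligned sequences.

    Assumes a uniform background (1/20 per residue) and adds `pseudocount`
    to every residue count so unseen residues get a finite negative score.
    """
    background = 1 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_hit(pssm: list[dict[str, float]], sequence: str) -> tuple[int, float]:
    """Slide the profile along `sequence`; return (offset, score) of the best window."""
    width = len(pssm)
    windows = (
        (i, sum(col[aa] for col, aa in zip(pssm, sequence[i:i + width])))
        for i in range(len(sequence) - width + 1)
    )
    return max(windows, key=lambda w: w[1])


profile = build_pssm(["GKS", "GKT", "GRS"])  # hypothetical mini-alignment
print(best_hit(profile, "AAGKSAA"))          # best window starts at offset 2
```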

  45. Breadth: Options

  46. Flowchart

  47. Description of Selected Methods: Depth

  48. Level 1: The Building Blocks!

  49. Pan-Genome Analysis • The pan-genome is the full complement of genes in a species. • It comprises the core genome (genes present in all strains), the dispensable genome (genes present in two or more, but not all, strains), and unique genes (genes specific to a single strain). • In this case, we will use the pan-genome of Haemophilus influenzae. • This database will serve as the reference database for BLAST. • This method gives high-confidence annotations because the selected strains are very closely related to the organism in question.
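Once genes from each strain have been clustered into shared gene families, the core/dispensable/unique split above is a simple counting exercise. The sketch below assumes that clustering has already been done and that each strain is represented by a set of family identifiers; the strain and family names are hypothetical.

```python
from collections import Counter


def partition_pangenome(
    strain_genes: dict[str, set[str]],
) -> tuple[set[str], set[str], set[str]]:
    """Split gene families into core, dispensable, and unique sets.

    core: present in every strain; unique: present in exactly one strain;
    dispensable: present in two or more, but not all, strains.
    """
    if len(strain_genes) < 2:
        raise ValueError("need at least two strains")
    n_strains = len(strain_genes)
    counts = Counter(gene for genes in strain_genes.values() for gene in genes)
    core = {g for g, c in counts.items() if c == n_strains}
    unique = {g for g, c in counts.items() if c == 1}
    dispensable = set(counts) - core - unique
    return core, dispensable, unique


strains = {  # hypothetical gene-family sets per strain
    "strain1": {"famA", "famB", "famC"},
    "strain2": {"famA", "famB", "famD"},
    "strain3": {"famA", "famE"},
}
core, dispensable, unique = partition_pangenome(strains)
print(sorted(core), sorted(dispensable), sorted(unique))
# ['famA'] ['famB'] ['famC', 'famD', 'famE']
```

The union of the three sets is the pan-genome, i.e., the reference gene collection the slide proposes to search with BLAST.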

  50. BLAST: How Does It Work? • Divide the query sequence into short chunks called words • Look for exact matches of these words in the database sequences • When there is a hit, try extending the alignment in both directions
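The three steps above (words, exact hits, extension) can be sketched as a toy seed-and-extend search. The code below indexes k-letter words of one subject sequence, finds exact word matches with the query, and extends each seed without gaps, stopping once the running score drops more than `x_drop` below the best seen. The word size, match/mismatch scores, and X-drop value are assumed for illustration. Real BLAST differs in important ways: protein searches also accept similar “neighborhood” words above a score threshold, and it adds gapped extension and E-value statistics.

```python
def _x_drop(pairs, match: int, mismatch: int, x_drop: int) -> tuple[int, int]:
    """Walk along aligned residue pairs; return (best gain, length giving it).

    Stops once the running score falls more than `x_drop` below the best seen.
    """
    best = score = best_len = 0
    for length, (a, b) in enumerate(pairs, start=1):
        score += match if a == b else mismatch
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break
    return best, best_len


def seed_and_extend(query: str, subject: str, k: int = 4,
                    match: int = 1, mismatch: int = -2, x_drop: int = 3):
    """Return ungapped HSPs as (q_start, q_end, s_start, s_end, score), best first."""
    # 1. Index every k-letter word of the subject.
    index: dict[str, list[int]] = {}
    for s in range(len(subject) - k + 1):
        index.setdefault(subject[s:s + k], []).append(s)
    hsps = set()
    # 2. Look up each query word for exact matches (seeds).
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            # 3. Extend the seed to the right and to the left.
            right, r_len = _x_drop(zip(query[q + k:], subject[s + k:]),
                                   match, mismatch, x_drop)
            left, l_len = _x_drop(zip(reversed(query[:q]), reversed(subject[:s])),
                                  match, mismatch, x_drop)
            hsps.add((q - l_len, q + k + r_len, s - l_len, s + k + r_len,
                      k * match + right + left))
    return sorted(hsps, key=lambda h: h[4], reverse=True)


print(seed_and_extend("ACGTACGT", "TTTACGTACGTTTT")[0])  # (0, 8, 3, 11, 8)
```

Several seeds usually extend into the same high-scoring segment pair, which is why the results are collected in a set before ranking.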
