This article discusses the extension of MAO (Multiple Alignment Ontology) towards an ontology of genetic and evolutionary events. It explores the use of multiple alignments in information management systems and current projects in integrative bioinformatics and genomics.
Extending MAO : towards an Ontology of Genetic and Evolutionary Events Julie Thompson Laboratory of Integrative BioInformatics and Genomics (LBGI), Department of Biology and Structural Genomics, UMR 7104, IGBMC Strasbourg NESCent Evolutionary Informatics Working Group Meeting November 12-14, 2007 Collège de France
Outline • Introduction : • MAO : multiple alignment ontology • MACSIMS : multiple alignment based information management system • Current projects : • AlexSys : integrated workbench for multiple alignment • MyoNet : interactome, transcriptome analyses (French Myopathy Association) • EvolHHuPro : Evolution Histories of the Human Proteome
MAO: multiple alignment ontology • Standardised vocabulary for multiple alignments : • DNA, RNA, protein • sequences and 3D structures MAO consortium: - MSA algorithms (J Thompson, O Poch, IGBMC) (Kazutaka KATOH, Kyoto) - Protein 3D analysis (Patrice KOEHL, Davis) - Protein 3D structure (Dino MORAS, IGBMC) - RNA analysis (Steve HOLBROOK, Berkeley) - 3D RNA structure (Eric WESTHOF, IBMC) http://www-igbmc.u-strasbg.fr/BioInfo/MAO/mao.html Thompson et al, Nucl Acids Res. 2005
MAO: multiple alignment ontology [Figure: ontology graph. Relation types: is_a, part_of, attribute_of. Terms shown: multiple_sequence_alignment, sub_alignment, alignment_sequence, alignment_column, residue, nucleotide, amino_acid, sequence_feature, sequence_feature_type, column_conservation, type, level, aliphatic, basic, domain, signal, atom, residue_function, structural_location, helix, exposed, 3d_atomic_coordinates, hinge_region, interaction, mutation] Thompson et al, Nucl Acids Res. 2005
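The relation types named on this slide (is_a, part_of) can be illustrated with a small term graph. This is only a sketch: the slide lists the terms and the relation types, but the flattened figure does not show which term is linked to which, so every edge in `EDGES` below is an assumption made for illustration, and `build_graph` and `ancestors` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical edges for illustration only: the term names come from the slide,
# but the parent chosen for each term is an assumption, not the published MAO.
EDGES = [
    ("amino_acid", "is_a", "residue"),
    ("nucleotide", "is_a", "residue"),
    ("residue", "part_of", "alignment_sequence"),
    ("alignment_sequence", "part_of", "multiple_sequence_alignment"),
    ("alignment_column", "part_of", "multiple_sequence_alignment"),
    ("sub_alignment", "part_of", "multiple_sequence_alignment"),
]


def build_graph(edges):
    """Index (child, relation, parent) triples by child term."""
    graph = defaultdict(list)
    for child, relation, parent in edges:
        graph[child].append((relation, parent))
    return graph


def ancestors(graph, term):
    """Return every term reachable from `term` by following any relation upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


GRAPH = build_graph(EDGES)
# Under the assumed edges, amino_acid sits below residue, alignment_sequence
# and multiple_sequence_alignment.
print(sorted(ancestors(GRAPH, "amino_acid")))
```

Following both relation types in one traversal is a simplification; a real consumer of the ontology would usually treat is_a and part_of separately.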
MAO: multiple alignment ontology Open Biomedical Ontologies : • well-structured vocabularies for shared use across different biological domains • Acceptance criteria. The ontology : • must be open, freely available to all • should be in a shared format for compatibility • should be orthogonal to the other OBO ontologies • The ontology is then considered to be authoritative by the OBO consortium http://obo.sourceforge.net/
MAO: links to other ontologies • OBO ontologies are all available in the same format and can be used in combination [Figure: MAO (multiple alignment) linked to other OBO ontologies: SO (gene structure), GO (gene ontology), DOID (human disease), IPR (Interpro), ProPreO (proteomics), taxon (TAXID), BTO (tissue), MI (interactions), PSI-MOD (modifications), PW (pathway)]
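Because OBO ontologies share one flat-file format, a single reader can load terms from any of them. The sketch below parses `[Term]` stanzas from OBO text using only the `id`, `name`, `is_a` and `relationship` tags. The `EX:` identifiers in `SAMPLE` are made up for illustration and are not real MAO identifiers, and `parse_obo` is a hypothetical helper, not part of any MAO or MACSIMS distribution.

```python
def parse_obo(text):
    """Parse [Term] stanzas of OBO flat-file text into a dict keyed by term id."""
    terms = {}
    current = None
    in_term = False
    for raw in text.splitlines():
        # Simplification: everything after '!' is treated as a trailing comment.
        line = raw.split("!", 1)[0].strip()
        if not line:
            continue
        if line.startswith("["):
            in_term = line == "[Term]"  # other stanza types (e.g. [Typedef]) are skipped
            current = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            current = {"id": value, "name": None, "is_a": [], "relationship": []}
            terms[value] = current
        elif current is None:
            continue
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
        elif tag == "relationship":
            rel_type, _, target = value.partition(" ")
            current["relationship"].append((rel_type, target.strip()))
    return terms


# Made-up identifiers, for illustration only.
SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0000001
name: residue

[Term]
id: EX:0000002
name: amino_acid
is_a: EX:0000001 ! residue
relationship: part_of EX:0000003 ! alignment_sequence

[Typedef]
id: part_of
name: part_of
"""

TERMS = parse_obo(SAMPLE)
print(TERMS["EX:0000002"]["is_a"])
```

The same function could be applied to terms from several ontologies and the results merged by identifier prefix, which is one way to use them in combination.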
MACSIMS : Information Management System • Data collection : • creation of a relational database • (BIRD, H. Nguyen) • Information management: • data validation • reliable propagation • Efficient exploitation : • of the multiple alignment for • phylogenetic inference • automatic, high-throughput processing • (XML format) • visualisation • (JalView, G. Barton, Scotland)
MACSIMS : Information Management System • MS2PH : Prediction of structural/functional effects of mutations Sulfatase protein family : GALNS • Mutations in GALNS gene are implicated in Morquio A syndrome : • mutation C79Y -> severe phenotype • others -> milder phenotypes
High throughput applications • Structural/functional characterisation • 1500 structural genomics targets : SPINE IP FP5 • Genome annotation • annotation of prokaryotic genome Mycobacterium smegmatis (JM. Reyrat, Hopital Necker, Paris) • functional annotation of 200 000 cDNA from eukaryote Alvinella pompejana (F. Zal, Station Biologique, Roscoff) • Functional genomics • Transcriptomic data analyses : PRIMA, EVI GENORET IP FP6 • Prediction of structural/functional effects of mutations • human genetic diseases : MS2PH (Structural Mutation to Human Pathology Phenotypes) (G. Deléage, IBCP, Lyon) [Figure: feature annotation of nuclear receptor coactivator 2, with labels: receptor-interacting domain, LXXLL motifs, CREBBP interaction, AT, HLH, PAS, PAC, Poly-Gln, acetylation (by CREBBP), S-nitrosylation]
Current Projects • ALEXSYS : Alignment Expert SYStem (thesis, R. Aniba; co-directed by A. Marchler-Bauer, NCBI, Washington) Motivation • test, evaluate and optimize all the stages of the construction, analysis and exploitation of a multiple sequence alignment Objectives • develop a modular platform, incorporating different, complementary algorithms and mined knowledge (sequence, structure, function, taxonomy…) • understand relationships between sequence characteristics and algorithmic strengths and weaknesses • automatic selection of suitable algorithms depending on sequences • design optimal scenarios/workflows for different biological applications Platform for integration and exploitation of pertinent information for the study of complex biological systems
Current Projects • MyoNet : interactome, transcriptome analyses (French Myopathy Association) • Interactome data (Isabelle Richard, Genethon, Paris) • extension of MAO, MACSIMS • interaction residues represented by non-contiguous sequence features • interactions between proteins require definition of links between MSA, notion of ‘collection of MSA’ • Transcriptome data (Frédéric Relaix, Hopital Pitié-Salpétrière, Paris; Miguel Andrade, Ottawa) • construction of transcriptional networks involved in muscle development • gene expression data • phylogenetic profile approach • functional analyses from MACSIMS
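One way to picture a non-contiguous sequence feature is as a list of residue ranges that are projected onto alignment columns. The sketch below is an assumption about how such a feature could be represented, not the MAO or MACSIMS data model; `SequenceFeature`, `residue_to_column` and `feature_columns` are hypothetical names, and the aligned sequence is a toy example.

```python
from dataclasses import dataclass


@dataclass
class SequenceFeature:
    """A feature made of one or more residue ranges (1-based, inclusive)."""
    name: str
    segments: list  # (start, end) tuples in ungapped sequence coordinates


def residue_to_column(aligned_seq, gap_chars="-."):
    """Map each 1-based residue position to its 1-based alignment column."""
    mapping = {}
    position = 0
    for column, char in enumerate(aligned_seq, start=1):
        if char not in gap_chars:
            position += 1
            mapping[position] = column
    return mapping


def feature_columns(feature, aligned_seq):
    """Return the sorted alignment columns covered by a non-contiguous feature."""
    mapping = residue_to_column(aligned_seq)
    columns = set()
    for start, end in feature.segments:
        for position in range(start, end + 1):
            if position in mapping:
                columns.add(mapping[position])
    return sorted(columns)


# Toy example: residues 2-3 and residue 5 form one interaction site.
site = SequenceFeature(name="interaction", segments=[(2, 3), (5, 5)])
print(feature_columns(site, "MK--LV-A"))  # → [2, 5, 8]
```

Working in alignment columns rather than residue numbers is what lets the same feature be compared across sequences; linking features between two alignments (the "collection of MSA" idea) would need an extra identifier for the alignment each column belongs to.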
Current Projects • EvolHHuPro: Reconstruction of evolutionary histories for human proteome (P. Pontarotti, Marseille) • Reconstruction of the evolutionary histories of the human proteome • Analysis of protein coding regions (extensions, insertions, deletions,...) • Genetic events (duplication, loss, recombination,...) • Workflow boxes: MSA construction (PipeAlign), MACSIMS analysis, Genome mapping (Cassiope), Evolutionary mechanisms, Tree construction (Figenix), Localisation of genetic events, Construction of evolutionary histories [Figure: example trees over fish, frog, mouse and human sequences, annotated with recombination, mutation, loss and duplication events and an active site]
Current Projects • EvolHHuPro: Reconstruction of evolutionary histories for human proteome (P. Pontarotti, Marseille) Genome scale analysis: • Classify evolutionary histories to define a set of ‘typical’ histories • compare stable and unstable families • identify proteins that have never experienced specific events (duplications, fusions,…) • … • Functional analyses of clusters, based on MACSIMS • enrichment of a particular class of proteins • correlations between the genetic events and structural/functional context • … Objective: better understanding of mechanisms involved in vertebrate evolution
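Enrichment of a protein class within a cluster is commonly assessed with a one-sided hypergeometric test. The slides do not say which statistic the project uses, so the sketch below is a generic illustration under that assumption; `enrichment_p_value` and its parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, cluster_hits):
    """One-sided hypergeometric p-value P(X >= cluster_hits).

    population   -- total number of proteins considered
    annotated    -- proteins in the population that belong to the class
    cluster_size -- proteins in the cluster being tested
    cluster_hits -- proteins in the cluster that belong to the class
    """
    if not (0 <= annotated <= population and 0 <= cluster_size <= population):
        raise ValueError("counts must fit inside the population")
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(annotated, hits) * comb(population - annotated, cluster_size - hits)
        for hits in range(max(cluster_hits, 0), upper + 1)
    )
    return tail / total


# Toy numbers: 10 proteins, 5 in the class, and a cluster of 5 that contains
# all 5 class members. Only one of the C(10, 5) = 252 possible clusters does.
print(enrichment_p_value(10, 5, 5, 5))
```

When many clusters and many classes are tested, the resulting p-values would normally need a multiple-testing correction, which this sketch leaves out.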
Acknowledgements • LBGI (Laboratory of Integrative Bioinformatics and Genomics): Luc MOULINIER, Jean MULLER, Ngoc-Hoan NGUYEN, Emmanuel PERRODOU, Laetitia POIDEVIN, Francisco PROSDOCIMI, Wolfgang RAFFELSBERGER, Ravikiran REDDY, Raymond RIPP, Nicolas WICKER, Laurent-Phillippe ALBOU, Radouene ANIBA, Yannick-Noel ANNO, Guillaume BERTHOMMIER, Yann BRELIVET, Annaick CARLES, Anne FRIEDRICH, Nicolas GAGNIERE, David KIEFFER, Odile LECOMPTE, Olivier POCH • BIPS (Strasbourg BioInformatics Platform): Frédéric PLEWNIAK, Laurent BIANCHETTI, Sophie CANDEL, Véronique GEOFFROY • Collaborators : Miguel ANDRADE (Ottawa), Toby GIBSON (Heidelberg), Des HIGGINS (Dublin), Kazutaka KATOH (Kyoto), Patrice KOEHL (UC Davis), Aron MARCHLER-BAUER (Washington), Pierre PONTAROTTI (Marseille), Frédéric RELAIX (Paris), Eric WESTHOF (Strasbourg)
IGBMC Research Center (CNRS/Inserm/Université Louis Pasteur) • 14 000 m2 laboratory area • 7 departments • 4 high-throughput technological platforms, RIO • 534 personnel : • 90 researchers • 72 postdocs • 145 PhD • 227 engineers/technicians • European Biomedical research center • Eukaryote genome study • Genetic expression control • Genes and proteins functional analysis • Human pathologies studies (cancer, monogenic disease, metabolic disease, ...)
LBGI : Laboratory of Integrative Bioinformatics and Genomics • IGBMC Department of Biology and Structural Genomics (D. Moras) • RIO platforms • Laboratory of Integrative Bioinformatics and Genomics (O. Poch) • Bioinformatics • Services, education • Resources updates • Development and distribution [Figure: organisation chart. Scales: Proteins, Complexes, Cells, Tissues. Themes: Informational Families, Transcription, Stem cells, Cancer (prostate, breast). Groups: Functional genomics, Comparative Genomics, Phylogenetic Inference, Structure, Databases, Algorithms, Platform. Members: F. Plewniak, L. Bianchetti, S. Candel, V. Geoffroy, L. Moulinier, LP. Albou, Y. Brelivet, A. Friedrich, N. Wicker, D. Kieffer, JD. Thompson, R. Aniba, E. Perrodou, F. Prosdocimi, W. Raffelsberger, A. Carles, L. Poidevin, R. Reddy, R. Ripp, Y. Benabbou, G. Berthommier, H. Nguyen, O. Lecompte, YN. Anno, N. Gagnière]
Integrated protein family analysis • Automatic construction and analysis of a high quality multiple alignment • BlastP, Ballast • SRS, BIRD • DbClustal • RASCAL • LEON • NorMD • Secator/DPC • MAO, MACSIMS • Reliable environment for integration of information related to protein families Plewniak et al, Nucl Acids Res. 2003
MACSIMS • Visualisation with JalView (G Barton) • Bardet Biedl Syndrome, BBS10 : mutation R49W • XML format based on MAO: sub_alignment:=BBS10 chaperonin; column:=85; sequence:=bbs10_human; motif:=(47-63) ATP binding; conservation:=100%; residue:=49; mutation:=R->W [Figure: JalView display of the alignment with sub-groups BBS10 and BBS6, showing a schematic overview window, a detailed alignment window and a conservation profile]
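A conservation value such as the 100% reported for this column can be illustrated with a simple per-column score. The slide does not define how MACSIMS computes conservation, so the sketch below assumes one common definition: the percentage of sequences that carry the most frequent non-gap residue in the column. `column_conservation` is a hypothetical name and the sequences are toy data, not the BBS10 alignment.

```python
from collections import Counter


def column_conservation(sequences, gap_chars="-."):
    """Percentage of sequences sharing the most frequent residue in each column.

    Gaps never count as the shared residue, but gapped sequences still count
    in the denominator (an assumption of this sketch).
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*sequences):
        counts = Counter(char.upper() for char in column if char not in gap_chars)
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(100.0 * top / len(sequences))
    return scores


# Toy alignment: the first column is fully conserved (all R).
alignment = ["RKA", "RK-", "RWA", "RWG"]
print(column_conservation(alignment))  # → [100.0, 50.0, 50.0]
```

A mutation such as R->W at a column scoring 100% stands out precisely because no sequence in the alignment tolerates another residue there, which is the kind of context the annotation on this slide records.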