Integration of data to uncover evolutionary trends and infer protein function: The tale of Rcs1
M. Madan Babu, MRC Laboratory of Molecular Biology, Cambridge
Overview of research
Evolution of biological systems (Nucleic Acids Res. 2003; Nature Genetics 2004; J Mol Biol 2006a)
- Evolution of networks within and across genomes
- Evolution of transcriptional networks
- Evolution of transcription factors
Structure and function of biological systems (J Mol Biol 2006b; J Mol Biol 2006c; Nature 2004)
- Uncovering a distributed architecture in networks
- Structure and dynamics of transcriptional networks
- Methods to study network dynamics
Data integration, function prediction and classification (Nucleic Acids Res. 2005; Cell Cycle 2006)
- Discovery of transcription factors in Plasmodium
- Discovery of novel DNA-binding proteins
- Evolution of global regulatory hubs
Rcs1: Regulator of Cell Size 1
[Micrographs: S. cerevisiae wild type and S. cerevisiae Rcs1 mutant. Micrographs and data from SCMD]
- Rcs1 mutant cells are twice the size of cells of the parental strain
- The critical size for budding is similarly increased in the mutant
- Rcs1 binds specific DNA sequences
The following parameters used to define cell size in the Rcs1 mutant were at least 2 standard deviations (2σ) from the wild-type mean values:
- Mother cell size: 874 vs 760
- Contour length of mother cell: 108 vs 100
- Long-axis length of mother cell: 36 vs 33
- Short-axis length of mother cell: 30 vs 27
- Roundness of mother cell: 1.29 vs 1.20
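The 2σ criterion above is a simple outlier test against the wild-type distribution of each parameter. A minimal sketch, assuming replicate wild-type measurements are available per parameter; the function name `deviates_by_two_sd` and the replicate values are hypothetical and are not SCMD data:

```python
from statistics import mean, stdev


def deviates_by_two_sd(wild_type_values, mutant_value, k=2.0):
    """Return True if mutant_value lies at least k sample standard deviations
    from the wild-type mean, in either direction."""
    mu = mean(wild_type_values)
    sd = stdev(wild_type_values)  # sample SD; needs at least two values
    return abs(mutant_value - mu) >= k * sd


# Hypothetical wild-type replicates for one parameter (roundness).
wild_type_roundness = [1.18, 1.20, 1.21, 1.19, 1.22]

print(deviates_by_two_sd(wild_type_roundness, 1.29))  # True
print(deviates_by_two_sd(wild_type_roundness, 1.21))  # False
```

Applying the same test to every parameter and keeping those that pass is one way to reproduce a "defines cell size" shortlist; the choice of the sample (rather than population) SD is an assumption.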
Rcs1 is a global regulatory hub: Network analysis I
Rcs1p and Aft2p are global regulatory hubs with an as-yet uncharacterized DNA-binding domain.
[Figure: transcriptional regulatory network in yeast and the sub-network of Rcs1p and Aft2p; number of target genes regulated: 314, 123 and 41]
How did these paralogous hubs, which regulate distinct sets of genes, evolve?
[Figure: distribution of DNA-binding domains in yeast transcription factors (y-axis: no. of members); domain families: Tig, Hsf, Tea, Fkh, P53, Myb, bZip, Abf1, LisH, Gata, Ace1, Ime1, Gcr1, Rcs1, Mads, Apses, Dal82, bHLH, Tigger, HMG1, Homeo, AT-Hook, C2H2-Zn, C6-Fungal]
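Identifying hubs such as Rcs1p and Aft2p amounts to counting each regulator's distinct targets (its out-degree) in a regulator-to-target edge list, and comparing two paralogous hubs amounts to set operations on their target sets. A sketch under those assumptions; the edge list, the gene names G1 to G4 and the `min_targets` cut-off are hypothetical:

```python
from collections import defaultdict


def targets_by_regulator(edges):
    """Group (regulator, target) edges into a regulator -> set of targets map."""
    targets = defaultdict(set)
    for regulator, target in edges:
        targets[regulator].add(target)
    return dict(targets)


def find_hubs(targets, min_targets):
    """Regulators whose out-degree (number of distinct targets) is >= min_targets."""
    return sorted(r for r, genes in targets.items() if len(genes) >= min_targets)


def target_overlap(targets, a, b):
    """Number of targets unique to a, unique to b, and shared by both."""
    ta, tb = targets.get(a, set()), targets.get(b, set())
    return {"only_a": len(ta - tb), "only_b": len(tb - ta), "shared": len(ta & tb)}


# Toy edge list with hypothetical target genes G1..G4.
edges = [("Rcs1p", "G1"), ("Rcs1p", "G2"), ("Rcs1p", "G3"),
         ("Aft2p", "G3"), ("Aft2p", "G4"), ("OtherTF", "G1")]
targets = targets_by_regulator(edges)
print(find_hubs(targets, min_targets=2))          # ['Aft2p', 'Rcs1p']
print(target_overlap(targets, "Rcs1p", "Aft2p"))  # {'only_a': 2, 'only_b': 1, 'shared': 1}
```

In a real network the hub threshold is usually set relative to the degree distribution rather than fixed by hand.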
Relationship to the WRKY DNA-binding domain: Sequence analysis I
- Searches of the non-redundant database recover homologs in Candida albicans (ascomycete), Yarrowia lipolytica (ascomycete), Ustilago maydis (basidiomycete), Cryptococcus sp. (basidiomycete), E. cuniculi (microsporidia), Giardia lamblia (diplomonad), Dictyostelium discoideum and Entamoeba histolytica
- The family shows lineage-specific expansion in several fungi and is also seen in lower eukaryotes
- Profiles and HMMs of this region, searched against the non-redundant database, retrieve the WRKY domain (Arabidopsis) and a FAR-1 type transposase (Medicago truncatula)
- The globular region maps to the WRKY DNA-binding domain
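The profile step relies on position-specific scores: a profile built from aligned homologs scores a new sequence more sensitively than a single query does. The toy sketch below builds a log-odds position-specific scoring matrix (PSSM) with pseudocounts and a uniform background and slides it along a query. It is a simplified stand-in, not PSI-BLAST or an HMM tool; the aligned fragments and the query are hypothetical:

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds (bits) position-specific scoring matrix from gap-free aligned
    sequences of equal length, against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_hit(pssm, sequence):
    """Slide the profile along sequence; return (best score, start offset)."""
    width = len(pssm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids contribute 0.
        score = sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


# Hypothetical aligned fragments and query, for illustration only.
alignment = ["WRKYGQK", "WRKYGQK", "WKKYGQK", "WRKYGKK"]
pssm = build_pssm(alignment)
score, start = best_hit(pssm, "MSTWRKYGQKLLP")
print(start)  # 3
```

Real profile searches also handle gaps, use sequence weighting and realistic background frequencies, and iterate by adding new hits back into the profile.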
Confirmation of relationship to the WRKY DBD: Sequence analysis II
- Searches of the non-redundant database with Rcs1 (S. cerevisiae), the WRKY DNA-binding domain of Arabidopsis WRKY4 and Gcm1 (Drosophila) map the WRKY DNA-binding domain to the same globular region
- The order of secondary-structure elements (strands S1 to S4, predicted with JPRED/PHD) is similar to that of the WRKY DNA-binding domain and of the mouse GCM1 protein
- [Figure: multiple sequence alignment of all globular domains]
- Homologs of the conserved globular domain constitute a novel family of the WRKY DNA-binding domain
Characterization of the globular domain: Structural analysis I
[Figure: predicted secondary structure of the Rcs1 DBD (strands S1 to S4) compared with the secondary structure of WRKY4 and of GCM1 (S1 to S4)]
Template structures: Mus musculus Glial Cells Missing-1 (GCM-1; PDB 1odh, X-ray structure) and the A. thaliana transcription factor WRKY4 (PDB 1wj2, NMR structure)
Both WRKY and GCM1 have a similar network of stabilizing interactions.
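Comparing a predicted secondary structure with a template can be done at two levels: the order of elements (here, four strands) and per-residue agreement over an alignment. A sketch assuming the three-state encoding H (helix), E (strand) and C (coil) and strings already aligned to equal length; the two strings are hypothetical, not the Rcs1 or WRKY4 assignments:

```python
from itertools import groupby


def element_order(ss):
    """Collapse a per-residue secondary-structure string (H = helix,
    E = strand, anything else = coil) into its ordered list of elements."""
    return [state for state, _ in groupby(ss) if state in ("H", "E")]


def per_residue_agreement(ss_a, ss_b):
    """Fraction of aligned positions in the same state (a Q3-style score)."""
    if len(ss_a) != len(ss_b):
        raise ValueError("secondary-structure strings must be aligned to equal length")
    return sum(a == b for a, b in zip(ss_a, ss_b)) / len(ss_a)


# Hypothetical strings: a predicted profile vs a template, four strands each.
predicted = "C" + "EEEE" + "CC" + "EEEE" + "CC" + "EEE" + "CC" + "EEE" + "C"
template = "C" + "EEEE" + "CCC" + "EEE" + "CC" + "EEE" + "CC" + "EEE" + "C"

print(element_order(predicted))                              # ['E', 'E', 'E', 'E']
print(round(per_residue_agreement(predicted, template), 3))  # 0.955
```

Matching element order is the weaker but more robust signal, since predicted strand boundaries often shift by a residue or two.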
Characterization of the globular domain: Structural analysis II
[Figure: template structure with strands S1 to S4]
The 4 residues involved in metal coordination and the 10 residues involved in key stabilizing hydrophobic interactions, which determine the path of the backbone in the four strands of the GCM1-WRKY domain, show a strong pattern of conservation.
The core fold of the Rcs1 DBD is therefore likely to be similar to that of the WRKY-GCM1 domain, and it may bind DNA in a similar way.
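A "strong pattern of conservation" at specific positions can be quantified per alignment column. A minimal sketch using Shannon entropy, a generic choice that is not necessarily the measure used here; the aligned fragments and the `max_entropy` cut-off of 0.5 bits are hypothetical:

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column;
    gap characters are counted as an ordinary symbol."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def conserved_columns(alignment, max_entropy=0.5):
    """Indices of columns whose entropy is at or below max_entropy."""
    return [
        i for i in range(len(alignment[0]))
        if column_entropy([seq[i] for seq in alignment]) <= max_entropy
    ]


# Hypothetical aligned fragments; columns 0 and 4 are invariant cysteines.
alignment = ["CPVKC", "CPIKC", "CSVRC", "CPLKC"]
print(conserved_columns(alignment))  # [0, 4]
```

Entropy treats all substitutions alike; a hydrophobic position that tolerates V, I and L (column 2 above) scores as variable, so physicochemical groupings are a common refinement.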
Classification of the WRKY-GCM1 superfamily: Cladistic analysis I
[Figure: template structure (strands S1 to S4, with a Zn2+ ion chelated by cysteine and histidine ligands) and the zinc-ligand arrangement of each family]
Families: HxC-containing version (HxC), classical WRKY (C), insert-containing version (I), FLYWCH domain (F), GCM domain (G); representative proteins: Gcm1, Far1, WRKY4, Rcs1, Mdg
Sequence features distinguishing the families:
- WRKY motif in S1
- Short loop between S2 and S3
- N-terminal helix
- Conserved W in S4
- Large insert between S2 and S3
- HxC instead of HxH
- Short insert between S2 and S3
- Conserved W in S2
- Insertion of a Zn ribbon between S2 and S3
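One of the derived features above, HxC in place of HxH at the second pair of zinc ligands, can be screened for with a pattern match. The sketch below is a deliberately loose toy: the motif and its spacer bounds are assumptions, not the published family definitions, and the fragments are synthetic (filler residues around the ligands):

```python
import re

# Hypothetical motif: two cysteines, a spacer, then H-x-H or H-x-C.
# The spacer bounds are loose assumptions, not published family definitions.
LIGAND_MOTIF = re.compile(r"C.{2,8}C.{15,30}?H.([HC])")


def ligand_pair(seq):
    """Return 'HxH', 'HxC', or None if no zinc-ligand motif is found."""
    match = LIGAND_MOTIF.search(seq)
    if match is None:
        return None
    return "HxH" if match.group(1) == "H" else "HxC"


# Hypothetical fragments built from filler residues (A) around the ligands.
classical_like = "C" + "A" * 4 + "C" + "A" * 20 + "HAH"
hxc_like = "C" + "A" * 4 + "C" + "A" * 20 + "HAC"
print(ligand_pair(classical_like), ligand_pair(hxc_like))  # HxH HxC
```

On real sequences, a pattern this permissive will produce false positives; in practice such features are read off a curated multiple alignment, not from a raw sequence scan.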
Domain context for the different families: Network analysis I
[Figure: domain architectures for the HxC-containing (HxC), classical WRKY (C), insert-containing (I), FLYWCH (F) and GCM (G) families]
Example proteins: At2g23500, Far1, WRKY4, Rcs1, 101.t00020, Mod(mdg), Gcm1
Domain contexts: stand-alone, tandem, mobile element, OTU protease, MULE transposase, BED finger, Zn cluster, Zn knuckle, POZ, SMBD
Phyletic distribution: Comparative genome analysis I
[Figure: presence of the C, F, HxC, G and I families in human, fly, worm, fungi, Entamoeba, slime mould and plants, marked as transcription factor only (TF only) or transcription factor plus transposase (TF + TP)]
- Higher eukaryotes: the GCM1 and FLYWCH versions evolved from an insert-containing version that is a transposase
- Fungi and lower eukaryotes: the HxC and insert-containing versions are seen both as transcription factors and as transposases
- Plants: the classical version of the WRKY domain evolved from an insert-containing version that is a transposase
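A phyletic profile is a presence/absence vector of a family across lineages, and the TF versus transposase distinction adds a role label to each occurrence. A sketch assuming per-protein records of (family, lineage, role); the lineage order follows the slide, but the records themselves are hypothetical and are not the data behind the figure:

```python
from collections import defaultdict

# Lineage order as listed on the slide.
LINEAGES = ("Human", "Fly", "Worm", "Fungi", "Entamoeba", "Slime mould", "Plants")


def phyletic_profiles(records):
    """records: iterable of (family, lineage, role) with role 'TF' or 'TP'.
    Returns family -> {'lineages': set of lineages, 'roles': set of roles}."""
    profiles = defaultdict(lambda: {"lineages": set(), "roles": set()})
    for family, lineage, role in records:
        profiles[family]["lineages"].add(lineage)
        profiles[family]["roles"].add(role)
    return dict(profiles)


def presence_vector(profiles, family, lineages=LINEAGES):
    """Binary presence/absence profile of one family across the given lineages."""
    present = profiles.get(family, {"lineages": set()})["lineages"]
    return tuple(int(lineage in present) for lineage in lineages)


def dual_role_families(profiles):
    """Families observed both as transcription factors and as transposases."""
    return sorted(f for f, p in profiles.items() if {"TF", "TP"} <= p["roles"])


# Hypothetical records for illustration; not the slide's underlying data.
records = [("I", "Fungi", "TF"), ("I", "Fungi", "TP"), ("C", "Plants", "TF"),
           ("G", "Human", "TF"), ("G", "Fly", "TF")]
profiles = phyletic_profiles(records)
print(presence_vector(profiles, "G"))  # (1, 1, 0, 0, 0, 0, 0)
print(dual_role_families(profiles))    # ['I']
```

Families that occur in both roles are the natural candidates for transposase-to-TF transitions.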
Outline of the presentation
- Rcs1 and Aft2 have a distinct version of the WRKY-type DNA-binding domain
- Sensitive sequence search reveals that
Oryza sativa (monocot), Arabidopsis thaliana (dicot), Medicago truncatula (dicot), Nicotiana tabacum (dicot)
Structural equivalences of WRKY-GCM1 domain proteins with the BED and Zn fingers
[Figure: zinc-ligand arrangement and strand/helix topology of the BED finger (PDB 2ct5), classical Zn finger (PDB 1m36), WRKY (PDB 1wj2) and GCM-type WRKY (PDB 1odh)]
Why Rcs1?
While systematically analyzing the genes that gave rise to abnormal cell size, we and others noted that Rcs1 mutants have abnormal cell shape. Rcs1 was known to be an important transcription factor involved in cell-size regulation.
- explain using graphs and images
Independently, during the analysis of the transcriptional network (TNET) in yeast, we looked at the hubs and the DNA-binding domains present in them. Interestingly, two hubs had no known DNA-binding domain, although the region that mediates DNA binding was known.
- explain by showing the family relationship of the hubs
- only two members, and both are hubs
- how and when did they evolve?
Standard search procedures using Pfam and other databases gave no clue about the domain. So we set out to characterize the DNA-binding region of Rcs1p and its paralog Aft2p using sensitive sequence searches and other computational methods.
- show output from Pfam hits
WRKY DNA-binding domain: Structure analysis I
Structural aspects of the DNA-binding domain
- explain the residues involved in metal chelation
- DNA-contacting surface
- inserts in the loops
- stabilizing contacts involved
WRKY DNA-binding domain: Structure analysis II
Structure comparisons identify several other known transcription factors, including the GCM protein in eukaryotes.
- explain the insertion of a zinc ribbon in the loop
In fact, sequence comparison with the insert removed can pick up these WRKY proteins.
Classification of WRKY domains: Cladistic analysis I
Searches from multiple starting points identified all homologs in the different species. This allowed us to classify the sequences into families based on shared and derived features of the domains, each family having specific features that suggest a common evolutionary relationship.
- list the 5 families and point to the features involved using a structure template
Phylogenetic distribution and domain architecture for the different families - I
The phyletic profiles of the different domains point to the possibility that these transcription factors evolved from transposases, with at least two distinct recruitments into transcription factors:
- one in plants
- one at the base of the fungal lineage
Phylogenetic distribution and domain architecture for the different families - II
Comparative genomics using the fungal genomes provides the clue for the evolution of these TFs
- explain that there have been multiple transitions from transposase to TF in the fungal genomes
- explain how this could have happened by showing a snapshot of the breakup of selfish elements into two distinct products
- explain that the transposase can regulate its own expression
Comparative genomics using the fungal genomes provides the clue for the evolution of these TFs
- extensive recruitment of the transposase in the different fungal lineages
- multiple jumps within the fungal lineage
- a very recent duplication event in the order Saccharomycetales suggests that hubs can evolve rapidly
- Candida Rbf1 and other TFs independently duplicated and evolved as global regulators
Analysis of the gene expression data in plants
Since this happened in fungal genomes, we asked how these proteins behave in plants.
- show the gene expression patterns for the different subfamilies
We see two trends: one in which divergence has occurred primarily in expression rather than in protein sequence, and another in which proteins with the same expression pattern have different binding-site residues.
- spatio-temporal changes in gene expression
- it is experimentally well known that the FLYWCH and GCM proteins are developmentally important regulatory proteins. So in three lineages the transposase has been recruited to become a developmentally important global regulator.
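The two trends described above can be made operational for a pair of paralogs by asking separately whether their expression profiles correlate and whether their aligned binding-site residues differ. A sketch under stated assumptions: Pearson correlation as the similarity measure, a hypothetical `min_r` cut-off of 0.5, and hypothetical expression values and site residues; the labelling rule in `divergence_trend` is illustrative only:

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


def binding_site_differences(site_a, site_b):
    """Number of differing residues between two aligned binding-site strings."""
    if len(site_a) != len(site_b):
        raise ValueError("binding-site strings must be aligned to equal length")
    return sum(a != b for a, b in zip(site_a, site_b))


def divergence_trend(expr_a, expr_b, site_a, site_b, min_r=0.5):
    """Label a paralog pair by where it has diverged (hypothetical rule)."""
    similar_expression = pearson(expr_a, expr_b) >= min_r
    same_site = binding_site_differences(site_a, site_b) == 0
    if not similar_expression and same_site:
        return "expression divergence"
    if similar_expression and not same_site:
        return "binding-site divergence"
    return "other"


# Hypothetical expression across four tissues and hypothetical site residues.
print(divergence_trend([9, 1, 1, 1], [1, 1, 1, 9], "RKY", "RKY"))  # expression divergence
print(divergence_trend([9, 1, 1, 1], [8, 2, 1, 1], "RKY", "RQY"))  # binding-site divergence
```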
Analysis of the gene expression data in plants
The different WRKY-containing proteins show interesting trends in their expression patterns. Transposases (TPases) are expressed in the root and in pollen, which increases the likelihood that they expand rapidly during evolution.
Acknowledgements
Aravind group: L. Aravind, S. Balaji, Lakshminarayan Iyer
[Figure: phylogenetic tree of WRKY-GCM1 domain proteins from fungi, plants, animals (Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans), Encephalitozoon cuniculi, ciliates, Apicomplexa, Entamoeba histolytica, Giardia lamblia and Dictyostelium discoideum, labelled by family (C, F, HxC, I, G) and shown with domain architectures. Domains: HxC-type WRKY, classical WRKY, insert-containing WRKY, GCM-type WRKY, FLYWCH-type WRKY, PHD finger, plant-specific Zn-cluster, BED finger, isochorismatase, TIR domain, OTU, LRR, STAND ATPase, C2H2 finger, zinc knuckle, plant-specific N-all-beta, SWIM domain, MULE transposase, AT-hook, POZ, plant-specific mobile domain]
Expression profiles of WRKY-GCM1 domain proteins in Arabidopsis
- WRKY proteins show tissue-specific expression
- WRKY proteins show light-specific expression
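Tissue specificity can be summarized per gene with a single index. The sketch uses tau (Yanai et al., 2005), a swapped-in, widely used metric rather than the method behind these slides; it ranges from 0 for uniform expression to 1 for expression restricted to one tissue. The expression values are hypothetical:

```python
def tau(expression):
    """Tissue-specificity index tau (Yanai et al., 2005): 0 for uniform
    expression across tissues, 1 for expression restricted to one tissue."""
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("need some positive expression")
    return sum(1 - x / peak for x in expression) / (n - 1)


# Hypothetical expression values in four tissues (e.g. root, leaf, flower, pollen).
print(tau([5, 5, 5, 5]))   # 0.0
print(tau([0, 0, 10, 0]))  # 1.0
```

The same index can be computed over light conditions instead of tissues; values are usually log-transformed first, which this sketch leaves to the caller.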
Relationship between Rcs1p and Aft2p homologs: TFs evolved independently from transposons multiple times
[Figure: phylogenetic tree of Rcs1p and Aft2p homologs from fungi, Entamoeba, plants and animals, highlighting the Rbf1 cluster and the Rcs1/Aft2p cluster]
[Figure: transcriptional network involving Aft2p and Rcs1p; number of target genes regulated: 41, 314 and 123]
Conclusion
Integration of different types of experimental data (sequence, structure, expression, interaction) allowed us to identify the DNA-binding domain in Rcs1.