Patching the Puzzle of Genetic Network. Grace S. Shieh, Institute of Statistical Science, Academia Sinica, gshieh@stat.sinica.edu.tw. Outline: What is a genetic network? Why is this area one of the frontiers? How do statistical modeling and computational algorithms simplify the complex puzzle?
Patching the Puzzle of Genetic Network Grace S. Shieh Institute of Statistical Science, Academia Sinica gshieh@stat.sinica.edu.tw
Outline
• What is a genetic network?
• Why is this area one of the frontiers?
• How do statistical modeling and computational algorithms simplify the complex puzzle?
• Applications
Dogma of biology • DNA -> mRNA -> Protein • Proteins: the elements that function in organisms, e.g. yeast and human.
Somatic mutations affect key pathways in lung adenocarcinoma (Nature, Oct. 2008)
Complex human disease • Digenic effects may underlie: • Type II diabetes • Schizophrenia • Retinitis pigmentosa • Glaucoma Tong et al., Science 2004
Complex human disease
• Elements of the genetic network derived from a model organism, e.g. yeast, are likely to be conserved
• These diseases may have similar synthetic effects in the yeast genetic interaction map
• The topology of the genetic network in the neighborhood of SGS1 (Tong et al., 2004)
Experimental method to reveal genetic interactions
• Systematic Genetic Analysis with Ordered Arrays of Yeast Deletion Mutants (Tong et al., 2001, Science)
• Global Mapping of the Yeast Genetic Interaction Network (Tong et al., 2004, Science)
• The Genetic Landscape of a Cell (Costanzo et al., 2010, Science)
Synthetic sick or lethal (SSL) gene pairs
• When both genes are mutated, the organism dies or grows poorly, but neither mutation alone is lethal
• SSL is important for understanding how an organism tolerates genetic mutations (Hartman, Garvik and Hartwell, 2001, Science)
[Figure: Scenarios resulting in synthetic (SSL) interaction. Panels: partially redundant genes; 2 partially redundant pathways; 3 partially redundant pathways, 2 required; protein complex tolerating 1 but not 2 destabilizing mutations. Figure annotations: < 2%, < 4%.]
A Pattern Recognition Approach to Infer Gene Networks
Grace S. Shieh, joint work with C.-L. Chuang, C.-H. Jen and C.-M. Chen (Bioinformatics, 2008)
Transcriptional compensation (TC) and transcription reverse compensation (TRC) interactions (Lesage et al., 2004; Wong & Roth, 2005, Genetics; Kafri et al., 2005, Nature Genetics)
• Among paralogues or SSL gene pairs, when one gene is mutated, its partner gene's expression increases (TC) or decreases (TRC)
• Goal: to predict TC and TRC interactions among SSL gene pairs
Four sets of yeast (Saccharomyces cerevisiae) microarray gene expression data (Spellman et al., 1998) were used.
• The red channel R: intensities of yeast synchronized by alpha factor arrest, arrest of a cdc15 or cdc28 mutant, and elutriation
• The green channel G: average of non-synchronized yeast
qRT-PCR experiments
For a given pair of SSL genes:
• Experimental group: gene A's expression, with gene B knocked out
• Control group: gene A's expression, with gene B wild type
• If A >> B, then A and B may be TC
• If A << B, then A and B may be TRC
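The decision rule on this slide can be sketched in a few lines. This is only an illustration, read against the TC/TRC definition: it assumes the comparison is between gene A's expression in the knockout (experimental) group and in the wild-type (control) group. The function name and the fold-change cutoff `fold` are hypothetical stand-ins for ">>" and "<<", which the slides do not quantify.

```python
def classify_pair(expr_knockout, expr_wildtype, fold=2.0):
    """Label an SSL pair from gene A's expression with and without gene B.

    `fold` is a hypothetical cutoff standing in for ">>" and "<<".
    """
    ratio = expr_knockout / expr_wildtype
    if ratio >= fold:
        return "TC"            # A's expression rises when B is knocked out
    if ratio <= 1.0 / fold:
        return "TRC"           # A's expression falls when B is knocked out
    return "undetermined"      # change too small to call either way


# Made-up expression levels, for illustration only
print(classify_pair(8.0, 2.0))
print(classify_pair(1.0, 4.0))
```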
Gene expression of Transcription Reverse Compensation (TRC) pairs
The dependence of patterns and their associated interactions
• Assumption for PARE: the dependence between CP (SP) and TC (TD) interactions is significant
• To test this hypothesis: Fisher's exact test
The Proportion of Complementary Pattern (CP) in TC
• Screening genes for significant changes over time resulted in 35 gene pairs
• Fisher's exact test: p-value < 0.02, significant at the 95% confidence level
PARE
• The gene expression of the regulating gene is treated as the object contour, and the lagged-1 expression of the target gene as the boundary of interest, in an image segmentation algorithm
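For readers who want to see how such a test works, here is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2x2 table (pattern present/absent by interaction type). The counts in the usage line are made up and only chosen to sum to 35 pairs; they are not the counts behind the reported p-value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts: rows = CP / not CP, columns = TC / not TC
p_value = fisher_exact_two_sided(12, 4, 5, 14)
print(p_value)
```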
Discrete Signals
• Because gene expression is a discrete signal, the first- and second-order partial derivative terms can be modified as follows:
• The interaction can then be determined as a weighted sum of the internal and external energies:
PARE
• In this study, each gene is represented by a node in a graphical model, indexed by i = 1, 2, …, N. An edge represents the gene-gene interaction between two nodes, where the enhancer gene plays a key role in activating or repressing the target gene.
Training set vs test set
• Leave-one-out cross validation: among n pairs, use n-1 pairs to train PARE, then predict the remaining pair; repeat for each of the n pairs.
• 3-fold cross validation: among all pairs, use 2/3 of the pairs to train, then predict the remaining 1/3; repeat over all combinations, and iterate the whole procedure N times.
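The slide's own formulas are not reproduced in this transcript. As a rough sketch of the general idea only, the code below uses the standard finite-difference replacements for first and second derivatives on a time series, a lag-1 alignment, and a generic weighted sum of two energy terms. All function names, the direction of the lag, and the weights are assumptions for illustration, not PARE's actual definitions.

```python
def first_difference(x):
    """Discrete stand-in for the first derivative: x[t+1] - x[t]."""
    return [x[t + 1] - x[t] for t in range(len(x) - 1)]


def second_difference(x):
    """Discrete stand-in for the second derivative: x[t+1] - 2*x[t] + x[t-1]."""
    return [x[t + 1] - 2 * x[t] + x[t - 1] for t in range(1, len(x) - 1)]


def lag_one(regulator, target):
    """Pair the regulator at time t with the target at time t+1 (assumed lag direction)."""
    return list(zip(regulator[:-1], target[1:]))


def weighted_energy(e_internal, e_external, w_internal=0.5, w_external=0.5):
    """Generic weighted sum of internal and external energies (hypothetical weights)."""
    return w_internal * e_internal + w_external * e_external


series = [1, 3, 6, 10]              # made-up expression values
print(first_difference(series))     # [2, 3, 4]
print(second_difference(series))    # [1, 1]
```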
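The two validation schemes can be sketched as follows. This is an illustrative split generator only; the function names, the seeding, and the way folds are formed are assumptions, and the training and prediction steps of PARE itself are not shown.

```python
import random


def leave_one_out(pairs):
    """Yield (train, test) splits where each pair is held out once."""
    for i in range(len(pairs)):
        yield pairs[:i] + pairs[i + 1:], [pairs[i]]


def three_fold(pairs, seed=0):
    """Yield three (train, test) splits: train on 2/3, test on the remaining 1/3."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[k::3] for k in range(3)]
    for k in range(3):
        train = [p for j, fold in enumerate(folds) if j != k for p in fold]
        yield train, folds[k]


gene_pairs = list(range(7))          # placeholder identifiers for gene pairs
for seed in range(2):                # the slides repeat the 3-fold split many times
    for train, test in three_fold(gene_pairs, seed=seed):
        print(len(train), len(test))
```

Repeating `three_fold` with different seeds mirrors the repeated 3-fold procedure described above, after which per-run rates would be averaged.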
Experimental Results (TC/TRC)
alpha data set (18 time points)
Table 1. The prediction results, checked against the qRT-PCR experiments
*Since 3-fold CV was repeated 500 times, only the averages of the TPRs are reported.
Experimental Results (TC/TRC)
• For the alpha dataset, PARE yields
• a true-positive rate of 71-73%
• a prediction accuracy of 81%
• The FPR for predicting TC (TD) interactions was bounded by 12% (10%) genome-wide.
Checking against published literature
• These genetic interactions are consistent with the following experimental results:
• Sgs1 and Srs2 are known redundant pathways in replication (Ira et al., 1999; Lee et al., 1999)
• Ex: Srs2 and Sgs1-Top3 suppress crossovers during double-strand break repair in yeast.
• The Sgs1/Top3/Rmi1 and Mus81/Mms4 complexes are involved in both double-strand break repair and homologous recombination (Fabre et al., 2002).
• This indicates that Sgs1/Top3/Rmi1 and Mus81/Mms4 are alternative pathways to resolve recombination intermediates.
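For reference, the three rates quoted here follow the usual confusion-matrix definitions. The sketch below uses made-up counts, not the counts behind the reported figures.

```python
def rates(tp, fp, tn, fn):
    """Return (TPR, FPR, accuracy) from confusion-matrix counts."""
    tpr = tp / (tp + fn)                    # true-positive rate
    fpr = fp / (fp + tn)                    # false-positive rate
    acc = (tp + tn) / (tp + fp + tn + fn)   # overall prediction accuracy
    return tpr, fpr, acc


# Hypothetical counts for illustration
tpr, fpr, acc = rates(tp=8, fp=1, tn=9, fn=2)
print(tpr, fpr, acc)
```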
Inferring transcriptional interactions
• 132 pairs of activator-target (AT) and repressor-target (RT) gene interactions were collected from the published literature (MIPS, Mewes et al., 1999, Nucleic Acids Research; Gancedo, 1998, Microbiology & Molecular Biology; Draper et al., 1994, Molecular & Cellular Biology, etc.)
Test for CP (SP) associated with RT (AT) pairs in the data
• Chi-squared test
Experimental Results (AT/RT)
Table 2. The prediction results using the Elu data set, checked against the 132 TIs from the literature.
*Averages over 500 repeats. The FPRs for genome-wide TI predictions are bounded by 21%.
Conclusions
• The proposed PARE learns gene expression patterns and can then predict similar genetic interactions using microarray data.
• TPRs of PARE applied to the alpha (Elu) dataset are about 73% (77%) for inferring TC/TD interactions (TIs).
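A minimal sketch of a Pearson chi-squared test of independence on a 2x2 table (pattern by interaction type) is given below. The counts are hypothetical; the p-value uses the fact that, with one degree of freedom, the upper tail of the chi-squared distribution equals erfc(sqrt(statistic / 2)).

```python
from math import erfc, sqrt


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 d.f.) for [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n   # count expected under independence
            stat += (observed[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(stat / 2))             # upper tail of chi-squared, 1 d.f.
    return stat, p_value


# Hypothetical counts: rows = CP / SP, columns = RT / AT
stat, p_value = chi_squared_2x2(20, 10, 10, 20)
print(stat, p_value)
```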
Inferring genesis of obesity in human (joint work with Karine & Jean-Daniel)
Time-course MGED
• MGED from human adipocyte-derived cell lines
• Adipocytes: cells that primarily compose adipose tissue, specialized in storing energy as fat
PARE to infer genesis of obesity in human
Training stage: MGED of human adipocyte-derived cell lines
• 70 known transcriptional interactions (TIs) from iHOP
Prediction results:
• 40+ pairs of TIs and some genetic interactions predicted
• Some are consistent with existing experimental results; some are novel
Inferring TIs
Data preparation:
• Select significantly expressed genes:
• P-value < 0.01
• Significantly expressed in at least 1 time point (5 time points in total)
-> 36 genes with a function of interest
Interact with 14 genes of interest (AP2, CCL2, CCL5, LEP, etc.)
-> 504 gene pairs
WebPARE: web computing service of PARE (Chuang+, Wu+, Cheng and Shieh*, 2010, Bioinformatics)
• To provide a simple web interface for users to infer GIs/TIs using time-course gene expression data and existing knowledge, e.g. pre-stored validated TIs in yeast, mouse, human, etc. (TRANSFAC)
An example: a list of genes involved in the cell cycle and a data set (e.g. Elu) were uploaded to WebPARE; the TIs of these pairs were of interest.
• Using integrated (pre-stored) pairs of TIs in yeast, PARE correctly predicted 118 out of 176 TIs, mTPR = 67%
• e.g. the significant predicted network from 66 pairs -> 46
Demo
• WebPARE can be accessed at: http://www.stat.sinica.edu.tw/WebPARE
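The pair count on this slide is consistent with pairing each of the 36 selected genes with each of the 14 genes of interest (36 x 14 = 504). The sketch below assumes exactly that pairing; the gene names are placeholders, not the genes actually selected.

```python
from itertools import product


def candidate_pairs(selected_genes, genes_of_interest):
    """Pair every selected gene with every gene of interest."""
    return list(product(selected_genes, genes_of_interest))


# Placeholder names standing in for the 36 selected and 14 target genes
selected = [f"selected_{i}" for i in range(36)]
of_interest = [f"interest_{j}" for j in range(14)]
pairs = candidate_pairs(selected, of_interest)
print(len(pairs))  # 504
```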
Acknowledgement Dr. Ting-Fang Wang and Da-Yow Huang, Inst. of Biological Chemistry, Academia Sinica Drs. Karine Clement and J-D. Zucker, INSERM & IRD, France Cheng-Long Chuang, Chin-Yuan Guo, Chia-Chang Wang, Dr. Shi-Fong Guo, Yu-Bin Wang, Jia-Hung Wu Inst. of Statistical Science