Gibbs biclustering of microarray data
Yves Moreau
From genome projects to transcriptome projects
• Microarray cost per expression measurement
• Budgets and expertise
• Publicly available microarray data
• Need for exchange standards & repositories
• Big consortia set up big microarray projects
• Genome projects → “transcriptome” projects (= compendia)
• Change in microarray projects (similar to sequence analysis)
  • Analyze public data first to generate a hypothesis
  • Design and perform your own microarray experiment
CBS Microarray Course
Why biclustering?
• Data becomes more heterogeneous
• Gene clustering
  • Group genes that behave similarly over all conditions
• Gene biclustering
  • Group genes that behave similarly over a subset of conditions
  • “Feature selection”
  • More suitable for heterogeneous compendia
Discretized microarray data set
Discretizing microarray data
• Microarray data is continuous
• Discretize by equal frequency into three levels: High, Medium, Low
[Figure: distribution of expression values for a given gene, split into High / Medium / Low; discretized data matrix (genes × conditions) containing a bicluster]
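Equal-frequency discretization can be sketched per gene as below. This is an illustrative sketch, not the course's code: it assumes three levels encoded 0 = Low, 1 = Medium, 2 = High, and it breaks ties by position; the function name is hypothetical.

```python
import numpy as np

def discretize_equal_frequency(expr, n_levels=3):
    """Discretize each gene (row) into n_levels equally populated bins.

    Assumed encoding: 0 = Low, 1 = Medium, 2 = High (for n_levels=3).
    """
    expr = np.asarray(expr, dtype=float)
    levels = np.empty(expr.shape, dtype=int)
    n = expr.shape[1]
    for g, row in enumerate(expr):
        # Rank of each value within this gene (0 = smallest); ties broken by position.
        ranks = np.argsort(np.argsort(row, kind="stable"), kind="stable")
        # Map ranks 0..n-1 onto n_levels bins of (nearly) equal size.
        levels[g] = ranks * n_levels // n
    return levels
```

For example, the gene profile `[5, 1, 3, 9, 7, 2]` becomes `[1, 0, 1, 2, 2, 0]`: the two lowest values are Low, the two middle ones Medium, the two highest High.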
[Figure: Bicluster]
[Figure: pattern and background models; likelihood axis from 0 to 1]
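To make the pattern/background comparison concrete, here is a minimal sketch under stated assumptions: the pattern gives a probability for each discrete level in each bicluster condition (much like a position-specific frequency matrix in motif finding), the background is a single level distribution for the whole data set, and cells are independent. Scoring the bicluster by a log-likelihood ratio is an assumption made here for illustration, and the function name is hypothetical. All probabilities must be nonzero, since the log of zero is undefined.

```python
import numpy as np

def bicluster_log_likelihood_ratio(data, genes, conditions, pattern, background):
    """Log-likelihood ratio of a bicluster: pattern model vs background model.

    data:       2-D int array of discretized levels (genes x conditions)
    genes:      indices of the genes in the bicluster
    conditions: indices of the conditions in the bicluster (order matches pattern columns)
    pattern:    (n_levels x len(conditions)); pattern[k, j] = P(level k | condition j)
    background: (n_levels,); background frequency of each level
    """
    sub = data[np.ix_(genes, conditions)]        # extract the bicluster cells
    cols = np.arange(len(conditions))            # column index for every cell
    log_p = np.log(pattern[sub, cols])           # per-cell probability under the pattern
    log_q = np.log(background[sub])              # per-cell probability under the background
    return float(np.sum(log_p - log_q))
```

A positive score means the selected cells are better explained by the pattern than by the background.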
[Figure: likelihood axis from 0 to 1; per-cell probabilities of the bicluster:]
0.9   0.9   0.9   0.9   0.9
0.9   0.05  0.9   0.9   0.9
0.9   0.9   0.9   0.9   0.9
0.05  0.9   0.9   0.9   0.9
0.9   0.9   0.9   0.9   0.05
Get the right genes
[Figure: likelihood axis from 0 to 1; per-cell probabilities:]
0.9   0.05  0.05  0.05  0.9
0.05  0.9   0.9   0.05  0.05
0.05  0.05  0.05  0.05  0.05
0.05  0.05  0.9   0.9   0.05
Get the right conditions
[Figure: likelihood axis from 0 to 1; per-cell probabilities:]
0.9   0.9   0.05  0.05  0.9
0.9   0.05  0.05  0.9   0.9
0.9   0.9   0.05  0.05  0.9
0.05  0.9   0.05  0.05  0.9
0.9   0.9   0.05  0.05  0.05
Get the right frequency pattern
[Figure: likelihood axis from 0 to 1; per-cell probabilities:]
0.6   0.6   0.2   0.2   0.6
0.6   0.2   0.2   0.2   0.6
0.6   0.6   0.2   0.2   0.6
0.2   0.6   0.2   0.2   0.6
0.2   0.6   0.2   0.2   0.2
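As a worked check, assume the likelihood of a bicluster is the product of its per-cell probabilities (cells treated as independent). The 5 × 5 grid of 0.9/0.05 values contains 22 cells at 0.9 and 3 at 0.05, while the grid for the flatter pattern contains 11 cells at 0.6 and 14 at 0.2:

```latex
\begin{aligned}
L_{0.9/0.05} &= 0.9^{22} \times 0.05^{3} \approx 0.0985 \times 1.25\times10^{-4} \approx 1.23\times10^{-5},\\
L_{0.6/0.2}  &= 0.6^{11} \times 0.2^{14} \approx 3.63\times10^{-3} \times 1.64\times10^{-10} \approx 5.94\times10^{-13}.
\end{aligned}
```

Under this assumption the flatter pattern gives a likelihood roughly 2 × 10^7 times smaller.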
Optimizing the bicluster
• Find the right bicluster
  • Genes
  • Conditions
  • Pattern
• For a given choice of genes and conditions, the “best” pattern is given by the level frequencies found in the extracted bicluster
  • No more need to optimize over the pattern
• Maximum likelihood: find genes and conditions that maximize the likelihood
• Gibbs sampling: find genes and conditions that optimize the likelihood
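The point that the best pattern is just the level frequencies inside the extracted bicluster can be sketched as follows. The function name is hypothetical, and the optional `pseudocount` is an assumption added here to avoid zero probabilities; with `pseudocount=0` the result is the plain maximum-likelihood estimate.

```python
import numpy as np

def estimate_pattern(data, genes, conditions, n_levels=3, pseudocount=0.0):
    """Maximum-likelihood pattern for a fixed choice of genes and conditions.

    Returns an (n_levels x len(conditions)) array whose column j holds the
    frequency of each discrete level among the selected genes in condition j.
    """
    sub = data[np.ix_(genes, conditions)]
    # counts[k, j] = number of selected genes at level k in condition j
    counts = np.stack([(sub == k).sum(axis=0) for k in range(n_levels)])
    counts = counts + pseudocount
    return counts / counts.sum(axis=0, keepdims=True)   # each column sums to 1
```

Because this estimate is available in closed form, a search only has to explore genes and conditions.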
Gibbs sampling
• Current configuration → next gene configuration
• Updated gene configuration → next complete configuration
• Iterate many times
[Figure: Gibbs biclustering]
[Figure: Simulated data]
Remarks
• Gibbs biclustering allows noisy patterns
• The optimized configuration is obtained by averaging successive iterated configurations
• Biclustering is oriented
  • Sample biclustering: find a subset of samples for which a subset of genes is consistently expressed across these samples
  • Gene biclustering: find a subset of genes that are consistently expressed across a subset of samples
• Searching for multiple patterns
  • For gene biclustering, remove the data of the genes in the current bicluster
  • Search for a new pattern
  • Stop when only the empty pattern is found repeatedly
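The remarks on averaging and on searching for multiple patterns could look like the sketch below. Assumptions: configurations are stored as (genes, conditions) boolean masks, "averaging" means keeping the items whose inclusion frequency exceeds a threshold (0.5 here), and `find_one` is a hypothetical single-bicluster search (for example, a Gibbs sampler followed by `consensus`). All names are illustrative.

```python
import numpy as np

def consensus(samples, burn_in=0, threshold=0.5):
    """Average successive configurations; keep items included more often than threshold."""
    kept = samples[burn_in:]
    gene_freq = np.mean([g for g, _ in kept], axis=0)
    cond_freq = np.mean([c for _, c in kept], axis=0)
    return gene_freq > threshold, cond_freq > threshold

def find_multiple_biclusters(data, find_one, max_empty=3, max_biclusters=10):
    """Repeatedly search for a bicluster, removing its genes after each success.

    find_one(data) must return boolean (genes, conditions) masks for the rows
    and columns of the array it receives (hypothetical interface).
    """
    remaining = np.arange(data.shape[0])     # original indices of genes still available
    found, empty_runs = [], 0
    while remaining.size and empty_runs < max_empty and len(found) < max_biclusters:
        genes, conds = find_one(data[remaining])
        if not genes.any() or not conds.any():
            empty_runs += 1                  # empty pattern: count it and search again
            continue
        empty_runs = 0
        found.append((remaining[genes], np.flatnonzero(conds)))
        remaining = remaining[~genes]        # remove this bicluster's genes from the data
    return found
```

Stopping after several consecutive empty results, rather than the first one, allows for the randomness of a stochastic search.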
[Figure: Multiple biclusters]
Leukemia fingerprints
Mixed-Lineage Leukemia
• Armstrong et al., Nature Genetics, 2002
• Mixed-Lineage Leukemia (MLL) is a subtype of acute lymphoblastic leukemia (ALL)
  • Caused by a chromosomal rearrangement in the MLL gene
  • Poorer prognosis than ALL
• Microarray analysis shows that MLL is distinct from ALL
• FLT3 tyrosine kinase distinguishes most strongly between MLL, ALL, and acute myeloid leukemia (AML)
  • Candidate drug target
[Figure: PCA; Features]
Biclustering leukemia data
• Bicluster patients
  • Find patients for which a subset of genes has a consistent expression profile across this group of patients
• Discovery set: 21 ALL, 17 MLL, 25 AML
• Validation set: 3 ALL, 3 MLL, 3 AML
Discovering ALL
• Bicluster 1: 18 out of 21 ALL patients
Discovering MLL
• Bicluster 2: 14 out of 17 MLL patients
Discovering AML
• Bicluster 3: 19 out of 25 AML patients
[Figure: Rescoring ALL]
[Figure: Rescoring MLL]
[Figure: Rescoring AML]