Pathway Analysis Karl Brand, June 2012
overview 1. goal 2. annotation 3. tools (various approaches, pros & cons) 4. underlying statistics (Fisher’s exact test) 5. in use (DAVID) 6. to summarise
goal To understand genomics results &/or translate genomics data into knowledge &/or “…for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power”¹ To facilitate generating a testable hypothesis ¹Khatri et al., 2012
tools You have: applied methods to identify differentially regulated biological entities (BEs), e.g. p < 0.05 with fold change greater than 1.5. What now? You could pass this list to your chosen pathway analysis tool, but first…
annotation: a modern problem • Synonyms: different names for the same biological entity • Homonyms: same name for different biological entities • Acronyms: reduced words representing biological entities • PAP, alias for: PAP (Pancreatitis-associated protein), MRPS30 (Mitochondrial ribosomal protein S30), PAPOLA (Poly(A) polymerase alpha) • 5418 genes with synonyms (38% of total) • SCT stands for: Stem cell transplant, Secretin, Salmon calcitonin
annotation Dutch printed map, 1600s. Discoveries of Willem Jansz: 1606 is the first recorded European discovery of Australia (New Holland), at Cape York Peninsula. Slide by A. Stubbs
annotation • And now! • Post Genome view of the world Slide by A. Stubbs
annotation Databases (and their IDs) change over time… “The shifting sands of databases and genome builds” (M. Moorhouse) • These changes reflect new information or analysis • The frequency of the changes can be problematic • Attempts are made to ‘hide’ this • IDs are merged, deleted, or temporarily un-mapped on the genome sequence • Even common concepts such as genes shift: boundaries move, TF binding sites are discovered Slide by M. Moorhouse
annotation Khatri et al., 2012
annotation Khatri et al., 2012
tools You have: applied methods to identify differentially expressed genes* (DEGs), e.g. p < 0.05 with fold change greater than 1.5. What now? You could pass this list to your chosen pathway analysis tool, but first… ensure you have mapped your identifiers to the latest annotations. And then what? *or proteins, metabolites
tools You get the latest pathway analysis tools... February 2012 | Volume 8 | Issue 2
tools Huang et al., 2009
tools Khatri et al., 2012
tools • First generation - over representation analysis (ORA) • aka singular enrichment analysis (SEA) • e.g. EASE, DAVID, IPA* • 0. Use parametric statistics to identify DEGs, e.g. limma • 1. Choose a significance level, e.g. FDR < 0.05, FC > 1.5 • 2. Use a statistical test to identify annotations over-represented within your list compared to what was assayed, e.g. Fisher’s exact test (an exact test based on the hypergeometric distribution) *disclosure – our department has a licensing agreement with Ingenuity Systems, Inc.
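Step 2 can be written down compactly. The notation below is an assumption made for illustration and is not taken from the slides: N genes were assayed (the background), K of them carry the annotation of interest, n genes are in the DEG list, and k of those DEGs carry the annotation. Under the null hypothesis that the DEG list is a random draw from the background, the one-sided over-representation p-value is the upper tail of the hypergeometric distribution:

```latex
% Assumed notation: N = genes assayed, K = annotated genes in the background,
% n = genes in the DEG list, k = annotated genes in the DEG list.
p \;=\; P(X \ge k) \;=\; \sum_{i=k}^{\min(n,\,K)}
      \frac{\binom{K}{i}\,\binom{N-K}{\,n-i\,}}{\binom{N}{n}}
```

Because N appears in every term, the choice of background (everything assayed, not the whole genome) directly changes the p-value.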
tools First generation - over representation analysis (ORA) Caveats: 1. thresholdness: what about the transcript with p = 0.050001, FC = 1.4999? 2. equality: transcript-X with p = 0.0000001, FC = 100 is considered equal to transcript-Y with p = 0.049, FC = 1.51 3. the assumption of independence between genes, and between pathways, inflates significance 4. ignores relationships between genes/gene products 5. significance increases with population size
tools • Second generation – gene set enrichment analysis (GSEA), • aka functional class scoring (FCS) • e.g. GSEA, GlobalTest, Gazer, IPA • 1. Use parametric statistics to determine DE for all genes, e.g. t-distribution statistics • 2. Use various statistics to combine gene statistics and determine pathway statistics, e.g. Wilcoxon rank sum, Kolmogorov-Smirnov • 3. Permute phenotypes and pathways to determine pathway significance
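Steps 2 and 3 can be sketched in a few lines. This is a minimal illustration, not the GSEA software's algorithm: it uses an unweighted Kolmogorov-Smirnov-style running sum, and it permutes gene labels (random gene sets of the same size) rather than phenotypes, because that needs only the ranked list. The function names, gene names and defaults are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted KS-style running sum over a ranked gene list.

    Step up by 1/hits at each gene in the set, down by 1/misses otherwise;
    return the running-sum value that deviates most from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Empirical two-sided p-value from random gene sets of the same size.

    Assumption: gene-label permutation is used here for simplicity; it is a
    stand-in for the phenotype permutation named in step 3.
    """
    rng = random.Random(seed)
    observed = abs(enrichment_score(ranked_genes, gene_set))
    size = sum(g in gene_set for g in ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, random_set)) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical example: ten genes ranked by t-statistic, most up-regulated first.
ranked = ["g%d" % i for i in range(1, 11)]
top_set = {"g1", "g2", "g3"}
print(enrichment_score(ranked, top_set))
print(permutation_p_value(ranked, top_set))
```

Every gene contributes to the score, so no p-value or fold-change cut-off is needed before the pathway step.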
tools Second generation – gene set enrichment analysis (GSEA) Overcomes most ORA limitations, except… Caveats: 1. assumes independence between pathways 2. rank-based approaches miss the magnitude of changes between phenotypes, e.g. FC = 10 in sham is treated similarly to FC = 100 in treated 3. ignores relationships between genes/gene products 4. difficult or impossible to use your own special list (not an issue for ORA)
tools • Third generation – pathway topology (PT), • aka modular enrichment analysis (MEA) • e.g. DAVID, SPIA, IPA • 1. Use various statistics to determine differences in gene-gene* interactions** for all genes, e.g. Pearson’s correlation • 2. Use various statistics to combine gene interaction statistics and determine pathway significance, e.g. permutation, hypergeometric distribution *aka node-node **edges
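As a rough illustration of step 1, the sketch below computes Pearson's correlation for each edge (gene-gene pair) of a pathway in two conditions and reports the change per edge. It is an assumption-laden toy, not the method of SPIA or any other named tool: the function names, the edge list and the expression values are all hypothetical, and step 2 (turning edge statistics into pathway significance) is left out.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def edge_correlation_change(edges, control, treated):
    """Per-edge difference in correlation (treated minus control).

    edges:   iterable of (gene_a, gene_b) pairs, i.e. the pathway topology
    control: dict mapping gene -> expression values across control samples
    treated: dict mapping gene -> expression values across treated samples
    """
    return {
        (a, b): pearson(treated[a], treated[b]) - pearson(control[a], control[b])
        for a, b in edges
    }


# Hypothetical two-gene pathway: co-regulated in control, anti-correlated in treated.
edges = [("geneA", "geneB")]
control = {"geneA": [1.0, 2.0, 3.0, 4.0], "geneB": [2.0, 4.0, 6.0, 8.0]}
treated = {"geneA": [1.0, 2.0, 3.0, 4.0], "geneB": [8.0, 6.0, 4.0, 2.0]}
print(edge_correlation_change(edges, control, treated))
```

Only pairs listed as edges are scored, so the result is only as good as the interaction database that supplied the edge list.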
tools • Third generation – pathway topology (PT) • Caveats: • limited interaction knowledge, i.e. hampered by immature interaction databases (KEGG, BioCarta, Reactome, PantherDB etc.) • lack of cellular and temporal resolution of interactions
underlying statistics Fisher’s exact test demonstration (if time permits)
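A small stand-in for the demonstration, using only the Python standard library. It computes the one-sided (over-representation) Fisher's exact test p-value as the upper tail of the hypergeometric distribution. The argument names and the example counts are assumptions made for illustration: N genes assayed, K of them annotated to the pathway, n genes in the DEG list, k of those annotated.

```python
from math import comb


def fisher_exact_greater(k, n, K, N):
    """One-sided Fisher's exact test: P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: annotated genes in the DEG list
    n: genes in the DEG list
    K: annotated genes in the background
    N: genes in the background (everything assayed)
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("counts are inconsistent with a 2x2 table")
    total = comb(N, n)  # number of ways to draw the DEG list from the background
    tail = sum(
        comb(K, i) * comb(N - K, n - i)  # lists with exactly i annotated genes
        for i in range(k, min(n, K) + 1)
    )
    return tail / total


# Hypothetical counts: 10,000 genes assayed, 100 in the pathway,
# 300 DEGs, 12 of which are in the pathway (3 expected by chance).
print(fisher_exact_greater(k=12, n=300, K=100, N=10_000))
```

The expected overlap by chance is n × K / N, so the test asks how surprising it is to see k or more annotated genes when about that many were expected.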
in use DAVID
in use • DAVID • Keep in mind, before uploading: • does your list of DEGs contain the genes expected a priori? • have you generated at least three* lists with different cut-offs, e.g. p < 0.05 / 0.01, FC > 1.3 / 1.5? • And after uploading: • are the pathway(s) expected a priori identified in your analysis? *only for ORA
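Generating the lists for several cut-offs is easy to script before uploading. The sketch below is an illustration under stated assumptions: results are held as a dict of gene -> (p-value, fold change), fold change is a linear ratio (so 0.5 counts as a two-fold decrease), and the gene names and values are hypothetical. It builds one list per combination of the cut-offs named on the slide.

```python
from itertools import product


def deg_lists(results, p_cutoffs=(0.05, 0.01), fc_cutoffs=(1.3, 1.5)):
    """Return {(p_cutoff, fc_cutoff): sorted gene list} for every combination.

    results: dict mapping gene -> (p_value, fold_change), with fold_change
             given as a positive linear ratio (assumption, see lead-in).
    """
    lists = {}
    for p_cut, fc_cut in product(p_cutoffs, fc_cutoffs):
        selected = []
        for gene, (p_value, fold_change) in results.items():
            if fold_change <= 0:
                raise ValueError("fold change must be a positive ratio")
            # Treat up- and down-regulation symmetrically.
            magnitude = max(fold_change, 1.0 / fold_change)
            if p_value < p_cut and magnitude > fc_cut:
                selected.append(gene)
        lists[(p_cut, fc_cut)] = sorted(selected)
    return lists


# Hypothetical results, including a gene that just misses the usual cut-offs.
results = {
    "geneX": (0.0000001, 100.0),
    "geneY": (0.049, 1.51),
    "geneZ": (0.050001, 1.4999),
    "geneW": (0.004, 0.5),
}
for cutoffs, genes in deg_lists(results).items():
    print(cutoffs, genes)
```

Checking each list for the genes expected a priori, and comparing the pathways returned for each list, shows how much the ORA result depends on the chosen threshold.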
in use • DAVID • Withstood the test of time (released 2003) • 1. proven functionality – highly cited • 2. comprehensive – many databases accessible • 3. feature rich – ORA, MEA, annotation mapping, etc. • 4. constantly updated & maintained – v6.7 • 5. well supported – personal experience • 6. easy to use, well documented • 7. free as in gratis
in use DAVID home
in use DAVID upload
in use DAVID list management *Ariel Pink's Haunted Graffiti
in use DAVID background selection
in use DAVID functional annotation chart
in use DAVID functional annotation chart (options)
in use DAVID download results
in use DAVID results in spreadsheet
in use DAVID results in spreadsheet
in use DAVID functional annotation clustering
in use DAVID functional annotation clustering
to summarise • 1. choose your analysis approach: • ORA if you must use your own special gene list • GSEA or PT, in addition to ORA, where possible
to summarise DAVID Khatri et al., 2012
to summarise • 1. choose your analysis approach: • ORA if you must use your own special gene list • GSEA or PT, in addition to ORA, where possible • 2. use a range of cut-offs for ORA analysis • 3. verify gene lists and pathway analysis output with a priori biology • 4. choose free (gratis & libre) tools where possible, in addition to proprietary apps
questions? k.brand@erasmusmc.nl