Pathway Analysis Karl Brand, June 2012


overview
1. goal
2. annotation
3. tools (various approaches, pros & cons)
4. underlying statistics (Fisher’s exact test)
5. in use (DAVID)
6. to summarise

goal
• To understand genomics results, and/or
• To translate genomics data into knowledge, and/or
• “…for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power”¹
• To facilitate generating a testable hypothesis
¹ Khatri et al., 2012

tools
You have: applied methods to identify differentially regulated biological entities (BEs), e.g. p < 0.05 with fold change greater than 1.5.
What now? You could pass this list to your chosen pathway analysis tool, but first…

annotation

annotation: a modern problem
• Synonyms: different names for the same biological entity
• Homonyms: same name for different biological entities
• Acronyms: reduced words representing biological entities
Examples:
• PAP, alias for:
  • PAP (Pancreatitis-associated protein)
  • MRPS30 (Mitochondrial ribosomal protein S30)
  • PAPOLA (Poly(A) polymerase alpha)
• 5418 genes with synonyms (38% of total)
• SCT stands for:
  • Stem cell transplant
  • Secretin
  • Salmon calcitonin
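The PAP ambiguity is easy to reproduce. The following is a minimal Python sketch, assuming a small hand-made alias table; `ALIAS_TABLE`, `build_alias_index` and `resolve` are hypothetical names, and a real table would come from a current annotation release. It sorts identifiers into uniquely mapped, ambiguous and unknown, so that ambiguous aliases can be checked by hand before pathway analysis.

```python
from collections import defaultdict

# Hypothetical alias table of (alias, official symbol) pairs. The PAP rows
# follow the slide; everything else is illustrative only.
ALIAS_TABLE = [
    ("PAP", "PAP"),
    ("PAP", "MRPS30"),
    ("PAP", "PAPOLA"),
    ("MRPS30", "MRPS30"),
    ("PAPOLA", "PAPOLA"),
]


def build_alias_index(rows):
    """Map each alias to the set of official symbols it may refer to."""
    index = defaultdict(set)
    for alias, symbol in rows:
        index[alias.upper()].add(symbol)
    return index


def resolve(identifiers, index):
    """Split identifiers into uniquely mapped, ambiguous and unknown."""
    mapped, ambiguous, unknown = {}, {}, []
    for ident in identifiers:
        symbols = index.get(ident.upper(), set())
        if len(symbols) == 1:
            mapped[ident] = next(iter(symbols))
        elif symbols:
            ambiguous[ident] = sorted(symbols)
        else:
            unknown.append(ident)
    return mapped, ambiguous, unknown


index = build_alias_index(ALIAS_TABLE)
mapped, ambiguous, unknown = resolve(["PAPOLA", "PAP", "XYZ1"], index)
print(mapped)     # {'PAPOLA': 'PAPOLA'}
print(ambiguous)  # {'PAP': ['MRPS30', 'PAP', 'PAPOLA']}
print(unknown)    # ['XYZ1']
```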

annotation
Dutch printed map, 1600s. Discoveries of Willem Jansz: 1606 is the first recorded European discovery of Australia (New Holland), at Cape York Peninsula.
Slide by A. Stubbs

annotation
• And now!
• Post-genome view of the world
Slide by A. Stubbs

annotation
Databases (and their IDs) change over time…
“The Shifting Sands of Databases and Genome Builds…” (M. Moorhouse)
• These changes reflect new information or analysis
• The frequency of the changes can be problematic
• Attempts are made to ‘hide’ this
• IDs are merged, deleted or temporarily un-mapped on the genome sequence
• Even common concepts such as genes shift: boundaries move, TF binding sites are discovered
Slide by M. Moorhouse

annotation Khatri et al., 2012

annotation

tools
You have: applied methods to identify differentially expressed genes* (DEGs), e.g. p < 0.05 with fold change greater than 1.5.
What now? You could pass this list to your chosen pathway analysis tool, but first… ensure you have mapped your identifiers to the latest annotations.
And then what?
*or proteins, metabolites

tools You get the latest pathway analysis tools... February 2012 | Volume 8 | Issue 2

tools Huang et al., 2009

tools Khatri et al., 2012

tools
First generation: over-representation analysis (ORA)
• aka singular enrichment analysis (SEA)
• e.g. EASE, DAVID, IPA*
0. Use parametric statistics to identify DEGs, e.g. limma
1. Choose significance level, e.g. FDR < 0.05, FC > 1.5
2. Use a statistical test to identify annotations over-represented within your list compared to what was assayed, e.g. Fisher’s exact test
*disclosure: our department has a licensing agreement with Ingenuity Systems, Inc.

tools
First generation: over-representation analysis (ORA)
Caveats:
1. threshold dependence: what about the transcript with p = 0.050001, FC = 1.4999?
2. equality: transcript X with p = 0.0000001, FC = 100 is considered equal to transcript Y with p = 0.049, FC = 1.51
3. the assumption of independence between genes, and between pathways, inflates significance
4. ignores relationships between genes/gene products
5. significance increases with population size
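Caveats 1 and 2 can be reproduced with the numbers quoted on the slide. This is a minimal Python sketch, assuming absolute fold changes; `Transcript` and `select_degs` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    p_value: float
    fold_change: float  # absolute fold change (an assumption of this sketch)


def select_degs(transcripts, p_cutoff=0.05, fc_cutoff=1.5):
    """Return the names passing both cut-offs; everything else is discarded."""
    return [
        t.name
        for t in transcripts
        if t.p_value < p_cutoff and t.fold_change > fc_cutoff
    ]


transcripts = [
    Transcript("near_miss", 0.050001, 1.4999),      # caveat 1: just outside
    Transcript("transcript_X", 0.0000001, 100.0),   # caveat 2: strong change
    Transcript("transcript_Y", 0.049, 1.51),        # caveat 2: marginal change
]

strict = select_degs(transcripts)
relaxed = select_degs(transcripts, p_cutoff=0.06, fc_cutoff=1.3)
print(strict)   # ['transcript_X', 'transcript_Y']
print(relaxed)  # ['near_miss', 'transcript_X', 'transcript_Y']
```

Once the list is built, transcript X and transcript Y are indistinguishable to an ORA tool, and the near miss only appears when the cut-offs are relaxed.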

tools
Second generation: gene set enrichment analysis (GSEA)
• aka functional class scoring (FCS)
• e.g. GSEA, GlobalTest, Gazer, IPA
1. Use parametric statistics to determine DE for all genes, e.g. t-distribution statistics
2. Use various statistics to combine gene statistics and determine pathway statistics, e.g. Wilcoxon rank sum, Kolmogorov-Smirnov
3. Permute phenotypes and pathways to determine pathway significance
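Steps 2 and 3 can be sketched in a few lines of Python. This is a simplified, unweighted Kolmogorov-Smirnov-style running sum with random gene sets as the null. It is not the published GSEA implementation, and phenotype permutation is not shown. The gene names, t-statistics and the functions `enrichment_score` and `permutation_p_value` are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted KS-style running sum; returns the largest deviation from 0."""
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Empirical p-value against random gene sets of the same size."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = len(gene_set & set(ranked_genes))
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, random_set)) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical per-gene t-statistics; step 1 would normally produce these.
t_stats = {f"g{i}": 10.0 - i for i in range(20)}
ranked = sorted(t_stats, key=t_stats.get, reverse=True)
pathway = {"g0", "g1", "g2", "g3"}  # hypothetical pathway at the top of the ranking

es, p = permutation_p_value(ranked, pathway)
print(es, p)
```

Every gene contributes to the score here, so no significance cut-off is needed before the pathway step.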

tools
Second generation: gene set enrichment analysis (GSEA)
Overcomes most ORA limitations, except…
Caveats:
1. assumes independence between pathways
2. ranking-based approaches miss the magnitude of changes between phenotypes, i.e., sham FC = 10 and treated FC = 100 can be treated as similar
3. ignores relationships between genes/gene products
4. difficult or impossible to use with your own special list (not an issue for ORA)

tools
Third generation: pathway topology (PT)
• aka modular enrichment analysis (MEA)
• e.g. DAVID, SPIA, IPA
1. Use various statistics to determine differences in gene-gene* interactions** for all genes, e.g. Pearson’s correlation
2. Use various statistics to combine gene interaction statistics and determine pathway significance, e.g. permutation, hypergeometric distribution
*aka node-node
**edges
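Step 1 can be illustrated with an edge-wise change in Pearson's correlation between two conditions. This is only a sketch of the idea and does not reproduce the method of any listed tool. The expression values, the edge list and the function names are hypothetical, and the significance step (step 2) is left out.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def edge_correlation_changes(edges, control, treated):
    """Per-edge difference in correlation (treated minus control)."""
    return {
        (a, b): pearson(treated[a], treated[b]) - pearson(control[a], control[b])
        for a, b in edges
    }


# Hypothetical pathway edges (gene-gene interactions) and expression values.
edges = [("A", "B"), ("B", "C")]
control = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 2, 4]}
treated = {"A": [1, 2, 3, 4], "B": [8, 6, 4, 2], "C": [4, 2, 3, 1]}

changes = edge_correlation_changes(edges, control, treated)
for edge, delta in changes.items():
    print(edge, round(delta, 3))
```

In this toy data the A-B relationship flips from positive to negative while B-C is unchanged, which a gene-list or rank-based method would not see.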

tools
Third generation: pathway topology (PT)
Caveats:
• limited interaction knowledge, i.e., hampered by immature interaction databases (KEGG, BioCarta, Reactome, PantherDB etc.)
• lack of cellular and temporal resolution of interactions

underlying statistics Fisher’s exact test demonstration (if time permits)
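A minimal Python sketch of the test as it is used for over-representation: the one-sided Fisher's exact p-value is the upper tail of the hypergeometric distribution. The counts are hypothetical and `fisher_exact_enrichment` is an illustrative name, not a function from any tool mentioned here.

```python
from math import comb


def fisher_exact_enrichment(list_hits, list_size, background_hits, background_size):
    """One-sided Fisher's exact test p-value for over-representation.

    Sums the hypergeometric probabilities of seeing `list_hits` or more
    annotated genes in a list of `list_size` genes drawn from
    `background_size` assayed genes, `background_hits` of which carry
    the annotation.
    """
    max_hits = min(list_size, background_hits)
    total = comb(background_size, list_size)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes assayed, 5 of them in the pathway;
# 4 DEGs, 3 of them in the pathway.
p_small = fisher_exact_enrichment(3, 4, 5, 20)
print(p_small)  # 155/4845, about 0.032

# Same proportions with ten times as many genes (compare ORA caveat 5).
p_large = fisher_exact_enrichment(30, 40, 50, 200)
print(p_large < p_small)  # True
```

The background is what was assayed, not the whole genome, so the choice of background list changes the p-value.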

in use DAVID

in use
DAVID
Keep in mind, before uploading:
• does your list of DEGs contain genes expected a priori?
• have you generated at least three* lists with different cut-offs, e.g. p < 0.05 / 0.01, FC > 1.3 / 1.5?
And after uploading:
• are the pathway(s) expected a priori identified in your analysis?
*only for ORA analysis

in use
DAVID
Withstood the test of time (released 2003)
1. proven functionality: highly cited
2. comprehensive: many databases accessible
3. feature rich: ORA, MEA, annotation mapping, etc.
4. constantly updated & maintained: v6.7
5. well supported: personal experience
6. easy to use, well documented
7. free as in gratis

in use DAVID home

in use DAVID upload

in use DAVID list management *Ariel Pink's Haunted Graffiti

in use DAVID background selection

in use DAVID functional annotation chart

in use DAVID functional annotation chart (options)

in use DAVID download results

in use DAVID results in spreadsheet

in use DAVID functional annotation clustering

to summarise
1. choose your analysis approach:
• ORA if you must use your own special gene list
• GSEA or PT, in addition to ORA, where possible

to summarise DAVID Khatri et al., 2012

to summarise
1. choose your analysis approach:
• ORA if you must use your own special gene list
• GSEA or PT, in addition to ORA, where possible
2. use a range of cut-offs for ORA analysis
3. verify gene lists and pathway analysis output with a priori biology
4. choose free (gratis & libre) tools where possible, in addition to proprietary apps

questions? k.brand@erasmusmc.nl
