TRD 2 Update: An annotation scheme to foster reproducible NMR data analysis

Matt Fenwick, Eldon Ulrich, Michael Gryk

Overview of NMR spectral analysis
- peak-picking: distinguishing signal from noise, true positives from false positives
- resonance assignment
- NOESY peak assignment
- semi-automated
  - software tools
  - human intervention required
- human uses a deductive process of reasoning
  - small set of rules/expectations (library)
  - deductions may be logically dependent on each other
L10 + A5 +

Problem: Missing Data -> Irreproducible
Much intermediate data is not saved / deposited
- step order
- logical dependencies
- deductive reasoning
- peculiarities found and their resolutions (unexpected, missing, extra peaks)
final data
- resonances, spin systems
- extraneous data: contaminants, noise, artifacts, anomalies ...

Missing Data: Spin Systems & Resonances
NMR experiments are designed to exploit networks of coupled spins (spin systems).
The assignment process has two steps:
(1) assign resonances to spin systems
(2) assign spin systems to residues
Resonances and spin systems are not deposited.
Images are from Protein NMR: A Practical Guide (http://www.protein-nmr.org.uk/)
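The two-step assignment described above can be sketched as a small data model. This is only an illustration of what "resonance" and "spin system" records might hold if they were saved; the class names, field names, and example values are hypothetical and are not taken from CCPN, NMR-STAR, or Sparky.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resonance:
    """One observed resonance (hypothetical record layout)."""
    id: int
    shift_ppm: float                 # chemical shift in ppm
    atom_type: Optional[str] = None  # e.g. "CA"; None until known


@dataclass
class SpinSystem:
    """A group of coupled resonances (hypothetical record layout)."""
    id: int
    resonance_ids: List[int] = field(default_factory=list)
    residue: Optional[str] = None    # e.g. "A5"; None until step 2 is done


def assign_resonance(spin_system: SpinSystem, resonance: Resonance) -> None:
    """Step 1: attach a resonance to a spin system (idempotent)."""
    if resonance.id not in spin_system.resonance_ids:
        spin_system.resonance_ids.append(resonance.id)


def assign_residue(spin_system: SpinSystem, residue: str) -> None:
    """Step 2: attach a spin system to a residue in the sequence."""
    spin_system.residue = residue


# Example with made-up values: the spin system exists, and can be saved,
# before it is tied to any residue.
ca = Resonance(id=1, shift_ppm=52.5, atom_type="CA")
ss52 = SpinSystem(id=52)
assign_resonance(ss52, ca)
assign_residue(ss52, "A5")
```

Keeping the spin system as its own record means the intermediate result of step 1 survives even when step 2 is uncertain or later revised.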

Solution
1. capture process of reasoning
- version control: capture intermediate states
- model of commonly used deductive reasons
- annotate changeset with deductive reasons
2. capture complete final data set
- model for identifying problems
- model for extraneous data
- deposit full results

1. version control: snapshots, commit messages
- snapshots of intermediate states: enable backtracking and inspection of past states
- commit messages: describe the difference between consecutive snapshots; summary, purpose, justification, questions, uncertainties
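A minimal sketch of the snapshot idea, assuming the analysis state can be reduced to a mapping from peak ID to an assignment label. The `History` class and its methods are hypothetical names introduced for illustration; a real project would more likely rely on an existing version control system.

```python
import copy
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Snapshot:
    state: Dict[int, str]  # peak ID -> assignment label (simplified state)
    message: str           # summary, purpose, justification, open questions


class History:
    """Hypothetical in-memory store of annotated snapshots."""

    def __init__(self) -> None:
        self.snapshots: List[Snapshot] = []

    def commit(self, state: Dict[int, str], message: str) -> int:
        """Save a copy of the current state and return its index."""
        self.snapshots.append(Snapshot(copy.deepcopy(state), message))
        return len(self.snapshots) - 1

    def checkout(self, index: int) -> Dict[int, str]:
        """Backtrack: return a copy of a past state."""
        return copy.deepcopy(self.snapshots[index].state)

    def diff(self, older: int, newer: int) -> Dict[int, Tuple[Optional[str], Optional[str]]]:
        """Map each changed peak ID to its (old, new) labels."""
        a = self.snapshots[older].state
        b = self.snapshots[newer].state
        return {
            key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)
        }


history = History()
first = history.commit({124: "unassigned"}, "Picked HNCACB peaks")
second = history.commit(
    {124: "spin system 52", 125: "spin system 52"},
    "Grouped peaks 124 and 125; justification: matching chemical shifts",
)
changes = history.diff(first, second)
```

The diff shows what changed between two states, while the commit message records why, which is the part that is normally lost.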

1. model of NMR deductive reasoning
- start with CCPN data model
- augment with library of common deductive reasons
- use deductive reasons to annotate commits
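One way to picture "use deductive reasons to annotate commits" is a lookup against a small library of named reasons. The two reason names below appear in the annotation mock-up in this deck; their descriptions, the `annotate` function, and the extra reason in the extension example are assumptions made for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical library: reason name -> short description.
# The descriptions are illustrative, not quoted from the presentation.
REASON_LIBRARY: Dict[str, str] = {
    "BMRB statistics": "shifts compared with database statistics",
    "chemical shift grouping": "peaks grouped because their shifts match",
}


def annotate(message: str, reasons: Iterable[str],
             library: Dict[str, str] = REASON_LIBRARY) -> Dict[str, object]:
    """Attach library reasons to a commit message, rejecting unknown ones."""
    reason_list: List[str] = list(reasons)
    unknown = [name for name in reason_list if name not in library]
    if unknown:
        raise ValueError(f"reasons not in library: {unknown}")
    return {"message": message, "reasons": reason_list}


commit = annotate(
    "Assign spin system 52 to alanine",
    ["BMRB statistics", "chemical shift grouping"],
)

# Extending the library is just adding an entry (hypothetical new reason).
extended = {**REASON_LIBRARY, "sequential connectivity": "i / i-1 peaks match"}
commit2 = annotate("Link spin systems", ["sequential connectivity"], extended)
```

Restricting annotations to library entries keeps them machine-readable, while still letting the vocabulary grow by adding new entries.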

2. model: identify problems
(distinguishing signal from noise; true positives, false positives, false negatives)
- facilitates re-interpretation, if additional data is collected, by pointing out trouble spots
  - unassigned signal peak
  - missing CB peaks of Gln sidechain
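The two kinds of trouble spot named on this slide (an unassigned signal peak, an expected peak that is missing) could be flagged with a model along these lines. The `PeakKind` categories, the `Peak` fields, and the example atom lists are hypothetical simplifications.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class PeakKind(Enum):
    SIGNAL = "signal"
    NOISE = "noise"
    ARTIFACT = "artifact"


@dataclass
class Peak:
    id: int
    kind: PeakKind
    spin_system_id: Optional[int] = None  # None means not yet assigned


def unassigned_signal_peaks(peaks: Iterable[Peak]) -> List[int]:
    """IDs of peaks judged to be real signal but not assigned anywhere."""
    return [
        p.id for p in peaks
        if p.kind is PeakKind.SIGNAL and p.spin_system_id is None
    ]


def missing_atoms(expected: Iterable[str], observed: Iterable[str]) -> List[str]:
    """Atoms expected for a residue type that have no observed peak."""
    return sorted(set(expected) - set(observed))


# Made-up example data.
peaks = [
    Peak(124, PeakKind.SIGNAL, spin_system_id=52),
    Peak(130, PeakKind.SIGNAL),    # trouble spot: signal, but unassigned
    Peak(131, PeakKind.NOISE),
]
trouble = unassigned_signal_peaks(peaks)
gaps = missing_atoms(expected=["CA", "CB"], observed=["CA"])
```

Recording these explicitly, rather than silently discarding them, is what lets a later analyst revisit the same spots when new data arrive.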

2. extraneous data, full results
- collaborate with BMRB: deposit full data sets
- extend NMR-STAR data dictionary
- extend Sparky assignment program
- noise & artifact peaks, unassigned spin systems, contaminants, anomalies, ...

Review: Solution
1. process of reasoning
- version control: capture intermediate states
- model of commonly used deductive reasons
- annotate changeset with deductive reasons
2. final data
- model for identifying problems
- model for extraneous data
- deposit full results

Challenges?
- human/computer optimization
  - simple enough for users to apply properly, vs. detailed enough that a program can understand complete context of an annotation
  - separate layers: use more/less detail as needed
  - (future) tools can increase level of detail without bogging humans down
- future compatibility
  - library of annotations provides “guidance”; extensions can be trivially added by augmenting library
  - if there’s a problem with the library of annotations, can fix by extending (providing a new, similar annotation)
- tooling
  - Sparky

Annotation Mock up (STAR-like format)

data_example

save_assign

  loop_ # tags
    _Tag.ID
    _Tag.Parent_ID
    ...

    ...
    24 23
  stop_

  loop_ # reasons used
    _Tag_Reason.ID
    _Tag_Reason.Tag_ID
    _Tag_Reason.Name
    ...

    ...
    73 24 "BMRB statistics"
    74 24 "chemical shift grouping"
  stop_

  loop_ # spin-system/amino-acid-type assignment
    _SSAA_Assn.ID
    _SSAA_Assn.SS_ID
    _SSAA_Assn.AA_ID
    ...

    ...
    101 52 Alanine
  stop_

  loop_ # peak/spin-system assignment
    _Peak_SS_Assn.ID
    _Peak_SS_Assn.SS_ID
    _Peak_SS_Assn.Peak_ID
    _Peak_SS_Assn.Peak_Spectrum
    ...

    ...
    175 52 124 HNCACB
    176 52 125 HNCACB
    177 52 126 HNCACB
    178 52 127 HNCACB
  stop_

save_
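To show how a program might read such annotations, here is a rough sketch that turns one complete `loop_ ... stop_` block into a list of rows. It is not a real NMR-STAR parser: it assumes whitespace-separated tokens, double-quoted strings, `#` comments, and no elided (`...`) tags or rows. The function name `parse_loop` is hypothetical.

```python
import shlex
from typing import Dict, List


def parse_loop(text: str) -> List[Dict[str, str]]:
    """Parse a single STAR-like loop into one dict per data row."""
    # shlex handles quoted values; comments=True drops "# ..." to end of line.
    tokens = shlex.split(text, comments=True)
    if len(tokens) < 2 or tokens[0] != "loop_" or tokens[-1] != "stop_":
        raise ValueError("expected text of the form: loop_ ... stop_")
    body = tokens[1:-1]

    # Leading tokens that start with "_" are the column tags.
    tags: List[str] = []
    for token in body:
        if not token.startswith("_"):
            break
        tags.append(token)
    values = body[len(tags):]

    if not tags or len(values) % len(tags) != 0:
        raise ValueError("number of values is not a multiple of number of tags")
    width = len(tags)
    return [
        dict(zip(tags, values[start:start + width]))
        for start in range(0, len(values), width)
    ]


example = '''
loop_ # reasons used
  _Tag_Reason.ID
  _Tag_Reason.Tag_ID
  _Tag_Reason.Name
  73 24 "BMRB statistics"
  74 24 "chemical shift grouping"
stop_
'''
rows = parse_loop(example)
reasons_for_tag_24 = [r["_Tag_Reason.Name"] for r in rows if r["_Tag_Reason.Tag_ID"] == "24"]
```

Because each reason row points at a tag ID, a tool can look up every deductive reason recorded for a given annotation with a simple filter like the last line.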

Impact
- reproducibility
- error detection
- error correction
- collaboration
- sharing
- learning
- analysis quality
- amenability to future analysis

Appendix: NMR phenomena: grouping resonances based on chemical shift

Appendix: extraneous data: processing artifacts, spurious peaks

Appendix: Library examples
- Asn sidechain
- Ala backbone
- sequential spin systems
