Royal Society of Chemistry activities to develop a data repository for chemistry-specific data

Aileen Day, Alexey Pshenichnov, Ken Karapetyan, Colin Batchelor, Peter Corbett, Jon Steele, Valery Tkachenko and Antony Williams, ACS Dallas March 2014. Royal Society of Chemistry activities to develop a data repository for chemistry-specific data. Data in a Scientific Publication.

Presentation Transcript


  1. Aileen Day, Alexey Pshenichnov, Ken Karapetyan, Colin Batchelor, Peter Corbett, Jon Steele, Valery Tkachenko and Antony Williams, ACS Dallas, March 2014. Royal Society of Chemistry activities to develop a data repository for chemistry-specific data

  2. Data in a Scientific Publication. This is not new; you know the story… So much data of value is contained within a publication and delivered as a PDF. PDF files, and especially unclear licensing, don’t let me get at the data so I can rework, reuse, repurpose, text mine, etc. I specialize in XXXX. I want a database of YYYY extracted from publications and made available, for free, with the capabilities I need, and the publishers should just do it.

  3. Many useful discussions…

  4. Many good visions…

  5. And over the years, progress… There is much progress with open access, data access, licensing, enhanced articles, open data, free online tools, open source code, publishers waking up, and scientists contributing. We should be excited at what is available now, what the future holds, and what opportunities exist in front of us.

  6. But it’s NOT easy… technology

  7. But it’s not easy… US. Not everything we would like around data handling is there, for sure. Many systems, tools and platforms are already available, but we don’t know about them, or even if we did, contributing is “more work”: “What’s in it for me?”, “It’s my data”, “It’s too much work”, “What credit do I get?”

  8. An Initial “Vague” Vision Set: manage “all” of the chemistry data associated with chemical substances; data to be downloadable, reusable, interactive; build a platform that enables the scientist; data storage, validation, standardization and curation; collaborative data sharing; provide a data platform that can enable and enhance publishing of scientific papers.

  9. Data Repository: registration of chemical compounds; deposition of chemical syntheses; addition of analytical data; integration with electronic notebooks; rewards and recognition for data sharing; document processing; hosting of data as private, embargoed or public.

  10. Solving for Authors

  11. I hate text mining data. DERA: developing pipelining tools for text mining so that we will be able to process documents for mark-up: compound extraction/markup; reaction extraction/conversion; converting “text spectra” to generate spectral libraries… AGGHHHHH!

  12. “Where is the real data please?” FIGURE DATA

  13. Data Preferences - total bias. Views of a spectroscopist: Give me the data – an interactive, downloadable spectrum is way more valuable to me (processed spectrum and FID available). The spectral header in the JCAMP standard is very incomplete (as in most spectral standards). I want ASSIGNED/ANNOTATED spectra if possible – don’t “textify” a spectrum!

  14. Solving the problem here… Binary file formats are problematic – think of the variations in instrumentation and software. Standards can be defined – are they correctly implemented? CIF and its checking, spectral standards (JCAMP versions), structure formats, etc. Metadata is crucial.

  15. …and what does it solve? “Fixing the data” – data can’t be faked as easily. Reprocessing of analytical data can be done: weighting functions, baseline correction, deconvolution, etc. I can convert and store it locally.

  16. But solve it for many things. I want molecules as structure formats, not images. Please don’t make us hack tables of data. Tell us how you generated your files: software version, software libraries, etc.

  17. Input data pipeline

  18. Depositions Gateway User Interface

  19. Depositions Gateway User Interface

  20. Depositions Gateway User Interface

  21. Depositions Gateway User Interface

  22. Depositions Gateway User Interface

  23. Document processing

  24. Input data pipeline

  25. Depositions Gateway User Interface

  26. User Interface Approach

  27. User Interface Approach

  28. User Interface Approach

  29. User Interface Approach

  30. Addition of Analytical Data. Spectral Container is in development using componentized widgets for display. NIST spectra converted into standardized JCAMP format for deposition: 296,103 spectra deposited. 10% of remaining NIST spectra need to be curated as there are obvious structure issues.

  31. Electronic Notebook Data. Development work integrating chemistry into the Southampton Labtrove notebook. Stoichiometry table development. Analytical data integration. “ChemTrove” rolled out to a small test group in January.

  32. Present activities – ACS Fall. Deposition process development of compounds, reactions and spectral data by Spring. FTP, DropBox, Web-upload, ELN integration. Compounds, Reactions, Spectral data: search, display, download. Data sharing – private, public, collaborative. Metadata, metadata, metadata standards! Open Sourcing CRD and CVSP.

  33. Acknowledgments: Jeremy Frey and Simon Coles, University of Southampton; Will Dichtel and Leah McEwan, Cornell University; Stuart Chalk, University of North Florida; Bob Hanson and Bob Lancashire, Jmol and JSpecView.

  34. Thank you. Email: williamsa@rsc.org; ORCID: 0000-0002-2668-4821; Twitter: @ChemConnector; Personal Blog: www.chemconnector.com; SLIDES: www.slideshare.net/AntonyWilliams
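
Slides 13 and 14 note that JCAMP spectral headers are often incomplete and that metadata is crucial. As a rough illustration only, a deposition step could flag missing header records before accepting a file. The Python sketch below reads the `##LABEL=value` records of a JCAMP-DX file and reports which labels from a required list are absent or empty. The list `REQUIRED_LABELS` and the function names are assumptions made for this example; they are not taken from the slides and do not describe the RSC's actual validation code.

```python
import re

# Hypothetical minimum set of header labels a repository might insist on.
# This list is an assumption for illustration, not an RSC requirement.
REQUIRED_LABELS = ("TITLE", "JCAMP-DX", "DATA TYPE", "XUNITS", "YUNITS")


def normalize_label(label: str) -> str:
    """JCAMP-DX labels are compared ignoring case, spaces, '-', '/' and '_'."""
    return re.sub(r"[\s\-/_]", "", label).upper()


def parse_header(text: str) -> dict[str, str]:
    """Collect ##LABEL=value records; a repeated label keeps its last value."""
    records: dict[str, str] = {}
    for line in text.splitlines():
        if not line.startswith("##"):
            continue  # data lines and free text are skipped
        label, sep, value = line[2:].partition("=")
        if not sep:
            continue  # not a labeled data record
        # Anything after '$$' on a line is a comment in JCAMP-DX.
        value = value.split("$$", 1)[0].strip()
        records[normalize_label(label)] = value
    return records


def missing_labels(text: str, required=REQUIRED_LABELS) -> list[str]:
    """Return the required labels that are absent or have an empty value."""
    records = parse_header(text)
    return [lab for lab in required if not records.get(normalize_label(lab))]
```

Calling `missing_labels(open("spectrum.jdx").read())` on a deposited file (the file name is a placeholder) would return an empty list when every required record is present, or the names of the records a depositor still needs to supply.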