Unless otherwise indicated slides licensed under

Doing Science in the Digital Age: Software, Skills and Sociology
http://dx.doi.org/10.6084/m9.figshare.957527
TGAC Science Symposia series, 11 March 2014
Neil Chue Hong (@npch), Software Sustainability Institute
ORCID: 0000-0002-8876-7606 | N.ChueHong@software.ac.uk

Presentation Transcript


  1. Doing Science in the Digital Age: Software, Skills and Sociology

  2. Four Paradigms of Research: Empirical, Theoretical, Computational, Data Exploration

  3. Water Swap Reaction Coordinate: Woods CJ, Malaisree M, Hannongbua S, Mulholland AJ (2011) A water-swap reaction coordinate for the calculation of absolute protein-ligand binding free energies. J. Chem. Phys. 134: 054114. http://dx.doi.org/10.1063/1.3519057

  4. Pleiotropic Loci: Navarro, Haley, Karosas et al. Selection at pleiotropic loci underlies disease co-occurrence in human populations. Submitted to Nature Genetics.

  5. Behind every great piece of science…

     # go through each SNP of interest
     for (my $x = 0; $x < scalar @pos; $x++) {
         # and then each downstream SNP of interest
         for (my $y = $x + 1; $y < scalar @pos; $y++) {
             # if the SNPs belong to different traits, lie within our chosen
             # distance (500kb) and are both present in the haplotypes file
             if (($trait[$x] ne $trait[$y])
                 && (abs($pos[$x] - $pos[$y]) <= 500000)
                 && exists($legArrayPos{$pos[$x]})
                 && exists($legArrayPos{$pos[$y]})) {
                 my $snp1ArrayPos = "";
                 my $snp2ArrayPos = "";
                 my $snp1All = "";
                 my $snp2All = "";
                 # create output file for this SNP pair
                 my $filename = "ConditionedResults2/$chr[$x].$pos[$x]-$pos[$y].EHH.GBR.2.txt";
                 print "$filename\n";
                 unless (-e $filename) {
                     open(OUT, '>', $filename) or die "Cannot open $filename: $!";
                     #####################CHANGE THESE IF NOT FOCUSING ON SECOND SNP#########################
                     my $start = $pos[$y] - 500000;
                     if ($start < 1) { $start = 1; }
                     my $end = $pos[$y] + 500000;
                     if ($end > $chrLengths{$chr[$x]}) { $end = $chrLengths{$chr[$x]}; }
                     # … (the slide excerpt ends here)
                 }
             }
         }
     }
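The dense loop above mixes three separate steps: choosing SNP pairs, clipping a window to the chromosome, and naming an output file. As a hedged illustration, and not the authors' actual code, the following Python sketch factors those steps into small functions that can each be tested on their own. It assumes the positions, traits, haplotype-legend positions and chromosome lengths are already loaded into ordinary Python containers; every function and constant name here is hypothetical, and the toy data is invented.

```python
from typing import Container, Iterator, Sequence, Tuple

# Hypothetical constants mirroring the values hard-coded in the Perl excerpt.
MAX_PAIR_DISTANCE = 500_000  # maximum distance between the two SNPs (500 kb)
FLANK = 500_000              # half-width of the window around the second SNP


def snp_pairs(pos: Sequence[int], trait: Sequence[str],
              legend_positions: Container[int],
              max_distance: int = MAX_PAIR_DISTANCE) -> Iterator[Tuple[int, int]]:
    """Yield index pairs (x, y) with x < y for SNPs linked to different traits
    that lie within max_distance of each other and both appear in the
    haplotypes legend (the role played by the Perl %legArrayPos hash)."""
    for x in range(len(pos)):
        for y in range(x + 1, len(pos)):
            if (trait[x] != trait[y]
                    and abs(pos[x] - pos[y]) <= max_distance
                    and pos[x] in legend_positions
                    and pos[y] in legend_positions):
                yield x, y


def clamped_window(centre: int, chrom_length: int,
                   flank: int = FLANK) -> Tuple[int, int]:
    """Return (start, end) of a window around centre, clipped to [1, chrom_length]."""
    start = max(centre - flank, 1)
    end = min(centre + flank, chrom_length)
    return start, end


def output_filename(chrom: str, pos_x: int, pos_y: int) -> str:
    """Build the per-pair output path with the naming scheme used in the excerpt."""
    return f"ConditionedResults2/{chrom}.{pos_x}-{pos_y}.EHH.GBR.2.txt"


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    pos = [1_000, 300_000, 900_000]
    trait = ["height", "BMI", "height"]
    legend = {1_000, 300_000, 900_000}
    chrom_length = 1_000_000
    for x, y in snp_pairs(pos, trait, legend):
        print(output_filename("chr1", pos[x], pos[y]),
              clamped_window(pos[y], chrom_length))
    # prints: ConditionedResults2/chr1.1000-300000.EHH.GBR.2.txt (1, 800000)
```

Splitting the logic this way turns the 500 kb values into parameters instead of constants to edit by hand (the job of the "CHANGE THESE" comment in the original), and it gives each step a small surface to test, in the spirit of the Best Practices for Scientific Computing paper recommended on slide 34.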

  6. The modern researcher… worries about:
     • Data management and analysis
     • Reproducible research
     • Scalable simulations
     • Integration of models and workflows
     • Collaboration
     Where do they learn how to do this? (Picture of Otto Stern courtesy of Emilio Segrè Visual Archives)

  7. Observation 1: Software is pervasive across research. Corollary: software is both bleeding-edge and long-tail. Demanding users are coming from the arts and humanities, economics and social science, as well as the sciences.

  8. Observation 2: A culture of re-use rather than re-invention is not widespread. Corollary: we have wasted effort and increased siloing.

  9. Observation 3: Many people are “embarrassed” about software. Corollary: something is broken in the way we regard, recognise and reward software.

  10. The Research Cycle. Research is a continuous cycle: when we publish, we are contributing to the body of knowledge. (Diagram: cycle stages Interpret, Test, Revise, Publish and Create; research outputs Paper, Data and Software.)

  11. Research/Reuse/Reward Cycle. Reuse is also a cycle: we build our research on the work of others, and reward mechanisms should encourage reuse. (Diagram labels: Interpret, Index, Test, Revise, Publish, Identify, Create, Reward, Cite.)

  12. The current process. (Flowchart: Start research, Write software, Use software, Produce results, Publish research paper (which mentions software and data), Release data, Release software.) This process is simple, but it does not reward the production or reuse of good software and data. It also has a long contribution cycle.

  13. A better process? (Flowchart: Start research, Identify existing software, Write software, Adapt/extend software, Use software, Produce results, Release software, Release data, Publish software paper, Publish data paper, Publish research paper (which references the software and data papers).) Software and data papers are needed as proxies for rewarding reuse, but this process enables a shorter contribution cycle for data and software.

  14. Boundary
     • What do we choose to identify:
       • The workflow?
       • Software that runs the workflow?
       • Software referenced by the workflow?
       • Software dependencies?
     • What’s the minimum citable part?

  15. Granularity: Function, Algorithm, Program, Library / Suite / Package …

  16. Versioning
     • Why do we version?
       • To indicate a change
       • To allow sharing
       • To confer special status
     (Diagram: personal versions v1, v2, v2a, v3 and v3a alongside public versions v1, v2 and v3.)

  17. Authorship
     • Which authors have had what impact on each version of the software?
     • Who made the largest contribution to the scientific results in a paper?
     • http://beyond-impact.org/?p=175
     (Chart: OGSA-DAI project statistics from Ohloh.)

  18. Observation 4: This is all getting just a little confusing. Corollary: maybe we need to get on to firmer conceptual ground.

  19. The Foundations of Digital Research (diagram labels: Re-usable, Re-producible, Software). Sources on the slide:
     • www.rse.ac.uk
     • www.software.ac.uk/blog/2012-11-09-craftsperson-and-scholar
     • software.ac.uk/blog/2012-08-16-what-research-software-community-and-why-should-you-care
     • www.software.ac.uk/blog/2011-05-02-publish-or-be-damned-alternative-impact-manifesto-research-software
     • www.software.ac.uk/software-evaluation-guide
     • resources/guides
     • Software Carpentry training
     • Prlić A, Procter JB (2012) Ten Simple Rules for the Open Development of Scientific Software. PLoS Comput Biol 8(12): e1002802. doi:10.1371/journal.pcbi.1002802
     • Wilson G, et al. (2014) Best Practices for Scientific Computing. PLoS Biol 12(1): e1001745. doi:10.1371/journal.pbio.1001745

  20. Gap 1: Software Skills Training. Who fills this gap? (Diagram: training arranged from basic to advanced, and from programming-focussed (tools) to research-focussed (methods): Programming 101, Programming 201, HPC Short Courses, MSc in HPC / scientific computing, Advanced HPC Training, Doctoral Training, Summer Schools, Software Carpentry.)

  21. Gap 2: Lack of recognition and reward
     • Is there an anachronism in the way we conduct and recognise research?
     • The REF (Research Excellence Framework) references software as an output, but it is still not easy to get recognition; peer review fails
     • Software careers:
       • Researchers who use software
       • Researcher-Developers
       • Research Software Engineers
       • Research Software Support
       • Research Systems Providers

  22. Gap 3: Software Maturity and Management
     • Not all software should make it to the next stage
     • Management changes through time, requiring planning
     (Diagram: software proliferation plotted against time, across the stages Innovation, Consolidation and Customisation.)

  23. Standing on the shoulders of giants
     • “If I have seen further it is by standing on the shoulders of giants” (Isaac Newton)
     • As researchers, we are honour-bound to share our knowledge so that all may benefit

  24. Observation 5: Most of the issues are not technical; they’re social. Corollary: we can do something to change them.

  25. Careers outside the academic sector
     (Diagram: UK STEM graduate career paths in research, from PhD students to early career research, permanent research staff and professor, alongside non-university research (industry, government, etc.).)
     Source: The Scientific Century, Royal Society, 2010 (revised to reflect first-stage clarification from the “What Do PhD’s Do?” study)

  26. We are science. Hear us roar! (Picture by Tamako the Jaguar)

  27. Shake up the system
     • “Swim or drown” is not an efficient learning method
     • “Publish or perish” is not an effective reward mechanism
     • “Becoming a Professor” is not a scalable career path
     • “I’ll just have to do it myself” is not a modern way of doing science

  28. The Software Sustainability Institute: a national facility for cultivating world-class research through software
     • Better software enables better research
     • Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
     • Providing the expertise and services needed to get past these boundaries to the next stage
     • Developing the policy and tools to support the community developing and using research software
     Supported by EPSRC Grant EP/H043160/1

  29. Campaigning for careers: http://www.rse.ac.uk/

  30. Nurturing a training community
     • Bringing together 39+ organisations with an interest in e-Infrastructure training
     • Raising issues and enablers with RCUK and BIS
     • software.ac.uk/policy

  31. SSI Fellows 2014
     • 2014: 16 fellows
     • 2013: 15 fellows
     • 2012: 10 fellows
     • Range of subjects and career stages
     • software.ac.uk/fellows

  32. The Role of Software in Reproducible Research: 6th Collaborations Workshop (CW14), Oxford, 26-28 March 2014. Organised by the Software Sustainability Institute; sponsored by Microsoft Research and GitHub. #CollabW14, software.ac.uk/cw14

  33. Publicise your software: http://openresearchsoftware.metajnl.com and http://dx.doi.org/10.6084/m9.figshare.942289

  34. What you can do now
     • Read the Best Practices for Scientific Computing: http://dx.doi.org/10.1371/journal.pbio.1001745
     • Release your code and publish it in a journal: http://bit.ly/softwarejournals
     • Learn new software skills and pass them on to others: http://www.software-carpentry.org/
     • Ask for the software and data if you’re reviewing a paper
     • Forge a career in research, and change it for those coming behind you
     The DOI for this presentation: 10.6084/m9.figshare.957257
     The Software Sustainability Institute is a collaboration between the universities of Edinburgh, Manchester, Oxford and Southampton. Supported by EPSRC Grant EP/H043160/1.

More Related