Doing Science in the Digital Age: Software, Skills and Sociology. http://dx.doi.org/10.6084/m9.figshare.957527 TGAC Science Symposia series, 11 March 2014. Neil Chue Hong (@npch), Software Sustainability Institute. ORCID: 0000-0002-8876-7606 | N.ChueHong@software.ac.uk
Four Paradigms of Research Empirical Theoretical Computational Data Exploration
Water Swap Reaction Coordinate A water-swap reaction coordinate for the calculation of absolute protein-ligand binding free energies Woods CJ, Malaisree M, Hannongbua S, Mulholland AJ J. Chem. Phys. (2011) vol. 134, pp. 054114 http://dx.doi.org/10.1063/1.3519057
Pleiotropic loci Selection at pleiotropic loci underlies diseaseco-occurrence in human populations. Navarro, Haley, Karosas et al. Submitted to Nature Genetics
Behind every great piece of science…

#go through each SNP of interest
for(my $x = 0; $x < scalar @pos; $x++) {
    #and then each downstream SNP of interest
    for(my $y = $x+1; $y < scalar @pos; $y++) {
        #if SNPs within our chosen distance (500kb) and both present in the haplotypes file
        if((!($trait[$x] eq $trait[$y])) && (abs($pos[$x] - $pos[$y]) <= 500000) && (exists($legArrayPos{$pos[$x]})) && (exists($legArrayPos{$pos[$y]}))) {
            my $snp1ArrayPos = "";
            my $snp2ArrayPos = "";
            my $snp1All = "";
            my $snp2All = "";
            #create output file for this SNP pair
            my $filename = "ConditionedResults2/$chr[$x].$pos[$x]-$pos[$y].EHH.GBR.2.txt";
            print "$filename\n";
            unless (-e $filename) {
                open(OUT, ">$filename");
                #####################CHANGE THESE IF NOT FOCUSING ON SECOND SNP#########################
                my $start = $pos[$y]-500000;
                if ($start < 1) { $start = 1; }
                my $end = $pos[$y]+500000;
                if ($end > $chrLengths{$chr[$x]}) { $end = $chrLengths{$chr[$x]}; }
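The pair-selection and window logic in the slide's Perl fragment can be pulled out into small functions that are easy to test and reuse. What follows is a minimal sketch in Python rather than Perl, not the authors' code: the names `Snp`, `clamp_window` and `candidate_pairs` are hypothetical, and it assumes that all SNPs passed in lie on the same chromosome, as the original loop appears to.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple

WINDOW = 500_000  # 500 kb, the distance hard-coded on the slide


class Snp(NamedTuple):
    """Hypothetical record standing in for the parallel @chr/@pos/@trait arrays."""
    chrom: str
    pos: int
    trait: str


def clamp_window(centre: int, chrom_length: int,
                 half_width: int = WINDOW) -> Tuple[int, int]:
    """Return (start, end) around centre, clipped to the range 1..chrom_length."""
    start = max(1, centre - half_width)
    end = min(chrom_length, centre + half_width)
    return start, end


def candidate_pairs(snps: List[Snp], known_positions: Iterable[int],
                    max_distance: int = WINDOW) -> Iterator[Tuple[Snp, Snp]]:
    """Yield each SNP paired with every later SNP that has a different trait,
    lies within max_distance, and where both positions are in known_positions
    (the role played by %legArrayPos in the Perl fragment)."""
    known = set(known_positions)
    for i, first in enumerate(snps):
        for second in snps[i + 1:]:
            if (first.trait != second.trait
                    and abs(first.pos - second.pos) <= max_distance
                    and first.pos in known
                    and second.pos in known):
                yield first, second
```

Separating the selection rule from the file handling means the 500 kb threshold becomes a parameter instead of a value to edit by hand, and each piece can be checked on a handful of made-up SNPs before it is run on real data.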
The modern researcher… • … worries about: • Data management and analysis • Reproducible research • Scalable simulations • Integration of models and workflows • Collaboration Where do they learn how to do this? Picture of Otto Stern courtesy of Emilio Segre Visual Archives
Observation 1: Software is pervasive across research. Corollary: software is bleeding edge and long-tail. Demanding users are coming from arts + humanities, economics, and social science as well as sciences.
Observation 2: A culture of re-use rather than re-invention is not widespread. Corollary: we have wasted effort and increased siloing.
Observation 3: Many people are “embarrassed” about software. Corollary: something is broken in the way we regard, recognise and reward software.
The Research Cycle: Research is a continuous cycle. When we publish we are contributing to the body of knowledge. [Diagram labels: Create, Test, Interpret, Revise, Publish; Research Outputs: Data, Paper, Software]
Research/Reuse/Reward Cycle: Reuse is also a cycle. We build our research on the work of others. Reward mechanisms should encourage reuse. [Diagram labels: Research, Reuse; Create, Test, Interpret, Revise, Publish, Identify, Index, Cite, Reward]
The current process. Steps in the diagram: Start research; Write software; Use software; Produce results; Publish research paper, which mentions software and data; Release data; Release software. This process is simple but does not reward production or reuse of good software and data. It also has a long contribution cycle.
A better process? Steps in the diagram: Start research; Write software; Adapt/extend software; Use software; Produce results; Publish research paper, which references software and data papers; Identify existing software; Release software; Release data; Publish software paper; Publish data paper. Software and data papers are needed as proxies for rewarding reuse. But it enables a shorter contribution cycle for data and software.
Boundary • What do we choose to identify: • Workflow? • Software that runs workflow? • Software referenced by workflow? • Software dependencies? • What’s the minimum citable part?
Granularity Function Algorithm Program Library / Suite / Package …
Versioning • Why do we version? • To indicate a change • To allow sharing • To confer special status [Diagram labels: Public v1, Public v2, Public v3; Personal v1, Personal v2, Personal v2a, Personal v3, Personal v3a]
Authorship • Which authors have had what impact on each version of the software? • Who had the largest contribution to the scientific results in a paper? • http://beyond-impact.org/?p=175 OGSA-DAI projects statistics from Ohloh
Observation 4: This is all getting just a little confusing. Corollary: maybe we need to get on to firmer conceptual ground.
The Foundations of Digital Research Re-usable Re-producible www.rse.ac.uk www.software.ac.uk/blog/2012-11-09-craftsperson-and-scholar Software software.ac.uk/blog/2012-08-16-what-research-software-community-and-why-should-you-care Software www.software.ac.uk/blog/2011-05-02-publish-or-be-damned-alternative-impact-manifesto-research-software www.software.ac.uk/software-evaluation-guide resources/guides software-carpentry training Software Prlić A, Procter JB (2012) Ten Simple Rules for the Open Development of Scientific Software. PLoS Comput Biol 8(12): e1002802. doi:10.1371/journal.pcbi.1002802 Wilson G, et al. (2014) Best Practices for Scientific Computing. PLoS Biol 12(1): e1001745. doi:10.1371/journal.pbio.1001745
Gap 1: Software Skills Training Research Focussed (methods) Summer Schools Software Carpentry Who fills this gap? Doctoral Training HPC Short Courses MSc in HPC / scientific computing Advanced HPC Training Programming Focussed (Tools) Programming 101 Programming 201 Basic Advanced
Gap 2: Lack of recognition and reward • There is an anachronism in the way we conduct and recognise research? • REF references software as an output but it is still not easy to get recognition – peer review fails • Software careers • Researchers who use software • Researcher-Developers • Research Software Engineers • Research Software Support • Research Systems Providers
Gap 3: Software Maturity and Management Not all software should make it to the next stage Management changes through time, requiring planning Software proliferation Innovation Consolidation Customisation Time
Standing on the shoulders of giants • “If I have seen further it is by standing on the shoulders of giants” • Isaac Newton • As researchers we are honour-bound to share our knowledge so that all may benefit
Observation 5: Most of the issues are not technical, they’re social. Corollary: we can do something to change them.
Careers outside academic sector Career Paths in UK Non-university Research (industry, government etc.) UK STEM graduate career paths PhD students Early Career Research Permanent Research Staff Professor Source: The Scientific Century, Royal Society, 2010 (revised to reflect first stage clarification from “What Do PhD’s Do?” study)
We are science. Hear us roar! Picture by Tamako the Jaguar
Shake up the system • “Swim or drown” is not an efficient learning method • “Publish or perish” is not an effective reward mechanism • “Becoming a Professor” is not a scalable career path • “I’ll just have to do it myself” is not a modern way of doing science
The Software Sustainability Institute: A national facility for cultivating world-class research through software • Better software enables better research • Software reaches boundaries in its development cycle that prevent improvement, growth and adoption • Providing the expertise and services needed to negotiate to the next stage • Developing the policy and tools to support the community developing and using research software. Better software, better research. Supported by EPSRC Grant EP/H043160/1
Campaigning for careers http://www.rse.ac.uk/
Nurturing a training community • Bringing together 39+ organisations with interest in e-Infrastructure training • Raising issues and enablers with RCUK, BIS software.ac.uk/policy
SSI Fellows 2014 • 2014: 16 fellows • 2013: 15 fellows • 2012: 10 fellows • Range of subjects, career stages software.ac.uk/fellows
The Role of Software in Reproducible Research: 6th Collaborations Workshop, Oxford, 26-28th March 2014. Welcome to the CW14. Organised by the Software Sustainability Institute. Sponsored by Microsoft Research and GitHub. #CollabW14 software.ac.uk/cw14
Publicise your software http://openresearchsoftware.metajnl.com http://dx.doi.org/10.6084/m9.figshare.942289
What you can do now • Read the Best Practices for Scientific Computing • http://dx.doi.org/10.1371/journal.pbio.1001745 • Release your code and publish it in a journal • http://bit.ly/softwarejournals • Learn new software skills and pass them on to others • http://www.software-carpentry.org/ • Ask for software and data if you’re reviewing a paper • Forge a career in research, and change it for those coming behind you • The DOI for this presentation: 10.6084/m9.figshare.957257 • The Software Sustainability Institute is a collaboration between the universities of Edinburgh, Manchester, Oxford and Southampton. Supported by EPSRC Grant EP/H043160/1.