SPRINT

SPRINT: A Simple Parallel R INTerface

Overview
• What is SPRINT?
• How is SPRINT different from other parallel R packages?
• Biological example: post-genomic data analysis
• Code comparison

SPRINT: Simple Parallel R INTerface (www.r-sprint.org)
“SPRINT: A new parallel framework for R”, J. Hill et al., BMC Bioinformatics, Dec 2008.

Issues of existing parallel R packages
• Difficult to program
  • Require the scientist to also be a parallel programmer!
  • Require substantial changes to existing scripts
• Cannot be used to solve some problems
  • No data dependencies allowed

Biological example
• Data: a matrix of expression measurements with genes in rows and samples in columns

Biological example
• Problem: using all or many genes will either crash R or make it very slow (R memory allocation limits, number of computations)
  • Data limitations (correlations)
  • Workload limitations (permutations)
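The memory side of the problem can be estimated with simple arithmetic. The sketch below is a minimal example. It assumes the gene-by-gene correlation matrix is stored as 8-byte doubles and ignores any temporary copies R makes. The gene counts are hypothetical, chosen only for illustration. It also checks the 2^31 - 1 element cap that R versions before 3.0.0 placed on a single vector or matrix.

```r
# Approximate memory (GiB) for an n x n matrix of doubles (8 bytes per value),
# e.g. a gene-by-gene correlation matrix; ignores copies R may make.
corr_matrix_gib <- function(n_genes) n_genes^2 * 8 / 1024^3

# R versions before 3.0.0 capped a vector (and hence a matrix) at 2^31 - 1 elements
exceeds_old_vector_limit <- function(n_genes) n_genes^2 > .Machine$integer.max

# Hypothetical gene counts, for illustration only
n_genes <- c(1000, 10000, 40000, 50000)
round(corr_matrix_gib(n_genes), 2)   # 0.01  0.75 11.92 18.63
exceeds_old_vector_limit(n_genes)    # FALSE FALSE FALSE  TRUE
```

The matrix grows with the square of the number of genes. Going from 10,000 to 40,000 genes therefore multiplies the memory by 16, not 4.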

Workarounds and solution
• Workaround:
  • Remove as many genes as possible before applying the algorithm. This can be an arbitrary process and can remove relevant data.
  • Perform multiple executions and post-process the data. This can become a very painful procedure.
• Solution: parallelisation of R code can be made accessible to bioinformaticians/statisticians. Experts code the solutions once in a library, and then everyone can use them easily.
[Diagram: big post-genomic data → R + SPRINT + HPC → biological results]
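The "multiple executions and post-process" workaround can be sketched as follows. This minimal example uses simulated data, and the block size and `corr_block_*.rds` file names are hypothetical. In practice each block would be computed in a separate R run. Here the blocks run in one loop so that the stitched result can be checked against a single `cor()` call. Because `cor()` correlates columns and the genes are in rows, the data are transposed with `t()`.

```r
# Sketch of the "multiple executions + post-processing" workaround:
# the gene-by-gene correlation matrix is computed one block of rows at a time,
# each block is saved to disk, and the blocks are stitched together afterwards.
# Simulated data; block size and file names are hypothetical.
set.seed(1)
edata <- matrix(rnorm(200 * 10), nrow = 200)  # 200 genes (rows) x 10 samples
block_size <- 50
starts <- seq(1, nrow(edata), by = block_size)
out_dir <- tempdir()

# "Execution" k: correlations between the genes in block k and all genes.
# cor() correlates columns, so genes are moved to columns with t().
for (k in seq_along(starts)) {
  rows <- starts[k]:min(starts[k] + block_size - 1, nrow(edata))
  block <- cor(t(edata[rows, , drop = FALSE]), t(edata))
  saveRDS(block, file.path(out_dir, sprintf("corr_block_%d.rds", k)))
}

# Post-processing: read the blocks back and stack them into the full matrix.
blocks <- lapply(seq_along(starts), function(k)
  readRDS(file.path(out_dir, sprintf("corr_block_%d.rds", k))))
pearsonpairwise <- do.call(rbind, blocks)
```

Every block size, file name, and reassembly step is something the user has to manage by hand. This bookkeeping is what makes the workaround painful at scale.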

Benchmarks (256 processes)
[Benchmark charts: data limitations (correlations); workload limitations (permutations)]

Correlation code comparison
Serial R:
  edata <- read.table("largedata.dat")
  pearsonpairwise <- cor(edata)
  write.table(pearsonpairwise, "Correlations.txt")
  quit(save="no")
SPRINT:
  library("sprint")
  edata <- read.table("largedata.dat")
  ff_handle <- pcor(edata)   # parallel Pearson correlation; result referenced through an ff handle
  pterminate()               # shut down SPRINT before exiting
  quit(save="no")
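To show what dividing the correlation work among processes involves, the sketch below uses base R's `parallel` package (`mclapply()`). It is not SPRINT's own MPI-based implementation. `parallel_gene_cor()` is a hypothetical helper, and the data are simulated. Each worker correlates its own chunk of genes against all genes, and the row blocks are stacked in order afterwards. Forking is unavailable on Windows, so the sketch falls back to one core there.

```r
# Sketch: split the gene-by-gene correlation by row blocks and compute the
# blocks concurrently with base R's 'parallel' package. This is NOT how
# SPRINT's pcor() is implemented; it only illustrates dividing the work.
library(parallel)

parallel_gene_cor <- function(edata, n_workers = 2L) {
  x <- t(as.matrix(edata))  # genes become columns, since cor() correlates columns
  # Assign each gene (column of x) to one of n_workers contiguous chunks
  chunks <- split(seq_len(ncol(x)),
                  cut(seq_len(ncol(x)), n_workers, labels = FALSE))
  # mclapply() forks processes; forking is not available on Windows
  cores <- if (.Platform$OS.type == "windows") 1L else n_workers
  blocks <- mclapply(chunks, function(cols) cor(x[, cols, drop = FALSE], x),
                     mc.cores = cores)
  do.call(rbind, unname(blocks))  # chunks are in gene order, so rows line up
}

set.seed(2)
edata <- matrix(rnorm(300 * 8), nrow = 300)  # simulated: 300 genes x 8 samples
pearsonpairwise <- parallel_gene_cor(edata, n_workers = 2L)
```

On the slide above, the SPRINT script differs from the serial one by only a few lines. Splitting and reassembly like this are handled inside `pcor()` rather than in the user's script.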

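Permutation testing code comparison
Serial R (multtest):
  library("multtest")        # provides mt.maxT() and the golub data set
  data(golub)
  smallgd <- golub[1:100,]   # first 100 genes
  classlabel <- golub.cl
  resT <- mt.maxT(smallgd, classlabel, test="t", side="abs")
  quit(save="no")
SPRINT:
  library("sprint")
  library("multtest")        # still needed for the golub data set
  data(golub)
  smallgd <- golub[1:100,]
  classlabel <- golub.cl
  resT <- pmaxT(smallgd, classlabel, test="t", side="abs")   # parallel counterpart of mt.maxT()
  pterminate()               # shut down SPRINT before exiting
  quit(save="no")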
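The workload limitation comes from recomputing every test statistic for each permutation. The sketch below is a minimal single-step maxT adjustment in base R on simulated data. `welch_t()` and `maxT_single_step()` are hypothetical helpers. The statistic is the Welch two-sample t, which is what `test="t"` selects in multtest. However, `mt.maxT()` implements the step-down variant, so its adjusted p-values will not match this sketch exactly. Class labels are 0/1, as in `golub.cl`.

```r
# Minimal single-step maxT permutation adjustment (Westfall & Young style).
# multtest's mt.maxT() uses the step-down variant; this simpler version only
# shows where the cost comes from: B permutations x all genes.
welch_t <- function(x, cl) {
  # Welch two-sample t-statistic for each row (gene); cl is a 0/1 label vector
  a <- x[, cl == 0, drop = FALSE]
  b <- x[, cl == 1, drop = FALSE]
  (rowMeans(a) - rowMeans(b)) /
    sqrt(apply(a, 1, var) / ncol(a) + apply(b, 1, var) / ncol(b))
}

maxT_single_step <- function(x, cl, B = 1000) {
  t_obs <- abs(welch_t(x, cl))
  # Each permutation is independent of the others: shuffle labels, keep the largest |t|
  max_perm <- replicate(B, max(abs(welch_t(x, sample(cl)))))
  # Adjusted p-value: share of permutations whose largest |t| reaches the observed |t|
  vapply(t_obs, function(t) mean(max_perm >= t), numeric(1))
}

# Simulated data: 100 genes x 20 samples, two classes of 10; gene 1 truly differs
set.seed(3)
x <- matrix(rnorm(100 * 20), nrow = 100)
cl <- rep(c(0, 1), each = 10)
x[1, cl == 1] <- x[1, cl == 1] + 5
p_adj <- maxT_single_step(x, cl, B = 500)
```

Because the B permutations do not depend on each other, they can be shared among processes. This independence makes permutation testing a natural fit for parallel execution.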

SPRINT
• Website: http://www.r-sprint.org/
• Source code can be downloaded from the website
• Soon also in the CRAN repository
• Mailing list: sprint@lists.ed.ac.uk
• Contact email: sprint@ed.ac.uk

Acknowledgements
DPM Team: Peter Ghazal, Thorsten Forster, Muriel Mewissen
EPCC Team:
• Terry Sloan
• Michal Piotrowski
• Savvas Petrou
• Bartek Dobrzelecki
• Jon Hill
• Florian Scharinger
Numerical Algorithms Group
This work is supported by the Wellcome Trust and the NAG dCSE Support service.
