
SPRINT

SPRINT: a Simple Parallel R INTerface (www.r-sprint.org). Overview: what SPRINT is; how SPRINT differs from other parallel R packages; a biological example (post-genomic data analysis); code comparison.



  1. SPRINT: A Simple Parallel R INTerface

  2. Overview
     • What is SPRINT?
     • How is SPRINT different from other parallel R packages?
     • Biological example: post-genomic data analysis
     • Code comparison

  3. SPRINT: Simple Parallel R INTerface (www.r-sprint.org)
     “SPRINT: A new parallel framework for R”, J Hill et al., BMC Bioinformatics, Dec 2008.

  4. Issues of existing parallel R packages
     • Difficult to program
       - Require the scientist to also be a parallel programmer!
       - Require substantial changes to existing scripts
     • Can’t be used to solve some problems
       - No data dependencies allowed

  5. Biological example
     • Data: a matrix of expression measurements with genes in rows and samples in columns

  6. Biological example
     • Problem: using all or many genes will either crash or be very slow (R memory allocation limits, number of computations)
       - Data limitations (correlations)
       - Workload limitations (permutations)
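The data limitation on the slide above can be made concrete with a little arithmetic. The sketch below is not from the presentation: it assumes a dense gene-by-gene correlation matrix stored as 8-byte doubles, and the gene counts and the helper name `cor_matrix_gib` are hypothetical, chosen only for illustration.

```r
# Size in GiB of a dense gene-by-gene correlation matrix of doubles.
# Assumption: one 8-byte double per entry, no sparsity or symmetry savings.
cor_matrix_gib <- function(n_genes) {
  bytes <- as.numeric(n_genes)^2 * 8
  bytes / 2^30
}

# Hypothetical gene counts, for illustration only.
n_genes <- c(1000, 10000, 50000)
print(data.frame(genes = n_genes, GiB = round(cor_matrix_gib(n_genes), 2)))
```

Because the size grows with the square of the number of genes, a tenfold increase in genes needs a hundredfold increase in memory for the result alone.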

  7. Workarounds and solution
     • Workarounds:
       - Remove as many genes as possible before applying the algorithm. This can be an arbitrary process and may remove relevant data.
       - Perform multiple executions and post-process the data. This can become a very painful procedure.
     • Solution: parallelisation of R code can be made accessible to bioinformaticians and statisticians. A library of solutions is coded once by experts and is then easy for everyone to use at the end point.
     • Diagram labels: Big Post Genomic Data; SPRINT; HPC; R; Biological Results

  8. Benchmarks (256 processes)
     • Data limitations (correlations)
     • Workload limitations (permutations)

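  9. Correlation code comparison

     Standard R:

         edata <- read.table("largedata.dat")              # load the expression data
         pearsonpairwise <- cor(edata)                     # serial Pearson correlation
         write.table(pearsonpairwise, "Correlations.txt")
         quit(save="no")

     With SPRINT:

         library("sprint")
         edata <- read.table("largedata.dat")              # load the expression data
         ff_handle <- pcor(edata)                          # parallel replacement for cor()
         pterminate()                                      # shut down SPRINT's parallel processes
         quit(save="no")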
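A correlation matrix lends itself to this kind of drop-in replacement because it can be computed block by block, with each block independent of the others. The base-R sketch below illustrates that idea only; it is not SPRINT's implementation, and the toy matrix size and block size are assumptions made for the example.

```r
set.seed(1)
# Toy expression matrix: 200 genes (rows) by 12 samples (columns).
expr <- matrix(rnorm(200 * 12), nrow = 200)

# Split the gene indices into blocks of 50; each block is an independent task.
gene_idx <- seq_len(nrow(expr))
blocks <- unname(split(gene_idx, ceiling(gene_idx / 50)))

# cor() correlates columns, so genes are moved into columns with t().
# Each call yields the rows of the full matrix belonging to one block.
block_cor <- lapply(blocks, function(idx) {
  cor(t(expr[idx, , drop = FALSE]), t(expr))
})

# Stacking the blocks reproduces the full gene-by-gene matrix.
full <- do.call(rbind, block_cor)
stopifnot(isTRUE(all.equal(full, cor(t(expr)), check.attributes = FALSE)))
```

Since no block needs the result of another, the `lapply()` call marks the step that a parallel framework could hand out to separate processes.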

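  10. Permutation testing code comparison

      Standard R:

          library("multtest")                  # provides mt.maxT() and the golub data
          data(golub)
          smallgd <- golub[1:100,]             # first 100 genes
          classlabel <- golub.cl
          resT <- mt.maxT(smallgd, classlabel, test="t", side="abs")
          quit(save="no")

      With SPRINT:

          library("multtest")                  # provides the golub data
          library("sprint")
          data(golub)
          smallgd <- golub[1:100,]             # first 100 genes
          classlabel <- golub.cl
          resT <- pmaxT(smallgd, classlabel, test="t", side="abs")
          pterminate()                         # shut down SPRINT's parallel processes
          quit(save="no")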
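The workload in permutation testing comes from recomputing every gene's test statistic once per permutation. The base-R sketch below shows a simplified single-step max-statistic adjustment to make that cost visible. It is an illustration under stated assumptions, not the step-down algorithm of `mt.maxT` or SPRINT's `pmaxT`; the toy data, the helper `abs_t`, and the permutation count are all hypothetical.

```r
set.seed(42)
# Toy data: 50 genes (rows) by 10 samples (columns), two classes of 5.
expr <- matrix(rnorm(50 * 10), nrow = 50)
labels <- rep(c(0, 1), each = 5)
expr[1, labels == 1] <- expr[1, labels == 1] + 5  # one gene with a real shift

# Absolute Welch t-statistic for every gene under a given labelling.
abs_t <- function(x, cl) {
  a <- x[, cl == 0, drop = FALSE]
  b <- x[, cl == 1, drop = FALSE]
  se <- sqrt(apply(a, 1, var) / ncol(a) + apply(b, 1, var) / ncol(b))
  abs((rowMeans(b) - rowMeans(a)) / se)
}

observed <- abs_t(expr, labels)

# Each permutation is independent of the others, so this loop is the part
# that a parallel implementation can divide among processes.
n_perm <- 500
max_null <- vapply(seq_len(n_perm), function(i) {
  max(abs_t(expr, sample(labels)))
}, numeric(1))

# Single-step adjusted p-value: share of permutations whose maximum
# statistic is at least as large as the gene's observed statistic.
adj_p <- vapply(observed, function(t_obs) mean(max_null >= t_obs), numeric(1))
print(head(sort(adj_p), 3))
```

The cost scales with genes times permutations, which is why the slide lists permutations as a workload limitation rather than a memory one.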

  11. SPRINT
      • Website: http://www.r-sprint.org/
      • Source code can be downloaded from the website
      • Soon also in the CRAN repository
      • Mailing list: sprint@lists.ed.ac.uk
      • Contact email: sprint@ed.ac.uk

  12. Acknowledgements
      • DPM Team: Peter Ghazal, Thorsten Forster, Muriel Mewissen
      • EPCC Team: Terry Sloan, Michal Piotrowski, Savvas Petrou, Bartek Dobrzelecki, Jon Hill, Florian Scharinger
      • Numerical Algorithms Group
      This work is supported by the Wellcome Trust and the NAG dCSE Support service.
