High-Performance Computing for NGS

hume

Presentation Transcript


  1. High-Performance Computing for NGS

  2. NGS machines
     • Current
       • PGFI: 2 HiSeq, 1 SOLiD 4, 1 SOLiD 5500
       • Functional Genomics: 1 HiSeq, 1 GAIIx
       • Microarray core: 1 GAIIx
       • Pierce lab: 1 GAIIx
       • Sequencing core: 1 454
       • CHOP: 8 SOLiD 5500, 8 SOLiD 4
       • Total: ~6 HiSeq equivalents
     • Future
       • 4 HiSeqs ordered or budgeted

  3. Throughput and Compute Need
     • HiSeq max throughput: 55 Gb/day, ~500 million reads = 5 RNAseq samples per day (at 100 million reads each)
     • Compute time:
       • Bowtie mapping of 100 million 100 bp reads: 20 CPU hours
       • RUM mapping: 1800 CPU hours
       • 30x genome mapping and Sniper SNP calling: 1200 CPU hours
     • Storage:
       • 250-500 GB of intermediate files per sample (e.g., retained for 120 days)
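
The throughput and compute figures above can be tied together with a little arithmetic. The following Python sketch is illustrative only: it assumes every sample is a 100-million-read, 100 bp RNAseq run (so 500 million reads/day is 50 Gb, in line with the stated ~55 Gb/day maximum), and all constant and function names are hypothetical.

```python
# Hypothetical sketch relating HiSeq throughput to daily compute demand.
# Assumption: every sample is a 100-million-read, 100 bp RNAseq run.

READS_PER_DAY = 500_000_000      # ~500 million reads/day (slide 3)
READ_LENGTH_BP = 100             # 100 bp reads
READS_PER_SAMPLE = 100_000_000   # assumed reads per RNAseq sample

# CPU hours to map one 100M x 100 bp sample (slide 3)
CPU_HOURS_PER_SAMPLE = {"bowtie": 20, "rum": 1800}


def samples_per_day(reads_per_day: int, reads_per_sample: int) -> float:
    """RNAseq samples one machine produces per day."""
    return reads_per_day / reads_per_sample


def gigabases_per_day(reads_per_day: int, read_length_bp: int) -> float:
    """Sequence output in gigabases (Gb) per day."""
    return reads_per_day * read_length_bp / 1e9


def cpu_hours_per_day(n_samples: float, hours_per_sample: float) -> float:
    """CPU hours of mapping needed to keep pace with one day of sequencing."""
    return n_samples * hours_per_sample


if __name__ == "__main__":
    n = samples_per_day(READS_PER_DAY, READS_PER_SAMPLE)
    print(f"samples/day: {n:.0f}")
    print(f"Gb/day: {gigabases_per_day(READS_PER_DAY, READ_LENGTH_BP):.0f}")
    for mapper, hours in CPU_HOURS_PER_SAMPLE.items():
        per_day = cpu_hours_per_day(n, hours)
        # Dividing by 24 gives the number of cores busy around the clock.
        print(f"{mapper}: {per_day:.0f} CPU h/day, ~{per_day / 24:.0f} cores")
```

Under these assumptions, one HiSeq at full throughput generates about 9,000 CPU hours of RUM mapping per day, roughly 375 cores kept busy around the clock, versus about 4 cores for Bowtie-level mapping.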

  4. Cost and Capacity Analysis
     • Assume RUM-type compute demand (1800 CPU hours and 250-500 GB of storage per sample)
     • RNAseq compute varies from 20 to 1800 CPU hours
     • But more computing yields better results (e.g., alternative splicing, ncRNA, etc.)
     • Mapping is just the first step; follow-up analyses (SNPs, GWAS, ChIP-seq peaks, etc.) will require more computing
     • Thus, RUM-level demand is a reasonable bound

  5. Assume RUM runs on a 100M-read/100 bp RNAseq sample
     • Compute cost (no storage):
       • PGFI current cost ($0.08/hr): $144
       • PGFI potential cost ($0.06/hr): $115
       • AWS On-Demand cost (~$0.25/hr + 500 GB transfer): $489
       • AWS contract cost ($0.075/hr + 500 GB transfer): $192
     • Storage cost (120 days):
       • PGFI current cost ($0.10/GB/month): $200
       • PGFI potential cost ($0.08/GB/month): $160
       • AWS: too expensive; data has to be brought back
     • Total cost (compute + storage) per 100M/100 bp RNAseq RUM run:
       • PGFI: $275-$344
       • AWS: $392
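
A minimal Python sketch of the per-sample cost arithmetic, assuming 500 GB of intermediate files (the upper end of the range) held for 4 months (120 days) and leaving out the AWS data-transfer charge, for which no rate is given. The helper names are hypothetical. With these assumptions the formulas reproduce the $144, $200, $160 and $344 figures; at $0.06/hr the compute cost works out to $108 rather than the listed $115, so that entry may rest on a slightly different rate or on rounding.

```python
# Hypothetical sketch of the per-sample cost arithmetic for one
# 100M-read / 100 bp RNAseq RUM run, using the slide's rates.

CPU_HOURS = 1800        # RUM mapping, per sample (slide 3)
STORAGE_GB = 500        # assumption: upper end of 250-500 GB intermediate files
RETENTION_MONTHS = 4    # assumption: 120-day retention billed as ~4 months


def compute_cost(rate_per_cpu_hour: float) -> float:
    """CPU-hour charges only; any data-transfer fee is excluded."""
    return CPU_HOURS * rate_per_cpu_hour


def storage_cost(rate_per_gb_month: float) -> float:
    """Cost of holding the intermediate files for the retention period."""
    return STORAGE_GB * rate_per_gb_month * RETENTION_MONTHS


if __name__ == "__main__":
    rates = {
        "PGFI current": (0.08, 0.10),   # ($/CPU hour, $/GB/month)
        "PGFI potential": (0.06, 0.08),
    }
    for site, (cpu_rate, gb_rate) in rates.items():
        c, s = compute_cost(cpu_rate), storage_cost(gb_rate)
        print(f"{site}: compute ${c:.0f} + storage ${s:.0f} = ${c + s:.0f}")

    # AWS hourly charges before the 500 GB transfer fee
    print(f"AWS on-demand CPU: ${compute_cost(0.25):.0f}")
    print(f"AWS contract CPU:  ${compute_cost(0.075):.0f}")
```

Run as a script, this prints $144 + $200 = $344 and $108 + $160 = $268 for the two PGFI rate sets, plus $450 and $135 of AWS CPU charges before transfer. The AWS total of $392 equals the $192 contract figure plus $200 of storage at the current PGFI rate, which is consistent with bringing the data back for storage.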

  6. Total Compute Demand
     • Assumptions:
       • 10 HiSeq-equivalent machines
       • 200 days of uptime per machine per year
       • 4 RNAseq-type data sets per day per machine
       • 120-day data retention policy
     • Estimated computing need:
       • 14.4 million CPU hours/yr = ~2000 compute cores at 6 GB memory per core
       • 1.3 petabytes/yr of storage
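
The annual estimate can be reproduced from the listed assumptions together with the per-dataset RUM figures (1,800 CPU hours and, as an assumption, 500 GB per dataset). In this Python sketch all names are hypothetical. It suggests that ~2000 cores includes headroom over the ~1,644 cores needed at 100% utilization, and that 1.3 PB corresponds to the data held at any one time under 120-day retention (about 4 PB written per year × 120/365), which is one plausible reading of the "/yr" figure.

```python
# Hypothetical sketch reproducing the slide-6 annual estimate.
# Per-dataset figures follow the RUM bound: 1,800 CPU hours and
# (assumption) 500 GB of intermediate files per dataset.

MACHINES = 10                # HiSeq-equivalent machines
UPTIME_DAYS = 200            # days of uptime per machine per year
DATASETS_PER_DAY = 4         # RNAseq-type data sets per machine per day
RETENTION_DAYS = 120         # data retention policy
CPU_HOURS_PER_DATASET = 1800
GB_PER_DATASET = 500
HOURS_PER_YEAR = 24 * 365


def datasets_per_year() -> int:
    return MACHINES * UPTIME_DAYS * DATASETS_PER_DAY


def cpu_hours_per_year() -> int:
    return datasets_per_year() * CPU_HOURS_PER_DATASET


def cores_at_full_utilization() -> float:
    """Cores needed if every core runs 24x365 with no idle time."""
    return cpu_hours_per_year() / HOURS_PER_YEAR


def standing_storage_pb() -> float:
    """Data held at any one time under the retention policy (1 PB = 1e6 GB)."""
    written_gb_per_year = datasets_per_year() * GB_PER_DATASET
    return written_gb_per_year * RETENTION_DAYS / 365 / 1e6


if __name__ == "__main__":
    print(f"datasets/yr: {datasets_per_year():,}")
    print(f"CPU hours/yr: {cpu_hours_per_year():,}")
    print(f"cores at 100% utilization: {cores_at_full_utilization():.0f}")
    print(f"standing storage: {standing_storage_pb():.2f} PB")
```

This prints 8,000 datasets, 14,400,000 CPU hours, about 1644 cores at full utilization, and about 1.32 PB of standing storage.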

  7. Current PGFI Cluster
     • Compute: 400 cores + 150 new cores = 550 cores
     • Storage: 85 TB (old) + 390 TB (recent) + 300 TB (ordered); 690 TB excluding the old 85 TB, which needs to be retired
     • Expansion problems:
       • Data center space is completely saturated
       • Need HIPAA/FISMA-compliant space
       • Need ~$2.5 million to meet demand
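
Setting the cluster's capacity against the demand estimates on slide 6 shows the size of the gap. A small Python sketch with hypothetical names, treating the 85 TB of old storage as retired and 1.3 PB as 1,300 TB; it does not attempt to reproduce the ~$2.5 million figure, since the slides give no unit prices for cores or storage.

```python
# Hypothetical sketch of the gap between current PGFI capacity (slide 7)
# and the estimated demand (slide 6).

CURRENT_CORES = 400 + 150        # existing + new cores
DEMAND_CORES = 2000              # ~2000 cores estimated on slide 6
STORAGE_TB = {"old": 85, "recent": 390, "ordered": 300}
DEMAND_TB = 1300                 # 1.3 PB, taking 1 PB = 1000 TB


def usable_storage_tb(storage: dict[str, int], retiring: set[str]) -> int:
    """Total storage excluding tiers scheduled for retirement."""
    return sum(tb for tier, tb in storage.items() if tier not in retiring)


if __name__ == "__main__":
    usable = usable_storage_tb(STORAGE_TB, retiring={"old"})
    print(f"usable storage: {usable} TB")
    print(f"core shortfall: {DEMAND_CORES - CURRENT_CORES} cores")
    print(f"storage shortfall: {DEMAND_TB - usable} TB")
```

With these inputs the sketch reports 690 TB of usable storage, a shortfall of 1450 cores, and a shortfall of 610 TB.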

  8. JIG organization chart [chart labels: JIG oversight; Collaborations; Garret FitzGerald, John Hogenesch, Junhyong Kim]

  9. Take-Home Message: Please budget for computing. For NGS applications, computing costs are approaching the costs of supplies.