High-Performance Computing for NGS
NGS machines
• Current
  • PGFI: 2 HiSeq, 1 SOLiD 4, 1 SOLiD 5500
  • Functional Genomics: 1 HiSeq, 1 GAIIx
  • Microarray core: 1 GAIIx
  • Pierce lab: 1 GAIIx
  • Sequencing core: 1 454
  • CHOP: 8 SOLiD 5500, 8 SOLiD 4
  • Total: ~6 HiSeq equivalents
• Future
  • 4 HiSeqs ordered or budgeted
Throughput and Compute Need
• HiSeq maximum throughput: 55 Gb/day, ~500 million reads, i.e., ~5 RNA-seq samples (100 million reads each) per day
• Compute time:
  • Bowtie mapping of 100 million 100 bp reads: 20 CPU hrs
  • RUM mapping: 1800 CPU hrs
  • 30x genome mapping and Sniper SNP calling: 1200 CPU hrs
• Storage:
  • 250 to 500 GB of intermediate files per sample, kept for e.g. 120 days
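The throughput bullets reduce to simple arithmetic. The sketch below assumes 100 bp reads and 100 million reads per RNA-seq sample (the sample size used in the cost slide); the function names are hypothetical, chosen for illustration only.

```python
# Back-of-envelope NGS throughput arithmetic (a sketch; 55 Gb/day and the CPU-hour
# figures come from the slide, read length and reads per sample are assumptions).

def reads_per_day(gigabases_per_day: float, read_length_bp: int) -> float:
    """Number of reads a sequencer yields per day at a given read length."""
    return gigabases_per_day * 1e9 / read_length_bp


def samples_per_day(gigabases_per_day: float, read_length_bp: int,
                    reads_per_sample: float) -> float:
    """How many RNA-seq samples of a given depth one sequencer produces per day."""
    return reads_per_day(gigabases_per_day, read_length_bp) / reads_per_sample


def cores_to_keep_up(samples: float, cpu_hours_per_sample: float) -> float:
    """Cores needed, running 24 h/day, to process one day's output within a day."""
    return samples * cpu_hours_per_sample / 24


if __name__ == "__main__":
    n = samples_per_day(55, 100, 100e6)            # 5.5 samples/day
    print(f"reads/day:   {reads_per_day(55, 100):,.0f}")
    print(f"samples/day: {n:.1f}")
    for tool, hours in [("Bowtie", 20), ("RUM", 1800)]:
        print(f"{tool}: {cores_to_keep_up(n, hours):.1f} cores per HiSeq")
```

At the RUM rate, one HiSeq at full capacity would keep roughly 400 cores busy around the clock, versus fewer than 5 cores for Bowtie.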
Cost and Capacity Analysis
• Assume RUM-type compute demand (1800 CPU hrs and 250 to 500 GB of storage per sample)
  • RNA-seq compute varies from 20 to 1800 CPU hrs per sample
  • But more computing gives better results (e.g., alternative splicing, ncRNA)
  • Mapping is just the first step; downstream analyses (SNPs, GWAS, ChIP-seq peaks, etc.) will require more computing
  • Thus, RUM-type demand is a reasonable bound
Assume RUM runs for 100 M/100 bp RNA-seq
• Compute cost (no storage):
  • PGFI current cost ($0.08/hr): $144
  • PGFI potential cost ($0.06/hr): $115
  • AWS on-demand cost (~$0.25/hr + 500 GB transfer): $489
  • AWS contract cost ($0.075/hr + 500 GB transfer): $192
• Storage cost (120 days):
  • PGFI current cost ($0.10/GB/month): $200
  • PGFI potential cost ($0.08/GB/month): $160
  • AWS: too expensive; data have to be brought back
• Total cost (compute + storage) per 100 M/100 bp RNA-seq RUM run
  • PGFI: $275 to $344
  • AWS: $392 (contract compute plus storage back at PGFI)
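The per-run figures follow from rate × usage. A minimal sketch, assuming 1800 CPU hours per RUM run, 500 GB held for four billing months (≈120 days), and a separately supplied transfer charge, since the slide does not itemize the AWS transfer cost; the helper names are hypothetical.

```python
# Per-sample cost model for one RUM run (sketch; rates are the slide's, the
# 500 GB / 4-month storage footprint is an assumption matching "120 days").

RUM_CPU_HOURS = 1800
SAMPLE_STORAGE_GB = 500
RETENTION_MONTHS = 4          # 120 days treated as 4 billing months


def compute_cost(rate_per_cpu_hour: float, cpu_hours: float = RUM_CPU_HOURS,
                 transfer_cost: float = 0.0) -> float:
    """CPU charge plus any flat data-transfer charge (e.g., for a cloud run)."""
    return rate_per_cpu_hour * cpu_hours + transfer_cost


def storage_cost(rate_per_gb_month: float, gb: float = SAMPLE_STORAGE_GB,
                 months: float = RETENTION_MONTHS) -> float:
    """Cost of keeping one sample's intermediate files for the retention period."""
    return rate_per_gb_month * gb * months


def total_cost(compute: float, storage: float) -> float:
    """Per-run total: compute charge plus storage charge."""
    return compute + storage


if __name__ == "__main__":
    pgfi_compute = compute_cost(0.08)     # PGFI current CPU rate
    pgfi_storage = storage_cost(0.10)     # PGFI current storage rate
    print(f"compute ${pgfi_compute:.0f}, storage ${pgfi_storage:.0f}, "
          f"total ${total_cost(pgfi_compute, pgfi_storage):.0f}")
```

With the PGFI current rates this reproduces the $144 compute, $200 storage, and $344 total. A cloud scenario would be modeled by passing the provider's hourly rate and a `transfer_cost`; the exact transfer charge inside the AWS figures is not stated, so it is left as an input.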
Total Compute Demand
• Assumptions
  • 10 HiSeq-equivalent machines
  • 200 days uptime per machine
  • 4 RNA-seq-type data sets per day per machine
  • 120-day data retention policy
• Estimated computing need
  • 14.4 million CPU hours/yr = ~2000 compute cores with 6 GB memory per core
  • 1.3 petabytes/yr
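Both estimates can be reproduced from the listed assumptions. The sketch assumes RUM-level demand (1800 CPU hours and the upper 500 GB of storage per data set), decimal units (1 PB = 1000 TB), and reads the 1.3 PB figure as the storage footprint held under the 120-day retention policy; that interpretation and all function names are assumptions.

```python
# Annual demand from the slide's assumptions (sketch; see lead-in for assumptions).

HOURS_PER_YEAR = 365 * 24


def annual_cpu_hours(machines: int, uptime_days: int, datasets_per_day: int,
                     cpu_hours_per_dataset: float) -> float:
    """Total CPU hours per year if every data set gets the same processing."""
    return machines * uptime_days * datasets_per_day * cpu_hours_per_dataset


def cores_at_full_utilization(cpu_hours_per_year: float) -> float:
    """Cores needed if every core runs 24/7 for the whole year."""
    return cpu_hours_per_year / HOURS_PER_YEAR


def retained_storage_tb(machines: int, uptime_days: int, datasets_per_day: int,
                        gb_per_dataset: float, retention_days: int) -> float:
    """Steady-state storage (TB) when each data set is kept for retention_days."""
    # Average data sets produced per calendar day, spreading uptime over the year.
    per_calendar_day = machines * datasets_per_day * uptime_days / 365
    return per_calendar_day * retention_days * gb_per_dataset / 1000


if __name__ == "__main__":
    hours = annual_cpu_hours(10, 200, 4, 1800)
    print(f"{hours:,.0f} CPU hours/yr")
    print(f"{cores_at_full_utilization(hours):,.0f} cores at 100% utilization")
    print(f"{retained_storage_tb(10, 200, 4, 500, 120):,.0f} TB retained")
```

This gives 14.4 million CPU hours, about 1,640 cores at full utilization (the ~2000-core figure presumably leaves headroom), and about 1,315 TB, consistent with the 1.3 PB estimate.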
Current PGFI Cluster
• Compute: 400 cores + 150 new cores = 550 cores
• Storage: 85 TB (old) + 390 TB (recent) + 300 TB (ordered); 690 TB once the old storage is retired
• Expansion problems:
  • Data center space is completely saturated
  • Need HIPAA/FISMA-compliant space
  • Need ~$2.5 million to meet demand
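Comparing this inventory with the demand estimate shows the size of the shortfall. A minimal sketch, assuming the ~2000-core and 1.3 PB targets from the demand estimate and counting only the 390 TB and 300 TB tiers, since the old 85 TB needs to be retired; `shortfall` is a hypothetical helper.

```python
# Capacity shortfall of the current cluster versus estimated demand (sketch).

def shortfall(have: float, need: float) -> float:
    """Additional capacity required; zero if current capacity already suffices."""
    return max(need - have, 0.0)


current_cores = 400 + 150            # existing + new cores
current_storage_tb = 390 + 300       # recent + ordered; old 85 TB retired
needed_cores = 2000
needed_storage_tb = 1300             # 1.3 PB in decimal units

core_gap = shortfall(current_cores, needed_cores)
storage_gap = shortfall(current_storage_tb, needed_storage_tb)
print(f"short {core_gap:.0f} cores and {storage_gap:.0f} TB")
```

The result, roughly 1,450 more cores and 610 TB more storage, is the gap the ~$2.5 million estimate is meant to cover; the dollar figure itself depends on hardware and facility pricing not given here.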
[Figure: JIG organization chart. Labels: JIG oversight; Collaborations; Garret FitzGerald; John Hogenesch; Junhyong Kim]
Take-home message: Please budget for computing. For NGS applications, computing costs are approaching the cost of supplies.