Next Generation DNA Sequencing Platforms: Evolving Tools for Cancer Research

Norma Neff Bioengineering / Quake Lab Sequencing Core Director Stem Cell Institute SIM1 G1115 / G0821 nfneff@stanford.edu

Sequencing By Synthesis
• Emulsion PCR-based Sequencing Technologies
• Sequencing Technologies
• Single Molecule Sequencing Technologies

Recommended Reviews:
• Michael Metzker (2010) Nature Reviews Genetics 11:31
• Quail et al. (2012) BMC Genomics 13:341 (Jul 24)

Outline of Today’s Presentation: Sequencing by Synthesis
• Next Gen Sequencing Sample or Library Preps
• Review of Seq Technologies
• Comparisons of Different Platforms
• Summary and Final Thoughts

Design of Sequencing Samples or Libraries
• Adapters (A1, A2) are ligated to the sample DNA to be sequenced = Library
• Adapters are short (30-50bp) double-stranded oligos
• Sequences of the adapters are specific to each seq platform
• A1, A2: sites for PCR primers to bind to amplify the Library
• A1, A2: sites for seq primers to bind to seq the sample DNA
• BC1, BC2: bar codes (6-12bp) for multiplexing libraries in a seq run
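The bar-code multiplexing idea on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the bar code sits at the very start of each read and must match exactly. The bar-code sequences, sample names, and the function name demultiplex are hypothetical, and real platforms may read bar codes differently (for example in a separate index read).

```python
from collections import defaultdict

# Hypothetical 6 bp bar codes mapped to sample names (illustration only).
BARCODES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by an exact-match bar code at the start of each read.

    Returns a dict mapping sample name to a list of reads with the bar code
    trimmed off. Reads with no matching bar code are kept untrimmed under
    "unassigned". Assumes all bar codes have the same length.
    """
    barcode_len = len(next(iter(barcodes)))
    by_sample = defaultdict(list)
    for read in reads:
        prefix, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(prefix)
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample)
```

Calling demultiplex(["ACGTACGGGTTT", "TGCATGAAAC"]) would place GGGTTT under sample_1 and AAAC under sample_2, which is how several libraries can share one seq run.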

Sequencing by Synthesis: Bases are added to DNA Molecules at the 3’ OH end of the Chain

Emulsion PCR – Library DNA is amplified in an Oil Droplet

Pyrosequencing (Roche 454):
• Beads are spun into wells on a plate
• Flows one dNTP at a time
• Detects PPi release by coupled luciferase reaction
• Light intensity = base addition

Ion Torrent:
• Beads are spun into wells of chip
• Flows one dNTP at a time
• Detects H+ release
• pH change = base addition
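The "flows one dNTP at a time" logic on this slide can be illustrated with a small, idealized Python sketch. It assumes a noise-free signal that is exactly proportional to the number of bases incorporated in each flow, and a cyclic flow order of T, A, C, G. The flow order and the function names are assumptions for illustration, not a description of either vendor's software.

```python
FLOW_ORDER = "TACG"  # assumed cyclic flow order, for illustration only


def simulate_flows(sequence, flow_order=FLOW_ORDER):
    """Return ideal (noise-free) per-flow signals for a synthesized strand.

    Each flow offers one nucleotide. The signal equals the number of
    consecutive matching bases incorporated: 0 if none, >1 for homopolymers.
    """
    if any(base not in flow_order for base in sequence):
        raise ValueError("sequence contains a base that is never flowed")
    signals = []
    pos = 0
    flow = 0
    while pos < len(sequence):
        base = flow_order[flow % len(flow_order)]
        count = 0
        while pos < len(sequence) and sequence[pos] == base:
            count += 1
            pos += 1
        signals.append(count)
        flow += 1
    return signals


def call_bases(signals, flow_order=FLOW_ORDER):
    """Convert per-flow signals back to bases by rounding each signal."""
    called = []
    for flow, signal in enumerate(signals):
        base = flow_order[flow % len(flow_order)]
        called.append(base * max(0, int(round(signal))))
    return "".join(called)
```

With this sketch, simulate_flows("TTAG") gives [2, 1, 0, 1]. If the first signal is measured as 2.6 instead of 2, call_bases returns TTTAG, a one-base insertion in the homopolymer. This is one way to see why flow-based chemistries tend toward indel errors in homopolymers.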

Roche 454 Benchtop Sequencers – 400bp Read Lengths / Reliable Chemistry
• Requires most time from Library to Machine Loading
• First Technology to Incorporate Bar Coding of Libraries

GS Junior
• Output = 70k-100k reads; 30 Mb
• Read Length = 400 bases
• Run Time = 10 hours
• Error Profile = Indels (homopolymers)

Roche 454 GS FLX+ Titanium
• Output = 1 million reads; 400-700 Mb
• Read Length = 400 bases (700 bases)
• Run Time = 8-23 hours
• Error Profile = Indels (homopolymers)

Ion Torrent = Desktop Sequencers for Low and High Sequence Output

PGM
• Output = 10-500 M bases
• Read Length = 200 bases
• Run Time = 1-3 hours
• Error Profile = Indels (homopolymers)

Ion Proton I
• Output = 10 G bases
• Read Length = 200 bases
• Run Time = 4 hours
• Error Profile = Indels (homopolymers)

Coming soon: Proton II and III; 300-400 base reads

[Figure: nucleotide structures]
• dATP vs ATP
• ddATP vs dATP: Irreversible Terminator (Sanger Sequencing)
• Reversible Terminators & Cleavable Fluorescent Tags

Solid Phase Amplification – Library DNA binds to Oligos Immobilized on Glass Flowcell Surface
• Clusters are linearized
• Seq primer annealed
• All four dNTPs added at each cycle
• Each dNTP has a different fluorescent tag
• Intensity of different tags = base call
• Error Profile = substitutions
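The "intensity of different tags = base call" step can be sketched as picking the brightest of four channels in each cycle. This is a deliberately simplified illustration: it assumes the four intensities have already been corrected for channel crosstalk and cluster phasing, and the channel order and the function name call_cluster are assumptions, not the instrument's actual base caller.

```python
CHANNELS = "ACGT"  # assumed order of the four fluorescent channels


def call_cluster(intensities, channels=CHANNELS):
    """Call one base per cycle as the channel with the highest intensity.

    `intensities` is a list of cycles; each cycle is a sequence of four
    numbers, one per channel. Ties go to the first channel in `channels`.
    """
    read = []
    for cycle in intensities:
        if len(cycle) != len(channels):
            raise ValueError("each cycle needs one intensity per channel")
        brightest = max(range(len(channels)), key=lambda i: cycle[i])
        read.append(channels[brightest])
    return "".join(read)
```

Because exactly one base is called in every cycle, a wrong call swaps one base for another rather than adding or dropping a base, which is consistent with the substitution error profile listed on the slide.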

Evolution of Solexa / Illumina Sequencing Platform

GA II (2006)
• Output 1 million 1x36bp reads / lane
• Improved chemistry to 10 million / lane
• Paired end reads to 2x150bp

HiSeq 2000 (2010)
• Output 30-40 Gb / lane
• Read Length = 100 bases SR or 2x100 PR
• Accommodates dual bar codes
• Run Time = 2-14 days
• Error Profile = substitutions

HiSeq 2500 = 2x150 (2x250); 600 million reads / 39 hours

MiSeq – QC Libraries and 250bp Reads

V1 Runs
• 1x50bp + 1 or 2 bar codes (6 hrs)
• 2x150bp + bar codes (28 hrs)
• 10M reads = 1G bases

V2 Runs – Use Top and Bottom of Lane
• 2x250bp + bar codes (39 hrs)
• 15M reads = 7G bases

Accommodates Dual Bar Codes
• Uses single reagent cassette and buffer bottle
• Same paired end libraries on all Illumina seqs
• Has additional options for BaseSpace data storage system and alignment software
• Real time run monitoring and data sharing

Single Molecule Imaging: Heavy Metal Battle Royale Short Reads & High Output vs Long Reads & Low Output

Helicos Genetic Analysis System
• 33bp avg reads; 1-10 Gb; 8 day run
• 600-900 million aligned bases per lane x 50 lanes
• Uses terminal transferase (TdT) to add a poly dA tail
• Does not use ligation or PCR amplification
• Flows one nucleotide at a time – Error Profile = Indels
• DNA quality not an important factor – ancient DNA
• Can do Direct RNA Sequencing – 3’ ends
• Custom Seq Capture Flowcells
• Primarily a sequencing service company
[Figure: Sample Preparation (dA tailing with TdT); OligodT on Flowcell; HeliScope™ Sample Loader; HeliScope™ Single Molecule Sequencer; HeliScope™ Analysis Engine; 20Tb; example Output reads]

Pacific Biosciences RS: Real Time Movies of Nucleotide-binding by DNA Polymerase
[Figure labels: Adapter; Subreads; Mapped Read Length]

PacBio Technology Makes Base Calls on How Long the Base Stays in the Active Site
• Output = 50k reads; 100 Mb per SMRT Cell (16 max per run)
• Read Length = 2000 bases
• Run Time = 90 min per SMRT Cell
• Error Profile = Indels
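The output figure on this slide follows from simple arithmetic: total bases = number of reads x read length. The short sketch below checks the per-cell number and scales it to the 16-cell maximum. The function name yield_bases is hypothetical, and treating every read as exactly 2000 bases is a simplifying assumption.

```python
def yield_bases(reads, read_length):
    """Total bases = number of reads x bases per read."""
    return reads * read_length


per_cell = yield_bases(50_000, 2_000)  # 50k reads x 2000 bases
per_run = per_cell * 16                # 16 SMRT Cells max per run
print(f"{per_cell / 1e6:.0f} Mb per SMRT Cell, {per_run / 1e9:.1f} Gb per 16-cell run")
# prints "100 Mb per SMRT Cell, 1.6 Gb per 16-cell run"
```

The same calculation can be applied to the other platform slides to compare stated read counts, read lengths, and output.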

Update of PacBio Progress 2011 – 2012

Cost of $equencing
• Reagents
• Library Construction
• Quality Assessment
• Accessory Equipment and Supplies
• Labor to get samples on the machine
• Machine maintenance / service contracts
• Computational Requirements
• Data Storage

Two Strategies for Sequencing: Depth of Coverage vs Speed
• Depth of Coverage (20-100 million reads with good quality scores) = Discovery
• Speed (1-24 hours run time) = Validation and Diagnosis
• Accuracy / Seq Error Profiles / Bioinformatic Tools
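Depth of coverage can be estimated as (number of reads x read length) / size of the region sequenced. The sketch below applies that to the 20-100 million read range on this slide. The 100-base read length and the 50 Mb target size are illustrative assumptions, the function name mean_coverage is hypothetical, and the estimate assumes every read maps evenly across the target.

```python
def mean_coverage(reads, read_length, target_size):
    """Average depth = (reads x read length) / target size, all in bases."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return reads * read_length / target_size


TARGET = 50_000_000  # hypothetical 50 Mb target region (assumption)
for reads in (20_000_000, 100_000_000):
    depth = mean_coverage(reads, 100, TARGET)
    print(f"{reads:,} reads x 100 bases -> {depth:.0f}x")
```

Under these assumptions the range works out to roughly 40x to 200x average depth. Changing the target size shows quickly how much sequence a given experiment needs.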

Summary and Final Thoughts
• Sequencing technologies keep evolving
• Plan your sequencing experiments based on the data set you need
• Consider size of data set, accuracy of reads, cost and speed
• Choose your platform appropriately
• Work smarter: be imaginative, and what seems impossible today can be the standard tomorrow
