Next Generation DNA Sequencing Platforms: Evolving Tools for Cancer Research
Norma Neff
Bioengineering / Quake Lab
Sequencing Core Director
Stem Cell Institute
SIM1 G1115 / G0821
nfneff@stanford.edu
Sequencing Technologies
• Sequencing By Synthesis
• Emulsion PCR-based Sequencing Technologies
• Single Molecule Sequencing Technologies
Recommended Reviews:
• Michael Metzker (2010) Nature Reviews Genetics 11:31
• Quail et al. (2012) BMC Genomics Jul 24;13:341
Outline of Today’s Presentation:
• Sequencing by Synthesis
• Next Gen Sequencing Sample or Library Preps
• Review of Seq Technologies
• Comparisons of Different Platforms
• Summary and Final Thoughts
Design of Sequencing Samples or Libraries
• Adapters (A1, A2) are ligated to the sample DNA to be sequenced; the result is the Library
• Adapters are short (30-50bp) double-stranded oligos
• Sequences of the adapters are specific to each seq platform
• A1, A2: sites for PCR primers to bind to amplify the Library
• A1, A2: sites for seq primers to bind to seq the sample DNA
• BC1, BC2: bar codes (6-12bp) for multiplexing libraries in a seq run
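The role of bar codes in multiplexing can be illustrated with a minimal sketch. It assumes the bar code sits at the very start of each read (an in-line bar code); some platforms read it in a separate index read instead. The function name, the 6bp bar code sequences and the sample names are all hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes, barcode_len=6):
    """Group reads by the bar code found at the start of each read.

    Assumes an in-line bar code of fixed length; the bar code is trimmed
    off and the remaining insert is stored under the matching sample name.
    Reads whose prefix matches no known bar code go to "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(tag, "unassigned")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical bar codes and reads, for illustration only.
barcodes = {"ACGTAC": "library_1", "TGCATG": "library_2"}
reads = ["ACGTACGGATTACA", "TGCATGCCCTTTAA", "ACGTACTTTTGGGG", "NNNNNNACGTACGT"]
result = demultiplex(reads, barcodes)
```

Exact matching is the simplest choice; production demultiplexers usually tolerate one mismatch in the bar code, which this sketch does not attempt.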
Sequencing by Synthesis: Bases are added to DNA Molecules at the 3’ OH end of the Chain
Emulsion PCR: Library DNA is amplified in an Oil Droplet
PPi detection:
• Beads are spun into wells on a plate
• Flows one dNTP at a time
• Detects PPi release
• By coupled luciferase reaction
• Light intensity = base addition
H+ detection:
• Beads are spun into wells of a chip
• Flows one dNTP at a time
• Detects H+ release
• pH change = base addition
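Flowing one dNTP at a time means that a run of identical bases is incorporated in a single flow, so the signal has to be read as a count. The sketch below is a toy model of that idea, not any vendor's algorithm. It assumes an idealized, noise-free signal, treats the input as the strand being synthesized, and uses "TACG" as an assumed flow order.

```python
def flowgram(synthesized, flow_order="TACG"):
    """Return (nucleotide, incorporations) for each flow.

    Idealized model: each flow offers one nucleotide, and every
    consecutive matching base is incorporated in that same flow.
    The signal is the number of bases added (0 if none).
    """
    unknown = set(synthesized) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")

    signals = []
    pos = 0
    flow = 0
    while pos < len(synthesized):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A homopolymer run is consumed entirely within one flow.
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


example = flowgram("TTAGGGC")
# [('T', 2), ('A', 1), ('C', 0), ('G', 3), ('T', 0), ('A', 0), ('C', 1)]
```

In a real instrument the signal for a run of three bases is not exactly three times the signal for one, so telling 3 from 4 gets harder as runs grow. That is why flow-based platforms show indel errors in homopolymers.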
Roche 454 Benchtop Sequencers: 400bp Readlengths / Reliable Chemistry
• Requires most time from Library to Machine Loading
• First Technology to Incorporate Bar Coding of Libraries
GS Junior
• Output = 70k-100k reads; 30Mb
• Read Length = 400 bases
• Run Time = 10 hours
• Error Profile = Indels in Homopolymers
Roche 454 GS FLX+ Titanium
• Output = 1 million reads; 400-700Mb
• Read Length = 400 bases (700 bases)
• Run Time = 8-23 hours
• Error Profile = Indels in Homopolymers
Ion Torrent = Desktop Sequencers for Low and High Sequence Output
PGM
• Output 10-500M bases
• Read Length = 200 bases
• Run Time = 1-3 hours
• Error Profile = Indels in Homopolymers
Ion Proton I
• Output 10 G bases
• Read Length = 200 bases
• Run Time = 4 hours
• Error Profile = Indels in Homopolymers
Coming soon: Proton II and III; 300-400 base reads
[Figure: nucleotide triphosphate structures, showing the 2’ and 3’ positions of the sugar]
• dATP vs ATP
• ddATP vs dATP: Irreversible Terminator, Sanger Sequencing
• Reversible Terminators & Cleavable Fluorescent Tags (N3 blocking group)
Solid Phase Amplification: Library DNA binds to Oligos Immobilized on Glass Flowcell Surface
• Clusters are linearized
• Seq primer annealed
• All four dNTPs added at each cycle
• Each dNTP has a different fluorescent tag
• Intensity of different tags = base call
• Error Profile = substitutions
[Figure label: V3 HiSeq]
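Calling a base from the intensity of four fluorescent tags can be sketched as picking the brightest channel at each cycle. This is a deliberately simplified illustration: real base callers also correct for cross-talk between dyes and for phasing within a cluster. The function name and the intensity values are hypothetical.

```python
def call_bases(intensities):
    """Call one base per cycle by choosing the brightest of four channels.

    `intensities` is a list with one dict per cycle, mapping each base
    (A, C, G, T) to the measured intensity of its fluorescent tag.
    """
    calls = []
    for cycle in intensities:
        # The channel with the highest intensity gives the base call.
        calls.append(max(cycle, key=cycle.get))
    return "".join(calls)


# Hypothetical intensities for a single cluster over three cycles.
cluster = [
    {"A": 0.91, "C": 0.05, "G": 0.03, "T": 0.02},
    {"A": 0.04, "C": 0.10, "G": 0.80, "T": 0.06},
    {"A": 0.07, "C": 0.45, "G": 0.05, "T": 0.40},  # ambiguous cycle
]
read = call_bases(cluster)  # "AGC"
```

The third cycle shows why the error profile is substitutions: when two channels are close, the wrong base can be called, but the read length is unaffected.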
Evolution of Solexa / Illumina Sequencing Platform
GA II (2006)
• Output 1 million 1x36bp reads / lane
• Improved chemistry to 10 million / lane
• Paired end reads to 2x150bp
HiSeq 2000 (2010)
• Output 30-40 Gb / lane
• Read Length = 100 bases SR or 2x100 PR
• Accommodates dual bar codes
• Run Time = 2-14 days
• Error Profile = substitutions
HiSeq 2500
• 2x150 (2x250)
• 600 million reads / 39 hours
MiSeq: QC Libraries and 250bp Reads
V1 Runs
• 1x50bp + 1 or 2 bar codes (6 hrs)
• 2x150bp + bar codes (28 hrs)
• 10M reads = 1G bases
V2 Runs: Use Top and Bottom of Lane
• 2x250bp + bar codes (39 hrs)
• 15M reads = 7G bases
• Accommodates dual bar codes

• Uses single reagent cassette and buffer bottle
• Same paired end libraries on all Illumina seqs
• Has additional options for Base Space data storage system and alignment software
• Real time run monitoring and data sharing
[Figure labels: MiSeq; V3 HiSeq]
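The V2 figures above can be checked with simple arithmetic. The sketch assumes that "15M reads" counts clusters, each sequenced from both ends in a 2x250bp run; that interpretation and the function name are assumptions, not statements from the slide.

```python
def run_yield_bases(clusters, read_length, paired=True):
    """Total bases in a run: clusters x ends x read length."""
    ends = 2 if paired else 1
    return clusters * ends * read_length


# V2 run: 15 million clusters, 2x250bp (assumed interpretation).
v2_bases = run_yield_bases(15_000_000, 250)
v2_gigabases = v2_bases / 1e9  # 7.5, in line with the roughly 7G bases quoted
```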
Single Molecule Imaging: Heavy Metal Battle Royale
Short Reads & High Output vs Long Reads & Low Output
Helicos Genetic Analysis System
• 33bp Avg Reads; 1-10 Gb; 8 day Run
• 600-900 Million Aligned Bases per lane x 50 lanes
• Does not use ligation or PCR amplification
• Uses terminal transferase to add poly dA tail
• Flows one nucleotide at a time; Error Profile = Indels
• DNA quality is not an important factor (e.g. ancient DNA)
• Can do Direct RNA Sequencing (3’ ends)
• Custom Seq Capture Flowcells
• Primarily a sequencing service company
[Figure labels: Sample Preparation (dA Tailing, TdT); HeliScope™ Sample Loader (OligodT on Flowcell); HeliScope™ Single Molecule Sequencer; HeliScope™ Analysis Engine; 20Tb; Output (example reads)]
Pacific Biosciences RS: Real Time Movies of Nucleotide-binding by DNA Polymerase
[Figure labels: Adapter; Subreads; Mapped Read Length]
PacBio Technology Makes Base Calls on How Long the Base Stays in the Active Site
• Output = 50k Reads; 100 Mb per SMRT Cell (16 max per run)
• Read Length = 2000 bases
• Run Time = 90 min per SMRT Cell
• Error Profile = Indels
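The per-cell numbers above scale to a full run as follows. The sketch assumes that SMRT Cells are processed one after another, so run time adds up linearly; the function name and that sequential assumption are not from the slide.

```python
def smrt_run_totals(cells, reads_per_cell=50_000, read_length=2_000,
                    minutes_per_cell=90, max_cells=16):
    """Estimate output and run time for a run of `cells` SMRT Cells.

    Defaults are the per-cell figures quoted on the slide. Cells are
    assumed to run sequentially, so time scales linearly with cells.
    """
    if not 1 <= cells <= max_cells:
        raise ValueError(f"cells must be between 1 and {max_cells}")
    bases_per_cell = reads_per_cell * read_length
    return {
        "bases_per_cell": bases_per_cell,
        "total_bases": bases_per_cell * cells,
        "total_hours": cells * minutes_per_cell / 60,
    }


full_run = smrt_run_totals(16)
# 100 Mb per cell; 1.6 Gb and 24 hours for a full 16-cell run
```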
Cost of $equencing
• Reagents
• Library Construction
• Quality Assessment
• Accessory Equipment and Supplies
• Labor to get samples on the machine
• Machine maintenance / service contracts
• Computational Requirements
• Data Storage
Two Strategies for Sequencing: Depth of Coverage vs Speed
• Depth of Coverage (20-100 million reads with good quality scores) = Discovery
• Speed (1-24 hours run time) = Validation and Diagnosis
• Accuracy / Seq Error Profiles / Bioinformatic Tools
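Depth of coverage is commonly estimated as reads x read length / target size. The sketch below applies that formula to the read counts above. The 100-base read length and the 50 Mb target (roughly an exome-sized region) are assumptions chosen for illustration, and the formula ignores duplicates, unmapped reads and uneven coverage.

```python
import math


def mean_coverage(n_reads, read_length, target_size):
    """Average depth of coverage: total sequenced bases / target size."""
    return n_reads * read_length / target_size


def reads_needed(coverage, read_length, target_size):
    """Smallest number of reads that reaches the requested mean coverage."""
    return math.ceil(coverage * target_size / read_length)


# Assumed example: 100-base reads over a hypothetical 50 Mb target.
low = mean_coverage(20_000_000, 100, 50_000_000)    # 40.0x
high = mean_coverage(100_000_000, 100, 50_000_000)  # 200.0x
n = reads_needed(30, 100, 50_000_000)               # 15_000_000 reads
```

The same read count spread over a whole human genome gives far lower depth, which is why the choice of target drives the choice of platform.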
Summary and Final Thoughts
• Sequencing technologies keep evolving
• Plan your sequencing experiments based on the data set you need
• Consider the size of the data set, the accuracy of the reads, cost and speed
• Choose your platform appropriately
• Work smarter: be imaginative, and what seems impossible today can be the standard tomorrow