GenXPro GmbH, Frankfurt am Main genxpro.de

Nucleotide-based Information Analysis GenXPro GmbH, Frankfurt am Main www.genxpro.de

Our Service Portfolio: Nucleotide-based information
Transcriptomics:
- RNA-Seq
- SuperSAGE, MACE
- Real-time qPCR service
- Normalization of cDNA libraries
- microRNA
Genomics:
- Genetic markers
- Genotyping
- Copy number variations
- De novo sequencing; hybrid sequencing
- Exome sequencing
- ChIP-Seq
Epigenomics:
- Methylation-Specific Digital Karyotyping (MSDK)
- MethSeq
Bioinformatics:
- NGS data handling, assembly, quantification, annotation, SNP discovery, data interpretation

NGS Platforms: Read Length, Throughput, Costs
Illumina HiSeq 2000: 1,600 million sequences; read length 2x100 bp
PacBio: 0.85 million sequences; read length ~500-10,000 bp

Transcriptomics: some frequent, many rare transcripts
Figure panels: frequencies of transcript species; total transcript distribution.
Frequent transcripts make up 50 % of all transcripts. Most of the transcript species are expressed at low levels (below 10 copies per million).

One solution: RNA-Seq

GenXPro's optimized RNA-Seq protocol
• Strand information is maintained: sense and antisense transcripts can be distinguished
• Ligation-based method for sample preparation: avoids the usual bias observed when hexamers are used for second-strand synthesis
• Synthesis of artificial "antisense" transcripts is avoided

Low-cost, digital gene expression analysis
• Problem: RNA-Seq is still costly compared to microarrays
• Our solution = MACE: only the cDNA 3' ends are sequenced
• Disadvantage compared to RNA-Seq: fewer variants can be distinguished; SNPs or fusion transcripts might be missed
• Advantages:
  - Low costs: only 1/20th of the sequencing depth is required for similar resolution
  - Very good quantification
  - Alternative polyadenylation is revealed
  - Concentration on the most polymorphic site in a gene
  - Highly specific for good annotation

Massive Analysis of cDNA Ends (MACE): How it works
• cDNA (poly(A) end: AAAAAAA-3' / TTTTTTT-5') is bound to streptavidin beads
• Fragmentation of the bead-bound cDNA: 100-300 bp
• 2nd generation sequencing of 50-100 bp
• Assembly & counting: counting, BLAST; 50-400 bp (example counts in the figure: 1, 1, 4)
Only one fragment per transcript!
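Because MACE yields one fragment per transcript, the counting step reduces to tallying tags per transcript and scaling to a common library size. The following is only a minimal sketch of that idea, not GenXPro's pipeline; the function name `tags_per_million` and the transcript labels are hypothetical, and the tag-to-transcript assignment (e.g. by BLAST) is assumed to have happened already.

```python
from collections import Counter

def tags_per_million(assignments):
    """Count tags per transcript and scale counts to tags per million (TPM).

    `assignments` holds one transcript ID per sequenced tag (hypothetical
    input; in practice it would come from mapping or BLAST annotation).
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    return {transcript: n * 1_000_000 / total for transcript, n in counts.items()}

# Illustrative assignments reproducing the 1 / 1 / 4 counts of the figure
tags = ["gene_1", "gene_2", "unknown", "unknown", "unknown", "unknown"]
tpm = tags_per_million(tags)
for transcript, value in sorted(tpm.items()):
    print(f"{transcript}: {value:.0f} TPM")
```

Since each transcript contributes a single fragment, no correction for transcript length is applied here; a whole-transcript RNA-Seq count would typically need one.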

Advantages of MACE or SuperSAGE vs. microarrays
Example: 4,455,653 tags from mouse spleen
More than 75 % rare transcripts: this information is lost on microarrays! Only the frequent part is visible for microarrays.
>18,000 different transcripts excluding the singletons*
* >13,000 singletons with distinct matches to the NCBI-DB

TranSNiPtomics: why MACE?
High coverage is needed to distinguish between SNP and error.
MACE: reads are concentrated on the polymorphic 3' end. SNPs with enough coverage: 2
RNA-Seq: reads are distributed all over the transcript. SNPs with enough coverage: 0
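Why coverage matters for telling a SNP from a sequencing error can be illustrated with a simple threshold rule. This is a hedged sketch only: the function `call_snp` and both thresholds (`min_coverage`, `min_alt_fraction`) are assumptions chosen for illustration, not values from the presentation, and real variant callers use statistical models rather than fixed cut-offs.

```python
from collections import Counter

def call_snp(bases, ref, min_coverage=10, min_alt_fraction=0.2):
    """Return the alternative base if it is well supported, else None.

    `bases` are the read bases observed at one position. Both thresholds
    are illustrative assumptions.
    """
    coverage = len(bases)
    if coverage < min_coverage:
        return None  # too few reads to separate a SNP from an error
    alt_counts = Counter(b for b in bases if b != ref)
    if not alt_counts:
        return None  # every read agrees with the reference
    alt, n = alt_counts.most_common(1)[0]
    return alt if n / coverage >= min_alt_fraction else None

# Deep coverage at the 3' end: the variant is supported by 8 of 20 reads
print(call_snp("A" * 12 + "G" * 8, ref="A"))
# Two reads spread over a long transcript: no call is possible
print(call_snp("AG", ref="A"))
```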

Alternative Polyadenylation (APA)
Figure labels: signal hexamer; 3' UTR; poly(A) tail; miRNA; MACE tags of short variant; MACE tags of long variant.
• Up to 50 % of transcripts are differentially polyadenylated (Tian et al., 2005).
• Escape from miRNA-based regulation: up to 10x more protein at similar transcript levels!
• APA influences mRNA nuclear export and cellular localization.
• Oncogenes often have shorter 3' UTRs because of APA (Mayr & Bartel, 2009).
• Stem cells show longer, differentiated cells shorter 3' UTRs (Shepard et al., 2011).
• APA is an extremely important feature of a transcript and should be assessed for every transcript.

Coverage for SNP detection
Wheat, nucleosome/chromatin assembly factor C; 160 TPM
MACE, 20 million reads: sufficient coverage for SNP detection!

Coverage for SNP detection
Wheat, nucleosome/chromatin assembly factor C
RNA-Seq, 20 million reads, same position: coverage too low!

RNA-Seq vs. MACE
RNA-Seq (cDNA of transcripts A and B): many reads per transcript, and the number of reads per transcript varies!
MACE (transcripts A and B): one read = one transcript
For the same depth of analysis, RNA-Seq requires about 20-30 times more sequencing*
*Asmann et al., 2009

TrueQuant: technology to solve the problem of PCR-introduced bias
Problem: certain tags or fragments are preferentially amplified during PCR, which leads to biased quantification.
The solution: GenXPro's bias-proof "TrueQuant" technology. Each molecule is individually labeled prior to PCR; copies are eliminated from the dataset. The result is secure quantification.
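The principle of removing PCR copies can be sketched in a few lines. The slides do not say how the molecules are labeled, so this example assumes each read carries a short label sequence next to its tag; the function name `deduplicated_counts` and the example labels are hypothetical and do not describe the actual TrueQuant implementation.

```python
from collections import Counter

def deduplicated_counts(reads):
    """Count each tag once per distinct molecule label.

    `reads` is a list of (label, tag) pairs. Reads sharing both label and
    tag are treated as PCR copies of one original molecule (an assumption
    of this sketch) and are collapsed before counting.
    """
    unique_molecules = set(reads)
    return Counter(tag for _label, tag in unique_molecules)

# One tagA molecule was amplified five-fold; a second tagA molecule and a
# tagB molecule were each sequenced once.
reads = [("AAC", "tagA")] * 5 + [("GTT", "tagA"), ("CCA", "tagB")]
raw = Counter(tag for _label, tag in reads)
corrected = deduplicated_counts(reads)
print("uncorrected:", dict(raw))
print("corrected:  ", dict(corrected))
```

In the uncorrected counts the preferentially amplified molecule inflates tagA; after collapsing copies, the counts reflect the number of original molecules.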

Problem of PCR-introduced bias: TrueQuant-corrected vs. uncorrected data
Plotted: negative common logarithm (-log10) of the p-value for differential gene expression between samples (Audic & Claverie, 1997). Data from human pancreatic tumors.
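For readers unfamiliar with the Audic & Claverie (1997) statistic: given x counts of a tag in a library of N1 tags, the probability of observing y counts in a second library of N2 tags is p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1)). The sketch below evaluates this formula and a simple tail p-value; the function names and the choice of taking the smaller of the two tails are assumptions of this example, not necessarily what was used for the figure.

```python
import math
import sys

def audic_claverie_prob(x, y, n1, n2):
    """p(y | x) after Audic & Claverie (1997), computed in log space."""
    r = n2 / n1
    log_p = (y * math.log(r)
             + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
             - (x + y + 1) * math.log(1 + r))
    return math.exp(log_p)

def tail_p_value(x, y, n1, n2):
    """Smaller of the two cumulative tails around y (illustrative choice)."""
    lower = sum(audic_claverie_prob(x, k, n1, n2) for k in range(y + 1))
    upper = 1.0 - (lower - audic_claverie_prob(x, y, n1, n2))
    # Guard against rounding to zero or below for very extreme counts
    return min(1.0, max(min(lower, upper), sys.float_info.min))

def neg_log10_p(x, y, n1, n2):
    """Negative common logarithm of the tail p-value."""
    return -math.log10(tail_p_value(x, y, n1, n2))

# Hypothetical counts in two libraries of one million tags each
print(neg_log10_p(5, 50, 1_000_000, 1_000_000))
print(neg_log10_p(5, 6, 1_000_000, 1_000_000))
```

The direct summation loses precision for very small upper tails, so this is suitable for illustration rather than for production use.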

MACE technical replicate
10 million reads, barley
Pearson correlation coefficient = 0.9983357 (!)
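The reproducibility figure quoted here is a Pearson correlation between the per-transcript counts of two replicates. A minimal sketch of that calculation follows; the replicate counts are made up for illustration and are not the barley data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long count vectors."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical tag counts for the same five transcripts in two replicates
replicate_1 = [120, 15, 3, 980, 47]
replicate_2 = [115, 18, 2, 1010, 50]
print(f"r = {pearson(replicate_1, replicate_2):.4f}")
```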

Bioinformatics: automated workflow for model and non-model organisms
Workflow elements shown: tags; quantification; annotation / mapping (Gene 1: 1, Gene 2: 1, unknown: 4); assembly of unknown tags; BLASTX (protein DBs); web tool "MACE2GO"; enrichment analysis.

Normalization of cDNA libraries: Frequent transcripts are strongly reduced cDNA before normalization cDNA after normalization

microRNAs and the RNA degradome
Figure labels: microRNA; mRNA ends; mRNA (AAAAAAA-3').
Next-generation sequencing, counting, BLAST

Genomic DNA

Genotyping by Sequencing: Reduced Complexity Genomic Sequencing
1. Digestion with restriction enzyme (DNA sample I, DNA sample II)

Methylation-Specific Digital Karyotyping (MSDK)
2. Adapter ligation, sequencing (DNA sample I, DNA sample II)
3. Counting (CNVs), comparison, annotation

Methylation-Specific Digital Karyotyping (MSDK)
1. Digestion with methylation-sensitive restriction enzyme (DNA sample I, DNA sample II; sites marked M in the figure)

Methylation-Specific Digital Karyotyping (MSDK)
2. Adapter ligation, sequencing (DNA sample I, DNA sample II)
3. Counting, annotation: counting, BLAST, statistics

Exome Sequencing of genomic DNA
Steps shown: fragmentation (100-300 bp) of genomic DNA containing exons and introns; denaturation; binding to exon-specific, matrix-bound oligos; elution, sequencing, bioinformatics.

