Nucleotide-based Information Analysis. GenXPro GmbH, Frankfurt am Main, www.genxpro.de. Our Service Portfolio: nucleotide-based information. Transcriptomics: - RNAseq - SuperSAGE, MACE - Real-Time qPCR service - Normalization of cDNA libraries
Nucleotide-based Information Analysis GenXPro GmbH, Frankfurt am Main www.genxpro.de
Our Service Portfolio: Nucleotide-based information
Transcriptomics: - RNAseq - SuperSAGE, MACE - Real-Time qPCR service - Normalization of cDNA libraries - microRNA
Genomics: - Genetic markers - Genotyping - Copy number variations - De novo sequencing; hybrid sequencing - Exome sequencing - ChIP-Seq
Epigenomics: - Methylation-Specific Digital Karyotyping (MSDK) - MethSeq
Bioinformatics: - NGS data handling, assembly, quantification, annotation, SNP discovery, data interpretation
NGS Platforms: Read Length, Throughput, Costs
             Illumina HiSeq 2000    PacBio
Sequences:   1600 million           0.85 million
Length:      2 x 100 bp             ~500-10,000 bp
Transcriptomics: some frequent, many rare transcripts [Figure panels: Frequencies of transcript species; Total transcript distribution] Frequent transcripts make up 50 % of all transcripts. Most of the transcript species are expressed at low levels (below 10 copies per million).
GenXPro's optimized RNAseq protocol
• Strand information is maintained: sense and antisense transcripts can be distinguished
• Ligation-based method for sample preparation: avoids the usual bias observed when hexamers are used for second-strand synthesis
• Synthesis of artificial "antisense" transcripts is avoided
Low-cost, digital gene expression analysis
• Problem: RNAseq is still costly compared to microarrays
• Our solution = MACE: only the cDNA 3' ends are sequenced
• Disadvantage compared to RNAseq: fewer variants can be distinguished; SNPs or fusion transcripts might be missed
• Advantages:
  - low costs: only 1/20th of the sequencing depth required for similar resolution
  - very good quantification
  - alternative polyadenylation is revealed
  - concentration on the most polymorphic site in a gene
  - highly specific for good annotation
Massive Analysis of cDNA Ends (MACE): How it works [Diagram: cDNA molecules with poly(A)/poly(T) ends (AAAAAAA-3' / TTTTTTT-5') and streptavidin beads]
Massive Analysis of cDNA Ends (MACE): How it works. Fragmentation [Diagram: cDNA on streptavidin beads, fragments of 100-300 bp]
Massive Analysis of cDNA Ends (MACE): How it works. 2nd generation sequencing of 50-100 bp [Diagram: cDNA 3' ends]
Massive Analysis of cDNA Ends (MACE): How it works. Assembly & Counting [Diagram: cDNA 3' ends]
Massive Analysis of cDNA Ends (MACE): How it works. Assembly & Counting [Diagram: tag counts 1, 1, 4] Counting, BLAST. 50-400 bp. Only one fragment per transcript!
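The counting step can be illustrated with a minimal Python sketch. It assumes each sequenced 3' fragment has already been assigned a label (a gene name or "unknown") by assembly and BLAST; the labels, the function name `tags_per_million` and the input list are hypothetical and only mirror the counts 1, 1, 4 shown on the slide. Because MACE yields one fragment per transcript, counts are scaled to tags per million without any length correction.

```python
from collections import Counter


def tags_per_million(assignments):
    """Count one tag per transcript and scale counts to tags per million (TPM)."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tags to count")
    return {label: (n, n * 1_000_000 / total) for label, n in counts.items()}


# Hypothetical annotation result: one label per sequenced 3' fragment.
reads = ["gene1", "gene2", "unknown", "unknown", "unknown", "unknown"]
table = tags_per_million(reads)
for label, (count, tpm) in sorted(table.items()):
    print(label, count, round(tpm))
```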
Advantages of MACE or SuperSAGE vs. microarrays. Example: 4,455,653 tags from mouse spleen. More than 75 % rare transcripts: this information is lost on microarrays! [Figure labels: Only this part is visible to microarrays; >18,000 different transcripts excluding the singletons; >13,000 singletons with distinct matches to the NCBI DB]
TranSNiPtomics: why MACE? High coverage to distinguish between SNP and error.
• MACE: concentration on the polymorphic 3' end. SNPs with enough coverage: 2
• RNA-Seq: reads distributed all over the transcript. SNPs with enough coverage: 0
[Diagram: reads aligned to a cDNA with three variant positions (AT, CG, AT) for each method]
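Why coverage matters for telling a SNP from a sequencing error can be sketched with a simple binomial model. This is an illustration only, not GenXPro's variant caller: the per-base error rate of 1 %, the significance threshold, the minimum depth and both function names are assumptions.

```python
from math import comb


def error_tail_probability(depth, alt_reads, error_rate=0.01):
    """P(at least alt_reads mismatching reads out of depth) if only errors occur."""
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(alt_reads, depth + 1)
    )


def looks_like_snp(depth, alt_reads, error_rate=0.01, alpha=1e-3, min_depth=8):
    """Call a SNP only if depth is sufficient and errors alone are an unlikely cause."""
    if depth < min_depth:
        return False  # too few reads to separate a variant from noise
    return error_tail_probability(depth, alt_reads, error_rate) < alpha


# Deep coverage at the 3' end (MACE-like) vs. thin coverage (RNA-Seq-like).
print(looks_like_snp(depth=40, alt_reads=18))
print(looks_like_snp(depth=2, alt_reads=1))
```

With 40 reads, 18 mismatches are very unlikely to be errors alone; with 2 reads, a single mismatch cannot be interpreted.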
Alternative Polyadenylation [Diagram: 3' UTR with signal hexamer, miRNA binding site and poly(A) tail; MACE tags of the short variant and MACE tags of the long variant]
• Up to 50% of transcripts are differentially polyadenylated (Tian et al., 2005).
• Escape from miRNA-based regulation: -> Up to 10x more protein at similar transcript levels!
• Influences mRNA nuclear export and cellular localization
• Oncogenes often have shorter 3' UTRs because of APA (Mayr & Bartel, 2009)
• Stem cells show longer, differentiated cells shorter 3' UTRs (Shepard et al., 2011)
• APA is an extremely important feature of a transcript and should be assessed for every transcript.
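One simple way to summarize APA from MACE tags is the fraction of tags that support the short 3' UTR variant. The sketch below assumes tag counts at the proximal (short variant) and distal (long variant) poly(A) sites are already available; the function names and the example counts are hypothetical.

```python
def proximal_usage(short_tags, long_tags):
    """Fraction of tags that end at the proximal poly(A) site (short 3' UTR)."""
    total = short_tags + long_tags
    if total == 0:
        raise ValueError("no tags at either poly(A) site")
    return short_tags / total


def usage_shift(condition_a, condition_b):
    """Change in proximal-site usage from condition A to condition B."""
    return proximal_usage(*condition_b) - proximal_usage(*condition_a)


# Hypothetical (short, long) tag counts for one gene in two conditions.
print(usage_shift((10, 30), (30, 10)))
```

A positive shift means the second condition uses the short variant more often, which is the direction described above for oncogenes and differentiated cells.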
Coverage for SNP detection. Wheat, nucleosome/chromatin assembly factor C; 160 TPM. MACE, 20 million reads. Sufficient coverage for SNP detection!
Coverage for SNP detection. Wheat, nucleosome/chromatin assembly factor C. RNA-Seq, 20 million reads, same position. Coverage too low!
RNA-Seq vs. MACE [Diagram: cDNA of transcript A and cDNA of transcript B under each method]
• RNA-Seq: many reads per transcript; the number of reads per transcript varies!
• MACE: one read = one transcript
For the same depth of analysis, RNA-Seq requires about 20-30 times more sequencing* (*Asmann et al. 2009)
TrueQuant technology to solve the problem of PCR-introduced bias. Problem: certain tags or fragments are preferentially amplified during PCR, which biases quantification. The solution: GenXPro's bias-proof "TrueQuant" technology. Each molecule is individually labeled prior to PCR; copies are eliminated from the dataset, which secures quantification.
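The principle of removing PCR copies can be sketched as follows. The slide does not describe how TrueQuant labels molecules or how its software works, so this is only a generic illustration: it assumes each read carries a (label, tag sequence) pair, and reads sharing both are treated as copies of one original molecule. The label strings and the function name are hypothetical.

```python
from collections import Counter


def count_unique_molecules(reads):
    """Collapse reads with identical (label, tag) pairs, then count tags."""
    unique_molecules = set(reads)
    return Counter(tag for _label, tag in unique_molecules)


# Hypothetical reads: tagA was amplified preferentially during PCR.
reads = [
    ("AACG", "tagA"), ("AACG", "tagA"), ("AACG", "tagA"),  # three copies of one molecule
    ("GTTA", "tagA"),                                      # a second tagA molecule
    ("CCAT", "tagB"), ("CCAT", "tagB"),                    # two copies of one molecule
]

uncorrected = Counter(tag for _label, tag in reads)
corrected = count_unique_molecules(reads)
print(dict(uncorrected))
print(dict(corrected))
```

In this toy data the uncorrected counts are 4 and 2, while the corrected counts are 2 and 1, so the ratio happens to be preserved; with uneven amplification the uncorrected ratio would be distorted.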
Problem of PCR-introduced bias: TrueQuant-corrected vs. uncorrected data [Plot: negative common logarithm of the p-value for differential expression in pairwise gene expression comparisons (Audic & Claverie, 1997). Data from human pancreatic tumors.]
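For reference, the Audic & Claverie (1997) statistic gives the probability of observing y tags for a gene in a library of N2 tags, given x tags in a library of N1 tags: p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1)). The sketch below evaluates it in log space and sums the smaller tail; the choice of a one-sided tail and the function names are assumptions, and the values plotted on the slide may have been computed differently.

```python
from math import exp, lgamma, log, log10


def ac_probability(x, y, n1, n2):
    """Audic & Claverie p(y | x) for library sizes n1 and n2."""
    ratio = n2 / n1
    log_p = (
        y * log(ratio)
        + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
        - (x + y + 1) * log(1 + ratio)
    )
    return exp(log_p)


def ac_p_value(x, y, n1, n2):
    """Smaller of the two tails P(Y <= y | x) and P(Y >= y | x)."""
    lower = sum(ac_probability(x, k, n1, n2) for k in range(y + 1))
    upper, k, previous = 0.0, y, float("inf")
    while True:  # sum the upper tail until the terms stop contributing
        term = ac_probability(x, k, n1, n2)
        upper += term
        if term < previous and term <= 1e-15 * upper:
            break
        previous, k = term, k + 1
    return min(lower, upper, 1.0)


def neg_log10_p(x, y, n1, n2):
    """Negative common logarithm of the p-value (floored to avoid log of zero)."""
    return -log10(max(ac_p_value(x, y, n1, n2), 1e-300))


print(neg_log10_p(10, 10, 1_000_000, 1_000_000))   # similar counts: small value
print(neg_log10_p(10, 100, 1_000_000, 1_000_000))  # tenfold change: large value
```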
MACE technical replicate: 10 million reads, barley. Pearson correlation coefficient = 0.9983357 (!)
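How such a replicate correlation is computed can be shown with a short Python sketch. The per-gene tag counts below are invented for illustration and are not the barley data; whether the reported coefficient was calculated on raw or transformed counts is not stated on the slide.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long count vectors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical tag counts for the same five genes in two technical replicates.
replicate_1 = [120, 15, 980, 3, 47]
replicate_2 = [118, 17, 1002, 2, 51]
print(pearson(replicate_1, replicate_2))
```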
Bioinformatics: automated workflow for model and non-model organisms [Workflow diagram: tags; annotation / mapping and quantification (Gene 1: 1, Gene 2: 1); unknown tags: assembly, quantification (4), BLASTX (protein DBs); web tool "MACE2GO"; enrichment analysis]
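The slide does not say which statistic the enrichment analysis uses. A common choice for this kind of analysis is a hypergeometric (one-sided Fisher) test, sketched below under that assumption; the function name and the example numbers are hypothetical and this is not a description of MACE2GO.

```python
from math import comb


def enrichment_p_value(total_genes, genes_in_term, selected, selected_in_term):
    """P(at least selected_in_term annotated genes among the selected ones)."""
    upper = min(genes_in_term, selected)
    favourable = sum(
        comb(genes_in_term, k) * comb(total_genes - genes_in_term, selected - k)
        for k in range(selected_in_term, upper + 1)
    )
    return favourable / comb(total_genes, selected)


# Hypothetical example: 20,000 genes, 200 carry the annotation term,
# 500 are differentially expressed, 25 of those carry the term.
print(enrichment_p_value(20_000, 200, 500, 25))
```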
Normalization of cDNA libraries: frequent transcripts are strongly reduced [Figure panels: cDNA before normalization; cDNA after normalization]
microRNAs and the RNA-degradome [Diagram: microRNA, mRNA and mRNA ends with poly(A) tails] Next-generation sequencing, counting, BLAST
Genotyping by Sequencing: Reduced Complexity Genomic Sequencing. 1. Digestion with restriction enzyme [Diagram: DNA sample I, DNA sample II]
Methylation-Specific Digital Karyotyping (MSDK) 2. Adapter ligation, sequencing [Diagram: DNA sample I, DNA sample II] 3. Counting (CNVs), comparison, annotation
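The counting and comparison step can be sketched as a per-fragment log2 ratio of normalized tag counts between the two samples. The normalization by library size, the pseudocount, the function name and the fragment identifiers are all assumptions made for illustration; real copy number analysis would add statistics and segmentation.

```python
from math import log2


def log2_ratios(counts_a, counts_b, pseudocount=1.0):
    """log2(sample B / sample A) of library-size-normalized tag counts per fragment."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples need at least one tag")
    ratios = {}
    for fragment in counts_a.keys() | counts_b.keys():
        a = (counts_a.get(fragment, 0) + pseudocount) / total_a
        b = (counts_b.get(fragment, 0) + pseudocount) / total_b
        ratios[fragment] = log2(b / a)
    return ratios


# Hypothetical tag counts per restriction fragment; frag3 is duplicated in sample II.
sample_1 = {"frag1": 100, "frag2": 100, "frag3": 100}
sample_2 = {"frag1": 100, "frag2": 100, "frag3": 200}
ratios = log2_ratios(sample_1, sample_2)
print(max(ratios, key=ratios.get))
```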
Methylation-Specific Digital Karyotyping (MSDK) 1. Digestion with methylation-sensitive restriction enzyme [Diagram: DNA sample I and DNA sample II with methylated sites (M)]
Methylation-Specific Digital Karyotyping (MSDK) 2. Adapter ligation, sequencing [Diagram: DNA sample I, DNA sample II] 3. Counting, annotation: counting, BLAST, statistics
Exome sequencing of genomic DNA [Diagram: exons and introns] Fragmentation (100-300 bp); denaturation; binding to exon-specific, matrix-bound oligos; elution, sequencing, bioinformatics