
A. Willingham Affymetrix, Inc

Mapping Sites of Transcription Across the Drosophila Genome Using High Resolution Tiling Microarrays LBNL, Berkeley CA August 20, 2007. A. Willingham Affymetrix, Inc. Presentation Outline. I. Affymetrix’s Contribution to Specific Aims and Milestones II. Previous Studies


Presentation Transcript


  1. Mapping Sites of Transcription Across the Drosophila Genome Using High Resolution Tiling Microarrays • LBNL, Berkeley CA, August 20, 2007 • A. Willingham, Affymetrix, Inc.

  2. Presentation Outline • I. Affymetrix’s Contribution to Specific Aims and Milestones • II. Previous Studies • Manak et al analysis of developmental transcriptome • III. Initial Results for Aim I • sample preparation & data processing • first look at cell line data on 35bp arrays • pilot analysis of brand-new 7bp arrays • IV. RACE-array • example of ENCODE extension analysis of genes on Chr21 & 22 • V. Summary and Steps for Moving Forward

  3. Specific Aim 1: 480 samples on 35-bp genome tiling arrays; 24 samples on 7-bp genome tiling array sets; 160 RACE-fragment pools (16,000 products) • Specific Aim 2: RNAi of 120 RNA binding proteins on arrays • Specific Aim 3: Northern blotting of ncRNA models

  4. RNA Samples and Genome Tiling Arrays

  5. Milestones

  6. Timeline for Milestones • stepwise nature of individual aims & responsibilities • involvement & interdependencies of each step • propose shifting milestones to more of a “ramp-up” model

  7. Previous Studies Manak et al. Nature Genetics, v38 Sep 2006

  8. Transcription Analysis of Early (0-24 hr) Drosophila Embryogenesis • 70% Annotated • 30% Unannotated • Manak et al. Nature Genetics, v38 Sep 2006

  9. Differential expression in Drosophila embryogenesis (~40kb region of Chromosome 3R) • [Figure: expression tracks at 2-hour intervals, 0-2 hr through 22-24 hr; labels: 19Kb, Maternally Expressed Genes (Restarted in two patterns), 5’ TSS]

  10. Unannotated transcription updates known gene annotations • Drosophila: 5’ sites predicted by transcriptional co-regulation • ~1500 genes • avg 1st intron size = ~20kb • avg 1st annotated intron = ~1.7kb • Manak et al. Nature Genetics, v38 Sep 2006

  11. Initial Results of Aim I

  12. Affymetrix sample preparation & data generation pipeline • sample treatment & QC: DNase-treat; BioAnalyzer • 1st-strand cDNA synth.: random primed; Superscript-II • 2nd-strand cDNA synth.: DNA Pol-I; save aliquot for downstream QC • label & hybridize to arrays: TdT-based end labeling; CEL file generation • signal graph generation: median-scaling; q-norm bioreps; select bandwidth • transfrag generation: select min-run; select max-gap • quality control: overlap w/ RACE; Northern blots; QPCR of cDNA • data distribution: tomeweb hosting; FTP to servers?; deliver to DCC, GEO, etc. • Note: this example highlights the method for generation of RNA maps but is similar for other applications: RNA maps of long and short RNAs; RACE-array maps; RNAi knockdown experiments; chromatin-immunoprecipitation
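The signal graph steps named on this slide (median-scaling, quantile normalization of biological replicates, and bandwidth selection) can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function names are hypothetical, the default target of 200 and bandwidth of 50 are borrowed from slide 20, and a plain sliding-window median stands in for whatever estimator the production software applies.

```python
from statistics import median


def median_scale(intensities, target=200.0):
    """Scale one array so that its median intensity equals `target`."""
    m = median(intensities)
    return [x * target / m for x in intensities]


def quantile_normalize(replicates):
    """Give equal-length replicates one shared intensity distribution.

    Each value is replaced by the mean, across replicates, of the values
    holding the same rank. Ties are broken by probe order (a simplification).
    """
    n = len(replicates[0])
    sorted_reps = [sorted(r) for r in replicates]
    rank_means = [sum(s[i] for s in sorted_reps) / len(replicates) for i in range(n)]
    normalized = []
    for r in replicates:
        order = sorted(range(n), key=lambda i: r[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def smooth(positions, values, bandwidth=50):
    """Median of all probes within +/- `bandwidth` bp of each probe.

    Quadratic in the number of probes; fine for a toy example only.
    """
    smoothed = []
    for p in positions:
        window = [v for q, v in zip(positions, values) if abs(q - p) <= bandwidth]
        smoothed.append(median(window))
    return smoothed
```

A larger bandwidth pulls in more neighbouring probes and gives a smoother graph; a bandwidth of 0 leaves each probe's own value unchanged.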

  13. Current Sample Prep (5 cell line samples completed in triplicate) (for 3 other cell lines, several samples failed) • Hosted at http://transcriptome.affymetrix.com/download/modENCODE/

  14. RNA QC by Agilent BioAnalyzer

  15. Chr2L: Transcription Expression Maps Across ~50 Kb • tracks: ML-DmD4-c1, ML-DmBG3-c2, Kc167, CME-W1-Cl8

  16. Chr2L: Transcription Expression Maps Across ~25 Kb • tracks: ML-DmD4-c1, ML-DmBG3-c2, Kc167, CME-W1-Cl8

  17. transcription in 4 Drosophila cell lines: overlapping transcription

  18. transcription in 4 Drosophila cell lines: overlapping annotation

  19. RNA Samples and Genome Tiling Arrays 7 nt resolution arrays • new 7bp design • 5 arrays, total of ~14.4 million probes • by comparison, 35bp array has ~3.1 million probes • 5bp design required 7 arrays… 40% more chips required • 1512 arrays instead of 1080 • replicates & strand not calculated in original budget • updated genome version (release 5) used for design • repeats can be masked or unmasked • virtual probes • existing 35-bp design • 1 array, total of ~3.1 million probes • Affy commercial group will produce an “updated” 2.0 design • 39bp resolution, release5-based design • however, we will continue using the current design • 35bp resolution more optimal for RNA maps • 7bp arrays have better coverage & newer design • question of $ cost per array? • comparison of nucleotide coverage (dm3, release5) • 35bp array = 111,117,940 nt • 7bp array masked = 107,355,171 nt • 7bp array unmasked = 118,523,115 nt
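The chip-count and coverage figures on this slide can be checked with a few lines of arithmetic. The variable names are illustrative, and reading 1080 as the array count for the five-chip (7bp) design is an assumption about what the slide means.

```python
# Figures quoted on the slide; names are illustrative only.
CHIPS_PER_SET_7BP = 5      # 7bp design: 5 arrays per set
CHIPS_PER_SET_5BP = 7      # 5bp design: 7 arrays per set
ARRAYS_FIVE_CHIP = 1080    # assumed: total arrays needed with the 5-chip design

sample_sets = ARRAYS_FIVE_CHIP // CHIPS_PER_SET_7BP            # hybridized sets
arrays_needed_5bp = sample_sets * CHIPS_PER_SET_5BP            # total for 7-chip design
extra_fraction = CHIPS_PER_SET_5BP / CHIPS_PER_SET_7BP - 1     # additional chips

# Nucleotide coverage (dm3, release 5) as listed on the slide.
coverage_nt = {
    "35bp": 111_117_940,
    "7bp masked": 107_355_171,
    "7bp unmasked": 118_523_115,
}
relative_coverage = {
    name: nt / coverage_nt["35bp"] for name, nt in coverage_nt.items()
}

if __name__ == "__main__":
    print(f"{sample_sets} sets -> {arrays_needed_5bp} arrays ({extra_fraction:.0%} more)")
    for name, ratio in relative_coverage.items():
        print(f"{name}: {ratio:.3f} x the 35bp array's coverage")
```

Under that reading, the 40% and 1512 figures on the slide are consistent with each other, and the masked and unmasked 7bp designs cover roughly 3% less and 7% more sequence than the 35bp array, respectively.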

  20. New 7-bp 5-chip array compared to 35-bp 1-chip array • Cherbas total RNA samples from 2 cell lines (KC & clone8) • Same labeled reactions hyb’d to 35bp and 7bp arrays • Signal graphs generated in TAS: 2 technical replicates for each sample were q-norm together • Bandwidth = 30 (7bp) or 50 (35bp), Norm target = 200 • Transfrags generated in TAS using 5% bacterial negative controls • 7bp arrays: min-run 50, max-gap 10 • 35bp arrays: min-run 50, max-gap 90 • Intersections of 7bp vs 35bp and overlap with FlyBase annotations performed in Galaxy • Hosted at: http://transcriptome.affymetrix.com/download/modENCODE/pilot_studies/Dros-7bp-pilot/ • Share with modENCODE DCC & ArrayExpress to determine whole-chromosome vs whole-chip data hosting
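To make the min-run and max-gap settings above concrete, here is a small sketch of transfrag calling. It assumes one common reading of the two parameters: positive probes are joined while the distance between neighbouring positive probes is at most max-gap, and a joined run is kept only if it spans at least min-run bp. The function name and the way length is measured (first to last positive probe position) are assumptions, not the TAS implementation.

```python
def call_transfrags(positions, signals, threshold, min_run, max_gap):
    """Join above-threshold probes into transfrags.

    positions: sorted probe coordinates (bp) on one chromosome
    signals:   smoothed signal per probe
    Returns a list of (start, end) probe positions, one per transfrag.
    """
    frags = []
    start = prev = None
    for pos, sig in zip(positions, signals):
        if sig < threshold:
            continue  # negative probe: contributes only to the gap
        if start is None:
            start = prev = pos
        elif pos - prev <= max_gap:
            prev = pos  # extend the current run
        else:
            if prev - start >= min_run:
                frags.append((start, prev))
            start = prev = pos  # gap too large: begin a new run
    if start is not None and prev - start >= min_run:
        frags.append((start, prev))
    return frags


# Toy data: probes every 7 bp, with two negative probes at 70 and 77.
positions = list(range(0, 147, 7))
signals = [50 if p in (70, 77) else 300 for p in positions]

rules_7bp = call_transfrags(positions, signals, 100, min_run=50, max_gap=10)
rules_35bp = call_transfrags(positions, signals, 100, min_run=50, max_gap=90)
```

On the same toy probes, the 7bp rule (max-gap 10) splits the region at the 21 bp dip, while the 35bp rule (max-gap 90) bridges it. The probe spacing is held fixed here only to isolate the effect of the rules.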

  21. Improved exon discrimination by transfrags from 7bp arrays

  22. Pseudo-ROC curves comparing base-pair coverage & overlap with annotated exons • five different thresholds for calculated probe false-positive rate were used • 1%...3%...5%...7%...10% (7% and 10% not shown for 35bp array) • 7bp arrays clearly have a significantly lower false-positive rate for forming transfrags from bacterial negative regions • ~4-5 fold lower than 35bp arrays • attributable to higher probe density and different min-run & max-gap rules
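One way to turn a target probe false-positive rate into a signal threshold is to take a percentile of the bacterial negative-control probe intensities. The sketch below assumes exactly that and nothing more; the function names are hypothetical and the control values are synthetic.

```python
import math


def threshold_for_fpr(negative_signals, fpr):
    """Smallest control value such that at most `fpr` of the negative
    controls lie strictly above it (probes are called positive if > threshold)."""
    if not 0 <= fpr < 1:
        raise ValueError("fpr must be in [0, 1)")
    ordered = sorted(negative_signals)
    n = len(ordered)
    # Small epsilon guards against float error such as 0.29 * 100 = 28.999...
    allowed = min(math.floor(fpr * n + 1e-9), n - 1)
    return ordered[n - allowed - 1]


def observed_fpr(negative_signals, threshold):
    """Fraction of negative-control probes called positive at `threshold`."""
    hits = sum(1 for s in negative_signals if s > threshold)
    return hits / len(negative_signals)


# Synthetic negative controls, used only to exercise the functions.
controls = list(range(1, 101))
thresholds = {fpr: threshold_for_fpr(controls, fpr) for fpr in (0.01, 0.03, 0.05, 0.07, 0.10)}
```

Each threshold would then feed a separate transfrag call, giving one point per false-positive rate on the pseudo-ROC curve.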

  23. Summary: 7bp arrays • 35bp and 7bp arrays have similar amount of bp coverage in transfrags • BUT 7bp arrays have 50-65% more transfrags • 7bp transfrags are more “fragmented” and do a better job of delineating exons with small introns • 7-bp array has better “resolution” of small exons • Intersection with annotations shows both 35bp and 7bp arrays are detecting similar amounts of transcription as measured by bp coverage
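The comparison summarized here rests on two measurements: base pairs covered by transfrags, and base pairs shared with annotated exons. The slides report doing the intersections in Galaxy; the sketch below is only a stand-alone illustration of the same interval arithmetic, using half-open (start, end) intervals and made-up coordinates.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered_bp(intervals):
    """Total number of bases covered, counting overlaps once."""
    return sum(end - start for start, end in merge(intervals))


def intersect(a, b):
    """Regions present in both interval sets."""
    a, b = merge(a), merge(b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


# Made-up example: two exons separated by a 50 bp intron.
exons = [(100, 250), (300, 500)]
transfrags_35bp = [(100, 500)]            # one transfrag bridging the intron
transfrags_7bp = [(100, 250), (300, 500)]  # two transfrags, one per exon

exonic_35 = covered_bp(intersect(transfrags_35bp, exons))
exonic_7 = covered_bp(intersect(transfrags_7bp, exons))
```

In this toy case both arrays detect the same 350 exonic bases, but the 7bp calls do so with two transfrags that leave the intron uncovered, which is the pattern the summary describes.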

  24. Improved exon discrimination by transfrags from 7bp arrays

  25. modENCODE RACE array methodology • 5’ RACE for 16,000 Drosophila genes • choice of tissues? • hybridize products (in pools of 100) to 35bp arrays • 1Mb separation between genes within a pool • confirm presence of transfrags • identify new, “rare” transfrags made detectable by PCR amplification • human ENCODE project has done a similar study on the genes present on chromosomes 21 & 22
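The pooling constraint (pools of 100 genes, with 1Mb between genes) could be met with a simple greedy assignment. The sketch below is one hypothetical way to do it, assuming the separation applies to genes on the same chromosome within a single pool; the function name and the gene tuples are invented for illustration.

```python
def assign_pools(genes, pool_size=100, min_sep=1_000_000):
    """Greedy first-fit pooling.

    genes: iterable of (name, chromosome, position) tuples.
    A gene joins the first pool that has room and whose most recent gene
    on the same chromosome is at least `min_sep` bp upstream.
    """
    pools = []  # each pool: {"genes": [names], "last": {chromosome: position}}
    for name, chrom, pos in sorted(genes, key=lambda g: (g[1], g[2])):
        for pool in pools:
            last = pool["last"].get(chrom)
            has_room = len(pool["genes"]) < pool_size
            if has_room and (last is None or pos - last >= min_sep):
                break
        else:
            pool = {"genes": [], "last": {}}
            pools.append(pool)
        pool["genes"].append(name)
        pool["last"][chrom] = pos
    return [p["genes"] for p in pools]


# Invented coordinates, for illustration only.
toy_genes = [
    ("a", "2L", 0),
    ("b", "2L", 500_000),
    ("c", "2L", 1_200_000),
    ("d", "3R", 100),
]
toy_pools = assign_pools(toy_genes)

# 16,000 genes in full pools of 100 gives the 160 pools listed under Aim 1.
expected_pool_count = 16_000 // 100
```

Because genes are processed in coordinate order, checking only the most recent gene per chromosome in each pool is enough to guarantee the separation. Dense gene clusters may force more than the minimum number of pools.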

  26. DiGeorge Critical Region 14 gene • RACE Analysis of Coding Genes • Kapranov et al. Genome Res. (2005)

  27. Conclusions • array types & applications • pilot analysis of 7bp arrays • updated for dm3-release5 genome annotation: bpmaps & IGB • sample processing pipeline & data generation • multiple applications require different types of graphs & transfrags • bandwidth0 versus smoothing (e.g. bandwidth50) • RACE array • lessons learned by ENCODE • QC and validation • some of the specific aims (Northerns, RACE) will address these • additional analysis such as RT-PCR and QPCR validation of novel transcripts • data hosted at affy-transcriptome website: • http://transcriptome.affymetrix.com/download/modENCODE/ • sharing pilot data with DCC (Nicole Washington) to facilitate the process • Steps Moving Forward • adjusting milestones? • changes in samples? (usage of 7bp versus 35bp) • shifting focus in favor of more analysis of small RNAs? • data hosting and transfer issues?

  28. Acknowledgements • AFFX Transcriptome Group: Tom Gingeras • Computation: S. Ghosh, H. Tammana, N. Garg, S. Dike, J. Cheng • Molecular Biology: I. Bell, J. Drenkow, E. Dumais, J. Dumais, R. Duttagupta, P. Kapranov, A. Willingham, J. Manak

  29. supplemental slides

  30. Kapranov et al. Science, v316 Jun 2007

  31. same intronic expression seen by all arrays

  32. value of probe density

  33. value of smoothing

  34. value of unmasked

  35. masking issue in exons

  36. unmasked regions are frequently higher
