
Peg Folta Lawrence Livermore National Laboratory 3/12/02






Presentation Transcript


  1. TRANSCRIPTOME 2002, Seattle, WA. The Integrated Molecular Analysis of Genomes and their Expression (I.M.A.G.E.) Consortium’s Data Mining Tools: Introducing the IQ. Peg Folta, Lawrence Livermore National Laboratory, 3/12/02

  2. I.M.A.G.E. maintains the world’s largest publicly available cDNA collection: 5,819,514 clones arrayed. [Chart: cumulative clones arrayed.] I.M.A.G.E. clones account for 64% of human ESTs in GenBank.

  3. The I.M.A.G.E. collection has been shaped by projects (C-GAP, MGC…). [Diagram labels: Species, Developmental state, Clone, Sequence, Tissue, Library, Method.]

  4. Redesign of the data management system. The informatics focus this year was on tools to characterize and query the collection:
     • IMAGEne: mature clustering tool
     • IMAGEne Tissue: allows searching for tissue-type dominance in clusters
     • IQ: Intelligent Query tool for mining I.M.A.G.E. data
     • Library/plate query: allows selective searching of libraries and plates
     • Problem report and query: allows users to report or query problems related to I.M.A.G.E. clones

  5. IMAGEne-Human process. [Flowchart. Labeled boxes: 2,289,020 quality I.M.A.G.E. sequences; 279,262 lower-quality I.M.A.G.E. sequences; 14,566 NCBI RefSeq sequences; 623,294 sequences; 1,676,516 sequences; remaining sequences with >50 basepairs of contiguous, non-repeat sequence; Known Clusters; I.M.A.G.E. Singletons; Candidate Clusters w/consensus. Other counts shown: 67,521; 14,566; 268,472.]
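One step in the IMAGEne-Human process keeps only sequences with more than 50 basepairs of contiguous, non-repeat sequence. The following is a minimal Python sketch of such a filter. It assumes repeats have already been masked, either as lowercase letters (soft-masking) or as N (hard-masking). Both conventions are supported by tools such as RepeatMasker. The function names are hypothetical, and this is not the IMAGEne code.

```python
import re

# Assumption: repeats were masked beforehand, either soft-masked (lowercase)
# or hard-masked (N), so only uppercase A/C/G/T count as non-repeat sequence.
_NON_REPEAT_RUN = re.compile(r"[ACGT]+")


def longest_non_repeat_run(seq: str) -> int:
    """Length of the longest stretch of unmasked, unambiguous bases."""
    return max((len(run) for run in _NON_REPEAT_RUN.findall(seq)), default=0)


def passes_length_filter(seq: str, min_run: int = 50) -> bool:
    """True if seq has more than min_run bp of contiguous, non-repeat sequence."""
    return longest_non_repeat_run(seq) > min_run


def filter_sequences(seqs: dict, min_run: int = 50) -> dict:
    """Keep only the named sequences that pass the length filter."""
    return {name: s for name, s in seqs.items() if passes_length_filter(s, min_run)}


if __name__ == "__main__":
    reads = {
        "a": "ACGT" * 15,                               # 60 bp unmasked run
        "b": "ACGT" * 10 + "acgtacgt" + "ACGT" * 10,    # split by a masked repeat
        "c": "A" * 30 + "N" * 5 + "C" * 30,             # split by hard-masked bases
    }
    print(sorted(filter_sequences(reads)))  # ['a']
```

Scanning for the longest run, rather than counting all unmasked bases, matches the slide's "contiguous" wording. A read whose unique bases are scattered between repeats is rejected.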

  6. Initial query page: construct the query.

  7. Clusters matching the query: choose your cluster.

  8. Display of the chosen cluster.

  9. Known gene clusters with full-length I.M.A.G.E. clones have doubled in number. [Chart: cluster coverage. Average gene length values shown: 1578, 3392, 2763, 3380, 1896.]

  10. Known gene cluster distribution of full-length clones (avg. length = 948).

  11. Candidate gene cluster consensus sequences and contigs are generated by CAP4. [Chart: distribution of candidate clusters. Values shown: 61,314; 4,971; 824; 227; 95; 40.]
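The slides do not describe CAP4's own consensus algorithm. As a hedged illustration of what a contig consensus sequence is, the sketch below takes reads that are assumed to be already aligned. The encoding is hypothetical: "-" marks a gap and a space marks a position the read does not cover. Each column is called by simple majority vote. This is a simplified technique chosen for illustration, not CAP4's method.

```python
from collections import Counter


def majority_consensus(aligned_reads: list, gap: str = "-", pad: str = " ") -> str:
    """Column-wise majority vote over reads that are already aligned.

    pad marks columns a read does not cover; gap marks an alignment gap.
    Columns whose majority is a gap are dropped from the consensus.
    """
    if not aligned_reads:
        return ""
    width = max(len(r) for r in aligned_reads)
    consensus = []
    for col in range(width):
        # Collect the base each covering read contributes at this column.
        votes = Counter(r[col] for r in aligned_reads if col < len(r) and r[col] != pad)
        if not votes:
            continue  # no read covers this column
        base, _ = votes.most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


if __name__ == "__main__":
    contig = ["ACGTAC", "ACCTAC", "  GTACGG", "ACGT-C"]
    print(majority_consensus(contig))  # ACGTACGG
```

In this example, the single mismatching base (C in the second read) and the single gap (in the fourth read) are both outvoted.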

  12. Candidate gene cluster characteristics.

  13. Singletons: wheat within the chaff. 305 full-insert sequences are singletons. 62,143 singletons have a 3’ poly(A) site; their average length is 547.
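The slide counts singletons with a 3’ poly(A) site but does not give the detection criteria. One simple approach is to look for a run of A's near the 3’ end. The sketch below does that with hypothetical thresholds: at least 12 A's within the last 50 bases. It assumes each sequence is in sense (5’ to 3’) orientation. It also reports the count and average length of the flagged sequences, mirroring the summary on the slide.

```python
import re


def has_polya_site(seq: str, min_run: int = 12, window: int = 50) -> bool:
    """True if a run of at least min_run A's occurs in the last `window` bases.

    Hypothetical thresholds; assumes the sequence is in sense (5'->3')
    orientation, so a poly(A) tail would appear at the 3' end.
    """
    tail = seq[-window:].upper()
    return re.search(f"A{{{min_run},}}", tail) is not None


def polya_summary(seqs: dict) -> tuple:
    """Return (count, average length) of sequences flagged as having a poly(A) site."""
    hits = [s for s in seqs.values() if has_polya_site(s)]
    avg = sum(len(s) for s in hits) / len(hits) if hits else 0.0
    return len(hits), avg


if __name__ == "__main__":
    singletons = {
        "s1": "GATTACA" * 10 + "A" * 20,  # 90 bp ending in a 20-base A run
        "s2": "GATTACA" * 12,             # no long A run
    }
    print(polya_summary(singletons))  # (1, 90.0)
```

Reads sequenced from the 3’ end usually appear as reverse complements, with a poly(T) stretch near the 5’ end. Such reads would need to be reverse-complemented before applying this check.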

  14. IMAGEne Tissue query allows searching for tissue proportions within clusters.
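Below is a minimal sketch of the kind of computation an IMAGEne Tissue query implies. For each cluster, it computes the fraction of member clones from each tissue, then returns the clusters in which a chosen tissue reaches a cutoff. The data layout ((clone_id, tissue) pairs per cluster), the function names, and the 0.5 default cutoff are all assumptions for illustration.

```python
from collections import Counter


def tissue_proportions(members: list) -> dict:
    """Fraction of a cluster's clones from each tissue.

    members: (clone_id, tissue) pairs, a hypothetical layout.
    """
    counts = Counter(tissue for _, tissue in members)
    total = sum(counts.values())
    return {tissue: n / total for tissue, n in counts.items()}


def clusters_dominated_by(clusters: dict, tissue: str, min_fraction: float = 0.5) -> list:
    """IDs of clusters in which `tissue` makes up at least min_fraction of clones."""
    return sorted(
        cid for cid, members in clusters.items()
        if members and tissue_proportions(members).get(tissue, 0.0) >= min_fraction
    )


if __name__ == "__main__":
    clusters = {
        "c1": [(1, "brain"), (2, "brain"), (3, "liver")],
        "c2": [(4, "liver"), (5, "liver")],
        "c3": [(6, "brain"), (7, "kidney"), (8, "liver"), (9, "lung")],
    }
    print(clusters_dominated_by(clusters, "brain"))  # ['c1']
```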

  15. Introducing the Intelligent Query (IQ)
     • For a given category (currently clone and library), a user can specify a query based on key database attributes.
     • The user can specify the fields returned.
     • Various result formats are available (HTML, text).
     • The initial version was rolled out last summer.
     • New functionality will be added this year (additional categories, etc.).
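The IQ query model on this slide has four parts: pick a category, filter on key attributes, choose the fields returned, and choose an output format. The sketch below mirrors that model over in-memory records. Several details are hypothetical and not taken from the IQ implementation, which queries the I.M.A.G.E. database. These include the `Query` class, the `run_query` function, the attribute names (clone_id, species, tissue), exact-match filtering, and tab-delimited text output.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical IQ-style query: category, attribute filters, fields, format."""
    category: str                                # e.g. "clone" or "library"
    filters: dict = field(default_factory=dict)  # attribute -> required value
    fields: list = field(default_factory=list)   # attributes to return
    fmt: str = "text"                            # "text" or "html"


def run_query(q: Query, data: dict) -> str:
    """Filter records of q.category by exact attribute match and render them."""
    records = data.get(q.category, [])
    rows = [
        [str(r.get(f, "")) for f in q.fields]
        for r in records
        if all(r.get(k) == v for k, v in q.filters.items())
    ]
    if q.fmt == "text":
        # Tab-delimited: one header line, then one line per matching record.
        return "\n".join("\t".join(row) for row in [q.fields] + rows)
    if q.fmt == "html":
        head = "".join(f"<th>{html.escape(f)}</th>" for f in q.fields)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
            for row in rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"
    raise ValueError(f"unsupported format: {q.fmt!r}")


if __name__ == "__main__":
    data = {"clone": [
        {"clone_id": 1, "species": "human", "tissue": "brain"},
        {"clone_id": 2, "species": "mouse", "tissue": "liver"},
    ]}
    print(run_query(Query("clone", {"species": "human"}, ["clone_id", "tissue"]), data))
```

Keeping the filter, the field selection, and the rendering separate makes it easy to add another output format or category without touching the filtering logic.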

  16. Specify a clone-based query.

  17. Next, specify which clone-centric results are returned and in what format.

  18. HTML version of clone-based query results.

  19. Specify a library-based query.

  20. Similarly, specify which library-centric results are returned.

  21. HTML version of library-based query results.

  22. Other tools to mine I.M.A.G.E. information: query for reported problems; query plates from libraries.

  23. Quality control for the historical collection

  24. QC status [chart labels: Master vs. GenBank; LLNL replication; QC ongoing].

  25. Ongoing QC results [chart labels: error in replication @ LLNL; ongoing: comparing master to GenBank].
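The ongoing QC compares master data to GenBank, but the slides do not say which comparison method is used. One common way to quantify agreement between two sequences is percent identity over a global (Needleman-Wunsch) alignment. The sketch below assumes that approach. It uses simple hypothetical scores (+1 match, -1 mismatch, -1 gap) and a hypothetical 99% cutoff for flagging a clone for review. Quadratic time and memory make it suitable for single reads, not whole-collection comparisons.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity of a Needleman-Wunsch global alignment of a and b."""

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j]: best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, counting matches and alignment columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            if a[i - 1] == b[j - 1]:
                matches += 1
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * matches / columns if columns else 100.0


def needs_review(master: str, genbank: str, min_identity: float = 99.0) -> bool:
    """Flag a clone whose two sequences agree less than the (hypothetical) cutoff."""
    return global_identity(master.upper(), genbank.upper()) < min_identity


if __name__ == "__main__":
    print(global_identity("ACGT", "ACT"))  # 75.0 (alignment ACGT / AC-T)
```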

  26. Next for I.M.A.G.E. Informatics
     • Extensive expansion of query tools and data access
     • Non-species-specific IMAGEne
     • Analysis of human cluster candidate genes and singletons
     • Redesign of the web site to make it easier to navigate
     MUCH influenced by public needs... you have a say!

  27. This work was partially funded by the NIH and was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under contract no. W-7405-Eng-48.
     Acknowledgements
     • LLNL
       • Christa Prange, I.M.A.G.E. PI
       • Tim Harsch, Amber Johnston, Julie Amundson
     • Sponsors
       • DOE, Marv Stodolsky
       • NIH, Bob Strausberg
     image.llnl.gov
