EnsMart: A Generic System for Fast and Flexible Access to Biological Data

EnsMart: A Generic System for Fast and Flexible Access to Biological Data. Arek Kasprzyk et al. (2004), Genome Research 14:160-169. EBI, Wellcome Trust. Objectives: understand the idea of a “Data Mart”; understand why this idea is useful to biology; have an idea of how EnsMart works.

Presentation Transcript


  1. EnsMart: A Generic System for Fast and Flexible Access to Biological Data. Arek Kasprzyk et al. (2004), Genome Research 14:160-169. EBI, Wellcome Trust

  2. Objectives • Understand the idea of a “Data Mart” • Understand why this idea is useful to biology • Have an idea of how EnsMart works. • Assess the significance of the EnsMart system. Will it last?

  3. Data Mart defined • A database, potentially derived from many other databases, whose primary purpose is query processing and report generation for non-technical users. • Similar to a “Data Warehouse” • Marts and warehouses are important components of “decision support systems” in business.

  4. Data Mart in EnsMart • Data collected • Standardized • Query Optimized • Presented to Users

  5. Marts – benefits • Allows good division of labor • Computers for transactions are separate from computers for queries • Interface development is separate from database development. • Biologists can be separated from computer scientists as a result of good interface design. • Produces a faster, more stable system for users

  6. Costs • Construction of the Mart is a challenging and continuous process. • New sources of data need to be incorporated and validated constantly • Trust

  7. The case for EnsMart: why now? • Growing number of different databases and opportunities (genomes, expression, protein, disease…). • Assembled, high-quality genomes are available. • “Finished” genomes can be used as references to link data from different databases consistently. • EnsMart was built to take advantage of the opportunities for cross-database queries.

  8. Inside EnsMart • 9 organisms • At least 17 different primary sources of data, many with multiple databases. • 2 kinds of “Foci”: • Genes (Ensembl, EST, Vega) • SNPs

  9. EnsMart schema [Figure: schema diagram showing one focus table with one-to-many links to the surrounding tables]

  10. EnsMart schema: another focus

  11. Schema -> Query Speed • “Central” tables, or foci, contain a binary value for each satellite table indicating whether matching data exists there. The first step in query generation uses these values to limit the range of satellite tables accessed. • These values are useful only in the query process (they take extra space and time for transactions). • As a result, many queries may not require access to satellite tables at all.
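
The flag idea on this slide can be sketched with a toy database. This is only an illustration, not the actual EnsMart schema: the table names (`gene_main`, `gene_snp`, `gene_disease`), the flag columns (`has_snp`, `has_disease`), and the data are all hypothetical, and SQLite is used purely for convenience.

```python
import sqlite3


def build_demo_mart():
    """Create a toy in-memory mart: one central table plus two satellites."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Central ("focus") table: one row per gene, with one flag per
        -- satellite saying whether that satellite holds rows for the gene.
        CREATE TABLE gene_main (
            gene_id     INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            has_snp     INTEGER NOT NULL,
            has_disease INTEGER NOT NULL
        );
        -- Satellite tables: many rows per gene.
        CREATE TABLE gene_snp     (gene_id INTEGER, snp_name TEXT);
        CREATE TABLE gene_disease (gene_id INTEGER, disease  TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO gene_main VALUES (?, ?, ?, ?)",
        [(1, "geneA", 1, 0), (2, "geneB", 0, 1), (3, "geneC", 1, 1)],
    )
    conn.executemany(
        "INSERT INTO gene_snp VALUES (?, ?)",
        [(1, "snp1"), (3, "snp2"), (3, "snp3")],
    )
    conn.executemany(
        "INSERT INTO gene_disease VALUES (?, ?)",
        [(2, "diseaseX"), (3, "diseaseY")],
    )
    return conn


def genes_with_snps_and_disease(conn):
    """Existence query answered from the central table alone (no joins)."""
    rows = conn.execute(
        "SELECT name FROM gene_main "
        "WHERE has_snp = 1 AND has_disease = 1 ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def snps_for_disease_genes(conn):
    """Detail query: the flags narrow the genes before the satellite join."""
    return conn.execute(
        "SELECT m.name, s.snp_name FROM gene_main AS m "
        "JOIN gene_snp AS s ON s.gene_id = m.gene_id "
        "WHERE m.has_snp = 1 AND m.has_disease = 1 "
        "ORDER BY s.snp_name"
    ).fetchall()


if __name__ == "__main__":
    mart = build_demo_mart()
    print(genes_with_snps_and_disease(mart))
    print(snps_for_disease_genes(mart))
```

The first query never touches a satellite table, which is the case the slide's last bullet describes. The cost is that every insert into a satellite must also keep the flags in the central table up to date, which is why such flags suit a query-only mart rather than a transactional database.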

  12. User Interfaces • Supposedly Confucian quote • "What I hear I forget. • What I see I remember. • What I do I understand."

  13. User Interfaces • MartView: website, “wizard” query construction. • MartExplorer: standalone tool, tree-based query construction. • MartShell: text-based application that uses an SQL-like query language. Can be used interactively or in batch processes. • Write your own, using the MartLib Java library

  14. MartView 1 Choose organism and focus

  15. MartView 2 Design query

  16. MartView 3 Specify Output

  17. MartExplorer

  18. MartShell

  19. Conclusions • Powerful query system for biologists. • Useful framework for software engineers. • All open source! • What about other loci such as repetitive elements? • Data validation? • Annotation updates?

  20. EnsMart Discussion • What, if any, are the problems with the foci system? • What alternatives to this system exist? • Describe a task that EnsMart could be used to accomplish. • Describe any personal experiences with EnsMart.