
by: Samson Kiware Janelle Schroeder

A Methodology for Implementing a Distributed Storage System for Structured XML data in a Health Care Environment. by: Samson Kiware Janelle Schroeder. Overview. Problems of RDBMS Why is it interesting to Health Care? Health Care XML Data Model What is Hadoop and Hypertable?


Presentation Transcript


  1. A Methodology for Implementing a Distributed Storage System for Structured XML data in a Health Care Environment. by: Samson Kiware Janelle Schroeder

  2. Overview • Problems of RDBMS • Why is it interesting to Health Care? • Health Care XML Data Model • What are Hadoop and Hypertable? • Hadoop/Hypertable Architecture • Hadoop/Hypertable Solution • Server Config File • Hypertable Schema • HQL Sample • Research Contributions • Related and Future Work • Questions and Answers

  3. Problems of RDBMS • Scalability • High cost of licensing, servers, memory and disks • Applications vary in the volume of information they need to access • Frequency of access varies: batch processing versus real time

  4. Why is it interesting to Health Care? • Requires different methods of data access: batch processing for historical decision support, trending and research, and real-time patient care • Large data applications • Write-once, read-many environment • Reduction in server cost and licensing • Centralized management of servers

  5. Health Care XML Data Model
     <Visit>
       <VisitNumber>67868687687</VisitNumber>
       <VisitDate>01/01/2008</VisitDate>
       <PrimaryPhy>Dr. Kiware</PrimaryPhy>
       <ReferringPhy>Dr. Schroeder</ReferringPhy>
       <ICD9Diagnosis>120.45</ICD9Diagnosis>
       <CPTProcedure>888.88</CPTProcedure>
     </Visit>
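A minimal sketch of how a Visit record like the one on this slide could be read into a program, assuming the closing tags are well-formed. It uses Python's standard `xml.etree.ElementTree`; the function name `parse_visit` is hypothetical and not part of the presentation.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the Visit record from the slide.
VISIT_XML = """
<Visit>
  <VisitNumber>67868687687</VisitNumber>
  <VisitDate>01/01/2008</VisitDate>
  <PrimaryPhy>Dr. Kiware</PrimaryPhy>
  <ReferringPhy>Dr. Schroeder</ReferringPhy>
  <ICD9Diagnosis>120.45</ICD9Diagnosis>
  <CPTProcedure>888.88</CPTProcedure>
</Visit>
"""


def parse_visit(xml_text):
    """Return the child elements of a <Visit> record as a plain dict."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "Visit":
        raise ValueError(f"expected <Visit>, got <{root.tag}>")
    # Each direct child becomes one key/value pair; values stay as strings.
    return {child.tag: (child.text or "").strip() for child in root}


visit = parse_visit(VISIT_XML)
print(visit["VisitNumber"], visit["ICD9Diagnosis"])
```

Keeping every value as a string avoids guessing at types (for example, whether `VisitDate` is month/day or day/month), which the slide does not specify.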

  6. What are Hadoop and Hypertable? • Hadoop is an open source distributed computing platform that provides a distributed file system (HDFS) and MapReduce batch processing across a cluster • Hypertable is an open source, high performance, scalable, distributed storage system for structured and unstructured data, modeled after Google’s BigTable

  7. Hadoop/Hypertable Architecture

  8. Hadoop/Hypertable Solution • Handles applications with large datasets • Detection of faults and quick recovery • High-throughput processing • Centralized scheduling of server tasks and execution of batch processes • Deployed on low-cost hardware • Eliminates or reduces the need for table joins • Access group mechanism - improves I/O performance

  9. Server config file
     <?xml version="1.0"?>
     <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
     <!-- Put site-specific property overrides in this file. -->
     <configuration>
       <property>
         <name>fs.default.name</name>
         <value>nematode</value>
       </property>
       <property>
         <name>mapred.job.tracker</name>
         <value>rat</value>
       </property>
       <property>
         <name>dfs.name.dir</name>
         <value>/logs</value>
       </property>

  10. Server config file (cont.)
       <property>
         <name>dfs.data.dir</name>
         <value>/data</value>
       </property>
       <property>
         <name>mapred.system.dir</name>
         <value>/mapred/system</value>
       </property>
       <property>
         <name>mapred.local.dir</name>
         <value>/MapReduceData</value>
       </property>
       <property>
         <name>mapred.tasktracker.{map|reduce}.tasks.maximum</name>
         <value>1</value>
       </property>
       <property>
         <name>dfs.hosts/dfs.hosts.exclude</name>
         <value></value>
       </property>
       <property>
         <name>mapred.hosts/mapred.hosts.exclude</name>
         <value></value>
       </property>
     </configuration>
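The configuration shown on slides 9 and 10 is a flat list of name/value properties, so it can be checked mechanically before a cluster is started. The sketch below is one way to do that, assuming the file has been joined into a single well-formed document. The helper names `load_properties` and `missing_keys` and the `REQUIRED` list are illustrative assumptions, not part of Hadoop or of the presentation; only a subset of the properties is repeated here.

```python
import xml.etree.ElementTree as ET

# Subset of the site configuration from the slides, joined into one document.
SITE_XML = """<configuration>
  <property><name>fs.default.name</name><value>nematode</value></property>
  <property><name>mapred.job.tracker</name><value>rat</value></property>
  <property><name>dfs.name.dir</name><value>/logs</value></property>
  <property><name>dfs.data.dir</name><value>/data</value></property>
</configuration>
"""

# Hypothetical list of keys this deployment wants to be set explicitly.
REQUIRED = ["fs.default.name", "mapred.job.tracker", "mapred.local.dir"]


def load_properties(xml_text):
    """Collect <property> entries into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = (value or "").strip()
    return props


def missing_keys(props, required):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not props.get(key)]


props = load_properties(SITE_XML)
print(props["fs.default.name"])
print(missing_keys(props, REQUIRED))
```

Treating an empty `<value>` as missing is a design choice of this sketch; the slides leave several values empty on purpose, so a real check would need its own list of required keys.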

  11. Hadoop/Hypertable Architecture

  12. Hypertable Schema
      hypertable> describe table Pages;
      <Schema generation="1">
        <AccessGroup name="default">
          <ColumnFamily id="1">
            <Name>refer-url</Name>
          </ColumnFamily>
          <ColumnFamily id="2">
            <Name>http-code</Name>
          </ColumnFamily>
          <ColumnFamily id="3">
            <Name>date</Name>
          </ColumnFamily>
        </AccessGroup>
      </Schema>
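The schema output above groups column families under access groups. As a rough illustration of that structure, the following sketch parses the XML printed by `describe table` and lists the column families per access group. It works only on the XML text shown on the slide and does not call any Hypertable client API; the function name `column_families` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Schema XML as printed by "describe table Pages" on the slide.
SCHEMA_XML = """
<Schema generation="1">
  <AccessGroup name="default">
    <ColumnFamily id="1"><Name>refer-url</Name></ColumnFamily>
    <ColumnFamily id="2"><Name>http-code</Name></ColumnFamily>
    <ColumnFamily id="3"><Name>date</Name></ColumnFamily>
  </AccessGroup>
</Schema>
"""


def column_families(schema_xml):
    """Map each access group name to its list of column family names."""
    root = ET.fromstring(schema_xml.strip())
    groups = {}
    for group in root.findall("AccessGroup"):
        names = [
            family.findtext("Name", default="").strip()
            for family in group.findall("ColumnFamily")
        ]
        groups[group.get("name")] = names
    return groups


groups = column_families(SCHEMA_XML)
print(groups)
```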

  13. HQL Sample • Sample for a patient-centric model, where the MRN (the patient's medical record number) serves as the row key and column families are created for different categories of health information: • See next slide

  14. HQL Sample (cont.)
      CREATE TABLE "Patient"
      ROWKEY: <MRN>12234434</MRN>
      Column_Family_Name: "Visit", "Insurance", "Genetic Profile"
      Column: "Visit"
      Value:
        <Visit>
          <VisitNumber>67868687687</VisitNumber>
          <VisitDate>01/01/2008</VisitDate>
          <PrimaryPhy>Dr. Kiware</PrimaryPhy>
          <ReferringPhy>Dr. Schroeder</ReferringPhy>
          <ICD9Diagnosis>120.45</ICD9Diagnosis>
          <CPTProcedure>888.88</CPTProcedure>
        </Visit>
      Timestamp: (TODAY'S DATE/TIME)
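The patient-centric layout above (row key, column family, timestamped value) can be illustrated with a small in-memory model. This is a toy stand-in written for this transcript, not the Hypertable API and not HQL: the class `PatientTable`, its methods, and the second visit number used in the example are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


class PatientTable:
    """Toy model of a table keyed by MRN with timestamped cells per column family."""

    def __init__(self, column_families):
        self.column_families = set(column_families)
        # row key -> column family -> list of (timestamp, value) cells
        self._rows = defaultdict(lambda: defaultdict(list))

    def insert(self, row_key, family, value, timestamp=None):
        """Append a new cell version; earlier versions are kept (write once, read many)."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        ts = timestamp or datetime.now(timezone.utc)
        self._rows[row_key][family].append((ts, value))

    def latest(self, row_key, family):
        """Return the value with the newest timestamp, or None if there is no cell."""
        cells = self._rows.get(row_key, {}).get(family, [])
        if not cells:
            return None
        return max(cells, key=lambda cell: cell[0])[1]


table = PatientTable(["Visit", "Insurance", "Genetic Profile"])
table.insert("12234434", "Visit",
             "<Visit><VisitNumber>67868687687</VisitNumber></Visit>",
             datetime(2008, 1, 1, tzinfo=timezone.utc))
# Hypothetical later visit for the same patient, stored under the same row key.
table.insert("12234434", "Visit",
             "<Visit><VisitNumber>67868687688</VisitNumber></Visit>",
             datetime(2008, 2, 1, tzinfo=timezone.utc))
print(table.latest("12234434", "Visit"))
```

Because every visit for a patient lives under one row key, reading a patient's history is a single-row lookup rather than a join across visit and patient tables, which is the point made on the solution slide.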

  15. HQL Sample • Add insert statement

  16. Research Contributions • Installed Hadoop and Hypertable, an open source distributed storage system, in a cluster environment • Created documentation for installation in Linux and Windows environments • Designed and implemented a data model for a health care environment

  17. Related and Future Work • Google’s BigTable • Web Crawler • Solution for managing XML schema versions • Conduct comparative performance research • Investigate job tracking and task scheduling • Apply dimensional data warehousing techniques (Type 1, Type 2 or Type 3 slowly changing dimensions)

  18. Questions and Answers
