Cancer Bioinformatics Grid (caBIG) CANS 2006 Chicago, Illinois. Shannon Hastings hastings@bmi.osu.edu Department of Biomedical Informatics Ohio State University. National Cancer Institute 2015 Goal. Relieve suffering and death due to cancer by the year 2015.
Cancer Bioinformatics Grid (caBIG)
CANS 2006, Chicago, Illinois
Shannon Hastings
hastings@bmi.osu.edu
Department of Biomedical Informatics
Ohio State University
National Cancer Institute 2015 Goal
• Relieve suffering and death due to cancer by the year 2015
Cancer Biomedical Informatics Grid (caBIG™)
The cancer Biomedical Informatics Grid (caBIG™) is a voluntary network, or grid, connecting individuals and institutions to enable the sharing of data and tools, creating a World Wide Web of cancer research. The goal is to speed the delivery of innovative approaches for the prevention and treatment of cancer. The infrastructure and tools created by caBIG™ also have broad utility outside the cancer community.
• National Cancer Institute initiative
• Over 800 participants
• Over 80 organizations
• Over 70 projects
Origins of caBIG
• Need: Enable investigators and research teams nationwide to combine and leverage their findings and expertise in order to meet the NCI 2015 Goal.
• Strategy: Create a scalable, actively managed organization that will connect members of the NCI-supported cancer enterprise by building a biomedical informatics network.
caBIG Overview
• Common, widely distributed infrastructure that permits the cancer research community to focus on innovation
• Shared, harmonized set of terminology, data elements, and data models that facilitate information exchange
• Collection of interoperable applications developed to common standards
• Cancer research data is available for mining and integration
Interoperability
• The ability of multiple systems to exchange information (syntactic interoperability) and to use the information that has been exchanged (semantic interoperability)
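A minimal sketch of the distinction, under stated assumptions: two hypothetical sites send records that both parse as JSON (the syntactic step) but encode the same concept differently, so each value is mapped onto a shared set of permissible values (the semantic step). The field names, mappings, and `PERMISSIBLE_SEX` are invented for illustration and are not taken from caDSR or EVS.

```python
import json

# Hypothetical common data element: a controlled set of permissible values.
PERMISSIBLE_SEX = {"Male", "Female", "Unknown"}

# Hypothetical local encodings used by two different sites.
SITE_A_MAP = {"M": "Male", "F": "Female"}
SITE_B_MAP = {1: "Male", 2: "Female"}


def harmonize(raw: str, key: str, mapping: dict) -> str:
    """Parse a record, then map its local value onto the shared value set."""
    record = json.loads(raw)  # syntactic step: both sides agree on the wire format
    value = mapping.get(record.get(key), "Unknown")  # semantic step: agree on meaning
    if value not in PERMISSIBLE_SEX:
        raise ValueError(f"{value!r} is not a permissible value")
    return value


site_a = harmonize('{"sex": "M"}', "sex", SITE_A_MAP)
site_b = harmonize('{"gender": 1}', "gender", SITE_B_MAP)
```

Both records are readable without the mappings, but only after harmonization can the two values be compared or pooled.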
[Figure: caBIG Compatibility Guidelines, with three areas labeled SEMANTIC and one labeled SYNTACTIC]
What is caGrid?
• Development project of the Architecture Workspace, aimed at helping define and implement Gold compliance (the highest level of caBIG compatibility)
• Gold compliance creates the G in caBIG
  • Gold => Grid => connecting Silver-compliant systems
• Gold compliance places no requirements on implementation technology
  • Specifications will be created defining requirements for interoperability
• caGrid provides core infrastructure and tooling to provide “a way” to achieve Gold compliance
[Figure: caGrid Conceptual View. Labels: Grid-Enabled Client; Grid Portal; Tool 1 to Tool 4; Analytical Service; Grid Data Service; Gene Database; Protein Database; caArray; Image; Microarray; Research Center; NCICB; Grid Services Infrastructure (Metadata, Registry, Query, Invocation, Security, etc.)]
caGrid Data Description Infrastructure
• Client and service APIs are object oriented, and operate over well-defined and curated data types
• Objects are defined in UML and converted into ISO/IEC 11179 Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR)
• Object definitions draw from vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described
• XML serializations of objects adhere to XML schemas registered in the Global Model Exchange (GME)
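The last point can be sketched as follows: an object is serialized to XML under a namespace that identifies its schema, and a receiver checks that namespace before rebuilding the object. This is an illustration only. The `Gene` class, its fields, and the namespace URI are assumptions, and no real caDSR or GME call is made.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical namespace standing in for a schema registered in the GME.
GENE_NS = "gme://example.org/1.0/gene"


@dataclass
class Gene:
    """Hypothetical domain object; in caGrid it would come from a UML model."""
    symbol: str
    taxon: str


def to_xml(gene: Gene) -> str:
    """Serialize the object with every element qualified by the schema namespace."""
    root = ET.Element(f"{{{GENE_NS}}}Gene")
    ET.SubElement(root, f"{{{GENE_NS}}}symbol").text = gene.symbol
    ET.SubElement(root, f"{{{GENE_NS}}}taxon").text = gene.taxon
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> Gene:
    """Rebuild the object, rejecting documents from an unexpected namespace."""
    root = ET.fromstring(text)
    if root.tag != f"{{{GENE_NS}}}Gene":
        raise ValueError(f"unexpected root element {root.tag!r}")
    ns = {"g": GENE_NS}
    return Gene(
        symbol=root.findtext("g:symbol", namespaces=ns),
        taxon=root.findtext("g:taxon", namespaces=ns),
    )


round_tripped = from_xml(to_xml(Gene(symbol="TP53", taxon="Homo sapiens")))
```

Full schema validation is out of scope for the standard library; the namespace check only shows where such validation would sit.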
caGrid Components
• Leverage existing technologies:
  • caDSR, EVS, Mobius GME: common data elements, controlled vocabularies, schema management
  • Globus Toolkit (currently version 4.0.3)
    • Core grid services infrastructure
    • Service deployment, service registry, invocation, base security infrastructure
• Additional Core Infrastructure
  • Higher-level security services (Dorian, GTS, GridGrouper)
  • Grid service access to metadata components (caDSR, GME, etc.)
  • Workflow, Identifier services
• Service Provider Tooling (Introduce)
  • Graphical service development and configuration environment
  • Abstractions from service infrastructure for Data and Analytical services
  • Deployment wizards
• Client Tooling
  • High-level APIs for interacting with core components and services
  • Graphical tools
Grid Authentication and Authorization with Reliably Distributed Services (GAARDS)
• The GAARDS security infrastructure provides services and tools for the administration and enforcement of security policy in an enterprise grid.
• Developed on top of the Globus Toolkit
• Extends the Grid Security Infrastructure (GSI)
• Provides enterprise services and administrative tools for:
  • Grid user management
  • Identity federation
  • Trust management
  • Group/VO management
  • Access control policy management and enforcement
  • Integration between existing security domains and the grid security domain
• Security infrastructure for the Cancer Biomedical Informatics Grid (caBIG™)
GAARDS Services
• Dorian
  • Grid user account management
  • Integration point between external security domains and the grid
  • Allows accounts managed in external domains to be federated and managed in the grid
  • Allows users to use their existing credentials (external to the grid) to authenticate to the grid
• Grid Trust Service (GTS)
  • Creation and management of a federated trust fabric
  • Supports applications and services in deciding whether or not signers of digital credentials/user attributes can be trusted
  • Supports the provisioning of trusted certificate authorities and corresponding CRLs
• Grid Grouper
  • Group management service for the grid
  • Provides a group-based authorization solution for the grid
  • Enforces authorization policy based on membership in groups
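The two decisions described above (is the signer trusted, and is the caller in the right group) can be sketched with in-memory stand-ins. This is a toy model under stated assumptions: `TrustFabric`, `GroupRegistry`, and `authorize` are hypothetical names, and none of this is the GTS or Grid Grouper API.

```python
from dataclasses import dataclass, field


@dataclass
class TrustFabric:
    """Toy stand-in for a trust fabric: trusted CAs plus per-CA revocation lists."""
    trusted_cas: set = field(default_factory=set)
    revoked: dict = field(default_factory=dict)  # CA name -> set of revoked serials

    def is_trusted(self, issuer: str, serial: int) -> bool:
        # A credential is trusted only if its signer is a trusted CA
        # and the credential is not on that CA's revocation list.
        return issuer in self.trusted_cas and serial not in self.revoked.get(issuer, set())


@dataclass
class GroupRegistry:
    """Toy stand-in for a group service: group name -> member identities."""
    members: dict = field(default_factory=dict)

    def is_member(self, identity: str, group: str) -> bool:
        return identity in self.members.get(group, set())


def authorize(identity, issuer, serial, required_group, fabric, groups) -> bool:
    """Allow a call only for a trusted credential whose holder is in the group."""
    return fabric.is_trusted(issuer, serial) and groups.is_member(identity, required_group)


fabric = TrustFabric(trusted_cas={"Example CA"}, revoked={"Example CA": {13}})
groups = GroupRegistry(members={"analysts": {"alice"}})
allowed = authorize("alice", "Example CA", 7, "analysts", fabric, groups)
revoked = authorize("alice", "Example CA", 13, "analysts", fabric, groups)
```

Keeping the trust check and the group check separate mirrors the split between the two services: either can change its data without the other being touched.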
Accessing caGrid workflow
[Figure: BPEL workflow document and workflow inputs; Workflow Management Service with BPEL engine; data service @ uchicago.edu; analytic services @ duke.edu and @ osu.edu; workflow results]
• Workflow management service
• Sharing workflows
• Get workflow status
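A rough sketch of the submit, status, and results interaction named on this slide. It is an in-memory toy, not the caGrid workflow service: there is no BPEL here, the steps are plain Python callables standing in for remote data and analytic services, and the class name and status strings are assumptions.

```python
import uuid
from typing import Any, Callable, Dict, List


class WorkflowManager:
    """Toy workflow manager: runs steps in order and records status and result."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}
        self._results: Dict[str, Any] = {}

    def submit(self, steps: List[Callable[[Any], Any]], workflow_input: Any) -> str:
        """Run each step on the previous step's output; return a workflow id."""
        wf_id = str(uuid.uuid4())
        self._status[wf_id] = "ACTIVE"
        try:
            value = workflow_input
            for step in steps:
                value = step(value)
        except Exception as exc:  # record the failure instead of propagating it
            self._status[wf_id] = "FAILED"
            self._results[wf_id] = exc
        else:
            self._status[wf_id] = "DONE"
            self._results[wf_id] = value
        return wf_id

    def status(self, wf_id: str) -> str:
        return self._status[wf_id]

    def result(self, wf_id: str) -> Any:
        return self._results[wf_id]


manager = WorkflowManager()
# Hypothetical steps: a "data service" query followed by an "analytic service".
fetch = lambda query: [3, 1, 2] if query == "expression" else []
analyze = lambda values: sorted(values)
wf = manager.submit([fetch, analyze], "expression")
```

A real engine would run asynchronously, so `status` could also report an in-progress state; here every workflow has finished by the time `submit` returns.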
Introduce Graphical Development Environment (GDE)
• GUI for creating and manipulating a grid service
• Provides a simple way to create a service skeleton that a developer can then implement, build, and deploy
• Automatic code generation of a complete caBIG-compliant grid service, which is configured to provide:
  • Security
  • Advertisement
  • Discovery
  • Complete client API
• Provides a set of tools that enable the developer to add/remove/modify/import methods of the service, as well as create sub-services
• Automatic generation of all the required code: Globus grid service code/configuration, service configuration, the client implementation, and a stubbed implementation of the service
Introduce Generated Grid Service Architecture
• Base service is a GT4-based, WSRF-capable grid service.
• Utilize compositional inheritance (in lieu of non-standard port type extensions) to enable the service to inherit required features such as providing service security metadata and access to resource properties.
• Utilize JNDI for server-side configuration properties, and for resources and resource properties.
• Provide client-side and service-side wrappers which implement the service designer's interface as opposed to the document literal interface generated by Axis.
• Provide metadata registration to the index service by configuring the Resource to register its service groups to a predefined caGrid MDS-based Index Service.
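The wrapper idea can be illustrated with a small sketch: a low-level stub accepts and returns whole request and response documents, and a thin client wrapper exposes the method signature the service designer actually wrote. The sketch is in Python for brevity, whereas the generated code described here is Java on Axis; every class, operation, and field name below is hypothetical.

```python
class DocumentLiteralStub:
    """Stand-in for a generated stub: one request document in, one response document out."""

    def invoke(self, request: dict) -> dict:
        if request.get("operation") == "countGenes":
            symbols = request["countGenesRequest"]["symbols"]
            return {"countGenesResponse": {"response": len(symbols)}}
        raise ValueError(f"unknown operation {request.get('operation')!r}")


class GeneServiceClient:
    """Wrapper exposing the designer's interface instead of raw documents."""

    def __init__(self, stub: DocumentLiteralStub) -> None:
        self._stub = stub

    def count_genes(self, symbols: list) -> int:
        # Box the arguments into the request document the stub expects.
        request = {"operation": "countGenes", "countGenesRequest": {"symbols": list(symbols)}}
        response = self._stub.invoke(request)
        # Unbox the response document back into a plain return value.
        return response["countGenesResponse"]["response"]


client = GeneServiceClient(DocumentLiteralStub())
gene_count = client.count_genes(["TP53", "BRCA1"])
```

Callers work with `count_genes(symbols)` and never see the request and response envelopes, which is the point of generating the wrapper.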
Collaborating Architects and Developers
• Ohio State University
• Argonne National Lab
• Duke University
• Georgetown University
• Semantic Bits
Project Resources and Communication
• caBIG at NCI
  • http://cabig.nci.nih.gov
• Globus Dev
  • http://dev.globus.org
• caGrid 1.0 GForge Home:
  • Feature Requests
  • Bug Reports
  • Discussion Forums
  • Public Wiki
  • Quality Dashboards
  • Downloads / Source Repository
  • http://gforge.nci.nih.gov/projects/cagrid-1-0/
• caGrid Users Mailing List
  • https://list.nih.gov/archives/cagrid_users-l.html
  • cagrid_users-l@list.nih.gov