
Introduction to gCube: promoting an ecosystem approach to controlled resource sharing

Introduction to gCube: promoting an ecosystem approach to controlled resource sharing. gCube - FAO's Information Systems Architecture Forum, FAO, Rome, 25 January 2011. Pasquale Pagano, Pasquale.pagano@isti.cnr.it. www.d4science.eu.

Presentation Transcript


  1. Introduction to gCube: promoting an ecosystem approach to controlled resource sharing. gCube - FAO's Information Systems Architecture Forum, FAO, Rome, 25 January 2011. Pasquale Pagano, Pasquale.pagano@isti.cnr.it, www.d4science.eu

  2. Outline

  3. D4Science-II Challenges D4Science Ecosystem Challenges • Heterogeneous resources • Heterogeneous computational platforms • Rich set of legacy applications • Multiple administrative domains • Evolving communities [Diagram labels: D4Science Infrastructure, Portal, Group A, Group B, Group C, DRIVER, GENESI-DR, FAO Geonetwork, FAO FIGIS, AquaMaps, INSPIRE, Hadoop, EGEE/EGI]

  4. D4Science-II Challenges D4Science II Current Status. Data Resources • File-based data: 64 large data collections gathering 164,087 objects • Tabular data: time series (catch statistics); environmental authority (properties of ~250K marine areas); species environmental envelope (environmental description of 11k species); species assignment (assignment of a species to cell areas, ~2.75 billion records) • Geospatial data: environmental layers (SST, salinity, sea ice concentration, distance to land, etc.) and several thousand species distribution layers (~25k layers) • Other data resources: metadata collections in multiple schemas (163); full-text, forward, and geospatial indexes (165); transformation programs (41). HW and SW Resources

  5. D4Science-II Challenges From a testbed to a production ecosystem [Timeline: Oct. ’04, Nov. ’07, Jan. ’08, Oct. ’09, Dec. ’09, Sept. ’11]

  6. gCube identity: the starting point gCube Open Platform • gCube is physically distributed across • Services: import, collect, store, index, transform, search, describe, manage, and annotate data • Libraries: server-side libraries, client-side libraries, plugins • Portlets: interactive components, mediators • Designed for working at large scale • over wide-area links and across administrative domains • to cope with the computational demands • can be easily deployed in a single site

  7. gCube identity: the starting point gCube Release Cycle [Diagram labels: Procedure, Bug Fixing, Patching the Production] • Release 2.2.2: 23 subsystems, 307 software packages, 22 full-time developers, 4 testers

  8. gCube identity: the starting point gCube Software License: EUPL

  9. gCube e-Infrastructure • A gCube e-Infrastructure promotes effective consumption of shared hardware, data, and software resources to facilitate research collaborations that span institutions, disciplines, and countries within a coherent model, regardless of the location of their research facilities • It extends the e-Infrastructure concept by promoting sharing and collaboration and enforcing policies • It increases flexibility in the organization of community resources with Virtual Research Environments • gCube e-Infrastructure enabler: the VRE innovation

  10. Virtual Organization • A Virtual Organization (VO) specifies how a set of users can access a set of resources • what is shared • who is allowed to share • the conditions under which sharing can occur • Is the VO adequate to represent a growing aggregation of resources tailored to satisfy the evolving needs of the user community? • No, it is not! • Common scenarios • Data needs to be assessed before it is made publicly exploitable by the VO members. • A restricted set of users has to collaborate to refine processes and implement showcases. • Products generated through elaboration of data or simulation have to be validated by expert users.

  11. Virtual Research Environment • A Virtual Research Environment (VRE) is • a distributed and dynamically created environment • where a subset of resources can be assigned to a subset of users via interfaces • for a limited timeframe • at little or no cost for the providers of the infrastructure • gCube is a first example of a VRE management system [Diagram labels: VO, VRE 1, VRE 2]
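
Editor's sketch: the relationship between a VO and a VRE described in slides 10 and 11 can be pictured as a small data model. This is a minimal illustration, not gCube code; the class names, fields, users, and resource names are all hypothetical. It only shows the idea that a VRE hands a subset of the VO's resources to a subset of its users for a limited timeframe.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class VirtualOrganization:
    """What is shared and who may share it (hypothetical model)."""
    users: frozenset
    resources: frozenset


@dataclass(frozen=True)
class VirtualResearchEnvironment:
    """A time-limited slice of a VO: some users, some resources."""
    vo: VirtualOrganization
    users: frozenset
    resources: frozenset
    valid_from: datetime
    valid_until: datetime

    def __post_init__(self):
        # A VRE may only hand out what its VO already shares.
        if not self.users <= self.vo.users:
            raise ValueError("VRE users must belong to the VO")
        if not self.resources <= self.vo.resources:
            raise ValueError("VRE resources must belong to the VO")

    def can_access(self, user, resource, when):
        """True if `user` may use `resource` at time `when`."""
        return (
            user in self.users
            and resource in self.resources
            and self.valid_from <= when <= self.valid_until
        )


# Hypothetical users and resources, for illustration only.
vo = VirtualOrganization(
    users=frozenset({"alice", "bob", "carol"}),
    resources=frozenset({"catch-statistics", "species-layers"}),
)
vre = VirtualResearchEnvironment(
    vo=vo,
    users=frozenset({"alice", "bob"}),
    resources=frozenset({"catch-statistics"}),
    valid_from=datetime(2011, 1, 1),
    valid_until=datetime(2011, 6, 30),
)
```

In this sketch, `vre.can_access("alice", "catch-statistics", datetime(2011, 3, 1))` is true, while the same call for "carol", for "species-layers", or for a date after the VRE expires is false.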

  12. How does it work?

  13. Why is sharing through VREs key? • A Virtual Research Environment (VRE) supports cooperative activities • Metadata cleaning, enrichment, and transformation by exploiting mapping schemas, controlled vocabularies, thesauri, and ontologies • Process refinement and showcase implementation (restricted to a set of users) • Data assessment (required to make data publicly exploitable by VO members) • Validation by expert users of products generated through data elaboration or simulation

  14. Why is sharing through VREs key? • The integrated environment of a VRE offers a set of functions to support and perform activities: • the ability to integrate heterogeneous data and services, • the ability to process information on demand and ingest the results, • to share data and processes with other users, • to customize collections of information, • to store user actions and exploit them for further use, • to aggregate relevant information into ad-hoc information sources and keep them updated.

  15. Why is sharing through VREs key? • Through the VRE, groups of users have controlled access to distributed data and services integrated under a personalised interface.

  16. Building Virtual Research Environments

  17. VRE Facilities • Workspace: a virtual desktop to organize the working environment • Species Maps Generation and Time Series Management: tools supporting specific tasks • Report Management: a virtual live document to describe research results [Diagram labels, repeated for each facility: Search, Annotation, Visualisation, Storage, Transformation, …]

  18. Workspace • A collaboration-oriented suite providing • seamless access to, and organisation of, a rich array of objects (e.g. Information Objects, Queries, Files, Templates) • mediation between external-world objects, systems, and infrastructures (import/export/publishing) • common file-manager interactions (drag & drop, contextual menu) • an effective sharing facility for rich objects

  19. Species Distribution Maps Generation • AquaMaps is an application* • tailored to predict global distributions of marine species, initially designed for marine mammals and subsequently generalised to marine species, • that generates color-coded species range maps using half-degree latitude and longitude blocks • by interfacing several databases and repository providers * Algorithm by Kaschner et al. 2006
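
Editor's sketch: the half-degree latitude and longitude blocks mentioned in slide 19 can be illustrated by mapping a coordinate to the grid cell that contains it. The row and column numbering used here is a hypothetical choice made for the example; it is not the cell coding used by AquaMaps or gCube.

```python
import math

CELL_SIZE = 0.5                 # degrees per block, as stated in the slide
COLS = int(360 / CELL_SIZE)     # blocks around the globe
ROWS = int(180 / CELL_SIZE)     # blocks from pole to pole


def half_degree_cell(lat, lon):
    """Return (row, col) of the half-degree block containing a point.

    Row 0 starts at latitude -90 and column 0 at longitude -180.
    This numbering is an assumption made for illustration only.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    # Points on the upper edges (lat 90, lon 180) fall into the last block.
    row = min(int(math.floor((lat + 90.0) / CELL_SIZE)), ROWS - 1)
    col = min(int(math.floor((lon + 180.0) / CELL_SIZE)), COLS - 1)
    return row, col
```

For example, `half_degree_cell(41.9, 12.5)` locates the block that contains Rome on this hypothetical grid.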

  20. Species Distribution Maps Generation • AquaMaps execution is based on the gCube Ecological Niche Modelling Suite, which allows the extrapolation of known species occurrences • to determine environmental envelopes (species tolerances) • to predict future distributions by matching species tolerances against local environmental conditions (e.g. climate change and sea pollution) • Very large volume of input and output data: HSPEC native range 56,468,301; HSPEC suitable range 114,989,360 • Very large number of computations: one multispecies map computed on 6,188 half-degree cells (of over 170k) and 2,540 species requires 125 million computations (Eli E. Agbayani, FishBase Project/INCOFISH WP1, WorldFish Center)
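
Editor's sketch: "matching species tolerances against local environmental conditions" can be illustrated with a simple envelope model. The trapezoidal response curve and the multiplication across parameters are assumptions chosen for this example; the slide does not specify the formula, and this is not the gCube Ecological Niche Modelling Suite. Parameter names and bounds are hypothetical.

```python
def suitability(value, lo, pref_lo, pref_hi, hi):
    """Trapezoidal response: 0 outside [lo, hi], 1 inside [pref_lo, pref_hi]."""
    if value < lo or value > hi:
        return 0.0
    if value < pref_lo:
        return (value - lo) / (pref_lo - lo)
    if value > pref_hi:
        return (hi - value) / (hi - pref_hi)
    return 1.0


def cell_probability(envelope, conditions):
    """Combine per-parameter suitabilities for one species in one cell."""
    p = 1.0
    for name, bounds in envelope.items():
        p *= suitability(conditions[name], *bounds)
    return p


# Hypothetical envelope: (min, preferred min, preferred max, max).
envelope = {
    "sst": (2.0, 10.0, 20.0, 28.0),
    "salinity": (30.0, 33.0, 36.0, 39.0),
}
cool_cell = {"sst": 6.0, "salinity": 34.0}
hot_cell = {"sst": 30.0, "salinity": 34.0}

p_cool = cell_probability(envelope, cool_cell)   # partly suitable
p_hot = cell_probability(envelope, hot_cell)     # outside the envelope

# Scale check using the figures quoted in the slide.
pairs = 6188 * 2540   # cell-species pairs for one multispecies map
```

For scale: 6,188 cells times 2,540 species gives 15,717,520 cell-species pairs, so the 125 million computations quoted in the slide correspond to about eight evaluations per pair. Reading those as one evaluation per environmental parameter is an inference, not something the slide states.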

  21. Time Series Management • Offers a set of tools to manage capture statistics • Supports the complete time series (TS) lifecycle • Supports validation, curation, and analysis • Provides support for data reallocation • Produces uniform data sets

  22. Time Series • Offers a set of tools to operate on capture statistics • Support for multiple key families • Filtering, grouping, and aggregation • Union • Mining • Automatic production of provenance information
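
Editor's sketch: the operations listed in slide 22 (filtering, grouping, aggregation, union, provenance) can be illustrated on a few rows of capture statistics. The function names, column names, and figures are hypothetical; this is plain Python rather than the gCube Time Series tools.

```python
from collections import defaultdict


def filter_rows(rows, **criteria):
    """Keep rows whose columns equal every given criterion."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]


def aggregate(rows, group_by, measure):
    """Group rows by the given columns and sum the measure column."""
    totals = defaultdict(float)
    for r in rows:
        totals[tuple(r[k] for k in group_by)] += r[measure]
    return dict(totals)


def union(*series):
    """Concatenate named series, tagging each row with its source."""
    merged = []
    for name, rows in series:
        merged.extend({**r, "source": name} for r in rows)
    return merged


# Hypothetical capture statistics (catch in tonnes).
series_a = [
    {"year": 2008, "species": "COD", "area": "27", "catch": 120.0},
    {"year": 2009, "species": "COD", "area": "27", "catch": 100.0},
]
series_b = [
    {"year": 2008, "species": "COD", "area": "21", "catch": 30.0},
    {"year": 2008, "species": "HAD", "area": "21", "catch": 12.5},
]

merged = union(("series_a", series_a), ("series_b", series_b))
cod_by_year = aggregate(filter_rows(merged, species="COD"), ["year"], "catch")
```

The `source` column added by `union` is a very small stand-in for provenance: every merged row still records which series it came from.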

  23. Report Management • A collaboration-oriented suite providing for • template-oriented, feature-rich and flexible document format definition • effective and infrastructure-integrated report compilation (drag & drop workspace items) • collaborative and distributed editing (workspace based) • standard-based report materialisation (HTML, OpenXML)

  24. gCube model • Wide-area computing based on shared computing, data, and service resources • provisioned as a federation, but resources can be acquired by the infrastructure • added value for consumers and providers • ownership is decentralised but control is autonomic • resources are heterogeneous • security is pervasive but mostly hidden by gCube middleware • Application model is dominantly resource-oriented • VREs are profiled as aggregations of resources that are dynamically deployed, executed, and terminated • are interactive • are built on shareable resources (including workflows) in their own right • are published and discoverable • may integrate storage elements sited at community sites • may host applications that can also be executed by interfacing with classic grid and cloud • Deployment model: dynamic and autonomic • Development platform: complete service programming abstraction

  25. Interoperability: Assumptions • Consolidated facts: • Very rich applications and data collections are currently maintained by a multitude of authoritative providers • Different problems require different execution paradigms: batch, map-reduce, synchronous call, message-queue, … • Key distributed computation technologies exist: grid (gLite and Globus), distributed resource management (Condor), clusters (Hadoop), … • Several standards are adopted in the same domain • Societal observations • A rich variety of protocols, models, and formats creates barriers to the use of resources and dramatically delays new exploitation patterns • Technical observations • Heterogeneity of protocols, models, and formats increases load • Load increases failures • gCube interoperability framework: the challenge

  26. Interoperability: Landscape • Unstructured data: blob (binary) and textual files • Structured data: tabular, statistical, geospatial, temporal, and textual data • Compound data: data composed of unstructured and structured data entities [Diagram label: security]

  27. Interoperability: gCube Vision • gCube objectives: • hide heterogeneity, i.e. abstract over differences in location, protocol, and model; • embrace heterogeneity, i.e. allow for multiple locations, protocols, and models; • Technical goals • no bottlenecks: scale no less than the interfaced resources • no outages: keep failures partial and temporary • autonomicity: system reacts and recovers gCube interoperability framework: the vision

  28. Hiding Heterogeneity • Heterogeneous resources are virtually accessible in a common ecosystem of resources • regardless of their locations, technologies, and protocols • Different communities have access to different views • according to the conditions under which the sharing can occur • Each community can define its own VRE • for a limited timeframe and at no cost for the providers of the resource • Several VREs can coexist • without interfering with each other, even when competing for the same resources

  29. Embracing Heterogeneity • Approaches and solutions to achieve interoperability: • Blackboard-based: asynchronous communication between components in a system; one protocol to read/write and one language to specify messages • Wrapper/Mediator-based: translates one interface for a component into a compatible interface • Proxy-based: exposes the same interface but allows additional operations over received calls • Adaptor-based: provides a unified interface to a set of other components' interfaces and encapsulates how this set of objects interacts • Broker-based: specialises an Adaptor by coordinating communication • gCube interoperability framework: the approach
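
Editor's sketch: two of the patterns named in slide 29 can be contrasted in a few lines. A wrapper/mediator changes the interface of a component; a proxy keeps the interface and adds an operation on each call. All class and method names are hypothetical and are not part of gCube.

```python
class LegacyCatalogue:
    """Stand-in for an external repository with its own interface."""

    def fetch_record(self, record_id):
        return {"id": record_id, "title": f"Record {record_id}"}


class ContentSource:
    """Common interface that the rest of the system codes against."""

    def get(self, object_id):
        raise NotImplementedError


class CatalogueWrapper(ContentSource):
    """Wrapper/mediator: translates the legacy interface into the common one."""

    def __init__(self, catalogue):
        self._catalogue = catalogue

    def get(self, object_id):
        return self._catalogue.fetch_record(object_id)


class CountingProxy(ContentSource):
    """Proxy: same interface, plus an extra operation on each received call."""

    def __init__(self, target):
        self._target = target
        self.calls = 0

    def get(self, object_id):
        self.calls += 1          # the additional operation
        return self._target.get(object_id)


source = CountingProxy(CatalogueWrapper(LegacyCatalogue()))
record = source.get("42")
```

Callers only ever see `ContentSource.get`, so the legacy catalogue can be swapped or wrapped again without changing client code.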

  30. Interoperability Approaches: Resource Discovery • Each resource is represented by a profile (metadata) characterising: • the interface • the state • the list of dependencies • the run-time status • the policies • the configuration • the pending tasks to execute • A resource profile • is published by the resource owner • is discovered by the resource consumers asynchronously through a common resource-independent protocol • gCube offers a distributed and scalable Information System (blackboard) to store, discover, and access resource profiles • gCube interoperability framework: the solution
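
Editor's sketch: the publish and discover cycle of slide 30 can be pictured with an in-memory blackboard. This is a toy stand-in for the distributed Information System; the class, its methods, and the profile fields are hypothetical.

```python
class InformationSystem:
    """In-memory blackboard holding resource profiles (illustrative only)."""

    def __init__(self):
        self._profiles = {}

    def publish(self, resource_id, profile):
        """Called by the resource owner; replaces any earlier profile."""
        self._profiles[resource_id] = dict(profile)

    def discover(self, **conditions):
        """Return ids of resources whose profile matches every condition."""
        return sorted(
            rid
            for rid, profile in self._profiles.items()
            if all(profile.get(k) == v for k, v in conditions.items())
        )


registry = InformationSystem()
registry.publish("index-1", {"interface": "lookup", "status": "ready"})
registry.publish("index-2", {"interface": "lookup", "status": "down"})
registry.publish("store-1", {"interface": "storage", "status": "ready"})

ready_lookups = registry.discover(interface="lookup", status="ready")
```

Owners and consumers never talk to each other directly: both use the same two operations regardless of the kind of resource, which is the point of a resource-independent protocol.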

  31. Interoperability Approaches: Content Interoperability • gCube Open Content Management Architecture (OCMA) • Assumptions • data stored in different storage back-ends • diverse locations, models, access types • few common primitives: documents, collections, repositories • gCube makes it possible to • reach content that lies outside the system • expose content (reachable from) inside the system • perform coarse-grained as well as fine-grained retrieval, update, and addressing • Runtime scalability • autonomic read-only state replication • maximize throughput, minimize response time: discovery-time load balancing • reduce latencies • Software • plugin-based architecture to reduce development costs

  32. Interoperability Approaches: Data Discovery and Access • gCube offers • Several index types • Forward indexing, which supports ultra-fast lookups on tabular typed metadata; • XML indexing, which supports semistructured lookups on content metadata; • Textual field indexing, which supports full-text and qualified lookups on (mainly) textual metadata; • Metadata full-text indexing, which enables full-text lookups on metadata; • Content full-text indexing, which enables full-text lookups on text extracted from content; • Geospatial/temporal indexing, which enables geospatial proximity and coverage queries to be executed over geospatial/temporal metadata; • Feature indexing, which enables high-dimension vector indexing for feature lookup (currently the feature is inactive); • Runtime scalability - WORM (Write Once – Read Many) behavior pattern • multiple readers (Lookups in gCube lingo) • single updater for each index • Autonomic sync under a dynamically expanding/shrinking
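
Editor's sketch: the WORM pattern in slide 32 (a single updater and many readers per index) can be illustrated with read-only snapshots. The class names are hypothetical, the index is a plain in-memory mapping, and nothing here reflects how gCube indexes are actually implemented or synchronised.

```python
from types import MappingProxyType


class ForwardIndex:
    """Index with one updater that publishes read-only snapshots."""

    def __init__(self):
        self.snapshot = MappingProxyType({})

    def publish(self, entries):
        """Single updater: build a new snapshot, then swap the reference."""
        merged = dict(self.snapshot)
        for key, doc_id in entries:
            # Tuples are immutable, so older snapshots are never modified.
            merged[key] = merged.get(key, ()) + (doc_id,)
        self.snapshot = MappingProxyType(merged)


class Lookup:
    """Reader with a read-only view; many may exist for each index."""

    def __init__(self, index):
        self._index = index

    def find(self, key):
        return self._index.snapshot.get(key, ())


index = ForwardIndex()
index.publish([("COD", "doc1"), ("HAD", "doc2")])
readers = [Lookup(index) for _ in range(3)]
index.publish([("COD", "doc3")])
results = [reader.find("COD") for reader in readers]
```

Because readers only ever see complete, immutable snapshots, adding more `Lookup` instances does not require any coordination with the updater.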

  33. Interoperability Approaches: Data Representation and Manipulation • gCube offers • Open transformation service framework • Extensible with specific source-target mediators • For use in metadata and data crosswalk transformations • Tailored for statistical, geospatial, temporal, and textual data • Rich set of reference data • Extensible with domain-specific reference data • For reuse in services for data curation and harmonization • Support for geospatial services • To capture, manage, analyze, and display all forms of data that can be geographically referenced • Integrated resources registry • Format agnostic • To support discovery and access

  34. Interoperability Approaches: Process Execution [1/2] • gCube offers solutions to: • Decouple the business domain and infrastructure-specific logic from the core “execution” functionality • Invoke a wide range of logic components: SOAP and REST Web Services, shell scripts, executable binaries, POJOs, … • Support most of the execution paradigms: batch, map-reduce, synchronous call • Bridge key distributed computation technologies: grid (gLite and Globus), Condor, Hadoop • Control and monitor the execution of a processing flow • Stage data among different storage providers • Stream data among computation elements

  35. Interoperability Approaches: Process Execution [2/2] • By using adaptors that • operate on a specific third-party language and translate it into native constructs, • allow for the creation of complex workflows that exploit several diverse technologies deployed on different infrastructures

  36. Conclusions • gCube System: • Stable software that has been improved over the last 5 years • Powerful ecosystem management system equipped with advanced infrastructure management functionality • gCube offers a variety of patterns, tools, and solutions • to deliver interoperability solutions and interconnect • heterogeneous digital content • heterogeneous repository systems • heterogeneous computation platforms • to decrease the cost of adoption • to reduce the time to market of new ideas • to deal with the plethora of standards

  37. Supported Standards • WSRF Specifications: WS-ResourceProperties (WSRF-RP), WS-ResourceLifetime (WSRF-RL), WS-ServiceGroup (WSRF-SG), WS-BaseFaults (WSRF-BF) • JSR: 168 (Simple Portlets), 286 (168 update), 160 (JMX) • WSN Specifications: WS-BaseNotification, WS-Topics, (WS-BrokeredNotification) • WS-* Standards: SOAP, WSDL, WS-Addressing • ISO: ISO3166 countries, ISO4217 currencies, ISO1915 geo-location • X-*: XML, XSD, XSL, XSLT, XPath, XQuery • OGC: Web Coverage Processing Service, Web Coverage Service, Web Feature Service, Web Map Context, Web Map Service, Web Map Tile Service, Web Processing Service, Web Services Common • OGF Standard: Glue Schema (2) • ………. • Comply with: OAI-PMH, OAI-ORE

  38. Find us • www.gcube-system.org • www.d4science.eu • Pasquale Pagano, D4Science-II Technical Director, pasquale.pagano@isti.cnr.it • Thank you for your attention
