Earth System Grid Center for Enabling Technologies (ESG-CET): Introduction and Overview
Dean N. Williams, Don E. Middleton, Ian T. Foster, and David E. Bernholdt, on behalf of the ESG-CET Team
Project Web Site: http://esg-pcmdi.llnl.gov
Mid-Term Project Review, Rockville, MD, May 11, 2009
Agenda
• Introduction and Overview
• Overall Architecture Design
• Gateway
• Data Node
• Break
• Accomplishments
• Collaborations and Partnerships
• Recap of Morning Presentations
• Lunch
• Research and Development
• Break
• Demonstration
• Future Work
• Summary
• Review folder: http://esg-pcmdi.llnl.gov/review-folder
• Review presentations: http://esg-pcmdi.llnl.gov/review-folder/presentations
A Brief History: ESG-I, 2000-2001
• The emerging challenge of climate data
• Proposal to DOE’s Next Generation Internet (NGI) program in March 1999
  • ANL, LANL, LBNL, LLNL, NCAR, USC/ISI
• Data movement and replication
• Prototype climate “data browser”
• “Hottest Infrastructure” award at SC2000
• NGI cut short; follow-on funding from OBER and MICS
• Ideas on the table, partnerships, experience
• Minimal end-user deployment or use
• Began development of SciDAC proposal
A Brief History: ESG-II, 2001-2006
• SciDAC program announced; began proposal in 2000
  • ANL, LANL, LBNL, LLNL, NCAR, ORNL, USC/ISI
• “Turning Climate Datasets into Community Resources”
• New focus on web-based portals, metadata, seamless access to archival storage, security, and operational service
• Uncertain about the size of the audience, hoping for 100-200 users
• Very positive mid-term assessment in 2003
• PCMDI accepted WGCM/CMIP role in 2004
• Operational CCSM portal in 2004
• Operational IPCC/CMIP portal later in 2004
• In 2006: 200 TB of data, 4,000 users, 130 TB served
Purpose and Scope
Purpose
• Provide climate researchers worldwide with access to the data, information, models, analysis tools, and computational resources required to make sense of enormous climate simulation datasets
Scope
• Petabyte-scale data volumes
• Gateway to climate change data products, model outputs, and informational sites (i.e., globally federated sites)
• Comprehensive registry of climate change Earth science research results and components
• Support for climate change scientists and partner scientists, analysts, data managers, educators, and decision makers
• Resource for national and international science and societal-benefit initiatives
• Access to climate change data products through interoperable web services and climate analysis tools
Objectives
• Meet the specific distributed database, data access, and data movement needs of national and international climate projects
• Provide a universal and secure web-based data access portal for broad multi-model data collections
• Provide a wide range of Grid-enabled climate data analysis tools and diagnostic methods to international climate centers and U.S. government agencies
• Develop Grid technology that enhances data accessibility and usability
• Make newly developed tools and technologies available for use in other domains
Project Team
Key: Institutional PI, Project Co-PI, Project Lead PI, Executive Committee
• ANL: Rachana Ananthakrishnan, Ian Foster, Neill Miller, Frank Siebenlist
• LBNL: Junmin Gu, Vijaya Natarajan, Arie Shoshani, Alex Sim
• LLNL: Robert Drach, Dean N. Williams
• LANL: Phil Jones
• NCAR: David Brown, Julien Chastang, Luca Cinquini, Peter Fox, Danielle Harper, Nathan Hook, Don Middleton, Eric Nienhouse, Gary Strand, Patrick West, Hannah Wilcox, Nathaniel Wilhelmi, Stephan Zednik
• PMEL: Steve Hankin, Roland Schweitzer
• ORNL: David Bernholdt, Meili Chen, Jens Schwidder, Sudharshan Vazhkudai
• USC/ISI: S. Bharathi, Ann Chervenak, Robert Schuler, Mei-Hui Su
Concept Overview
[Diagram labels: Standard Browser, Web Services; Workstation Applications, Thick Clients]
Capabilities, Usage, and Impact
Capabilities
• “Virtual Datasets” created through subsetting and aggregation
• Metadata-based search and discovery
• Bulk data access
• Web-based access
Usage
• Average downloads: 400 to 600 GB/day
• Over 500 sites worldwide
Archive Facts
• NCAR Gateway (http://www.earthsystemgrid.org)
  • Data holdings: 198 TB
  • Registered users: 13,000+
  • Data downloaded: 100 TB
• PCMDI/LLNL CMIP3 Gateway (http://www-pcmdi.llnl.gov)
  • Data holdings: 35 TB
  • Registered users: 3,000+
  • Data downloaded: 600+ TB
Impact
• Over 500 scientific papers published based on CMIP3 data
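The slide lists “Virtual Datasets” created through subsetting and aggregation but does not describe the mechanism. As a rough illustration only, the sketch below assumes a common pattern, similar in spirit to the joinExisting aggregation offered by THREDDS Data Server (TDS): several files, each holding a contiguous run of time steps, are presented as one logical time axis, and a subset request on that axis is resolved to the files and local index ranges that must actually be read. The `FileSegment` type, the `resolve_subset` function, and the file names are hypothetical, not ESG's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FileSegment:
    """One physical file holding `length` consecutive time steps (hypothetical)."""
    path: str
    length: int


def resolve_subset(
    segments: List[FileSegment], start: int, stop: int
) -> List[Tuple[str, int, int]]:
    """Map a half-open range [start, stop) on the aggregated time axis
    to (path, local_start, local_stop) reads on the underlying files."""
    if start < 0 or stop < start:
        raise ValueError("invalid time range")
    reads = []
    offset = 0  # global index of the first time step in the current file
    for seg in segments:
        seg_end = offset + seg.length
        # Overlap between the requested range and this file's range.
        lo = max(start, offset)
        hi = min(stop, seg_end)
        if lo < hi:
            reads.append((seg.path, lo - offset, hi - offset))
        offset = seg_end
    if stop > offset:
        raise ValueError("requested range extends past the aggregated axis")
    return reads


if __name__ == "__main__":
    # Three hypothetical files of 120 monthly time steps each.
    files = [
        FileSegment("tas_1850-1859.nc", 120),
        FileSegment("tas_1860-1869.nc", 120),
        FileSegment("tas_1870-1879.nc", 120),
    ]
    # Request time steps 100..249 of the virtual dataset.
    for path, lo, hi in resolve_subset(files, 100, 250):
        print(path, lo, hi)
```

Keeping the aggregation as a list of file lengths means the user sees one dataset while the server only opens the files that overlap the request; here the request touches the tail of the first file, all of the second, and the first 10 steps of the third.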
Data Integration Challenges Facing Climate Science
• Modeling groups will generate more data in the near future than exist today
• A large part of research consists of writing programs to analyze data
• How best to collect, distribute, and find data on a much larger scale?
• At each stage, tools could be developed to improve efficiency
• Substantially more ambitious community modeling projects (petabyte (PB, 10^15 bytes) and exabyte (EB, 10^18 bytes) scale) will require a distributed database
• Metadata describing extended modeling simulations (e.g., atmospheric aerosols and chemistry, carbon cycle, dynamic vegetation) (But wait, there’s more: economy, public health, energy, etc.)
• How to make information understandable to end users so that they can interpret the data correctly
• More users than just Working Group 1 (WG1, science): WG2 (impacts) and WG3 (mitigation), as well as policy makers, economists, health officials, etc.
• Integration of multiple analysis tools, formats, and data from unknown sources
• Trust and security on a global scale (not just an agency or country, but worldwide)
Complexity of Data Distribution
• Future coupled runs will produce much larger data sets
• Storage and retrieval need new thinking
• Additional quality assurance data and software
• Tools to facilitate publication and cataloging of output
  • Publication: the act of putting data in the database and making it visible to others
  • Cataloging: describing where a data set, file, or database entity is located
• Automated updating of output availability/status pages
• Automated notification to users with updates tailored to their interests (new, withdrawn, replaced data)
• Sophisticated discovery capabilities
• Common data transfer tasks can be automated
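To make the publication, cataloging, and notification terms concrete, here is a minimal in-memory sketch. It assumes a catalog that maps a dataset identifier to a version and a storage location, treats publication as making a catalog entry visible, and notifies only subscribers whose interest (here, a project tag) matches when data is new, replaced, or withdrawn. The `Entry` and `Catalog` classes, their methods, the project-tag subscription model, and the paths are all hypothetical illustrations, not the ESG publisher's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Entry:
    dataset_id: str
    project: str      # e.g. "CMIP3"; used to match subscriber interests
    location: str     # where the data physically lives (cataloging)
    version: int = 1
    visible: bool = False


@dataclass
class Catalog:
    entries: Dict[str, Entry] = field(default_factory=dict)
    # project -> callbacks receiving (event, entry)
    subscribers: Dict[str, List[Callable[[str, Entry], None]]] = field(default_factory=dict)

    def subscribe(self, project: str, callback: Callable[[str, Entry], None]) -> None:
        self.subscribers.setdefault(project, []).append(callback)

    def _notify(self, event: str, entry: Entry) -> None:
        # Only users interested in this entry's project are told about it.
        for callback in self.subscribers.get(entry.project, []):
            callback(event, entry)

    def publish(self, dataset_id: str, project: str, location: str) -> Entry:
        """Catalog the data and make it visible; re-publishing bumps the version."""
        old = self.entries.get(dataset_id)
        version = 1 if old is None else old.version + 1
        event = "replaced" if old is not None and old.visible else "new"
        entry = Entry(dataset_id, project, location, version, visible=True)
        self.entries[dataset_id] = entry
        self._notify(event, entry)
        return entry

    def withdraw(self, dataset_id: str) -> None:
        entry = self.entries[dataset_id]
        entry.visible = False  # keep the record, but hide it from lookups
        self._notify("withdrawn", entry)

    def locate(self, dataset_id: str) -> str:
        """Return the storage location of a visible dataset."""
        entry = self.entries.get(dataset_id)
        if entry is None or not entry.visible:
            raise KeyError(dataset_id)
        return entry.location


if __name__ == "__main__":
    catalog = Catalog()
    catalog.subscribe("CMIP3", lambda ev, e: print(ev, e.dataset_id, e.version))
    catalog.publish("cmip3.tas", "CMIP3", "/archive/a")
    catalog.publish("cmip3.tas", "CMIP3", "/archive/b")
    catalog.withdraw("cmip3.tas")
```

Withdrawn entries are hidden rather than deleted, so their version history survives; a real system would also persist the catalog and deliver notifications by e-mail or a feed rather than in-process callbacks.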
It’s All About the Data
• Data publication
• Data access
• Data viewing
• Data sharing
• Data versioning
• Data replication
• Data products
• Data delivery
• Standards and interoperability
Strategic Challenges for ESG-CET
• Sustain and build upon the existing ESG archives
• Address future scientific needs for data management and analysis by extending support for sharing and diagnosing climate simulation data:
  • Coupled Model Intercomparison Project, Phase 5 (CMIP5) for scientists contributing to the IPCC Fifth Assessment Report (AR5) in 2010
  • SciDAC II: A Scalable and Extensible Earth System Model for Climate Change Science
  • The Climate Science Computational End Station (CCES)
  • The North American Regional Climate Change Assessment Program (NARCCAP)
  • Other wide-ranging climate model evaluation activities
• How to make information understandable to end users so that they can interpret the data correctly
• Local and remote analysis and visualization tools in a distributed environment (e.g., subsetting, concatenating, regridding, filtering, …)
• Integrating analysis into a distributed environment
• Providing climate diagnostics
• Delivering climate component software to the community
CMIP5 (IPCC AR5) is a Major Driver for ESG Development
• CMIP5 multi-model archive expected to include 3 suites of experiments (“Near-Term” decadal prediction, “Long-Term” century and longer, and “Atmosphere-Only”)
• 40+ models
• 600+ TB “core” data, 6+ PB total data
• Contributed by 25+ modeling centers in 17+ countries
• Driver for scale of data and global distribution
• Timeline fixed by IPCC
• Already working with key international partners to establish a testbed:
  • Program for Climate Model Diagnosis and Intercomparison (PCMDI), U.S.
  • National Center for Atmospheric Research (NCAR), U.S.
  • Oak Ridge National Laboratory (ORNL), U.S.
  • Geophysical Fluid Dynamics Laboratory (GFDL), U.S.
  • British Atmospheric Data Centre (BADC), U.K.
  • Max Planck Institute for Meteorology (MPIM), Germany
  • JAMSTEC and University of Tokyo Center for Climate System Research, Japan
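For a rough sense of the jump in scale, the CMIP5 figures above can be set against the CMIP3 holdings at the PCMDI/LLNL Gateway (35 TB) and the average download rate of 400 to 600 GB/day given on the Capabilities, Usage, and Impact slide. This is back-of-the-envelope arithmetic only, assuming decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB) and a constant rate:

```latex
\frac{600\ \text{TB}}{35\ \text{TB}} \approx 17,
\qquad
\frac{6\ \text{PB}}{35\ \text{TB}} = \frac{6000\ \text{TB}}{35\ \text{TB}} \approx 171

\frac{600{,}000\ \text{GB}}{600\ \text{GB/day}} = 1000\ \text{days},
\qquad
\frac{600{,}000\ \text{GB}}{400\ \text{GB/day}} = 1500\ \text{days}
```

Serving the CMIP5 core data even once at the CMIP3-era average rate would take roughly 2.7 to 4.1 years, consistent with the slide's point that CMIP5 drives the scale of data and its global distribution.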
ESG-CET AR5 Timeline
• 2008: Design and implement core functionality:
  • Browse and search
  • Registration
  • Single sign-on / security
  • Publication
  • Distributed metadata
  • Server-side processing
• Early 2009: Testbed
  • Plan to include at least seven centers in the US, Europe, and Japan
• 2009: Deal with system integration issues, develop production system
• 2010: Modeling centers publish data
• 2011-2012: Research and journal article submissions
• 2013: IPCC AR5 Assessment Report
ESG-CET Collaborates Extensively
• Leverage best-in-class tools and capabilities developed elsewhere
• Increase outreach, ability to serve the scientific community, and impact
• Joint development of new ideas and technologies of common interest
Key:
• Collaborators relying on ESG to reach their goals are highlighted in “italic blue”
• Collaborators relying on ESG to develop tools and technologies are highlighted in “italic red”
• Collaborators relying on ESG to deliver their products to the climate science community are in “italic green”
Accomplishments: Development
Two major accomplishments are the Gateway and the Data Node, which form the main components of the ESG-CET architecture.
• Gateway web application (new)
• Data Node components integration (new publishing client integrated with existing TDS and LAS servers, and with the Gateway)
• Security architecture for federation across Gateways and partner Data Centers
  • OpenID for web SSO
  • MyProxy integration for rich client access
  • Web Services for user attribute retrieval
• Architecture for metadata exchange among Gateways and partner Data Centers (based on OAI-PMH)
• BeStMan middleware for retrieving files from deep storage (new)
• Handling and access of detailed model metadata (in collaboration with Earth System Curator)
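The slide names OAI-PMH as the basis for metadata exchange among Gateways and partner Data Centers. As a minimal sketch of what a harvesting request looks like under that protocol, the function below builds a `ListRecords` request URL from standard OAI-PMH 2.0 arguments (`verb`, `metadataPrefix`, `from`, `until`, `set`, `resumptionToken`). The base URL and set name are hypothetical placeholders, and `oai_dc` is used only because every OAI-PMH repository must support Dublin Core; the metadata formats and sets that ESG Gateways actually exchanged are not specified here.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
    until_date: Optional[str] = None,
    set_spec: Optional[str] = None,
    resumption_token: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH 2.0, resumptionToken is an exclusive argument: when it is
    given, no arguments other than verb may be sent.
    """
    if resumption_token is not None:
        # Continue a harvest that the repository split into pages.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # incremental harvest: records changed since this datestamp
        if until_date:
            params["until"] = until_date
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical Gateway endpoint; example.org is a reserved placeholder domain.
    print(list_records_url("https://gateway.example.org/oai", from_date="2009-05-01"))
```

Incremental harvesting with `from` lets a Gateway pull only the records a partner changed since its last sync, and following `resumptionToken` pages it through large result sets without re-sending the original filters.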
Accomplishments: Operational
• Sustained data delivery from 2004 to the present from three ESG data portals
• Registered over 16,000 users worldwide
• Over 700 TB downloaded (coming up on the 1 PB milestone)
• Reached the milestone of 500 scientific research papers published based on CMIP3 data
• Added C-LAMP, NARCCAP, and CFMIP to the distributed archive
Future Plans
Short-term:
• Packaging and documentation of Gateway software
• Packaging and documentation of the Data Node software
• Integration with Data Mover Lite (DML)
• Federation with partner data centers
Longer-term:
• Gateway customization
• Expanded visualization services
• Gateway and Data Node invoking more of the LAS functionality
• GIS services
• Google Earth services
• Remote query services for rich client access
• User and Group workspaces
• Server-side processing and analysis services