CLIMDB/HYDRODB: A Web Harvester And Data Warehouse Approach To Building A Cross-site Climate And Hydrology Database. Don Henshaw, Andrews Experimental Forest LTER; Pacific Northwest Research Station, USFS Forest Service; Oregon State University, Corvallis, Oregon.

Presentation Transcript


  1. CLIMDB/HYDRODB: A Web Harvester And Data Warehouse Approach To Building A Cross-site Climate And Hydrology Database. Don Henshaw, Andrews Experimental Forest LTER; Pacific Northwest Research Station, USFS Forest Service; Oregon State University, Corvallis, Oregon. EAP ILTER, 9 July 2007

  2. Long-Term Research: Long-Term Ecological Research (LTER), U.S. Forest Service Research (USFS), International LTER (ILTER)
     • The 20-year review of LTER challenges the network to enhance its inter-site research activities by adopting a strategy for network-based research
     • USFS Research intends to increase collaboration and develop network products for existing Experimental Forests/Watersheds
     • International LTER collaboration

  3. LTER Network Information System (NIS) Goals
     • Allow and enhance discovery and access of information
       • Foster development of network-level datasets
       • Commit to populate climate and hydrology datasets
     • Facilitate synthesis and integration of information
       • Improve discovery, access, aggregation, and visualization of data across multiple sites
       • Overcome diversity in individual site information systems
     • Promote collaboration and community-building
       • Develop partnerships between Information Technology and science communities
     (LTER Network Information System Advisory Committee, 2003, 2004)

  4. ClimDB/HydroDB Objectives
     • Improve access to long-term collections of climatic and hydrological data
       • Long-Term Ecological Research (LTER): 26 NSF-funded sites
       • Taiwan Ecological Research Network (ILTER)
       • U.S. Forest Service Research: Experimental Forests / Experimental Watersheds
     • Use web technologies to facilitate synthetic research
     • Maintain a current data warehouse of multi-site, multi-network, long-term data
     • Provide single portal accessibility with a query interface to download and graphically display data

  5. [Architecture diagram with three columns: Data Providers, Central Site, Public User]
     • Data Providers: LTER data, USFS data, USGS data, NWS data; access tools (site-specific data mining); exchange format
     • Central Site: centralized ClimDB/HydroDB database with harvester, data warehouse, and query interface; an HTTP POST triggers the on-demand auto-harvest; web services (SOAP, WSDL)
     • Public User: web page query interface to display, graph, and download data

  6. ClimDB/HydroDB Components: Data Providers
     • Individual sites
       • Participating sites manage and control original source data within their local information systems
       • Sites provide data as a static or dynamically created file
     • Exchange format
       • Consistent, comma-delimited file
       • Flexibility allows contributors to add or remove parameters from harvest files at any time
       • Attributes and units are standardized and based on a controlled vocabulary
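A minimal sketch of how a harvester might read such a comma-delimited exchange file. The slides do not give the actual column layout or vocabulary, so the key columns (`site`, `station`, `date`), the parameter names, and the units below are all hypothetical. The point illustrated is the flexibility noted above: a site can include any subset of parameters, as long as each name belongs to the controlled vocabulary.

```python
import csv
import io

# Hypothetical controlled vocabulary: parameter name -> standard unit.
# These names are illustrative, not the real ClimDB/HydroDB vocabulary.
VOCABULARY = {
    "airtemp_mean": "deg C",
    "precip_total": "mm",
    "discharge_mean": "L/s",
}
KEY_FIELDS = ("site", "station", "date")  # assumed identifying columns


def parse_exchange(text):
    """Turn a comma-delimited exchange file into one record per reported value."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [k for k in KEY_FIELDS if k not in header]
    if missing:
        raise ValueError(f"missing key columns: {missing}")
    # Every non-key column is treated as a parameter the site chose to send.
    params = [name for name in header if name not in KEY_FIELDS]
    unknown = [p for p in params if p not in VOCABULARY]
    if unknown:
        raise ValueError(f"parameters not in vocabulary: {unknown}")
    records = []
    for row in reader:
        for p in params:
            if not row[p]:
                continue  # blank or absent cell: no value reported that day
            records.append({
                "site": row["site"],
                "station": row["station"],
                "date": row["date"],
                "parameter": p,
                "value": float(row[p]),
                "unit": VOCABULARY[p],
            })
    return records


# Hypothetical file from a site that currently reports two parameters.
sample = (
    "site,station,date,airtemp_mean,precip_total\n"
    "XYZ,MET01,2007-07-01,18.4,0.0\n"
    "XYZ,MET01,2007-07-02,19.1,\n"
)
records = parse_exchange(sample)
```

Because parameters are discovered from the header row, a site could add a `discharge_mean` column to a later file without any change to the parser.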

  7. “Harvester” Mechanics
     [Diagram: exchange data is harvested, then transformed, quality-checked (QA), and loaded into the centralized ClimDB/HydroDB database (harvester, data warehouse); feedback and error logs go to the site contact and the ClimHy admin]
     • The Quality Assurance (QA)/Feedback System:
       • Provides feedback through error and warning messages directly to the client’s browser and through e-mail
       • Specifies errors in exchange format
       • Identifies data limit and integrity errors
       • Enables sites to quickly modify their datasets for successful re-harvesting

  8. Participant Web Page: http://www.fsl.orst.edu/climhy/harvest/harvest.htm

  9. Duplicate records found

  10. Illegal number of data fields in exchange file

  11. Failed min<mean<max relationship
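The three error conditions on slides 9 to 11 can be sketched as a single QA pass over the file. This is an illustration only, not the harvester's actual code: the five-column row layout (station, date, min, mean, max) is an assumption, and the comparison is non-strict (`<=`) so that equal values pass, which may differ from the real rule.

```python
import csv
import io

EXPECTED_FIELDS = 5  # assumed layout: station, date, min, mean, max


def qa_check(text):
    """Return (line_number, message) pairs for every problem found."""
    problems = []
    first_seen = {}  # (station, date) -> line number of first occurrence
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != EXPECTED_FIELDS:
            problems.append((line_no, f"illegal number of data fields: {len(row)}"))
            continue
        station, date, *values = row
        if (station, date) in first_seen:
            problems.append((line_no, f"duplicate of line {first_seen[(station, date)]}"))
            continue
        first_seen[(station, date)] = line_no
        try:
            low, mean, high = (float(v) for v in values)
        except ValueError:
            problems.append((line_no, "non-numeric value"))
            continue
        if not low <= mean <= high:
            problems.append((line_no, "failed min<mean<max relationship"))
    return problems


# Hypothetical file containing one instance of each error.
sample = (
    "MET01,2007-07-01,10.2,15.0,21.3\n"
    "MET01,2007-07-01,10.2,15.0,21.3\n"
    "MET01,2007-07-02,11.0,16.1\n"
    "MET01,2007-07-03,12.0,25.0,20.0\n"
)
problems = qa_check(sample)
for line_no, message in problems:
    print(f"line {line_no}: {message}")
```

Reporting the line number with each message matches the stated goal of letting sites quickly correct a file and re-harvest it.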

  12. Georgia Coastal Ecosystems LTER Collaboration
      • Allows HydroDB to directly harvest U.S. Geological Survey (USGS) gauging station data from the USGS web pages
      • Captures near real-time provisional USGS hydrological data on a weekly schedule
      • Harvests USGS historical data and replaces the provisional data with final archived versions on a regular basis
      • Generalized as a service to the broader LTER community
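The provisional-to-final replacement described above amounts to a merge rule. The sketch below is one plausible way to express it, assuming each harvested record carries a `status` of either `"provisional"` or `"final"`; those labels, the record fields, and the station identifier are illustrative and not taken from the USGS service or from HydroDB.

```python
def merge_harvest(warehouse, incoming):
    """Merge harvested records into the warehouse, keyed by (station, date).

    Final values always replace what is stored; provisional values are
    accepted only where no final value exists yet.
    """
    for rec in incoming:
        key = (rec["station"], rec["date"])
        current = warehouse.get(key)
        if (
            current is not None
            and current["status"] == "final"
            and rec["status"] == "provisional"
        ):
            continue  # never let provisional data overwrite a final value
        warehouse[key] = {"value": rec["value"], "status": rec["status"]}
    return warehouse


warehouse = {}

# Weekly harvest of near real-time provisional data (hypothetical values).
merge_harvest(warehouse, [
    {"station": "GAUGE01", "date": "2007-07-01", "value": 41.0, "status": "provisional"},
    {"station": "GAUGE01", "date": "2007-07-02", "value": 39.5, "status": "provisional"},
])

# Later harvest of archived data replaces the first day's provisional value.
merge_harvest(warehouse, [
    {"station": "GAUGE01", "date": "2007-07-01", "value": 40.2, "status": "final"},
])

# A repeated provisional harvest must not undo the final value.
merge_harvest(warehouse, [
    {"station": "GAUGE01", "date": "2007-07-01", "value": 41.0, "status": "provisional"},
])
```

Keying on station and date makes the merge idempotent, so the same harvest can be rerun on a schedule without creating duplicates.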

  13. Georgia Coastal Ecosystems LTER Collaboration: USGS Data Harvesting Service

  14. Centralized Architecture
      [Diagram labels: Centralized ClimDB/HydroDB Database; Harvester; Transform, QA, Load; Data Warehouse]
      • Source data is loaded into a global schema in the relational database (RDBMS)
        • Calculates and loads aggregated data (monthly, annual)
      • The global schema for the data warehouse is based on highly normalized tables within the database
        • allows simple structures to house all site data and metadata
        • is extensible to additional daily measurements
      • The central data warehouse is persistent and participants can continually update and replace harvested data
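To make the normalized-schema idea concrete, here is a small sketch using SQLite from the Python standard library. The table and column names are assumptions for illustration and do not reflect the actual ClimDB/HydroDB schema. It shows three points from the slide: one narrow table holds every daily value regardless of parameter, a re-harvest replaces earlier values, and monthly aggregates are computed from the daily rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parameter (
    name TEXT PRIMARY KEY,
    unit TEXT NOT NULL,
    monthly_method TEXT NOT NULL CHECK (monthly_method IN ('mean', 'sum'))
);
CREATE TABLE daily_value (
    station TEXT NOT NULL,
    parameter TEXT NOT NULL REFERENCES parameter(name),
    date TEXT NOT NULL,          -- ISO 8601, e.g. 2007-07-01
    value REAL NOT NULL,
    PRIMARY KEY (station, parameter, date)
);
""")

# Adding a new daily measurement type is one row here, not a schema change.
conn.executemany("INSERT INTO parameter VALUES (?, ?, ?)", [
    ("airtemp_mean", "deg C", "mean"),
    ("precip_total", "mm", "sum"),
])


def load(rows):
    """Insert daily values; a row with an existing key replaces the old one."""
    conn.executemany("INSERT OR REPLACE INTO daily_value VALUES (?, ?, ?, ?)", rows)


load([
    ("MET01", "airtemp_mean", "2007-07-01", 18.0),
    ("MET01", "airtemp_mean", "2007-07-02", 20.0),
    ("MET01", "precip_total", "2007-07-01", 2.5),
    ("MET01", "precip_total", "2007-07-02", 1.5),
])
load([("MET01", "airtemp_mean", "2007-07-02", 22.0)])  # corrected re-harvest

# Monthly aggregate: mean or sum, depending on the parameter's declared method.
monthly = conn.execute("""
SELECT d.station, d.parameter, substr(d.date, 1, 7) AS month,
       CASE p.monthly_method WHEN 'sum' THEN SUM(d.value) ELSE AVG(d.value) END
FROM daily_value AS d
JOIN parameter AS p ON p.name = d.parameter
GROUP BY d.station, d.parameter, month, p.monthly_method
ORDER BY d.parameter
""").fetchall()
```

Storing the aggregation method alongside each parameter keeps the monthly query generic; an annual aggregate would only change the `substr` length from 7 to 4.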

  15. Public Access Web Page (Data Access Page): http://www.fsl.orst.edu/climhy

  16. Data Acquisition: Download or Graphical Display

  18. Metadata Reports: Reports give detailed information for the general site, all stations, and all parameters. Metadata descriptions can also be downloaded as a PDF.

  19. Georgia Coastal Ecosystems (GCE) Matlab Data Toolbox: GUI dialog for retrieving ClimDB/HydroDB data (from Wade Sheldon, GCE)

  20. Imported data set (GCE data grid view, with flagged values displayed); imported data set (GCE tools editor window); ClimDB/HydroDB metadata template

  22. Web Services demonstration of ClimDB
      [Diagram modified from Longjiang Ding, SDSC. Components: Client (Harvester); ClimDB Web Services (Data Service, Metadata Service, Notification Service), connected via SOAP, WSDL, UDDI; LTER Site; XML ClimDB Config File; Wizard; EML Resource Description; Climate Data; Centralized ClimDB Database (Andrews LTER)]
      1. Harvester sends XML request for data to Web Service
      2. One web service queries an LTER Site database, another exports the data, and another issues an email to the LTER Site data manager detailing success of query
      3. LTER Site climate data are returned to harvester in XML
      4. The centralized ClimDB database at Andrews LTER is populated
      5. Web Service wraps the entire centralized ClimDB database

  23. Site Contribution
      Site contributions have increased dramatically in the past year for air temperature, precipitation, and stream discharge.
      • Participation includes:
        • 40 total sites
          • 24 LTER sites + 2 International LTER sites
          • 22 USFS sites
          • 11 sites include USGS gauging stations
        • 281 total measurement stations
          • 143 meteorological, 138 stream gauging (59 USGS)
        • 21 daily measurement parameters
        • 7,200,000 daily values

  24. Data Warehouse Content
      [Chart labels: Primary emphasis; Secondary emphasis]
      • Observations:
        • Coverage of precipitation, discharge, and air temperature data is strong across sites.
        • We encourage sites to contribute relative humidity, soil temperature, wind speed & direction, and global radiation in datasets.

  25. ClimDB/HydroDB Web Access Summary
      Visitors to the ClimDB/HydroDB web interface are increasing and currently average 30 sessions per day.
      Downloads by type: Files 38%, Plots 50%, Displays 12% (total: 6700)
      Values based on data from February 2003 - August 2006

  26. Status of Type of Use. Values based on data plots from January - March 2004

  27. Keys to Successful Implementation
      • Scientific interest
        • Scientist/modeler demand for current and comparable data
        • Need for synthetic data products
      • Organizational
        • Commitment to building network databases
        • Information management (15% LTER site budget)
        • Data access / release policies
        • Data collection standards
        • Planning meetings included climatologists, information managers, data users/modelers, and field technicians
      • Incentives
        • Financial incentives
        • Value-added products returned to participating sites
        • Easy access, aggregated data, graphical displays, QA checks
      • Host site commitment
        • Leadership, time, resources

  28. Conclusions
      • The ClimDB/HydroDB approach is an effective bridge technology between older, more rigid data distribution models and modern service-oriented architectures
        • Establishes software and service development at the central node, permitting rapid adaptation to changing needs
        • Maintains low overhead, flexibility, and technological neutrality for data providers
      • Additional "concentrator nodes" and middleware services can also be deployed easily and rapidly within this model to improve efficiency and build bridges to other federated databases

  29. Acknowledgement
      • Funding was provided by
        • National Science Foundation (NSF)
          • Long-Term Ecological Research (LTER) supplemental funding
        • U.S. Forest Service Research and Development
          • Forest Health Monitoring (FHM) program
          • Pacific Northwest Research Station (PNW)
      • …to the Andrews Forest LTER at Oregon State University for ClimDB/HydroDB development
      • …to individual sites for the preparation of climate and hydrology data
      • Visit ClimDB/HydroDB at http://www.fsl.orst.edu/climhy

  30. User Guide Section 1.3: Required Steps for Site Participation
      To participate, the site will:
      • Provide the research areas, meteorological stations, gauged watersheds, and gauging station names and code names
      • Restructure local site data into a standardized daily exchange format
      • Use the online metadata forms to provide metadata for the overall research area, for every weather station, and for every parameter
      • Harvest data
