
Presentation Transcript


  1. Open Science Grid: Linking Universities and Laboratories in National CyberInfrastructure. Paul Avery, University of Florida, avery@phys.ufl.edu. SURA Infrastructure Workshop, Austin, TX, December 7, 2005.

  2. Bottom-up Collaboration: “Trillium”
  • Trillium = PPDG + GriPhyN + iVDGL
    • PPDG: $12M (DOE) (1999 – 2006)
    • GriPhyN: $12M (NSF) (2000 – 2005)
    • iVDGL: $14M (NSF) (2001 – 2006)
  • ~150 people, with large overlaps between projects
    • Universities, labs, foreign partners
  • Strong driver for funding agency collaborations
    • Inter-agency: NSF – DOE
    • Intra-agency: Directorate – Directorate, Division – Division
  • Coordinated internally to meet broad goals
    • CS research, developing/supporting the Virtual Data Toolkit (VDT)
    • Grid deployment, using VDT-based middleware
    • Unified entity when collaborating internationally

  3. Common Middleware: Virtual Data Toolkit
  [Diagram of the VDT build flow: sources (CVS) go through the NMI build & test infrastructure; binaries are built and tested on a Condor pool covering 22+ operating systems, then patched and packaged (Pacman cache, RPMs, GPT source bundles); many contributors.]
  • A unique laboratory for testing, supporting, deploying, packaging, upgrading, & troubleshooting complex sets of software!
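  To make the build-and-test flow above concrete, here is a minimal Python sketch of the general idea: build each component on several platforms, run its tests, and tally the results. The component names, platform labels, and echo commands are illustrative assumptions; this is not the actual NMI/VDT build harness, which dispatched real builds to a Condor pool spanning 22+ operating systems.

    # Illustrative sketch of a multi-platform build-and-test matrix, in the
    # spirit of the NMI/VDT flow described above. Names and commands are
    # placeholders; a real harness would dispatch each cell to a pool node.
    import subprocess

    COMPONENTS = ["globus", "condor", "vds"]                   # hypothetical source checkouts
    PLATFORMS = ["rhel3-x86", "rhel4-x86_64", "debian3-x86"]   # stand-ins for 22+ OS targets

    def build_and_test(component: str, platform: str) -> bool:
        """Pretend to build and test one component on one platform."""
        build = subprocess.run(["echo", f"build {component} on {platform}"])
        test = subprocess.run(["echo", f"test {component} on {platform}"])
        return build.returncode == 0 and test.returncode == 0

    def run_matrix() -> dict:
        """Run every (component, platform) cell and report pass/fail counts."""
        results = {(c, p): build_and_test(c, p) for c in COMPONENTS for p in PLATFORMS}
        passed = sum(results.values())
        print(f"{passed}/{len(results)} build/test cells passed")
        return results

    if __name__ == "__main__":
        run_matrix()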

  4. VDT Growth Over 3 Years (1.3.8 now) www.griphyn.org/vdt/
  [Chart: number of VDT components vs. time, with milestones: VDT 1.0 (Globus 2.0b, Condor 6.3.1); VDT 1.1.7 (switch to Globus 2.2); VDT 1.1.8 (first real use by LCG); VDT 1.1.11 (Grid3).]

  5. Components of VDT 1.3.5
  • Globus 3.2.1, Condor 6.7.6, RLS 3.0, ClassAds 0.9.7, Replica 2.2.4, DOE/EDG CA certs, ftsh 2.0.5, EDG mkgridmap, EDG CRL Update, GLUE Schema 1.0, VDS 1.3.5b, Java, Netlogger 3.2.4, Gatekeeper-Authz, MyProxy 1.11, KX509, System Profiler, GSI OpenSSH 3.4, MonALISA 1.2.32, PyGlobus 1.0.6, MySQL, UberFTP 1.11, DRM 1.2.6a, VOMS 1.4.0, VOMS Admin 0.7.5, Tomcat, PRIMA 0.2, Certificate Scripts, Apache, jClarens 0.5.3, New GridFTP Server, GUMS 1.0.1

  6. VDT Collaborative Relationships
  [Diagram: Computer Science Research provides techniques & software to the Virtual Data Toolkit, which transfers technology to science, engineering, and education communities; requirements, deployment feedback, and prototyping & experiments flow back. Partners: science, networking, and outreach projects; U.S. grids (Globus, Condor, NMI, TeraGrid, OSG); international (EGEE, WLCG, Asia, South America); outreach (QuarkNet, CHEPREO, Digital Divide). Other linkages: work force, CS researchers, industry.]

  7. Major Science Driver: Large Hadron Collider (LHC) @ CERN
  • 27 km tunnel in Switzerland & France
  • Experiments: ATLAS, CMS, ALICE, LHCb, TOTEM
  • Search for:
    • Origin of mass
    • New fundamental forces
    • Supersymmetry
    • Other new particles
  • 2007 – ?

  8. LHC: Petascale Global Science
  • Complexity: millions of individual detector channels
  • Scale: PetaOps (CPU), 100s of Petabytes (data)
  • Distribution: global distribution of people & resources
  • BaBar/D0 example (2004): 700+ physicists, 100+ institutes, 35+ countries
  • CMS example (2007): 5000+ physicists, 250+ institutes, 60+ countries

  9. LHC Global Data Grid (2007+)
  • 5000 physicists, 60 countries
  • 10s of Petabytes/yr by 2008
  • 1000 Petabytes in < 10 yrs?
  [Diagram of the tiered CMS data flow: CMS Experiment Online System feeds the CERN Computer Center (Tier 0) at 150 – 1500 MB/s; Tier 0 feeds Tier 1 centers (Korea, Russia, UK, USA) at 10 – 40 Gb/s; Tier 1 feeds Tier 2 centers (U Florida, Caltech, UCSD, Iowa, FIU, Maryland) at >10 Gb/s; Tier 2 feeds Tier 3 at 2.5 – 10 Gb/s; Tier 4 is physics caches and PCs.]
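  As a quick back-of-envelope check of the numbers above (my own arithmetic; only the 150 – 1500 MB/s range and the “10s of Petabytes/yr” figure come from the slide), a sustained Tier 0 rate can be converted to a yearly volume as follows; the 50% duty cycle is an assumption for illustration.

    # Back-of-envelope: convert a sustained data rate (MB/s) into PB per year.
    # Only the 150-1500 MB/s range comes from the slide; the duty cycle is assumed.
    SECONDS_PER_YEAR = 3.15e7

    def petabytes_per_year(rate_mb_s: float, duty_cycle: float = 0.5) -> float:
        """PB/yr accumulated at rate_mb_s for the given duty cycle."""
        return rate_mb_s * 1e6 * SECONDS_PER_YEAR * duty_cycle / 1e15

    for rate in (150, 1500):
        print(f"{rate} MB/s at 50% duty cycle ~ {petabytes_per_year(rate):.1f} PB/yr")
    # prints roughly 2.4 and 23.6 PB/yr, i.e. the "10s of Petabytes/yr" scale at the upper end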

  10. Grid3 and Open Science Grid

  11. Grid3: A National Grid Infrastructure
  • October 2003 – July 2005
  • 32 sites, 3,500 CPUs: universities + 4 national labs
  • Sites in US, Korea, Brazil, Taiwan
  • Applications in HEP, LIGO, SDSS, genomics, fMRI, CS
  • www.ivdgl.org/grid3
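  For a sense of how work reached Grid3 sites through the VDT's Globus/Condor middleware, here is a minimal Python sketch that writes a Condor-G style submit description targeting a Globus gatekeeper. The gatekeeper host, jobmanager, and executable are hypothetical; this illustrates the general mechanism only, not Grid3's actual operational procedure.

    # Illustration only: write a Condor-G style submit file that routes a job
    # to a (hypothetical) Globus gatekeeper, as the VDT middleware allowed.
    from textwrap import dedent

    def make_submit(gatekeeper: str, executable: str, args: str) -> str:
        """Return a submit description for one grid-universe job."""
        return dedent(f"""\
            universe                = grid
            grid_resource           = gt2 {gatekeeper}/jobmanager-condor
            executable              = {executable}
            arguments               = {args}
            output                  = job.$(Cluster).out
            error                   = job.$(Cluster).err
            log                     = job.$(Cluster).log
            should_transfer_files   = YES
            when_to_transfer_output = ON_EXIT
            queue
            """)

    if __name__ == "__main__":
        text = make_submit("gatekeeper.example.edu", "analyze.sh", "run001")
        with open("grid_job.submit", "w") as fh:
            fh.write(text)
        print(text)   # then: condor_submit grid_job.submit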

  12. Grid3 Lessons Learned
  • How to operate a Grid as a facility
    • Tools, services, error recovery, procedures, docs, organization
    • Delegation of responsibilities (project, VO, service, site, …)
    • Crucial role of the Grid Operations Center (GOC)
  • How to support people-to-people relations
    • Face-to-face meetings, phone conferences, 1-on-1 interactions, mail lists, etc.
  • How to test and validate Grid tools and applications
    • Vital role of testbeds
  • How to scale algorithms, software, processes
    • Some successes, but “interesting” failure modes still occur
  • How to apply distributed cyberinfrastructure
    • Successful production runs for several applications

  13. http://www.opensciencegrid.org

  14. Open Science Grid: July 20, 2005
  • Production Grid: 50+ sites, 15,000 CPUs “present” (available, but not all at one time)
  • Sites in US, Korea, Brazil, Taiwan
  • Integration Grid: 10 – 12 sites
  [Map callouts: Taiwan, S. Korea, São Paulo]

  15. OSG Operations Snapshot November 7: 30 days

  16. OSG Participating Disciplines

  17. OSG Grid Partners

  18. Example of Partnership: WLCG and EGEE

  19. OSG Technical Groups & Activities
  • Technical Groups address and coordinate technical areas
    • Propose and carry out activities related to their given areas
    • Liaise & collaborate with other peer projects (U.S. & international)
    • Participate in relevant standards organizations
    • Chairs participate in Blueprint, Integration and Deployment activities
  • Activities are well-defined, scoped tasks contributing to OSG
    • Each Activity has deliverables and a plan
    • … is self-organized and operated
    • … is overseen & sponsored by one or more Technical Groups
  • TGs and Activities are where the real work gets done

  20. OSG Technical Groups (deprecated!)

  21. OSG Activities

  22. OSG Integration Testbed: Testing & Validating Middleware (sites include Taiwan, Brazil, Korea)

  23. Networks

  24. Evolving Science Requirements for Networks (DOE High Performance Network Workshop) See http://www.doecollaboratory.org/meetings/hpnpw/

  25. UltraLight: Integrating Advanced Networking in Applications (http://www.ultralight.org)
  • 10 Gb/s+ network
  • Caltech, UF, FIU, UM, MIT
  • SLAC, FNAL
  • Int’l partners
  • Level(3), Cisco, NLR

  26. Education, Training, Communications

  27. iVDGL, GriPhyN Education/Outreach Basics
  • $200K/yr
  • Led by UT Brownsville
  • Workshops, portals, tutorials
  • Partnerships with QuarkNet, CHEPREO, LIGO E/O, …

  28. Grid Training Activities
  • June 2004: First US Grid Tutorial (South Padre Island, TX)
    • 36 students, diverse origins and types
  • July 2005: Second Grid Tutorial (South Padre Island, TX)
    • 42 students, simpler physical setup (laptops)
  • Reaching a wider audience
    • Lectures, exercises, video, on web
    • Students, postdocs, scientists
  • Coordination of training activities
    • “Grid Cookbook” (Trauner & Yafchak)
    • More tutorials, 3 – 4/year
    • CHEPREO tutorial in 2006?

  29. QuarkNet/GriPhyN e-Lab Project http://quarknet.uchicago.edu/elab/cosmic/home.jsp

  30. CHEPREO: Center for High Energy Physics Research and Educational Outreach, Florida International University
  • Physics Learning Center
  • CMS Research
  • iVDGL Grid Activities
  • AMPATH network (S. America)
  • Funded September 2003
    • $4M initially (3 years)
    • MPS, CISE, EHR, INT

  31. Grids and the Digital Divide
  • Background
    • World Summit on the Information Society
    • HEP Standing Committee on Inter-regional Connectivity (SCIC)
  • Themes
    • Global collaborations, Grids and addressing the Digital Divide
    • Focus on poorly connected regions
    • Brazil (2004), Korea (2005)

  32. Science Grid Communications
  • Broad set of activities (Katie Yurkewicz)
    • News releases, PR, etc.
    • Science Grid This Week (www.interactions.org/sgtw)
    • OSG Newsletter
    • Not restricted to OSG

  33. Grid Timeline
  [Timeline chart spanning 2000 – 2007, with milestones: First US-LHC Grid Testbeds; PPDG, $9.5M; GriPhyN, $12M; iVDGL, $14M; VDT 1.0; CHEPREO, $4M; Grid3 operations; Grid Summer Schools; Digital Divide Workshops; Grid Communications; UltraLight, $2M; LIGO Grid; DISUN, $10M; OSG operations; Start of LHC.]

  34. Future of OSG CyberInfrastructure
  • OSG is a unique national infrastructure for science
    • Large CPU, storage and network capability crucial for science
    • Supporting advanced middleware
    • Long-term support of the Virtual Data Toolkit (new disciplines & international collaborations)
  • OSG currently supported by a “patchwork” of projects
    • Collaborating projects, separately funded
  • Developing workplan for long-term support
    • Maturing, hardening facility
    • Extending facility to lower barriers to participation
    • Oct. 27 presentation to DOE and NSF

  35. OSG Consortium Meeting: Jan 23 – 25
  • University of Florida (Gainesville)
  • About 100 – 120 people expected
  • Funding agency invitees
  • Schedule
    • Monday morning: Applications plenary (rapporteurs)
    • Monday afternoon: Partner Grid projects plenary
    • Tuesday morning: Parallel sessions
    • Tuesday afternoon: Plenary
    • Wednesday morning: Parallel sessions
    • Wednesday afternoon: Plenary
    • Thursday: OSG Council meeting

  36. Disaster Planning / Emergency Response

  37. Grids and Disaster Planning / Emergency Response
  • Inspired by recent events
    • Dec. 2004 tsunami in Indonesia
    • Aug. 2005 Katrina hurricane and subsequent flooding
    • (Quite different time scales!)
  • Connection of DP/ER to Grids
    • Resources to simulate detailed physical & human consequences of disasters
    • Priority pooling of resources for a societal good
    • In principle, a resilient distributed resource
  • Ensemble approach well suited to Grid/cluster computing
    • E.g., given a storm’s parameters & errors, bracket likely outcomes
    • Huge number of jobs required
    • Embarrassingly parallel
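  To illustrate the ensemble bullets above, here is a minimal Python sketch that perturbs a storm's parameters within stated uncertainties and emits one independent command line per ensemble member. The parameter names, error values, and the run_storm_surge executable are illustrative assumptions, not an actual DP/ER application; each emitted job could then be wrapped in a grid submit description like the earlier Condor-G example.

    # Minimal ensemble sketch: sample storm parameters within quoted errors and
    # emit one job per sample. All names, values, and the executable are
    # placeholders; the point is that the members are independent
    # (embarrassingly parallel) and can be scattered across Grid/cluster resources.
    import random

    # Hypothetical central values and 1-sigma errors
    PARAMS = {
        "landfall_lon_deg": (-89.6, 0.5),
        "central_pressure_hpa": (920.0, 10.0),
        "forward_speed_kmh": (20.0, 4.0),
    }

    def sample_member(rng: random.Random) -> dict:
        """Draw one ensemble member by perturbing each parameter."""
        return {name: rng.gauss(mu, sigma) for name, (mu, sigma) in PARAMS.items()}

    def make_jobs(n_members: int, seed: int = 42) -> list:
        """Return one independent command line per ensemble member."""
        rng = random.Random(seed)
        jobs = []
        for i in range(n_members):
            member = sample_member(rng)
            args = " ".join(f"--{k}={v:.2f}" for k, v in member.items())
            jobs.append(f"run_storm_surge --member={i} {args}")
        return jobs

    if __name__ == "__main__":
        for cmd in make_jobs(5):
            print(cmd)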

  38. DP/ER Scenarios
  • Simulating physical scenarios
    • Hurricanes, storm surges, floods, forest fires
    • Pollutant dispersal: chemical, oil, biological and nuclear spills
    • Disease epidemics
    • Earthquakes, tsunamis
    • Nuclear attacks
    • Loss of network nexus points (deliberate or side effect)
    • Astronomical impacts
  • Simulating human responses to these situations
    • Roadways, evacuations, availability of resources
    • Detailed models (geography, transportation, cities, institutions)
    • Coupling human response models to specific physical scenarios
  • Other possibilities
    • “Evacuation” of important data to safe storage

  39. DP/ER and Grids: Some Implications
  • DP/ER scenarios are not equally amenable to the Grid approach
    • E.g., tsunami vs. hurricane-induced flooding
    • Specialized Grids can be envisioned for very short response times
    • But all can be simulated “offline” by researchers
    • Other “longer term” scenarios
  • ER is an extreme example of priority computing
    • Priority use of IT resources is common (conferences, etc.)
    • Is ER priority computing different in principle?
  • Other implications
    • Requires long-term engagement with DP/ER research communities (atmospheric, ocean, coastal ocean, social/behavioral, economic)
    • Specific communities with specific applications to execute
    • Digital Divide: resources to solve problems of interest to the 3rd World
    • Forcing function for Grid standards?
    • Legal liabilities?

  40. Grid Project References
  • Open Science Grid: www.opensciencegrid.org
  • Grid3: www.ivdgl.org/grid3
  • Virtual Data Toolkit: www.griphyn.org/vdt
  • GriPhyN: www.griphyn.org
  • iVDGL: www.ivdgl.org
  • PPDG: www.ppdg.net
  • CHEPREO: www.chepreo.org
  • UltraLight: www.ultralight.org
  • Globus: www.globus.org
  • Condor: www.cs.wisc.edu/condor
  • WLCG: www.cern.ch/lcg
  • EGEE: www.eu-egee.org

  41. Extra Slides

  42. Grid3 Use by VOs Over 13 Months

  43. CMS: “Compact” Muon Solenoid [detector drawing; “inconsequential humans” shown for scale]

  44. LHC: Beyond Moore’s Law [chart: LHC CPU requirements vs. Moore’s Law (2000)]

  45. Grids and Globally Distributed Teams
  • Non-hierarchical: chaotic analyses + productions
  • Superimpose significant random data flows

  46. Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN
  [Figures: Sloan data; galaxy cluster size distribution]

  47. The LIGO Scientific Collaboration (LSC) and the LIGO Grid
  • LIGO Grid: 6 US sites + 3 EU sites (Cardiff/UK, AEI/Germany)
  [Map labels: Birmingham, Cardiff, AEI/Golm]
  * LHO, LLO: LIGO observatory sites
  * LSC: LIGO Scientific Collaboration
