Grids: TACC Case Study Ashok Adiga, Ph.D. Distributed & Grid Computing Group Texas Advanced Computing Center The University of Texas at Austin adiga@tacc.utexas.edu (512) 471-8196 TEXAS ADVANCED COMPUTING CENTER
Outline • Overview of TACC Grid Computing Activities • Building a Campus Grid – UT Grid • Addressing common Use Cases • Scheduling & Flow • Grid Portals • Conclusions 2
TACC Grid Program • Building Grids at TACC • Campus Grid (UT Grid) • State Grid (TIGRE) • National Grid (ETF) • Grid Hardware Resources • Wide range of hardware resources available to research community at UT and partners • Grid Software Resources • NMI Components, NPACKage • User Portals, GridPort • Job schedulers: LSF MultiCluster, Community Scheduler Framework (CSF) • United Devices (desktop grids) • Significantly leveraging NMI Components and experience 3
TACC Resources: Providing Comprehensive, Balanced Capabilities 4
TeraGrid (National) • NSF Extensible Terascale Facility (ETF) project • Build and deploy the world's largest, fastest distributed computational infrastructure for general scientific research • Current Members: • San Diego Supercomputer Center, NCSA, Argonne National Laboratory, Pittsburgh Supercomputing Center, California Institute of Technology • Currently has a 40 Gbps backbone with hubs in Los Angeles & Chicago • 3 new members added in September 2003 • The University of Texas (led by TACC) • Oak Ridge National Laboratory • Indiana U/Purdue U 5
TeraGrid (National) • UT awarded $3.2M to join NSF ETF in September 2003 • Establish 10 Gbps network connection to ETF backbone • Provide access to high-end computers capable of 6.2 teraflops, a new terascale visualization system, and a 2.8-petabyte mass storage system • Provide access to geoscience data collections used in environmental, geological, climate, and biological research: • high-resolution digital terrain data • worldwide hydrological data • global gravity data • high-resolution X-ray computed tomography data • Current software stack includes: Globus (GSI, GRAM, GridFTP), MPICH-G2, Condor-G, GPT, MyProxy, SRB 6
TIGRE (State-wide Grid) • Texas Internet Grid for Research and Education • Computational grid to integrate computing & storage systems, databases, visualization laboratories and displays, and instruments and sensors across Texas • Current TIGRE participants: • Rice • Texas A&M • Texas Tech University • Univ. of Houston • Univ. of Texas at Austin (TACC) • Grid software for TIGRE Testbed: • Globus, MPICH-G2, NWS, SRB • Other local packages must be integrated • Goal: track NMI GRIDS 7
UT Grid (Campus Grid) • Mission: integrate and simplify the usage of the diverse computational, storage, visualization, data, and instrument resources of UT to facilitate new, powerful paradigms for research and education. • UT Austin Participants: • Texas Advanced Computing Center (TACC) • Institute for Computational Engineering & Sciences (ICES) • Information Technology Services (ITS) • Center for Instructional Technologies (CIT) • College of Engineering (COE) 8
What is a Campus Grid? • Important differences from enterprise grids • Researchers generally more independent than in a company with a tight focus on mission and profits • No central IT group governs researchers’ systems • Paid for out of grants, so authority is distributed • Owners of PCs and clusters have total control; they reconfigure and participate only if willing • Lots of heterogeneity; lots of low-cost, poorly supported systems • Accounting potentially less important • Focus on increasing research effectiveness allows tackling problems early (scheduling, workflow, etc.) 9
UT Grid: Approach • Unique characteristics present opportunities • Some campus researchers want to be on bleeding edge, unlike commercial enterprises • TACC provides high-end systems that researchers require • Campus users have trust relationships initially with TACC, but not each other • How to build a campus grid: • Build a hub & spoke grid first • Address both productivity and grid R&D 10
UT Grid: Logical View • Integrate distributed TACC resources first (Globus, LSF, NWS, SRB, United Devices, GridPort) • [Diagram: TACC HPC, Vis, Storage (actually spread across two campuses)] 11
UT Grid: Logical View • Next add other UT resources in one bldg. as a spoke using same tools and procedures • [Diagram: TACC HPC, Vis, Storage hub with ICES clusters as a spoke] 12
UT Grid: Logical View • Next add other UT resources in one bldg. as a spoke using same tools and procedures • [Diagram: TACC HPC, Vis, Storage hub with ICES and GEO cluster spokes] 13
UT Grid: Logical View • Next add other UT resources in one bldg. as a spoke using same tools and procedures • [Diagram: TACC HPC, Vis, Storage hub with ICES, GEO, PGE, and BIO cluster spokes] 14
UT Grid: Logical View • Finally negotiate connections between spokes for willing participants to develop a P2P grid • [Diagram: TACC HPC, Vis, Storage hub with ICES, GEO, PGE, and BIO cluster spokes, plus direct spoke-to-spoke connections] 15
UT Grid: Physical View • [Diagram: external networks and the GAATN link the research campus NOC and the main campus (CMS) NOC; switches at TACC (storage, PWR4, clusters, vis), ACES/ICES, and PGE connect their respective clusters] 16
UT Grid: Focus • Address users interested only in increased productivity • Some users just want to be more productive with TACC resources and their own (and others’): scheduling throughput, data collections, workflow • Install only ‘lowest common denominator’ software on TACC production resources and user spokes for productivity: Globus 2.x, GridPort 2.x, WebSphere, LSF MultiCluster, SRB, NWS, United Devices, etc. 17
UT Grid: Focus • Address users interested in grid R&D issues • Some users want to conduct grid-related R&D: grid scheduling, performance modeling, meta-applications, P2P storage, etc. • Also install bleeding-edge software to support grid R&D on TACC testbed and willing spoke systems: Globus 3.0 and other OGSA software, GridPort 3.x, Community Scheduler Framework, etc. 18
Scheduling & Workflow • Use Case: Researcher wants to run a climate modeling job on a compute cluster and view results using a specified visualization resource • Grid middleware requirements: • Schedule job to “best” compute cluster • Forward results to specified visualization resource • Support advanced reservations on vis. resource • Currently solved using LSF MultiCluster & Globus (GSI, GridFTP, GRAM); see the sketch below • Evaluating CSF meta-scheduler for future use 19
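The compute-then-visualize flow on this slide can be illustrated with plain Globus command-line tools. The following is a minimal, hypothetical sketch, not the LSF MultiCluster integration used on UT Grid: it assumes Globus Toolkit 2.x clients (globus-job-submit, globus-job-status, globus-url-copy) are installed and a valid GSI proxy exists, and the host names, executable, and file paths are made-up placeholders.

```python
"""Sketch of the use case above: submit a job via GRAM, wait for it to finish,
then forward the results to a visualization resource with GridFTP.
Host names and paths are hypothetical, not actual UT Grid resources."""
import subprocess
import time

COMPUTE_HOST = "cluster.example.utexas.edu"   # hypothetical compute cluster
VIS_HOST = "vis.example.utexas.edu"           # hypothetical vis resource

def run(cmd):
    """Run a CLI command and return its stdout, raising on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()

# 1. Submit the climate-model job to the chosen compute cluster via GRAM.
job_contact = run(["globus-job-submit", COMPUTE_HOST, "/home/user/bin/climate_model"])

# 2. Poll GRAM until the job leaves the queue (coarse polling for brevity).
while run(["globus-job-status", job_contact]) not in ("DONE", "FAILED"):
    time.sleep(60)

# 3. Forward the results to the visualization resource with GridFTP.
run(["globus-url-copy",
     f"gsiftp://{COMPUTE_HOST}/home/user/output/results.nc",
     f"gsiftp://{VIS_HOST}/scratch/user/results.nc"])
```

In practice LSF MultiCluster handles the cross-cluster queuing and the “best cluster” choice; the sketch shows only the GRAM submission and GridFTP forwarding steps.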
What is CSF? • CSF (Community Scheduler Framework): • Open-source meta-scheduler framework contributed by Platform Computing to Globus for possible inclusion in the Globus Toolkit • Developed with the latest version of OGSI, a grid specification being developed within the Global Grid Forum (as part of OGSA) • Extensible framework for implementing meta-schedulers • Supports heterogeneous workload execution software (LSF, PBS, SGE) • Negotiates advanced reservations (WS-Agreement) • Selects best resource for a given job based on specified policies • Provides standard API to submit and manage jobs 20
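CSF’s own API is not reproduced here; the sketch below only illustrates the policy-based “select the best resource” decision a meta-scheduler of this kind makes when dispatching across heterogeneous schedulers (LSF, PBS, SGE). All cluster names, fields, and the selection policy are invented for illustration.

```python
"""Illustrative sketch (not the CSF API) of policy-based resource selection."""
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    scheduler: str        # underlying workload manager, e.g. "LSF", "PBS", "SGE"
    free_cpus: int
    queue_length: int
    supports_reservation: bool

def select_resource(resources, min_cpus, need_reservation=False):
    """Pick a resource with enough free CPUs (and reservation support if
    required), preferring the shortest queue. Returns None if nothing fits."""
    candidates = [r for r in resources
                  if r.free_cpus >= min_cpus
                  and (r.supports_reservation or not need_reservation)]
    return min(candidates, key=lambda r: r.queue_length, default=None)

clusters = [
    Resource("lsf-cluster-a", "LSF", free_cpus=64, queue_length=3, supports_reservation=True),
    Resource("pbs-cluster-b", "PBS", free_cpus=32, queue_length=0, supports_reservation=False),
]
best = select_resource(clusters, min_cpus=16, need_reservation=True)
print(best.name if best else "no suitable resource")   # -> lsf-cluster-a
```

A real meta-scheduler would also weigh data locality, advance reservations, and site policies; the point here is only the shape of the decision.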
Example CSF Configuration • [Diagram: two virtual organizations, VO A and VO B, each with its own CA; GT3.0 Queuing, Job, and Reservation Services dispatch work through GT3.0 RM Adapters for LSF and PBS to the underlying LSF and PBS clusters] 21
Grid Portals • Use Case: Researcher logs on using a single grid portal account which enables her to • Be authenticated across all resources on the grid • Submit and manage job sequences on the entire grid • View account allocations and usage • View current status of all grid resources • Transfer files between grid resources • GridPort provides base services used to create customized portals (e.g. HotPages). Technologies: • Security: GSI, SSH, MyProxy • Job Execution: GRAM Gatekeeper • Information Services: MDS, NWS, Custom information scripts • File Management: GridFTP 22
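Behind the portal use case above, the single sign-on step typically means retrieving a delegated credential from a MyProxy server and then acting on the user’s behalf across resources. The sketch below is a hedged illustration of that step using Globus/MyProxy command-line clients (myproxy-logon, grid-proxy-info, globus-job-run) driven from Python; the server name, account name, and resource list are hypothetical, and GridPort’s actual implementation is not shown.

```python
"""Sketch of portal single sign-on: fetch a delegated proxy from MyProxy,
then use it to probe grid resources on the user's behalf. Names are placeholders."""
import subprocess

MYPROXY_SERVER = "myproxy.example.utexas.edu"   # hypothetical MyProxy server
PORTAL_USER = "researcher1"                     # hypothetical portal account
RESOURCES = ["cluster.example.utexas.edu", "vis.example.utexas.edu"]

def logon(passphrase):
    """Fetch a short-lived delegated proxy for the portal user.
    -S reads the passphrase from stdin instead of a terminal prompt."""
    subprocess.run(
        ["myproxy-logon", "-s", MYPROXY_SERVER, "-l", PORTAL_USER, "-S", "-t", "2"],
        input=passphrase + "\n", text=True, check=True)

def resource_status():
    """Crude liveness check: run a trivial command on each resource via GRAM."""
    status = {}
    for host in RESOURCES:
        ok = subprocess.run(["globus-job-run", host, "/bin/true"]).returncode == 0
        status[host] = "up" if ok else "unreachable"
    return status

if __name__ == "__main__":
    logon(passphrase="...")   # passphrase would come from the portal login form
    print(subprocess.run(["grid-proxy-info", "-timeleft"],
                         capture_output=True, text=True).stdout)
    print(resource_status())
```

A portal such as HotPages layers MDS queries, file transfer, and job management on top of this credential step; the sketch covers only the authentication and a simple status probe.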
GridPort Application Portals • UT/Texas Grids: • http://gridport.tacc.utexas.edu • http://tigre.hipcat.net • NPACI/PACI/TeraGrid HotPages (also @PACI/NCSA): • https://hotpage.npaci.edu • http://hotpage.teragrid.org • https://hotpage.paci.org • Telescience/BIRN (Biomedical Informatics Research Network): • https://gridport.npaci.edu/Telescience • DOE Fusion Grid Portal • Will use a GridPort-based portal to run scheduling experiments using portals and CSF at upcoming Supercomputing 2003 • Contributing and founding member of NMI Portals Project: • Open Grid Computing Environments (OGCE) 24
Conclusions • Grid technologies progressing & improving but still ‘raw’ • Cautious outreach to campus community • UT campus grid under construction, working with beta users now • Computational Science problems have not changed: • Users want easier tools, familiar user environments (e.g. command line) or easy portals • Workflow appears to be desirable tool: • GridFlow/GridSteer Project under way • Working with advanced file mgmt and scheduling to automate distributed tasks 25
TACC Grid Computing Activities Participants • Participants include most of the TACC Distributed & Grid Computing Group: • Ashok Adiga • Jay Boisseau • Maytal Dahan • Eric Roberts • Akhil Seth • Mary Thomas • Tomislav Urban • David Walling • As of Dec. 1, Edward Walker (formerly of Platform Computing) 26