Campus Grids December 9, 2010 Preston Smith Purdue University
Campus Grid? What? • Fundamentally, a campus grid is an avenue for sharing computing resources • A “campus” isn’t necessarily a college campus. • Can be a lab, a corporation, etc. • A campus grid allows an institution to build a cyberinfrastructure utilizing its existing investment in information technology
Who does this? • Purdue, obviously! • But, we’re not the only ones.
DiaGrid • Many of these academic institutions participate in DiaGrid • A federation of campus grids that enables sharing of cycles between institutions
Resource Sharing at Purdue • Community Clusters • As we’ve heard, community clusters allow us to combine and share HPC clusters • Cycle Scavenging • Purdue’s Data Digest says that there are 27,000 desktop computers around campus • A conservative estimate of 2 cores per computer: • 50,000+ cores sitting idle around Purdue alone! • To further share cycles and scavenge from computers around campus, we use Condor
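The core estimate on this slide is simple arithmetic. A minimal sketch, using only the two figures quoted above (the constant and function names are illustrative, not from the talk):

```python
# Figures quoted on the slide; the names are illustrative, not from the talk.
DESKTOPS_ON_CAMPUS = 27_000   # desktop count cited from the data digest
CORES_PER_DESKTOP = 2         # the slide's conservative per-machine estimate


def estimated_cores(desktops: int, cores_per_desktop: int) -> int:
    """Return the total core count implied by the two estimates."""
    return desktops * cores_per_desktop


total_cores = estimated_cores(DESKTOPS_ON_CAMPUS, CORES_PER_DESKTOP)
print(f"Estimated desktop cores: {total_cores:,}")
```

The product, 54,000, is consistent with the "50,000+" figure on the slide.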
Condor • Developed at the University of Wisconsin-Madison by a research group led by Miron Livny • Began as a “hunter of idle workstations”, but has evolved into grid computing software, cluster batch systems, and workflow tools • The technology making the Campus Grid at Purdue possible • When a system is not used by its “owner”, Condor can run science codes on it
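The "run only when the owner is away" idea can be sketched as a small decision function. This is a toy model under stated assumptions, not Condor's actual policy mechanism: the `MachineState` fields, the thresholds, and the action names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MachineState:
    """Hypothetical snapshot of a desktop; fields are assumptions for this sketch."""
    keyboard_idle_seconds: float  # time since the owner last used the machine
    owner_load: float             # CPU load not caused by scavenged jobs


# Assumed thresholds, chosen only for illustration.
MIN_IDLE_SECONDS = 15 * 60
MAX_OWNER_LOAD = 0.3


def owner_is_away(m: MachineState) -> bool:
    """Treat the owner as away if the machine has been idle long enough and is lightly loaded."""
    return m.keyboard_idle_seconds >= MIN_IDLE_SECONDS and m.owner_load <= MAX_OWNER_LOAD


def next_action(m: MachineState, job_running: bool) -> str:
    """Decide what a cycle scavenger should do with this machine right now."""
    if owner_is_away(m):
        return "continue" if job_running else "start"
    # The owner is back: scavenged work must yield immediately.
    return "evict" if job_running else "wait"


print(next_action(MachineState(keyboard_idle_seconds=3600, owner_load=0.05), job_running=False))
print(next_action(MachineState(keyboard_idle_seconds=2, owner_load=0.9), job_running=True))
```

The design point is that the owner always has priority: a scavenged job starts only on an idle machine and gives the machine back as soon as the owner returns.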
Community Clusters -> Condor • Backfilling on idle HPC cluster nodes • Condor runs on idle cluster nodes (nearly 30,000 cores today) when a node isn’t busy with jobs from the primary scheduler
Central Cluster Usage • Condor: 15.7% • PBS: 81%
Condor on Desktops • Centrally operated student labs provide ~8000 Windows CPUs • Centrally supported workstations have Condor available for installation • Purdue IT is moderately centralized, but not totally • Large numbers of the 27,000 machines at Purdue aren’t operated by ITaP!
Condor in Distributed IT • Collaborate with distributed IT organizations – Many colleges and departments operate over 1000 machines each • Agriculture, Computer Science, Engineering, Management, Physical Facilities, Liberal Arts, Education • Many of these departments contribute resources into the campus grid today • How we help them: • Provide preconfigured, managed packages to ease deployment burden for IT organizations (RPM, deb, .exe) • Building a campus grid is not a technology problem, but a people problem!
Unexpected Bonus! • IT Cost Reduction • University requiring $15M of cost savings from IT over 3 years • Power reduction in IT counts towards savings • Condor can manage power on machines • If there is computation to be done, it will do it. • If not, Condor can hibernate the machines and wake them up when work arrives • A win for both scientists and accountants!
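The power-management behavior described on this slide amounts to a small decision table. The sketch below is a toy model, not Condor's configuration or API; the function name, parameters, and action strings are hypothetical.

```python
def power_action(powered_on: bool, owner_active: bool, jobs_queued: int) -> str:
    """Pick a power action for one machine (toy model; all names are assumptions)."""
    if powered_on:
        # Stay up while the owner is working or there is computation to do.
        if owner_active or jobs_queued > 0:
            return "stay_on"
        # Nobody needs the machine: save power.
        return "hibernate"
    # Machine is asleep: wake it only if work has arrived.
    return "wake" if jobs_queued > 0 else "stay_asleep"


for state in [(True, False, 0), (True, False, 12), (False, False, 3), (False, False, 0)]:
    print(state, "->", power_action(*state))
```

Under this model a machine draws power only while its owner or the grid has work for it, which is where the savings come from.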
Sure, but how useful is this? • The campus grid is ideal for loosely-coupled, serial, or workflow jobs • A runtime of a couple of hours is perfect • I know, you wonder, “who does that? Doesn’t everybody need a huge, expensive cluster for massive parallel jobs?”
Science on a Campus Grid • A 2-year study of the TeraGrid revealed: • 66% of all jobs were single CPU • 80% of those jobs ran for 2 hours or less • From November 2008 to November 1, 2010, Purdue users: • Ran 35.4 million single-CPU jobs • Using 40 million hours of computer time • Average runtime of 1.35 hours
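The two TeraGrid percentages can be combined to estimate what share of all jobs were both single-CPU and short. A minimal sketch using only the figures on the slide (the variable names are illustrative):

```python
# Percentages quoted on the slide; the names are illustrative.
SINGLE_CPU_SHARE = 0.66          # fraction of all TeraGrid jobs using one CPU
SHORT_SHARE_OF_SINGLE_CPU = 0.80  # fraction of those that ran 2 hours or less

# Fraction of ALL jobs that were single-CPU and ran for 2 hours or less.
short_single_cpu_share = SINGLE_CPU_SHARE * SHORT_SHARE_OF_SINGLE_CPU

print(f"Single-CPU jobs of 2 hours or less: {short_single_cpu_share:.1%} of all jobs")
```

By this arithmetic, roughly half of the jobs in the study fit the profile that a campus grid handles well.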
Campus Grid Science • Simulating a database of hypothetical zeolite structures (Journal of Industrial & Engineering Chemistry Research) • Backbone structure of the infectious epsilon15 virus capsid revealed by electron cryomicroscopy (Nature) • Reassessing the Source of Long-Period Comets (Science) • Analysis of relationships between Wikipedia contributors
Summary • A campus grid is a way to create a cyberinfrastructure on an institution’s existing IT investment • A large class of work is suitable to run on opportunistic, single-CPU resources
The End Questions? http://www.rcac.purdue.edu/boilergrid