Distributed Grid Computing at ISIS using the Grid MP System

Presentation Transcript


  1. Distributed Grid Computing at ISIS using the Grid MP System Tom Griffin, ISIS Facility & University of Manchester / UMIST

  2. What do I mean by ‘Distributed Grid’? • A way of speeding up large, compute intensive tasks • Break large jobs into smaller chunks • Send these chunks out to (distributed) machines • Distributed machines do the work • Collate and merge the results
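
  The pattern is classic scatter/gather. As a minimal, generic C++ sketch (not tied to Grid MP or any other product), the controlling program splits one large job into chunks, has each chunk processed, and merges the partial results:

      #include <cstddef>
      #include <iostream>
      #include <numeric>
      #include <vector>

      // Stand-in for the computation a remote client machine would perform.
      double process_chunk(const std::vector<int>& chunk) {
          return std::accumulate(chunk.begin(), chunk.end(), 0.0);
      }

      int main() {
          std::vector<int> job(1000);
          std::iota(job.begin(), job.end(), 1);        // one large job: 1..1000

          const std::size_t n_chunks = 4;              // pretend we have 4 machines
          const std::size_t chunk_size = job.size() / n_chunks;

          std::vector<double> partials;
          for (std::size_t i = 0; i < n_chunks; ++i) {
              // On a real grid each chunk is shipped to a distributed machine;
              // here the worker function is simply called locally.
              std::vector<int> chunk(job.begin() + i * chunk_size,
                                     job.begin() + (i + 1) * chunk_size);
              partials.push_back(process_chunk(chunk));
          }

          // Collate and merge the results.
          double total = std::accumulate(partials.begin(), partials.end(), 0.0);
          std::cout << "merged result: " << total << "\n";   // prints 500500
      }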

  3. Spare Cycles Concept • Typical PC usage is about 10% • Most PCs not used at all after 5pm • Even with ‘heavily used’ (Outlook, Word, IE) PCs, the CPU is still grossly underutilised • Everyone wants a fast PC! • Can we use (“steal?”) their unused CPU cycles? • SETI@home, World Community Grid (www.worldcommunitygrid.org)

  4. Possible Software Implementations • Toolkit, e.g. COSM • Low-level toolkit – integration at source code level • So integration is time-consuming work for each application • Entropia DC Grid • Trial run at ISIS two years ago. Some success • Company bought out and in limbo (?) • United Devices Grid MP • What we’re currently using • Quite expensive • Condor • Free (academic research project) • In our experience two years ago, not reliable with Windows

  5. The United Devices System • Server hardware • We use two dual-Xeon servers + 280 client licenses • Could (and will) easily cope with more clients • Software • Servers run RedHat Linux Advanced Server / DB2 • Clients available for Windows, Linux, SPARCs and Macs • Programming • MGSI – Web Services interface – XML, SOAP • Accessed with C++ and Java classes etc • Management Console • Web browser based • Can manage services, jobs, devices etc
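
  On the wire, an MGSI call such as createDataSet (whose C++ wrapper appears on slide 16) travels as an ordinary SOAP envelope. The layout below is an illustrative assumption reconstructed from the method name, namespace, and parameters visible in that wrapper, not a capture of real Grid MP traffic:

      <?xml version="1.0" encoding="UTF-8"?>
      <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
                     xmlns:mgsi="urn://ud.com/mgsi">
        <soap:Body>
          <mgsi:createDataSet>
            <authkey>...</authkey>      <!-- caller's authentication key -->
            <data_set>...</data_set>    <!-- serialised DataSet object -->
          </mgsi:createDataSet>
        </soap:Body>
      </soap:Envelope>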

  6. Visual Introduction to the Grid

  7. Installing and Deploying the System • Servers • Complete set up in under 3 hours • Virtually self-maintaining • Clients • Windows only so far • MSI installer – approx. 20 seconds • Deployable via SMS • MP Agent user • Installing on other OSs looks straightforward

  8. Suitable / Unsuitable Applications • CPU Intensive • Low to moderate memory use • Not too much file output • Coarse grained • Command line / batch driven • Licensing issues?

  9. Objects within the Grid • Program • Job • Jobstep • Data Set • Data • Workunit • Client
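
  Roughly, these objects nest as in the sketch below; the field names are invented for illustration and do not match the real MGSI schema:

      #include <string>
      #include <vector>

      // Illustrative C++ sketch of how the Grid MP objects relate
      // (names and fields assumed, not the actual Grid MP data model).
      struct Data     { std::string package_url;   };   // one uploaded data package
      struct DataSet  { std::vector<Data> items;   };   // groups the Data packages
      struct Workunit { std::string program_gid;        // which wrapped executable
                        std::string data_gid;      };   // which Data it processes
      struct Jobstep  { std::vector<Workunit> wus; };   // one pass over a Data Set
      struct Job      { std::vector<Jobstep> steps; };  // a Job holds the Workunits
      struct Program  { std::string wrapped_exe;   };   // the uploaded executable
      // A Client is a PC running the MP agent; it fetches Workunits,
      // runs the Program against the Data, and returns the results.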

  10. How to write Grid Programs • Fairly easy to write • Interface to grid via Web Services • So far used: C++, Java, Perl, C# (any .Net language) • Think about how to split your data and merge results • Wrap and upload your executable • Write the application service • Pre and Post processing • Use the Grid

  11. Wrapping Your Executable • Executable + any DLLs etc • Standard data files • Compression • Encryption • Capture screen output • Set Environment Variables • Command Line

  12. Application Service • Pre-processing • Partition data • Package data partitions • Log in to the Grid server • Create a Job and Job Step • Create a Data Set • Create Data objects and upload data packages • Create Workunits • Set the Job running • Post-processing • Retrieve results • Merge results
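
  In code, the pre-processing service is essentially that list as a linear sequence of MGSI calls. In the hedged sketch below, createDataSet matches the real wrapper shown on slide 16, while every other method name is an assumed stand-in for the corresponding MGSI operation (MgsiClient, DataSet and Package come from the generated classes):

      // Sketch only: method names other than createDataSet are assumptions.
      void submitJob(MgsiClient* mgsi, const std::vector<Package>& packages) {
          mgsi->login("user", "secret");                    // log in to Grid server

          ud::uuid job  = mgsi->createJob("HMC run");       // create a Job...
          ud::uuid step = mgsi->createJobStep(job);         // ...and a Job Step

          DataSet dsHMC;
          dsHMC.data_set_gid = mgsi->createDataSet(dsHMC);  // real call (slide 16)

          for (const Package& pkg : packages) {
              ud::uuid d = mgsi->createData(dsHMC, pkg);    // create a Data object
              mgsi->uploadPackage(d, pkg.path);             // upload its package
          }
          mgsi->createWorkunits(step, dsHMC);               // one Workunit per Data
          mgsi->startJob(job);                              // set the Job running
      }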

  13. Example Application: HMC • Hybrid Monte Carlo method of global optimisation to solve molecular crystal structures from powder diffraction data • Parametric problem • e.g. vary parameters such as acceptance ratio, to scan a 3D grid • each run completely independent of any other • Send one run to each machine on the grid
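
  Since each run is independent, generating the work is just enumerating the parameter grid, one command line per run. A minimal sketch (only the acceptance ratio is named on the slide; the other two axes and the hmc.exe flags are invented placeholders):

      #include <cstdio>

      int main() {
          // Scan a 3D grid of HMC settings; every line is one independent run.
          for (int a = 1; a <= 9; a += 2) {                  // acceptance ratio axis
              for (int t = 1; t <= 3; ++t) {                 // placeholder axis 2
                  for (int n = 1000; n <= 3000; n += 1000) { // placeholder axis 3
                      std::printf("hmc.exe -accept %.1f -T %d -n %d\n",
                                  a / 10.0, t, n);
                  }
              }
          }
          // Each printed command line becomes one Workunit for one machine.
      }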

  14. Running HMC on the Grid • Unchanged exe • User edits or creates an appropriate settings file • User runs “my” HMC submit program • Splits the .bat file into one line per machine • Uploads chunks to the Grid server • Grid server distributes Workunits to clients • User monitors the job with their web browser • Clients return results to the Grid server • User runs HMC retrieve program • Downloads results
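
  The splitting step itself is plain text handling. A sketch of what the submit program's first stage does (file names assumed):

      #include <fstream>
      #include <string>

      // Split the batch file into one command line per chunk file,
      // e.g. hmc.bat -> chunk0.bat, chunk1.bat, ...  (names assumed).
      int main() {
          std::ifstream bat("hmc.bat");
          std::string line;
          int n = 0;
          while (std::getline(bat, line)) {
              if (line.empty()) continue;                // skip blank lines
              std::ofstream chunk("chunk" + std::to_string(n++) + ".bat");
              chunk << line << '\n';                     // one run per chunk
          }
          // Each chunk file is then packaged and uploaded as one Data object.
      }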

  15. More on HMC Submit… • Split the batch file into lines • Create a dataset (to hold our data) • Package data (command line and zmatrix files etc) • Associate data with dataset • Upload data packages to Grid server • Create Workunits from the dataset • Create a Job to hold the Workunits

  16. Yet more… • Program written in C++ • Uses C++ classes to ‘hide’ SOAP calls:

      dsHMC.data_set_gid = mgsi->createDataSet(dsHMC);

      ud::uuid MgsiClient::createDataSet(const DataSet &data_set) throw(MgsiException)
      {
          // Build the SOAP request for the MGSI createDataSet method
          SOAPMethod request("createDataSet", "urn://ud.com/mgsi");
          request.AddParameter("authkey") << authkey;       // authentication key
          request.AddParameter("data_set") << data_set;     // serialised DataSet
          const SOAPResponse &response = call(
              request,
              const_cast<SOAPParameter *>(&request.GetParameter((size_t)0)));
          ud::uuid retval;
          response.GetReturnValue() >> retval;              // GID of the new data set
          return retval;
      }

  • Auto-generated by ‘Axis C++’ from the WSDL file • Also a C++ HTTPS file transfer program

  17. Performance • Scales linearly: 50 devices ≈ 50 times faster • Affected by size of Workunit • Overhead for distribution is ≈ 1 minute • Risk of device being switched off
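
  That ≈1 minute of overhead sets a floor on sensible Workunit sizes. As a rough model (the run times here are examples, not measurements):

      efficiency ≈ t_compute / (t_compute + t_overhead)

  so a 10-minute Workunit runs at about 10/11 ≈ 91% efficiency, while a 2-minute Workunit drops to 2/3 ≈ 67%.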

  18. Example 2: MD Manager • Molecular Dynamics simulation(s) • Program written in C# • C# classes to hide SOAP, generated from the WSDL (and modified) • Wrote generic C# HTTP file transfer classes • ‘Interactive’ program • Typical runtime ~10 hours per single simulation • Need to investigate ‘grids’ of simulations

  19. [Diagram: two 3×3 grids of simulations, A–I, arranged along Temperature and Pressure axes] • But in 3 dimensions • and with ‘ordering restrictions’ • plus a post-processing stage

  20. Who Else Does This? • Johnson & Johnson • Novartis • GSK • National Physical Laboratory • Accelrys • IBM • World Community Grid • http://www.worldcommunitygrid.org/ • Currently the Human Proteome Folding project

  21. Problems Encountered & Support • Technical Problems • Mercifully few! • Main issue has been RAM thresholding (now resolved) • Encryption of certain files causes a problem • Support • So far been very good • Responses to queries always next day (time difference) and always insightful • Ease of setup / maintenance • Installed and fully running in ~3 hours • Next to no maintenance required, other than backup

  22. ‘Social’ Issues • The grid is the easiest thing to blame • Too abstract for some users (no big box) • ‘Stealing my cycles’ • Expansion leads to political problems

  23. Future Developments – Expansion • Proposal accepted for an additional 400 licenses • Giving us a total of 480 • Change in licensing model • Bottom Line: Costs • Setup, server licenses, 80 client licenses + support – $18k – CMSD $50k • Total ≈ $250k • [Chart residue: status key ‘Completed / Funded / Seeking funding’; stray figures $50k, $45k, $83k]

  24. Summary • Grid is here and running smoothly • Easy to use • Excellent performance • Vast amount of compute power available • Future looks good
