Clouds, Interoperation and PRAGMA

Philip M. Papadopoulos, Ph.D., University of California, San Diego; San Diego Supercomputer Center; Calit2

Presentation Transcript


  1. Clouds, Interoperation and PRAGMA
     Philip M. Papadopoulos, Ph.D.
     University of California, San Diego
     San Diego Supercomputer Center
     Calit2

  2. Remember the Grid Promise?
     “The Grid is an emerging infrastructure that will fundamentally change the way we think about, and use, computing. The word Grid is used by analogy with the electric power grid, which provides pervasive access to electricity and has had a dramatic impact on human capabilities and society.”
     Source: The Grid: Blueprint for a New Computing Infrastructure, Foster and Kesselman, preface to the first edition, Aug 1998

  3. Some Things that Happened on the Way to Cloud Computing
     • Web Version 1.0 (1995)
     • 1 Cluster on Top 500 (June 1998)
     • Dot Com Bust (2000)
     • Clusters > 50% of Top 500 (June 2004)
     • Web Version 2.0 (2004)
     • Cloud Computing (EC2 Beta, 2006)
     • Clusters > 80% of Top 500 (Nov. 2008)

  4. Gartner Emerging Tech 2005

  5. Gartner Emerging Tech - 2008

  6. Gartner Emerging Tech 2010

  7. What is fundamentally different about Cloud Computing vs. Grid Computing?
     • Cloud computing: you adapt the infrastructure to your application
       • Should be less time consuming
     • Grid computing: you adapt your application to the infrastructure
       • Generally is more time consuming
     • Cloud computing has a financial model that seems to work; the Grid never had a financial model
       • The Grid “barter” economy was only valid for provider-to-provider trade. Pure consumers had no bargaining power

  8. IaaS: Of Most Interest to PRAGMA
     Providers shown: Amazon EC2, Rackspace, Sun, GoGrid, 3Tera, IBM
     Run (virtual) computers to solve your problem, using your software

  9. Cloud Hype
     • “Others do all the hard work for you”
     • “You never have to manage hardware again”
     • “It’s always more efficient to outsource”
     • “You can have a cluster in 8 clicks of the mouse”
     • “It’s infinitely scalable”
     • …

  10. Amazon Web Services
     • Amazon EC2: catalytic event in 2006 that really started cloud computing
     • Web Services access for
       • Compute (EC2)
       • Storage (S3, EBS)
       • Messaging (SQS)
       • Monitoring (CloudWatch)
       • + 20 (!) more services
     • “I thought this was supposed to be simple”

  11. Basic EC2
     [Diagram: Amazon Machine Images (AMIs) held in Amazon cloud storage (S3: Simple Storage Service; EBS: Elastic Block Store) are copied into the Elastic Compute Cloud (EC2) and booted]
     • AMIs are copied from S3 and booted in EC2 to create a “running instance”
     • When an instance is shut down, all changes are lost
       • Can save as a new AMI

  12. Basic EC2
     • AMI (Amazon Machine Image) is copied from S3 to EC2 for booting
       • Can boot multiple copies of an AMI as a “group”
       • Not a cluster: all running instances are independent
       • Cluster instances cost about $2/hr (8 cores), or roughly $17K/year if run continuously
     • If you make changes to your AMI while running and want them saved
       • Must repack to make a new AMI
       • Or use Elastic Block Store (EBS) on a per-instance basis

  13. Some Challenges in EC2
     • Defining the contents of your virtual machine (software stack)
     • Preparing, packing, uploading image
     • Understanding limitations and execution model
     • Debugging when something goes wrong
     • Remembering to turn off your VM
       • Smallest 64-bit VM is ~$250/month running 7x24
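
The two prices quoted on the EC2 slides (about $2/hr for a cluster instance, about $250/month for the smallest 64-bit VM) can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the 365-day year and 30-day month are assumptions, not figures from the talk.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year, instance never turned off


def yearly_cost(hourly_rate: float) -> float:
    """Cost of leaving one instance running 7x24 for a year."""
    return hourly_rate * HOURS_PER_YEAR


def implied_hourly_rate(monthly_cost: float, days_per_month: int = 30) -> float:
    """Hourly rate implied by a monthly bill for an always-on instance."""
    return monthly_cost / (24 * days_per_month)


# $2/hr cluster instance: 2 * 8760 = 17520, matching the ~$17K/year on the slide.
cluster_per_year = yearly_cost(2.0)

# ~$250/month small VM: 250 / 720, i.e. about 35 cents per hour.
small_vm_per_hour = implied_hourly_rate(250.0)
```

Seen this way, "remembering to turn off your VM" is a budget item: every idle hour is billed at the same rate as a busy one.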

  14. One Problem: too many choices

  15. Reality for Scientific Applications
     • The complete software stack is critical to proper operation
       • Libraries
       • Compiler/interpreter versions
       • File system location
       • Kernel
     • This is the fundamental reason that the Grid is hard: my cluster is not the same environment as your cluster
     • Electrons are universal, software packages are not

  16. People and Science are Distributed
     • PRAGMA: Pacific Rim Applications and Grid Middleware Assembly
       • Scientists are from different countries
       • Data is distributed
     • Cyber Infrastructure to enable collaboration
     • When scientists are using the same software on the same data
       • Infrastructure is no longer in the way
       • It needs to be their software (not my software)

  17. PRAGMA’s Distributed Infrastructure (Grid/Clouds)
     • Switzerland: UZH
     • China: JLU, CNIC, LZU
     • Japan: AIST, OsakaU, UTsukuba
     • Korea: KISTI, KMU
     • USA: IndianaU, SDSC
     • Taiwan: ASGC, NCHC
     • Hong Kong: HKU
     • India: UoHyd
     • Philippines: ASTI
     • Thailand: NECTEC, KU
     • Costa Rica: CeNAT-ITCR
     • Vietnam: HCMUT, HUT, IOIT-Hanoi, IOIT-HCM
     • Colombia: UValle
     • Malaysia: MIMOS, USM
     • Chile: UChile
     • Australia: MU
     • New Zealand: BESTGrid
     26 institutions in 17 countries/regions, 23 compute sites, 10 VM sites

  18. Our Goals
     • Enable specialized applications to run easily on distributed resources
     • Investigate virtualization as a practical mechanism
       • Multiple VM infrastructures (Xen, KVM, OpenNebula, Rocks, WebOS, EC2)
     • Use GeoGrid applications as a driver of the process

  19. GeoGrid Applications as Driver
     I am not part of GeoGrid, but PRAGMA members are!

  20. Deploy Three Different Software Stacks on the PRAGMA Cloud
     • QuiQuake
       • Simulates a ground motion map when an earthquake occurs
       • Invoked when a big earthquake occurs
     • HotSpot
       • Finds high-temperature areas from satellite data
       • Runs on a daily basis (when ASTER data arrives from NASA)
     • WMS server
       • Provides satellite images via the WMS protocol
       • Runs on a daily basis, but the number of requests is not stable
     Source: Dr. Yoshio Tanaka, AIST, Japan

  21. Example of current configuration
     [Diagram: separate groups of nodes for WMS Server, QuiQuake, and Hot Spot]
     • Nodes are fixed to each application
     • Should be more adaptive and elastic according to the requirements
     Source: Dr. Yoshio Tanaka, AIST, Japan

  22. 1st step: Adaptive resource allocation in a single system
     Change nodes for each application according to the situation and requirements.
     [Diagram: the node split among WMS Server, QuiQuake, and Hot Spot shifts toward QuiQuake on “Big Earthquake!” and toward WMS Server on “Increase WMS requests”]
     Source: Dr. Yoshio Tanaka, AIST, Japan
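
One way to picture "change nodes for each application according to the situation" is a proportional split of a fixed node pool. The sketch below is not the GeoGrid mechanism, which the slides do not describe; the function name, the demand weights, and the rule that every application keeps at least one node are all assumptions made for illustration.

```python
def allocate_nodes(total_nodes: int, demand: dict[str, float]) -> dict[str, int]:
    """Split a node pool among applications in proportion to their demand.

    Assumption: each application always keeps one node; the remaining
    nodes are shared by demand weight using largest-remainder rounding.
    """
    if total_nodes < len(demand):
        raise ValueError("need at least one node per application")
    spare = total_nodes - len(demand)
    total_demand = sum(demand.values())
    if total_demand <= 0:
        # No demand information: share the spare nodes evenly.
        shares = {app: spare / len(demand) for app in demand}
    else:
        shares = {app: spare * d / total_demand for app, d in demand.items()}
    alloc = {app: 1 + int(share) for app, share in shares.items()}
    # Hand out nodes lost to rounding, largest fractional part first.
    leftover = total_nodes - sum(alloc.values())
    by_remainder = sorted(
        demand, key=lambda app: shares[app] - int(shares[app]), reverse=True
    )
    for app in by_remainder[:leftover]:
        alloc[app] += 1
    return alloc


# Quiet day: equal weights give an even split of 12 nodes.
normal = allocate_nodes(12, {"wms": 1, "quiquake": 1, "hotspot": 1})

# Big earthquake: QuiQuake's weight rises and it takes most of the pool.
quake = allocate_nodes(12, {"wms": 1, "quiquake": 7, "hotspot": 1})
```

The same function covers the "increase in WMS requests" case by raising the `wms` weight instead.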

  23. 2nd step: Extend to distributed environments
     [Diagram labels: Terra/ASTER, ALOS/PALSAR, TDRS, OCC (AIST), JAXA, NASA, ERSDAC, UCSD, NCHC]
     Source: Dr. Yoshio Tanaka, AIST, Japan

  24. What are the Essential Steps?
     • AIST/Geogrid creates their VM image
     • Image made available in “centralized” storage
     • PRAGMA sites copy Geogrid images to local clouds
       • Assign IP addresses
       • What happens if image is in KVM and site is Xen?
     • Modified images are booted
     • Geogrid infrastructure now ready to use

  25. Basic Operation
     • VM image authored locally, uploaded into VM-image repository (Gfarm from U. Tsukuba)
     • At local sites:
       • Image copied from repository
       • Local copy modified (automatically) to run on the specific infrastructure
       • Local copy booted
     • For running in EC2, the methods automated in Rocks were adapted to modify, bundle, and upload the image after a local copy to UCSD
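
The copy, modify, boot sequence can be written down as a small planning function, including the open question of an image built for one hypervisor landing on a site that runs another. This is a sketch under stated assumptions: `Site`, `deployment_steps`, and the step wording are hypothetical, and the real work at each step is done by the site's own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    hypervisor: str  # e.g. "xen" or "kvm"


def deployment_steps(image: str, image_hypervisor: str, site: Site) -> list[str]:
    """List the steps needed to bring one repository image up at one site."""
    steps = [f"copy {image} from the repository to {site.name}"]
    if image_hypervisor != site.hypervisor:
        # The "image is in KVM and site is Xen" case: an extra fix-up step.
        steps.append(
            f"convert {image} from {image_hypervisor} to {site.hypervisor}"
        )
    steps.append(f"assign a local IP address and network settings to {image}")
    steps.append(f"boot {image} at {site.name}")
    return steps


# A Xen-authored image deployed to a KVM site needs the conversion step.
plan = deployment_steps("geogrid.img", "xen", Site("NCHC", "kvm"))
```

Keeping the plan as data makes it clear that only one step depends on the hypervisor mismatch; everything else is the same at every site.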

  26. VM Deployment Phase I - Manual
     http://goc.pragma-grid.net/mediawiki-1.16.2/index.php/Bloss%2BGeoGrid
       # rocks add host vm container=…
       # rocks set host interface subnet …
       # rocks set host interface ip …
       # rocks list host interface …
       # rocks list host vm … showdisks=yes
       # cd /state/partition1/xen/disks
       # wget http://www.apgrid.org/frontend...
       # gunzip geobloss.hda.gz
       # lomount -diskimage geobloss.hda -partition 1 /media
       # vi /media/boot/grub/grub.conf …
       # vi /media/etc/sysconfig/network-scripts/ifc… …
       # vi /media/etc/sysconfig/network …
       # vi /media/etc/resolv.conf …
       # vi /etc/hosts …
       # vi /etc/auto.home …
       # vi /media/root/.ssh/authorized_keys …
       # umount /media
       # rocks set host boot action=os …
       # rocks start host vm geobloss…
     [Diagram labels: Website; VM devel server; VM hosting server with frontend and vm-container-0-0, vm-container-0-1, vm-container-0-2, vm-container-…; the Geogrid + Bloss image shown at each stage]

  27. What we learned in the manual approach
     AIST, UCSD and NCHC met in Taiwan for 1.5 days to test in Feb 2011
     • Much faster than Grid deployment of the same infrastructure
     • It is not too difficult to modify a Xen image and run it under KVM
     • Nearly all of the steps could be automated
     • Need a better method than “put image on a website” for sharing

  28. Centralized VM Image Repository
     VM image repository and sharing
     [Diagram: Gfarm clients, a Gfarm meta-server, and several Gfarm file servers; vmdb.txt lists the images Geogrid + Bloss, Fmotif, Nyouga, and QuickQuake]

  29. Gfarm using Native tools

  30. VM Deployment Phase II - Automated
     http://goc.pragma-grid.net/mediawiki-1.16.2/index.php/VM_deployment_script
       $ vm-deploy quiquake vm-container-0-2
     vmdb.txt entries:
       quiquake,xen-kvm,AIST/quiquake.img.gz,…
       Fmotif,kvm,NCHC/fmotif.hda.gz,…
     [Diagram labels: VM development server and VM hosting server frontend, each with a Gfarm client; Gfarm Cloud holding vmdb.txt and the images Quiquake, Fmotif, Nyouga, Geogrid + Bloss; S = vm-deploy script; vm-container-0-0, vm-container-0-1, vm-container-0-2, vm-container-…]
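
The two `vmdb.txt` entries visible on this slide suggest a comma-separated catalogue that a script like `vm-deploy` could consult by image name. The parser below is a sketch only: the slide truncates each entry with "…", so the field meanings (name, hypervisor tag, path in Gfarm) and the reading of `xen-kvm` as "works on both" are assumptions, and any further fields are kept unparsed.

```python
from typing import Iterable, NamedTuple, Optional


class VmEntry(NamedTuple):
    name: str
    hypervisors: tuple[str, ...]  # assumption: "xen-kvm" means xen and kvm
    path: str                     # assumption: image path inside Gfarm
    extra: tuple[str, ...]        # fields elided on the slide, left unparsed


def parse_vmdb_line(line: str) -> VmEntry:
    """Parse one comma-separated catalogue entry."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got: {line!r}")
    name, hypervisor_tag, path, *extra = fields
    return VmEntry(name, tuple(hypervisor_tag.split("-")), path, tuple(extra))


def find_image(lines: Iterable[str], name: str) -> Optional[VmEntry]:
    """Return the first entry whose name matches, ignoring case."""
    for line in lines:
        if not line.strip():
            continue
        entry = parse_vmdb_line(line)
        if entry.name.lower() == name.lower():
            return entry
    return None


catalogue = [
    "quiquake, xen-kvm,AIST/quiquake.img.gz",
    "Fmotif,kvm,NCHC/fmotif.hda.gz",
]
quiquake = find_image(catalogue, "quiquake")
```

A lookup like this is what lets the deploy command take only an image name and a target container: the path and hypervisor details come from the catalogue rather than the command line.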

  31. Put all together
     • Store VM images in Gfarm systems
     • Run vm-deploy scripts at PRAGMA sites
     • Copy VM images on demand from Gfarm
     • Modify/start VM instances at PRAGMA sites
     • Manage jobs with Condor
     Sites shown (each with a Condor slave, a VM image copied from Gfarm, a deploy script, and a Gfarm client):
     • SDSC (USA): Rocks Xen
     • AIST (Japan): OpenNebula Xen, Condor master
     • NCHC (Taiwan): OpenNebula KVM
     • IU (USA): Rocks Xen
     • LZU (China): Rocks KVM
     • Osaka (Japan): Rocks Xen
     Images in the GFARM Grid File System (Japan):
     • AIST QuickQuake + Condor
     • AIST HotSpot + Condor
     • AIST Web Map Service + Condor
     • AIST Geogrid + Bloss
     • NCHC Fmotif
     • UCSD Autodock + Condor
     Legend: S = VM deploy script, gFC = Grid Farm Client, gFS = Grid Farm Server

  32. Moving more quickly with PRAGMA Cloud
     • PRAGMA 21 (Oct 2011)
       • 4 sites: AIST, NCHC, UCSD, and EC2 (North America)
     • SC’11 (Nov 2011)
       • New sites:
         • Osaka University
         • Lanzhou University
         • Indiana University
         • CNIC
         • EC2 (Asia Pacific)

  33. Condor Pool + EC2 Web Interface
     • 4 different private clusters
     • 1 EC2 data center
     • Controlled from Condor Manager at AIST, Japan

  34. Cloud Sites Integrated in Geogrid Execution Pool
     PRAGMA Compute Cloud:
     • China: JLU, CNIC, LZU
     • Japan: AIST, OsakaU
     • USA: IndianaU, SDSC
     • Taiwan: NCHC
     • India: UoHyd
     • Philippines: ASTI
     • Malaysia: MIMOS

  35. Roles of Each Site PRAGMA+Geogrid
     • AIST: application driver with natural distributed computing/people setup
     • NCHC: authoring of VMs in a familiar web environment. Significant diversity of VM infrastructure
     • UCSD: lower-level details of automating VM “fixup” and rebundling for EC2
     We are all founding members of PRAGMA

  36. NCHC WebOS/Cloud Authoring Portal
     Users start with a well-defined base image, then add their software

  37. Getting things working in EC2
     • Short background on Rocks Clusters
     • Mechanisms for using Rocks to create an EC2-compatible image
     • Adapting methodology to support non-Rocks-defined images

  38. Rocks – http://www.rocksclusters.org
     • Technology transfer of commodity clustering to application scientists
     • Rocks is a cluster/system configuration on a CD
       • Clustering software (PBS, SGE, Ganglia, Condor, …)
       • Highly programmatic software configuration management
       • Put CDs in raw hardware, drink coffee, have cluster
     • Extensible using “Rolls”
     • Large user community
       • Over 1 PFlop of known clusters
       • Active user/support list of 2000+ users
     • Active development
       • 2 software releases per year
       • Code development at SDSC
       • Other developers (UCSD, Univ. of Tromso, external Rolls)
     • Supports Redhat Linux, Scientific Linux, CentOS and Solaris
     • Can build real, virtual, and hybrid combinations (2 – 1000s)
     Rocks core development: NSF award #OCI-0721623

  39. Key Rocks Concepts
     • Define components of clusters as Logical Appliances (Compute, Web, Mgmt, Login, DB, PFS Metadata, PFS Data, …)
       • Share common configuration among appliances
       • Graph decomposition of the full cluster SW and config
       • Rolls are the building blocks: reusable components (Package + Config + Subgraph)
     • Use installer’s (Redhat Anaconda, Solaris Jumpstart) text format to describe an appliance configuration
       • Walk the Rocks graph to compile this definition
     • Heterogeneous hardware (real and virtual HW) with no additional effort

  40. A Mid-Sized Cluster Resource
     Includes: Computing, Database, Storage, Virtual Clusters, Login, Management Appliances
     Triton Resource (http://tritonresource.sdsc.edu)
     • Large Memory PSDAF
       • 256 GB & 512 GB nodes (32 core)
       • 8 TB total
       • 128 GB/sec
       • ~9 TF
     • Shared Resource Cluster
       • 16 GB/node
       • 4 - 8 TB total
       • 256 GB/sec
       • ~20 TF
     • Large Scale Storage
       • (Delivery by mid May)
       • 2 PB (384 TB today)
       • ~60 GB/sec (7 GB/s)
       • ~2600 (384 disks now)
     [Diagram labels: x256, x28, UCSD Research Labs, Campus Research Network]

  41. What’s in YOUR cluster?

  42. How Rocks Treats Virtual Hardware
     • It’s just another piece of HW
       • If RedHat supports it, so does Rocks
     • Allows mixture of real and virtual hardware in the same cluster
       • Because Rocks supports heterogeneous HW clusters
     • Re-use of all of the software configuration mechanics
       • E.g., a compute appliance is a compute appliance, regardless of “hardware”
     Virtual HW must meet minimum HW specs:
     • 1 GB memory
     • 36 GB disk space*
     • Private-network Ethernet
     • + Public network on frontend
     * Not strict: EC2 images are 10 GB

  43. Extended Condor Pool (Very Similar to AIST GeoGrid)
     [Diagram: a Rocks frontend (job submit, Condor collector, scheduler) on the cluster private network (e.g. 10.1.x.n) with local nodes Node 0, Node 1, … Node n and cloud nodes Cloud 0, Cloud 1, all running identical system images]
     Condor pool with both local and cloud resources

  44. Complete Recipe
     1. Kickstart guest VM with ec2_enable=true (Rocks frontend, local VM container)
     2. Bundle the “compiled” VM image as an S3 image
     3. Upload image to Amazon S3
     4. Register image as EC2 AMI
     5. Boot AMI as an Amazon instance
     Optional: test and rebuild of image
     [Diagram labels: Local Hardware (Rocks Frontend, VM Container, Guest VM, Disk Storage), Amazon EC2 Cloud]

  45. At the Command Line: provided by the Rocks EC2 Roll/Xen Rolls
     • rocks set host boot action=install compute-0-0
     • rocks set host attr compute-0-0 ec2_enable true
     • rocks start host vm compute-0-0
       • After reboot inspect, then shut down
     • rocks create ec2 bundle compute-0-0
     • rocks upload ec2 bundle compute-0-0 <s3bucket>
     • ec2-register <s3bucket>/image.manifest.xml
     • ec2-run-instances <ami>
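
Because the command sequence on this slide differs only in the host name, bucket, and AMI id, it can be generated rather than retyped. The sketch below only builds the command strings for review (a dry run) and executes nothing; `ec2_recipe` is a hypothetical helper, the bucket name in the usage line is made up, and the last command is written as `ec2-run-instances`, the name used by the EC2 API tools.

```python
def ec2_recipe(host: str, bucket: str, ami: str = "<ami>") -> list[str]:
    """Return the slide's command sequence for one guest VM, in order.

    Nothing is executed; the manual "inspect, then shut down" step after
    the VM starts is not represented here.
    """
    return [
        f"rocks set host boot action=install {host}",
        f"rocks set host attr {host} ec2_enable true",
        f"rocks start host vm {host}",
        f"rocks create ec2 bundle {host}",
        f"rocks upload ec2 bundle {host} {bucket}",
        f"ec2-register {bucket}/image.manifest.xml",
        f"ec2-run-instances {ami}",
    ]


# Dry run for the host used on the slide and a made-up bucket name.
commands = ec2_recipe("compute-0-0", "example-bucket")
for command in commands:
    print(command)
```

The AMI id is only known after registration, which is why it stays a placeholder until the `ec2-register` step has been run.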

  46. Modify to Support Non-Rocks Images for PRAGMA Experiment
     1. vm-deploy nyouga2 vm-container-0-20 (image copied from Gfarm via the Rocks frontend)
     2. Makeec2.sh <image file> (produces the “modified” VM image)
     3. Bundle as S3 image
     4. Upload image to Amazon S3
     5. Register image as EC2 AMI
     6. Boot AMI as an Amazon instance
     [Diagram labels: Gfarm, Local Hardware (Rocks Frontend, VM Container, Guest VM, Disk Storage), Amazon EC2 Cloud]

  47. Observations
     • This is much faster than our Grid deployments
     • Integration of private and commercial cloud is at proof-of-principle state
     • Haven’t scratched the surface of when one expands into an external cloud
     • Networking among instances in different clouds has pitfalls (firewalls, addressing, etc.)
     • Users can focus on the creation of their software stack

  48. Heterogeneous Clouds

  49. More Information Online

  50. Revisit Cloud Hype
     • “Others do some (not all) of the hard work for you”
     • “You still have to manage hardware”
     • “It’s sometimes (not always) more efficient to outsource”
     • “You can have a cluster in 8 clicks of the mouse, but it may not have your software”
     • “It’s infinitely scalable”
     • Location of data is important
     • Interoperability across cloud infrastructures is possible
     • …
