
Leveraging Standard Core Technologies to Programmatically Build Linux Cluster Appliances


Presentation Transcript


  1. Leveraging Standard Core Technologies to Programmatically Build Linux Cluster Appliances Mason Katz San Diego Supercomputer Center IEEE Cluster 2002

  2. Outline • Problem definition • What is so hard about clusters? • Distinction between • Software Packages (bits) • System Configuration (functionality and state) • Programmatic software installation with: • XML, SQL, HTTP, Kickstart • Future Work San Diego Supercomputer Center

  3. Build this cluster • Build a 128 node cluster • Known configuration • Consistent configuration • Repeatable configuration • Do this in an afternoon • Problems • How to install software? • How to configure software? • We manage clusters with (re)installation • So we care a lot about this problem • Other strategies still must solve this San Diego Supercomputer Center

  4. The Myth of the Homogeneous COTS Cluster • Hardware is not homogeneous • Different chipset revisions • Chipset of the day (e.g. Linksys Ethernet cards) • Different disk sizes (e.g. changing sector sizes) • Vendors do not know this is happening! • Entropy happens • Hardware components fail • Cannot replace with the same components past a single Moore cycle • A Cluster is not just compute nodes (appliances) • Fileserver Nodes • Management Nodes • Login Nodes San Diego Supercomputer Center

  5. What Heterogeneity Means • Hardware • Cannot blindly replicate machine software • AKA system imaging / disk cloning • Requires patching the system after cloning • Need to manage system software at a higher level • Software • Subsets of a cluster have unique software configuration • One “golden image” cannot build a cluster • Multiple images replicate common configuration • Need to manage system software at a higher level San Diego Supercomputer Center

  6. Description Based Software Installation

  7. Packages vs. Configuration [Diagram: RPMs, the collection of all possible software packages (AKA distribution), combine with a Kickstart file, the descriptive information to configure a node, to produce appliances: Compute Node, IO Server, Web Server] San Diego Supercomputer Center

  8. Software Packages [Same diagram as slide 7, here the RPMs side: the collection of all possible software packages (AKA distribution)] San Diego Supercomputer Center

  9. System Configuration [Same diagram as slide 7, here the Kickstart file side: the descriptive information used to configure a node] San Diego Supercomputer Center

  10. Setup & Packages (20%) cdrom zerombr yes bootloader --location mbr --useLilo skipx auth --useshadow --enablemd5 clearpart --all part /boot --size 128 part swap --size 128 part / --size 4096 part /export --size 1 --grow lang en_US langsupport --default en_US keyboard us mouse genericps/2 timezone --utc GMT rootpw --iscrypted nrDq4Vb42jjQ. text install reboot %packages @Base @Emacs @GNOME Post Configuration (80%) %post cat > /etc/nsswitch.conf << 'EOF' passwd: files shadow: files group: files hosts: files dns bootparams: files ethers: files EOF cat > /etc/ntp.conf << 'EOF' server ntp.ucsd.edu server 127.127.1.1 fudge 127.127.1.1 stratum 10 authenticate no driftfile /etc/ntp/drift EOF /bin/mkdir -p /etc/ntp cat > /etc/ntp/step-tickers << 'EOF' ntp.ucsd.edu EOF /usr/sbin/ntpdate ntp.ucsd.edu /sbin/hwclock --systohc What is a Kickstart File? San Diego Supercomputer Center

  11. Issues • High level description of software installation • List of packages (RPMs) • System configuration (network, disk, accounts, …) • Post installation scripts • De facto standard for Linux • Single ASCII file • Simple, clean, and portable • Installer can handle simple hardware differences • Monolithic • No macro language (as of RedHat 7.3 this is changing) • Differences require forking (and code replication) • Cut-and-Paste is not a code re-use model San Diego Supercomputer Center

  12. XML Kickstart

  13. It looks something like this San Diego Supercomputer Center

  14. Implementation • Nodes • Single purpose modules • Kickstart file snippets (XML tags map to kickstart commands) • Over 100 node files in Rocks • Graph • Defines interconnections for nodes • Think OOP or dependencies (class, #include) • A single default graph file in Rocks • Macros • SQL Database holds site and node specific state • Node files may contain <var name="state"/> tags San Diego Supercomputer Center
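
As a rough sketch of the macro mechanism, a node file can splice a database value into its post section with a <var/> tag. The variable name "ntp_server" below is invented for illustration; the real keys live in the cluster SQL database described later, and the generator substitutes their values when the kickstart file is built. The structure mirrors the sample node file on slide 19:

  <?xml version="1.0" standalone="no"?>
  <!DOCTYPE kickstart SYSTEM "@KICKSTART_DTD@">
  <kickstart>
  <description>
  Hypothetical node file: point the appliance at the site NTP server
  </description>
  <post>
  <!-- "ntp_server" is a made-up key; the tag is replaced with the value stored in the SQL database -->
  cat &gt; /etc/ntp/step-tickers &lt;&lt; 'EOF'
  <var name="ntp_server"/>
  EOF
  </post>
  </kickstart>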

  15. Composition • Aggregate Functionality • Scripting • IsA perl-development • IsA python-development • IsA tcl-development San Diego Supercomputer Center
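
Sketched in the edge syntax of the sample graph file shown later (slide 20), this aggregation might read:

  <!-- "scripting" IsA each of the language development environments -->
  <edge from="scripting" to="perl-development"/>
  <edge from="scripting" to="python-development"/>
  <edge from="scripting" to="tcl-development"/>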

  16. Functional Differences • Specify only the deltas • Desktop IsA • Standalone • Laptop IsA • Standalone • Pcmcia San Diego Supercomputer Center
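
In the same edge notation, only the deltas show up in the graph, something like:

  <!-- a laptop is a standalone machine plus the pcmcia delta -->
  <edge from="desktop" to="standalone"/>
  <edge from="laptop" to="standalone"/>
  <edge from="laptop" to="pcmcia"/>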

  17. Architecture Differences • Conditional inheritance • Annotate edges with target architectures • if i386 • Base IsA lilo • if ia64 • Base IsA elilo San Diego Supercomputer Center
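
Concretely, the two conditional edges from this slide are the ones that appear in the sample graph file on slide 20; the arch attribute restricts an edge to the listed target architecture:

  <edge from="base" to="lilo" arch="i386"/>
  <edge from="base" to="elilo" arch="ia64"/>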

  18. Putting it all together • “Complete” appliances (compute, NFS, frontend, desktop, …) • Some key shared configuration nodes (slave-node, node, base) San Diego Supercomputer Center

  19. Sample Node File

  <?xml version="1.0" standalone="no"?>
  <!DOCTYPE kickstart SYSTEM "@KICKSTART_DTD@"
   [<!ENTITY ssh "openssh">]>
  <kickstart>
  <description>
  Enable SSH
  </description>
  <package>&ssh;</package>
  <package>&ssh;-clients</package>
  <package>&ssh;-server</package>
  <package>&ssh;-askpass</package>
  <post>
  cat &gt; /etc/ssh/ssh_config &lt;&lt; 'EOF'
  <!-- default client setup -->
  Host *
          ForwardX11 yes
          ForwardAgent yes
  EOF
  chmod o+rx /root
  mkdir /root/.ssh
  chmod o+rx /root/.ssh
  </post>
  </kickstart>

  San Diego Supercomputer Center

  20. Sample Graph File

  <?xml version="1.0" standalone="no"?>
  <!DOCTYPE kickstart SYSTEM "@GRAPH_DTD@">
  <graph>
  <description>
  Default Graph for NPACI Rocks.
  </description>
  <edge from="base" to="scripting"/>
  <edge from="base" to="ssh"/>
  <edge from="base" to="ssl"/>
  <edge from="base" to="lilo" arch="i386"/>
  <edge from="base" to="elilo" arch="ia64"/>
  …
  <edge from="node" to="base" weight="80"/>
  <edge from="node" to="accounting"/>
  <edge from="slave-node" to="node"/>
  <edge from="slave-node" to="nis-client"/>
  <edge from="slave-node" to="autofs-client"/>
  <edge from="slave-node" to="dhcp-client"/>
  <edge from="slave-node" to="snmp-server"/>
  <edge from="slave-node" to="node-certs"/>
  <edge from="compute" to="slave-node"/>
  <edge from="compute" to="usher-server"/>
  <edge from="master-node" to="node"/>
  <edge from="master-node" to="x11"/>
  <edge from="master-node" to="usher-client"/>
  </graph>

  San Diego Supercomputer Center

  21. Cluster SQL Database

  22. Nodes and Groups Nodes Table Memberships Table San Diego Supercomputer Center

  23. Groups and Appliances Memberships Table Appliances Table San Diego Supercomputer Center
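
The slides show these tables only as screenshots. Purely as an illustrative sketch (the table layout, column names, and types below are assumptions, not the actual Rocks schema), the node → membership → appliance chain could be expressed roughly as:

  -- Hypothetical schema sketch, not the real Rocks database definition.
  CREATE TABLE appliances (
      id   INTEGER PRIMARY KEY,      -- compute, NFS server, frontend, ...
      name VARCHAR(32)               -- maps to a root node in the XML graph
  );

  CREATE TABLE memberships (         -- the "groups" that nodes belong to
      id        INTEGER PRIMARY KEY,
      name      VARCHAR(32),
      appliance INTEGER REFERENCES appliances(id)
  );

  CREATE TABLE nodes (               -- one row per physical machine
      id         INTEGER PRIMARY KEY,
      name       VARCHAR(64),
      membership INTEGER REFERENCES memberships(id),
      mac        VARCHAR(17),        -- used when answering DHCP
      ip         VARCHAR(15)
  );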

  24. Simple key-value pairs • Used to configure DHCP and to customize appliance kickstart files San Diego Supercomputer Center
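
A minimal sketch of such a table follows; the table name, keys, and values are invented (except ntp.ucsd.edu, which appears in the kickstart example on slide 10), and the real Rocks names may differ. A key like 'ntp_server' is the sort of thing a <var name="ntp_server"/> tag in a node file would expand to:

  -- Hypothetical key-value table feeding DHCP config and <var/> macros.
  CREATE TABLE site_attributes (
      name  VARCHAR(64) PRIMARY KEY,  -- e.g. 'ntp_server', 'public_gateway'
      value VARCHAR(128)
  );

  INSERT INTO site_attributes VALUES ('ntp_server', 'ntp.ucsd.edu');
  INSERT INTO site_attributes VALUES ('public_gateway', '198.51.100.1');  -- placeholder address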

  25. Putting it together

  26. Space-Time and HTTP [Timeline diagram between a node appliance and the frontends/servers: DHCP answer delivers an IP + kickstart URL → node issues the kickstart request → server generates the file (kpp, kgen) against the SQL DB → node requests packages → server serves packages → node installs packages → post configuration → reboot] • HTTP: • Kickstart URL (Generator) can be anywhere • Package Server can be (a different) anywhere San Diego Supercomputer Center
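
Because the transport is ordinary HTTP, the generated kickstart file can be fetched and inspected from any machine on the network. A hedged example; the frontend hostname, CGI path, and query argument here are placeholders, not necessarily the real Rocks URL:

  # Fetch the kickstart file the way an installing appliance would (placeholder URL).
  wget -O /tmp/ks.cfg "http://frontend.example.org/install/kickstart.cgi?client=compute-0-0"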

  27. Practice

  28. 256 Node Scaling • Attempt a TOP500 run on two fused 128-node PIII (1 GHz, 1 GB memory) clusters • 100 Mbit Ethernet, Gigabit to the frontend • Myrinet 2000, with a 128-port switch on each cluster • Questions • What LINPACK performance could we get? • Would Rocks scale to 256 nodes? • Could we set up, tear down, and run benchmarks in the allotted 48 hours? • SDSC’s TeraGrid Itanium 2 system is about this size San Diego Supercomputer Center

  29. Setup New Frontend • Fri night: Built the new frontend; physically rewired Myrinet and added an Ethernet switch. • Sat: Initial LINPACK runs and hardware-failure debugging; 240-node Myrinet run. • Sun: Submitted the 256-node Ethernet run, re-partitioned the clusters, complete re-installation (40 min) [Diagram: two 128-node clusters (120 on Myrinet each) joined by 8 Myrinet cross connects] San Diego Supercomputer Center

  30. Some Results • 240 dual PIII nodes (1 GHz, 1 GB) on Myrinet • 285 GFlops • 59.5% of peak • Over 22 hours of continuous computing San Diego Supercomputer Center

  31. Installation, Reboot, Performance • Under 15 minutes to reinstall a 32-node subcluster (rebuilt Myrinet driver) • 2.3 min for a 128-node reboot [Chart annotations: 32-node re-install start and finish, reboot, start of HPL] San Diego Supercomputer Center

  32. Future Work • Other backend targets • Solaris Jumpstart • Windows Installation • Supporting on-the-fly system patching • Cfengine approach • But using the XML graph for programmability • Traversal order • Subtleties with order of evaluation for XML nodes • Ordering requirements != Code reuse requirements • Dynamic cluster re-configuration • Node re-targets appliance type according to system need • Autonomous clusters? San Diego Supercomputer Center

  33. Summary • Installation/Customization is done in a straightforward programmatic way • Leverages existing standard technologies • Scaling is excellent • HTTP is used as a transport for reliability/performance • Configuration Server does not have to be in the cluster • Package Server does not have to be in the cluster • (Sounds grid-like) San Diego Supercomputer Center

  34. www.rocksclusters.org San Diego Supercomputer Center
