Grid Infrastructure Eddie.Aronovich@cs.tau.ac.il
What is it ? SERVERS Clients Eddie Aronovich – Operating System course (TAU CS, Jan 2009)
It's all about IT
Hardware utilization
SOA & Web services
• Decompose processing into services
• Each service works independently
• Main components:
  • Universal Description, Discovery and Integration (UDDI)
  • Simple Object Access Protocol (SOAP)
  • Web Services Description Language (WSDL)
• W3C standard
THE WORLD NEEDS ONLY FIVE COMPUTERS (Thomas J. Watson)
• Google grid
• Microsoft's live.com
• Yahoo!
• Amazon.com
• eBay
• Salesforce.com
Well, that's O(5) ;)
Greg Matter (http://blogs.sun.com/Gregp/entry/the_world_needs_only_five)
Scaling
Scale-up
• Add more resources within the system
• Does not require changes in the applications
• Limited extension
• Single point of failure
Scale-out
• Add more systems
• Architecture dependent (needs code changes)
• Economical
How to?
• Split the operation into groups
• Perform each group on a different machine
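The "split the operation into groups, perform each group on a different machine" recipe can be sketched in a few lines of Python. This is only an illustration: threads stand in for separate machines, and the function names (`split_into_groups`, `process_group`, `scale_out`) and the sum-of-squares workload are assumptions made for the example, not part of the original slides.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_groups(items, n_groups):
    """Split items into n_groups contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # The first `extra` groups receive one additional item.
        end = start + size + (1 if i < extra else 0)
        groups.append(items[start:end])
        start = end
    return groups


def process_group(group):
    """Hypothetical per-machine work: sum of squares of the group."""
    return sum(x * x for x in group)


def scale_out(items, n_workers):
    """Run each group on its own worker and combine the partial results."""
    groups = split_into_groups(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_group, groups))
    return sum(partials)


if __name__ == "__main__":
    print(scale_out(list(range(10)), 3))
```

The need to write `split_into_groups` and the final combining step is exactly the "needs code changes" cost that the scale-out bullet mentions.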
How fast can parallelization be?
Let:
• α be the proportion of the process that cannot be parallelized
• P be the number of processors
• S be the system speedup
Amdahl's law: S = 1 / (α + (1 - α) / P)
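A short numeric sketch of the formula above may help. The function name `amdahl_speedup` and the sample values of α and P are chosen only for illustration.

```python
def amdahl_speedup(alpha, p):
    """Speedup S = 1 / (alpha + (1 - alpha) / p) for serial fraction alpha on p processors."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (alpha + (1.0 - alpha) / p)


if __name__ == "__main__":
    # With 10% serial work the speedup approaches, but never reaches, 1 / alpha = 10.
    for p in (1, 10, 100, 1000):
        print(p, round(amdahl_speedup(0.1, p), 2))
```

As P grows, the (1 - α) / P term vanishes, so the speedup is bounded by 1 / α regardless of how many processors are added.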
Cluster types
High availability
• Active-Active
• Active-Passive
• Heartbeat
Load Balancing Cluster
• Round robin (weighted/non-weighted)
• System status aware (session, CPU load, etc.)
Compute cluster
• Queuing system (condor, hadoop, open-pbs, LSF, etc.)
• Single system image (ScaleMP, SSI, Mosix, nomad, etc.)
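The weighted round robin policy listed under load balancing can be sketched as follows. This is a deliberately naive version (each server simply appears as many times per cycle as its weight); the server names and weights are hypothetical, and production balancers usually interleave the picks more smoothly.

```python
from itertools import cycle, islice


def weighted_round_robin(servers):
    """Return an endless iterator over server names.

    `servers` is a list of (name, weight) pairs; a server with weight w
    is picked w times in every cycle.
    """
    schedule = [name for name, weight in servers for _ in range(weight)]
    if not schedule:
        raise ValueError("at least one server with a positive weight is required")
    return cycle(schedule)


if __name__ == "__main__":
    # Hypothetical servers: "big" gets twice as many requests as "small".
    picker = weighted_round_robin([("big", 2), ("small", 1)])
    print(list(islice(picker, 6)))
```

Setting every weight to 1 gives the non-weighted variant.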
Condor script
#################
# Sample script #
#################
Executable = /bin/hostname
when_to_transfer_output = ON_EXIT_OR_EVICT
Log = {file name}.log
# $(Process) is the index of each queued job, so every job gets its own files
Error = err.$(Process)
Output = out.$(Process)
Requirements = substr(Machine,0,4)=="dopp" && ARCH=="X86_64"
Arguments = +-u
notification = Complete
Universe = VANILLA
# Submit 10 instances of the job
Queue 10
From a single PC to a Grid
• Farm of PCs
• Enterprise grid: sharing (mutualization) of resources within a company
• Volunteer computing: CPU cycles made available by PC owners. Examples: Seti@home, Africa@home
• Grid infrastructure: Internet + disk and storage resources + services for information management (data collection, transfer and analysis). Example: EGEE
Batch to On-Line scale
Key Cloud Services Attributes
• Off-site, third-party provider
• Access via Internet
• Minimal/no IT skills required to “implement”
• Provisioning: self-service requesting; near real-time deployment; dynamic & fine-grained scaling
• Fine-grained usage-based pricing model
• UI: browser and successors
• Web services APIs as system interface
• Shared resources/common versions
Source: IDC, Sep 2008
What is “Grid”
What is Grid Computing?
There is no widely agreed definition.
Foster & Kesselman:
• Computing resources are not administered centrally.
• Open standards are used.
• Non-trivial quality of service is achieved.
Other definitions
• "the technology that enables resource virtualization, on-demand provisioning, and service (resource) sharing between organizations." (Plaszczak/Wellner)
• "a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements" (Buyya)
• "a service for sharing computer power and data storage capacity over the Internet." (CERN)
Virtual Organization
• What’s a VO?
  • People in different organisations seeking to cooperate and share resources across their organisational boundaries
• Why establish a Grid?
  • Share data
  • Pool computers
  • Collaborate
• The initial vision: “The Grid”
• The present reality: Many “grids”
• Each grid is an infrastructure enabling one or more “virtual organisations” to share computing resources
[Diagram] Institutes A, B, C, D, E, F; VO1, VO2
The Grid Metaphor
[Diagram] Grid middleware connecting: Mobile Access; Supercomputer, PC-Cluster; Workstation; Data-storage, Sensors, Experiments; Visualising; Internet, networks
Stand alone computer
Middleware components – The batch approach
[Diagram] Components: “User interface”, Resource Broker, Replica Catalogue, Information Service, Authorization & Authentication, Logging & Book-keeping, Storage Element, Computing Element
[Diagram] Flows: Input “sandbox”, Output “sandbox”, DataSets info, SE & CE info, Input “sandbox” + Broker Info, Job Submit Event, Job Query, Job Status, Publish
[Diagram] Workload management components: UI, RB node, Network Server, Workload Manager, Job Controller, Replica Location Server, Information Service (characteristics & status), Computing Element, Storage Element
Job status: submitted
UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs)
Job Description Language (JDL) to specify job characteristics and requirements
edg-job-submit myjob.jdl
myjob.jdl:
JobType = "Normal";
Executable = "$(CMS)/exe/sum.exe";
InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
OutputSandbox = {"sim.err", "test.out", "sim.log"};
Requirements = other.GlueHostOperatingSystemName == "linux" &&
               other.GlueHostOperatingSystemRelease == "Red Hat 7.3" &&
               other.GlueCEPolicyMaxCPUTime > 10000;
Rank = other.GlueCEStateFreeCPUs;
Job status: submitted
Job status: submitted, waiting
NS (Network Server): network daemon responsible for accepting incoming requests
[Diagram] Job, Input Sandbox files, RB storage
Job status: submitted, waiting
WM (Workload Manager): acts to satisfy the request
Job status: submitted, waiting
Match-Maker/Broker: Where must this job be executed?
Job status: submitted, waiting
Matchmaker: responsible for finding the “best” CE for a job
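The matchmaking step can be sketched as "filter by Requirements, then pick the highest Rank". The sketch below mirrors the attribute names used in the JDL example (GlueHostOperatingSystemName, GlueCEPolicyMaxCPUTime, GlueCEStateFreeCPUs); the CE host names, their values, and the `match` function are hypothetical and say nothing about how the real broker is implemented.

```python
# Hypothetical Computing Elements as published by an information service.
SAMPLE_CES = [
    {"name": "ce-a.example.org", "GlueHostOperatingSystemName": "linux",
     "GlueCEPolicyMaxCPUTime": 20000, "GlueCEStateFreeCPUs": 4},
    {"name": "ce-b.example.org", "GlueHostOperatingSystemName": "linux",
     "GlueCEPolicyMaxCPUTime": 5000, "GlueCEStateFreeCPUs": 50},
    {"name": "ce-c.example.org", "GlueHostOperatingSystemName": "linux",
     "GlueCEPolicyMaxCPUTime": 30000, "GlueCEStateFreeCPUs": 12},
]


def requirements(ce):
    """Python stand-in for the JDL Requirements expression."""
    return (ce["GlueHostOperatingSystemName"] == "linux"
            and ce["GlueCEPolicyMaxCPUTime"] > 10000)


def rank(ce):
    """Python stand-in for the JDL Rank expression: more free CPUs is better."""
    return ce["GlueCEStateFreeCPUs"]


def match(ces, requirements, rank):
    """Return the CE with the highest rank among those meeting the requirements."""
    candidates = [ce for ce in ces if requirements(ce)]
    if not candidates:
        return None  # no CE satisfies the job
    return max(candidates, key=rank)


if __name__ == "__main__":
    best = match(SAMPLE_CES, requirements, rank)
    print(best["name"] if best else "no matching CE")
```

Here ce-b has the most free CPUs but is filtered out because its CPU time limit is too low, so the requirements act as a hard filter and the rank only orders the survivors.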
Job status: submitted, waiting
Match-Maker/Broker queries:
• Replica Location Server: Where are the needed data (which SEs)?
• Information Service: What is the status of the Grid?
Job status: submitted, waiting
[Diagram] Match-Maker/Broker: CE choice
Job status: submitted, waiting
Job Adapter: responsible for the final “touches” to the job before performing submission (e.g. creation of wrapper script, PFN, etc.)
Job status: submitted, waiting, ready
Job Controller: responsible for the actual job management operations (done via CondorG)
Job status: submitted, waiting, ready, scheduled
[Diagram] The job is shown at the Computing Element
“Compute element” – reminder!
[Diagram] Job request; Grid gate node; Globus gatekeeper; gridmapfile; Logging; Info system (I.S.); Local resource management system: Condor / PBS / LSF master; Homogeneous set of worker nodes
Job status: submitted, waiting, ready, scheduled, running
[Diagram] Job, Input Sandbox files, “Grid enabled” data transfers/accesses, Storage Element, Computing Element
Job status: submitted, waiting, ready, scheduled, running, done
[Diagram] Output Sandbox files, Computing Element, RB storage
Job status: submitted, waiting, ready, scheduled, running, done
edg-job-get-output <dg-job-id>
Job status: submitted, waiting, ready, scheduled, running, done, cleared
[Diagram] Output Sandbox files, RB storage, UI
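The sequence of job statuses shown across these slides can be summarised as a simple linear state machine. This is a sketch of the happy path only: the names `JOB_STATES` and `next_state` are assumptions for illustration, and failure paths are left out.

```python
# Job statuses in the order they appear on the slides (successful run only).
JOB_STATES = ["submitted", "waiting", "ready", "scheduled",
              "running", "done", "cleared"]


def next_state(state):
    """Return the status that follows `state`, or raise if it is the last one."""
    index = JOB_STATES.index(state)  # raises ValueError for unknown statuses
    if index == len(JOB_STATES) - 1:
        raise ValueError("'cleared' is the final status")
    return JOB_STATES[index + 1]


if __name__ == "__main__":
    state = JOB_STATES[0]
    while state != "cleared":
        print(state, "->", next_state(state))
        state = next_state(state)
```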
Job monitoring
edg-job-status <dg-job-id>
edg-job-get-logging-info <dg-job-id>
• LB (Logging & Bookkeeping): receives and stores job events; processes corresponding job status
• LM (Log Monitor): parses the CondorG log file (where CondorG logs info about jobs) and notifies LB
[Diagram] UI, RB node, Network Server, Workload Manager, Job Contr. - CondorG, Log Monitor, Logging & Bookkeeping, Computing Element; labels: Job status, Log of job events
Approaches to Security: 1 The Poor Security House Grid Operation and Security by Eddie Aronovich, Mar 2008
Approaches to Security: 2 The Paranoid Security House
Approaches to Security: 3 The Realistic Security House
Mapping certificate to local user
• Sites use their local accounting system
• A pool of users is dedicated to the Grid
• Each user is mapped using a gridmap file or VOMS
• The mapping can implement local policy on external users
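A gridmap lookup can be sketched as below, assuming the common grid-mapfile layout of a quoted certificate subject (DN) followed by a local account name. The DNs, account names, and the helper `parse_gridmap` are made up for the example; VOMS-based mapping is not covered.

```python
import shlex

# Hypothetical gridmap content: "certificate subject" local_account
SAMPLE_GRIDMAP = '''
# comment lines and blank lines are ignored
"/C=IL/O=Example/CN=Alice Example" grid001
"/C=IL/O=Example/CN=Bob Example" grid002
'''


def parse_gridmap(text):
    """Return a dict mapping certificate subjects to local account names."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # handles the quoted subject with spaces
        if len(parts) != 2:
            raise ValueError("malformed gridmap line: " + line)
        subject, account = parts
        mapping[subject] = account
    return mapping


if __name__ == "__main__":
    users = parse_gridmap(SAMPLE_GRIDMAP)
    # An unknown subject maps to None, i.e. access is refused by local policy.
    print(users.get("/C=IL/O=Example/CN=Alice Example"))
    print(users.get("/C=IL/O=Example/CN=Mallory"))
```

Because the site owns this file, leaving a subject out of it is one simple way to apply local policy to external users.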
Certificate Request
• User generates a public/private key pair.
• User sends the public key to the CA along with proof of identity.
• CA confirms identity, signs the certificate and sends it back to the user.
• The private key is kept encrypted on the local disk.
[Diagram] Certificate Request, Public Key, Certificate Authority, Cert, ID
slide based on presentation given by Carl Kesselman at GGF Summer School 2004
Inside the Certificate
• Standard (X.509) defined format.
• User identification (e.g. full name).
• User's public key.
• A “signature” from a CA, created by encoding a unique string (a hash) generated from the user's identification, the user's public key and the name of the CA. The signature is encoded using the CA's private key. This has the effect of:
  • Proving that the certificate came from the CA.
  • Vouching for the user's identification.
  • Vouching for the binding of the user's public key to their identification.
[Diagram] Certificate fields: Name, Issuer: CA, Public Key, Signature
Mutual Authentication
• A sends their certificate to B.
• B verifies the CA signature in A's certificate.
• B sends A a challenge string (a random phrase).
• A encrypts the challenge string with A's private key.
• A sends the encrypted challenge to B.
• B uses A's public key to decrypt the challenge.
• B compares the decrypted string with the original challenge.
• If they match, B has verified A's identity and A cannot repudiate it.
Repeating the same exchange with the roles swapped lets A verify B, which makes the authentication mutual.
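The challenge and response steps can be illustrated with textbook RSA on tiny numbers. This is a toy sketch only: the key (n = 3233, e = 17, d = 2753) is a classroom example that offers no security, the function names are assumptions, and real implementations hash and pad the challenge and verify the certificate chain first.

```python
import secrets

# Toy RSA key for A (p = 61, q = 53). Never use numbers this small in practice.
N = 3233   # modulus, part of A's public key
E = 17     # public exponent, part of A's public key
D = 2753   # private exponent, known only to A


def make_challenge():
    """B picks a random challenge smaller than the modulus."""
    return secrets.randbelow(N - 2) + 2


def answer_challenge(challenge, d=D, n=N):
    """A transforms the challenge with the private key."""
    return pow(challenge, d, n)


def check_answer(challenge, answer, e=E, n=N):
    """B reverses the transformation with A's public key and compares."""
    return pow(answer, e, n) == challenge


if __name__ == "__main__":
    challenge = make_challenge()
    answer = answer_challenge(challenge)
    print("A authenticated:", check_answer(challenge, answer))
```

Only the holder of the private exponent can produce an answer that the public key maps back to the challenge, which is why a match convinces B.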
Proxy certificate
• Creating a proxy avoids re-entering the passphrase
• A proxy consists of a new certificate and a private key
• The proxy certificate contains the owner's identity (modified)
• The remote party receives the proxy's certificate (signed by the owner) and the owner's certificate
• The proxy certificate has a limited lifetime
• Chain of trust from the CA to the proxy through the owner
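The chain of trust and the limited lifetime can be sketched with plain data records. This sketch checks only issuer/subject links and expiry dates; it performs no cryptographic signature verification, and the `Cert` class, the subject strings and the `chain_is_valid` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Cert:
    subject: str         # who the certificate identifies
    issuer: str          # who signed it
    not_after: datetime  # end of validity


def chain_is_valid(chain, trusted_ca, now):
    """Check a chain ordered from the proxy up to the owner's certificate.

    Each certificate must be issued by the next one's subject, the last one
    must be issued by the trusted CA, and none may be expired.
    """
    if not chain:
        return False
    for cert, parent in zip(chain, chain[1:]):
        if cert.issuer != parent.subject:
            return False
    if chain[-1].issuer != trusted_ca:
        return False
    return all(now <= cert.not_after for cert in chain)


if __name__ == "__main__":
    now = datetime(2009, 1, 15, 12, 0)
    ca = "/O=Example/CN=Example CA"
    owner = Cert("/O=Example/CN=Alice", ca, datetime(2009, 12, 31))
    # The proxy carries a modified form of the owner's identity and lives 12 hours.
    proxy = Cert("/O=Example/CN=Alice/CN=proxy", owner.subject,
                 now + timedelta(hours=12))
    print(chain_is_valid([proxy, owner], ca, now))
    print(chain_is_valid([proxy, owner], ca, now + timedelta(hours=13)))
```

The second call fails because the proxy has expired even though the owner's certificate is still valid, which is the point of the short lifetime.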