
"Grid middleware is easy to install, configure, secure, debug and manage - across multiple sites"

"Grid middleware is easy to install, configure, secure, debug and manage - across multiple sites". "One can't believe impossible things" UK OGSA Evaluation Project (UCL, Imperial, Newcastle, Edinburgh) ( Full list of project members ) Paul Brebner University College London


Presentation Transcript


  1. "Grid middleware is easy to install, configure, secure, debug and manage - across multiple sites" "One can't believe impossible things" UK OGSA Evaluation Project (UCL, Imperial, Newcastle, Edinburgh) (Full list of project members) Paul Brebner University College London P.Brebner@cs.ucl.ac.uk

  2. Grid Complexity – The Grid will be BIG

  3. Grid Complexity - growing

  4. Grid Complexity – built on the internet

  5. Grid Complexity – but more complex

  6. Grid Simplicity – Start with something simple • OGSA • OGSI • GT3.2 – exemplar of a Grid SOA • Initially evaluate installation, configuration, and security • Then performance and scalability, deployment, architectural choices, etc.

  7. Grid Realism – But realistic test-bed • Heterogeneous platforms • Linux, Solaris, Windows • Cross-organisational • Four nodes • Independently administered • Firewalls and access restrictions • Security • UK e-Science CA

  8. Grid Confusion – What is Globus? • How is Globus intended to be used? • 1: Science as first-order services: Middleware for building and hosting Grid Applications, by exposing science code as Grid services. • 2: Middleware as services: As a set of high level Grid services, composed to provide new Grid functionality. Science isn’t a first-order service, but is managed by Grid services.

  9. Grid Confusion – Science services or Grid services [diagram labels: Client; 1; E = mc²]

  10. Grid Confusion – Science services or Grid services [diagram labels: Client; 1; D = A + 2B + C²; E = mc²]

  11. Grid Confusion – Science services or Grid services [diagram labels: Client; 2; 1; D = A + 2B + C²; E = mc²]

  12. Grid Confusion – How to evaluate • Do we evaluate GT3 as middleware for hosting Grid services, or as a toolkit for constructing Grid middleware? • If the first, only need GT3 Core – just the container. If the second, need “All Services” (and more – there’s no scheduler).

  13. Grid Simplicity – Incremental • Start with Core Package • Add Security • Then try “All Services” • Simple enough – in theory

  14. Grid Steps – single node GT3 Install Install OS/HW

  15. Grid Steps – single node Configure GT3 Install Install OS/HW

  16. Grid Steps – single node Deploy Configure GT3 Install Install OS/HW

  17. Grid Steps – single node Run Deploy Configure GT3 Install Install OS/HW

  18. Grid Steps – Multiple sites GT3

  19. Grid Steps – Multiple sites GT3 GT3 GT3 GT3

  20. Grid Steps – Multiple sites Interoperate GT3 GT3 GT3 GT3

  21. Grid Steps – Multiple sites Secure Interoperate GT3 GT3 GT3 GT3 GT3 GT3

  22. Grid Steps – Multiple sites Manage Secure Interoperate GT3 GT3 GT3 GT3 GT3 GT3

  23. Grid Reality – What we found • Port number management • Host access • Remote visibility of installation, container, services • Installation by System Administrators • Tomcat or Test container • Compilation issues on Solaris • Exponential increase in testing complexity as number of nodes increases.

  24. Grid Reality – What we found • Port number management • Port number conflicts (with other services) • What port is the container running on?

  25. Grid Reality – What we found • Host access • Is the container visible on that port externally? • From which machines? • For which users? • Non-trivial to test/debug if/when something goes wrong
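
The host-access questions on the two slides above ("Is the container visible on that port externally? From which machines?") can be partly answered with a plain TCP probe. The following is a minimal sketch, not part of the original evaluation: the function names are hypothetical, and any host names or port numbers passed to it are the caller's assumptions. It only shows that something accepts connections on the port; it says nothing about which users are authorised.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def probe(hosts, port, timeout=3.0):
    """Check the same port on several hosts; returns {host: reachable}."""
    return {host: port_is_reachable(host, port, timeout) for host in hosts}
```

A result is only valid for the machine that runs the probe, so the check has to be repeated from every client site whose firewall rules differ.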

  26. Grid Reality – What we found • Remote visibility of installation, container, services • What infrastructure is installed? • What packages and versions? • How is it configured? • What state is it in?

  27. Grid Reality – What we found • Installation by System Administrators • Division of roles • Didn’t meet expectations • Extra effort to support multiple roles • System Administrators – install, configure and secure • Globus Administrators – test, maintain • Globus Developers – develop, deploy, test/use Grid services

  28. Grid Reality – What we found • Tomcat or Test container • Differences in deployment, configuration, and management • With Tomcat, increased potential for centralised management, and sand-boxing of run-time environment

  29. Grid Reality – What we found • Compilation issues on Solaris • Took longer than expected • Only Linux testing and support can be taken for granted

  30. Grid Reality – What we found • Exponential increase in testing complexity as number of nodes increases • Testing (and maintaining) interoperability between m client machines, and n servers gets complicated. • How well will this scale for 100s, 1000s of nodes?
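
To make the size of the interoperability test matrix concrete, the sketch below enumerates every client, server, and security-mode combination. It is an illustration under assumptions: the function names and the two modes are hypothetical, and it counts pairwise checks only. Each further dimension (platform, container type, package version) multiplies the total again.

```python
from itertools import product


def interop_test_matrix(clients, servers, modes=("non-secure", "secure")):
    """List every (client, server, mode) combination that needs a test run."""
    return list(product(clients, servers, modes))


def matrix_size(m: int, n: int, modes: int = 2) -> int:
    """Number of pairwise checks for m clients, n servers, and `modes` modes."""
    return m * n * modes
```

With four nodes acting as both clients and servers, `matrix_size(4, 4)` gives 32 runs; with 100 of each it gives 20000.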

  31. Grid Reality – Security • In theory just had to • obtain (and update) host, client, and CA certificates • convert • install • configure • generate (and update) proxies. • However, parts of “All Services” package also needed.

  32. Grid Security - What we found • Interactions between security for multiple installations • Essential to test non-secure interoperability first • Windows client-side security • Testing and viewing security configuration • Debugging secure calls • Client side security is programmatic • Security management scalability • Construction and maintenance of user accounts and grid-map file entries.

  33. Grid Security - What we found • Interactions between security for multiple installations • For testing may want • multiple versions, or duplicates (with different configurations) of same versions. • One container with no security, and another container with security • May want test/production environments

  34. Grid Security - What we found • Essential to test non-secure interoperability first • Trying to test interoperability and security simultaneously wasn’t fun

  35. Grid Security - What we found • Windows client-side security • Still haven’t got it working • Not obvious exactly what parts of Globus are needed for client side code with security (no “client plus security” package).

  36. Grid Security - What we found • Testing and viewing security configuration • Need to be able to view/edit and check security configuration for containers and services • Confusion about hierarchical security settings • Virtual Organisations, clusters, servers, containers, factories, services, methods, and instances. • Remotely • Validate security deployment before run-time

  37. Grid Security - What we found • Debugging secure calls (or any stateful service) • Proxy interceptor approach (e.g. TCPMON) won’t work with stateful services • As grid handle returned to client contains the port number of the instance, not the proxy • But proxies are an important design pattern for SOAs… • GT4/WS-RF may be different • Handle resolvers, WS-Addressing and WS-RenewableReferences
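
The reason a proxy interceptor loses sight of stateful calls can be shown by comparing the address in a returned handle with the proxy's own address. This is a sketch with hypothetical names; the handle URLs in the usage note are invented for illustration and are not taken from a real GT3 installation.

```python
from urllib.parse import urlsplit


def bypasses_proxy(handle: str, proxy_host: str, proxy_port: int) -> bool:
    """True if calls made on this handle would not pass through the proxy.

    The client calls the factory through the proxy, but then talks to
    whatever host and port the returned handle names.
    """
    parts = urlsplit(handle)
    return (parts.hostname, parts.port) != (proxy_host, proxy_port)
```

For example, with a proxy listening on `localhost:8081`, a handle such as `http://server.example.org:8080/some/instance` bypasses it, so the interceptor sees the factory call but none of the later calls on the instance.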

  38. Grid Security - What we found • Client side security is programmatic • Client side code modifications required to call services/methods with required protocols • Should be declarative • Sensitive to server side security credentials

  39. Grid Security - What we found • Security management scalability • Construction and maintenance of user accounts and grid-map file entries. • For each server, each user needs an account, and an entry in the container gridmap file (mapping client certificate to account) • May also need service specific gridmap files • Not scalable for large numbers of users, servers, services. • Alternatives? • Tool support • Role based authentication • Shared accounts or certificates
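
A small script can at least make the grid-map bookkeeping visible. The sketch below assumes the usual grid-mapfile layout of a quoted certificate subject followed by a local account name, with `#` comment lines; the function names and the sample entry are hypothetical.

```python
import shlex


def parse_gridmap(text: str) -> dict:
    """Map certificate subject (DN) to local account for each valid line."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = shlex.split(line)  # honours the quotes around the DN
        if len(fields) < 2:
            continue  # skip malformed lines rather than guess
        mapping[fields[0]] = fields[1]
    return mapping


def entries_to_maintain(users: int, servers: int) -> int:
    """Each user needs one account and one grid-map entry per server."""
    return users * servers
```

Comparing the parsed mappings from each server shows which users are missing where. `entries_to_maintain(50, 4)` already gives 200 entries before any service-specific gridmap files are counted.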

  40. Grid Recommendations • If Globus is middleware, then need: • Platform independent, automatic, installation. • Tool support for configuration and deployment creation, validation, viewing and editing. • Management console for grid, nodes, Globus packages, containers and services. • Support for remote, location independent, cross-organisational, multiple role scenarios.

  41. Grid Recommendations (continued) • If Globus is middleware, then need: • Remote deployment and management of services. • Remote distributed debugging of grid installations, services, and applications. • Tool support, and more scalable processes for security.

  42. Grid Alternatives • Next we plan to evaluate the two architectural choices in more detail • Science exposed as services, vs science code managed by higher level grid services. • Explore alternative mechanisms for: • Load balancing and resource management • Directory services (service and resource discovery) • Data movement approaches (e.g. SOAP Attachments vs GridFTP)

  43. Grid Performance • First approach (initial results) • Scientific benchmark (SciMark2.0) modified to measure throughput, and invoked as a Stateful Grid Service • Metric is Calls Per Minute (CPM) – one unit of work. • No data movement, just computation and memory load. • JVM: 512MB Heap and -server (of course) • Good performance and scalability • Security has minimal overhead • Problem with client side timeouts as response times increase
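
The Calls Per Minute metric can be measured from a single sequential client as sketched below. This is not the project's benchmark harness: `calls_per_minute` and its parameters are hypothetical, and `call` stands in for one blocking invocation of the service. The clock is injectable so the measurement logic can be checked without waiting.

```python
import time


def calls_per_minute(call, duration_s: float = 60.0, clock=time.monotonic) -> float:
    """Invoke `call` repeatedly for about duration_s seconds; return CPM.

    The last call is allowed to finish, so the rate is computed from the
    elapsed time actually measured rather than from duration_s.
    """
    start = clock()
    calls = 0
    while clock() - start < duration_s:
        call()
        calls += 1
    elapsed = clock() - start
    return calls * 60.0 / elapsed
```

Because each call blocks until the response arrives, CPM from one client is bounded by 60 divided by the response time in seconds. Throughput for a loaded server would need several such clients running concurrently.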

  44. Grid Performance Tomcat Fastest: 3.6s (Edinburgh) Slowest: 25s (UCL)

  45. Grid Performance 95% of predicted maximum throughput

  46. Grid Performance • Tomcat vs Test container • No difference on 3 out of 4 nodes • But 67% faster on one node (Newcastle, slowest Intel box) • Attachments will work with GT3 and Tomcat • But not with security • Limit of 1GB (DIME) • Bug in Axis – doesn’t clean up temporary files.

  47. Grid Performance • Stateful instances can be problematic • Intermittent unreliability • On some runs, 1 exception in 300 calls (reliability of 0.9967) • But non-repeatable, SOAP/network related? • What is the safe response to exceptions? Can’t just retry. • Possible to kill container (relies on clients being well behaved): • By invoking same instance/method more than once. • By consuming container resources • But instances can be passivated/activated in theory • Could be used to enable fine-grain (per instance) control over resource usage.
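
The reliability figure quoted above follows directly from the failure count, and the same arithmetic shows how quickly rare failures add up over a long run. The sketch assumes failures are independent between calls, which the slide does not claim; the function names are hypothetical.

```python
def reliability(failures: int, calls: int) -> float:
    """Fraction of calls that completed without an exception."""
    return 1 - failures / calls


def p_at_least_one_failure(per_call_reliability: float, calls: int) -> float:
    """Chance that a run of `calls` calls sees at least one failure,
    assuming each call fails independently of the others."""
    return 1 - per_call_reliability ** calls
```

`reliability(1, 300)` rounds to 0.9967. Under the independence assumption, a run of 300 calls at that rate has roughly a 63% chance of hitting at least one exception, which is why the lack of a safe retry for stateful calls matters.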

  48. Grid Deployment • How to install and configure Grid infrastructure and services - scalably and securely? • Install GT3 infrastructure and security manually • MMJFS allows executable code to be staged automatically (But not services - could provide a deployment service). • Install bootstrapping code, and then install and deploy all other code and security automatically. • Using SmartFrog (HP) in the lab, and then test-bed. • Configuring GT3 security remotely is an open issue, as is “trust” with System Administrators.

  49. Grid Dreams - Debugging • Debugging distributed systems is tricky • Need better support for cross-cutting non-functional concerns such as deployment and debugging. • (One) problem with debugging services is not knowing the context of errors (to aid diagnosis or cure) – a service is just an interface. • Deployment aware debugging: • Starting from functional work-flows, generate deployment-flows, which are executed prior to, or concurrent with, functional work-flows. • If failure in functional work-flow, then corresponding deployment-flow is examined to determine likely causes, and parts are re-executed.

  50. Grid Dreams - Debugging • Backtrack through deployment steps (like peeling an onion) • Some steps will need to be reversed • Track dependencies, and redundant operations. • This approach may fix an (interesting) sub-class of problems: • Those which can be fixed by simply redoing (or replicating) (part of) the installation, e.g. • Intermittent failure of container or services • Resource starvation or overload • Security problems that can be fixed with reconfiguration or refresh of certificates/proxies. • But not: • network, or all configuration and security/access problems.
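
The onion-peeling idea from the two debugging slides can be sketched as a loop that redoes deployment steps from the most recent one backwards until a functional check passes. This is a toy model under assumptions, not the proposed system: `Step`, `backtrack_and_repair`, and the `works` check are hypothetical names, every step is assumed safe to re-run, and it does not reverse steps or track dependencies as the slides suggest a real tool would.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    redo: Callable[[], None]  # re-executes this deployment step


def backtrack_and_repair(steps: List[Step], works: Callable[[], bool]) -> List[str]:
    """Peel back one deployment layer at a time until works() passes.

    At each layer, that step and all later steps are re-executed in their
    original order. Returns the names of the layers peeled, newest first.
    """
    peeled: List[str] = []
    if works():
        return peeled
    for i in range(len(steps) - 1, -1, -1):
        for step in steps[i:]:
            step.redo()
        peeled.append(steps[i].name)
        if works():
            return peeled
    raise RuntimeError("redoing the deployment did not fix the failure")
```

An expired proxy would be fixed after peeling only the last layer, while a broken container configuration forces the loop further back. A failure that survives a full redo, such as a network or access problem, ends in the `RuntimeError`, matching the "But not" cases on the slide.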
