
Information Systems describing resources

Information Systems describing resources. Grid Middleware 4 David Groep, lecture series 2005-2006. Outline. Taxonomy of information systems hierarchies and republishers Grid Monitoring Architecture push and pull, subscriptions Performance of an IS collecting information sensors


Presentation Transcript


  1. Information Systems describing resources. Grid Middleware 4. David Groep, lecture series 2005-2006

  2. Grid Middleware IV 2 Outline • Taxonomy of information systems • hierarchies and republishers • Grid Monitoring Architecture • push and pull, subscriptions • Performance of an IS • collecting information • sensors • IS content: schemas and approaches

  3. Grid Information Systems Concerns data • shared between administrative domains • for use by multiple people or VOs So it does not include things like • cluster temperature monitoring • debugging streams • accounting history

  4. Classification of information systems • Which types of monitoring system are suitable for the grid? • Paper: • http://www.cs.man.ac.uk/~zanikols/fgcs05.pdf Different types are: • Level 0 • self-contained, not accessible by programs (only via e.g. the web) • Level 1 • events are accessible remotely at the single-producer level • Level 2 • includes republishers with fixed functionality • Level 3 • supports hierarchies of republishers
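
As a rough illustration of the four levels, the sketch below derives a level from three yes/no capabilities. The function names and capability flags are hypothetical and chosen only for this example; the paper defines the levels in terms of system components.

```python
from enum import IntEnum


class Level(IntEnum):
    SELF_CONTAINED = 0      # no programmatic access (e.g. web pages only)
    REMOTE_PRODUCERS = 1    # events readable remotely from a single producer
    FIXED_REPUBLISHERS = 2  # adds republishers with fixed functionality
    HIERARCHIES = 3         # republishers can be arranged in hierarchies


def classify(remote_access: bool, republishers: bool, hierarchies: bool) -> Level:
    """Map hypothetical capability flags to a taxonomy level."""
    if not remote_access:
        return Level.SELF_CONTAINED
    if not republishers:
        return Level.REMOTE_PRODUCERS
    if not hierarchies:
        return Level.FIXED_REPUBLISHERS
    return Level.HIERARCHIES


def suitable_for_grid(level: Level) -> bool:
    # The lecture treats level 2 and level 3 systems as suitable for grid use.
    return level >= Level.FIXED_REPUBLISHERS
```

For instance, a system with remote access and republishers but no hierarchies is classified as level 2 and counts as suitable.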

  5. System taxonomy: levels of systems • Components used in information systems • and taxonomy levels (graphics and concept from S. Zanikolas et al., FGCS 21 (2005) 163-188)

  6. Information system classes: level 2 or 3 systems are suitable. Reference architecture: GMA • Grid Monitoring Architecture requirements • (performance) information with relatively short lifetime • frequent updates • (should) carry quality-of-information status as well • but: when you get down to it, almost anything fits in this architecture • including directories with • relatively static information • suitable mainly for resource state

  7. Grid Monitoring Architecture • Definition of terms and roles (GWD-GP-16-2) Functions: • Registry (directory) • Add, Update, Remove, Search • Producer • Maintain Registration, Accept Query, Accept (Un)subscribe, Locate Consumer, Notify, Initiate (Un)subscribe • Consumer • Locate Producer, Initiate Query, Initiate (Un)subscribe, Maintain Registration, Accept Notification, Accept (Un)subscribe, Locate Event Schema
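
To make the three roles concrete, here is a minimal in-process sketch. It covers only a subset of the functions listed above (add, remove, search; accept query, accept (un)subscribe, notify; locate producer, initiate query, initiate subscribe, accept notification). All class and method names are assumptions made for this example, not an API defined by the GMA document.

```python
from typing import Callable, Dict, List, Optional

Event = dict
Callback = Callable[[Event], None]


class Registry:
    """Directory of producers, keyed by event type."""

    def __init__(self) -> None:
        self._producers: Dict[str, "Producer"] = {}

    def add(self, event_type: str, producer: "Producer") -> None:
        self._producers[event_type] = producer

    def remove(self, event_type: str) -> None:
        self._producers.pop(event_type, None)

    def search(self, event_type: str) -> Optional["Producer"]:
        return self._producers.get(event_type)


class Producer:
    def __init__(self, event_type: str, registry: Registry) -> None:
        self.event_type = event_type
        self._latest: Optional[Event] = None
        self._subscribers: List[Callback] = []
        registry.add(event_type, self)  # maintain registration

    def accept_query(self) -> Optional[Event]:
        return self._latest

    def accept_subscribe(self, callback: Callback) -> None:
        self._subscribers.append(callback)

    def accept_unsubscribe(self, callback: Callback) -> None:
        self._subscribers.remove(callback)

    def publish(self, event: Event) -> None:
        self._latest = event
        for callback in list(self._subscribers):
            callback(event)  # notify every subscribed consumer


class Consumer:
    def __init__(self, registry: Registry) -> None:
        self._registry = registry
        self.received: List[Event] = []

    def accept_notification(self, event: Event) -> None:
        self.received.append(event)

    def initiate_query(self, event_type: str) -> Optional[Event]:
        producer = self._registry.search(event_type)  # locate producer
        return producer.accept_query() if producer else None

    def initiate_subscribe(self, event_type: str) -> bool:
        producer = self._registry.search(event_type)  # locate producer
        if producer is None:
            return False
        producer.accept_subscribe(self.accept_notification)
        return True
```

The registry is only used to locate the producer; queries and notifications then flow directly between consumer and producer.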

  8. GMA: Intermediaries Also referred to as ‘republishers’; they make the system a level-3 system. Examples • Latest Producer • return the ‘last’ value of an event • Archiver (history producer) • storage of historical monitoring data • e.g. accounting records
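
The two example republishers differ only in what they retain. A minimal sketch, with hypothetical class and method names, could look like this:

```python
from typing import Dict, List, Optional


class LatestProducer:
    """Republisher that keeps only the most recent event per source."""

    def __init__(self) -> None:
        self._last: Dict[str, dict] = {}

    def accept_notification(self, source: str, event: dict) -> None:
        self._last[source] = event  # overwrite any older value

    def accept_query(self, source: str) -> Optional[dict]:
        return self._last.get(source)


class Archiver:
    """Republisher (history producer) that stores every event per source."""

    def __init__(self) -> None:
        self._history: Dict[str, List[dict]] = {}

    def accept_notification(self, source: str, event: dict) -> None:
        self._history.setdefault(source, []).append(event)

    def accept_query(self, source: str) -> List[dict]:
        # Return a copy so callers cannot modify the stored history.
        return list(self._history.get(source, []))
```

Both act as a consumer towards the original source and as a producer towards their own clients.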

  9. Directories • Information providers ‘publish’ information to a directory • Directories may be linked in networked hierarchies • Information is usually also in a DIT-like structure (Directory Information Tree) • Typical implementation: LDAP

  10. Approaches to sending information Orthogonal to the topology is the information flow model • Push model • information gets published regardless of its use • bet it’s there (in higher-level aggregators) when it’s needed • e.g. Condor Hawkeye, LCG BDII • Hybrid • information location gets published • consumers can subscribe to information and from then on continuously get it • e.g. R-GMA, (MDS4?) • Pull model • information is retrieved on-demand, and you cannot subscribe • e.g. MDS-2
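
The practical difference between push and pull is where a query is answered from. The sketch below contrasts the two with a toy sensor; all names are hypothetical and the classes do not correspond to any of the systems named above.

```python
class Sensor:
    """Toy information source that counts how often it is read."""

    def __init__(self) -> None:
        self.value = 0
        self.reads = 0

    def read(self) -> int:
        self.reads += 1
        return self.value


class PullAggregator:
    """Pull model: every query goes to the source on demand."""

    def __init__(self, sensor: Sensor) -> None:
        self._sensor = sensor

    def get(self) -> int:
        return self._sensor.read()


class PushAggregator:
    """Push model: the source publishes regardless of use; queries read the copy."""

    def __init__(self) -> None:
        self._copy = None

    def publish(self, value: int) -> None:
        self._copy = value

    def get(self):
        return self._copy
```

With push, a query never touches the source but may return a stale value; with pull, the answer is fresh but every query costs a read at the source.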

  11. Information Systems Examples shown in this lecture • Monitoring and Discovery Service (MDS) • Relational Grid Monitoring Architecture (R-GMA) • Hawkeye • Berkeley Database Information Index (BDII)

  12. 1 – MDS2 • Part of GT2.x • Typical use: resource selection by brokers • Architecture • decentralized • hierarchical • soft-state protocols with timeouts • supports caching in index servers • Security: GSI (optional)

  13. MDS2 Architecture graphic: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  14. MDS2 information flow • Soft-state registration of GRISes with GIISes • time out on the registration (TTL and nextUpdate) • Data retrieved on-demand from underlying GRIS • timeout on the answer • resources silently drop out if they fail • GRISes collect information using scripts • GIISes can be collated in arbitrary hierarchies
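
Soft-state registration means an entry survives only as long as it keeps being renewed. The following sketch shows the idea with a TTL per registration; the class name, method names, and the injectable clock are assumptions for this example and do not reproduce the MDS2 protocol.

```python
import time
from typing import Callable, Dict, List


class SoftStateIndex:
    """Index in which registrations expire unless they are renewed in time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires: Dict[str, float] = {}

    def register(self, name: str, ttl: float) -> None:
        # Registering again simply pushes the expiry time forward.
        self._expires[name] = self._clock() + ttl

    def live(self) -> List[str]:
        now = self._clock()
        # Entries whose TTL has passed drop out silently.
        self._expires = {n: t for n, t in self._expires.items() if t > now}
        return sorted(self._expires)
```

A resource that fails stops renewing its registration and disappears from `live()` once its TTL has passed, with no explicit deregistration step.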

  15. 2 – R-GMA • ‘straight’ implementation of the GMA • uses a relational representation of the data • notification/subscription directly from the source • implementation in Java • developed in EU DataGrid and EGEE JRA1 • UK cluster, Steve Fisher (RAL), et al.

  16. R-GMA Architecture

  17. MON Box • Every site has a MON box to proxy information • local cache of info in memory • through-channel to systems behind a firewall • producers/consumers connect actively to the MON box • Multiple producers can publish in the same table • joins can be done, but only via a secondary producer • Usually deployed with a single registry

  18. R-GMA plain SQL interface
      bosui:davidg:1001$ rgma
      Welcome to the R-GMA virtual database for Virtual Organisations.
      ================================================================
      Your local R-GMA server is:
      https://eg.nikhef.nl:8443/R-GMA
      You are connected to the following R-GMA Registry services:
      https://lcgic01.gridpp.rl.ac.uk:8443/R-GMA/RegistryServlet
      You are connected to the following R-GMA Schema service:
      https://lcgic01.gridpp.rl.ac.uk:8443/R-GMA/SchemaServlet
      Type "help" for a list of commands.
      rgma> show tables
      +------------------------------------------+
      | Table Name                               |
      +------------------------------------------+
      | ArchiverTestTable                        |
      | ...                                      |
      | GlueCE                                   |
      | ...                                      |
      +------------------------------------------+

  19. Queries
      rgma> select UniqueID,Name,TotalCPUs from GlueCE WHERE UniqueID LIKE '%ulakbim%';
      +--------------------------------------------------+---------+-----------+
      | UniqueID                                         | Name    | TotalCPUs |
      +--------------------------------------------------+---------+-----------+
      | ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-seegrid | seegrid | 126       |
      | ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-trgrida | trgrida | 126       |
      | ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-lhcb    | lhcb    | 126       |
      ...
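
The query above is ordinary SQL, so its behaviour can be tried locally. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the R-GMA virtual database; this is an assumption for illustration only, since R-GMA is not backed by SQLite. The three `ulakbim` rows are taken from the output above, and the fourth row is made up to show the filter at work.

```python
import sqlite3

# In-memory stand-in for the GlueCE table (only the three queried columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GlueCE (UniqueID TEXT, Name TEXT, TotalCPUs INTEGER)")
conn.executemany(
    "INSERT INTO GlueCE VALUES (?, ?, ?)",
    [
        ("ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-seegrid", "seegrid", 126),
        ("ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-trgrida", "trgrida", 126),
        ("ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-lhcb", "lhcb", 126),
        ("ce.example.org:2119/jobmanager-pbs-test", "test", 8),  # made-up row
    ],
)

# Same selection as in the rgma session, with the pattern passed as a parameter.
result = conn.execute(
    "SELECT UniqueID, Name, TotalCPUs FROM GlueCE WHERE UniqueID LIKE ?",
    ("%ulakbim%",),
).fetchall()

for unique_id, name, total_cpus in result:
    print(unique_id, name, total_cpus)
```

Only the three rows whose `UniqueID` contains `ulakbim` are returned; the made-up row is filtered out.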

  20. 3 – Hawkeye • Condor information system • publishes class-ads for • matchmaking • fault detection • periodic updates to the agents by the modules • information kept in the agents

  21. Hawkeye architecture graphic: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  22. 4 – BDII & GIP • BDII conceptually similar to Hawkeye • but data is pulled rather than pushed • mentioned here because of its widespread deployment in EGEE/LCG, OSG, &c • Generic Information Providers (GIP) • scripting framework to produce LDIF • static values overridden by output from scripts • periodically, LDAP queries sent to subordinate directories • with time-out on the answer • previous answer is persistent for a defined amount of time • contrary to MDS2, BDII will never forget Paper: http://indico.cern.ch/materialDisplay.py?contribId=126&sessionId=23&materialId=paper&confId=0
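
The GIP idea of static values overridden by script output can be sketched as a dictionary merge followed by LDIF serialisation. The function names, the example DN, and the attribute values below are illustrative assumptions, and the LDIF writer is deliberately simplified (single-valued attributes, no base64 encoding or line folding).

```python
from typing import Dict


def merge_attributes(static: Dict[str, object], dynamic: Dict[str, object]) -> Dict[str, object]:
    """Start from the static values and let script output override them."""
    merged = dict(static)
    merged.update(dynamic)
    return merged


def to_ldif(dn: str, attributes: Dict[str, object]) -> str:
    """Serialise one entry as simplified LDIF (attributes in sorted order)."""
    lines = [f"dn: {dn}"]
    for name in sorted(attributes):
        lines.append(f"{name}: {attributes[name]}")
    return "\n".join(lines) + "\n"


# Hypothetical static configuration and hypothetical output of a dynamic script.
static_values = {"GlueCEInfoTotalCPUs": 126, "GlueCEStateFreeCPUs": 0}
script_output = {"GlueCEStateFreeCPUs": 17}

entry = to_ldif(
    "GlueCEUniqueID=ce.example.org,o=grid",
    merge_attributes(static_values, script_output),
)
print(entry)
```

If the dynamic script produces nothing, the static values are published unchanged, so the entry never disappears from the directory.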

  23. BDII organisation (figure; component labels: RB, BDII, Site BDII)

  24. BDII scaling • OpenLDAP update (write) is not optimized • with SleepyCat Berkeley DB, simultaneous reads and writes lead to timeouts • So, put in a forwarder service that redirects to a pool of OpenLDAP/DB backends that swap roles
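
One way to read "a pool of backends that swap roles" is that queries are served by one backend while fresh data is loaded into another, after which the two exchange roles. The sketch below illustrates that reading; the class names and the swap policy are assumptions, not a description of the actual BDII code.

```python
from typing import Dict, List, Optional


class Backend:
    """Stand-in for one OpenLDAP/DB instance."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, object] = {}


class Forwarder:
    """Sends reads to the active backend and writes to a standby, then swaps."""

    def __init__(self, backends: List[Backend]) -> None:
        if len(backends) < 2:
            raise ValueError("need at least two backends to swap roles")
        self._backends = list(backends)
        self._active = 0

    @property
    def active_name(self) -> str:
        return self._backends[self._active].name

    def query(self, key: str) -> Optional[object]:
        return self._backends[self._active].data.get(key)

    def refresh(self, snapshot: Dict[str, object]) -> None:
        standby = (self._active + 1) % len(self._backends)
        # Load the new data into a backend that is not serving queries.
        self._backends[standby].data = dict(snapshot)
        # Swap roles: the freshly loaded backend now answers queries.
        self._active = standby
```

Because a backend is never read and written at the same time, the simultaneous read/write situation that caused the timeouts does not arise in this arrangement.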

  25. WS style information systems • MDS4 • based on WS-RF, WS-Notification mechanisms • provides a common aggregator framework for • index service (republisher) • trigger service (send events, mails, execute programs) • archive service • NAREGI Distributed Information Service • Aggregators collect information from various sources • put these as CIM objects in a database • OGSA-DAI front-end to the database with CIM objects • PS: OGSA-DAI (Data Access & Integration) is a system for providing uniform grid access to database resources

  26. MDS4 Aggregator Framework

  27. NAREGI Distributed Information Service graphic: Satoshi Matsuoka, Tokyo Institute of Technology & NII, NAREGI

  28. Status • Both developed and available • neither has yet been tested at very large scale • i.e. O(1000) resources, thousands of simultaneous queries

  29. Hierarchies and Views

  30. Views on the information system • For resource information • information view on those resources to which the viewer potentially has access • a single global root is neither feasible nor needed • a per-VO or per-infrastructure view is sufficient • For ‘application level’ monitoring • fine-grained access control needed • at the VO or user level • attributes in the schema may have different privacy levels • requires view management like in regular databases
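
Both kinds of view can be thought of as filters: one over rows (which resources a VO may see) and one over columns (which attributes a viewer is cleared for). The sketch below is a toy version under that reading; the privacy levels, field names, and function names are all hypothetical.

```python
from typing import Dict, List

PUBLIC, VO_MEMBERS, OWNER = 0, 1, 2  # hypothetical privacy levels, low to high


def vo_view(resources: List[dict], vo: str) -> List[dict]:
    """Row filter: keep only the resources that the given VO may use."""
    return [r for r in resources if vo in r["vos"]]


def project(record: dict, privacy: Dict[str, int], clearance: int) -> dict:
    """Column filter: keep attributes whose privacy level is within clearance.

    Attributes without a declared level are treated as OWNER-only.
    """
    return {k: v for k, v in record.items() if privacy.get(k, OWNER) <= clearance}
```

Defaulting undeclared attributes to the most restrictive level is a design choice made here so that newly added schema attributes are not exposed by accident.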

  31. Typical hierarchical top levels today • per-infrastructure • e.g. EGEE/LCG, OSG, NAREGI • used by many VOs • needs support at the infrastructure level • per-VO view • prevalent in ‘grass-roots’ deployment • all systems can support both • although not all in the same way: R-GMA works with per-site MON boxes that (today) use a central registry, so there is one per infrastructure

  32. Performance: an example of a grid performance study

  33. Performance analysis • Best paper so far: X. Zhang, J. Freschl, J. Schopf, A performance study of monitoring and information services for distributed systems, in: Proceedings of the 12th IEEE High Performance Distributed Computing (HPDC-12 2003), IEEE Computer Society Press, Seattle, WA, USA, 2003, pp. 270–282. • Performance results on R-GMA are outdated, but the basics still hold • MDS2 has since been replaced with MDS4 (in GT4) • The three systems selected are indicative of the different classes, and thus it’s a very valuable comparison! Data in the next slides by Jennifer Schopf from the GridForum NL/ISOC NL Masterclass 2005

  34. Roles of components in the comparison ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  35. Performance analysis • Three ‘characteristic’ systems • MDS2 (pull system, with and without caching) • R-GMA (hybrid, straight GMA implementation w/Relational IF) • Hawkeye (push system, from Condor) • Tests done on a small test bed (~7 systems) • scaling has not been tested • but results are at least comparable ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  36. Performance analysis: other facts • Keep in mind that MDS2 & Hawkeye are programmed in C • R-GMA is in Java • This R-GMA version relied heavily on threads • i.e. implementation was straight translation of architecture • JVM and Linux kernel 2.4 don’t cope well with many, O(500), threads…

  37. Model for evaluation • paper attempts to compare similar properties in the three systems • deploy in a standard mode (as depicted) ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  38. Experiments in Zhang et al. • How many users can query an information server at a time? • How many users can query a directory server? • How does an information server scale with the amount of data in it? • How does an aggregator scale with the number of information servers registered to it? ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  39. Experiments (figure; the four experiments are marked 1 to 4) ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  40. Comparing Information Systems • We also looked at the queries in depth - NetLogger • 3 phases • Connect, Process, Response ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  41. Testbed • Lucky cluster at Argonne • 7 nodes, each has two 1133 MHz Intel PIII CPUs (with a 512 KB cache) and 512 MB main memory • Users simulated at the UC nodes • 20 P3 Linux nodes, mostly 1.1 GHz • R-GMA has an issue with the shared file system, so we also simulated users on Lucky nodes • All figures are 10 minute averages • Queries happening with a one second wait between each query (think synchronous send with a 1 second wait) ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid
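
The query pattern described above (synchronous send, then a one-second wait) limits how much load each simulated user can offer: one query per response time plus wait. The helper below works out that rate; the function name is hypothetical and the model is idealised, assuming every user sees the same response time.

```python
def closed_loop_rate(users: int, response_time_s: float, wait_s: float = 1.0) -> float:
    """Queries per second offered by users that each wait after every answer.

    Each user completes one query every (response_time_s + wait_s) seconds.
    """
    return users / (response_time_s + wait_s)


# With an instantaneous server, 50 users offer at most 50 queries per second.
print(closed_loop_rate(50, 0.0))
# If each answer takes one second, 400 users offer 200 queries per second.
print(closed_loop_rate(400, 1.0))
```

This is why measured throughput in such a test can never exceed the number of users per second, and why it falls further below that bound as response times grow.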

  42. Metrics • Throughput • Number of requests processed per second • Response time • Average amount of time (in sec) to handle a request • Load • percentage of CPU cycles spent in user mode and system mode, recorded by Ganglia • High when running a small number of compute-intensive applications • Load1 • average number of processes in the ready queue waiting to run, 1 minute average, from Ganglia • High when a large number of applications are blocking on I/O ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid
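
The first two metrics can be computed directly from per-request timestamps. The sketch below assumes each request is recorded as a hypothetical `(start, end)` pair in seconds; it is not the measurement code used in the study.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (start time, end time) of one request, in seconds


def throughput(samples: List[Sample], window_s: float) -> float:
    """Number of requests processed per second over the measurement window."""
    return len(samples) / window_s


def mean_response_time(samples: List[Sample]) -> float:
    """Average time in seconds to handle a request."""
    return sum(end - start for start, end in samples) / len(samples)


# Three made-up requests observed during a 10 second window.
samples = [(0.0, 0.5), (1.5, 2.5), (3.5, 4.0)]
print(throughput(samples, window_s=10.0))
print(mean_response_time(samples))
```

Load and Load1 are host-level figures reported by Ganglia and cannot be derived from request timestamps alone.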

  43. Information Server Throughput vs. Number of Users (Larger number is better) ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  44. Query Times (figure panels: 400 users, 50 users; smaller number is better) ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  45. Experiment 1 Summary • Caching can significantly improve performance of the information server • Particularly desirable if one wishes the server to scale well with an increasing number of users • When setting up an information server, care should be taken to make sure the server is on a well-connected machine • Network behavior plays a larger role than expected • If this is not an option, thought should be given to duplicating the server if more than 200 users are expected to query it ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  46. Directory Server Throughput (Larger number is better) ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  47. Directory Server CPU Load (Smaller number is better) ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  48. Query Times (figure panels: 400 users, 50 users; smaller number is better) ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  49. Experiment 2 Summary • Because of the network contention issues, the placement of a directory server on a highly connected machine will play a large role in the scalability as the number of users grows • Significant loads are seen even with only a few users, so it will be important that this service be run on a dedicated machine, or that it be duplicated as the number of users grows. ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid

  50. Information Server Scalability with Information Collectors (Larger number is better) ideas, graphics, results: J. Schopf, GFNL masterclass 2005: Distributed Monitoring and Information Services for the Grid
