
A Standards Based Alarms Service for Monitoring Federated Networks

Kostas Kavoussanakis, Jeremy Nowell, Charaka Palansuriya, Florian Scharinger, Arthur Trew. ICNS 2009, Valencia, 24 April 2009.


Presentation Transcript


  1. A Standards Based Alarms Service for Monitoring Federated Networks
     Kostas Kavoussanakis, Jeremy Nowell, Charaka Palansuriya, Florian Scharinger, Arthur Trew
     ICNS 2009, Valencia, 24 April 2009

  2. Project Background
     • EPCC is the supercomputing centre at the University of Edinburgh
       • Hosts the UK national academic HPC service
       • Academic and industrial consultancy
       • http://www.epcc.ed.ac.uk/
     • EPCC has been working in the area of network monitoring for Grids for 5 years
       • First within the EGEE project, now more widely
     Jeremy Nowell - A Standards Based Alarms Service

  3. Overview
     • Challenges of monitoring federated networks
     • Standards-based network monitoring
     • Why an Alarms Service
     • Architecture
     • Examples
     • Future Work

  4. Federated Networks

  5. Network Monitoring Challenges
     [Diagram: "Network Monitoring" surrounded by five categories: Types, Tools, Data Formats, Administrative Domains, User Groups. Labels shown: backbone, end-to-end, MAN, iperf, ping, netflow, perfSONAR, flat file, SQL, RRD, project, NREN, end user, NOC, GOC.]

  6. Federated Networks for Grids
     • Grids need:
       • a unified view
       • end-to-end performance
       • real achievable application performance

  7. Federated Network Monitoring Strategy
     • Use existing tools and data
       • Do not try to force adoption of a single tool across large, multi-administrative domains
       • Instead provide a framework for accessing distributed data
     • Use standards-based solutions where possible
       • Access a wide range of data
       • Allow interoperability between grids, projects and networks

  8. Standards-Based Network Monitoring
     • Data federation through use of the schema provided by the Open Grid Forum (OGF) Network Measurements Working Group (NM-WG)
     • NM-WG schema allows interoperability between clients and measurement frameworks

  9. Standards Based Network Monitoring
     • EPCC has developed tools for accessing historical network performance data from multiple measurement frameworks
     • e2emonit
       • End-to-end metrics (TCP/UDP achievable bandwidth, RTT, packet loss, OWDV)
       • Active measurement tools (iperf, ping, udpmon)
     • perfSONAR
       • Developed by a collaboration including GÉANT2, ESnet, Internet2
       • Passive data for router interfaces
         • Utilisation, input errors, output drops
       • Traceroute information

  10. But…
     • Historical data is only useful for diagnosing problems when you already know something is wrong
     • What users really need are… ALARMS

  11. Requirements
     • A network Alarms Service
       • Allows the timely detection of problems
       • Notifies users
       • Gives an “at a glance” view of network status

  12. Specific Requirements
     • Motivated by the LHCOPN
       • 10 Gb/s private network for moving data generated by the LHC
       • perfSONAR-based monitoring solution deployed and operated by DANTE
     • Need the following alarms as a minimum
       • Unexpected path changes
       • Routing out of the private network
       • Router interface congestion
       • Packets lost
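The two path-related alarms listed above can be illustrated with a minimal Python sketch. It is not the service's implementation (the slides do not show one). The function name `check_path`, the alarm names, and the addresses (taken from the reserved documentation ranges) are all assumptions made for illustration. The sketch compares an observed traceroute hop list against an expected one, and flags any hop that falls outside the address ranges declared as belonging to the private network.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class PathAlarm:
    name: str    # hypothetical alarm identifier
    detail: str  # human-readable explanation


def check_path(observed, expected, private_nets):
    """Return the alarms raised by one traceroute observation."""
    alarms = []

    # Alarm 1: the hop sequence differs from the expected path.
    if list(observed) != list(expected):
        alarms.append(PathAlarm(
            "path_change",
            f"expected {' > '.join(expected)}, saw {' > '.join(observed)}",
        ))

    # Alarm 2: at least one hop lies outside the private network ranges.
    nets = [ip_network(n) for n in private_nets]
    outside = [hop for hop in observed
               if not any(ip_address(hop) in net for net in nets)]
    if outside:
        alarms.append(PathAlarm(
            "left_private_network",
            f"hops outside private ranges: {', '.join(outside)}",
        ))
    return alarms


# Illustrative data only: documentation address ranges, not real LHCOPN hops.
PRIVATE_NETS = ["192.0.2.0/24"]
EXPECTED = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
OBSERVED = ["192.0.2.1", "198.51.100.7", "192.0.2.3"]

for alarm in check_path(OBSERVED, EXPECTED, PRIVATE_NETS):
    print(f"{alarm.name}: {alarm.detail}")
```

Keeping the two checks separate means a re-route that stays inside the private network raises only the path-change alarm, while a detour through an outside hop raises both.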

  13. Strategy
     • Query
     • Detect
     • Notify

  14. Architecture

  15. Details
     • Query
       • NM-WG standard queries to perfSONAR RRD and HADES Measurement Archives
       • Passive router data: interface errors, drops, utilisation
       • Traceroute information
     • Detect
       • Rules-based mechanism to process data against rules defined in configuration files
       • DROOLS library
     • Notify
       • Output status in a form usable by Nagios
       • Status display, notifications, history
       • Easily implement more status notifiers
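The Query, Detect, Notify flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the service itself: the slides name the Drools rules library, whereas this sketch swaps in plain Python predicates. The `query` function is a stand-in that reads canned values instead of issuing an NM-WG query to a Measurement Archive. The interface names, thresholds and rule names are hypothetical. The one convention taken from outside the slides is the standard Nagios plugin one: a single status line plus a return code of 0 (OK), 1 (WARNING) or 2 (CRITICAL).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


@dataclass(frozen=True)
class Rule:
    name: str
    severity: int
    condition: Callable[[Dict[str, float]], bool]


# Hypothetical rules and thresholds; a real deployment would load these
# from configuration files.
RULES = [
    Rule("high utilisation", WARNING, lambda m: m["utilisation"] > 0.8),
    Rule("input errors", CRITICAL, lambda m: m["input_errors"] > 0),
    Rule("output drops", CRITICAL, lambda m: m["output_drops"] > 0),
]

# Canned measurements standing in for a Measurement Archive (assumption).
SAMPLE_DATA = {
    "router-a/if0": {"utilisation": 0.35, "input_errors": 0, "output_drops": 0},
    "router-b/if1": {"utilisation": 0.92, "input_errors": 0, "output_drops": 12},
}


def query(interface: str) -> Dict[str, float]:
    """Query step: fetch the latest metrics for one interface (stubbed)."""
    return SAMPLE_DATA[interface]


def detect(metrics: Dict[str, float], rules: List[Rule]) -> List[Rule]:
    """Detect step: return every rule whose condition holds."""
    return [rule for rule in rules if rule.condition(metrics)]


def notify(interface: str, fired: List[Rule]) -> Tuple[int, str]:
    """Notify step: build a Nagios-style return code and status line."""
    status = max((rule.severity for rule in fired), default=OK)
    reasons = ", ".join(rule.name for rule in fired) or "no rules fired"
    return status, f"{LABELS[status]} - {interface}: {reasons}"


for name in SAMPLE_DATA:
    code, line = notify(name, detect(query(name), RULES))
    print(code, line)
```

Because `notify` only consumes the list of fired rules, another notifier (for example one feeding a weather map) could be added without touching the query or detect steps.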

  16. Examples

  17. Examples

  18. Examples

  19. Current Status
     • Prototype is currently being used by DANTE to monitor some LHCOPN paths and interfaces for the required alarm conditions
       • Test functionality
       • Gather feedback from users
     • Will be further developed and deployed to monitor the whole of the LHCOPN during this year
     • Actively looking for other users

  20. Further Work
     • Implement more alarm conditions
     • Send status information to other consumers, e.g. a network weather map
     • Think about data processing
       • e.g. “cleaning” of data to remove bad data points
       • Statistical processing etc.
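As one possible reading of the “cleaning” item above, the following Python sketch drops missing or negative samples and then removes points that sit far from the median, measured in units of the median absolute deviation. The slides do not say what counts as a bad data point or which method would be used, so the function name `clean`, the threshold of 5 deviations, and the sample values are all assumptions.

```python
from statistics import median


def clean(samples, max_deviations=5.0):
    """Drop invalid samples, then drop outliers far from the median.

    A point is kept if |x - median| / MAD <= max_deviations, where MAD is
    the median absolute deviation of the valid samples.
    """
    # Step 1: discard missing readings and physically impossible negatives.
    valid = [s for s in samples if s is not None and s >= 0]
    if len(valid) < 3:
        return valid  # too few points to judge outliers

    # Step 2: robust outlier filter based on the median.
    mid = median(valid)
    mad = median([abs(s - mid) for s in valid])
    if mad == 0:
        return valid  # all points (nearly) identical; nothing to remove
    return [s for s in valid if abs(s - mid) / mad <= max_deviations]


# Illustrative achievable-bandwidth readings in Mbit/s (made-up values).
readings = [940.0, 955.0, None, 948.0, -1.0, 12.0, 951.0]
print(clean(readings))
```

A filter like this would need care in an alarms context: a genuine collapse in performance looks like an outlier too, so cleaning would probably have to be limited to points known to come from failed measurements.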

  21. Summary
     • Monitoring of federated networks is a challenge
     • An Alarms Service is critical for problem discovery
     • The LHCOPN is being monitored using an initial version, which will be developed further and deployed to monitor the whole network

  22. Acknowledgements
     • Funding
       • UK Joint Information Systems Committee (JISC)
       • EGEEII (INFSO-RI-031688)
       • DEISA2 (RI-222919)
     • Collaboration
       • DANTE
       • DFN WiN-Labor Erlangen
       • LHC-OPN
