Background

amalia

Presentation Transcript


  1. Background
     • The OPS measure of reliability was instrumental in improving site reliability
     • And was “easily” understood
     • It became a metric shown to funding agencies as an indicator of MoU adherence
     • The SAM & Nagios frameworks are very useful in relaying monitors and alarms to sites
     • VO-specific monitoring started in SAM/Nagios and was then also implemented in dashboards
     • Reporting comes from the SAM/Nagios tests, but experiments rely on the dashboard tests

  2. What now?
     • OPS tests don’t represent what the experiment sees of a site
     • But we need to keep them for reporting, OR replace them with something more representative but still understandable
     • VO tests:
        • Are the SAM/Nagios VO tests useful?
        • Should they be replaced by the dashboard tests (are they different?)
        • Still need to publish to SAM/Nagios to enable relaying to sites and report generation

  3. Some comments
     • Some sites see the utility of having OPS and VO tests (but SAM-VO tests could be replaced by the dashboard tests)
     • Tests may not always execute if a site is busy
        • especially if run as a normal user
        • Better to run as part of the pilot? (solves the “special” WN and queue issue too)