Operational Procedures Manual
Ioannis Liabotis <ilaboti at grnet.gr>
Current Status
Last Update
• Summer 2007
Needs Urgent Update
• Things changed since then
Status of the Draft
• Revision 22, Cyril, 16 Jan 08
3.1 First Time for ROC managers
• Emails changed
• Register Support staff details
3.2 First time for COD team members
• LCG-ROLLOUT and GOCDB links changed
EGEE Operations (SA1), EGEE 07, Budapest, 4 Oct 07
Proposed Changes
4.1.2 Site downtime scheduling
• Set downtime in the GOC DB; the BROADCAST is sent automatically
• Removed the "set monitoring off" step from the downtime procedure
7.1 COD team regular tasks
• Removed step 5: Host Certificate Lifetime
• Removed step 6: SAM test <Service-name>-host-cert-valid
• Added step 10: If a decision such as site suspension is taken in the Operations Meeting, the team on duty must fill in the handover report to inform all COD teams.
7.2 Problem detection
• Changed some links
7.7 PPS and USA sites
• COD teams are not responsible for raising tickets against PPS and USA sites
Proposed Changes (continued)
8.9 Certificate Validity tests
• Removed or changed some invalid links
Changes by Ioannis
3.2 First time for COD team members
• Changed some links to GOCDB
4.1.1 Introducing a new site
• Monitoring should be off
• Monitoring goes on when the site is certified
• More changes related to monitoring statuses
New Changes
• Description of the OPS Manual update procedure
• OPS Manual restructuring to make it compliant with the SLD
Proposed timeline
• Approve all changes apart from the update procedure and the restructuring, and release the manual now
• Prepare a draft of the restructuring procedure and ask for the ROC managers' approval
OPS Manual Restructuring
• Draft to come out of the parallel sessions.
Update Procedure
• Change requests come from users of the manual (sites, COD) and from SA1 management.
• Use the COD support unit to request changes to the OPS manual.
• Requests for changes are either registered directly in GGUS or raised in OPS or ROC meetings or on mailing lists; in every case a proper ticket must be opened.
• The type of change is evaluated.
• If it is purely editorial, it is accepted and added to the OPS manual. No BROADCAST is sent; the wiki and the EDMS document are updated.
Update Procedure (continued)
• If the change affects the work of COD, COD has to agree via the mailing list.
• If COD cannot agree, the issue goes to the ROC managers.
• After approval, the wiki and the EDMS document are updated and a BROADCAST is sent.
• If the change affects sites, do ROC managers and sites need to approve it? This has to be examined in more detail to avoid delays.
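The update procedure described above can be summarised as a small routing function. This is a minimal sketch, not part of the OPS Manual: the names `ChangeType`, `Routing` and `route_change` are hypothetical, approval is assumed to be granted once the responsible body has decided, and the site-affecting branch is deliberately left undecided because the slide leaves that question open.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeType(Enum):
    # Hypothetical classification of a change request (already ticketed
    # via the COD support unit).
    EDITORIAL = "editorial"
    AFFECTS_COD = "affects COD"
    AFFECTS_SITES = "affects sites"


@dataclass
class Routing:
    decided_by: str    # who approves the change
    update_docs: bool  # update the wiki and the EDMS document
    broadcast: bool    # send a BROADCAST announcing the change


def route_change(change: ChangeType, cod_agrees: bool = True) -> Routing:
    if change is ChangeType.EDITORIAL:
        # Purely editorial: accepted directly, docs updated, no BROADCAST.
        return Routing("accepted directly", update_docs=True, broadcast=False)
    if change is ChangeType.AFFECTS_COD:
        # COD agrees via the mailing list; if it cannot, ROC managers decide.
        approver = "COD mailing list" if cod_agrees else "ROC managers"
        # Assumption: approval is granted, so docs are updated and a BROADCAST sent.
        return Routing(approver, update_docs=True, broadcast=True)
    # Site-affecting changes: the approval path is still an open question.
    return Routing("undecided (ROC managers and sites?)", update_docs=False, broadcast=False)
```

For example, `route_change(ChangeType.AFFECTS_COD, cod_agrees=False)` escalates the request to the ROC managers, after which the documents are updated and a BROADCAST is sent.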
Outcome of Discussion with Best Practices
What to do about alarms on production nodes with monitoring OFF:
1. COD does not open a ticket against the site.
2. If this happens, COD opens a ticket against GOCDB and records it in the handover report so it is raised in the OPS meeting.
Nodes in downtime are distinguished in the CIC portal
• Tickets are not opened against them.
Non-production nodes in a production site, or nodes not in GOCDB (until the GOCDB/BDII synchronization issue is resolved)
• We must be able to distinguish them in the alarm.
• Recommend that sites keep these nodes out of the BDII.
• If they cannot, they should set the nodes in downtime in GOCDB.
• Otherwise COD might open tickets against them.
• Register them in GOCDB and put them into scheduled downtime.
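The alarm-handling outcome above can be read as a decision function over the alarmed node. This is a sketch under stated assumptions: the `AlarmedNode` fields and `Action` values are hypothetical, the checks are applied in the order shown, and the final `TICKET_SITE` branch stands in for the normal COD procedure, which this slide does not describe.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NO_TICKET = "no ticket"
    ASK_SITE_TO_HIDE_NODE = "recommend removing node from BDII or scheduling downtime in GOCDB"
    TICKET_GOCDB_AND_HANDOVER = "ticket against GOCDB; note in handover report for OPS meeting"
    TICKET_SITE = "normal COD procedure: ticket the site"  # assumption, not from this slide


@dataclass
class AlarmedNode:
    # Hypothetical view of a node as seen from an alarm.
    hostname: str
    registered_in_gocdb: bool
    production: bool
    monitoring_on: bool
    in_downtime: bool


def cod_action(node: AlarmedNode) -> Action:
    # Nodes in downtime are flagged in the CIC portal: no tickets.
    if node.in_downtime:
        return Action.NO_TICKET
    # Non-production nodes, or nodes missing from GOCDB (GOCDB/BDII sync issue):
    # the site should hide them from the BDII or put them in scheduled downtime.
    if not node.registered_in_gocdb or not node.production:
        return Action.ASK_SITE_TO_HIDE_NODE
    # Production node with monitoring OFF: do not ticket the site.
    if not node.monitoring_on:
        return Action.TICKET_GOCDB_AND_HANDOVER
    # Assumption: any other alarm follows the usual COD ticketing procedure.
    return Action.TICKET_SITE
```

Checking downtime first mirrors the rule that nodes in downtime never get tickets, whatever their other status.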
Actions
• Finish the changes based on the current decisions
• Propose the restructuring and send it to the mailing list