
Best Ever Alarm System Toolkit

Xihui Chen, Katia Danilova, Kay Kasemir, SNS/ORNL, kasemirk@ornl.gov, July 2009


Presentation Transcript


  1. Best Ever Alarm System Toolkit. Xihui Chen, Katia Danilova, Kay Kasemir, SNS/ORNL, kasemirk@ornl.gov, July 2009

  2. Previous Attempts at SNS: ALH, soft-IOCs and EDM screens
     Issues:
     • GUI: Static layouts; N clicks to see (some of the) active alarms
     • Configuration .. was bad: Always too many alarms; changes required contacting one of the 2 experts, restarting ALH, …
     • Information: Operator guidance? Related displays? Most frequent alarm? Timeline of alarm?

  3. New End-User View: Alarm Table • All current alarms • new, ack’ed • Sort by PV, Descr., Time, Severity, … • Optional: Annunciate • Acknowledge one or multiple alarms • Select by PV or description • BNL/RHIC type un-ack’

  4. Another View: Alarm Tree • All alarms • Disabled, inactive, new, ack’ed • Hierarchical • Optionally only show active alarms • Ack’/Un-ack’ PVs or sub-tree

  5. Guidance, Related Displays, Commands • Basic Text • Start EDM screen • Open web page • Run ext. command • Hierarchical: Including info of parent entries • Merges Guidance etc. from all selected alarms

  6. .. Within CSS • Alarms • History of PV • EPICS Config.

  7. E-Log Entries • “Logbook” from context menu creates text w/ basic info about selected alarms. Edit, submit. • Pluggable implementation, not limited to Oracle-based SNS ELog

  8. Online Configuration Changes .. may require Authentication/Authorization • Log in/out while CSS is running

  9. Configure PV • Again online • Especially useful for operators to update guidance and related screens.

  10. Logging • ..into generic CSS log also used for error/warn/info/debug messages • Alarm Server: State transitions, Annunciations • Alarm GUI: Ack/Un-Ack requests, Config changes • Generic Message History Viewer • Example w/ Filter on TEXT=CONFIG

  11. Logging: Get timeline • Example: Filter on TYPE, PV
      1. PV triggers, clears, triggers again
      2. Alarm Server latches alarm
      3. Alarm Server annunciates
      4. Problem fixed
      5. Ack’ed by operator
      6. All OK
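The timeline on this slide can be illustrated with a small latching model. This is a minimal sketch, not the BEAST server code: the class `LatchedAlarm`, its method names, and the log strings are hypothetical, and it assumes the alarm returns to "all OK" only once the PV is OK and the operator has acknowledged.

```python
from enum import IntEnum


class Severity(IntEnum):
    OK = 0
    MINOR = 1
    MAJOR = 2


class LatchedAlarm:
    """Hypothetical model: latch the highest severity until acked and cleared."""

    def __init__(self):
        self.current = Severity.OK   # live severity reported by the PV
        self.latched = Severity.OK   # highest severity since the last "all OK"
        self.acked = False
        self.log = []                # simplified stand-in for log messages

    def pv_update(self, severity):
        self.current = severity
        if severity > self.latched:
            # Only a *higher* severity re-latches and annunciates.
            self.latched = severity
            self.acked = False
            self.log.append(f"annunciate {severity.name}")
        self._maybe_clear()

    def acknowledge(self):
        self.acked = True
        self.log.append("ack")
        self._maybe_clear()

    def _maybe_clear(self):
        # Assumed rule: clear only when acknowledged AND the PV is OK again.
        if self.acked and self.current == Severity.OK and self.latched != Severity.OK:
            self.latched = Severity.OK
            self.acked = False
            self.log.append("all OK")
```

Feeding it MAJOR, OK, MAJOR, OK and then an acknowledge produces one annunciation, one ack, and one "all OK" entry, which mirrors steps 1 to 6 above. The same rule also means a MINOR update after an acknowledged MAJOR does not annunciate again.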

  12. All Sorts of Web Reports

  13. Technical View
      • IOCs: PV Updates (Channel Access, …)
      • Alarm Server: Current Alarms: Acknowledged? Transient? Annunciated?
      • Alarm Cfg & State RDB
      • JMS: ALARM_SERVER, ALARM_CLIENT, LOG, TALK
      • Messages: Log Messages; Alarm Updates; Ack’, Config Updates; Annunciations
      • CSS Applications: Alarm Client GUI
      • JMS2Speech, JMS2RDB, Message RDB
      • Tomcat: Reports

  14. General Alarm Server Behavior
      • Latch highest severity, or non-latching like ALH “ack. transient”
      • Annunciate
      • Chatter filter à la ALH: Alarm only if severity persists some minimum time .. or alarm happens >=N times within period
      • Optional formula-based alarm enablement: Enable if “(pv_x > 5 && pv_y < 7) || pv_z==1” … but we prefer to move that logic into IOC
      • When acknowledging MAJOR alarm, subsequent MINOR alarms not annunciated; ALH would again blink/require ack’
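The chatter filter described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the Alarm Server's implementation: the class `ChatterFilter` and its parameters `min_seconds`, `count`, and `period` are made-up names, timestamps are plain seconds, and the filter is only evaluated when `update` is called (a real server would presumably also use a timer).

```python
from collections import deque


class ChatterFilter:
    """Hypothetical filter: raise the alarm only if the condition persists
    for min_seconds, or if it triggered at least `count` times in `period`."""

    def __init__(self, min_seconds=0.0, count=0, period=0.0):
        self.min_seconds = min_seconds
        self.count = count
        self.period = period
        self.since = None          # time of the current rising edge, if any
        self.triggers = deque()    # times of recent rising edges

    def update(self, in_alarm, now):
        """Report the PV's alarm condition at time `now`; return True to alarm."""
        # Forget rising edges that fell out of the counting window.
        while self.triggers and self.triggers[0] < now - self.period:
            self.triggers.popleft()
        if not in_alarm:
            self.since = None
            return False
        if self.since is None:     # rising edge: condition just became active
            self.since = now
            self.triggers.append(now)
        persisted = now - self.since >= self.min_seconds
        frequent = self.count > 0 and len(self.triggers) >= self.count
        return persisted or frequent
```

With `ChatterFilter(min_seconds=10, count=3, period=60)`, a condition that stays active for 10 seconds alarms, and so does one that flickers on three times within a minute, while a single short glitch does not.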

  15. Best Ever Alarm System Tools, Indeed
      • .. but Tools are only half the issue
      • Good configuration requires plan & follow-up.
      • B. Hollifield, E. Habibi, "Alarm Management: Seven (??) Effective Methods for Optimum Performance", ISA, 2007

  16. Alarm Philosophy
      • Goal: Help operators take correct actions
      • Alarms with guidance, related displays
      • Manageable alarm rate (<150/day)
      • Operators will respond to every alarm (corollary to manageable rate)

  17. What’s a valid alarm?
      • DOES IT REQUIRE IMMEDIATE OPERATOR ACTION? What action? Alarm guidance! Not “make elog entry”, “tell next shift”, …
      • Consider consequence of no action
      • Is it the best alarm? Would other subsystems, with better PVs, alarm at the same time?

  18. How are alarms added?
      • Alarm triggers: PVs on IOCs
      • But more than just setting HIGH, HIHI, HSV, HHSV
      • HYST is a good idea
      • Dynamic limits, enable based on machine state, ...
      • Requires thought, communication, documentation
      • Added to alarm server with
        Guidance: How to respond
        Related screen: Reason for alarm (limits, …), link to screens mentioned in guidance
        Link to rationalization info (wiki)

  19. Impact/Consequence Grid • Mostly: How long will beam be off?

  20. .. combined with Response Time • This part is still evolving…

  21. Example: Elevated Temp/Press/Res.Err./…
      • Immediate action required? Do something to prevent interlock trip
      • Impact, Consequence? Beam off: Reset & OK, 5 minutes? Cryo cold box trip: Off for a day?
      • Time to respond? 10 minutes to prevent interlock?
      • → MINOR? MAJOR?
      • Guidance: “Open Valve 47 a bit, …”
      • Related Displays: Screen that shows Temp, Valve, …

  22. “Safety System” Alarms
      • Protection Systems not per se high priority
      • Action is required, but we’re safe for now, it won’t get worse if we wait
      • Pick One: “Mommy, I need to gooo!” / “Mommy, I went”
      • (Does it require operator action? How much time is there?)

  23. Avoid Multiple Alarm Levels
      • Analog PVs for Temp/Press/Res.Err./…: Easy to set LOLO, LOW, HIGH, HIHI
      • Consider: Do they require significantly different operator actions? Will there be a lot of time after the HIGH to react before a follow-up HIHI alarm?
      • In most cases, HIGH & HIHI only double the alarm traffic
      • Set only HSV to generate single, early alarm
      • Adding HHSV alarm assuming that the first one is ignored only worsens the problem

  24. Bad Example: Old SNS ‘MEBT’ Alarms • Each amplifier trip: ≥ 3 ~identical alarms, no guidance • Rethought w/ subsystem engineer, IOC programmer and operators: 1 better alarm

  25. Alarms for Redundant Pumps

  26. Alarm Generation: Redundant Pumps the wrong way
      • Control System: Pump1 on/off status, Pump2 on/off status
      • Simple Config setting: Pump Off => Alarm
      • It’s normal for the ‘backup’ to be off
      • Both running is usually bad as well, except during tests or switchover
      • During maintenance, both can be off

  27. Redundant Pumps
      • Control System: Pump1 on/off status, Pump2 on/off status, Number of running pumps, Configurable number of desired pumps (Required Pumps: 1)
      • Alarm System: Running == Desired? … with delay to handle tests, switchover
      • Same applies to devices that are only needed on-demand
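The "Running == Desired, with delay" rule can be sketched as below. This is only an illustration: the class name `PumpCountAlarm`, the 30-second default delay, and the use of plain seconds as timestamps are assumptions. On the slide, counting the running pumps is the control system's job and only the comparison belongs to the alarm system; the sketch folds both into one place for brevity.

```python
class PumpCountAlarm:
    """Hypothetical check: alarm when the number of running pumps differs
    from the desired number for at least `delay` seconds."""

    def __init__(self, desired=1, delay=30.0):
        self.desired = desired        # configurable number of desired pumps
        self.delay = delay            # grace period for tests and switchover
        self.mismatch_since = None    # when running != desired began, if at all

    def update(self, pump_on, now):
        """pump_on: iterable of on/off booleans, one per pump. Returns True to alarm."""
        running = sum(1 for on in pump_on if on)
        if running == self.desired:
            self.mismatch_since = None
            return False
        if self.mismatch_since is None:
            self.mismatch_since = now
        return now - self.mismatch_since >= self.delay
```

A brief switchover where both pumps run for a few seconds stays silent, while both pumps staying off (or both staying on) longer than the delay raises the alarm. Setting `desired=0` during maintenance avoids the "both off" alarm without touching the pump PVs.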

  28. Weekly Review: How Many? Top 10?

  29. A lot of information available • How often did PV trigger? • For how long? • When? • Temporary issue? Or need HYST, alarm delay, fix to hardware?
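The "how often" and "for how long" questions can be answered from logged state transitions. The sketch below assumes a simplified record format of `(timestamp_seconds, pv, severity)` tuples sorted by time; this format and the function name `alarm_statistics` are hypothetical and not the actual message RDB schema. Alarms still active at the end of the log are counted but contribute no duration.

```python
from collections import defaultdict


def alarm_statistics(transitions):
    """Return {pv: (trigger_count, total_seconds_in_alarm)}.

    transitions: iterable of (timestamp_seconds, pv, severity) sorted by time,
    where severity "OK" ends an alarm and anything else starts or continues one.
    """
    count = defaultdict(int)
    duration = defaultdict(float)
    started = {}                      # pv -> time its current alarm began
    for t, pv, severity in transitions:
        if severity != "OK":
            if pv not in started:     # rising edge: a new trigger
                started[pv] = t
                count[pv] += 1
        elif pv in started:           # alarm cleared: add its duration
            duration[pv] += t - started.pop(pv)
    return {pv: (count[pv], duration[pv]) for pv in count}


def top_n(stats, n=10):
    """PVs with the most triggers first, for a weekly 'Top 10' style review."""
    return sorted(stats, key=lambda pv: stats[pv][0], reverse=True)[:n]
```

A PV with many short triggers is a candidate for HYST or an alarm delay, while one with few but long alarms more likely points at hardware or a stale, forgotten alarm.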

  30. Weekly Check: Stale, Forgotten?

  31. What about the DESY Alarm System?
      • No Channel Access Monitor of selected alarm PVs! IOCs push all alarms via new protocol into Interconn. Server.
      • GUI: Similar to SNS GUI shown here
      • Diagram: IOC, Other, CSS, Interconnection Server, JMS, Filt.Alrm, ALARM, LOG, JMS2RDB, Filters, LDAP, RDB

  32. Design Choices
      • Similar alarm table and tree GUIs
      • JMS for communication; slightly different messages, though
      • DESY IOCs send all alarms, then filtered in AMS
        DESY: All IOC alarms should show up in AMS, zero additional configuration
        At SNS, how many of the 350000 PVs would send alarms? We want to make the addition of alarms simple, but not automatic, and encourage guidance, related displays.
      • DESY/SNS: LDAP vs. RDB for configuration/state. Choice was based on available infrastructure.
      • JMS Listeners. SNS: Logger, Annunciator. DESY: Logger, Send SMS, EMail, Voice Mail

  33. Summary
      • BEAST operational at SNS since Feb’09
      • DESY AMS is similar and has been operational for longer
      • Pick either, but good configuration requires work in any case
      • Started with previous “annunciated” alarms: ~300, no guidance, no related displays
      • Now ~400, all with guidance, rel. displays, links to operational procedures
      • “Philosophy” helps decide what gets added and how: Immediate Operator Action? Consequence? Response Time?
      • Weekly review spots troubles and tries to improve configuration
