LAS: Lemon Alarm System
Miroslav Siket, Karol Stanislawek
CERN-IT/FIO-FD

Outline
• Scope of LAS
• SURE and history
• LAS architecture
• Exceptions vs. SURE alarms
• LAS logic and LAS GUI
• Current status
• SURE phase-out and LAS phase-in plan
• Future and concluding notes
FIO Group meeting

Scope
• Provide an alarm system for the operators in the Computer Centre at CERN
• Scalable to 10k+ machines and 300+ alarms
• Provide a high-availability solution

SURE and history
• SURE is 13 years old
• Scalability issues
• Old interface
• Missing features
• Further maintenance issues
• Previously considered systems
  • PVSS, Spectrum, LASER, EDG prototype,…
  • Either hard to interface (bad support) or not scalable to the desired number of machines/alarms
  • Limited configuration

LAS architecture

LAS Schema

Exceptions vs. SURE alarms
• SURE alarms are based on comparing individual single-valued metrics with reference values
• Exceptions
  • Are based on correlation of multiple metrics
  • Allow multi-valued metrics and on-behalf metrics
  • Allow regular expressions
  • Allow logical operations
  • Allow basic mathematical operations (+, -, *, /)
  • Allow corrective actions (actuators) up to n times or within a given time window
  • Allow distinguishing the alarm state (failed actuator, …)
  • Example: (10004:7 > 100 && (10005:3 - 34:5) > 100:56)

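To make the example expression concrete, here is a minimal Python sketch of how such an exception could be evaluated. It is not the LAS implementation. It assumes that `metric_id:field` refers to one field of a (possibly multi-valued) metric sample, which the slide does not spell out; the function name `evaluate_exception` and the `samples` mapping are hypothetical. Only the logical, comparison and basic arithmetic operators are covered; regular expressions and actuators are left out.

```python
import ast
import operator
import re

# Assumption: "metric_id:field" names one field of a metric sample.
METRIC_REF = re.compile(r"(\d+):(\d+)")

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMP_OPS = {ast.Gt: operator.gt, ast.Lt: operator.lt,
            ast.GtE: operator.ge, ast.LtE: operator.le,
            ast.Eq: operator.eq, ast.NotEq: operator.ne}


def _eval(node):
    """Evaluate a restricted expression tree: numbers, + - * /, comparisons, and/or."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    if isinstance(node, ast.BoolOp):
        values = [_eval(v) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in _CMP_OPS):
        return _CMP_OPS[type(node.ops[0])](_eval(node.left),
                                           _eval(node.comparators[0]))
    raise ValueError("unsupported construct in exception expression")


def evaluate_exception(expression, samples):
    """Return True if the exception holds.

    samples maps (metric_id, field) -> numeric value (hypothetical layout).
    """
    def substitute(match):
        key = (int(match.group(1)), int(match.group(2)))
        return repr(float(samples[key]))

    # Replace metric references by their values, then map && and || to Python.
    text = METRIC_REF.sub(substitute, expression)
    text = text.replace("&&", " and ").replace("||", " or ")
    return bool(_eval(ast.parse(text.strip(), mode="eval").body))


expr = "(10004:7 > 100 && (10005:3 - 34:5) > 100:56)"
samples = {(10004, 7): 150, (10005, 3): 500, (34, 5): 100, (100, 56): 250}
raised = evaluate_exception(expr, samples)
```

With the sample values above (all invented for illustration), both conditions hold and the exception is raised. Parsing into a restricted tree rather than calling `eval` keeps the set of allowed operations limited to the ones listed on the slide.
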
LAS business logic
• Evaluation of exceptions -> alarms
• Entity status derived from CDB
• Reductions: horizontal and vertical
• Hide and inhibit

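The slide does not define horizontal and vertical reductions. The Python sketch below shows one possible reading, purely as an assumption: a horizontal reduction collapses the same alarm seen on many hosts into one row, and a vertical reduction collapses many alarms on one host into one row. The function name `reduce_alarms`, the `threshold` parameter and the row format are hypothetical.

```python
from collections import defaultdict


def reduce_alarms(alarms, threshold=3):
    """Collapse (host, alarm_name) pairs into display rows.

    Assumed semantics, not taken from LAS:
    - horizontal: an alarm present on >= threshold hosts becomes one row
    - vertical: a host with >= threshold remaining alarms becomes one row
    """
    by_name = defaultdict(list)
    for host, name in sorted(set(alarms)):
        by_name[name].append(host)

    rows = []
    leftover = defaultdict(list)
    for name, hosts in sorted(by_name.items()):
        if len(hosts) >= threshold:
            rows.append(f"{name} on {len(hosts)} hosts")   # horizontal reduction
        else:
            for host in hosts:
                leftover[host].append(name)

    for host, names in sorted(leftover.items()):
        if len(names) >= threshold:
            rows.append(f"{host}: {len(names)} alarms")    # vertical reduction
        else:
            rows.extend(f"{host}: {name}" for name in names)
    return rows


alarms = [("n1", "load"), ("n2", "load"), ("n3", "load"),
          ("n4", "disk"), ("n4", "swap"), ("n4", "fan"), ("n5", "disk")]
rows = reduce_alarms(alarms)
```

Here seven raw alarms shrink to three rows, which is the kind of effect a reduction step is meant to have on an operator display covering thousands of machines.
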
Notifications, RSS, SMS,…
• Built-in notification mechanisms:
  • E-mail
  • SMS
  • RSS (requires an SSL-aware RSS reader)
• Configurable per user: which entities are reported on, and the days/hours during which notifications are enabled (SMS)
• Possible aggregation of the notifications

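The per-user configuration described above can be pictured as a filter over entities and time windows. The following Python sketch is only an illustration under assumed names: the `NotificationRule` class, its fields, and the entity name `cluster-a` are hypothetical and do not come from LAS.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class NotificationRule:
    """One user's subscription (hypothetical structure)."""
    user: str
    channel: str                              # "email", "sms" or "rss"
    entities: frozenset                       # entities the user is notified about
    weekdays: frozenset = frozenset(range(7)) # 0 = Monday ... 6 = Sunday
    start_hour: int = 0                       # inclusive
    end_hour: int = 24                        # exclusive

    def applies(self, entity, when):
        """True if this rule allows a notification for entity at time when."""
        return (entity in self.entities
                and when.weekday() in self.weekdays
                and self.start_hour <= when.hour < self.end_hour)


def recipients(rules, entity, when):
    """Return (user, channel) pairs to notify for an alarm on entity."""
    return [(r.user, r.channel) for r in rules if r.applies(entity, when)]


rules = [
    # SMS only on weekdays during working hours
    NotificationRule("alice", "sms", frozenset({"cluster-a"}),
                     weekdays=frozenset(range(5)), start_hour=8, end_hour=18),
    # e-mail at any time
    NotificationRule("bob", "email", frozenset({"cluster-a"})),
]
monday = recipients(rules, "cluster-a", datetime(2024, 1, 1, 10))
saturday = recipients(rules, "cluster-a", datetime(2024, 1, 6, 10))
```

On the Monday morning both users are notified; on the Saturday only the e-mail rule matches. Aggregation of notifications would sit on top of such a filter, for example by batching the matches per user before sending.
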
LAS GUI
• Runs in any recent browser that:
  • supports the XMLHttpRequest() call (AJAX)
  • is DOM 1.1 compliant
  • supports CSS
  • supports SSL strong encryption
  • Netscape 6+, IE5+, FF 0.5+, Mozilla 1.0+,…
• Requires Flash Player to be installed
• Allows multiple users: actions are synchronized
• Configurable

Current status
• Work in progress
• Pre-production version running, together with GUI
• Preparing operator and user GUIs
• Testing and optimizing the business logic of LAS
• Preparing notification and RSS structures
• Synchronization of SURE alarms with Exceptions (major CDB endeavour)
• Waiting for “reliable” hardware for the servers
• Preparing for testing of HA solutions (Oracle RAC/DG)

SURE phase-out plan
• Several phases are needed
  • Replacement of the ForSure sensor with sensor-sure, which would be based on exceptions
  • Synchronization of SURE alarms and Exceptions, with an extra SURE server running in sync mode
  • Port of lemon-sensor-remote and UIMON to LAS
  • Phasing in lemon-host-check
  • Running LAS in parallel with SURE for a month
  • Training of operators
  • LAS production

Future and conclusions
• Future alarm system based on modern technologies
• More useful: provides service managers with direct information about the status of their services