
Safety Critical Computer Systems - Open Questions and Approaches




  1. Safety Critical Computer Systems - Open Questions and Approaches Andreas Gerstinger Institute for Computer Technology February 16, 2007

  2. Agenda • Safety-Critical Systems • Project Partners • Three research topics • Safety Engineering • Diversity • Software Metrics • Conclusion and Outlook

  3. Safety-Critical Systems

  4. Safety-Critical Systems • A safety-critical computer system is a computer system whose failure may cause injury or death to human beings, or damage to the environment • Examples: • Aircraft control systems (fly-by-wire,...) • Nuclear power station control systems • Control systems in cars (anti-lock brakes,...) • Health systems (heart pacemakers,...) • Railway control systems • Communication systems • Wireless Sensor Network applications?

  5. SYSARI Project • SYSARI = SYstem SAfety Research in Industry • Goal of the project: to conduct and promote research in system safety engineering and safety-critical system design and development • Close cooperation between ICT and industry • One "shared" employee (me) • Students conducting practical diploma theses • PhD theses

  6. What is Safety? • "The avoidance of death, injury or poor health to customers, employees, contractors and the general public; also avoidance of damage to property and the environment" • Safety is also defined as "freedom from unacceptable risk of harm" • A basic concept in system safety engineering is the avoidance of "hazards" • Safety is NOT an absolute quantity!

  7. Safety vs. Security • These two concepts are often mixed up • In German, there is just one term ("Sicherheit") for both!

  8. SILs and Dangerous Failure Probability
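  The table on this slide exists only as an image in the original. As a rough sketch of the underlying idea (the ranges below are the IEC 61508 targets for the probability of a dangerous failure per hour in continuous/high-demand mode; consult the standard for the normative values), in Python:

      # Illustrative only: IEC 61508 target ranges for the probability of a
      # dangerous failure per hour (continuous / high-demand mode).
      SIL_PFH_BANDS = {
          4: (1e-9, 1e-8),   # SIL 4: >= 1e-9 to < 1e-8 dangerous failures per hour
          3: (1e-8, 1e-7),   # SIL 3
          2: (1e-7, 1e-6),   # SIL 2
          1: (1e-6, 1e-5),   # SIL 1
      }

      def sil_for_failure_rate(pfh):
          """Return the SIL whose band contains the given dangerous-failure rate."""
          for sil, (low, high) in SIL_PFH_BANDS.items():
              if low <= pfh < high:
                  return sil
          return None  # outside the tabulated ranges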

  9. Project Partners

  10. Project Partner: Frequentis • Austrian high-tech company • World leader in air traffic control communication systems • 700 employees, based in Vienna, customers all over the world • http://www.frequentis.com

  11. Frequentis Voice Communication System • Enables communication between aircraft and controller • The communication link must never fail! • Requirements: Safety, High Availability and Reliability, Fault Tolerance • Other domains: railway; ambulance, police, fire brigade,...; maritime • Safety Integrity Level 2

  12. Project Partner: Thales • French company • 68,000 employees worldwide • 25,000 researchers • Mission-critical information systems • The Nobel Prize in Physics 2007 was awarded to Albert Fert, scientific director of the Thales research lab • http://www.thalesgroup.com

  13. Railway Signalling Systems • Signalling and Switching • Axle Counters • Applications for ETCS • An incorrect output may lead to an incorrect signal, causing a major accident! • Safety Integrity Level 4 (highest)

  14. (Old) Interlocking Systems • Mechanical / electromechanical systems

  15. Signal Box / Interlocking Tower • Electric system with some electronics

  16. Modern Signal Box / Interlocking Tower • Lots of electronics and computer systems

  17. Safety Engineering

  18. What is a Hazard? • Hazard • a physical condition of a platform that threatens the safety of personnel or the platform, i.e. one that can lead to an accident • a condition of the platform that, unless mitigated, can develop into an accident through a sequence of normal events and actions • "an accident waiting to happen" • Examples • oil spilled on a staircase • a failed train detection system at an automatic railway level crossing • loss of thrust control on a jet engine • loss of communication • distorted communication • undetectably incorrect output

  19. Hazard Severity Level (Example)

  20. Hazard Probability Level (Example)

  21. Risk Classification Scheme (Example)

  22. Risk Class Definition (Example)
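  The four example tables above exist only as images in the original slides. The sketch below shows how such a scheme typically fits together; the category names follow a commonly used classification, but the matrix entries and class descriptions are illustrative assumptions, not the values from the slides:

      # Illustrative risk classification: probability x severity -> risk class.
      SEVERITIES    = ["Catastrophic", "Critical", "Marginal", "Negligible"]
      PROBABILITIES = ["Frequent", "Probable", "Occasional",
                       "Remote", "Improbable", "Incredible"]

      RISK_MATRIX = {
          # (probability, severity): risk class -- example entries only
          ("Frequent",   "Catastrophic"): "A", ("Frequent",   "Negligible"): "B",
          ("Occasional", "Critical"):     "B", ("Remote",     "Marginal"):   "C",
          ("Incredible", "Catastrophic"): "C", ("Incredible", "Negligible"): "D",
          # ... remaining cells omitted in this sketch
      }

      RISK_CLASSES = {
          "A": "intolerable",
          "B": "undesirable, only tolerable if risk reduction is impracticable",
          "C": "tolerable with endorsement and appropriate controls",
          "D": "broadly acceptable",
      }

      def classify(probability, severity):
          """Look up the risk class for a probability/severity combination."""
          return RISK_MATRIX.get((probability, severity), "not tabulated in this sketch")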

  23. Risk Acceptability • Having identified the level of risk for the product, we must determine how acceptable and tolerable that risk is • Regulator / Customer • Society • Operators • Decision criteria for risk acceptance / rejection • Absolute vs. relative risk (compare with previous levels or background risk) • Risk-cost trade-offs • Risk-benefit of technological options

  24. Risk Tolerability • [Flow diagram] Hazard severity and probability combine into a risk, which is evaluated against the risk criteria; if the risk is not tolerable, risk reduction measures are applied and the assessment is repeated until it is.
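  A minimal sketch of this assessment loop, assuming a toy multiplicative risk model and measures that simply scale the hazard probability (both assumptions are for illustration only):

      def risk_reduction_loop(severity, probability, criterion, measures):
          """Repeat risk assessment until the residual risk meets the criterion."""
          risk = severity * probability                 # toy risk model (illustrative)
          remaining = list(measures)
          while risk > criterion:                       # "Tolerable? No" branch
              if not remaining:
                  raise RuntimeError("risk not tolerable, no further measures available")
              reduction = remaining.pop(0)              # apply a risk reduction measure
              probability *= reduction                  # here: measures reduce probability
              risk = severity * probability             # reassess
          return risk                                   # "Tolerable? Yes" branch

      # Example: severity 10, probability 0.02, tolerability criterion 0.06,
      # two measures that each halve the hazard probability.
      residual = risk_reduction_loop(10, 0.02, 0.06, [0.5, 0.5])
      print("residual risk:", residual)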

  25. Diversity

  26. Diversity • Goal: Fault Tolerance/Detection • Diversity is "a means of achieving all or part of the specified requirements in more than one independent and dissimilar manner." • Can tolerate/detect a wide range of faults "The most certain and effectual check upon errors which arise in the process of computation, is to cause the same computations to be made by separate and independent computers; and this check is rendered still more decisive if they make their computations by different methods." Dionysius Lardner, 1834

  27. Layers of Diversity

  28. Examples for Diversity • Specification Diversity • Design Diversity • Data Diversity • Time Diversity • Hardware Diversity • Compiler Diversity • Automated Systematic Diversity • Testing Diversity • Diverse Safety Arguments • … Some faults to be targeted: programming bugs, specification faults, compiler faults, CPU faults, random hardware faults (e.g. bit flips), security attacks,...

  29. Compiler Diversity • Use of two diverse compilers to compile one common source code
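  A possible sketch of this check, assuming gcc and clang are installed and a hypothetical source file channel.c exists (real diverse-compilation schemes compare the redundantly compiled channels during operation, not only their test outputs):

      import subprocess

      # Build the same source with two independent compilers.
      subprocess.run(["gcc",   "-O2", "-o", "channel_gcc",   "channel.c"], check=True)
      subprocess.run(["clang", "-O2", "-o", "channel_clang", "channel.c"], check=True)

      # Run both binaries on the same input and compare their outputs.
      out_a = subprocess.run(["./channel_gcc"],   capture_output=True, check=True).stdout
      out_b = subprocess.run(["./channel_clang"], capture_output=True, check=True).stdout

      if out_a != out_b:
          print("Divergence detected: possible compiler or hardware fault")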

  30. Compiler Diversity: Issues • Targeted Faults: • Systematic compiler faults • Some Heisenbugs • Some systematic and permanent hardware faults (if executed on one board) • Issues: • To some degree possible with one compiler and different compile options (optimization on/off,…) • If compilers from different manufacturers are used, their independence must be ensured

  31. Systematic Automatic Diversity • Artificial introduction of diversity to tolerate HW faults • (Automatic) transformation of a program P into a semantically equivalent program P' which uses the HW differently • e.g. different memory areas, different registers, different comparisons,... • if A = B then → if A - B = 0 then • A or B → not (not A and not B)
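  A minimal sketch of such semantically equivalent variants in Python (the function names are invented for illustration; in practice the transformation is applied automatically at source or assembler level, as the following slides note):

      def equal_primary(a, b):
          # primary channel: direct comparison
          return a == b

      def equal_diverse(a, b):
          # diversified channel: semantically equivalent, but exercises
          # different operations (subtraction instead of comparison)
          return (a - b) == 0

      def or_primary(a, b):
          return a or b

      def or_diverse(a, b):
          # De Morgan form: not (not A and not B)
          return not (not a and not b)

      # Cross-check: a disagreement between the two variants indicates a fault.
      assert equal_primary(7, 7) == equal_diverse(7, 7)
      assert or_primary(True, False) == or_diverse(True, False)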

  32. Systematic Automatic Diversity • What can be "diversified": • memory usage • execution sequence • statement structures • array references • data coding • register usage • addressing modes • pointers • mathematical and logic rules

  33. Systematic Automatic Diversity: Issues • Targeted Faults: • Systematic hardware faults • Permanent random hardware faults • Issues: • Can be performed at source code or assembler level • If performed at source code level, it must be ensured that the compiler does not "cancel out" the diversity • (Software) fault injection experiments showed an improvement by a factor of ~100 regarding HW faults

  34. Example: Diverse Calculation of Position • Position P can be calculated based on speedometer and accelerometer readings • The voter can also be implemented diversely • PositionA and PositionB could be transmitted in different formats
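  A minimal sketch of this two-channel scheme with a simple voter (the integration scheme, tolerance value and function names are illustrative assumptions, not taken from the slides):

      def position_from_speed(p0, speeds, dt):
          # Channel A: integrate speedometer readings once.
          p = p0
          for v in speeds:
              p += v * dt
          return p

      def position_from_acceleration(p0, v0, accels, dt):
          # Channel B: integrate accelerometer readings twice.
          p, v = p0, v0
          for a in accels:
              v += a * dt
              p += v * dt
          return p

      def voter(pos_a, pos_b, tolerance=0.5):
          # Accept the result only if both diverse channels agree within a tolerance.
          if abs(pos_a - pos_b) <= tolerance:
              return (pos_a + pos_b) / 2.0
          raise ValueError("channels disagree - output must not be used")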

  35. Open Issues • How can diversity be used most efficiently? • Can diversity be introduced automatically? • Which faults are detected/tolerated to which extent? • How can the quality of the diversity be measured? • Can diversity also be used to detect security intrusions?

  36. Software Metrics

  37. Software Metrics for Safety-Critical Systems • Problems: • Which metrics should safety-critical software fulfill? • Which coding rules are good and useful? • What are the desired ranges for metrics? • Which metrics influence maintainability?

  38. Some RAW Metrics...

  39. Outline of Method • Create a questionnaire with relevant questions regarding software quality and get answers from expert developers for various software packages they work with • Automatically measure potentially interesting metrics of the software packages • Correlate questionnaire responses with the measured metrics to find out which metric correlates with which property
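  A sketch of the correlation step (the Pearson coefficient is used here for simplicity; the data values below are invented placeholders, not the study's measurements):

      from math import sqrt

      def pearson(xs, ys):
          """Plain Pearson correlation coefficient between two equal-length lists."""
          n = len(xs)
          mx, my = sum(xs) / n, sum(ys) / n
          cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          sx = sqrt(sum((x - mx) ** 2 for x in xs))
          sy = sqrt(sum((y - my) ** 2 for y in ys))
          return cov / (sx * sy)

      # Invented example data: one value per software package.
      perceived_quality = [3.5, 4.2, 2.1, 4.8, 3.0]       # questionnaire answers
      comment_density   = [0.12, 0.25, 0.05, 0.30, 0.10]  # measured metric

      print("correlation:", round(pearson(perceived_quality, comment_density), 2))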

  40. Graph 3: Code Clarity vs. Return Points

  41. Graph 4: Internal Quality vs. Cyclomatic Complexity (CC)

  42. Summary of Results • Strongest correlation with perceived internal quality: • Comment density • Control Flow Anomalies • No correlation with perceived internal quality: • Cyclomatic Complexity • Average Method Size • Average File Size • ...

  43. Conclusion and Outlook

  44. Further Related Topics • Agile Methods in Safety Critical Development • Hazard Analysis Methods • Safety Standards • Safety of Operating Systems • COTS Components for Safety-Critical Systems • Safety Aspects of Modern Programming Languages (Java, C#.NET) • Fault Detection, Correction and Tolerance • Safety and Security Harmonisation • Linux in Safety-Critical Environments • Online Tests to detect hardware faults

  45. Conclusion • Many open issues in this field... • All research activities in the SYSARI project are practically motivated • The number of safety-critical systems is increasing • International standards play a vital role (e.g. IEC 61508) Contact: Andreas Gerstinger: gerstinger@ict.tuwien.ac.at
