Enrico Tronci Computer Science Department - Sapienza University of Rome

Enrico Tronci, Computer Science Department, Sapienza University of Rome
Via Salaria, 113 - 00198 Roma, Italy
tronci@di.uniroma1.it
Exploiting Transition Locality in Disk Based Verification of Concurrent and Hybrid Systems
From Hybrid Systems to Graph Search
Schloss Dagstuhl, 29.11.2009 - 04.12.2009, Seminar 09491 Graph Search Engineering

Enrico Tronci: http://www.dsi.uniroma1.it/~tronci
Goal: Automatic safety verification of concurrent and hybrid systems.
Example: Given an undesired state BAD (e.g. an error state), we want to know under which conditions, if any, our system can reach BAD.

Contents
• Model Checking Problem
• Hybrid Systems
• From Hybrid System Verification to Explicit Model Checking
• Explicit Model Checking
• BFS Based Explicit Model Checking
• Disk Based BFS

Model Checking Game
Inputs to the Model Checker: System Model + Param. Ranges + Disturbances; Reqs (undesired/desired states); Init States.
Outputs of the Model Checker:
• Yes, i.e. no sequence of events (states) can possibly lead to an undesired state.
• Counterexample, i.e. a sequence of events (states) leading to an undesired state.

Hybrid Systems
• A hybrid system is a dynamical system that can undergo continuous as well as discrete changes.
• Most embedded systems are indeed hybrid systems, where discrete variables model (control) software states and continuous variables model the state of the physical system.

Example 1: A Battery Manager System
Partially supported by ESA Project AO5459, System Software Functional Requirements Techniques
Intecs, Thales Alenia Space France, La Sapienza University of Rome

Battery Manager System

BMS as a Control System
[Block diagram: the Battery Manager (BM) sends BM Commands to the Plant (Solar Array, Battery, Load); the Plant returns Battery Voltage (Vc) and Current (Ic) to the BM.]
• BM Commands (actions): Connect/Disconnect Battery from Solar Array
• Disturbances: Load Variation, Battery Parameters Variation, Vs Variation

Battery Manager System Level Requirements
• Req1: The battery voltage Vc shall not exceed a given threshold Vc_max for too long a period of time. That is: (Vc <= Vc_max) for most of the time.
• Req2: The battery voltage Vc shall not fall below a given threshold Vc_min for too long a period of time. That is: (Vc >= Vc_min) for most of the time.
• Req3: The battery current Ic shall not exceed a given threshold Ic_max for too long a period of time. That is: (Ic <= Ic_max) for most of the time.
• Req4: The battery current Ic shall not fall below a given threshold Ic_min for too long a period of time. That is: (Ic >= Ic_min) for most of the time.
• The safety requirement (SafeReq) for the BMS is the logical AND of the above requirements: SafeReq = (Req1 AND Req2 AND Req3 AND Req4).
• Our verification goal is to check that the system level safety requirement SafeReq is always satisfied.
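The "not for too long" wording of Req1-Req4 can be read as a bound on how long a threshold may be violated without interruption. The sketch below illustrates that reading only. The function name, the reset-on-recovery behaviour of the timer and the sampled-trace representation are assumptions made for illustration, not taken from the BMS model.

```python
def exceeds_too_long(samples, threshold, step, max_time):
    """Return True if `samples` stay strictly above `threshold` for more
    than `max_time` seconds in a row, with `step` seconds between samples.

    Assumption (not from the slides): the violation timer restarts from
    zero as soon as one sample is back within the threshold.
    """
    consecutive = 0  # number of consecutive violating samples
    for value in samples:
        if value > threshold:
            consecutive += 1
            if consecutive * step > max_time:
                return True
        else:
            consecutive = 0
    return False


# Req1-style check on a hypothetical trace of Vc values (1 s per sample).
trace = [99.0, 101.0, 101.0, 99.0, 101.0, 101.0, 101.0]
print(exceeds_too_long(trace, threshold=100.0, step=1.0, max_time=2.0))
```

The lower-bound requirements (Req2, Req4) follow the same pattern with the comparison reversed.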

Software Behaviour: Battery Manager Policy for Solar Array
• If ((Vc >= 1.5*Vc_nom) or (Ic >= 0.5*Ic_max) or (Vs <= Vc) or (Vc <= 0.2*Vc_nom)) then BM will disconnect the solar array from the battery, else BM will connect the solar array to the battery.
• That is: if ((Vc >= 1.5*Vc_nom) or (Ic >= 0.5*Ic_max) or (Vs <= Vc) or (Vc <= 0.2*Vc_nom)) then bmv := 0 else bmv := 1.
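The policy above is a pure function of the sampled values, so it can be written down directly. The following is a minimal sketch of that rule. The function and argument names are illustrative, and the numeric values in the usage lines are hypothetical.

```python
def bm_command(vc, ic, vs, vc_nom, ic_max):
    """Battery Manager policy from the slide: return bmv.

    bmv = 0 means "disconnect the solar array from the battery",
    bmv = 1 means "connect the solar array to the battery".
    """
    disconnect = (
        vc >= 1.5 * vc_nom      # battery voltage well above nominal
        or ic >= 0.5 * ic_max   # charging current too high
        or vs <= vc             # solar array voltage not above battery voltage
        or vc <= 0.2 * vc_nom   # battery voltage far below nominal
    )
    return 0 if disconnect else 1


# Hypothetical operating points with Vc_nom = 50 V and Ic_max = 20 A.
print(bm_command(vc=50.0, ic=5.0, vs=80.0, vc_nom=50.0, ic_max=20.0))  # connect
print(bm_command(vc=80.0, ic=5.0, vs=90.0, vc_nom=50.0, ic_max=20.0))  # disconnect
```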

Modelling Disturbances
• Arbitrary disturbances are not tenable.
• Example: if we allow Vs and R to assume arbitrary values, Vs can be 0 (no light) for a very long period of time while R takes its minimal value (i.e. a heavy electrical load). This will discharge the battery, driving the battery voltage below its safety threshold. Thus Vs and R must be friendly to the battery.
• As for R: when there is no light (Vs = 0), the value of R is as large as possible (light electrical load). This is reasonable, since turning on-board equipment on or off can be scheduled to avoid heavy loads when there is no light.
• As for Vs: there is never no light (Vs = 0) for too long a time, and there is light for a long enough time.
• Explicitly defining a disturbance model may turn out to be an added value by itself, since it can help avoid system failures due to use of the system under conditions for which it was never designed.

Modelling the Battery Manager System with Hybrid Systems… a tiny glimpse of it …

HS Model for Vs Timers

Identifying a Model Checker
• HyTech cannot handle the BMS model, since rate conditions involve state variables (see LHS for Vc).
• PhaVer cannot handle the BMS model, since rate conditions are not linear: state variables C, Rbat, R (modelling disturbances) occur in the coefficients a10, b10, a01, a11, b11.
• NuSMV + MathSat do bounded model checking, whereas our goal is unbounded model checking.
Accordingly, we need a model checker for nonlinear hybrid systems. There are none for continuous time nonlinear hybrid systems. However, by resorting to a discrete time model we can use CMurphi. We do this as usual, by choosing a small enough time step T and replacing differential equations like
dx/dt = f(x, u)
with
x(t + 1) = x(t) + T*f(x(t), u(t))
In this way, from the LHS model we get a CMurphi model for the BMS.
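The replacement of dx/dt = f(x, u) by x(t + 1) = x(t) + T*f(x(t), u(t)) is the forward Euler scheme. A minimal sketch of it follows. The dynamics f used in the example is a made-up first-order system, not the battery model.

```python
def euler_step(f, x, u, T):
    """One discrete-time step: x(t + 1) = x(t) + T * f(x(t), u(t))."""
    return x + T * f(x, u)


def simulate(f, x0, inputs, T):
    """Apply euler_step once per input sample and return the whole trajectory."""
    trajectory = [x0]
    for u in inputs:
        trajectory.append(euler_step(f, trajectory[-1], u, T))
    return trajectory


# Hypothetical dynamics dx/dt = -x + u, driven by a constant input u = 1.
print(simulate(lambda x, u: -x + u, x0=0.0, inputs=[1.0, 1.0], T=0.5))
```

A smaller T tracks the continuous dynamics more closely, at the price of more discrete steps (and hence more states to explore) per second of simulated time.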

Constants for HS Model of the BMS
const
T : 0.01; -- Time step (in seconds) in DTHS model
-- Sampling time (in seconds) of BM sampling & holding schema
BM_SAMPLING_TIME : 10.0; -- This is TM in Section 9.2
-- Min time between disturbance variations (in seconds)
D_SAMPLING_TIME : 10.0; -- This is TD in Section 9.4
MAX_VS : 100.0; -- max value of voltage from solar arrays
MIN_VS : 0.0; -- min value of voltage from solar arrays
MAX_LOAD : 10.0; -- max value of load (Ohm)
MIN_LOAD : 1.0; -- min value of load (Ohm)
MAX_C : 0.2; -- max value of battery capacitance (F)
MIN_C : 0.1; -- min value of battery capacitance (F)
MAX_RBAT : 0.05; -- max value of battery internal resistance (Ohm)
MIN_RBAT : 0.001; -- min value of battery internal resistance (Ohm)
Ic_min : -100; -- min allowed current in battery (A) (discharge)
Ic_max : 20; -- max allowed current in battery (A) (charge)
Vc_min : 1; -- min allowed voltage in battery (V)
Vc_nom : 50; -- nominal battery voltage (V)
Vc_max : 100; -- max allowed voltage in battery (V)
Rs : 4.0; -- internal resistance of solar array (Ohm)
-- max times (in seconds) for which safety requirements can be violated
MAX_TIME_VC1_LARGE : 10.0; -- t_req1 in Section 9.11
MAX_TIME_VC1_SMALL : 10.0; -- t_req2 in Section 9.11
MAX_TIME_IC1_LARGE : 10.0; -- t_req3 in Section 9.11
MAX_TIME_IC1_SMALL : 10.0; -- t_req4 in Section 9.11

Experimental results for BMS formal verification using CMurphi on a 2GHz Dual Core Linux Pentium PC with 2GB of RAM
MAX_VS  BM_SAMPLING_TIME (sec)  CPU Time (sec)  Reachable States  Rules Fired  Diameter  Outcome
150     9                       1216.16         502640            8042240      12641     No error found
150     10                      117.47          149378            790048       3156      No error found
150     11                      24.72           11119             176865       1038      Req2 of Section 7 Fails
200     9                       1259.59         520083            8321328      12633     No error found
200     10                      47.19           19366             309217       2034      Req1 of Section 7 Fails
200     11                      25.01           11119             176865       1038      Req2 of Section 7 Fails

Graphical representation of the counterexample generated for the configuration MAX_VS=200, BM_SAMPLING_TIME=10

Example 2: Automatic Verification of a Turbogas Control System
ENEA Project ICARO
Our goal is to verify the control system for a 2MW Co-generative Power Plant (ICARO). Verification consists in checking that the system, under normal working conditions, never reaches an undesired state. An undesired state is one in which the turbine rotation speed, the exhaust smokes temperature or the compressor pressure is out of range. Because of its size and dynamics, the system at hand cannot be handled using HyTech or UPPAAL. We succeeded in modeling and verifying it using CMurphi extended with finite precision real numbers.

Gas Turbine System
[Block diagram: the Controller receives the Settings and the measurements Vrot, Texh, Pel, Pmc, and drives the Fuel Valve Opening FG102 of the Gas Turbine (Turbogas). Disturbances: electric users, param. var, etc.]
Vrot: Turbine Rotation speed
Texh: Exhaust smokes Temperature
Pel: Generated Electric Power
Pmc: Compressor Pressure

Controller
[Block diagram. Blocks: N1Gov (fed by Vrot), PowLim (fed by Pel), ExTLim (fed by Texh and Pmc), Offset, MIN, Winner, ADJ Limiter (12MW). Output: Valve FG102 Opening Command.]

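Cell i
[Block diagram of a generic controller cell i. Labels: limits A and B, gains Kp and Ki, integrator 1/s, saturation blocks (SAT), +10MW and -10MW, test ">0?", "Reset at u + 4kW", test "Winner != i?", Cell Output.]
u = min(output N1Gov, output PowLim, output ExTLim) AND Winner name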

Power Limiter (PowLim): Electric Power Controller
[Block diagram: Cell i = "Power Limiter", A = 3000kW, B = 10MW. Signals: Pel, Pel Setpoint (+2MW), Winner, Output PowLim.]

N1 Governor (N1Gov): Turbine Rotation Speed Controller
[Block diagram: Cell i = "N1 Governor", A = 0, B = 10MW. Labels: Acceleration, Deceleration, 1/s, 105%, network 6%, isle, Kdr. Signals: Vrot, Pel, Winner, Output N1 Governor.]

Exhaust Temperature Limiter (ExTLim): Exhaust Smoke Temperature Controller
[Block diagram: Cell i = "Exhaust Temperature Limiter", A = 0, B = 10MW. Signals: Texh, Pmc, Offset, Winner, Output Exhaust Temperature Limiter.]

Gas Turbine
[Block diagram: the Gas Turbine takes FG102 as input and produces Texh, Vrot, Pel. Disturbances: el. users, par. var, etc.]

Modeling
All subsystems are modeled as Finite State Automata (FSA). This implies:
• Time is discrete.
• State values range over finite precision real numbers (namely real(4, 2): 4 digit mantissa, 2 digit exponent).
Going to discrete time brings in a sampling frequency F = 1/T.
dx(t)/dt = f(x(t), u(t)) => (x(t + 1) - x(t))/T = f(x(t), u(t)) => x(t + 1) = x(t) + T*f(x(t), u(t))
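To see why finite precision reals keep the state space finite, the sketch below rounds a value to a decimal format with a 4 digit mantissa and a 2 digit exponent. This is only one plausible reading of real(4, 2). The rounding mode, the overflow handling and the function name are assumptions, and the code is not how CMurphi implements the type.

```python
import math


def to_real(x, mantissa_digits=4, exponent_digits=2):
    """Round x to `mantissa_digits` significant decimal digits and reject
    values whose decimal exponent needs more than `exponent_digits` digits.

    Assumed semantics for illustration only (round to nearest).
    """
    if x == 0.0:
        return 0.0
    # Scientific notation with (mantissa_digits - 1) digits after the point.
    rounded = float(f"{x:.{mantissa_digits - 1}e}")
    exponent = math.floor(math.log10(abs(rounded)))
    if abs(exponent) >= 10 ** exponent_digits:
        raise OverflowError(f"exponent {exponent} does not fit in {exponent_digits} digits")
    return rounded


print(to_real(3.14159))   # four significant digits are kept
print(to_real(123456.0))  # low-order digits are rounded away
```

With both mantissa and exponent bounded, each real-valued state variable can take only finitely many values, which is what lets an explicit state model checker enumerate states.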

Gas Turbine (as seen from Controller)
Generated Electric Power: P(t + 1) = P(t) + (a1*(P(t) - P0) + a2*FG102(t) - a3*u(t))*T
Smokes Temperature: Tf(t + 1) = Tf(t) + (b1*(P(t) - P0) + b2*FG102(t) - b3*u(t))*T
Turbine Rotation Speed: V(t + 1) = V(t) + (c1*(P(t) - P0) + c2*FG102(t) - c3*u(t))*T
User demand: u(t + 1) = u(t) + MAX_D_U*ud(t)*T
MAX_D_U = max variation speed (time derivative) of user el. demand
ud(t) = -1, 0, 1 (uncontrolled load disturbance)
Coefficients a, b, c are computed by fitting plant log data.
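The four update equations above can be collected into a single step function, with the disturbance ud chosen nondeterministically from {-1, 0, 1} at every step. The sketch below shows the shape of such a step. The names PlantState and plant_step are illustrative, and the coefficient values in the usage lines are placeholders, not the fitted ICARO coefficients.

```python
from dataclasses import dataclass


@dataclass
class PlantState:
    P: float   # generated electric power
    Tf: float  # smokes temperature
    V: float   # turbine rotation speed
    u: float   # user electric demand


def plant_step(s, fg102, ud, T, a, b, c, P0, max_d_u):
    """One discrete-time step of the turbine model as seen from the controller.

    a, b, c are 3-tuples (x1, x2, x3) of fitted coefficients;
    ud is the load disturbance and must be -1, 0 or 1.
    """
    if ud not in (-1, 0, 1):
        raise ValueError("ud must be -1, 0 or 1")

    def rate(k):
        # Common right-hand side: k1*(P - P0) + k2*FG102 - k3*u
        return k[0] * (s.P - P0) + k[1] * fg102 - k[2] * s.u

    return PlantState(
        P=s.P + rate(a) * T,
        Tf=s.Tf + rate(b) * T,
        V=s.V + rate(c) * T,
        u=s.u + max_d_u * ud * T,
    )


# Placeholder coefficients, chosen only to make the arithmetic easy to follow.
s0 = PlantState(P=0.0, Tf=0.0, V=0.0, u=1000.0)
s1 = plant_step(s0, fg102=1.0, ud=1, T=0.5,
                a=(0.0, 2.0, 0.0), b=(0.0, 0.0, 0.0), c=(0.0, 0.0, 0.0),
                P0=0.0, max_d_u=2500.0)
print(s1)
```

In a model checker the three possible values of ud become three outgoing transitions of every state, so the verifier explores every admissible load profile rather than a single simulation run.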

Experimental Results
Results on an Intel Pentium 4, 2GHz Linux PC with 512 MB RAM. Murphi options: -b, -c, --cache, -m350

Fail trace: MAX_D_U = 2500 kW/sec, 10 ms time step (100 Hz sampling frequency)
[Plot. Curves: electric user demand (kW); rotation speed (percentage of max = 22500 rpm). Allowed range for rotation speed: 40-120.]

Fail trace: MAX_D_U = 5000 kW/sec, 10 ms time step (100 Hz sampling frequency)
[Plot. Curves: electric user demand (kW); rotation speed (percentage of max = 22500 rpm). Allowed range for rotation speed: 40-120.]

Why does it work?
Here we are interested in automatic verification of a control system in a neighborhood of its setpoint. A well designed controller keeps the whole system in a (small) neighborhood of the setpoint. Thus, even if the system description can be fairly large, the set of states that are reachable from the setpoint is small. An explicit model checker, like Murphi, can exploit this fact. Taking advantage of this fact with a symbolic model checker may be hard, since the representation of the system transition relation can be so large that we may run out of memory even before starting the reachability analysis. Indeed, this was our experience when we tried to use HyTech and SMV on our hybrid system verification problem.

From Hybrid System Verification to Explicit Model Checking
• Finite precision real numbers can be easily added to the Murphi verifier. This allows easy modeling of hybrid systems with Murphi.
• Nontrivial case studies presented: the Battery Manager System, and the Automatic Verification of the Turbogas Control System of a Co-generative Electric Power Plant (ICARO).
• Our experimental results suggest that Murphi can be effectively used for automatic verification of Hybrid Control Systems.

Disk Based BFS • We present a disk based algorithm to delay State Explosion when using Explicit State Space Exploration to verify Finite State Systems.

Outline
• We present a disk based verification algorithm that exploits transition locality to decrease disk read accesses, thus reducing the time overhead due to disk usage.
• We present an implementation of our algorithm within the Murphi verifier.
• We present experimental results showing that, even using 1/10 of the RAM needed to complete verification, our disk based BFS algorithm is on average only 3 times slower than RAM Murphi with enough RAM to complete the verification task at hand.
• We show that using our disk based Murphi on a Linux PC with 300MB of RAM, we can complete verification of a protocol with about 10^9 reachable states. This would require more than 5 GB of RAM using RAM Murphi.

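Locality
A transition from s to s' is a K-transition iff level(s') - level(s) = K.
A transition from s to s' is k-local iff |level(s') - level(s)| <= k.
[Figure: example state graph with states arranged by BFS level and each transition labelled with its K value.]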
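The definitions above can be checked on any explicitly given graph: compute BFS levels from the start state, then classify each transition by level(s') - level(s). The sketch below does this for a toy graph. The graph, the dict-based successor representation and the function names are made up for illustration.

```python
from collections import deque


def bfs_levels(succ, start):
    """Return {state: BFS level} for all states reachable from `start`."""
    level = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in succ.get(s, ()):
            if t not in level:
                level[t] = level[s] + 1
                queue.append(t)
    return level


def local_fraction(succ, start, k=1):
    """Fraction of reachable transitions (s, s') with |level(s') - level(s)| <= k."""
    level = bfs_levels(succ, start)
    deltas = [level[t] - level[s] for s in level for t in succ.get(s, ())]
    if not deltas:
        return 1.0  # no transitions: vacuously all k-local
    return sum(abs(d) <= k for d in deltas) / len(deltas)


# Toy graph: a 4-cycle. Three transitions go one level forward (K = 1),
# the closing transition 3 -> 0 goes three levels back (K = -3).
ring = {0: [1], 1: [2], 2: [3], 3: [0]}
print(local_fraction(ring, 0, k=1))
```

Since BFS assigns each state the smallest possible level, K never exceeds 1. Only backward transitions can be non-local.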

Locality
Why is locality interesting? Our experimental results show that, for all protocol-like systems, for most states, most transitions (typically more than 75%) are 1-local.

Exploiting Locality
We show how locality can be used to reduce disk read accesses in the disk based BFS Explicit State Space Exploration algorithm presented in:
U. Stern, D. Dill. Using Magnetic Disk Instead of Main Memory in the Murphi Verifier. CAV 98.
By exploiting locality we can typically speed up disk based BFS verification by a factor of 10:
G. Della Penna, B. Intrigila, I. Melatti, E. Tronci, M. Zilli. Exploiting Transition Locality in Automatic Verification of Finite-State Concurrent Systems. Int. J. on Software Tools for Technology Transfer (STTT), Vol. 6, N. 4, 2004.

Idea
We only use some of the disk table blocks (the disk cloud) to remove old state signatures from M and to remove old states from Q_unck.
[Diagram. RAM: Startstate() and Search() feed Insert() with pairs (s, h); hash table M holds recently visited states (old + new); queue Q_ck holds new (+ old) states, the BFS front; queue Q_unck holds new + old states, candidates to be on the next BFS level. DISK: D is the disk table with visited states, of which the disk cloud is a subset of blocks. Checktable() uses D to filter out old states.]
/* Global Variables */
hash table M;
file D;
FIFO queue Q_ck;
FIFO queue Q_unck;
int disk_cloud_size; /* number of blocks to be read from file D */
Search() {
  M = empty; D = empty; Q_ck = empty; Q_unck = empty;
  for each startstate s { Insert(s); }
  do { /* search loop */
    while (Q_ck is not empty) {
      s = dequeue(Q_ck);
      for all s' in successors(s) { Insert(s'); }
    }
    Checktable();
  } while (Q_ck is not empty);
}

Insert()
Insert(state s) {
  h = hash(s); /* compute signature h of state s */
  if (h is not in M) {
    insert h in M;
    enqueue((s, h), Q_unck);
    if (M is full) Checktable(); /* clean up when M is full */
  }
}

Checktable()
Checktable() /* old/new check for main memory table */ {
  /* Randomly choose indexes of disk blocks to read (disk cloud) */
  DiskCloud = GetDiskCloud();
  Calibration_Required = QueryCalibration();
  /* clean up M */
  for each Block in D {
    if (Block is in DiskCloud or Calibration_Required) {
      for all state signatures h in Block {
        if (h is in M) { remove h from M; }
      }
    }
  }
  /* M now has only new states, ... almost ... because of D random sampling */
  /* remove old states from state queue Q_unck and M and add new states to disk */
  while (Q_unck is not empty) {
    (s, h) = dequeue(Q_unck);
    if (h is in M) { append h to D; remove h from M; enqueue(s, Q_ck); }
  }
  /* continued on the next slide */

Checktable() (continued)
  remove all entries from M; /* reset hash table */
  /* adjust as needed disk cloud size (i.e. number of disk table blocks used to remove old states) */
  if (Calibration_Required) {
    if ((there exists a state on disk that is not in the disk cloud) &&
        (there exists a state in M that is in the disk cloud or is on disk))
      { Calibrate(deleted_in_cloud, deleted_not_in_cloud); }
    if (disk access rate has been too long above a given critical limit)
      { reset disk cloud size to its initial value with given probability P; }
  }
} /* Checktable() */

GetDiskCloud()
GetDiskCloud() {
  Randomly select disk_cloud_size blocks from disk according to given probability;
  Return the indexes of the selected blocks;
}

QueryCalibration()
QueryCalibration() {
  returns true every 10 calls to QueryCalibration();
}
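The Search(), Insert(), Checktable(), GetDiskCloud() and QueryCalibration() pseudocode can be exercised on a small graph with the following in-memory sketch. It is a simplified illustration under several assumptions that are not in the slides: the "disk" is a Python list of blocks, a state is its own signature, the cloud is sampled uniformly, the cloud size is fixed (no Calibrate()), and every `full_scan_every`-th check reads all blocks, mirroring the role of Calibration_Required in the block loop.

```python
import random
from collections import deque


def disk_bfs(start_states, successors, m_capacity=1000, block_size=4,
             cloud_size=2, full_scan_every=10, seed=0):
    """Simplified disk cloud BFS. Returns (visited signatures, expansions)."""
    rng = random.Random(seed)
    M = set()                        # RAM table of recently generated signatures
    D = []                           # simulated disk table: list of blocks
    q_ck, q_unck = deque(), deque()  # checked / unchecked state queues
    calls = 0
    expanded = 0

    def checktable():
        nonlocal calls
        calls += 1
        full_scan = calls % full_scan_every == 0 or cloud_size >= len(D)
        cloud = range(len(D)) if full_scan else rng.sample(range(len(D)), cloud_size)
        for i in cloud:              # only the chosen blocks are "read"
            M.difference_update(D[i])
        while q_unck:
            s = q_unck.popleft()
            if s in M:               # still believed new: store it and expand it later
                M.discard(s)
                if not D or len(D[-1]) >= block_size:
                    D.append([])
                D[-1].append(s)
                q_ck.append(s)
        M.clear()

    def insert(s):
        if s not in M:
            M.add(s)
            q_unck.append(s)
            if len(M) >= m_capacity:
                checktable()

    for s in start_states:
        insert(s)
    while True:                      # do { ... } while (Q_ck is not empty)
        while q_ck:
            s = q_ck.popleft()
            expanded += 1
            for t in successors(s):
                insert(t)
        checktable()
        if not q_ck:
            break
    return {h for block in D for h in block}, expanded


# Hypothetical 50-state system: every state is reachable from state 0.
visited, expanded = disk_bfs([0], lambda s: [(s + 1) % 50, (2 * s) % 50])
print(len(visited), expanded)
```

Because only part of the disk is read, an old state can occasionally pass as new and be expanded again, so `expanded` may exceed the number of reachable states. This is the effect reported by the States and Rules ratios above 1 in the Local Disk Murphi tables.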

Calibrate()
Calibrate(deleted_in_cloud, deleted_not_in_cloud) {
  deleted_states = deleted_in_cloud + deleted_not_in_cloud;
  beta = deleted_not_in_cloud / deleted_states;
  if (beta is close to 1) {
    /* Too many deleted states are not in cloud */
    /* low disk cloud effectiveness: increase disk access rate */
    /* increase disk_cloud_size by a given percentage */
    disk_cloud_size = (1 + speedup)*disk_cloud_size;
  }
  else if (beta is close to 0) {
    /* Most deleted states are in cloud */
    /* high disk cloud effectiveness: decrease disk access rate */
    /* decrease disk_cloud_size by a given percentage */
    disk_cloud_size = (1 - slowdown)*disk_cloud_size;
  }
}
/* we used: speedup = slowdown = 0.15 */
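The calibration rule can be made concrete as follows. The slide does not say what "close to 1" and "close to 0" mean, so the thresholds `high` and `low`, the rounding to an integer number of blocks and the lower bound of one block are assumptions. Only speedup = slowdown = 0.15 comes from the slide.

```python
def calibrate(disk_cloud_size, deleted_in_cloud, deleted_not_in_cloud,
              speedup=0.15, slowdown=0.15, high=0.9, low=0.1):
    """Return the adjusted disk cloud size (number of blocks to read).

    `high` and `low` are assumed cut-offs for "beta close to 1" and
    "beta close to 0".
    """
    deleted_states = deleted_in_cloud + deleted_not_in_cloud
    if deleted_states == 0:
        return disk_cloud_size       # nothing deleted: no evidence either way
    beta = deleted_not_in_cloud / deleted_states
    if beta >= high:                 # cloud misses most old states: read more blocks
        return max(1, round((1 + speedup) * disk_cloud_size))
    if beta <= low:                  # cloud catches most old states: read fewer blocks
        return max(1, round((1 - slowdown) * disk_cloud_size))
    return disk_cloud_size


print(calibrate(100, deleted_in_cloud=5, deleted_not_in_cloud=95))
print(calibrate(100, deleted_in_cloud=95, deleted_not_in_cloud=5))
```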

Experimental Results with RAM Murphi (mu -b -c); g = MaxQ/Reach
Protocol     Bytes  Diam  Reach States  Rules        Max Q      Min Mem      g     T (sec)
ns           96     12    2,455,257     8,477,970    1,388,415  145,564,125  0.57  1,211
n_peterson   20     241   2,871,372     25,842,348   46,657     15,290,000   0.02  764
newlist6 7   32     91    3,619,556     21,612,905   140,382    22,590,004   0.04  1,641
ldash        144    72    8,939,558     112,808,653  509,751    118,101,934  0.06  12,352
sci          60     94    9,299,127     30,037,227   347,299    67,333,575   0.04  2,852
sci 31151    64     95    75,081,011    254,261,319  2,927,550  562,768,255  0.04  35,904
sci 31171    68     143   126,784,943   447,583,731  4,720,612  964,926,331  0.04  99,904
kerb         148    15    7,614,392     9,859,187    4,730,277  738,152,956  0.62  2,830
mcslock1     16     111   12,783,541    76,701,246   392,757    70,201,817   0.03  3,279
newlist6 8   40     110   81,271,421    563,937,480  2,875,471  521,375,945  0.03  31,114

Local Disk Murphi vs RAM Murphi
M = <RAM used in LD Murphi>/<Min Mem needed for RAM Murphi>. Each entry is the ratio <X LD Murphi>/<X RAM Murphi>, for X = States, Rules, Time.
M    Ratio   ns     n_peterson  newlist6 7  mcslock1  Sci 31121  Sci 31151  Sci 31171  kerb    ldash   newlist6 8
1    States  1.348  1.178       1.366       1.346     1.260      ------     ------     ------  1.566   ------
1    Rules   1.487  1.178       1.365       1.346     1.279      ------     ------     ------  1.528   ------
1    Time    1.734  2.148       1.703       1.915     1.811      ------     ------     ------  2.037   ------
0.5  States  1.405  1.124       1.335       1.550     1.189      1.169      1.130      1.282   1.668   1.416
0.5  Rules   2.011  1.124       1.334       1.550     1.206      1.195      1.152      1.060   1.626   1.412
0.5  Time    2.144  2.056       1.765       2.477     1.798      1.828      1.421      1.234   2.226   2.612
0.1  States  1.373  1.199       1.384       1.703     1.183      1.143      1.097      1.279   1.702   1.406
0.1  Rules   1.645  1.199       1.382       1.703     1.200      1.167      1.115      1.080   1.658   1.405
0.1  Time    1.953  2.783       2.791       5.259     2.888      2.553      1.743      1.438   3.770   4.436

Local Disk Murphi vs RAM Murphi (continued): Time Statistics
M = <RAM used in LD Murphi>/<Min Mem needed for RAM Murphi>
M    Min    Avg    Max
1    1.703  1.891  2.148
0.5  1.234  1.954  2.612
0.1  1.438  2.961  5.259

Disk Murphi vs RAM Murphi
Mem = <RAM used in LD Murphi>/<Min Mem needed for RAM Murphi>. Each entry is the ratio <X LD Murphi>/<X RAM Murphi>, for X = States, Rules, Time.
Mem  Ratio   ns        n_peterson  newlist6 7  mcslock1  Sci 31121  ldash
1    States  1.000     1.000       1.000       1.000     1.000      0.355
1    Rules   1.000     1.000       1.000       1.000     1.000      0.245
1    Time    1.259     2.623       1.331       1.821     1.616      > 50.660
0.5  States  1.000     1.000       1.000       1.000     0.361      -----
0.5  Rules   1.000     1.000       1.000       1.000     0.647      -----
0.5  Time    242.131   2.430       1.357       1.691     > 11.863   -----
0.1  States  0.747     0.527       0.253       0.137     -----      -----
0.1  Rules   0.309     0.507       0.203       0.115     -----      -----
0.1  Time    > 77.895  > 90.704    > 42.817    > 11.605  -----      -----

Comparing LDMurphi with DMurphi
Mem = <RAM used in disk Murphi (LD or D)>/<Min Mem needed for RAM Murphi>
Time = <Time DMurphi>/<Time LDMurphi>
Mem  ns       n_peterson  newlist6 7  mcslock1  Sci 31121  ldash
1    0.726    1.221       0.781       0.950     0.892      > 24
0.5  112.934  1.182       0.768       0.683     > 6        > 24
0.1  > 39     > 32        > 15        > 2       > 6        > 24
Time Statistics
Mem  Min    Avg       Max
1    0.726  > 4.762   > 24
0.5  0.683  > 24.261  112.934
0.1  > 2    > 19.667  > 39

A Large Protocol
Prot: mcslock2
Param: N=4
Bytes: 16
Reach: 945,950,806
Rules: 3,783,803,224
Max Num State in Q: 30,091,508
Diam: 153
Time (sec): 406,275
Mem (MB) (LDMurphi): 300
Hmem (Bytes) (RAM Murphi): 4,729,754,030
Qmem (Bytes) (RAM Murphi): 481,464,128
TotMem (Bytes) (RAM Murphi): 5,211,218,158
Hmem = state-signature * reachable-states = 5*945,950,806 = 4,729,754,030 Bytes
Qmem = state-bytes * Max-Queue = 16*30,091,508 = 481,464,128 Bytes
TotMem = Hmem + Qmem = 4,729,754,030 + 481,464,128 = 5,211,218,158 Bytes
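The memory estimate for RAM Murphi is plain arithmetic and can be rechecked with a few lines. The function name below is illustrative. The inputs are the figures from the slide (5 byte signatures, 16 byte states).

```python
def ram_murphi_memory(signature_bytes, state_bytes, reachable_states, max_queue):
    """Return (Hmem, Qmem, TotMem) in bytes, following the slide's formulas."""
    hmem = signature_bytes * reachable_states  # hash table of state signatures
    qmem = state_bytes * max_queue             # BFS queue of full states
    return hmem, qmem, hmem + qmem


hmem, qmem, totmem = ram_murphi_memory(
    signature_bytes=5, state_bytes=16,
    reachable_states=945_950_806, max_queue=30_091_508,
)
print(f"Hmem = {hmem:,}  Qmem = {qmem:,}  TotMem = {totmem:,} bytes")
```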
