Automatic Trust Management for Adaptive Survivable Systems Howard Shrobe MIT AI Lab March 2002 PI Meeting Hilton Head. Outline. Overall Framework Review of Diagnostic Process Review of Computational Vulnerability Analysis Rational Choice and Utility Functions
Automatic Trust Management for Adaptive Survivable Systems • Howard Shrobe, MIT AI Lab • March 2002 PI Meeting, Hilton Head
Outline • Overall Framework • Review of Diagnostic Process • Review of Computational Vulnerability Analysis • Rational Choice and Utility Functions • Self-Checking and Certified Computation
Motivating Example • For example: “Show me Blue Platoon’s maneuvers leading up to Phase Line Orange”. • The system should provide a correct map of Blue Platoon’s actions in a reasonable amount of time. • Integrity constraints check that the data is valid.
[Figure: processing pipeline for the example. Host labels: Dopey, Doc, Grumpy, Sleepy. Other labels, in source order: Performance expectations, Grammar Center, Grammar, Speech Processing, text, Display, Start, Voice Capture, utterance, Gui, Directives, query, Omnibase, Display Generator, response, Integrity Constraint.]
What We Expect and What We Do If Not • When the command is issued: • We expect that the results will be generated within reasonable time • We expect that intermediate data will pass integrity constraints built into the system • If these conditions do not obtain: • We diagnose what went wrong • We obtain updated estimates of how likely it is that each resource has been compromised • We pick a reasonable place from which to restart the computation (if the integrity constraints failed). • We pick a new way to do the task and/or a new allocation of resources to the components of the task that maximizes the likelihood of success • What this requires: • Self Checking • Diagnosis • Other monitoring of health status • Decision Theoretic Choice (and therefore Utility functions).
Adaptive Survivable Systems • Techniques that enable self-monitoring and diagnosis • Driven by representations of structure and purpose • The application knows the purposes of its components • The application checks that these are achieved • If these purposes are not achieved, the application localizes and characterizes the failure • Techniques that enable application adaptation • The application achieves its purpose as well as possible within the available infrastructure by choosing among alternatives. • Driven by models of Trust (informed by diagnosis and monitoring) • Driven by models of computational alternatives • It must have more than one way to effect each critical computation • It should choose an alternative approach if the first one failed • It should make its initial choices in light of the trust model
The Active Trust Management Architecture
[Figure: architecture diagram. Labels, in source order: Self Adaptive Survivable Systems; Trust Model (Trustworthiness, Compromises, Attacks); Perpetual Analytical Monitoring; Rational Decision Making; Trend Templates; System Models & Domain Architecture; Other Information Sources: Intrusion Detectors; Rational Resource Allocation.]
Diagnosis as Likely Mode Identification: Multi-Mode Multi-Tiered Diagnosis • We model each component as having multiple modes (normal and abnormal) • Another level of detail shows the dependence of computations on underlying resources • Each resource has models of its state of compromise • The modes of the resource models are linked to the modes of the computational models by conditional probabilities • The model forms a Bayesian network
[Figure: Component 1 has models Normal (delay 2 to 4), Delayed (delay 4 to +inf), and Accelerated (delay -inf to 2). Component 1 is located on Node17, which has models Normal (probability 90%), Parasite (probability 9%), and Other (probability 1%). The links between resource modes and component modes carry conditional probabilities (.2, .4, and .3 are shown).]
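An Example System Description
[Figure: computations A, B, C, D, and E run on Host1, Host2, Host3, and Host4. Each computation carries a table giving the probability of each of its modes, conditioned on whether its host is normal (N) or hacked (H).]
Mode tables shown in the figure (mode: N, H):
• Normal .6, .15; Peak .1, .80; Off Peak .3, .05
• Normal .50, .05; Fast .25, .45; Slow .25, .50
• Normal .8, .3; Slow .2, .7
• Normal .50, .05; Fast .25, .45; Slow .25, .50
• Normal .60, .05; Slow .25, .45; Slower .15, .50
Host priors shown in the figure: Normal .9 / Hacked .1; Normal .85 / Hacked .15; Normal .7 / Hacked .3; Normal .8 / Hacked .2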
The System Description includes a Bayesian Network • The Model can be viewed as a Two-Tiered Bayesian Network • Resources with modes • Computations with modes • Conditional probabilities linking the modes
[Figure: the example system description, repeated.]
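To make the two-tiered view concrete, here is a minimal sketch of one link in such a network: Bayes' rule applied to a single resource and a single computation. It pairs the Normal/Slow table (Slow: .2 given N, .7 given H) with the host prior Hacked = .1. That pairing is an assumption made for illustration, since the figure's assignment of tables to hosts is not recoverable here, and the function name is hypothetical.

```python
def posterior_hacked(prior_hacked, p_mode_given_normal, p_mode_given_hacked):
    """Bayes' rule: P(host is hacked | computation observed in a given mode)."""
    prior_normal = 1.0 - prior_hacked
    joint_hacked = p_mode_given_hacked * prior_hacked
    joint_normal = p_mode_given_normal * prior_normal
    return joint_hacked / (joint_hacked + joint_normal)


# Assumed pairing: computation observed in mode Slow, host prior Hacked = .1
p = posterior_hacked(prior_hacked=0.1,
                     p_mode_given_normal=0.2,
                     p_mode_given_hacked=0.7)
print(round(p, 2))  # 0.28
```

Under these assumed numbers, one observation of slow behavior raises the estimate that the host is hacked from .1 to about .28. A full network would combine evidence from every computation sharing the host.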
The system description includes a behavioral model • The Model can also be viewed as a behavioral model with multiple modes per device • Each mode has a behavioral description • The modes have posterior probabilities, linked by conditional probabilities to the probabilities of the modes of the resources
[Figure: the example system description, repeated without the hosts.]
Integrating model based and Bayesian reasoning • Start with each behavioral model in the “normal” state • Repeat: Check for consistency of the current model • If inconsistent, • Add a new node to the Bayesian network • This node represents the logical-and of the nodes in the conflict. • Its truth-value is pinned at FALSE. • Prune out all possible solutions which are a superset of the conflict set. • Pick another set of models from the remaining solutions • If consistent, add to the set of possible diagnoses • Continue until all inconsistent sets of models are found • Solve the Bayesian network
[Figure: the example system description, repeated, with a discrepancy observed at one output. Conflict: A = NORMAL, B = NORMAL, C = NORMAL. For the least likely member of the conflict, the most likely alternative is SLOW.]
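The loop above can be sketched roughly as follows. Everything in this toy is hypothetical: the three components, the delays per mode, the mode priors, and the observed path delays. The final step also stands in for "solve the Bayesian network" by simply normalizing prior probabilities over the consistent diagnoses, which is a simplification rather than the method described on the slide.

```python
from itertools import product

MODES = {"normal": 2, "slow": 5}        # hypothetical delay per mode
PRIOR = {"normal": 0.8, "slow": 0.2}    # hypothetical mode priors
COMPONENTS = ["A", "B", "C"]
OBSERVED = {("A", "B"): 7, ("C",): 2}   # hypothetical measured delay per path


def find_conflict(assignment):
    """Return the (component, mode) pairs contradicting an observation, or None."""
    for path, observed_delay in OBSERVED.items():
        predicted = sum(MODES[assignment[c]] for c in path)
        if predicted != observed_delay:
            return frozenset((c, assignment[c]) for c in path)
    return None


def prior(assignment):
    p = 1.0
    for mode in assignment.values():
        p *= PRIOR[mode]
    return p


def diagnose():
    candidates = [dict(zip(COMPONENTS, modes))
                  for modes in product(MODES, repeat=len(COMPONENTS))]
    candidates.sort(key=prior, reverse=True)  # all-normal is tried first
    conflicts, diagnoses = [], []
    for cand in candidates:
        pairs = set(cand.items())
        if any(conflict <= pairs for conflict in conflicts):
            continue  # pruned: superset of a known conflict
        conflict = find_conflict(cand)
        if conflict is None:
            diagnoses.append(cand)
        else:
            conflicts.append(conflict)
    total = sum(prior(d) for d in diagnoses)
    return [(d, prior(d) / total) for d in diagnoses], conflicts


diagnoses, conflicts = diagnose()
for d, p in diagnoses:
    print(d, round(p, 2))
```

In this toy, the all-normal candidate yields the conflict {A = normal, B = normal}, so the candidate that differs only in C is pruned without being checked. The two surviving diagnoses each blame exactly one of A or B.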
Adding Attack Models • An Attack Model specifies the set of attacks that are believed to be possible in the environment • Each resource has a set of vulnerabilities • Vulnerabilities enable attacks on that resource • A successful attack exploits the vulnerability, putting the resource into a non-normal behavioral mode • This is given as a set of conditional probabilities • If the attack succeeded on a resource of this type, then the likelihood that the resource is in mode-x is P • This now forms a three-tiered Bayesian network
[Figure labels, in source order: Has-vulnerability, Enables, Host1, Buffer-Overflow, Overflow-Attack, Resource-type, Normal .5, Causes, Unix-Family, .7 Slow.]
What the diagnostic process tells us • All non-conflicting combinations of models are possible diagnoses • The posterior probabilities tell you how likely each diagnosis is. • This guides recovery processing • Each mode of each resource has a posterior probability • This guides resource selection in the future • The attack models couple the resource models, giving a system-wide view. • This informs the trust model • This couples to long-term monitoring, which looks for complex multi-stage attacks
Computational Vulnerability Analysis • Grounding the attack model in systematic analysis • Ontology of: • System Properties • System Types • System Structure • Control and Dependencies
Generating Attack Models Through Vulnerability Analysis • The problem: Where do the attack model and its links to behavioral modes come from? • So far, from hand crafting • Vulnerability Analysis supplants this with a systematic analysis: • Forming an ontology of how computer systems are structured • Building models of the environment • Network topology: nodes, routers, switches, filters, firewalls • System types: hardware, operating systems • Server and user suites: which servers and users run where • Analyzing how properties depend on resources • Analyzing the vulnerabilities of the resources
Modeling System Structure
[Figure: structural model of a computer system. Node labels: File System, files, resources, Operating System, Access Controller, Hardware, Logon Controller, User Set, Processor, Job Admitter, Memory, Device Controllers, Scheduler, Work Load, Devices, Device Drivers, Scheduler Policy. Relation labels: Part-of, controls, Input-to, Resides-In.]
Modeling the topology • Machine name: sleepy • OS Type: Windows-NT • Server Suite: IIS….. • User Authentication Pool: Dwarfs… • Switch: subnet restrictions. …. • Switch: subnet restrictions. …. • Router: Enclave restrictions. …. • Topology tells you: • who can share (and sniff) which packets • who can affect what types of connections to whom
The AI Lab Topology (partial)
[Figure: partial network topology. Labels, in source order: Doc, Life, Router, Access pool, Netchex Router (Netchex filters out Telnet), Server Switch, 8th-Floor-1, 8th-Floor-2, 7th-Floor-1, Sakharov, Wilson, Lisp Access Pool, Dwarf Access Pool, Server Access Pool, Truman, Dopey, Kenmore, Sleepy, Maytag, Quincy-Adams, General Access Pool, Sneezy, Creepy, Crawler, Jefferson.]
Modeling Dependencies • Start with the desirable properties of systems: • Reliable performance • Privacy of communications • Integrity and/or privacy of data • Analyze which system components impact those properties • Performance - scheduler • Privacy - access-controller • To affect a desirable property, control a component that contributes to the delivery of that property
Controlling components (1) • One way to gain control of a component is to directly exploit a known vulnerability • One way to control a Microsoft IIS web server is to use a buffer overflow attack on it.
[Figure: the IIS Web Server is vulnerable to a Buffer-Overflow Attack; a Buffer-Overflow Attack takes control of the IIS Web Server Process.]
Obtaining Access (1) • One way to gain access to an operation on an object is to find a process with an adequate capability and take control of the process
[Figure: to read a Typical User File, the User Read capability is required; a Typical User Process possesses the User Read capability, so the control action targets the Typical User Process.]
An Example • Affecting reliable performance: • Control the scheduler • The scheduler is a component that impacts performance • By modifying the scheduler’s policy parameters • The policy parameters are inputs to the scheduler • By gaining root access • The policy parameters require root access for writing • By using a buffer overflow attack on the web-server • The web-server process possesses root capabilities • The web-server process is vulnerable to a buffer-overflow attack. • For this attack to impact performance, all the actions must succeed • Each has an a priori probability based on its inherent difficulty and on current evidence suggesting that it occurred.
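Since every action in the scenario must succeed, a first-cut estimate of the scenario's likelihood is the product of the step probabilities. The sketch below assumes the steps are independent, and the probability values are invented for illustration. Neither the independence assumption nor the numbers come from the slides.

```python
from math import prod

# Hypothetical a priori success probabilities for each step of the scenario
steps = [
    ("buffer-overflow attack on the web-server", 0.5),
    ("use the web-server process's root capability", 0.8),
    ("rewrite the scheduler's policy parameters", 0.9),
]


def scenario_probability(steps):
    """Probability that all steps succeed, assuming the steps are independent."""
    return prod(p for _, p in steps)


print(round(scenario_probability(steps), 2))  # 0.36
```

In the framework described here, these numbers would be nodes in a Bayesian network fragment and would be revised as evidence arrives, rather than multiplied once.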
Using Attack Scenarios • This information is captured in an object-oriented knowledge representation, together with a rule-based system that reasons with it. • The inference process develops multi-stage attack scenarios • The scenarios are transformed into Trend Templates for recognition purposes • The scenarios are transformed into Bayesian network fragments for diagnostic purposes • In both cases, the results update the posterior probabilities of the Trust Model
Attack Models and Monitoring Trust Model: Trustworthiness Compromises Attacks
Rational Choice and Utility Functions • How do we use the trust model to select resources? • Jon Doyle & Mike McGeachie
Making Choices and Utility Functions • Utility functions are used to assign a numerical value to a particular way of doing a task. • Utility functions are not a natural way for people to express themselves • What people can state easily is their preferences • In particular, they can compare some combinations of variables to others, everything else being equal • A typical set of preference statements: • I prefer convenience of use to high security if I’m not under attack. • I prefer high security to convenience if I’m under attack • The trick is to convert a set of such “ceteris paribus” preferences into a numerical utility function.
How to build a Utility Function, Step 1 • Convert preferences into an intermediate form using bit-vectors of boolean variables. • Suppose we have f2 & -f4 > f3 (f1 being equal). • This expresses a preference for bit-vectors consistent with f2 & -f4 & -f3 (bit-vector *100) over those satisfying -(f2 & -f4) & f3 = (-f2 & f3) or (f4 & f3). • We extend each disjunct for both values of the missing variable: (-f2 & f3 & f4), (-f2 & f3 & -f4) and (-f2 & f3 & f4), (f2 & f3 & f4), giving three distinct bit-vectors: *111, *011, *010. • Form three rules: *100 > *111, *100 > *011, *100 > *010
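The conversion in Step 1 can be reproduced by brute-force enumeration over the features that the preference mentions. This is only a sketch: the function name `to_rules` and the encoding of each side of the preference as a Python predicate are assumptions made for illustration.

```python
from itertools import product

FEATURES = ["f1", "f2", "f3", "f4"]


def to_rules(lhs, rhs, mentioned):
    """Turn the preference 'lhs > rhs' into bit-vector rules (better, worse).

    lhs and rhs are predicates over a dict mapping feature name to bool.
    Features not in `mentioned` are written '*' (held equal on both sides).
    """
    better, worse = [], []
    for values in product([True, False], repeat=len(mentioned)):
        env = dict(zip(mentioned, values))
        pattern = "".join(
            ("1" if env[f] else "0") if f in env else "*" for f in FEATURES
        )
        if lhs(env) and not rhs(env):
            better.append(pattern)
        elif rhs(env) and not lhs(env):
            worse.append(pattern)
    return [(b, w) for b in better for w in worse]


# f2 & -f4 > f3, with f1 being equal
rules = to_rules(lambda e: e["f2"] and not e["f4"],
                 lambda e: e["f3"],
                 ["f2", "f3", "f4"])
for b, w in rules:
    print(b, ">", w)
# *100 > *111
# *100 > *011
# *100 > *010
```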
Building Utility Functions, Step 2 • Define a directed graph representing the rules: • Nodes in the graph are complete bit-vectors over all variables (grows exponentially!) • Edges connect any two nodes indicated by the intermediate form of the preference rules. • Source nodes are preferred to sink nodes. • A node M is preferred to N if and only if there is a path in the graph from M to N • Four simple utility functions are consistent with the rules: • Minimizing: longest outgoing path length • Descendant: number of descendants • Maximizing: longest incoming path length • Topological: rank in topological sort order
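As a rough sketch of Step 2, the code below expands the three rules from the Step 1 example into edges between complete bit-vectors and computes the minimizing utility (longest outgoing path length). It assumes that both sides of a rule have their '*' in the same positions and that the graph is acyclic. Nodes that appear in no rule are left out of the result. The function names are hypothetical.

```python
from functools import lru_cache
from itertools import product


def expand(pattern):
    """Yield (complete bit-vector, fill) for every way of filling the '*'s."""
    stars = [i for i, ch in enumerate(pattern) if ch == "*"]
    for bits in product("01", repeat=len(stars)):
        chars = list(pattern)
        for i, b in zip(stars, bits):
            chars[i] = b
        yield "".join(chars), bits


def build_graph(rules):
    """Edges run from the preferred node to the less preferred node."""
    edges = {}
    for better, worse in rules:
        worse_by_fill = {bits: node for node, bits in expand(worse)}
        for node, bits in expand(better):
            # Ceteris paribus: the '*' features take the same value on both sides
            edges.setdefault(node, set()).add(worse_by_fill[bits])
    return edges


def minimizing_utility(edges):
    """Utility of a node = length of the longest path leaving it (acyclic graph)."""
    @lru_cache(maxsize=None)
    def longest_out(node):
        return max((1 + longest_out(n) for n in edges.get(node, ())), default=0)

    nodes = set(edges) | {n for targets in edges.values() for n in targets}
    return {n: longest_out(n) for n in nodes}


rules = [("*100", "*111"), ("*100", "*011"), ("*100", "*010")]
utility = minimizing_utility(build_graph(rules))
print(utility["0100"], utility["0111"])  # 1 0
```

Any node with a path to another node gets a strictly larger value, which is what makes this function consistent with the rules.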
Building Utility Functions, Step 3 • Decompose into smaller domains: • Utility Independence: the utility of some features takes on values independent of the utility of other features, i.e. if m1 and m2 use only features from Si, and m1 > m2 when we add some features from Sj to both, then m1 > m2 when we add some other features from Sj to both. • Partition the features into subsets so that each subset is Utility Independent of its complement • Start with singleton sets, then merge together two sets that are not Utility Independent. • The rules tell us when sets are not Utility Independent • For example, if I prefer convenience to security when not under attack but vice-versa if under attack, then the set {convenience, security} is dependent on “under attack”
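The merging procedure can be sketched with a small union-find. The test for utility dependence is not implemented here: the sketch takes as input a list of feature pairs already known to be dependent, which is an assumption. The feature `cost` is invented to show a set that stays separate.

```python
def partition(features, dependent_pairs):
    """Start from singleton sets and merge any two sets linked by a dependence."""
    parent = {f: f for f in features}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path compression
            f = parent[f]
        return f

    for a, b in dependent_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for f in features:
        groups.setdefault(find(f), set()).add(f)
    return sorted(groups.values(), key=lambda s: sorted(s))


features = ["convenience", "security", "under_attack", "cost"]
# From the example: convenience vs. security depends on "under attack"
dependent_pairs = [("convenience", "under_attack"), ("security", "under_attack")]
subsets = partition(features, dependent_pairs)
print(subsets)
```

Here the result has two subsets: {convenience, security, under_attack} and {cost}.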
Building Utility Functions, Step 4 • The total utility function is a linear weighted sum of sub-utility functions, one for each subset in the partition (Keeney & Raiffa, 1976). • Need to generate the sub-utility functions and the scaling parameters. • Restrict the preference rules to each subset and create the preference graph for that subset. • However, when rules are restricted to a subset they may generate cycles even though they are globally consistent. • *01* > *10* and **01 > **10 are consistent, but when restricted to feature 3 we get both 1 > 0 and 0 > 1! • So remove rules from each set to break the cycles
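The cycle in the example is easy to reproduce. The sketch below restricts each rule to a set of feature positions and looks for a pair of restricted rules that directly contradict each other. It only detects such two-element cycles, not longer ones, and the helper names are hypothetical.

```python
def restrict(rule, positions):
    """Keep only the given feature positions (0-based) on both sides of a rule."""
    better, worse = rule
    return ("".join(better[i] for i in positions),
            "".join(worse[i] for i in positions))


def direct_conflicts(restricted_rules):
    """Return the restricted rules whose reverse is also present."""
    pairs = set(restricted_rules)
    return sorted(r for r in pairs if (r[1], r[0]) in pairs)


rules = [("*01*", "*10*"), ("**01", "**10")]
restricted = [restrict(r, [2]) for r in rules]  # feature 3 is position 2
print(restricted)                    # [('1', '0'), ('0', '1')]
print(direct_conflicts(restricted))  # [('0', '1'), ('1', '0')]
```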
Building Utility Functions, Step 5 • Choose the rules to remove by solving a big SAT problem with 2 types of terms: • The first type of term says that a preference rule must be consistent with at least one sub-utility function • The second type says that at least one rule in each conflicting set must be removed • Use the solution to remove a rule from each conflict, then construct the subset’s graph without that rule. • Use the minimizing utility function for this subset. • Set the scaling factors for the sub-utility functions by solving a set of linear inequalities. For each rule with left side L and right side R, add the constraint: Sum(i) ti ui(L) > Sum(i) ti ui(R), summing over the sub-domains i that the rule intersects.
Self-Checking and Certified Computation • How does a computation detect that the wrong thing happened? • How does it prove that the right thing happened? • Konstantine Arkoudas
Plans and Computations • Associated with every computation is a “Plan” • The plan is an abstract description, providing: • Decomposition into components • Data and control flow relationships between these • Pre, Post and Maintain conditions for each component • Dependency links saying how the pre-conditions of each component and the main-goals of the computation are satisfied by the post-conditions of other components. • We want to check that these conditions are in fact satisfied as the computation proceeds. • If they’re not satisfied we should stop and diagnose • If the computation completes, we would like a proof that the right thing happened.
DPL’s: Denotational Proof Languages • A combined calculus of deductions and computations • Deductions are evaluated with respect to an Assumption Base, which is a dynamically maintained piece of state. • The value of a Deduction is a proposition that is true in the Assumption Base. • The Primitive Deductions are the basic rules of a standard natural deduction system. • In addition, there is (assume p D), which temporarily adds p to the assumption base, then evaluates D and returns the result, then restores the assumption base.
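A toy Python rendering of an assumption base may help fix the idea. It is not the DPL itself. Propositions are encoded as strings and tuples, `assume` returns the conditional p -> q as in natural deduction, and every successful deduction records its conclusion in the base. All three choices are assumptions of this sketch.

```python
class ProofError(Exception):
    """Raised when a deduction does not hold in the current assumption base."""


class AssumptionBase:
    def __init__(self, props=()):
        self.props = set(props)

    def claim(self, p):
        if p not in self.props:
            raise ProofError(f"{p!r} is not in the assumption base")
        return p

    def modus_ponens(self, implication, antecedent):
        self.claim(implication)
        self.claim(antecedent)
        if not (isinstance(implication, tuple) and len(implication) == 3
                and implication[0] == "->" and implication[1] == antecedent):
            raise ProofError("modus-ponens does not apply")
        self.props.add(implication[2])  # simplification: record the conclusion
        return implication[2]

    def assume(self, p, deduction):
        saved = set(self.props)
        self.props.add(p)               # temporarily add the hypothesis
        try:
            q = deduction(self)
        finally:
            self.props = saved          # restore the assumption base
        conditional = ("->", p, q)
        self.props.add(conditional)
        return conditional


ab = AssumptionBase({("->", "a", "b"), ("->", "b", "c")})


def a_gives_c(base):
    base.modus_ponens(("->", "a", "b"), "a")
    return base.modus_ponens(("->", "b", "c"), "b")


result = ab.assume("a", a_gives_c)
print(result)  # ('->', 'a', 'c')
```

After `assume` returns, neither `a` nor the intermediate conclusions `b` and `c` remain in the base. Only the conditional survives.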
Duality in DPLs • There is an operator phi, dual to lambda, for building deductive methods. Lambda builds functions, phi builds methods. • Expressions apply functions to arguments; deductions apply methods to arguments (written with a leading !). • Both methods and functions can be higher-order and recursive. • Deductions and evaluations are distinguished on syntactic grounds, so there is no need for a static type system to keep them apart (although the system could have a type system). • DPL’s are to Scheme as HOL is to ML
(define-method uncurry (premise)
  (dmatch premise
    ((-> ?P1 (-> ?P2 ?P3))
     (Assume (and ?P1 ?P2)
       (?P1 by !(left-and (and ?P1 ?P2)))
       ((-> ?P2 ?P3) by !(modus-ponens premise ?P1))
       (?P2 by !(right-and (and ?P1 ?P2)))
       (?P3 by !(modus-ponens (-> ?P2 ?P3) ?P2))))))

(define-method dn* (premise)
  (dmatch premise
    ((not (not ?P))
     (?P by !(dn premise))
     (dn* ?P))
    (t !(claim premise))))
Combining the Plan and the Computation • DPL’s can freely intermix computation and deduction • We can adopt a style in which each computational step is annotated: • Before we begin we assert the pre-conditions of the overall computation (after checking that they hold). • Before each step we add the deduction that the pre-condition is satisfied for the reasons specified in the plan • After each step we add the claim that the post-conditions are satisfied (after checking that they are). • At the end of the computation we add the deduction that the overall post-conditions are satisfied for the reasons stated in the plan • Each proof step is done only for the specific arguments in this particular computation (not a general proof). • But the computation will error if the proof doesn’t hold in this case.
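The annotation style can be approximated in ordinary code by wrapping each step with its pre- and post-condition checks. The sketch below only performs the runtime checks and raises an error when one fails. It does not build the deductions that a DPL would record, and the two-step plan is invented for illustration.

```python
class ConditionError(Exception):
    """Raised when a pre- or post-condition of a plan step does not hold."""


def run_plan(steps, data):
    """Run each step, checking its pre-condition before and post-condition after."""
    for name, pre, action, post in steps:
        if not pre(data):
            raise ConditionError(f"{name}: pre-condition failed")
        data = action(data)
        if not post(data):
            raise ConditionError(f"{name}: post-condition failed")
    return data


# Hypothetical two-step plan: split an utterance into words, then count them
steps = [
    ("parse",
     lambda s: isinstance(s, str) and s.strip() != "",
     lambda s: s.split(),
     lambda words: len(words) > 0),
    ("count",
     lambda words: all(isinstance(w, str) for w in words),
     lambda words: len(words),
     lambda n: n >= 1),
]

result = run_plan(steps, "show blue platoon")
print(result)  # 3
```

Calling `run_plan(steps, "   ")` raises `ConditionError` at the first step, which is the point at which diagnosis would take over.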
Certificates • There is a much simpler logic contained within this calculus • It has just the primitive methods and sequencing • Each application of a method to arguments with respect to the assumption base either returns a proposition that is implied by the assumption base or it errors. • A proof is a sequence of such applications • The proof checker is just the interpreter for this simple calculus. This is a very small TCB. • The result of every deduction in the full calculus can be justified by a proof expressed in this simpler logic • An extended interpreter captures and bundles up the primitive methods used in an extended language deduction. • Alternatively, use a Truth Maintenance System. • This proof in the simpler logic is a certificate that can be delivered with the results of the computation. • The certificate can be mechanically checked by a very simple interpreter.
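To suggest how small such a checker can be, here is a sketch that replays a certificate given as a list of primitive method applications. The encoding of propositions as strings and tuples, the certificate format, and the choice of three primitives are all assumptions of this sketch, not the actual certificate format.

```python
def check_certificate(assumptions, steps):
    """Replay a certificate; return its final conclusion or raise ValueError."""
    base = set(assumptions)
    conclusion = None
    for method, args in steps:
        ok = False
        if method == "claim":
            (p,) = args
            ok, conclusion = p in base, p
        elif method == "modus-ponens":
            imp, ant = args
            ok = (imp in base and ant in base and isinstance(imp, tuple)
                  and len(imp) == 3 and imp[0] == "->" and imp[1] == ant)
            conclusion = imp[2] if ok else None
        elif method == "left-and":
            (conj,) = args
            ok = (conj in base and isinstance(conj, tuple)
                  and len(conj) == 3 and conj[0] == "and")
            conclusion = conj[1] if ok else None
        if not ok:
            raise ValueError(f"step {method} {args!r} does not check")
        base.add(conclusion)  # sequencing: later steps may use this result
    return conclusion


assumptions = {("and", "p", "q"), ("->", "p", "r")}
certificate = [
    ("left-and", (("and", "p", "q"),)),
    ("modus-ponens", (("->", "p", "r"), "p")),
]
print(check_certificate(assumptions, certificate))  # r
```

Dropping the first step makes the second step fail, because `p` is no longer in the base when modus-ponens is applied.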
Comparison to PCC • There is an obvious similarity to Proof Carrying Code • Both deliver a computation bundled with a proof • In Certified Computation: • The Proof is generated on the fly • The Proof applies only to the specific results computed • The Proof discovery is guided by the computation and is free of search. • The computation errors before completing if the proof wouldn’t hold • The formalism doesn’t depend on a type system • The idea was used in the Credible Compiler, in which optimizers justify each optimization using on-the-fly certificates.
Summary • DPL’s are used to check computations, justify them when they work, and signal conditions when things go wrong. • Diagnosis figures out what might be wrong and updates the trust model. • Vulnerability analysis produces attack models for diagnosis and trend templates for long-term monitoring. • Long-term monitoring collates alerts and updates the trust model • Rational choice chooses the best way to do the task (or retry the task) given the current trust model. • Utility functions can be generated automatically from preferences to support rational choice.