
CS 541: Artificial Intelligence



  1. CS 541: Artificial Intelligence Lecture VI: Modeling Uncertainty and Bayesian Network

  2. Schedule • CA: Yizhou Lin • Email Address: ylin8@stevens.edu

  3. Re-cap of Lecture IV – Part I • Inference in First-order logic: • Reducing first-order inference to propositional inference • Unification • Generalized Modus Ponens • Forward and backward chaining • Logic programming • Resolution: rules to transform to CNF

  4. Modeling Uncertainty Lecture VI—Part I

  5. Outline • Uncertainty • Probability • Syntax and Semantics • Inference • Independence and Bayes’ Rule

  6. Uncertainty

  7. Uncertainty • Let action At = leave for airport t minutes before flight • Will At get me there on time? • Problems: • 1) Partial observability (road state, other drivers' plans, etc.) • 2) Noisy sensors (KCBS traffic reports) • 3) Uncertainty in action outcomes (flat tire, etc.) • 4) Immense complexity of modelling and predicting traffic • Hence a purely logical approach either • 1) Risks falsehood: "A25 will get me there on time" • 2) Leads to conclusions that are too weak for decision making: "A25 will get me there on time if there's no accident on the bridge, and it doesn't rain, and my tires remain intact, etc." • A1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport …

  8. Methods for handling uncertainty • Default or nonmonotonic logic: • Assume my car does not have a flat tire • Assume A25 works unless contradicted by evidence • Issues: What assumptions are reasonable? How to handle contradiction? • Rules with fudge factors: • Issues: Problems with combination, e.g., Sprinkler causes Rain?? • Probability • Given the available evidence, A25 will get me there on time with probability 0.04 • Mahaviracarya (9th century), Cardano (1565): theory of gambling • Fuzzy logic handles degree of truth, NOT uncertainty, e.g., • WetGrass is true to degree 0.2

  9. Probability

  10. Probability • Probabilistic assertions summarize effects of • Laziness: failure to enumerate exceptions, qualifications, etc. • Ignorance: lack of relevant facts, initial conditions, etc. • Subjective or Bayesian probability: • Probabilities relate propositions to one's own state of knowledge • E.g., the probability assigned to A25 given the traffic evidence currently available • These are not claims of a "probabilistic tendency" in the current situation • (but might be learned from past experience of similar situations) • Probabilities of propositions change with new evidence: • E.g., learning that an accident has been reported changes the probability assigned to A25 • (Analogous to logical entailment status, not truth.)

  11. Making decisions under uncertainty • Suppose I believe the following: • Which action to choose? • Depends on my preferences for missing flight vs. airport cuisine, etc. • Utility theory is used to represent and infer preferences • Decision theory = utility theory + probability theory

  12. Probability basics • Begin with a set Ω -- the sample space • E.g., the 6 possible rolls of a die • ω ∈ Ω is a sample point/possible world/atomic event • A probability space or probability model is a sample space with an assignment P(ω) for every ω ∈ Ω s.t. 0 ≤ P(ω) ≤ 1 and Σ_ω P(ω) = 1 • E.g., P(1) = P(2) = … = P(6) = 1/6 • An event A is any subset of Ω, with P(A) = Σ_{ω ∈ A} P(ω) • E.g., P(die roll < 4) = P(1) + P(2) + P(3) = 1/2

  13. Random variables • A random variable is a function from sample points to some range, e.g., the reals or Booleans • E.g., Odd: Ω → {true, false}, with Odd(1) = true, Odd(2) = false, … • P induces a probability distribution for any r.v. X: P(X = xi) = Σ_{ω: X(ω) = xi} P(ω) • E.g., P(Odd = true) = P(1) + P(3) + P(5) = 1/2
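
A quick Python sketch of this die example (the uniform 1/6 probabilities are an assumption for illustration):

    from fractions import Fraction

    omega = [1, 2, 3, 4, 5, 6]                 # sample space: the six rolls of a die
    P = {w: Fraction(1, 6) for w in omega}     # probability assignment; sums to 1
    odd = lambda w: w % 2 == 1                 # random variable Odd: Omega -> Booleans
    print(sum(P[w] for w in omega if odd(w)))  # induced distribution: P(Odd = true) = 1/2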

  14. Propositions • Think of a proposition as the event (set of sample points) where the proposition is true • Given Boolean random variables A and B: • event a = set of sample points where A(ω) = true • event ¬a = set of sample points where A(ω) = false • event a ∧ b = points where A(ω) = true and B(ω) = true • Often in AI applications, the sample points are defined by the values of a set of random variables, • I.e., the sample space is the Cartesian product of the ranges of the variables • With Boolean variables, sample point = propositional logic model • E.g., A = true, B = false, or a ∧ ¬b • Proposition = disjunction of atomic events in which it is true • E.g., (a ∨ b) ≡ (¬a ∧ b) ∨ (a ∧ ¬b) ∨ (a ∧ b), so P(a ∨ b) = P(¬a ∧ b) + P(a ∧ ¬b) + P(a ∧ b)

  15. Why use probability • The definitions imply that certain logically related events must have related probabilities • E.g., P(a ∨ b) = P(a) + P(b) − P(a ∧ b) • de Finetti (1931): an agent who bets according to probabilities that violate these axioms can be forced to bet so as to lose money regardless of outcome.

  16. Syntax for propositions

  17. Syntax for propositions • Propositional or Boolean random variables • E.g., Cavity (do I have a cavity?) • Cavity = true is a proposition, also written cavity • Discrete random variables (finite or infinite) • E.g., Weather is one of ⟨sunny, rain, cloudy, snow⟩ • Weather = rain is a proposition • Values must be exhaustive and mutually exclusive • Continuous random variables (bounded or unbounded) • E.g., Temp = 21.6; also allow, e.g., Temp < 22.0 • Arbitrary Boolean combinations of basic propositions

  18. Prior probability • Prior or unconditional probabilities of propositions • E.g., P(Cavity = true) and P(Weather = sunny) correspond to belief prior to arrival of any (new) evidence • Probability distribution gives values for all possible assignments: • P(Weather) = ⟨P(sunny), P(rain), P(cloudy), P(snow)⟩ (normalized, i.e., sums to 1) • Joint probability distribution for a set of r.v.s gives the probability of every atomic event on those r.v.s (i.e., every sample point) • P(Weather, Cavity) = a matrix of values, one entry per combination of values • Every question about a domain can be answered by the joint distribution because every event is a sum of sample points
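
A small Python sketch of a full joint over Toothache, Cavity, and Catch; the numbers below are illustrative textbook-style values, not read from the slide:

    # Keys are (toothache, cavity, catch); values are atomic-event probabilities.
    joint = {
        (True,  True,  True ): 0.108, (True,  True,  False): 0.012,
        (True,  False, True ): 0.016, (True,  False, False): 0.064,
        (False, True,  True ): 0.072, (False, True,  False): 0.008,
        (False, False, True ): 0.144, (False, False, False): 0.576,
    }
    assert abs(sum(joint.values()) - 1.0) < 1e-9      # normalized: sums to 1
    # Every event is a sum of sample points, e.g. P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
    p_toothache = sum(p for (t, c, k), p in joint.items() if t)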

  19. Probability for continuous variables • Express distribution as a parameterized function of value: • P(X = x) = U[a, b](x) = uniform density between a and b • Here P is a density; it integrates to 1 • I.e., P(X = x) really means the density at x, lim dx→0 P(x ≤ X ≤ x + dx)/dx

  20. Gaussian density • P(x) = 1/√(2πσ²) · e^(−(x − μ)²/(2σ²))

  21. Conditional probability • Conditional or posterior probabilities • E.g., P(cavity | toothache) = 0.8 • i.e., given that toothache is all I know • NOT "if toothache then 80% chance of cavity" • Notation for conditional distributions: • P(Cavity | Toothache) = 2-element vector of 2-element vectors • If we know more, e.g., cavity is also given, then we have P(cavity | toothache, cavity) = 1 • Note: the less specific belief remains valid after more evidence arrives, but is not always useful • New evidence may be irrelevant, allowing simplification, e.g., P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8 • This kind of inference, sanctioned by domain knowledge, is crucial

  22. Conditional probability • Definition of conditional probability: P(a | b) = P(a ∧ b) / P(b) if P(b) ≠ 0 • Product rule gives an alternative formulation: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a) • A general version holds for whole distributions, e.g., P(Weather, Cavity) = P(Weather | Cavity) P(Cavity) • (View as a set of equations, one per value combination, not matrix mult.) • Chain rule is derived by successive application of the product rule: P(X1, …, Xn) = P(X1, …, Xn−1) P(Xn | X1, …, Xn−1) = ∏i P(Xi | X1, …, Xi−1)
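
Continuing the illustrative joint from the sketch under slide 18, the definition and product rule can be checked numerically in Python:

    # P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache), using the joint above.
    p_cav_and_tooth = sum(p for (t, c, k), p in joint.items() if t and c)   # 0.12
    p_cav_given_tooth = p_cav_and_tooth / p_toothache                       # 0.12 / 0.2 = 0.6
    # Product rule check: P(cavity ∧ toothache) = P(cavity | toothache) P(toothache)
    assert abs(p_cav_and_tooth - p_cav_given_tooth * p_toothache) < 1e-12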

  23. Inference

  24. Inference by enumeration • Start with the joint distribution: • For any proposition φ, sum the probabilities of the atomic events where it is true: P(φ) = Σ_{ω: ω ⊨ φ} P(ω)

  25. Inference by enumeration • Start with the joint distribution: • For any proposition φ, sum the probabilities of the atomic events where it is true

  26. Inference by enumeration • Start with the joint distribution: • For any proposition φ, sum the probabilities of the atomic events where it is true

  27. Normalization • Start with the joint distribution: • E.g., P(Cavity | toothache) = α P(Cavity, toothache), where α = 1/P(toothache) is a normalization constant • General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables

  28. Inference by enumeration • Let X be all the variables. Typically, we want • The posterior joint distribution of the query variables Y, given specific values e for the evidence variables E • Let the hidden variables be H = X − Y − E • Then the required summation of joint entries is done by summing out the hidden variables: P(Y | E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h) • The terms in the summation are joint entries because Y, E, and H together exhaust the set of random variables • Obvious problems: • 1) Worst-case time complexity O(d^n), where d is the largest arity • 2) Space complexity O(d^n) to store the joint distribution • 3) How to find the numbers for the O(d^n) entries???
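
A minimal Python sketch of enumeration, reusing the illustrative joint from slide 18's sketch: fix the evidence, sum out the hidden variable Catch, then normalize.

    def enumerate_query(joint, toothache_value):
        """Return P(Cavity | Toothache = toothache_value) by enumeration."""
        unnormalized = {True: 0.0, False: 0.0}
        for (t, c, k), p in joint.items():
            if t == toothache_value:           # keep entries consistent with the evidence
                unnormalized[c] += p           # sum out the hidden variable Catch
        alpha = 1.0 / sum(unnormalized.values())
        return {c: alpha * p for c, p in unnormalized.items()}

    print(enumerate_query(joint, True))        # {True: 0.6, False: 0.4} with the assumed numbers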

  29. Independence

  30. Independence • A and B are independent iff • P(A | B) = P(A) or P(B | A) = P(B) or P(A, B) = P(A) P(B) • E.g., P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather): 32 entries reduced to 12; for n independent biased coins, 2^n entries reduce to n • Absolute independence is powerful but rare • Dentistry is a large field with hundreds of variables, none of which are independent. What to do?
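
A short Python illustration of why absolute independence saves space: n independent biased coins need only n numbers, yet still define all 2^n joint entries (the per-coin probabilities below are assumed):

    from itertools import product

    p_heads = [0.3, 0.6, 0.5]                   # one parameter per coin
    def coin_joint(outcome):                    # outcome: tuple of booleans, True = heads
        prob = 1.0
        for p, heads in zip(p_heads, outcome):
            prob *= p if heads else (1 - p)     # the joint factorizes by independence
        return prob
    total = sum(coin_joint(o) for o in product([True, False], repeat=len(p_heads)))
    assert abs(total - 1.0) < 1e-12             # the 2**n reconstructed entries sum to 1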

  31. Conditional independence • P(Toothache, Cavity, Catch) has 2³ − 1 = 7 independent entries • If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache: • (1) P(catch | toothache, cavity) = P(catch | cavity) • The same independence holds if I haven't got a cavity: • (2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity) • Catch is conditionally independent of Toothache given Cavity: P(Catch | Toothache, Cavity) = P(Catch | Cavity) • Equivalent statements: P(Toothache | Catch, Cavity) = P(Toothache | Cavity) and P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)

  32. Conditional independence • Write out the full joint distribution using the chain rule: P(Toothache, Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity) • I.e., 2 + 2 + 1 = 5 independent numbers (equations 1 and 2 remove 2) • In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n • Conditional independence is our most basic and robust form of knowledge about uncertain environments.

  33. Bayes’ Rule

  34. Bayes’ Rule • Product rule P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a) gives Bayes' rule: P(a | b) = P(b | a) P(a) / P(b) • Or in distribution form: P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y) • Useful for assessing diagnostic probability from causal probability: P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect) • E.g., let m be meningitis, s be stiff neck: P(m | s) = P(s | m) P(m) / P(s) • Note: the posterior probability of meningitis is still very small!
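
A worked instance in Python; the slide's meningitis and stiff-neck numbers are not reproduced in this transcript, so the values below are assumptions for illustration:

    p_s_given_m = 0.7          # causal: P(stiff neck | meningitis), assumed
    p_m = 1 / 50000            # prior P(meningitis), assumed
    p_s = 0.01                 # P(stiff neck), assumed
    p_m_given_s = p_s_given_m * p_m / p_s      # Bayes' rule
    print(p_m_given_s)         # 0.0014: the posterior is still very small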

  35. Bayes' Rule and conditional independence • P(Cavity | toothache ∧ catch) = α P(toothache ∧ catch | Cavity) P(Cavity) = α P(toothache | Cavity) P(catch | Cavity) P(Cavity) • This is an example of a naive Bayes model: P(Cause, Effect1, …, Effectn) = P(Cause) ∏i P(Effecti | Cause) • Total number of parameters is linear in n
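
A minimal naive Bayes sketch in Python; the CPT numbers passed in the example call are assumptions, not the slide's:

    def naive_bayes_posterior(prior, likelihoods, observed):
        """prior: {cause: P(cause)}; likelihoods: {cause: {effect: P(effect = true | cause)}};
        observed: {effect: bool}.  Returns the normalized posterior over causes."""
        scores = {}
        for cause, p in prior.items():
            for effect, value in observed.items():
                p_true = likelihoods[cause][effect]
                p *= p_true if value else (1 - p_true)   # multiply in each P(effect_i | cause)
            scores[cause] = p
        alpha = 1.0 / sum(scores.values())               # normalize over cause values
        return {cause: alpha * s for cause, s in scores.items()}

    print(naive_bayes_posterior(
        {True: 0.2, False: 0.8},
        {True: {"toothache": 0.6, "catch": 0.9}, False: {"toothache": 0.1, "catch": 0.2}},
        {"toothache": True, "catch": True}))             # posterior over Cavity = true/false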

  36. Wumpus world • Pij = true iff square [i, j] contains a pit • Bij = true iff square [i, j] is breezy • Include only the breeze variables for the squares observed so far in the probability model

  37. Specifying the probability model • The full joint distribution is P(pit variables, observed breeze variables) • Apply the product rule: P(breeze variables | pit variables) P(pit variables) • Do it this way to get P(Effect | Cause) • First term: 1 if pits are adjacent to breezes, 0 otherwise • Second term: pits are placed randomly, probability 0.2 per square, so a configuration with n pits among N squares has prior 0.2^n × 0.8^(N − n)
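
A one-line Python version of the pit prior; N = 16 assumes the usual 4 x 4 wumpus grid, which this transcript does not state explicitly:

    def pit_prior(n_pits, n_squares=16):
        # pits placed independently with probability 0.2 per square
        return 0.2 ** n_pits * 0.8 ** (n_squares - n_pits)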

  38. Observation and query • We know the following facts: the observed breezes b and the squares known to be pit-free (known) • Query is P(Query | known, b), the probability of a pit in an as-yet-unvisited square • Define Unknown = the pit variables other than the query variable and the Known variables • For inference by enumeration, we have P(Query | known, b) = α Σ_{unknown} P(Query, unknown, known, b) • Grows exponentially with the number of squares!

  39. Using conditional independence • Basic insight: observations are conditionally independent of other hidden squares given neighboring hidden squares • Define Unknown = Fringe ∪ Other, where Fringe is the unknown squares adjacent to the observed squares and Other is the rest • Manipulate the query into a form where we can use this

  40. Using conditional independence

  41. Using conditional independence

  42. Summary • Probability is a rigorous formalism for uncertain knowledge • The joint probability distribution specifies the probability of every atomic event • Queries can be answered by summing over atomic events • For nontrivial domains, we must find a way to reduce the joint size • Independence and conditional independence provide the tools

  43. Bayesian Network Lecture VI—Part II

  44. Outline • Syntax • Semantics • Parameterized distributions

  45. Bayesian networks • A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions • Syntax: • A set of nodes, one per variable • A directed, acyclic graph (link ≈ "directly influences") • A conditional distribution for each node given its parents: P(Xi | Parents(Xi)) • In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values

  46. Examples • Topology of network encodes conditional independence assertions: • Weather is independent of the other variables • Toothache and Catch are conditionally independent given Cavity

  47. Example • I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar? • Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls • Network topology reflects "causal" knowledge: • A burglar can set the alarm off • An earthquake can set the alarm off • The alarm can cause Mary to call • The alarm can cause John to call

  48. Example

  49. Compactness • A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values • Each row requires one number p for P(Xi = true | parent values) • The number for P(Xi = false) is just 1 − p • If each variable has no more than k parents, • the complete network requires O(n · 2^k) numbers • I.e., grows linearly with n, vs. O(2^n) for the full joint distribution • For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2⁵ − 1 = 31)
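
The count can be checked directly in Python from the topology described on slide 47 (each Boolean node with k Boolean parents needs 2^k numbers):

    parents = {"Burglary": [], "Earthquake": [], "Alarm": ["Burglary", "Earthquake"],
               "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}
    n_params = sum(2 ** len(ps) for ps in parents.values())   # 1 + 1 + 4 + 2 + 2 = 10
    full_joint = 2 ** len(parents) - 1                        # 2^5 - 1 = 31 independent numbers
    print(n_params, full_joint)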

  50. Global semantics • Global semantics define the full joint distribution as the product of the local conditional distributions: P(x1, …, xn) = ∏i P(xi | parents(Xi)) • E.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
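
A Python sketch of the global semantics for the burglary network; the CPT entries below are the usual textbook-style values and are an assumption, since the slide's own tables are not reproduced in this transcript:

    cpt = {   # each entry maps a complete assignment e to P(variable = true | parents)
        "Burglary":   lambda e: 0.001,
        "Earthquake": lambda e: 0.002,
        "Alarm":      lambda e: {(True, True): 0.95, (True, False): 0.94,
                                 (False, True): 0.29, (False, False): 0.001}[
                                     (e["Burglary"], e["Earthquake"])],
        "JohnCalls":  lambda e: 0.90 if e["Alarm"] else 0.05,
        "MaryCalls":  lambda e: 0.70 if e["Alarm"] else 0.01,
    }

    def joint_prob(event):
        """P(x1, ..., xn) as the product of local conditionals P(xi | parents(Xi))."""
        p = 1.0
        for var, table in cpt.items():
            p_true = table(event)
            p *= p_true if event[var] else (1 - p_true)
        return p

    # e.g. P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = 0.9 * 0.7 * 0.001 * 0.999 * 0.998 ≈ 0.00063
    print(joint_prob({"Burglary": False, "Earthquake": False, "Alarm": True,
                      "JohnCalls": True, "MaryCalls": True}))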
