
Reasoning Under Uncertainty




  1. Reasoning Under Uncertainty Artificial Intelligence Chapter 9

  2. Part 1 Probability

  3. Uncertainty • Hard for an agent to handle • Usually handled statistically • Statistical reasoning requires knowledge of probability

  4. Uncertainty We want to build reasoning systems that deal with uncertainty by using the laws of probability theory to enable them to reach rational decisions even when there is not enough information to know for certain what the right decision should be.

  5. Uncertainty We can’t usually handle uncertainty by means of predicate logic: • Computational complexity: Listing all the antecedents and consequences in the agent's environment is too big a job • Lack of domain knowledge: We can’t list every possibility because we do not know enough about the domain • Pragmatic factors: Use of predicate logic may be impractical for other reasons

  6. Probability • Probability deals with uncertainty by assigning a numerical value to something • Range of values: real number between 0 and 1 • Sometimes thought of as a bet - what are the odds? • Typically concerned with: • Evidence, or probability of observing some particular piece of evidence • Hypothesis, or probability that some hypothesis is true

  7. Probability Probability is just a number indicating our belief in a statement. It is often taken to be our degree of willingness to bet on a particular outcome. It varies between 0 (no belief) and 1 (absolute confidence).

  8. Axioms of probability • For any events A, B • 0 ≤ P(A) ≤ 1 • P(Ω) = 1 and P(∅) = 0 • P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
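To make the axioms concrete, here is a minimal Python sketch that checks them on a small sample space of our own choosing (a fair six-sided die; the events A and B are illustrative, not from the slides):

    # Check the axioms on a fair six-sided die (illustrative example).
    from fractions import Fraction

    omega = {1, 2, 3, 4, 5, 6}                 # sample space, outcomes equally likely

    def p(event):
        # Probability of an event (a subset of omega) under the uniform measure.
        return Fraction(len(event), len(omega))

    A = {2, 4, 6}                              # "roll is even"
    B = {4, 5, 6}                              # "roll is at least 4"

    assert 0 <= p(A) <= 1                      # 0 <= P(A) <= 1
    assert p(omega) == 1 and p(set()) == 0     # P(Omega) = 1, P(empty) = 0
    assert p(A | B) == p(A) + p(B) - p(A & B)  # P(A or B) = P(A) + P(B) - P(A and B)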

  9. Prior Probability P(H) is the a priori probability that a specified hypothesis is true. This is often called the prior probability, or just the prior. This is the unconditional probability, without taking any evidence into consideration.

  10. Prior Probability Example: Assume that public health experts have examined 100,000 people at random, and found that 10 of them have measles. Now we pick a person at random; what is the a priori probability - without any evidence of any kind - that the person has the measles?

  11. Prior Probability P(H) = 0.0001, where H = “has the measles” The a priori probability of someone having the measles is 0.0001

  12. Prior Probability Note that probability only measures our degree of belief in a given proposition. We use probability due to our uncertainty, our lack of knowledge. A particular person always either has the measles or doesn’t. If we knew, we wouldn’t use probability.

  13. Prior Probability What is P(¬H)? That is, what is the a priori probability of someone not having the measles? P(¬H) = 1 - P(H) P(¬H) = 1 - 0.0001 = 0.9999
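In code the complement rule is a one-liner; a sketch using the survey numbers from the previous slides:

    p_measles = 10 / 100_000        # prior P(H) from the survey: 0.0001
    p_not_measles = 1 - p_measles   # complement rule: P(not H) = 1 - P(H)
    print(p_not_measles)            # 0.9999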

  14. Posterior Probability P(H|E) is the probability that hypothesis H is true given that evidence E is observed. We sometimes read this as, “the probability of H being true in light of evidence E”. P(H|E) is called the a posteriori probability, or the posterior probability, of H.

  15. Posterior Probability Example: Suppose we see red spots (E) on a person’s body, and we consider the hypothesis that they have the measles (H). Does the presence of the evidence change the probability that they have the measles?

  16. Posterior Probability If the evidence is relevant to the hypothesis, then yes, we would expect it to change the probability that the hypothesis is true. It’s just common sense that if you observe someone who has red spots on the body, the probability of that person having measles is greater than if you just picked someone at random.

  17. Posterior Probability Red spots don’t always indicate measles (they could be due to chicken pox, or something else). Medical surveys show that: P(H|E) = 0.6 The posterior probability that hypothesis H (has measles) is true given that evidence E (red spots) is observed is 0.6.

  18. Posterior Probability When would evidence not increase the probability that a hypothesis is true? • When the probability of the hypothesis is already 1.0. • When the evidence is irrelevant to the hypothesis.

  19. Conditional Probability P(E|H) is the probability that evidence E will be observed assuming that hypothesis H is true. We sometimes read this as, “the probability of observing evidence E given that hypothesis H is true”.

  20. Conditional Probability Example: Assume that a person has the measles; what is the probability that he will exhibit red spots on the body?

  21. Conditional Probability Most people who have the measles eventually get red spots on the body, but not everyone does, and the red spots take a little while to show up after you get the measles. Medical studies show that: P(E|H) = 0.80 The probability that you will observe a specific piece of evidence (red spots), given a particular hypothesis (that you have the measles), is 0.80.

  22. Conditional Probability P(E|¬H) is the probability that evidence E will be observed even if hypothesis H is not true. What is the probability that someone will have red spots if they don’t have the measles? Someone who doesn’t have the measles may still have red spots (due to chicken pox, etc.). The experts tell us that: P(E|¬H) = 0.10

  23. Joint Probability What are all the possible combinations of symptoms? 2 symptoms: fever and red spots 3 symptoms: fever, red spots, malaise 4 symptoms: fever, red spots, malaise, depressed white blood cell count, etc. In general, it is not practical to define all 2^n entries for the n Boolean variables - the full joint probability.
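The blow-up is easy to see numerically; each additional Boolean variable doubles the size of the full joint table:

    # Number of entries in a full joint table over n Boolean variables.
    for n in (2, 3, 4, 10, 20, 30):
        print(n, 2 ** n)            # 30 variables already need over a billion entries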

  24. Joint Probability P(H ∧ E) is the probability that both H is true and E is observed. This is called the joint probability of H and E.

  25. Joint Probability We can express the conditional probability, P(H|E) (probability that hypothesis H is true given that evidence E is observed), in terms of the joint probability P(H ∧ E) (probability that both H is true and E is observed).

  26. Joint Probability P(H|E) = P(H ∧ E) / P(E) This states that the probability that hypothesis H is true if evidence E is observed is equal to the joint probability of H and E, divided by the prior probability of E. Suppose that we observe 10 people who have red spots (E), and, out of those, 6 have measles (H ∧ E). Then: P(H|E) = 0.6
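The count-based example translates directly into code (same made-up counts as the slide: 10 people with red spots, 6 of whom also have measles):

    n_spots = 10                 # people observed with red spots (E)
    n_spots_and_measles = 6      # of those, the ones who also have measles (H and E)

    # P(H|E) = P(H and E) / P(E); with counts taken over the same group,
    # the group size cancels out, leaving a simple ratio.
    p_h_given_e = n_spots_and_measles / n_spots
    print(p_h_given_e)           # 0.6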

  27. Joint Probability P(H|E) = P(H ∧ E) / P(E) and similarly P(E|H) = P(E ∧ H) / P(H) Multiplying by P(H) we get P(E ∧ H) = P(E|H) P(H)

  28. Product rule Product Rule: P(E ∧ H) = P(E|H)P(H) = P(H|E)P(E) The probability of E and H both being true is equal to the probability of E being true if we observe H, times the probability that H is true. It is likewise equal to the probability of H being true if we observe E, times the probability that E is true.
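A quick sanity check of the product rule against a toy joint distribution (the four probabilities below are invented for illustration and simply sum to 1):

    # Toy joint distribution over two Booleans: E (evidence) and H (hypothesis).
    joint = {(True, True): 0.06,     # P(E and H)
             (True, False): 0.04,    # P(E and not H)
             (False, True): 0.02,    # P(not E and H)
             (False, False): 0.88}   # P(not E and not H)

    p_e = joint[(True, True)] + joint[(True, False)]   # marginal P(E) = 0.10
    p_h = joint[(True, True)] + joint[(False, True)]   # marginal P(H) = 0.08

    p_e_given_h = joint[(True, True)] / p_h            # P(E|H) = 0.75
    p_h_given_e = joint[(True, True)] / p_e            # P(H|E) = 0.60

    # Both factorizations recover the same joint probability P(E and H).
    assert abs(p_e_given_h * p_h - joint[(True, True)]) < 1e-12
    assert abs(p_h_given_e * p_e - joint[(True, True)]) < 1e-12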

  29. Deriving Bayes’s Rule P(E ∧ H) = P(E|H) P(H) (from before) Since joint probability is commutative, we have P(H ∧ E) = P(E ∧ H) and therefore P(H ∧ E) = P(E|H) P(H) Substituting that into P(H|E) = P(H ∧ E) / P(E) gives us P(H|E) = P(E|H) P(H) / P(E) (Bayes’s rule)

  30. Conditional Probability Bayes’s Rule: P(H|E) = P(E|H) P(H) / P(E) This says that the probability that the hypothesis H is true, given that we observe some evidence E, is equal to the probability that we will observe evidence E if the hypothesis H is true, times the a priori probability that H is true, all divided by the a priori probability that we will observe evidence E.

  31. Bayes’s Rule P(E) = P(E|H)P(H) + P(E|¬H)P(¬H) The a priori probability of observing some evidence E is equal to the probability that we will observe evidence E if the hypothesis H is true, times the a priori probability that H is true, plus the probability of observing E if the hypothesis is not true, times the a priori probability that H is not true.
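Putting the last two slides together, we can compute a posterior from the quantities the slides actually give us: P(E|H) = 0.80, P(E|¬H) = 0.10, and the prior P(H) = 0.0001. A sketch of the arithmetic; note how strongly the tiny prior holds the posterior down even though the evidence is relevant:

    p_h = 0.0001             # prior P(H): has measles (slide 11)
    p_e_given_h = 0.80       # P(E|H): red spots given measles (slide 21)
    p_e_given_not_h = 0.10   # P(E|not H): red spots without measles (slide 22)

    # Total probability: P(E) = P(E|H) P(H) + P(E|not H) P(not H)
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

    # Bayes's rule: P(H|E) = P(E|H) P(H) / P(E)
    p_h_given_e = p_e_given_h * p_h / p_e
    print(p_e)               # about 0.10007
    print(p_h_given_e)       # about 0.0008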

  32. Conditional Probability Bayes’s Rule can be generalized to cover situations where there are multiple possible hypotheses H_1, …, H_n: P(H_i|E) = P(E|H_i) P(H_i) / Σ_k P(E|H_k) P(H_k)
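A sketch of the generalized form, with invented priors and likelihoods for three competing diagnoses (all numbers are illustrative; the priors sum to 1):

    # Competing hypotheses with invented priors P(H_i) and likelihoods P(E|H_i).
    hypotheses = {
        "measles":     (0.0001, 0.80),
        "chicken pox": (0.0010, 0.70),
        "other":       (0.9989, 0.05),
    }

    # Denominator: P(E) = sum over k of P(E|H_k) P(H_k)
    p_e = sum(prior * likelihood for prior, likelihood in hypotheses.values())

    # Posterior for each hypothesis: P(H_i|E) = P(E|H_i) P(H_i) / P(E)
    for name, (prior, likelihood) in hypotheses.items():
        print(name, likelihood * prior / p_e)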

  33. Why use Bayes’s Rule? Knowledge of causes: P (red spots | measles) is often easier to obtain than diagnostic knowledge: P (measles | red spots). Bayes’s Rule lets us use knowledge of causes to derive diagnostic knowledge.

  34. Conditional independence But often we must compute the full joint probability of many competing hypotheses and many symptoms - computationally expensive. Instead, we make the assumption of conditional independence. X and Y are conditionally independent given Z iff: P(X,Y | Z) = P(X|Z) P(Y|Z)
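The definition can be checked mechanically against a full joint table. In the sketch below the joint is built to factor as P(X|Z) P(Y|Z) P(Z), so the conditional-independence test passes by construction (all numbers invented):

    from itertools import product

    p_z = {True: 0.2, False: 0.8}            # P(Z = z)
    p_x_given_z = {True: 0.9, False: 0.1}    # P(X = true | Z = z)
    p_y_given_z = {True: 0.7, False: 0.3}    # P(Y = true | Z = z)

    def bern(p_true, value):
        # Probability of a Boolean taking 'value' when P(true) = p_true.
        return p_true if value else 1 - p_true

    # Joint built so that X and Y are conditionally independent given Z.
    joint = {(x, y, z): bern(p_x_given_z[z], x) * bern(p_y_given_z[z], y) * p_z[z]
             for x, y, z in product([True, False], repeat=3)}

    # Verify P(X, Y | Z) == P(X | Z) P(Y | Z) for every assignment.
    for (x, y, z), pr in joint.items():
        p_of_z = sum(q for (_, _, c), q in joint.items() if c == z)
        assert abs(pr / p_of_z - bern(p_x_given_z[z], x) * bern(p_y_given_z[z], y)) < 1e-12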

  35. Conditional independence Assuming conditional independence allows us to avoid computing the full joint probability. Assuming conditional independence is more often justifiable than assuming complete independence.

  36. Conditional independence Assume red spots and fever are conditionally independent given measles: P(red spots, fever | measles) = P(red spots | measles) P(fever | measles) Then we can deduce: P(red spots, fever, measles) = P(red spots, fever | measles) P(measles) = P(red spots | measles) P(fever | measles) P(measles)
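Plugging in the running example’s numbers: P(red spots | measles) = 0.80 from slide 21 and the prior from slide 11; P(fever | measles) is not given in the slides, so the 0.90 below is an invented placeholder:

    p_measles = 0.0001               # prior P(measles), from slide 11
    p_spots_given_measles = 0.80     # P(red spots | measles), from slide 21
    p_fever_given_measles = 0.90     # invented placeholder; not given in the slides

    # Factored joint under the conditional-independence assumption:
    # P(spots, fever, measles) = P(spots|measles) P(fever|measles) P(measles)
    p_joint = p_spots_given_measles * p_fever_given_measles * p_measles
    print(p_joint)                   # 7.2e-05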
