Explore how to calculate conditional probability, its applications, and the concept of independence in probability theory. Learn through examples and real-life scenarios. Dive into sensitivity, specificity & predictive value in medical tests.
Chapter 3. Conditional Probability and Independence. Section 3.1 Conditional Probability. Section 3.2 Independence. Jiaping Wang, Department of Mathematical Science. 01/28/2013, Monday
Outline: Why Conditional Probability; Definition of Conditional Probability; Application of Conditional Probability; Independence; Applications of Independence
Example: A common summary of these data is the “unemployment rate”, which is 5511/125133 = 4.4%. However, this rate tells us nothing about the association between unemployment and education. To study that association we need another probability, called a conditional probability.
Continued: We now compute the unemployment rate conditional on education level. The following table shows that as the education level increases, the unemployment rate decreases.
Example 3.1: Projected percentages of workers in the labor force for 2014 are shown in the table. How do the relative frequencies for the four ethnic groups compare between women and men? If the population size is n, then the relative frequency of white workers among men is 43%*n/(53%*n) = 43/53 = 81%; the other relative frequencies are computed in the same way.
Reduced Sample Space. Another illustration: consider the probability that a family with two children has two girls. The sample space is S={(boy, boy), (boy, girl), (girl, boy), (girl, girl)}, so P(two girls)=1/4. Now, if we are told that the family has at least one girl, what is the probability that the family has two girls? The sample space reduces to Sr={(boy, girl), (girl, boy), (girl, girl)}, so P(two girls | at least one girl)=1/3. Working instead in the original sample space S with A={two girls} and B={at least one girl}, we have A∩B=A, and P(two girls | at least one girl)=P(A|B)=P(A∩B)/P(B)=(1/4)/(3/4)=1/3.
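The two routes above (restricting to the reduced sample space, or dividing P(A∩B) by P(B) in the full space) can be checked by brute-force enumeration. The following is a minimal sketch, assuming equally likely outcomes; the helper name `prob` is introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

# Sample space S: ordered pairs (first child, second child), assumed equally likely.
S = list(product(["boy", "girl"], repeat=2))

def prob(event, space):
    """Probability of an event (a predicate on outcomes) under equally likely outcomes."""
    return Fraction(sum(1 for s in space if event(s)), len(space))

def two_girls(s):
    return s == ("girl", "girl")

def at_least_one_girl(s):
    return "girl" in s

# Route 1: count inside the reduced sample space Sr.
Sr = [s for s in S if at_least_one_girl(s)]
p_reduced = prob(two_girls, Sr)

# Route 2: P(A|B) = P(A ∩ B) / P(B) computed in the full sample space S.
p_joint = prob(lambda s: two_girls(s) and at_least_one_girl(s), S)
p_formula = p_joint / prob(at_least_one_girl, S)

print(p_reduced, p_formula)  # 1/3 1/3
```

Using `Fraction` keeps the arithmetic exact, so the two answers can be compared with `==` rather than a floating-point tolerance.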
Definition 3.1: If A and B are any two events, then the conditional probability of A given B, denoted P(A|B), is P(A|B) = P(A∩B)/P(B), provided that P(B)>0. Notice that P(A∩B) = P(A|B)P(B) or P(A∩B) = P(B|A)P(A). This definition also satisfies the three axioms of probability. (1) A∩B is a subset of B, so P(A∩B)≤P(B), and hence 0≤P(A|B)≤1. (2) P(S|B)=P(S∩B)/P(B)=P(B)/P(B)=1. (3) If A1, A2, … are mutually exclusive, then so are A1∩B, A2∩B, …, and P(∪Ai|B) = P((∪Ai)∩B)/P(B) = P(∪(Ai∩B))/P(B) = ∑P(Ai∩B)/P(B) = ∑P(Ai|B).
Example 3.2: There are four batteries and one is defective. Two are to be selected at random for use on a particular day. Find the probability that the second battery selected is not defective, given that the first was not defective. Solution: Let N1 denote the event that the 1st battery selected is non-defective, and N2 the event that the 2nd battery selected is non-defective. To list the outcomes, number the batteries 1 to 4 and take battery 1 to be the defective one. We are interested in P(N2|N1)=P(N1∩N2)/P(N1). From the figure, P(N1)=3/4 and P(N1∩N2)=6/12=1/2, so P(N2|N1)=(1/2)/(3/4)=(1/2)*(4/3)=2/3.
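The counts 9/12 and 6/12 in this example can be reproduced by listing all ordered draws of two batteries. This is a minimal sketch, assuming the 12 ordered draws are equally likely and (purely as a labelling choice) that battery 1 is the defective one.

```python
from fractions import Fraction
from itertools import permutations

# Batteries are labelled 1..4; battery 1 is taken to be defective (labelling assumption).
DEFECTIVE = 1
draws = list(permutations([1, 2, 3, 4], 2))  # 12 ordered draws without replacement

# N1: first battery drawn is non-defective.
n1 = [d for d in draws if d[0] != DEFECTIVE]
# N1 ∩ N2: both batteries drawn are non-defective.
n1_and_n2 = [d for d in n1 if d[1] != DEFECTIVE]

p_n1 = Fraction(len(n1), len(draws))
p_n1_n2 = Fraction(len(n1_and_n2), len(draws))
p_n2_given_n1 = p_n1_n2 / p_n1

print(p_n1, p_n1_n2, p_n2_given_n1)  # 3/4 1/2 2/3
```

The same answer follows from the reduced-sample-space view: once a good battery is removed, 2 of the remaining 3 are good.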
Screening Test. A screening test indicates the presence or absence of a particular disease. There are two kinds of errors. False positive: the test indicates that a person has the disease when he/she actually does not. False negative: the test indicates that a person does not have the disease when he/she actually does. Sensitivity: the probability that a person selected randomly from among those who have the disease will have a positive test. Specificity: the probability that a person selected randomly from among those who do not have the disease will have a negative test.
Continued: In the table of test result against true diagnosis, + indicates the presence of the disease under study and – indicates its absence. Let a = test +, diagnosis +; b = test +, diagnosis –; c = test –, diagnosis +; d = test –, diagnosis –; and n = a+b+c+d. Then the sensitivity = a/(a+c) and the specificity = d/(b+d). The predictive value is the conditional probability that a randomly selected person actually has the disease, given that he/she tested positive: predictive value = a/(a+b). A good test should have a high predictive value, but this is not always possible, because the predictive value depends on the prevalence: the proportion of the population under study that actually has the disease. Prevalence = (a+c)/n.
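To see how strongly the predictive value depends on prevalence, one can rewrite P(disease | positive test) with Bayes' rule in terms of sensitivity, specificity, and prevalence. The sketch below is illustrative only: the function name and the 95%/95% test are hypothetical and do not come from the slides.

```python
def predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test), obtained from Bayes' rule."""
    true_pos = sensitivity * prevalence                # P(positive and diseased)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(positive and not diseased)
    return true_pos / (true_pos + false_pos)

# Hypothetical test that is 95% sensitive and 95% specific,
# evaluated at several assumed prevalence levels.
for prevalence in (0.001, 0.01, 0.1, 0.5):
    ppv = predictive_value(0.95, 0.95, prevalence)
    print(f"prevalence={prevalence:<6} predictive value={ppv:.3f}")
```

With a fixed sensitivity and specificity, the predictive value rises as the prevalence rises, which is why the same test can perform well in a high-risk clinic and poorly in general population screening.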
Example 3.3: Nucleic acid amplification tests (NAATs) are generally agreed to be better than non-NAATs for diagnosing the presence of Chlamydia trachomatis, the most prevalent sexually transmitted disease. The ligase chain reaction (LCR) test is one such test. In a large study, the sensitivity and specificity of LCR for women were assessed. Following are the results (table: LCR result against tissue culture result).
Example 3.3, continued: Assuming that the tissue culture is exact and that the women in the study constitute a random sample of women in the United States, answer the following questions. (a) What is the prevalence of Chlamydia trachomatis? (b) What is the sensitivity of LCR? (c) What is the specificity of LCR? (d) What is the predictive value of LCR? Solutions: a. prevalence = 152/2132 ≈ 0.071; b. sensitivity = 139/152 ≈ 0.914; c. specificity = 1896/1980 ≈ 0.958; d. predictive value = 139/223 ≈ 0.623.
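The four quantities can be recomputed from a 2×2 table. The table itself is not reproduced here, so the cell counts below are an assumption inferred from the fractions in the solution (139 culture-positive women testing positive, 152 culture-positive in total, 223 LCR-positive in total, 1980 culture-negative in total); treat this as a sketch rather than the study's published table.

```python
from fractions import Fraction

# Cell counts inferred from the solution's fractions (assumption, table not shown).
a = 139    # LCR +, culture +
b = 84     # LCR +, culture -   (223 - 139)
c = 13     # LCR -, culture +   (152 - 139)
d = 1896   # LCR -, culture -   (1980 - 84)
n = a + b + c + d

prevalence = Fraction(a + c, n)        # proportion with the disease
sensitivity = Fraction(a, a + c)       # P(test + | disease)
specificity = Fraction(d, b + d)       # P(test - | no disease)
predictive_value = Fraction(a, a + b)  # P(disease | test +)

for name, value in [
    ("prevalence", prevalence),
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("predictive value", predictive_value),
]:
    # Fractions print in lowest terms, so the decimal is shown alongside.
    print(f"{name}: {value} ≈ {float(value):.3f}")
```

Note that a sensitivity above 0.9 and a specificity above 0.95 still give a predictive value of only about 0.62, because the prevalence in the sample is low.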
Definition 3.2 and Theorem 3.2. Definition 3.2: Two events A and B are said to be independent if P(A∩B)=P(A)P(B). This is equivalent to stating that P(A|B)=P(A) and P(B|A)=P(B), provided the conditional probabilities exist. Theorem 3.2 (Multiplicative Rule): If A and B are any two events, then P(A∩B) = P(A)P(B|A) = P(B)P(A|B). If A and B are independent, then P(A∩B) = P(A)P(B).
Example 3.4: Suppose that a foreman must select one worker from a pool of four available workers (numbered from 1 to 4) for a special job. He selects the worker by mixing the four names and randomly selecting one. Let A denote the event that worker 1 or 2 is selected, let B denote the event that worker 1 or 3 is selected, and let C denote the event that worker 1 is selected. Are A and B independent? Are A and C independent? Solutions: S={1,2,3,4}, A={1,2}, B={1,3}, C={1}. Assigning probability 1/4 to each individual worker gives P(A)=1/2, P(B)=1/2, P(C)=1/4. Since A∩B={1}, P(A∩B)=1/4 and P(A)P(B)=1/2*1/2=1/4=P(A∩B), so A and B are independent. Since A∩C={1}, P(A∩C)=1/4, but P(A)P(C)=1/2*1/4=1/8, so A and C are not independent.
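The independence check in this example is a direct application of Definition 3.2, so it is easy to verify mechanically. A minimal sketch, assuming each worker is selected with probability 1/4; the helper names `P` and `independent` are introduced here for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4}                   # equally likely workers
A, B, C = {1, 2}, {1, 3}, {1}

def P(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event & S), len(S))

def independent(X, Y):
    """Definition 3.2: X and Y are independent iff P(X ∩ Y) = P(X) P(Y)."""
    return P(X & Y) == P(X) * P(Y)

print(independent(A, B))  # True
print(independent(A, C))  # False
```

Here `X & Y` is Python's set intersection, which plays the role of A∩B in the definition.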
Genetics Application. A unit of inheritance is a gene, which transmits chemical information that is expressed as a trait such as color or size. Two genes for each trait, called alleles, are present in each individual. These two allelic genes in any one individual may be alike (homozygous) or different (heterozygous). When two individuals mate, each parent contributes one of his/her genes from each allele. In the simplest model, the probability of each gene from an allele being passed to the offspring is 1/2, and the two parents contribute alleles independently of each other.
Example 3.5: Blood type, the best known of the blood factors, is determined by a single allele. Each person has blood type A, B, AB or O. Type O represents the absence of a factor and is recessive to factors A and B. Thus a person with type A blood may be either homozygous (AA) or heterozygous (AO) for this allele; similarly, a person with type B blood may be either homozygous (BB) or heterozygous (BO). Type AB occurs if a person is given an A factor by one parent and a B factor by the other parent. To have type O blood, an individual must be homozygous O (OO). Suppose a couple is preparing to have a child. One parent has blood type AB, and the other is heterozygous B (BO). What are the possible blood types of the child, and what is the probability of each? Solutions: The four equally likely gene pairs are AB, AO, BB and BO, so there are three possible types, AB, B and A, with probabilities 1/4, 1/2 and 1/4, respectively.
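The blood type distribution can be enumerated under the simple model described above: each parent passes on one of its two genes with probability 1/2, independently of the other parent. This is a sketch under those assumptions; the function `blood_type` is a hypothetical helper encoding the rule that O is recessive to A and B.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def blood_type(g1, g2):
    """Map a pair of genes to a blood type; O is recessive to A and B."""
    factors = {g1, g2} - {"O"}
    return "".join(sorted(factors)) if factors else "O"

parent1 = ("A", "B")   # blood type AB
parent2 = ("B", "O")   # heterozygous B

# Each of the 2 x 2 gene pairs has probability 1/2 * 1/2 = 1/4 by independence.
dist = defaultdict(Fraction)
for g1, g2 in product(parent1, parent2):
    dist[blood_type(g1, g2)] += Fraction(1, 2) * Fraction(1, 2)

for btype, p in sorted(dist.items()):
    print(btype, p)
```

Changing `parent1` and `parent2` to other genotypes, for example `("A", "O")` and `("B", "O")`, gives the distribution for other couples under the same model.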
Relays in an Electrical Circuit. In a simple probability model, we assume that the relays operate independently. There are two basic kinds of connections, series and parallel; other structures are built from combinations of series and parallel connections.
Example 3.6: A section of an electrical circuit has two relays in parallel. The relays operate independently, and when a switch is thrown, each will close properly with a probability of 0.8. If both relays are initially open, find the probability that current will flow from left to right when the switch is thrown. Solutions: Let O denote open and C denote closed. Then there are four possible outcomes: E1={(O,C)}, E2={(O,O)}, E3={(C,O)}, E4={(C,C)}. We know P(C)=0.8, so P(O)=0.2 for each relay. Because the relays operate independently, P(E1)=P(O)P(C)=0.16, P(E2)=P(O)P(O)=0.04, P(E3)=P(C)P(O)=0.16, P(E4)=P(C)P(C)=0.64. In a parallel connection, no current flows only when both relays stay open, so the event of interest is A=E1∪E3∪E4. Since E1, E2, E3 and E4 are mutually exclusive, P(A)=P(E1)+P(E3)+P(E4)=0.16+0.16+0.64=0.96. Equivalently, P(A)=1−P(E2)=1−0.04=0.96.
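The same enumeration extends to the series connection mentioned earlier, where current flows only if both relays close. The sketch below assumes two independent relays that each close with probability 0.8; the series case is added here only for comparison and is not part of Example 3.6.

```python
from itertools import product

P_CLOSE = 0.8
p = {"C": P_CLOSE, "O": 1 - P_CLOSE}   # per-relay probabilities of Closed / Open

outcomes = list(product("CO", repeat=2))  # (relay 1, relay 2)

# Parallel: current flows if at least one relay closes.
p_parallel = sum(p[r1] * p[r2] for r1, r2 in outcomes if "C" in (r1, r2))

# Series (for comparison): current flows only if both relays close.
p_series = sum(p[r1] * p[r2] for r1, r2 in outcomes if (r1, r2) == ("C", "C"))

print(round(p_parallel, 2), round(p_series, 2))  # 0.96 0.64
```

Multiplying the per-relay probabilities inside the sum is exactly where the independence assumption is used.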