STAT151: Introduction to Statistical Theory, Term I, 2013-2014. Yang Zhenlin. zlyang@smu.edu.sg, http://www.mysmu.edu/faculty/zlyang/

  1. STAT151: Introduction to Statistical Theory Term I, 2013-2014 Yang Zhenlin zlyang@smu.edu.sg http://www.mysmu.edu/faculty/zlyang/

  2. Course Contents This course serves as an introduction to basic statistical theory for students who intend to pursue a quantitative major at SMU, such as Applied Statistics, Actuarial Science, Economics, Finance, Marketing, or Management. Students are expected to have some basic knowledge of probability and statistics and to be mathematically oriented or, at the very least, interested in mathematics. Major topics include: • Probability and Conditional Probability • Distributions and Conditional Distributions • Point Estimation • Confidence Intervals • Testing Statistical Hypotheses

  3. Chapter 1: Probability In any scientific study of a physical phenomenon, it is desirable to have a mathematical model that makes it possible to describe or predict the observed value of some characteristic of interest. The physical situation may be deterministic, such as the speed of a falling object in a vacuum, or the travel distance of an airplane in a certain time period when the speed is held constant; or it may be nondeterministic, such as the outcome of tossing a coin, or the lifetime of a light bulb. This course concerns the latter situation: to provide a probability model so as to facilitate "statistical inference" for the nondeterministic situation.

  4. Brief History Concepts of probability have been around for thousands of years, but probability theory did not arise as a branch of mathematics until the mid-seventeenth century, when a simple question directed to Blaise Pascal by a nobleman sparked its birth. Chevalier de Méré gambled frequently to increase his wealth. He bet on a roll of a die that at least one 6 would appear during a total of 4 rolls, and was more successful than not with this game of chance. Tired of this approach, he decided to change the game: he bet that he would get at least one double 6 in twenty-four rolls of two dice. Soon he realized that his old approach to the game resulted in more money. He asked his friend Blaise Pascal why his new approach was not as profitable. Pascal worked through the problem and found that the probability of winning using the new approach was only 49.14 percent, compared to 51.77 percent using the old approach.
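Pascal's two figures can be checked directly: the chance of at least one 6 in four rolls is 1 − (5/6)^4, and the chance of at least one double 6 in twenty-four rolls of a pair of dice is 1 − (35/36)^24. A short Python check (an illustration, not part of the original slides):

```python
from fractions import Fraction

# Old game: at least one 6 in 4 rolls of a single die
p_old = 1 - Fraction(5, 6) ** 4
# New game: at least one double 6 in 24 rolls of a pair of dice
p_new = 1 - Fraction(35, 36) ** 24

print(f"old game: {float(p_old):.4f}")  # 0.5177
print(f"new game: {float(p_new):.4f}")  # 0.4914
```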

  5. Brief History This problem proposed by Chevalier de Méré is said to be the start of the famous correspondence between Blaise Pascal and Pierre de Fermat. They continued to exchange their thoughts on mathematical principles and problems through a series of letters. Historians think that the first letters written were associated with the above problem and other problems dealing with probability theory. Therefore, Pascal and Fermat are the mathematicians credited with the founding of probability theory (David, 1962). • The topic of probability is seen in many facets of the modern world. The theory of probability is not just taught in mathematics courses, but can be seen in practical fields, such as insurance, industrial quality control, the study of genetics, quantum mechanics, and the kinetic theory of gases (Simmons, 1992).

  6. Learning Objectives • Sample Space and Events • Basic Set Algebra • Definition of Probability • Conditional Probability • Rules for Calculating Probabilities • Probability Trees • Counting Techniques

  7. Sample Space and Events Some fundamental concepts are important to the learning of probability theory: An experiment refers to the process of obtaining an observed result of some phenomenon. A trial refers to a run of an experiment. An outcome refers to the observed result. The set of all possible outcomes of an experiment is called the sample space, denoted by S. Note: One and only one of the possible outcomes will occur on any given trial of the experiment.

  8. Sample Space and Events Example 1.1 An experiment consists of tossing two coins, and the observed face of each coin is of interest. The set of possible outcomes is represented by the sample space: S = {HH, HT, TH, TT}, where H=Head, and T=Tail. Example 1.2. Suppose in Example 1.1, we are instead interested in the total number of heads obtained from the two coins. An appropriate sample space could then be defined as S = {0, 1, 2}. Thus, a different sample space may be appropriate for the same experiment, depending on the characteristic of interest.
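The two sample spaces in Examples 1.1 and 1.2 can be enumerated mechanically; a small Python sketch (illustrative only, not part of the slides):

```python
from itertools import product

# Example 1.1: record the face of each coin
S1 = [a + b for a, b in product("HT", repeat=2)]
print(S1)   # ['HH', 'HT', 'TH', 'TT']

# Example 1.2: record only the total number of heads
S2 = sorted({outcome.count("H") for outcome in S1})
print(S2)   # [0, 1, 2]
```

The same experiment yields two different sample spaces, depending on which characteristic of the outcome is recorded.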

  9. Sample Space and Events More fundamental concepts: A sample space S is said to be finite if it consists of a finite number of elements, say S = {e1, e2, . . . , eN}, and countably infinite if its outcomes can be put into a one-to-one correspondence with the positive integers 1, 2, 3, . . . If a sample space S is either finite or countably infinite, then it is called a discrete sample space; if the outcomes can assume any value in some interval of real numbers, then the sample space is called a continuous sample space. Note: the sample spaces in Examples 1.1 and 1.2 are discrete. Example 1.3. If a coin is tossed repeatedly until a head occurs, then a natural sample space is S = {H, TH, TTH, TTTH, . . . }.

  10. Sample Space and Events Example 1.4. A light bulb is placed in service and the time of operation (in hours) until it burns out is measured; the sample space then consists of all nonnegative real numbers, i.e., S = {t | 0 ≤ t < ∞}, which is a continuous sample space. If in Example 1.3 one is interested in the number of tosses required to obtain a head, then a possible sample space would be the set of all positive integers, i.e., S = {1, 2, 3, . . . }. Clearly these are countably infinite sample spaces. Event. Any subset A of the sample space S is defined as an event. If the outcome of an experiment is contained in A, then we say event A has occurred in this experiment.

  11. Set Algebra The study of probability models requires familiarity with the basic notions of set theory. A set is a collection of distinct objects. Sets usually are designated by capital letters, A, B, C, . . . , or subscripted letters A1, A2, A3, . . . Individual objects in a set A are called elements. In the context of probability, the sets are called events and the elements are called outcomes. The universal set S is the set of all elements under consideration; in probability applications it is called the sample space. The empty set or null set, denoted by ∅, is the set that contains no element; in probability, it is the empty event.

  12. Set Algebra If all the elements in a set A are also contained in another set B, we say that A is a subset of B, denoted by A ⊂ B. It is always true that A ⊂ S. Some of these sets are depicted using figures called Venn diagrams. [Venn diagrams: event A shown as a shaded area; event B shown as a subset of event A]

  13. Set Algebra The union of two sets A and B, denoted by A ∪ B, is a new set consisting of all elements that are either in A or in B or in both A and B, i.e., A ∪ B = {a | a ∈ A or a ∈ B}. The intersection of A and B, denoted by A ∩ B, is a new set containing all elements that are both in A and in B, i.e., A ∩ B = {a | a ∈ A and a ∈ B}. [Venn diagrams: A ∪ B and A ∩ B shown as shaded areas]

  14. Set Algebra The complement of A, denoted by Ac, consists of all outcomes in S that are not in A. The difference of A and B, denoted by A – B, is defined as a new set that consists of all elements in A but not in B. Clearly, A – B = A ∩ Bc. [Venn diagrams: Ac shown as the shaded area; A – B as the darker shaded area]

  15. Set Algebra The notions of union and intersection can be extended to more than two sets. For example, A ∪ B ∪ C = {a | a ∈ A or a ∈ B or a ∈ C}, and A ∩ B ∩ C = {a | a ∈ A and a ∈ B and a ∈ C}. [Venn diagram: A ∪ B ∪ C covers all shaded areas; A ∩ B ∩ C is the darkest area]

  16. Set Algebra Two sets A and B are said to be disjoint if they have no elements in common. In the probability context they are called mutually exclusive events. [Venn diagrams: A and B overlapping (not mutually exclusive) vs. A and B disjoint (mutually exclusive)] Example 1.5. An experiment consists of tossing two dice. The sample space thus consists of 36 points: {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), . . . , (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}, or in short, S = {(i, j) | i, j = 1, 2, 3, 4, 5, 6}.

  17. Set Algebra Let A be the event that the sum of the dice equals 7, and B be the event that the first die shows a 5. Then A = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, B = {(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6)}. Now, A ∪ B = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (5, 1), (5, 3), (5, 4), (5, 5), (5, 6)}, A ∩ B = {(5, 2)}, and A – B = {(1, 6), (2, 5), (3, 4), (4, 3), (6, 1)}.
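Example 1.5's union, intersection, and difference map directly onto Python's set operators `|`, `&`, and `-`; a quick check (illustrative, not from the slides):

```python
from itertools import product

S = set(product(range(1, 7), repeat=2))        # the 36 outcomes (i, j)
A = {(i, j) for (i, j) in S if i + j == 7}     # sum of the dice equals 7
B = {(i, j) for (i, j) in S if i == 5}         # first die shows a 5

print(len(S), len(A | B), A & B, len(A - B))   # 36 11 {(5, 2)} 5
```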

  18. Set Algebra Important laws of sets or events: A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C (Associative Law); A ∪ B = B ∪ A and A ∩ B = B ∩ A (Commutative Law); A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) (Distributive Law); A ∪ ∅ = A, A ∩ S = A, A ∪ Ac = S, and A ∩ Ac = ∅.

  19. Set Algebra • Other obvious but useful identities include: • (Ac)c = A, • ∅c = S, Sc = ∅, • A ∪ A = A, A ∩ A = A, A ∪ S = S, • A ∩ ∅ = ∅, • A ∪ (A ∩ B) = A, • A ∩ (A ∪ B) = A. • De Morgan's laws: (A ∪ B)c = Ac ∩ Bc and (A ∩ B)c = Ac ∪ Bc. [Venn diagrams: (A ∪ B)c and Ac ∩ Bc shown as the same (green) area]
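De Morgan's laws are easy to confirm on a small example; below, the universal set and the two events are made-up values for illustration only:

```python
S = set(range(10))            # a small, hypothetical universal set
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

def comp(X):
    """Complement of X relative to S."""
    return S - X

assert comp(A | B) == comp(A) & comp(B)    # (A ∪ B)^c = A^c ∩ B^c
assert comp(A & B) == comp(A) | comp(B)    # (A ∩ B)^c = A^c ∪ B^c
print("De Morgan's laws verified on this example")
```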

  20. Set Algebra Mutually Exclusive Events and Partition of Sample Space Events A1, A2, A3, . . . are said to be mutually exclusive if they are pairwise mutually exclusive; that is, Ai ∩ Aj = ∅ whenever i ≠ j. If A1, A2, . . . , Ak are mutually exclusive and their union makes up the sample space S, then A1, A2, . . . , Ak are said to form a partition of the sample space S, i.e., S = A1 ∪ A2 ∪ . . . ∪ Ak. Clearly, for any event B, the events B ∩ A1, B ∩ A2, . . . , B ∩ Ak form a partition of the event B, in the sense that B = (B ∩ A1) ∪ (B ∩ A2) ∪ . . . ∪ (B ∩ Ak).

  21. Set Algebra [Venn diagrams for the case of k = 6: a partition A1, . . . , A6 of S, and an event B cut into the pieces B ∩ A1, . . . , B ∩ A6] This demonstration is useful in introducing a very important law: the Law of Total Probability.

  22. Definition of Probability For a given experiment with a sample space S, the primary objective of probability modeling is to assign to each event A a real number P(A), called the probability of A, that will provide a measure of the likelihood that A will occur when the experiment is performed. The probability P(A) of an event A in a sample space S is defined as a set function with domain being a collection of events, and range a set of real numbers, such that Axiom 1: 0 ≤ P(A) ≤ 1 for every A ⊂ S; Axiom 2: P(S) = 1; Axiom 3: P(A1 ∪ A2 ∪ . . .) = P(A1) + P(A2) + . . . , if A1, A2, . . . are pairwise mutually exclusive.

  23. Interpretation of Probability Classical Interpretation of Probability The oldest way of measuring uncertainties is the classical probability concept, developed originally from games of chance; it applies when all possible outcomes are equally likely to occur: If there are N equally likely possibilities, of which one must occur and n(A) are contained in event A, then the probability of A is given by P(A) = n(A)/N. [Venn diagram: the geometric interpretation, with A as a region inside S]

  24. Interpretation of Probability Frequency Interpretation of Probability A major shortcoming of the classical probability concept is its limited applicability: there are many situations in which the various possibilities cannot be regarded as equally likely. Among the various probability concepts, the most widely held is the frequency interpretation: The probability of an event is the proportion of the time it occurs in the long run when the experiment is repeated.

  25. Interpretation of Probability For example, we say that a coin is fair if repeated tossing results in about 50% heads. Probability as a Measure of Belief • An alternative point of view is to interpret probabilities as personal or subjective evaluations. Such probabilities express the strength of one's belief with regard to the uncertainties involved, and they apply especially when there is little or no direct evidence, such as in risk-taking and betting situations. • For example, in a 5-horse race, you feel that the first horse has a 30% chance of winning, the 2nd horse 25%, the 3rd 20%, the 4th 15%, and the 5th 10%.
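The frequency interpretation can be illustrated by simulation: with many repeated tosses of a fair coin, the proportion of heads settles near 1/2. A minimal sketch (assuming a fair coin, using Python's standard `random` module; not part of the slides):

```python
import random

n = 100_000
# Simulate n tosses of a fair coin; each toss is heads with probability 0.5
heads = sum(random.random() < 0.5 for _ in range(n))
print(f"proportion of heads in {n} tosses: {heads / n:.3f}")
```

The printed proportion will differ slightly from run to run, but it stays close to 0.5, which is the long-run sense in which the coin is "fair".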

  26. Rules for Calculating Probabilities Complement Rule: If A is an event and Ac is its complement, then P(A) = 1 – P(Ac). Note: This result is particularly useful when an event is relatively complicated, but its complement is easier to handle. Example 1.7. An experiment consists of tossing a coin four times. What is the probability of obtaining at least one head? Solution: Here the event of interest is A = 'at least one head.' This event is complicated, but Ac = 'no heads' = {TTTT}, which contains only one of the 2^4 = 16 equally likely outcomes. Thus P(A) = 1 – P(Ac) = 1 – P(TTTT) = 1 – 1/16 = 15/16.
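Example 1.7 can also be checked by brute-force enumeration of the 16 outcomes, rather than via the complement (an illustrative sketch, not from the slides):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=4))              # 16 equally likely outcomes
at_least_one_head = [o for o in outcomes if "H" in o]
p = Fraction(len(at_least_one_head), len(outcomes))
print(p)  # 15/16
```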

  27. Rules for Calculating Probabilities Addition Rule: For any two events A and B, the following is true: P(A ∪ B) = P(A) + P(B) – P(A ∩ B). [Venn diagram: A and B overlapping inside S, split into E1 = A – B (green), E2 = B – A (purple), and E3 = A ∩ B (dark green)] Note that P(A) + P(B) = [P(E1) + P(E3)] + [P(E2) + P(E3)], where P(E3) = P(A ∩ B) is added twice, so in general P(A ∪ B) ≠ P(A) + P(B)! Since E1, E2, E3 are mutually exclusive, P(A ∪ B) = P(E1 ∪ E2 ∪ E3) = P(E1) + P(E2) + P(E3) = [P(E1) + P(E3)] + [P(E2) + P(E3)] – P(E3) = P(A) + P(B) – P(A ∩ B).

  28. Rules for Calculating Probabilities Random Selection: If an object is chosen from a finite collection of distinct objects in such a manner that each object has the same probability of being chosen, then we say that the object was chosen at random. Example 1.8. Suppose one card is drawn at random from a well-shuffled deck of 52 playing cards. Find the probability that the card is either an ace or a heart. Solution: Since the card is selected at random, each card has the same probability, 1/52, of being chosen. Let A = "select an ace" and B = "select a heart". Then P(A or B) = P(A ∪ B) = P(A) + P(B) – P(A ∩ B) = 4/52 + 13/52 – 1/52 = 16/52 = 4/13.
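Example 1.8 can be verified by listing the deck and counting directly (the card encoding below is my own, for illustration):

```python
from fractions import Fraction

ranks = "A23456789TJQK"
suits = "shdc"                                   # spades, hearts, diamonds, clubs
deck = [r + s for r in ranks for s in suits]     # the 52 cards

A = {c for c in deck if c[0] == "A"}             # the 4 aces
B = {c for c in deck if c[1] == "h"}             # the 13 hearts
p = Fraction(len(A | B), len(deck))
print(p)  # 4/13
```

The union A | B has 4 + 13 − 1 = 16 cards, since the ace of hearts would otherwise be counted twice, which is exactly the subtraction in the addition rule.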

  29. Conditional Probability Conditional probability may be the most important concept in probability. This is because: partial information concerning the result of an experiment is often available, and given the known information, one certainly knows better whether an event can occur or not; and the probability of a certain event sometimes can only be calculated through conditioning. The conditional probability of an event A, given that another event, B, has occurred, is defined by P(A|B) = P(A ∩ B)/P(B), provided that P(B) ≠ 0.

  30. Conditional Probability Geometrical interpretation of P(A|B): • the magnitude of P(A) depends on the relative sizes of the event A and the sample space S; • the magnitude of P(A|B) depends on the relative sizes of the event A ∩ B and the event B. [Venn diagram: overlapping events A and B inside S]

  31. Conditional Probability [Venn diagrams illustrating the extreme cases: if A ∩ B = ∅, then P(A|B) = 0; if A ∩ B = B (i.e., B ⊂ A), then P(A|B) = 1; otherwise 0 ≤ P(A|B) ≤ 1]

  32. Conditional Probability "Conditioning on B" can be understood as reducing the sample space from S to B. It is easy to see that conditional probability satisfies the conditions of a probability set function; hence it enjoys all the properties of probability, e.g., (1) P(Ac|B) = 1 – P(A|B); (2) P(A1 ∪ A2|B) = P(A1|B) + P(A2|B) – P(A1 ∩ A2|B). Example 1.9. A box contains 100 microchips: 60 were produced by factory 1, of which 15 are defective, and the rest by factory 2, of which 5 are defective. One microchip is selected at random from the box and tested. What is the probability that the microchip was produced by factory 1 if the test indicates that it is a good one? What if the test indicates that it is a defective one?
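The worked solution to Example 1.9 on the following slides did not survive extraction, but the example can be solved by reducing the sample space: of the 80 good chips, 45 come from factory 1, so P(factory 1 | good) = 45/80 = 9/16; of the 20 defectives, 15 come from factory 1, so P(factory 1 | defective) = 15/20 = 3/4. A Python sketch (the table layout is my own):

```python
from fractions import Fraction

# Counts implied by Example 1.9: factory 1 makes 60 chips (15 defective),
# factory 2 makes 40 chips (5 defective).
counts = {("F1", "good"): 45, ("F1", "defective"): 15,
          ("F2", "good"): 35, ("F2", "defective"): 5}

def p_factory_given(quality, factory):
    """P(factory | quality): restrict the sample space to chips of that quality."""
    given = sum(v for (f, q), v in counts.items() if q == quality)
    return Fraction(counts[(factory, quality)], given)

print(p_factory_given("good", "F1"))       # 9/16
print(p_factory_given("defective", "F1"))  # 3/4
```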

  33. Conditional Probability

  34. Conditional Probability

  35. More Probability Rules Law of Multiplication: For any two events A and B, P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A). Example 1.11. At a fair a vendor has 25 helium balloons on strings: 10 balloons are yellow, 8 are red, and 7 are green. Balloons are sold in random order. What is the probability that the first two balloons sold are both yellow? Solution: Let A = {the first balloon sold is yellow} and B = {the second balloon sold is yellow}. The quantity of interest is P(A ∩ B) = P(A)P(B|A) = (10/25)(9/24) = 3/20.
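Example 1.11 can be double-checked both by the multiplication law and by counting ordered pairs of balloons directly (illustrative only):

```python
from fractions import Fraction

# Multiplication law: P(A ∩ B) = P(A) P(B|A)
p = Fraction(10, 25) * Fraction(9, 24)
print(p)  # 3/20

# Brute-force check: ordered pairs of distinct balloons
yellow, total = 10, 25
pairs = total * (total - 1)      # 25 * 24 equally likely ordered pairs
yy = yellow * (yellow - 1)       # 10 * 9 yellow-yellow pairs
assert Fraction(yy, pairs) == p
```

The second factor 9/24 reflects that after one yellow balloon is sold, 9 yellows remain among 24 balloons, which is exactly the conditional probability P(B|A).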

  36. More Probability Rules Extended Law of Multiplication: For any three events A, B and C, P(A ∩ B ∩ C) = P(A ∩ B)P(C|A ∩ B) = P(A)P(B|A)P(C|A ∩ B); for any four events A, B, C and D, P(A ∩ B ∩ C ∩ D) = P(A)P(B|A)P(C|A ∩ B)P(D|A ∩ B ∩ C); etc. Sampling with replacement occurs when an object is selected and then replaced before the next object is selected; sampling without replacement occurs when an object is selected and not replaced before the next object is selected.

  37. More Probability Rules

  38. Independent Events Two events A and B are called independent events if P(A ∩ B) = P(A)P(B); otherwise, A and B are called dependent events. It follows from the definition of independence that A and B are independent if and only if either of the following holds: P(A|B) = P(A) or P(B|A) = P(B). It follows from the definition of mutual exclusivity that A and B are mutually exclusive events if and only if either of the following holds: P(A|B) = 0 or P(B|A) = 0. Hence, independence and mutual exclusivity are very much different.

  39. Independent Events

  40. Independent Events Hence A and B are independent, and so are the events A and C, and B and C. Such events are called pairwise independent events. Three events A, B and C are said to be independent if they are pairwise independent and P(A ∩ B ∩ C) = P(A)P(B)P(C). Example 1.13 (Cont'd). Clearly, the three events are not totally independent since P(A ∩ B ∩ C) = 2/8 ≠ P(A)P(B)P(C) = 1/8. Pairwise independence does not imply independence; but independence implies pairwise independence.
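Example 1.13's details were lost in extraction, but the numbers quoted (2/8 versus 1/8 on a sample space of 8 outcomes) match a standard construction with three fair coin tosses: let A, B, C be the events that tosses 1 and 2, 1 and 3, and 2 and 3 agree. A Python check of this (assumed) setup:

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=3))      # 8 equally likely outcomes
A = {s for s in S if s[0] == s[1]}     # tosses 1 and 2 agree
B = {s for s in S if s[0] == s[2]}     # tosses 1 and 3 agree
C = {s for s in S if s[1] == s[2]}     # tosses 2 and 3 agree

def P(E):
    return Fraction(len(E), len(S))

# Pairwise independent:
assert P(A & B) == P(A) * P(B)
assert P(A & C) == P(A) * P(C)
assert P(B & C) == P(B) * P(C)
# ...but not independent:
print(P(A & B & C), P(A) * P(B) * P(C))   # 1/4 1/8
```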

  41. More Probability Rules Sometimes it is not possible to calculate P(B) directly, but it may be possible to calculate it part by part and then add up. Law of Total Probability: If A1, A2, … , Ak is a collection of mutually exclusive events such that their union makes up the sample space, then for any event B, we have P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + . . . + P(B|Ak)P(Ak). [Venn diagram: the case of 6 events A1, . . . , A6 forming a partition of the sample space, with B cut into the pieces B ∩ A1, . . . , B ∩ A6]

  42. More Probability Rules

  43. More Probability Rules The next result will help answer the question: Given that B has happened, was it due to A1 or A2 or … or Ak?

  44. More Probability Rules Bayes' Theorem: If the events A1, A2, … , Ak form a partition of the sample space S, then for any event B in S with P(B) > 0, we have P(Aj|B) = P(B|Aj)P(Aj) / [P(B|A1)P(A1) + . . . + P(B|Ak)P(Ak)], for j = 1, . . . , k. In practice, the probabilities P(Ai) are called prior probabilities, and the conditional probabilities P(Ai|B) are called posterior probabilities.
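The worked bowl example on the next slide was lost in extraction, so the sketch below uses made-up numbers: three bowls chosen with equal prior probability, containing different proportions of red chips. It shows the total-probability denominator and the Bayes posterior update in one pass:

```python
from fractions import Fraction

# Hypothetical setup: three bowls, equal priors, with 10%, 30%, and 60% red chips.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
p_red = {1: Fraction(1, 10), 2: Fraction(3, 10), 3: Fraction(6, 10)}

# Law of Total Probability: P(red) = sum_i P(red | bowl i) P(bowl i)
total = sum(prior[i] * p_red[i] for i in prior)

# Bayes' Theorem: P(bowl i | red) = P(red | bowl i) P(bowl i) / P(red)
posterior = {i: prior[i] * p_red[i] / total for i in prior}
print(posterior)   # bowl 3 has the largest posterior probability
```

With these (assumed) numbers the posteriors are 1/10, 3/10, and 3/5, so a red chip most likely came from bowl 3, mirroring the conclusion quoted on the next slide.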

  45. More Probability Rules

  46. More Probability Rules Given that a red chip is obtained, it is most likely that it came from bowl 3. In applying the Law of Total Probability and Bayes’ Rule, one needs to identify two sets of given information: one is a set of unconditional probabilities, and the other is a set of conditional probabilities. The key words such as ‘given’, ‘of’ and ‘among’ indicate that the subsequent number is a conditional probability.

  47. Counting Techniques When an experiment consists of several operations and each operation has a certain number of possible outcomes, the total number of possible outcomes for the experiment can be found from the following principle. Counting Principle: Let E1, E2, ..., Ek be sets with n1, n2, ..., nk elements, respectively. Then there are n1 × n2 × · · · × nk ways in which we can first choose an element of E1, then an element of E2, ..., and finally an element of Ek. The Counting Principle can be used to solve many probability problems when the sample space is finite and the outcomes are all equally likely.

  48. Counting Techniques Example 1.16. Suppose that there are n routes from a town A to another town B, m routes from town B to a third town C, and l routes from town C to a fourth town D. If we decide to go from A to D via B and C, how many different routes are there from A to D? Solution: In this situation, there are k = 3 operations: A to B, B to C, and C to D, with number of elements (different routes) n, m, and l, respectively. Hence the total number of different routes is nml. The above can be argued in a direct way. For each route that we choose from A to B, we have m choices from B to C. Therefore, altogether we have nm choices to go from A to C via B. Furthermore, for each choice from A to C, there are l choices from C to D. Hence we have nml choices in total from A to D.
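Example 1.16 can be confirmed by generating every route explicitly; the concrete route counts below are made up for illustration:

```python
n, m, l = 3, 4, 2    # hypothetical route counts: A->B, B->C, C->D

# Each full route is one choice from each leg, so the Counting Principle
# gives n * m * l routes in total.
routes = [(i, j, k) for i in range(n) for j in range(m) for k in range(l)]
print(len(routes), n * m * l)  # 24 24
```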

  49. Counting Techniques

  50. Counting Techniques Permutation: An ordered arrangement of r objects from a set A containing n objects (0 ≤ r ≤ n) is called an r-element permutation of A, or a permutation of the elements of A taken r at a time. For example, if A = {a, b, c, d}, then ab and ba are two-element permutations of A; bcd, bdc, cbd, cdb, dbc and dcb are three-element permutations of A. Permutation Rule: Let nPr denote the total number of permutations of a set A containing n elements taken r at a time. Then nPr = n(n – 1)(n – 2) ... (n – r + 1) = n!/(n – r)!.
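The permutation rule can be checked against direct enumeration: Python's `itertools.permutations` generates the r-element permutations, and `math.perm` (available in Python 3.8+) computes nPr directly. A quick sketch using the set A = {a, b, c, d} from the example above:

```python
from itertools import permutations
from math import factorial, perm

A = "abcd"                               # n = 4
two_perms = list(permutations(A, 2))     # ordered pairs, e.g. ('a','b') and ('b','a')
print(len(two_perms), perm(4, 2))        # 12 12

# nPr = n! / (n - r)!
assert perm(4, 2) == factorial(4) // factorial(4 - 2)
```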
