
Hypothesis Testing


palma

Presentation Transcript


  1. Hypothesis Testing

  2. Coke vs. Pepsi • Hypothesis: tweets reflect market share (people tweet as much as they drink) • Market share: • 67% vs. 33% • From tweets: • 71% vs. 29% • Did this happen by chance? Or do people tend to talk about Coke more than they drink it?

  3. A simpler hypothesis test • Claim: I can distinguish Coke and Pepsi just by tasting. • How do you verify my claim?

  4. It's like a court judgment • If you want to prove something, you have to assume the opposite, and find evidence that contradicts it. • In a court, you want to prove a defendant guilty. You assume he/she is innocent.

  5. You conducted an experiment… • And have some outcome • 62 out of 100 correct • Assuming I cannot distinguish them and got that score just by random guessing, is the result possible? • Of course it is possible: if I'm lucky, I can get 100 out of 100. But is the result surprising?

  6. How do we define surprising-ness? • Let's play the random guessing game one million times. If it turns out that in only 4 of the 1 million games someone manages to score 62 or more, then we can say you have to be extremely lucky to do that: a 0.0004% chance, to be exact. • And we are 99.9996% sure that you can't get 62 in one game just by luck. • Thus I am actually able to distinguish Coke and Pepsi to some extent.
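The "4 in a million" figure above is a what-if. For a game this small, the chance of scoring 62 or more by pure guessing can also be computed exactly from the binomial distribution, and it comes out at roughly 1%. The following is a minimal sketch, assuming each of the 100 guesses is an independent fair coin flip; the function name `tail_probability` is made up for illustration.

```python
from math import comb

def tail_probability(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct guesses."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumption: 100 tastings, each guessed correctly with probability 0.5.
p_value = tail_probability(100, 62)
print(f"P(62 or more out of 100 by pure guessing) = {p_value:.4f}")
```

The smaller this number, the more surprising the score is under the "just guessing" assumption.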

  7. But we can't play this game that many times… • Or can we? • Open Excel • In cell B1, type =RAND() • Can you make B1 say 0 if the random number is less than 0.5 and 1 otherwise? • You just flipped a coin in Excel!

  8. Random Guessing Game in Excel • Flip the coin 100 times, in the same column • Find out how many heads you had in cell B101 • We've just played the random guessing game one time. • Can you do it 10 times?

  9. Histogram • We want to find out how many times we scored 62 or higher. • It's also interesting to look at how the scores are distributed, i.e. which are more likely • It's called a histogram • Let's create one by hand • Then in Excel
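For readers who want to see the shape of the distribution without Excel, here is a small Python sketch that plays the game 500 times and prints a text histogram. It follows the same coin rule as the Excel exercise (0 if the random number is below 0.5, 1 otherwise); the helper names `play_game` and `text_histogram` and the fixed seed are illustrative choices, not part of the original slides.

```python
import random
from collections import Counter

def play_game(rng, n_flips=100):
    """One random guessing game: number of heads (1s) in n_flips fair flips."""
    return sum(1 for _ in range(n_flips) if rng.random() >= 0.5)

def text_histogram(scores):
    """Return one line per score between the min and max observed, with a '#' bar."""
    counts = Counter(scores)
    lines = []
    for score in range(min(counts), max(counts) + 1):
        lines.append(f"{score:3d} | {'#' * counts[score]}")
    return lines

rng = random.Random(0)  # fixed seed (an assumption) so the run is repeatable
scores = [play_game(rng) for _ in range(500)]
print("\n".join(text_histogram(scores)))
```

Each `#` is one game, so the longest bars should cluster around a score of 50.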

  10. Now do it 50 times! (or more… doesn't have to be exact) • Does the histogram look better? • What about 500 times? Look at the histogram

  11. How probable is a score of 62? • You can calculate it from the histogram • Let's play the game in Python as many times as we want! • Here are the steps: • Flip a coin 100 times, and record the number of heads (I'll show you how to flip coins in Python) • Do it 1,000 times. Record all the scores (numbers of heads) • Find out how many of them are 62 or higher. What's the percentage? • Now calculate this percentage for 2,000 games, 5,000 games, 10,000 games and 50,000 games. What about a score of 57 or higher? 54? 50? • Aha, maybe you want to write a function…
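One possible way to carry out the steps on this slide is sketched below. It uses only Python's standard `random` module; the function names (`play_game`, `simulate_scores`, `fraction_at_least`) and the seed are assumptions made for this example, not the presenter's code.

```python
import random

def play_game(rng, n_flips=100):
    """Flip a fair coin n_flips times and return the number of heads."""
    return sum(1 for _ in range(n_flips) if rng.random() >= 0.5)

def simulate_scores(n_games, rng, n_flips=100):
    """Play the random guessing game n_games times and record every score."""
    return [play_game(rng, n_flips) for _ in range(n_games)]

def fraction_at_least(scores, threshold):
    """Fraction of games whose score is `threshold` or higher."""
    return sum(1 for s in scores if s >= threshold) / len(scores)

rng = random.Random(42)  # fixed seed (an assumption) for repeatable output
for n_games in (1_000, 2_000, 5_000, 10_000, 50_000):
    scores = simulate_scores(n_games, rng)
    row = ", ".join(
        f">={t}: {fraction_at_least(scores, t):.2%}" for t in (62, 57, 54, 50)
    )
    print(f"{n_games:>6} games  {row}")
```

As the number of games grows, the percentages should settle down, which is the reason for repeating the experiment at several sizes.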

  12. Back to Coke vs. Pepsi
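The same simulation idea can be pointed at the tweet question. The sketch below assumes that, if tweets simply mirror market share, each tweet independently mentions Coke with probability 0.67, and it asks how often a sample would show a Coke share of 71% or more. The slides do not say how many tweets were collected, so `n_tweets=1000` is a purely hypothetical sample size, and the function name is made up for illustration.

```python
import random

def share_tail_fraction(n_tweets, observed_share, market_share, n_games, rng):
    """Fraction of simulated samples whose Coke share is at least observed_share,
    assuming each tweet mentions Coke with probability market_share."""
    threshold = observed_share * n_tweets
    hits = 0
    for _ in range(n_games):
        coke = sum(1 for _ in range(n_tweets) if rng.random() < market_share)
        if coke >= threshold:
            hits += 1
    return hits / n_games

rng = random.Random(7)  # fixed seed (an assumption) for repeatable output
# n_tweets=1000 is hypothetical; the real sample size is not given in the slides.
frac = share_tail_fraction(1000, 0.71, 0.67, 2000, rng)
print(f"Simulated samples with a Coke share of 71% or more: {frac:.2%}")
```

The answer depends heavily on the sample size: the same 71% share is unremarkable in a small sample and very surprising in a large one.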
