
Random Thoughts 2012 (COMP 066)



Presentation Transcript


  1. Random Thoughts 2012 (COMP 066) Jan-Michael Frahm, Jared Heinly

  2. Values to Summarize Data • Mean (EXCEL: AVERAGE(<range>)) • Can informally be seen as the middle of the data • Be careful: the mean does not always tell the whole story • Outliers influence the mean (significantly)

  3. Median • Median (EXCEL: MEDIAN(<range>)) • Order the data from smallest to largest • If the dataset has an odd number of values, the median is the one in the middle. If it has an even number of values, the median is the average of the middle two • Which measure should be used, mean or median? • Reporting both is never a problem • Always ask for the other if given only one
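To illustrate how an outlier pulls the mean but barely moves the median, here is a minimal Python sketch. The salary figures are made up for the example (they are not from the slides), and the standard-library `statistics` module stands in for Excel's AVERAGE and MEDIAN.

```python
from statistics import mean, median

# Hypothetical data: five salaries in $1000s (not from the slides).
salaries = [30, 32, 35, 38, 40]
# The same data with one extreme value (an outlier) added.
with_outlier = salaries + [500]

print(mean(salaries), median(salaries))          # mean and median are both 35
print(mean(with_outlier), median(with_outlier))  # mean 112.5, median 36.5
```

Reporting both numbers makes the effect of the outlier visible: the mean more than triples while the median moves by 1.5.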

  4. Measure of Variability • Standard deviation (EXCEL: STDEV.S(<range>)) • Find the average of the data • Subtract the average from each data value • Square the differences • Divide the sum of the squares by the number of data values minus one (the result is called the variance) • Take the square root of the variance
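The steps on this slide translate almost line for line into code. The sketch below uses a hypothetical dataset and compares the result with `statistics.stdev`, the standard-library counterpart of Excel's STDEV.S; the function name `sample_std` is chosen only for this illustration.

```python
import math
from statistics import stdev

def sample_std(data):
    """Sample standard deviation, following the slide's steps one by one."""
    n = len(data)
    avg = sum(data) / n                  # find the average of the data
    diffs = [x - avg for x in data]      # subtract the average from each value
    squares = [d * d for d in diffs]     # square the differences
    variance = sum(squares) / (n - 1)    # divide the sum of squares by n - 1
    return math.sqrt(variance)           # take the square root of the variance

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data
print(sample_std(data))                  # step-by-step result
print(stdev(data))                       # library result, for comparison
```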

  5. Standard Deviation Properties • Can never be negative • Smallest possible value is 0 • Affected by outliers • Same unit as the original data

  6. Percentile • k-th percentile • Order all numbers in the dataset • Multiply k percent by the number of data points n • If the result is not a whole number, round it up; the k-th percentile is the value at that position • If the result is a whole number, find the value at that position; the k-th percentile is the average of that value and the next one • The median is the 50th percentile • A percentile is not a percent; it is a value that lies a certain percentage of the way through the dataset
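A short sketch of the percentile procedure, assuming the usual reading of these steps: if k percent of n is not a whole number, round up and take the value at that position; if it is a whole number, average the value at that position with the next one. The function name and the restriction to whole-number k between 1 and 99 are assumptions made for this example.

```python
def percentile(data, k):
    """k-th percentile for integer k with 0 < k < 100 (round-up / average rule)."""
    ordered = sorted(data)                 # order all numbers in the dataset
    n = len(ordered)
    # k percent of n, split into whole part and remainder to avoid float issues
    whole, remainder = divmod(k * n, 100)
    if remainder != 0:
        # not a whole number: round up to position whole + 1 (1-based),
        # which is index `whole` in a 0-based list
        return ordered[whole]
    # whole number: average the value at that position and the next one
    return (ordered[whole - 1] + ordered[whole]) / 2

scores = list(range(1, 11))                # hypothetical data: 1, 2, ..., 10
print(percentile(scores, 25))              # position 2.5 rounds up to 3 -> value 3
print(percentile(scores, 50))              # position 5 -> average of 5 and 6 = 5.5
```

With this rule the 50th percentile matches the median for both odd and even dataset sizes.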

  7. Coincidences • Recall the bet that two people in the room have the same birthday • Was it a bad bet to make?
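Whether the bet is a good one depends on the number of people in the room, which the slide does not state. The sketch below computes the chance of at least one shared birthday for a group of size n, assuming 365 equally likely birthdays and independence between people.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for i in range(n):
        # person i must avoid the i birthdays already taken
        p_all_different *= (days - i) / days
    return 1 - p_all_different

for n in (10, 23, 30, 50):
    print(n, round(prob_shared_birthday(n), 3))
```

Under these assumptions the probability first exceeds 50% at 23 people.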

  8. Coincidences • Johnny Carson example from Paulos's book: In order to have a 50% probability of someone in the room having a particular birthday, you need 253 people. • Does this make sense? • Wouldn’t you need only 50% of 366 people, which is 183?
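The 253 figure can be checked directly: each person misses a given date with probability 364/365, so n people all miss it with probability (364/365)^n. The sketch assumes 365 equally likely birthdays (not the 366 used in the slide's question) and independence between people.

```python
def prob_someone_has_birthday(n, days=365):
    """Probability that at least one of n people has one particular birthday."""
    return 1 - ((days - 1) / days) ** n

print(round(prob_someone_has_birthday(183), 3))  # well below one half
print(round(prob_someone_has_birthday(253), 3))  # just over one half
```

Halving the number of days does not work because birthdays in the room can repeat: 183 people do not cover 183 distinct dates.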

  9. Coincidences • 1000 letters, 1000 mailboxes, random assignment • Probability of at least 1 getting to correct destination • Why is it 63%?

  10. Coincidences • 1000 letters, 1000 random addresses (allowing duplicates), 1000 mailboxes, random assignment • Probability of at least 1 getting to correct destination
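One way to read this variant, stated here as an assumption: every letter's address is drawn independently and uniformly from the 1000 mailboxes, so each letter lands in its correct mailbox with probability 1/1000 independently of the others. Under that reading the answer is a one-line calculation.

```python
def prob_at_least_one_hit(n):
    """Chance that at least one of n independent letters hits its mailbox,
    assuming each succeeds with probability 1/n (an assumed model)."""
    return 1 - (1 - 1 / n) ** n

print(round(prob_at_least_one_hit(1000), 4))
```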

  11. Coincidences • 1000 letters, 1000 mailboxes, random assignment • Probability of at least 1 getting to correct destination • Why is it 63%? • Derangement: a permutation in which no element appears in its original position • Complex calculation, but as the number of elements increases, the probability approaches 1 - 1/e ≈ 63%
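The fraction of permutations of n items that are derangements equals the alternating sum 1 - 1/1! + 1/2! - ... ± 1/n!, a standard result. A small sketch using that sum shows how quickly the probability of at least one correct delivery settles near 1 - 1/e; the function name is chosen for this example.

```python
import math

def prob_at_least_one_correct(n):
    """Probability that a random permutation of n items has a fixed point."""
    p_none = 0.0     # running value of D(n)/n!, the derangement fraction
    term = 1.0       # current term (-1)^k / k!, starting at k = 0
    for k in range(n + 1):
        p_none += term
        term *= -1 / (k + 1)
    return 1 - p_none

for n in (2, 5, 10, 1000):
    print(n, round(prob_at_least_one_correct(n), 6))
print(round(1 - 1 / math.e, 6))   # limiting value for comparison
```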

  12. Pigeonhole Principle • If n items are put into m pigeonholes with n > m, at least one pigeonhole must have more than 1 item • Source: http://en.wikipedia.org/wiki/File:TooManyPigeons.jpg

  13. Pigeonhole Principle • 1.54 million people in Philadelphia • At most 500,000 hairs are on a person’s head • What is the minimum number of people that must have the same number of hairs on their head?
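The generalized pigeonhole principle says that some pigeonhole must hold at least ceil(n / m) items. A sketch for the numbers on this slide, treating each possible hair count from 0 to 500,000 as a pigeonhole (counting 0 as a possible value is an assumption of this example):

```python
def guaranteed_group_size(items, holes):
    """Smallest count that some pigeonhole must reach: ceil(items / holes)."""
    return -(-items // holes)   # integer ceiling division

people = 1_540_000
max_hairs = 500_000
hair_counts = max_hairs + 1     # possible values 0, 1, ..., 500,000

print(guaranteed_group_size(people, hair_counts))  # 4
```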

  14. Chance Encounters • Probability that two people from the USA know someone in common • i.e., they are “linked” via one person • Assumption: there are 300 million people in the USA • Assumption: each person knows 1500 other people • Probability that two people from the USA are linked via 2 individuals
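A rough estimate is possible under a strong simplifying assumption that is not stated on the slide: acquaintances are spread uniformly at random over the population and chosen independently. Real social networks are clustered, so the sketch below only indicates orders of magnitude; both function names are hypothetical.

```python
def prob_linked_via_one(population, friends):
    """Chance that two people share at least one acquaintance (random model)."""
    # each of A's acquaintances is also B's acquaintance with prob friends/population
    return 1 - (1 - friends / population) ** friends

def prob_linked_via_two(population, friends):
    """Chance that an acquaintance of A knows an acquaintance of B (random model)."""
    # people within two steps of A, ignoring overlap, capped at the population
    reach = min(friends * friends, population)
    # at least one of B's acquaintances falls inside that set
    return 1 - (1 - reach / population) ** friends

print(prob_linked_via_one(300_000_000, 1500))   # under 1%
print(prob_linked_via_two(300_000_000, 1500))   # very close to 1
```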

  15. Degrees of Separation • Six degrees of separation • There are on average 6 links between any 2 people on earth • Six degrees of Kevin Bacon, Bacon number • Determine the number of links (movies acted in) between a random actor and Kevin Bacon • Assume 2 million actors • Assume each actor has acted with 80 others
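For the Bacon-number estimate, an idealized count assumes every actor's 80 co-actors are all new people, so the number reached multiplies by 80 with each link. Overlap between casts makes real distances longer, so this is a best-case sketch rather than a measurement.

```python
def links_needed(total_actors, coactors):
    """Links until contacts ** links covers everyone, assuming no overlap."""
    reached = 1      # start from a single actor
    links = 0
    while reached < total_actors:
        reached *= coactors   # every reached actor brings `coactors` new ones
        links += 1
    return links

print(links_needed(2_000_000, 80))  # 4
```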

  16. Expected Value • Expected value = Σ (probability of event * value of event) • Ex: pay $1 to play a game, 10% chance of winning $5, 40% chance of winning $1 • Expected Value = -1 + 0.1 * 5 + 0.4 * 1 = $-0.10 • Ex: Dice game • Keep earning points until you roll a 1 • When does your expected value of points stop increasing?
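The first example can be reproduced as a sum of probability times value over all outcomes. The 50% chance of winning nothing is inferred here as the remainder of the stated probabilities; it is not written on the slide.

```python
def expected_value(outcomes, cost=0.0):
    """Sum of probability * value over all outcomes, minus the cost to play."""
    return sum(p * v for p, v in outcomes) - cost

# (probability, winnings); the last entry is the inferred "win nothing" case
game = [(0.10, 5.0), (0.40, 1.0), (0.50, 0.0)]

print(round(expected_value(game, cost=1.0), 2))  # -0.1
```

On average a player loses 10 cents per game.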

  17. Blood Testing • 1% of people have the disease • Need to test 100 samples of blood • Probability that all samples are healthy • What if we pool the blood into 2 sets of 50 each and then test? • What is the expected number of tests? • Can we do better?
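These questions can be explored numerically. The sketch assumes samples are independent, each diseased with probability 1%, and that a positive pool is followed by testing every sample in it individually; that retest scheme is an assumption, since the slide leaves the procedure open.

```python
def prob_all_healthy(n, prevalence):
    """Probability that n independent samples are all healthy."""
    return (1 - prevalence) ** n

def expected_tests(total, pool_size, prevalence):
    """Expected tests when `total` samples are split into equal pools.
    Assumes pool_size divides total and positive pools are retested one by one."""
    pools = total // pool_size
    p_pool_positive = 1 - prob_all_healthy(pool_size, prevalence)
    # one test per pool, plus pool_size individual tests if the pool is positive
    return pools * (1 + p_pool_positive * pool_size)

print(round(prob_all_healthy(100, 0.01), 3))      # all 100 samples healthy
print(round(expected_tests(100, 50, 0.01), 1))    # two pools of 50
for size in (2, 4, 5, 10, 20, 25, 50, 100):       # other pool sizes dividing 100
    print(size, round(expected_tests(100, size, 0.01), 1))
```

Under these assumptions two pools of 50 need about 41.5 tests on average, and pools of 10 bring the expected number below 20.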
