Cancer Clusters by Random Allocation

Jeff Witmer, Oberlin College. CAUSE Activity webinar, 22 January 2013.

If the rate of cancer in your small town is three times the national average, should you be alarmed? To be specific, let’s consider pancreatic cancer. Suppose that your county is rectangular and contains 36 towns (or townships) with roughly equal populations. Further suppose that everyone is equally likely to get (pancreatic) cancer. Where will the next 36 cases appear on a map of your county?

Let’s allocate cancer cases to towns/townships randomly. Roll a pair of dice, one blue and one green. Let the number on the blue die indicate longitude and the number on the green die indicate latitude. Repeat for a total of 36 rolls.

Where do cases land?

Results from one example

We expect 1 case per town/township. But we saw a town with 4 cases! It would be bad enough if the cancer rate in my town were double the national average, but a rate 4 times the national average is outrageous. Unless it just happened by chance…
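To put a number on "it just happened by chance", here is a minimal sketch that separates two questions: how likely is a count of 4 or more in one town named in advance, and how likely is it in some town somewhere in the county? It assumes, as in the dice setup, that each of the 36 cases lands in a given town with probability 1/36, so the count in one town is Binomial(36, 1/36). The variable names are illustrative and not from the talk.

```r
# Assumption: 36 cases, 36 equally likely towns, cases placed independently.
cases <- 36
towns <- 36

# Pr(count >= 4) in ONE town chosen before looking at the map
p_one_town <- 1 - pbinom(3, size = cases, prob = 1 / towns)
p_one_town

# Crude upper bound (union bound) on Pr(SOME town has count >= 4);
# the town counts are dependent, so this is only a ceiling, not the exact value.
p_some_town_bound <- min(1, towns * p_one_town)
p_some_town_bound
```

For a single pre-specified town the chance is small (under 2%), but with 36 towns to choose from after the fact, a count of 4 somewhere on the map is far less surprising.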

This time we saw a cell with a count of 4 and we know that it happened by chance. How often do cells get numbers as large as 3 or 4? We can run a quick simulation in R. Note: I do not do this with my students. It just lets me know that there is a very high chance that this activity will “work” – that I’ll see at least one cell with a count of 3 or higher.

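A bit of R code:

    #cancer cluster via two dice rolls simulation
    longitude=c(10,20,30,40,50,60)
    latitude=c(1,2,3,4,5,6)
    #the 36 valid town codes (11, 12, ..., 66)
    towns=as.vector(outer(longitude,latitude,"+"))
    #now place a cancer case:
    location=sample(longitude,size=1)+sample(latitude,size=1)
    cases=36 #roll dice 36 times
    location = NULL
    for (i in 1:cases){
      location[i]=sample(longitude,size=1)+sample(latitude,size=1)
    }
    #count cases per town; tabulate(location) alone would also count
    #unused codes such as 1-10 as empty towns and inflate the zero bar
    counts=as.vector(table(factor(location,levels=towns)))
    hist(counts,breaks=c(-0.5,0.5,1.5,2.5,3.5,4.5,5.5,6.5,7.5,8.5))
    max(counts) #find the maximum cell count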

R code to do this 1000 times:

    #cancer cluster via two dice rolls simulation
    longitude=c(10,20,30,40,50,60)
    latitude=c(1,2,3,4,5,6)
    towns=as.vector(outer(longitude,latitude,"+")) #the 36 valid town codes
    cases=36 #roll dice 36 times
    reps=1000
    maxes = NULL
    location = NULL
    counts = NULL
    for (i in 1:reps){
      for (j in 1:cases){
        location[j]=sample(longitude,size=1)+sample(latitude,size=1)
      }
      #count cases per town (tabulate(location) alone would also count unused codes as empty towns)
      counts=as.vector(table(factor(location,levels=towns)))
      hist(counts,breaks=c(-0.5,0.5,1.5,2.5,3.5,4.5,5.5,6.5,7.5,8.5)) #redrawn on every rep
      maxes[i]=max(counts)
    }
    table(maxes)
    #maxes
    #some results
    #   2   3   4   5   6
    #  18 502 386  81  13
    #thus Pr(2) is about 18/1000 or 2% and Pr(4 or more) is about 48%

Note that changing 36 to 72 makes extreme events less likely. Each town now expects 2 cases, so Pr(4), a maximum of only double the base rate, is about 10%, and Pr(6 or more), a maximum of at least triple the base rate, is only 43% (vs a 98% chance of seeing a town with at least triple the base rate if we use 36 rolls).

    #using cases=72
    #maxes
    #   4   5   6   7   8   9
    # 102 468 307  89  26   8
    #thus Pr(4), double E(Y), is about 10%
    #and Pr(6 or more) is about 43%

Also, you need to extend the list of histogram breaks from a high of 8.5 to 11.5, or write better R code ;-)
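One way to take up the "better R code" remark is a vectorized sketch that works for any number of cases and needs no hand-written histogram breaks. The helper names max_count() and sim_maxes() are hypothetical, the seed is arbitrary, and drawing a town index from 1 to 36 is assumed to be equivalent to rolling the two dice.

```r
# Sketch only: max_count() and sim_maxes() are illustrative helpers, not from the talk.
max_count <- function(cases, towns = 36) {
  # Draw a town index 1..towns for each case, then count cases per town.
  # nbins = towns keeps empty towns in the tally.
  town <- sample.int(towns, size = cases, replace = TRUE)
  max(tabulate(town, nbins = towns))
}

sim_maxes <- function(cases, reps = 1000) {
  replicate(reps, max_count(cases))
}

set.seed(2013)  # arbitrary seed, only for reproducibility
maxes36 <- sim_maxes(36)
maxes72 <- sim_maxes(72)

table(maxes36)
table(maxes72)
mean(maxes36 >= 3)  # share of runs with some town at triple the base rate or more
mean(maxes72 >= 6)  # same question with 72 cases (base rate 2 per town)
```

Because table() sizes itself to whatever maxima occur, barplot(table(maxes72)) would plot the distribution without any break list to maintain.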

We have looked at random allocation of cancer cases, but the same kind of problem arises often in epidemiology. Think of relative risk and see http://www.numberwatch.co.uk/2002%20July.htm (hat tip to Milo Schield; see also http://www.statlit.org/pdf/2005SchieldASA.pdf). If you do a simulation in which disease is distributed randomly, then most of the results will be unimpressive. But suppose results are published only if the relative risk is impressively high. Then the average relative risk among published results is around 1.3, a 30% increase over the base rate, which many citizens would find alarming.

R code

    #Relative Risk via simulation for a rare event
    #See http://www.numberwatch.co.uk/2002%20July.htm
    #100 researchers study 10,000 subjects each of whom has a 0.1% chance of disease
    diseased=rbinom(100,10000,0.001)
    summary(diseased) #we expect 10,000*0.001 = 10 and that's what we get on average
    #   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    #   4.00    8.00   10.00   10.03   12.00   20.00
    hist(diseased,breaks=12,xlim=c(0,22))
    high=subset(diseased,diseased>10) #select only the "larger than expected" results
    length(high)
    #[1] 41
    hist(high,xlim=c(0,22))
    summary(high) #note that one of these is 20, a rate double the expected!
    #   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    #  11.00   11.00   12.00   13.12   14.00   20.00
    #Note also that the mean "high" result is 13 = 1.3 * the mean overall
    #We have a Poisson (large n, small p binomial is a Poisson) so SD=sqrt(mean) = 3.2
    mean(diseased) #10.03
    sd(diseased) #3.28
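The 1.3 figure need not rest on a single simulated batch of 100 studies. A minimal sketch, assuming "published" means a count strictly above the expected 10 (matching the diseased>10 selection in the simulation), computes the conditional mean directly from the binomial probabilities. The variable names are illustrative.

```r
# Assumption: a study is "published" only when its count exceeds the expected 10.
n <- 10000
p <- 0.001
expected <- n * p                    # 10 cases expected per study

x  <- 11:n                           # counts that would be published
pr <- dbinom(x, size = n, prob = p)  # exact binomial probability of each count

prob_published <- sum(pr)            # chance a purely random study clears the bar
mean_published <- sum(x * pr) / prob_published
rel_risk_published <- mean_published / expected

prob_published
rel_risk_published

# Poisson approximation check: the binomial SD is very close to sqrt(mean)
c(binomial_sd = sqrt(n * p * (1 - p)), poisson_sd = sqrt(expected))
```

Under these assumptions roughly four in ten purely random studies clear the bar, and their average relative risk comes out near 1.3, in line with the simulated result.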
