
Prediction

Prediction. Greg Francis. PSY 626: Bayesian Statistics for Psychological Science, Fall 2018, Purdue University. Hypothesis tests. Hypothesis tests are commonly used as part of a method to establish scientific “truth”: Is there an effect? What should I believe?


Presentation Transcript


  1. Prediction Greg Francis PSY 626: Bayesian Statistics for Psychological Science Fall 2018 Purdue University

  2. Hypothesis tests • Hypothesis tests are commonly used as part of a method to establish scientific “truth” • Is there an effect? • What should I believe? • An alternative approach is to give up on “truth” and instead focus on “prediction” • The question is not “Is there an effect?” or “What should I believe?” • Rather: “How should I behave?” • Follow the data, but do not follow it blindly • Build a quantitative model, but test the model

  3. Model building • Suppose you have two samples and you are interested in the means • Further suppose that the population properties are: • μ1=0, μ2=0.3 • σ1=σ2=1 • Typically, we would draw random samples from each group and run a t-test to determine if we should treat the means as being different • Treatment • Theory • Future work • Prediction
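The setup on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function names, the seed, and the sample size of 10 are assumptions, and the p-value uses a normal approximation to the t distribution (an assumption made to stay within the standard library; it is somewhat liberal for small samples).

```python
import math
import random
import statistics

def draw_sample(rng, mu, sigma, n):
    """Draw n scores from a normal population with mean mu and SD sigma."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

def pooled_t(x1, x2):
    """Student's two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * statistics.variance(x1)
                  + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x2) - statistics.mean(x1)) / se

def approx_two_sided_p(t):
    """Two-sided p-value from a normal approximation (assumption, not exact t)."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(t)))

rng = random.Random(626)              # arbitrary seed, for reproducibility only
x1 = draw_sample(rng, 0.0, 1.0, 10)   # population 1: mu1 = 0, sigma1 = 1
x2 = draw_sample(rng, 0.3, 1.0, 10)   # population 2: mu2 = 0.3, sigma2 = 1
t = pooled_t(x1, x2)
p = approx_two_sided_p(t)
print(f"t = {t:.3f}, approximate p = {p:.3f}")
```

For an exact p-value one would use the t distribution with n1 + n2 - 2 degrees of freedom instead of the normal approximation.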

  4. Model building • We typically build the following kind of model • The score for subject k is related to the grand mean, to deviations from the grand mean due to being in group 1 or group 2, and to random noise • This model gives mean values for each group
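The model equation itself did not survive in this transcript. One common way to write a model that matches the description above is shown here; the symbols are assumptions, not necessarily the notation used on the slide.

```latex
X_{ik} = \mu + \alpha_i + \epsilon_{ik}, \qquad \epsilon_{ik} \sim N(0, \sigma^2), \qquad i \in \{1, 2\}
```

Under this notation, $\mu$ is the grand mean, $\alpha_i$ is the deviation for group $i$, and the mean value for group $i$ is $\mu_i = \mu + \alpha_i$.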

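  5. Model building • Draw samples (n1=n2) from populations having • μ1=0, μ2=0.3 • σ1=σ2=1 • Construct different models that vary in the estimate of the mean values • Null model: both groups share one estimated mean • Full model: each group has its own estimated mean • Hypothesis testing model: use the null model if the test does not reject H0; use the full model if it does reject H0 (p<.05)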
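A rough sketch of the three models follows. It assumes the null model estimates a single shared mean, the full model estimates a separate mean per group, and the hypothesis testing model switches between them at p < .05. The function names are hypothetical, and the p-value is a normal approximation to the t-test (an assumption made to avoid external libraries).

```python
import math
from statistics import NormalDist, mean, variance

def null_model(x1, x2):
    """One shared mean estimate for both groups."""
    grand = mean(list(x1) + list(x2))
    return grand, grand

def full_model(x1, x2):
    """A separate mean estimate for each group."""
    return mean(x1), mean(x2)

def approx_p_value(x1, x2):
    """Two-sided p-value for a pooled t statistic, via a normal approximation."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    t = (mean(x2) - mean(x1)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return 2 * (1 - NormalDist().cdf(abs(t)))

def hypothesis_testing_model(x1, x2, alpha=0.05):
    """Use the full model when H0 is rejected, otherwise the null model."""
    if approx_p_value(x1, x2) < alpha:
        return full_model(x1, x2)
    return null_model(x1, x2)

# Clearly separated groups: the test rejects, so the full model is used.
print(hypothesis_testing_model([0.0, 1.0, 2.0], [10.0, 11.0, 12.0]))
# Overlapping groups: the test does not reject, so the null model is used.
print(hypothesis_testing_model([0.0, 1.0, 2.0], [0.5, 1.0, 2.5]))
```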

  6. Small samples • 20 experiments • n1=n2=10

  7. Bigger samples • 20 experiments • n1=n2=50

  8. Big samples • 20 experiments • n1=n2=100

  9. Model fit/error • A standard way of judging the quality of a model is by its fit to a data set • One fit measure is root mean squared error • We want a model with low RMSE
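The RMSE formula from this slide is not in the transcript. A plausible form for the two-group case, assuming the error is measured between each model's estimated group means and the true population means (the hat notation is an assumption), is:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{2} \sum_{i=1}^{2} \left( \hat{\mu}_i - \mu_i \right)^2}
```

Here $\hat{\mu}_i$ is the model's estimate for group $i$ and $\mu_i$ is the true mean. A model that estimates both means perfectly has RMSE equal to 0.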

  10. Checking model approaches • Draw samples (n1=n2) from populations having • μ1=0,μ2=0.3 • σ1=σ2=1 • Repeat for 10,000 simulated experiments • Compute RMSE for each model and average across experiments • Vary sample size n1=n2
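The simulation procedure above might look roughly like the following. This is a sketch under stated assumptions: RMSE is taken relative to the true means, the t-test p-value is replaced by a normal approximation, and the default of 2,000 experiments (rather than 10,000) is chosen only to keep the run short. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def mean(xs):
    return sum(xs) / len(xs)

def reject_h0(x1, x2, alpha=0.05):
    """Pooled two-sample test; p-value from a normal approximation (assumption)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = mean(x1), mean(x2)
    ss = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
    se = math.sqrt(ss / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
    return 2 * (1 - NormalDist().cdf(abs((m2 - m1) / se))) < alpha

def rmse(estimates, targets):
    """Root mean squared error between estimated and target group means."""
    return math.sqrt(mean([(e - t) ** 2 for e, t in zip(estimates, targets)]))

def average_rmse(n, mu=(0.0, 0.3), sigma=1.0, n_experiments=2000, seed=1):
    """Average RMSE (relative to the true means) of each model across experiments."""
    rng = random.Random(seed)
    totals = {"null": 0.0, "full": 0.0, "hypothesis_test": 0.0}
    for _ in range(n_experiments):
        x1 = [rng.gauss(mu[0], sigma) for _ in range(n)]
        x2 = [rng.gauss(mu[1], sigma) for _ in range(n)]
        grand = mean(x1 + x2)
        null = (grand, grand)
        full = (mean(x1), mean(x2))
        chosen = full if reject_h0(x1, x2) else null
        totals["null"] += rmse(null, mu)
        totals["full"] += rmse(full, mu)
        totals["hypothesis_test"] += rmse(chosen, mu)
    return {name: total / n_experiments for name, total in totals.items()}

for n in (10, 50, 100):
    print(n, average_rmse(n))
```

Passing `n_experiments=10000` reproduces the number of simulated experiments described on the slide, at the cost of a longer run.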

  11. Comparing models • μ2 - μ1=0.3

  12. Comparing models • For small samples, the null model provides the smallest average RMSE • For large samples, the full model provides the smallest average RMSE

  13. Comparing models • There is always a better model (on average) than what is derived by hypothesis testing • Hypothesis testing (on average) leads to overfitting for some small samples (when it rejects) • Hypothesis testing (on average) leads to underfitting for some large samples (when it does not reject)

  14. Bigger effects • Similar for other effect sizes: μ2 - μ1=0.8

  15. Null effects • Similar for other effect sizes: μ2 - μ1=0

  16. Known unknowns • But these simulations are all theoretical • To compute RMSE we need to know the true means • However, we can do something similar if we do not compute RMSE relative to the true means, but relative to test data

  17. Prediction / validation • Suppose I build my models from one set of data, x1i and x2i, and then test them with another set of data, y1i and y2i • Here, we compute RMSE relative to means from the test data set • You could also compute RMSE relative to individual data points
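Both ways of scoring against test data can be sketched as below. The function names and the example sample sizes are assumptions; the model being scored here is a model with one estimated mean per group, built from the x data and tested on the y data.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def rmse_against_test_means(estimates, y1, y2):
    """RMSE of the model's group estimates relative to the test-set group means."""
    targets = (mean(y1), mean(y2))
    return math.sqrt(mean([(e - t) ** 2 for e, t in zip(estimates, targets)]))

def rmse_against_test_points(estimates, y1, y2):
    """RMSE of the model's group estimates relative to individual test scores."""
    squared = ([(y - estimates[0]) ** 2 for y in y1]
               + [(y - estimates[1]) ** 2 for y in y2])
    return math.sqrt(mean(squared))

rng = random.Random(17)  # arbitrary seed
x1 = [rng.gauss(0.0, 1.0) for _ in range(50)]   # model-building data
x2 = [rng.gauss(0.3, 1.0) for _ in range(50)]
y1 = [rng.gauss(0.0, 1.0) for _ in range(50)]   # test data
y2 = [rng.gauss(0.3, 1.0) for _ in range(50)]

estimates = (mean(x1), mean(x2))  # one mean per group, estimated from x only
print(rmse_against_test_means(estimates, y1, y2))
print(rmse_against_test_points(estimates, y1, y2))
```

The point-wise version also includes the within-group noise, so it is typically larger than the version computed against test means.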

  18. Small effect • When μ2 - μ1=0.3

  19. Bigger effect • When μ2 - μ1=0.8

  20. Null effect • When μ2 - μ1=0

  21. Prediction / validation • Can better see differences by subtracting the full model RMSE from the other models’ RMSE • Panels: μ2 - μ1=0.8, μ2 - μ1=0.3, μ2 - μ1=0 • The smallest number (most negative) indicates the best model (the one with the smallest RMSE)

  22. Prediction / validation • This looks good • At least on average, the RMSE patterns for testing means of new data are similar to those for RMSE for testing against the true means • If we want to deduce which model best predicts values, we can pick the model that minimizes the test RMSE value • Cost: we have to run the experiment twice • Testing does not require equal sample sizes, but you trade off model development against model testing

  23. Cross validation • We partly avoid that cost by using cross-validation to approximate the test RMSE • Divide the data set x1i and x2i into multiple subsets (a common choice is 10 subsets) • Build your model using all but one of the subsets • Compute RMSE for the left-out subset • Repeat so that each subset is left out once • With 10 subsets, that gives 10 build and test “folds” • Compute mean RMSE across the subsets
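The cross-validation steps can be sketched as follows. Several details are assumptions rather than the slides' method: each group is split into folds separately, folds are formed by interleaving indices (reasonable when the scores are in random order; shuffle first otherwise), and the error for a fold is measured against the means of the left-out scores.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def fit_null(x1, x2):
    """One shared mean estimate for both groups."""
    grand = mean(x1 + x2)
    return grand, grand

def fit_full(x1, x2):
    """A separate mean estimate for each group."""
    return mean(x1), mean(x2)

def cross_validated_rmse(x1, x2, fit, k=10):
    """Mean over k folds of the RMSE between fitted and held-out group means.

    Assumes each group has at least k scores, so every fold is non-empty.
    """
    fold_errors = []
    for i in range(k):
        # Build the model on every score whose index is not in fold i.
        train1 = [x for j, x in enumerate(x1) if j % k != i]
        train2 = [x for j, x in enumerate(x2) if j % k != i]
        est1, est2 = fit(train1, train2)
        # Test on the left-out fold.
        held1, held2 = mean(x1[i::k]), mean(x2[i::k])
        fold_errors.append(math.sqrt(((est1 - held1) ** 2 + (est2 - held2) ** 2) / 2))
    return mean(fold_errors)

rng = random.Random(23)  # arbitrary seed
x1 = [rng.gauss(0.0, 1.0) for _ in range(50)]
x2 = [rng.gauss(0.3, 1.0) for _ in range(50)]
for name, fit in (("null", fit_null), ("full", fit_full)):
    print(name, cross_validated_rmse(x1, x2, fit, k=10))
```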

  24. Cross validation • When μ2 - μ1=0.3, 5-fold cross validation

  25. Cross validation • When μ2 - μ1=0.8, 5-fold cross validation

  26. Cross validation • When μ2 - μ1=0.0, 5-fold cross validation

  27. Optional stopping • Actual use: μ2 - μ1=0.3, 10-fold cross validation • Start with n1=n2=10, compute cross-validated RMSE • Add 10 scores and repeat until n1=n2=200

  28. Optional stopping • Actual use: μ2 - μ1=0.8, 10-fold cross validation • Start with n1=n2=10, compute cross-validated RMSE • Add 10 scores and repeat until n1=n2=200

  29. Optional stopping • Actual use: μ2 - μ1=0.0, 10-fold cross validation • Start with n1=n2=10, compute cross-validated RMSE • Add 10 scores and repeat until n1=n2=200
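The optional stopping procedure on these slides can be sketched as a loop that adds data, recomputes the cross-validated RMSE, and records which model is currently best. This is an illustration under assumptions: "add 10 scores" is read as 10 scores per group, only the null and full models are compared, folds are formed by interleaving indices, and all names are hypothetical.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def fit_null(x1, x2):
    grand = mean(x1 + x2)
    return grand, grand

def fit_full(x1, x2):
    return mean(x1), mean(x2)

MODELS = {"null": fit_null, "full": fit_full}

def cv_rmse(x1, x2, fit, k=10):
    """k-fold cross-validated RMSE against held-out group means."""
    errors = []
    for i in range(k):
        train1 = [x for j, x in enumerate(x1) if j % k != i]
        train2 = [x for j, x in enumerate(x2) if j % k != i]
        est1, est2 = fit(train1, train2)
        held1, held2 = mean(x1[i::k]), mean(x2[i::k])
        errors.append(math.sqrt(((est1 - held1) ** 2 + (est2 - held2) ** 2) / 2))
    return mean(errors)

def optional_stopping(mu1=0.0, mu2=0.3, sigma=1.0, step=10, stop=200, k=10, seed=27):
    """Grow both samples by `step` scores at a time and track the best model."""
    rng = random.Random(seed)
    x1, x2, history = [], [], []
    while len(x1) < stop:
        x1 += [rng.gauss(mu1, sigma) for _ in range(step)]
        x2 += [rng.gauss(mu2, sigma) for _ in range(step)]
        scores = {name: cv_rmse(x1, x2, fit, k) for name, fit in MODELS.items()}
        best = min(scores, key=scores.get)  # model with the smallest RMSE so far
        history.append((len(x1), best, scores))
    return history

for n, best, scores in optional_stopping():
    print(n, best, {name: round(value, 3) for name, value in scores.items()})
```

The preferred model can switch back and forth as data accumulate, which is why each choice is best treated as provisional.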

  30. Cross validation • At each step, you should follow the data and use the best model for minimizing RMSE • As the data change, so does your model • You can make an intermediate decision, but still expect it to change • If you have to make a decision with the current data, it makes sense to choose the best model • Note that the best model is not necessarily a good model • You have to judge whether the RMSE is small enough for whatever purpose you have in mind

  31. Prediction / validation • Cross validation and test validation naturally generalize to more complicated models and experimental designs • Interactions, nonlinear models • Details of how to generate validation “folds” can get complicated • It’s mostly a matter of being careful about generating representative folds and not introducing your own bias • No need to use RMSE • Other “cost” functions work in a similar way
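As a small illustration of swapping the cost function, the sketch below selects a model using either RMSE or mean absolute error. Mean absolute error is just one example of an alternative cost and is not named in the slides; the estimates and held-out means are made-up numbers for illustration only.

```python
import math

def rmse(estimates, targets):
    """Root mean squared error."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, targets)) / len(targets))

def mae(estimates, targets):
    """Mean absolute error, an alternative cost function."""
    return sum(abs(e - t) for e, t in zip(estimates, targets)) / len(targets)

def best_model(models, targets, cost=rmse):
    """Return the name of the model whose estimates minimize the chosen cost."""
    return min(models, key=lambda name: cost(models[name], targets))

# Illustrative (made-up) group-mean estimates and held-out test means.
models = {"null": (0.2, 0.2), "full": (0.05, 0.45)}
held_out_means = (0.0, 0.3)

print(best_model(models, held_out_means, cost=rmse))
print(best_model(models, held_out_means, cost=mae))
```

Different cost functions can disagree about which model is best, so the choice should reflect what kind of prediction error matters for the decision at hand.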

  32. Conclusions • Prediction / validation seems like a viable approach • It encourages data accumulation • But it gives up on the idea of establishing “truth” from data • Instead, it focuses on practical uses of data • There are Bayesian methods that have the same goal • They are better if you have useful prior knowledge
