
Evidence for Probabilistic Hypotheses: With Applications to Causal Modeling

Evidence for Probabilistic Hypotheses: With Applications to Causal Modeling. Vals, Switzerland, August 7, 2013. Malcolm R. Forster, Department of Philosophy, University of Wisconsin-Madison.


Presentation Transcript


  1. Evidence for Probabilistic Hypotheses: With Applications to Causal Modeling
     Vals, Switzerland, August 7, 2013
     Malcolm R. Forster, Department of Philosophy, University of Wisconsin-Madison

  2. References
     • Forster, Malcolm R. (1984): Probabilistic Causality and the Foundations of Modern Science. Ph.D. thesis, University of Western Ontario.
     • Forster, Malcolm R. (1988): “Sober’s Principle of Common Cause and the Problem of Incomplete Hypotheses.” Philosophy of Science 55: 538-59.
     • Forster, Malcolm R. (2006): “Counterexamples to a Likelihood Theory of Evidence.” Minds and Machines 16: 319-338.
     • Whewell, William (1858): The History of Scientific Ideas, 2 vols. London: John W. Parker.
     • Wright, Sewall (1921): “Correlation and Causation.” Journal of Agricultural Research 20: 557-585.

  3. How to discover causes… TWO THESES
     Thesis (a): Probabilistic independences provide a way to discover causal relations.
     Thesis (b): Probabilistic independences provide the only way to discover causal relations.
     The simplest way to argue against (b) is to show how data can favor X → Y against Y → X.

  4. Back to first principles… Hypothesis testing in general…
     Modus Tollens: Hypothesis H entails observation O; O is false; therefore H is false.
     Probabilistic Modus Tollens: H entails that observation O is highly probable; O is false; therefore H is false.
     THE PROBLEM: In most situations, all rival hypotheses give the total evidence E very low probability. Put O = not-E, run probabilistic modus tollens, and we end up rejecting EVERY hypothesis!
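The problem can be made concrete with a coin-tossing illustration. The sketch below is not from the slides: the sequence length, the head count, and the rival values of the Bernoulli parameter `theta` are all hypothetical choices made only to show the effect.

```python
def sequence_probability(n_heads: int, n_tails: int, theta: float) -> float:
    """Probability of one specific toss sequence under a Bernoulli(theta) model."""
    return theta ** n_heads * (1.0 - theta) ** n_tails

# Hypothetical total evidence E: one particular sequence of 100 tosses
# containing 52 heads and 48 tails.
n_heads, n_tails = 52, 48

# Hypothetical rival hypotheses about theta (illustrative values only).
rival_hypotheses = [0.3, 0.5, 0.52, 0.7]

probabilities = {
    theta: sequence_probability(n_heads, n_tails, theta)
    for theta in rival_hypotheses
}

for theta, prob in probabilities.items():
    # Every rival makes not-E overwhelmingly probable, yet E occurred.
    print(f"theta = {theta}: P(E) = {prob:.3e}")
```

Every rival, including the best-fitting one, gives E a vanishingly small probability, so probabilistic modus tollens applied to O = not-E rejects them all.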

  5. A response to the PROBLEM
     (1) We should not focus exclusively on the total evidence E.
     (2) We should focus on those aspects of the data O that are central to what the hypothesis says.
     Example 1: The agreement of independent measurements of the parameters postulated by the model. E.g., the parameter in the Bernoulli model, or the agreement of independent measurements of the Earth’s mass.
     Example 2: The independencies entailed by d-separation in causal models.
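Example 1 can be sketched for the Bernoulli model as follows. This is an illustration under stated assumptions, not the author's procedure: the sample sizes, the parameter values, and the three-standard-error agreement rule in the hypothetical helper `estimates_agree` are all choices made here for the sake of the example.

```python
import math
import random

def estimate_theta(tosses):
    """Maximum-likelihood estimate of the Bernoulli parameter."""
    return sum(tosses) / len(tosses)

def estimates_agree(sample_a, sample_b, z_threshold=3.0):
    """Crude check (an assumption of this sketch): do the two estimates
    differ by no more than z_threshold pooled standard errors?"""
    est_a, est_b = estimate_theta(sample_a), estimate_theta(sample_b)
    pooled = estimate_theta(sample_a + sample_b)
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / len(sample_a) + 1 / len(sample_b))
    )
    return abs(est_a - est_b) <= z_threshold * std_err

rng = random.Random(0)

# Two independent samples with the same parameter: the model's claim holds.
same = [[int(rng.random() < 0.6) for _ in range(2000)] for _ in range(2)]

# Two samples whose parameter differs: the single-parameter model is wrong.
drift = [[int(rng.random() < p) for _ in range(2000)] for p in (0.3, 0.7)]

print("same parameter, estimates agree:", estimates_agree(*same))
print("drifting parameter, estimates agree:", estimates_agree(*drift))
```

The point of the design is that each subsample yields its own measurement of the same postulated parameter, so the model can fail the test even when it fits the pooled data tolerably.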

  6. A response to the PROBLEM …continued.
     (3) We should look at what is entailed by the models by themselves, without the help of other data.
     Examples 1 and 2 meet this desideratum.
     This also justifies a faithfulness principle: Favor models that entail an independency over models that are merely able to accommodate it (even if the likelihoods go the other way). (I don’t see this as appealing to non-empirical biases, such as simplicity.)

  7. Now apply the agreement of measurements idea to the testing of causal models…
     • What does Forward, X → Y, entail? The independencies entailed by a DAG are part of what a causal model entails. But it often says something more…
     • It says something about the forward probabilities (or densities) p(y|x), and nothing (directly) about p(x) or p(x,y) or p(x|y).
     • X → Y says: If p1(x), then p1(x,y) = p1(x) p(y|x); if p2(x), then p2(x,y) = p2(x) p(y|x); and so on.

  8. The key idea…
     • We can use data generated by p1(x,y) to estimate parameters in p(y|x).
     • We can use data generated by p2(x,y) to estimate the same parameters in p(y|x).
     • The two data clusters provide independent estimates of the parameters. If the estimates agree, then we have an agreement of independent measurements.
     • The hypothesis “stuck its neck out”: it risked falsification, survived the test, and is thereby confirmed.
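The test described in these bullets can be sketched with a small simulation. The setup is an assumption made for illustration: a linear forward mechanism Y = X + U with U ~ N(0, 1), two input distributions centered at -10 and +10, and ordinary least squares as the estimator. The helper names `make_cluster` and `fit_line` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_cluster(x_mean, n=2000):
    """Draw (x, y) from p_i(x) p(y|x), assuming Y = X + U with U ~ N(0, 1)."""
    x = rng.normal(x_mean, 1.0, n)
    y = x + rng.normal(0.0, 1.0, n)
    return x, y

def fit_line(inputs, outputs):
    """Least-squares (slope, intercept) for outputs regressed on inputs."""
    slope, intercept = np.polyfit(inputs, outputs, 1)
    return slope, intercept

# Two clusters sharing p(y|x) but differing in p(x) (assumed means -10, +10).
clusters = [make_cluster(-10.0), make_cluster(10.0)]

# Forward model: estimate the parameters of p(y|x) separately in each cluster.
forward = [fit_line(x, y) for x, y in clusters]
# Backward model: estimate the parameters of p(x|y) separately in each cluster.
backward = [fit_line(y, x) for x, y in clusters]

# Compare the independent measurements of the intercept under each model.
forward_gap = abs(forward[0][1] - forward[1][1])
backward_gap = abs(backward[0][1] - backward[1][1])

print(f"forward intercepts differ by  {forward_gap:.2f}")
print(f"backward intercepts differ by {backward_gap:.2f}")
```

Under these assumptions the forward intercepts from the two clusters should differ only by sampling error, while the backward intercepts should differ by roughly 10, because the backward regression absorbs the mean of the input distribution.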

  9. Prediction versus Accommodation
     • Both X → Y and Y → X are able to accommodate (that is, fit) the total evidence well. So a maximum likelihood comparison is not going to discriminate well.
     • But suppose we fit a model to Cluster 1, and then to Cluster 2, to see whether the independent measurements of the parameters agree.
     [Figure: scatter plot of y against x, both axes running from -15 to 15, showing Cluster 1, generated by p1(x,y), and Cluster 2, generated by p2(x,y).]

  10. The content of X → Y
     • X → Y says: If p1(x), then p1(x,y) = p1(x) p(y|x); if p2(x), then p2(x,y) = p2(x) p(y|x); and so on.
     • X → Y also says: If p1(x), then p1(x|y) = p1(x,y)/p1(y); if p2(x), then p2(x|y) = p2(x,y)/p2(y); and so on.
     • In general, p1(x|y) ≠ p2(x|y). That is, X → Y says that the backwards probabilities vary.
     • If X → Y is right, then Y → X is wrong.
     • It is metaphysically possible for a forward model to say that the forward probabilities depend on the input distribution. But we need to search for uniformities of nature…
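The claim that the backwards probabilities vary follows from the forward factorization by Bayes' theorem. A sketch in the slide's notation, where i indexes the input distributions and p(y|x) is the fixed forward density; the integral form assumes x is continuous:

```latex
p_i(x \mid y) \;=\; \frac{p_i(x, y)}{p_i(y)}
             \;=\; \frac{p_i(x)\, p(y \mid x)}{\int p_i(x')\, p(y \mid x')\, dx'},
\qquad i = 1, 2.
```

The input density p_i(x) appears in both the numerator and the denominator and does not cancel in general, so replacing p1(x) with p2(x) changes the backward density even though p(y|x) stays fixed.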

  11. The Asymmetry of Regression…
     • The data are generated from Y = X + U, where X is N(–10, 1), U is N(0, 1), and U is independent of X.
     • The y on x regression is different from the x on y regression.
     [Figure: plot of y against x for the generated data.]
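The two regression lines can be worked out in closed form for this example. A sketch assuming the stated model and reading N(–10, 1) and N(0, 1) as having variance 1, using the standard formula for conditional means of jointly normal variables:

```latex
\begin{aligned}
E[X] &= E[Y] = -10, \quad \operatorname{Var}(X) = 1, \quad
\operatorname{Var}(Y) = 2, \quad \operatorname{Cov}(X, Y) = 1, \\
E[Y \mid X = x] &= E[Y] + \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}\,(x - E[X]) = x, \\
E[X \mid Y = y] &= E[X] + \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(Y)}\,(y - E[Y]) = \tfrac{1}{2}\,y - 5.
\end{aligned}
```

Drawn in the (x, y) plane, the y on x regression is the line y = x, while the x on y regression is the line y = 2x + 10. The two lines cross at the center of the cluster, (–10, –10), but have different slopes.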

  12. Forward Model: X → Y
     X → Y: [equation not preserved in the transcript]
     X → Y passes the test because… independent measurements agree!
     [Figure: scatter plot of y against x, both axes running from -15 to 15, showing Cluster 1 and Cluster 2.]

  13. Backward Model: Y → X
     Y → X says: [equation not preserved in the transcript]
     Y → X fails the test because… not all independent measurements agree.
     [Figure: scatter plot of y against x, both axes running from -15 to 15.]

  14. Another way of seeing the same thing…
     The forwards model fits Cluster 2 (top right) better than the backwards model.
     [Figure: plot of y against x, both axes running from -10 to 10.]

  15. Summary Bullets
     • The phenomenon is completely general. It does not depend on any special features of the distribution, except the judicious splitting of the data into clusters.
     • The method depends on a judicious splitting of the data. Bayesians (and likelihoodists) do not split data. (They consider only the likelihoods relative to the total evidence.)
     • If you don’t split data, then it is more difficult to show that X → Y is right and Y → X is wrong.

  16. FORWARD CAUSAL MODEL: Independent measurements agree!

  17. BACKWARD CAUSAL MODEL: Independent measurements do NOT agree.

  18. Robustness of the Phenomenon
     In 15 runs the forwards regression is closer to the generating curve, y = x, than the backwards regression.
     [Figure: plot of y against x, both axes running from -40 to 40.]
