Since my recent post occasioned the first comment posted on The Mystical Positivist blog site since its inception last December, I want to respond to the points it raises. Doing so presented a considerable challenge, since I first needed to familiarize myself with the arguments for and against the use of Bayesian statistics in the analysis of Psi data.
The commenter posted a link to a paper entitled Why Psychologists Must Change the Way They Analyze Their Data: The Case of Psi. The authors, who are well known for their work on Bayesian statistical analysis, apply their techniques to a paper published in 2011 by Dr. D.J. Bem called Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. This is a fairly controversial paper in which Bem reports on nine "retro-causation" experiments he conducted, with findings not unlike those in the paper by Dr. Dean Radin that I cited in my original post. Bem used a frequentist analysis to claim that eight of his nine experiments showed strong evidence (as measured by p < 0.05) for his alternative hypothesis (i.e., that a retro-causal effect was present, as against the null hypothesis that there is no effect). In the paper my commenter cites, Wagenmakers et al. claim that by using a Bayesian approach, which takes into account assumptions about the prior probabilities of both the null hypothesis and the alternative hypothesis, they can reanalyze the data and show that Bem's claims of significance are wildly overstated.
It turns out that Bem and two co-authors (who have previously authored a number of Bayesian analyses of Psi data) published a response to Wagenmakers et al. called Must Psychologists Change the Way They Analyze Their Data? A Response to Wagenmakers, Wetzels, Borsboom, & Van der Maas (2011). They argue that Wagenmakers et al. used the subjectivity inherent in the choice of the prior probability distribution for the alternative hypothesis to inappropriately bias their analysis toward the null hypothesis. In very simple terms (terms that do not really do justice to the subtlety of the argument), Wagenmakers et al. assume a prior distribution under the alternative hypothesis that puts substantial weight on implausibly large effect sizes (technically, a diffuse Cauchy prior), and that assumption drives the Bayes factors computed from Bem's data (the Bayes factor being the ratio of the marginal probability of the data under the null hypothesis to the marginal probability of the data under the alternative hypothesis) firmly toward the null, rendering his results insignificant. Bem et al. instead use a prior distribution for the alternative hypothesis based on knowledge of similar psychological studies of priming effects on reaction time, and they demonstrate that a Bayesian analysis of Bem's original data then yields strong affirmative results for the alternative hypothesis in five of the nine experiments. They go on to do a Bayesian meta-analysis of the aggregate dataset and report a Bayes factor of 13.6K (extremely strong confirmation of the alternative hypothesis).
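To make the disagreement concrete, here is a minimal sketch in Python of how the Bayes factor for a one-sample t-test moves as the Cauchy prior on the effect size is made more diffuse. The t value and sample size are illustrative numbers of my own choosing, not Bem's data, and the code is my own rough sketch, not the calculation from either paper:

```python
# A minimal sketch of the sensitivity at issue: the Bayes factor for a one-sample
# t-test under a Cauchy prior on the standardized effect size delta. The t value
# and sample size below are illustrative, not taken from Bem's experiments.
import numpy as np
from scipy import stats, integrate

def bayes_factor_01(t, n, prior_scale):
    """BF01 = p(data | H0) / p(data | H1), where H0 fixes delta = 0 and H1 places a
    Cauchy(0, prior_scale) prior on delta. Values greater than 1 favor the null."""
    df = n - 1
    # Likelihood of the observed t under H0: central t density.
    m0 = stats.t.pdf(t, df)
    # Marginal likelihood under H1: noncentral t density averaged over the prior.
    def integrand(delta):
        return stats.nct.pdf(t, df, delta * np.sqrt(n)) * stats.cauchy.pdf(delta, scale=prior_scale)
    # The integrand is negligible far from delta = t/sqrt(n), so a wide finite
    # interval suffices; the break points help quad find the peak.
    m1, _ = integrate.quad(integrand, -10, 10, points=[0.0, t / np.sqrt(n)])
    return m0 / m1

# A borderline "significant" result: t = 2.0 with n = 100 (two-sided p of about .048).
for r in (0.1, 0.5, 1.0):  # narrow, moderate, and diffuse priors
    print(f"prior scale {r}: BF01 = {bayes_factor_01(2.0, 100, r):.2f}")
# The more diffuse the prior, the more the same data end up favoring the null --
# which is essentially the point in dispute between Wagenmakers et al. and Bem et al.
```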
Lest we leave it there, Wagenmakers et al. offer a response to the response here. In looking into this question, I also found a useful background paper on the subject, by authors with no particular philosophical axe to grind, that explains the issues behind conducting a Bayesian t-test. Based on all of this material, I have drawn a number of conclusions, some of which differ from where I started:
- The most important lesson here is that the ordinary criterion for statistical significance in a frequentist analysis (p < 0.05) is far too generous and does not give sufficient weight to the null hypothesis. Stated differently, I will no longer lazily assume that a p < 0.05 result in a Psi test is necessarily significant. I will, however, continue to regard a p < 0.001 result (such as in Radin's paper) as significant; see the short numerical comparison after this list. Fortunately for my previously held opinions, there are plenty of solid studies out there that show results of this magnitude.
- The critique in Wagenmakers et al. of exploratory versus confirmatory testing raises excellent points that are important to watch for in the Psi debate. Effectively, they say that if you use a data set to identify potential hypotheses and then run a statistical analysis testing those hypotheses against the same data set, you will bias your results away from the null hypothesis. Unfortunately, they miss the point that four or five of the nine experiments in Bem's original paper were strict replications and not exploratory in nature. And as for the paper I cited in my original blog post, Radin reports an initial study and three strict replication studies that do not attempt to extract new hypotheses from the data.
- To cite Wagenmakers et al. as an indictment of all data analysis in the field of Psi research, and to use that conclusion as a justification for not examining the large body of published data on the subject more rigorously, smacks of confirmation bias. The appropriate application of Bayesian statistics to hypothesis testing is far from settled even outside the field of Psi. The virtue of the Bayesian approach is that it forces one to be explicit about the subjective choices made in determining the prior probabilities used in an analysis. Reasonable people can disagree on those choices, and that disagreement can make the difference between a significant and a not-so-significant result, but at least one's biases are out in the open. The more common frequentist approach has a built-in bias against the null hypothesis that is not so obvious. On this point, Wagenmakers et al.'s larger critique of how statistical analysis is used in psychology and the social sciences is a valid and strong one.
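Here is the comparison promised in the first bullet above, reusing the hypothetical bayes_factor_01 helper from the earlier sketch (again with made-up numbers, not figures from Radin's or Bem's studies):

```python
# Reusing the bayes_factor_01 sketch above: a result at the p ~= .05 boundary
# versus one near p ~= .001, both with an illustrative sample size of n = 100.
for t_obs, label in ((2.0, "p ~= .05"), (3.4, "p ~= .001")):
    bf01 = bayes_factor_01(t_obs, 100, prior_scale=1.0)  # same diffuse prior for both
    print(f"{label}: t = {t_obs}, BF01 = {bf01:.2f}")
# Even under a diffuse prior, the borderline result gives a Bayes factor that, if
# anything, modestly favors the null, while the p ~= .001 result still comes out
# strongly in favor of the alternative -- the basis for my revised rule of thumb.
```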
To put a bow on this particular discussion, I found a video on Dean Radin's blog of a recent debate at Harvard featuring Daryl Bem, Jonathan Schooler, and Sam Moulton. Bem is the author of the retro-causation paper discussed in Wagenmakers et al. Jonathan Schooler is the researcher featured in the recent New Yorker articles on the Decline Effect. Sam Moulton is a Harvard psychologist who started out as a parapsychologist but later rejected the field when he consistently got null results in a number of studies he conducted. Bem claims to have started out as a skeptic but was led by his results over a number of studies to believe there was something there. It is a good discussion that raises some of the challenges in this kind of research. Schooler comes out in this discussion as a Psi researcher and discusses the impact of the Decline Effect on replication studies in Psi (and other fields).