Imagine the headlines if a massive and rigorous multi-lab replication attempt produced results that supported the hypothesis that people really can sense future events by means as yet unknown to conventional science – in other words, that precognition is real. One might expect such a study to receive massive media coverage, given the attention that Professor Daryl Bem’s (2011) series of studies, published in the prestigious Journal of Personality and Social Psychology, received.
The findings from Bem’s series were seen by some as providing the strongest evidence to date that precognition is real and were reported by science journalists all around the world. Bem even discussed his results on the US national TV show, The Colbert Report.
Now imagine the headlines if that massive and rigorous multi-lab replication attempt produced results that failed to support the existence of precognition – if, in fact, it produced very strong evidence that the effect reported in the original study was not real.
As it happens, there is no need to imagine the headlines in this case, because the results are in from the Transparent Psi Project, which aimed to replicate the effect reported in Bem’s Experiment 1. The study was carried out by Zoltan Kekecs of Lund University and a large international team of collaborators (there are no fewer than 30 co-authors on their paper, recently published in Royal Society Open Science). The results demonstrate pretty conclusively that the technique used in the original experiment is not capable of detecting precognition, if indeed precognition really exists. Strangely, I have yet to see any reports of this negative finding in the media.
The plain-language summary presented in the paper is worth quoting in full:
This project aimed to demonstrate the use of research methods designed to improve the reliability of scientific findings in psychological science. Using this rigorous methodology, we could not replicate the positive findings of Bem’s 2011 Experiment 1. This finding does not confirm, nor contradict the existence of ESP in general, and this was not the point of our study. Instead, the results tell us that (1) the original experiment was likely affected by methodological flaws or it was a chance finding, and (2) the paradigm used in the original study is probably not useful for detecting ESP effects if they exist. The methodological innovations implemented in this study enable the readers to trust and verify our results which is an important step forward in achieving trustworthy science.
This study was a truly impressive piece of work, setting a new standard for rigour and transparency, incorporating many measures to rule out any bias or questionable research practices. It was co-designed by a panel of 29 experts, 15 of whom were supporters of the paranormal interpretation of the original results, including Daryl Bem himself, and 14 of whom were sceptical. I was one of the sceptical members of that panel. Members were invited to assess the study protocol and, after two rounds of review and refinement, reached consensus that the protocol was of high quality and immune to virtually all questionable research practices.
Experiment 1 of Bem’s series was chosen as the target for a replication attempt because a recent meta-analysis by Bem and colleagues suggested that, of the nine experiments originally reported by Bem, it was this one that had produced the largest effect size across 14 studies (involving a total of 863 participants).
In this experiment, participants were asked on each trial to select one of two curtains displayed on a computer screen. Only after the selection had been made did the computer randomly allocate one of the two screen locations to be the one that would be rewarded by the presentation of an erotic image.
The hypothesis under test was that participants would choose the to-be-rewarded location more often than chance alone would predict (that is, more often than about 50% of the time). Bem claimed that his participants chose the to-be-rewarded location on 53% of trials, a small but highly statistically significant departure from chance expectation. As noted above, a subsequent meta-analysis apparently supported his findings, making Experiment 1 the obvious choice for a large-scale replication attempt.
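To get a feel for the statistics, here is a minimal Python sketch – not the study’s actual analysis code – that simulates the guessing task under the null hypothesis and then applies an exact binomial test to a hypothetical 53% hit rate. The trial count of 3,600 is illustrative only:

```python
import random
from scipy.stats import binomtest

# Illustrative trial count only; it is not Bem's actual number of trials.
N_TRIALS = 3600

# Simulate the task under the null hypothesis: the target is assigned
# at random only AFTER the guess has been made, so guesses and targets
# are independent and the expected hit rate is 50%.
hits = 0
for _ in range(N_TRIALS):
    guess = random.choice(["left", "right"])
    target = random.choice(["left", "right"])  # allocated after the guess
    hits += guess == target

print(f"simulated chance hit rate: {hits / N_TRIALS:.3f}")

# An exact binomial test shows why a 53% hit rate, though small, is
# highly significant once the trial count is large.
result = binomtest(round(0.53 * N_TRIALS), N_TRIALS, p=0.5)
print(f"p-value for a 53% hit rate over {N_TRIALS} trials: {result.pvalue:.5f}")
```

With thousands of trials, even a three-point deviation from chance yields a very small p-value, which is why the original result looked so striking despite the modest effect size.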
The replication attempt involved ten laboratories in nine different countries. A total of 2,115 participants contributed valid data, yielding 37,836 trials in all. This sample is more than 20 times larger than Bem’s original study and more than twice as large as all 14 studies using this methodology in the subsequent meta-analysis combined. Kekecs and his many colleagues reported a success rate of 49.89%, very close to what would be expected by chance. In their words (p. 21),
Observing this percentage of successful guesses is 72 times more likely if the guesses are successful at random than if they have a better than chance success rate.
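That “72 times” figure is a Bayes factor comparing the chance hypothesis with a better-than-chance hypothesis. The sketch below illustrates the logic only; the study’s preregistered model and priors differ, so with the toy uniform prior assumed here the resulting number will not be the paper’s 72:

```python
import numpy as np
from scipy.stats import binom

# Figures reported in the replication: 37,836 trials, 49.89% successes.
n = 37836
k = round(0.4989 * n)

# Marginal likelihood under H0: guessing is at chance, p = 0.5 exactly.
m0 = binom.pmf(k, n, 0.5)

# Marginal likelihood under H1: a better-than-chance hit rate. Here we
# ASSUME a uniform prior on p over (0.5, 1); the study's prior differed.
ps = np.linspace(0.5, 1.0, 200_000)
m1 = np.mean(binom.pmf(k, n, ps))  # average likelihood over the prior

# BF01 > 1 means the data favour the chance hypothesis. The exact value
# is sensitive to the choice of prior, which is why this toy version
# does not reproduce the paper's factor of 72.
print(f"BF01 = {m0 / m1:.0f} in favour of chance")
```

Because the observed hit rate actually fell slightly below 50%, averaging the likelihood over any “better than chance” prior produces a Bayes factor well above 1 in favour of the chance hypothesis.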
The steps taken to ensure that this study would be immune to criticism in terms of methodology or analysis go well beyond those of any other experiment that I know of. As already stated, the methodology was approved by a panel of experts including both proponents of the paranormal and sceptics. Clear and explicit criteria for what would count as a successful or an unsuccessful replication were stated in advance. The agreed study protocol was preregistered. Several methods were put in place to ensure that no one could tamper with the data in any way, including (but not limited to):
- Direct data deposition: as data were collected, they were immediately transmitted to a trusted third-party repository (GitHub);
- Born-open data: data were made public as they were being collected (a minimal sketch of this step follows the list);
- Real-time research report: automated reports were continuously updated as the data flowed in.
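To make the “born-open” idea concrete, here is a hypothetical sketch of such a data pipeline in Python; the repository path, file layout and helper function are my own invention, not the project’s actual code:

```python
import csv
import subprocess
from datetime import datetime, timezone

# Hypothetical paths: a local clone of a public GitHub repository.
REPO_DIR = "psi-replication-data"
DATA_FILE = f"{REPO_DIR}/trials.csv"

def record_trial(participant_id: str, guess: str, target: str) -> None:
    """Append one trial to the public record and push it immediately."""
    with open(DATA_FILE, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            participant_id, guess, target, int(guess == target),
        ])
    # Committing and pushing as each trial is recorded means the data are
    # public from the moment they exist, leaving no window for tampering.
    subprocess.run(["git", "-C", REPO_DIR, "add", "trials.csv"], check=True)
    subprocess.run(["git", "-C", REPO_DIR, "commit", "-m", "add trial"], check=True)
    subprocess.run(["git", "-C", REPO_DIR, "push"], check=True)

record_trial("P001", "left", "right")
```

The design choice here is that publication happens per trial rather than per study, so any later alteration of the dataset would be visible in the public version history.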
The analysis plan allowed for very strong conclusions to be drawn either supporting or refuting the paranormal interpretation of Bem’s original study. As already stated, the authors concluded, “the original experiment was likely affected by methodological flaws or it was a chance finding”. It is rare in the social sciences that a conclusion can be drawn with such confidence.
Was it worth such a massive investment of time, effort and resources to refute a claim that many mainstream scientists would not have accepted in the first place? I would say it was. For one thing, the study was aimed not simply at replicating Bem’s Experiment 1 but also at demonstrating the feasibility of the wide range of methodological tools it employed. Such effort would clearly not be worthwhile for studies of non-controversial findings, but these same tools can be applied to investigate controversial topics in many areas of science, not just parapsychology.
For those with an interest in parapsychology, the wider implications of these results are profound indeed. The original findings by Bem were reported far and wide as providing strong support for the reality of precognition. The effect was then apparently strongly supported by a subsequent meta-analysis based on the results from 14 studies. Even so, this massive and rigorous multi-lab replication attempt demonstrates as conclusively as humanly possible that the original effect is not real. So where are the headlines from the world’s science media?