Interest in homeopathy seems to be on the wane, but maybe that’s just my bias.
While homeopathy is naked pseudoscience, it nevertheless yields occasional positive results in clinical trials. For scientists, skeptics, and those with an interest in the philosophy of science, this serves as an example of how poor trial design can lead to unreliable conclusions. Homeopathy, like acupuncture, has a habit of showing strong effects in badly designed trials, and small or no effect in well-conducted ones. If the heyday of homeopathy is over, perhaps we need a new paradigm to illustrate how weak design can mislead us?
I propose: the powerful placebo.
Over the coming months, I hope to explore some of what has been described as the crème de la crème of placebo effect research, scrutinising the biases, flaws, and weaknesses that challenge the widely accepted view of placebos as having real, powerful therapeutic effects. We will examine the primary literature to see if the work is really as compelling as some would have us believe.
Blackwell 1972, originally published in The Lancet, is a favourite of science communicators, who breathlessly claim it as a demonstration of how the colour of a placebo pill changes its effect. Blackwell took 56 medical students at the University of Cincinnati and randomised them to receive either blue or pink placebo pills. A further 44 students declined to take part, after learning what the study involved.
The students were told that they would receive either a sedative or a stimulant drug, but that they would not know which. They were also given a list of effects and side effects they could expect to experience as a result of taking the drugs: six effects and six side effects for each, for a total of 24 expected effects.
The pill colour was assigned randomly, resulting in 29 of the students being given blue pills, and the remaining 27 receiving pink pills. In fact, all of the pills were placebos; there were no real drugs involved. Blackwell reports:
“There were two significant differences due to colour, both indicative of the blue capsules producing more sedative effects than pink capsules. 66% of subjects on blue capsules felt less alert compared with 26% on pink; 72% on blue capsules felt more drowsy compared to 37% on pink.”
Aside from the fact that ‘less alert’ and ‘more drowsy’ are arguably the same thing, there are more serious concerns with the study’s methodology if we are using it to support such extraordinary claims.
For one thing, it is only single-blind. Reading the paper, this seems to have in fact been an exercise to teach a group of medical students about placebos that their professor decided to write up for The Lancet after the fact. As a class exercise, of course the professor knew that the pills were placebos, and knew what the expected effects were. In an illustration for students, that is to be expected. But today this paper is used to support the purported effects of pill colour generally.
Moreover, the professor provided the students with a list of effects they could expect to experience after taking the pills. They were told that stimulants would make them more cheerful, more talkative, more alert, less drowsy, less sluggish, and less tired. They were also warned the stimulants would make them more tense, more jittery, more irritable, less relaxed, less calm and less easygoing. Sedatives were said to cause the opposite effects, so less cheerful, more sluggish, and so on.
Since the students were asked to self-report their condition after taking the pills, it’s hardly surprising they reported some of the many effects they were told to expect. Confirmation bias, if nothing else, would lead them to notice and report the changes highlighted by their professor, even if they may have ignored those same changes had they not first been prompted to look for them. In psychology, this is referred to as priming.
Researchers frequently overlook this and fail to acknowledge that there is a distinction between the reported effect of an intervention and the true effect, because of the role of bias. In fact, a 2010 review on placebo effects for Cochrane concluded that it is ‘difficult to distinguish patient-reported effects of placebo from biased reporting’.
It is also notable that, despite the potential confounding role of bias, only two of the 24 effects that students were asked to look out for rose to the level of statistical significance. The paper does not give specific p-values for those two findings, but notes that findings were reported as ‘significant’ if p<0.05.
For those who aren’t statistical experts, the p or probability value tells us how likely it is that we would see results at least this extreme if the null hypothesis is true – in this case, if there is no actual effect of the pill colour. A p-value of 0.05 means there’s a 5% chance of observing a difference this large purely by chance, assuming no real effect. However, each comparison made is another roll of the dice, another opportunity to obtain a false positive by statistical fluke, and Blackwell tests against many different outcomes.
One common technique used to account for this is the Bonferroni correction, which multiplies each p-value by the number of comparisons being made (equivalently, it divides the significance threshold by that number), reducing the chance of a false positive.
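To put rough numbers on the multiple-comparisons problem, here is a minimal Python sketch. It assumes the 24 outcomes were each tested independently at the 0.05 threshold, which is a simplification (outcomes such as ‘less alert’ and ‘more drowsy’ are clearly correlated). The function and variable names are illustrative and do not come from the paper.

```python
# Sketch: what should we expect from 24 tests if pill colour does nothing?
# Assumption: the tests are independent, which questionnaire items are not.

ALPHA = 0.05        # significance threshold reported in the paper
N_COMPARISONS = 24  # outcomes the students were asked to look out for


def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / m


fwer = familywise_error_rate(ALPHA, N_COMPARISONS)
expected_false_positives = ALPHA * N_COMPARISONS
threshold = bonferroni_threshold(ALPHA, N_COMPARISONS)

print(f"Chance of at least one fluke: {fwer:.0%}")
print(f"Expected number of flukes: {expected_false_positives:.1f}")
print(f"Bonferroni per-test threshold: {threshold:.4f}")
```

Under those assumptions, there is roughly a 71% chance of at least one spurious ‘significant’ result, and a little over one such result would be expected on average, even if pill colour had no effect at all.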
While the paper does not present the raw data, it provides enough information for us to work backward to the raw figures, at least for the two significant findings. We know how many participants there were in total, and we know how many were given each colour. The paper reports that 66% of people who had blue pills reported less alertness, which equates to nineteen people (19 of 29 is 65.5%). Applying the same method to the other groups, we can determine the raw figures for the two ‘significant’ colour findings.
Group | Less Alert | More Drowsy |
Blue Pills (n = 29) | 19 (66%) | 21 (72%) |
Pink Pills (n = 27) | 7 (26%) | 10 (37%) |
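The back-calculation can be checked with a short Python sketch. It assumes the paper rounded each percentage to the nearest whole number, and simply searches for every head count that would round to the reported figure. The helper name `count_from_percentage` is made up for illustration.

```python
# Sketch: recover whole-number counts from the rounded percentages.
# Assumption: the paper rounded each percentage to the nearest integer.


def count_from_percentage(pct: int, n: int) -> list[int]:
    """Return every count k (0..n) whose share of n rounds to pct percent."""
    return [k for k in range(n + 1) if round(100 * k / n) == pct]


# (reported percentage, group size) for each colour and outcome
reported = {
    ("blue", "less alert"): (66, 29),
    ("blue", "more drowsy"): (72, 29),
    ("pink", "less alert"): (26, 27),
    ("pink", "more drowsy"): (37, 27),
}

for (colour, outcome), (pct, n) in reported.items():
    print(colour, outcome, count_from_percentage(pct, n))
```

With groups this small, each reported percentage matches exactly one possible count, so the reconstruction is unambiguous under the rounding assumption.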
From here, we can compute our own p-values, using a chi-squared test, and apply a Bonferroni correction. This reveals that, once adjusted for multiple comparisons, neither alertness (p = 0.072) nor drowsiness (p = 0.19) is actually a significant finding.
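For readers who want to check the arithmetic, here is one way the calculation might be reproduced using only the Python standard library. It is a sketch under stated assumptions: a Pearson chi-squared test on each 2x2 table with no continuity correction, and a Bonferroni factor of 24 (one per outcome the students were asked about). The function names are illustrative.

```python
import math

N_COMPARISONS = 24  # assumed number of outcomes being corrected for


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic for the table [[a, b], [c, d]],
    computed without a continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2: float) -> float:
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def bonferroni(p: float, m: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * m)


# Rows: blue, pink. Columns: reported the effect, did not report it.
tables = {
    "less alert": (19, 10, 7, 20),
    "more drowsy": (21, 8, 10, 17),
}

results = {}
for outcome, cells in tables.items():
    raw_p = p_value_df1(chi_squared_2x2(*cells))
    results[outcome] = (raw_p, bonferroni(raw_p, N_COMPARISONS))
    print(f"{outcome}: raw p = {raw_p:.4f}, adjusted p = {results[outcome][1]:.3f}")
```

Under these assumptions both raw p-values fall below 0.05, while the adjusted values come out at roughly 0.07 and 0.19, in line with the figures quoted above. A continuity-corrected or exact test would give somewhat different numbers.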
While this study may have originally been designed as an educational tool for medical students, its citation by science communicators today as robust evidence of a powerful placebo effect invites a higher level of scrutiny. It is perhaps no surprise it fails to hold up when evaluated against more rigorous standards than it may have originally been designed to bear.
This is a small, single-blind study, based on self-reported data – a potent combination that opens the door to all sorts of biases. Worse still, the students were primed with a list of possible effects, making it more likely they would report something, even if they hadn’t felt much. This isn’t good science; it’s a recipe for false positives.
Of course, Blackwell was not the final word on pill colour and placebo; there are other studies that claim to find an effect, but those are stories for another day.