The evidence for pill colour impacting placebo effects gets flimsier the more you examine it

Author

Mike Hall (https://mikehall314.bsky.social/)

Mike Hall is a software engineer and Doctor Who fan, not in that order. He is the producer and host of the long-running podcast Skeptics with a K, part of the organising committee for the award-winning skeptical conference QED, and on the board of the Merseyside Skeptics Society.


In 1996, the British Medical Journal published a systematic review of studies examining whether the colour of a pill could change its effectiveness. At first glance, the idea might make intuitive sense: red is bold and energetic, blue is calm and serene, so patients might expect different things from red or blue pills, and these expectations could influence their reported outcomes.

The review comes to a cautiously positive, if slightly hedged, conclusion: ‘[Colours] seem to influence the effectiveness of a drug.’ Since publication, the idea that different coloured placebo pills can produce different effects has become a staple example whenever ‘the power of placebo’ is discussed in science communication or popular media, with this review often cited in support.

A closer look at the included studies, however, uncovers significant problems. The review references six studies on the impact of drug colour on effectiveness, conducted between 1968 and 1978, all of which suffer from methodological or statistical issues.

Blackwell 1972

The first is Blackwell 1972, which was reviewed recently for The Skeptic. Fifty-six medical students were given either blue or pink placebo pills and asked to report which effects they felt. Blackwell claimed to show that blue pills made students feel ‘more drowsy’ and ‘less alert’, compared to pink pills.

Unfortunately, this study is small, only single-blind, and based on self-reported effects – a combination which opens the door to all sorts of uncontrolled biases. Worse still, the students were primed with a list of possible effects, making it more likely they would report something even if nothing had changed. There are good reasons to doubt that the findings from this paper represent a real effect.

Schapira 1970

The next paper in the BMJ review is Schapira 1970. Forty-eight patients suffering with anxiety were treated with the anti-anxiety drug oxazepam over the course of several weeks. Although the dose was identical in each case, pills were presented in a range of colours. Patients self-reported their condition and were also evaluated by a clinician. Since the active drug was the same in all cases, Schapira suggests that any differences would be the result of the colour of the pills alone.

The paper initially reports that anxiety was ‘most improved with green [pills]’ and that depression ‘appeared to respond best to yellow’ – and these are the findings cited by both the BMJ review and Ben Goldacre’s popular science book Bad Science – but in fact neither finding reaches statistical significance. Schapira reports only one statistically significant finding with respect to pill colour: the effect of green pills on phobias. Even this finding is questionable, since it is based on the clinician ratings only (no effect is found in the patient ratings), and it involved just 17 of the 48 patients. The remaining 31 were not suffering with phobias – participants were recruited for their anxiety symptoms, and the fact that 17 of them also struggled with phobias was unintended, making the analysis post-hoc.

This effect in phobias also disappears when the figures are properly adjusted to account for the fact that Schapira makes many different comparisons, across pill colour, symptom, and rating type.
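To see why this matters, consider the arithmetic of multiple comparisons. The comparison count below is hypothetical – Schapira’s paper doesn’t let us reconstruct the exact number of tests performed – but it illustrates how quickly false positives accumulate on pure noise, and how a simple Bonferroni correction compensates:

```python
# Sketch of the multiple comparisons problem. The comparison count is
# hypothetical; Schapira's paper doesn't report enough detail to
# reconstruct the real number of tests performed.

alpha = 0.05  # conventional per-test significance threshold

# Suppose 3 pill colours x 3 symptoms x 2 rating types were compared:
n_comparisons = 3 * 3 * 2  # 18 tests

# Chance of at least one false positive across 18 tests on pure noise:
p_any_false_positive = 1 - (1 - alpha) ** n_comparisons
print(f"chance of at least one false positive: {p_any_false_positive:.0%}")  # ~60%

# Bonferroni restores the 5% family-wise error rate by tightening the
# per-test threshold, which few marginal findings will survive:
print(f"corrected threshold: {alpha / n_comparisons:.4f}")  # 0.0028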

Cattaneo 1970

Cattaneo 1970 examined 120 patients awaiting surgery on their varicose veins. The patients were randomly given either an orange or a blue pill and told they were sedatives to help them sleep. In fact, the pills had no drug in them at all. The next morning, patients were asked how long it took them to fall asleep, how long they slept, whether they felt rested, and which of the two pills they preferred.

In an odd leap of logic, Cattaneo asserts that whichever pill patients said they preferred must be the one which best helped them sleep, but it should be immediately obvious why this doesn’t necessarily follow. Perhaps orange is simply their favourite colour? Maybe they’re big Everton supporters and will pick a blue anything regardless of how well it does?

You may also reasonably ask why anyone is giving fake sleeping pills to patients who are awaiting varicose vein surgery. The paper’s reasoning runs: since they are awaiting surgery, patients must be experiencing mild-to-moderate anxiety; since they are experiencing mild-to-moderate anxiety, they must have trouble sleeping; and since they cannot sleep, they must need a sedative.

No effort is made to establish whether the patients actually are suffering with anxiety, nor is any effort made to establish whether they are struggling to sleep; these claims are just nakedly asserted.

Will you pick the blue pill… or the orange pill? Image by Valeria GB from Pixabay

In fairness, the paper does claim that there is a correlation between the ‘favourite pill’ and the other self-reported sleep quality scores gathered from the patients, but it makes no effort to quantify this statistically. The primary findings of the paper are then based on this self-reported preference, not the sleep quality scores.

Cattaneo reports that 41% of patients preferred the blue pill and 39% preferred orange; the remainder expressed no preference. Astute readers will doubtless have noticed that there is only a very small difference between 39% and 41%. In fact, if patients who expressed no preference are excluded, this is a 51/49 split. This is not a significant effect that can be generalised to the wider population; it is a coin flip.
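As a rough check on that intuition, we can ask how surprising such a split would be if pill preference really were a coin flip. The counts below are reconstructed from the reported percentages (41% and 39% of 120 patients), so treat them as approximate:

```python
# Rough sketch: is a 49 vs 47 split consistent with a coin flip?
# Counts are reconstructed from Cattaneo's reported percentages
# (41% and 39% of 120 patients), so they are approximations.
from scipy.stats import binomtest

blue = 49    # ~41% of 120 preferred the blue pill
orange = 47  # ~39% of 120 preferred orange
n = blue + orange  # only patients who expressed a preference

result = binomtest(blue, n, p=0.5, alternative="two-sided")
print(f"p-value: {result.pvalue:.2f}")  # ~0.92: indistinguishable from chance
```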

With no meaningful effect in the overall analysis, the paper then breaks the results down by sex, claiming that men prefer the orange pills and women prefer the blue. This comparison just scrapes past the common threshold for a significant finding, with a p-value of 0.042. As discussed in a previous article, the p-value represents the probability of obtaining results at least as extreme as these when there is no true effect. In this case, even if pill colour preference had no relationship to sex at all, there would still be a 4.2% chance of seeing a difference this large.

However, we should still be skeptical of this result, which appears to be the product of p-hacking: the practice of intentionally or unintentionally reworking your analysis until you find some significant result, and then reporting on that – for example, performing a subgroup analysis by sex when the overall analysis finds nothing. Even if we leave Cattaneo’s bizarre methodology aside, p-hacked data do not lead us to reliable conclusions.
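A quick simulation shows how easily this happens. The setup below is a generic illustration – two groups of pure noise, sliced into arbitrary subgroups – and not a re-analysis of Cattaneo’s data:

```python
# Sketch: how often does at least one subgroup analysis come up
# 'significant' on pure noise? A generic illustration of p-hacking,
# not a re-analysis of Cattaneo's data.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_trials, n_per_group, n_subgroups = 2000, 60, 6
false_positives = 0

for _ in range(n_trials):
    a = rng.normal(size=n_per_group)  # 'blue pill' group, no real effect
    b = rng.normal(size=n_per_group)  # 'orange pill' group, no real effect
    # Slice both groups into arbitrary subgroups (by sex, age band, etc.)
    for sub_a, sub_b in zip(np.array_split(a, n_subgroups),
                            np.array_split(b, n_subgroups)):
        if ttest_ind(sub_a, sub_b).pvalue < 0.05:
            false_positives += 1
            break  # a p-hacker stops looking at the first 'hit'

print(f"trials with a 'significant' subgroup: {false_positives / n_trials:.0%}")
# With six looks at noise, roughly a quarter of trials yield a false positive.
```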

Luchelli 1978

Luchelli 1978 took a similar approach, which is perhaps no surprise given that Luchelli is a co-author on the Cattaneo paper, and Cattaneo is a co-author on Luchelli. This time, 96 patients awaiting unspecified elective surgeries were recruited. The paper reports that all participants had significant sleep problems, including difficulty falling asleep, disturbed sleep, and an average sleep duration of ‘five hours or less.’

Patients were given either an orange-coloured sedative (heptabarbital), a blue-coloured sedative (also heptabarbital, in the same dose), an orange placebo, or a blue placebo. The following morning, they were interviewed to determine how long it took them to fall asleep, how long they slept, the quality of their sleep, and whether they woke feeling rested or groggy.

Unlike the earlier ‘pill preference’ metric, Luchelli used sleep onset and duration data directly in the analysis, which yielded two statistically significant results concerning pill colour: patients who took blue pills fell asleep 32 minutes faster and slept 33 minutes longer on average. No statistically significant effect was observed for pill colour on sleep quality or grogginess.

Despite these results, there are significant issues with relying on self-reported data for metrics like sleep onset time. Be honest: can you recall the exact time you fell asleep last night, or how long you slept? Such figures are difficult to report accurately, and there is substantial room for biases to affect these kinds of subjective measurements. Even if pill colour had no real impact, bias alone could result in an illusory effect being recorded in the data. Luchelli acknowledges these limitations but defends the approach, claiming ‘sound results have been obtained based on subjective assessments.’

Unfortunately, the paper doesn’t provide enough data to verify whether the statistical analysis was conducted correctly, but the results presented for the effect of blue pills as sedatives are marginal and would likely disappear if properly adjusted to account for the large number of subgroup comparisons made in the study.

Moreover, these subgroup findings contradict the overall analysis. For example, men taking orange capsules fell asleep faster and slept longer on the first night compared to those taking blue capsules, but on the second night, the opposite effect was observed. In women, the effect of blue capsules remained consistent across both nights, but orange capsules were more effective on the second night than the first.

Inconsistencies like these make it clear that, if pill colour has any true effect, it is variable and unpredictable. Given the variability in the data and the small sample size, it wouldn’t be surprising if just one or two outliers were skewing these results.

Nagao 1968

Nagao 1968 is a paper we unfortunately cannot examine in any great detail (despite my best efforts) as I’ve been unable to obtain a copy of the original text. But since it was published in Japanese, I’m unlikely to have understood it anyway. The BMJ review does, however, outline the main findings: ‘79% of patients reported adequate pain relief with red pills, compared to 73% with white pills.’ We don’t know whether this was a significant finding, since we can’t see the data, how the analyses were performed, which other comparisons were made, or the trial methodology.

One observation we can make is that these findings will be based on self-reported data, as there is no real alternative when measuring pain. Self-reported outcomes are especially susceptible to bias, and while it is unlikely that patients are being deliberately deceptive when reporting how much pain they are in, there are several psychological effects which can distort those reports. The subject-expectancy effect can result in patients reporting what they think should be happening, rather than what is actually happening. Social-desirability bias can result in patients reporting what they think is the most pleasing or acceptable answer.

Importantly, these sorts of effects (and dozens of others like them) can modify what is recorded in the data without necessarily changing anything about the patient. For this reason, it can be difficult to disentangle self-reported data from simple bias in studies like Nagao.

Huskisson 1974

Finally, Huskisson 1974 conducted a study on 24 patients with rheumatoid arthritis, each of whom required on-demand pain relief (in addition to their standard care) at least once per day. In a somewhat complicated design, patients were randomised to receive one of three active painkillers or a placebo, with the pills presented in pairs and in various colours, across several days. Patients self-reported their pain relief on a scale from 0 (no relief) to 3 (complete relief), and their responses were recorded hourly for six hours after taking the medication.

Huskisson found that pill colour did not significantly alter the effectiveness of the active painkillers, but patient-reported pain relief did vary by pill colour when administering placebos. No difference between the colours was found one hour after administration, but differences appeared at two hours (p < 0.02), three hours (p < 0.05), four hours (p < 0.02), five hours (p < 0.05), and six hours (p < 0.05).

Despite this, we should still interpret the data cautiously. First, the sample size was small, with at most six patients receiving each coloured placebo. Two patients dropped out of the study, but the paper does not specify why or which groups they left. Second, like Nagao, the results are likely to be influenced by reporting biases, as all the data gathered was self-reported by patients. Finally, even the statistically significant results were relatively marginal, with p-values ranging from < 0.02 to < 0.05. Given the number of statistical comparisons performed, it is likely that even these results would not remain significant if adjusted for the false discovery rate.
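To get a feel for the numbers: taking the reported thresholds at face value (we only know upper bounds for the actual p-values), even a simple Bonferroni correction – a blunter instrument than a false discovery rate adjustment – leaves nothing standing. The comparison count below is a conservative reconstruction on my part (four treatment arms checked hourly over six hours), not a figure from the paper:

```python
# Sketch: Bonferroni-correcting Huskisson's placebo results. We only
# know upper bounds for the reported p-values, and the comparison
# count is a conservative reconstruction, so this is indicative only.

alpha = 0.05
reported_p = [0.02, 0.05, 0.02, 0.05, 0.05]  # hours 2-6, upper bounds

# Assume at least 4 treatment arms x 6 hourly checks were compared:
n_comparisons = 4 * 6
corrected_alpha = alpha / n_comparisons  # ~0.0021

survivors = [p for p in reported_p if p < corrected_alpha]
print(f"corrected threshold: {corrected_alpha:.4f}")
print(f"results surviving correction: {len(survivors)}")  # 0
```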

Interestingly, the red-coloured pills did not outperform the other colours when active painkillers were administered. This effect, if it represents a real phenomenon, was limited to the placebo pills. However, studies like Huskisson are often used to support the idea that the colour of medication can enhance its placebo effect, even while this study’s findings do not support such claims for active drugs.


Returning to the BMJ review itself, each of the six papers referenced is graded for its methodological quality. Blackwell, which claimed that blue pills made students less alert, scores 6.5/10. Schapira, which claimed yellow pills were better for depression and green pills best for anxiety, scores 7/10, as does Cattaneo, with its patients awaiting varicose vein surgery. The remaining three studies – Luchelli, Nagao, and Huskisson – all score less than 5/10.

Taken together, the studies cited by the BMJ review paint a picture not of a meaningful placebo effect tied to pill colour, but of random noise in poorly designed research. Perhaps that is no surprise, given that most of this work was conducted 20 years before the review, which is itself rapidly approaching 30 years old. The most impressive effects are found in the studies judged to have the poorest methodological quality, while the better-designed trials show little to no real effect.

This is a pattern we see frequently in pseudoscience: acupuncture, homeopathy, reiki, and similarly implausible therapies tend to show the biggest effects in poorly controlled studies. When proper controls are enforced, the effects disappear.

The BMJ review tries to walk a middle ground. It acknowledges that the evidence is inconsistent while still suggesting that colour might influence the effectiveness of a drug, and it ends with a call for more research. While it’s plausible that colour could change how patients perceive a treatment, the data don’t show a reliable and meaningful clinical effect. The studies that seem to support the idea are small, weak, and flawed. The more robust trials find little of interest.

So should we start colour-coding pills, based on what we think patients will respond to best? I’ve seen more than one commentator make this very suggestion, but absent any robust, reliable, and reproducible data showing that pill colour has a measurable impact on clinical outcomes, I would suggest we focus on what we can be sure actually works: proper treatment, solid evidence, and good science.

The Skeptic is made possible thanks to support from our readers. If you enjoyed this article, please consider taking out a voluntary monthly subscription on Patreon.
