‘Nerdstock’ was the less-than-flattering name that BBC Four gave to Robin Ince’s ‘Nine Lessons and Carols’ when it was broadcast in early 2010. I was unable to attend in person, so I was more than happy to discover that the BBC had recorded it for later broadcast, even if the title wasn’t my cup of tea.
One particularly vivid recollection is of Ben Goldacre, author of the excellent book Bad Science, bounding on stage with that terrific, restless enthusiasm, and giving a whistle-stop tour of the amazing power of the placebo effect and its evil twin, the nocebo.
‘Pacemakers,’ he spilled into the microphone, ‘improve congestive cardiac failure after they’ve been put in, but before they’ve been switched on!’
The comment was clearly crafted and timed to elicit the laugh it earned, but it also made me sit up and pay attention. Pacemakers improve congestive cardiac failure after they’ve been put in, but before they’ve been switched on? As Goldacre quickly commented, this is a ‘properly outrageous’ finding, and it piqued my interest.
The study behind this claim was published in the American Journal of Cardiology in 1999. Linde et al. recruited 81 patients with obstructive hypertrophic cardiomyopathy (HOCM, a thickening of the heart wall) and fitted them with pacemakers. The patients were randomly assigned to one of two pacemaker settings: ‘atrioventricular synchronous pacing’ or ‘atrial inhibited mode’ at 30 beats per minute.
In AV synchronous pacing, the pacemaker coordinates the electrical activity between the atria and the ventricles. It detects the natural electrical signals of the atria and delivers a pacing pulse to the ventricles with an appropriate delay, mimicking the heart’s normal conduction pathway and ensuring proper timing between contractions. In other words, the pacemaker is switched on.
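The timing logic described above can be sketched in a few lines. This is a deliberately simplified model, not how any real device is programmed: the function name is hypothetical, times are in seconds, and it assumes (as in common dual-chamber pacing modes) that a ventricular beat sensed during the delay inhibits the pulse. Real pacemakers also apply refractory periods and rate limits, which are ignored here.

```python
from typing import List


def ventricular_pace_times(atrial_senses: List[float],
                           intrinsic_ventricular: List[float],
                           av_delay: float) -> List[float]:
    """Return the times (s) at which a simplified AV-synchronous
    pacemaker would deliver a ventricular pulse.

    For each sensed atrial beat, the device waits av_delay seconds.
    If no intrinsic ventricular beat is sensed in that window, it paces
    the ventricle at the end of the delay.
    """
    paces = []
    for atrial_time in atrial_senses:
        deadline = atrial_time + av_delay
        # Did the heart's own conduction reach the ventricle in time?
        conducted = any(atrial_time < v <= deadline
                        for v in intrinsic_ventricular)
        if not conducted:
            paces.append(deadline)
    return paces


# Atrial beats at 0 s, 1 s and 2 s; only the second conducts on its own.
print(ventricular_pace_times([0.0, 1.0, 2.0], [1.1], av_delay=0.125))
# [0.125, 2.125]
```

In this toy run the device paces after the first and third atrial beats, and stays silent after the second because the heart's own signal arrived within the delay.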
In contrast, the atrial inhibited mode only stimulates the heart if it fails to beat normally. In this study, the inhibited pacemakers were set to 30 beats per minute, meaning the heart would have to pause for two full seconds before a pulse was triggered. Since a pause that long rarely happens, these pacemakers were effectively placebos: they were fully implanted, but did not stimulate the heart.
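The two-second figure follows from simple arithmetic: a demand pacemaker programmed to a lower rate of R beats per minute only fires if no natural beat arrives within 60 / R seconds. A minimal sketch (the function name is illustrative, and it ignores device refinements such as rate hysteresis):

```python
def escape_interval_seconds(lower_rate_bpm: float) -> float:
    """Longest pause (in seconds) a demand pacemaker tolerates before
    firing, given its programmed lower rate in beats per minute."""
    if lower_rate_bpm <= 0:
        raise ValueError("lower_rate_bpm must be positive")
    return 60.0 / lower_rate_bpm


for rate in (30, 60):
    print(rate, "bpm ->", escape_interval_seconds(rate), "s")
# 30 bpm -> 2.0 s
# 60 bpm -> 1.0 s
```

At a typical resting heart rate of 60 to 100 beats per minute, the gap between beats is 0.6 to 1 second, well short of the two-second threshold.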
Several key metrics were recorded at baseline. The New York Heart Association (NYHA) functional classification is a scale to measure the extent of the patient’s heart failure, ranging from one (you have no symptoms and no limitation on physical activity) to four (severe limitation, symptoms even at rest, usually bedridden).
Another metric was the left ventricular outflow tract (LVOT) gradient, a measure of the pressure difference between the left ventricle and the aorta during ejection of blood. In patients with HOCM, a larger gradient is expected, because the thickened heart muscle narrows the passage through which the same volume of blood must be forced.
Linde also recorded the time in minutes each patient could tolerate exercise, their peak oxygen uptake, their peak heart rate during exercise, and systolic anterior motion, an abnormal movement of the mitral valve.
Three months after implantation, the patients were brought back and the measurements were repeated. Linde reported that patients with inactive pacemakers saw a statistically significant decrease in their LVOT gradient.
Pacemakers improve congestive cardiac failure after they’ve been put in, but before they’ve been switched on!
Of course, there is far more nuance to this study than that.
As we have touched upon in previous articles, statistical significance is typically assessed using a p-value, or probability value. This is a number between zero and one which expresses the probability of getting results at least as extreme as those observed if there were no true effect.
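To make the definition concrete, here is a minimal sketch of one way to compute a p-value: an exact permutation test on the difference between two group means. If there were no true effect, the group labels would be arbitrary, so the p-value is the share of all possible relabellings that produce a difference at least as large as the one observed. This is an illustration only; it is not necessarily the test Linde et al. used, and the function name and numbers are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Returns the fraction of all ways of splitting the pooled data into
    groups of the original sizes whose absolute mean difference is at
    least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance so floating-point ties still count as "as extreme"
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


print(exact_permutation_p_value([1, 2, 3], [10, 11, 12]))
# 0.1
```

Only 2 of the 20 ways of splitting those six values into groups of three give a gap as large as the observed one. Enumerating every split quickly becomes impractical for larger samples, where random resampling or a standard parametric test is used instead.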
By convention, scientists have collectively agreed that the threshold for what is considered ‘significant’ should be 0.05, or 5%. While this is a widely accepted convention, it is ultimately arbitrary. Some researchers advocate for a stricter threshold, moving it from 0.05 (5%) to 0.01 (1%) or even to 0.005 (0.5%).
The LVOT gradient change in the placebo group reached statistical significance with a p-value of 0.04. Although this meets the conventional threshold, it is marginal and would not qualify as significant under either of the stricter criteria. By contrast, the change in LVOT gradient with the active pacemaker had a p-value of less than 0.0001: below one hundredth of one percent!
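Checking the reported placebo-group p-value against each of the thresholds mentioned above makes the point explicit. A minimal sketch; it uses the common ‘p < α’ rule (some authors use ‘≤’, which makes no difference here), and the function name is illustrative:

```python
def significant_at(p_value: float, thresholds=(0.05, 0.01, 0.005)) -> dict:
    """Map each significance threshold (alpha) to whether p_value passes it."""
    return {alpha: p_value < alpha for alpha in thresholds}


# LVOT gradient change in the inactive (placebo) pacemaker group
print(significant_at(0.04))
# {0.05: True, 0.01: False, 0.005: False}
```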
Furthermore, the other measurements don’t support the claim that this was a real clinical improvement. Exercise tolerance and peak heart rate improved significantly in the active pacemaker group but showed no improvement in the placebo group. There was also no change in NYHA functional class for the placebo group, meaning these patients did not experience an improvement in their overall heart failure symptoms. In contrast, the active pacemaker group improved by almost a full class on average, from 2.6 to 1.7.
In total, of the six objective measures assessed, five showed no significant change for the placebo pacemaker group. The only measure that showed an effect, LVOT gradient, was borderline significant, and is likely spurious.
A p-value is meaningful for a single outcome measure, but every additional measure is another opportunity to record a fluke finding by chance. Measure two things and you’re nearly twice as likely to find something ‘significant’; measure three, and you’re nearly three times as likely, and so on. If Linde’s figures were correctly adjusted to account for the many different outcomes recorded, the LVOT gradient change would no longer reach significance, and would be indistinguishable from random noise.
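The ‘twice as likely’ rule of thumb can be checked directly. Assuming the tests are independent and there is no true effect anywhere, the chance of at least one false positive among k tests at threshold α is 1 − (1 − α)^k. The Bonferroni correction, one standard (and deliberately conservative) adjustment, simply divides α by k. The sketch below assumes the six objective outcomes counted above; Linde’s outcomes are unlikely to be fully independent, so treat the figures as approximate.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one false positive across k independent
    tests, each at threshold alpha, when no true effect exists."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-test threshold that keeps the familywise error rate <= alpha."""
    return alpha / k


for k in (1, 2, 3, 6):
    print(k, "outcomes:", round(familywise_error_rate(0.05, k), 4))

threshold = bonferroni_threshold(0.05, 6)
print("Bonferroni threshold:", round(threshold, 4))
print("p = 0.04 significant after correction?", 0.04 < threshold)
```

At α = 0.05, the chance of at least one fluke is 5% for one outcome, roughly 9.8% for two, 14.3% for three and 26.5% for six. With six outcomes, the Bonferroni threshold is about 0.0083, which a p-value of 0.04 does not meet.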
Alongside the objective measurements described so far, Linde also recorded several subjective ones, such as chest pain, dizziness, and palpitations. As we have discussed elsewhere, subjective measurements must be interpreted cautiously because they are vulnerable to bias. Subject expectancy effects, polite answers, and other forms of response bias can lead patients to report changes that don’t reflect any real-world difference. Bearing this in mind, it is notable that of the 14 subjective outcomes measured, only five showed a significant effect in the placebo group, and all but one of these (palpitations) disappeared once the results were adjusted for the number of outcomes measured in the study.
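The article doesn’t specify which adjustment was applied. One widely used option is the Holm-Bonferroni step-down procedure, which controls the same familywise error rate as plain Bonferroni but is less conservative. A minimal sketch; the function name is illustrative and the example p-values are arbitrary, not taken from Linde’s paper:

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns one boolean per p-value (in the original order): True if that
    hypothesis is rejected while holding the familywise error rate at alpha.
    """
    m = len(p_values)
    # Indices of the p-values, smallest first
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The smallest p-value faces alpha/m, the next alpha/(m-1), and so on
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            # Stop at the first failure: all larger p-values are retained too
            break
    return reject


print(holm_reject([0.01, 0.04, 0.03]))
# [True, False, False]
```

Here all three p-values would pass an unadjusted 0.05 threshold, but only the smallest survives the adjustment.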
Another problematic aspect of this study was that three patients had their placebo pacemakers reconfigured for ‘active’ pacing partway through the study, because they complained to their doctors that the treatment wasn’t working. The paper doesn’t make clear what happened to the data from these patients. Was it removed from the analysis altogether? It doesn’t appear to have been, as the size of the inactive pacing group is unchanged at the end. Were the patients assessed early and their data included anyway? No such early assessment is mentioned. Unfortunately, either approach risks skewing the data in favour of the placebo effect by de-emphasising the patients who failed to respond to placebo.
In short, Linde’s study does not provide strong evidence for a real therapeutic placebo effect, yet it is the kind of research that continues to be cited as evidence for the power of placebo. The reported effects are vanishingly small, and their clinical utility is dubious. I remain sceptical.