Hotels and houseplants: why we should doubt Ellen Langer’s mind-over-matter miracles

Author

Mike Hall (https://mikehall314.bsky.social/)
Mike Hall is a software engineer and Doctor Who fan, not in that order. He is the producer and host of the long-running podcast Skeptics with a K, part of the organising committee for the award-winning skeptical conference QED, and on the board of the Merseyside Skeptics Society.


In a previous article for The Skeptic from January 2022, we discussed a paper titled ‘Mind-set matters: exercise and the placebo effect,’ which proposed that merely believing their work was good exercise led to significant health improvements among hotel room attendants. This 2007 study has been widely cited, both within academia and popular science literature.

For the study, eighty-four female room attendants from seven Boston-area hotels were divided into two groups. One group was informed that their daily tasks met the Surgeon General’s recommendations for a healthy and active lifestyle, while the other group was not. Four weeks later, the informed group showed improved physiological markers, such as lower weight, BMI, body fat percentage, and blood pressure, apparently without any additional exercise or dietary changes. The study concluded that perceived exercise could result in physiological improvements.

There are good reasons to be skeptical of these conclusions, including the small sample size and loose controls. Most notably, an attempted replication in 2011 yielded no significant differences between the informed and control groups.

More recently, one of the authors of this paper, the Harvard psychologist Ellen Langer, was interviewed for a podcast published by Freakonomics. During the interview, Langer spoke about the room attendant study, and it was disappointing that neither she nor the host, Steven Levitt, referenced the failed replication. However, references were made to several other studies Langer was involved in.

Langer’s most famous work is ‘Long Term Effects of a Control-Relevant Intervention with the Institutionalised Aged’, published in the Journal of Personality and Social Psychology in 1977. Ninety-one nursing home residents were split into two groups. One group was given a lecture about personal responsibility, and to illustrate this were each given a houseplant to look after. The other group were also given houseplants, but received no such lecture and were told the nursing staff would tend to the plants for them.

Image of a houseplant. Image by Leonardo Iheme from Pixabay.

After eighteen months, the study reported that participants who had cared for the houseplants were happier and more engaged, according to both self-reports and assessments from the resident nurses. But the key finding was that people who looked after houseplants had significantly lower mortality than those whose plants were watered for them.

As with the hotel room attendant study, there are serious issues with this work, not least the randomisation technique. Rather than randomising individuals, the researchers randomised by floor, so everyone on one floor was put into the intervention group and everyone on the other floor was in the control group. This was done ostensibly to prevent participants from chatting with their neighbours and discovering that other people were given a different protocol, but randomising by floor could introduce significant confounders. Maybe residents on higher floors are fitter and healthier than those on lower floors, and that’s why they’re on the upper floors: they can get up the stairs. Maybe one floor is nicer, or cleaner. Maybe the windows open. Maybe they get more sunshine. Maybe the staff assigned to those floors are nicer, or better trained, or more diligent.

Similar randomisation issues were present in the hotel room attendant study. There too, participants were randomised not as individuals but by hotel: all the attendants in four of the hotels were put into one group, and all the attendants from the remaining three hotels into the other. The hotels were randomly assigned, but the participants were not. Again, this was done to stop the housekeepers chatting amongst themselves and discovering they had been told different things, but it introduces an uncontrolled source of bias into the data. It’s not inconceivable that something about one or more of the hotels could mean attendants working there get more or less exercise.
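To illustrate why randomising by hotel matters, here is a minimal simulation sketch. It is not the study's analysis. It assumes seven hotels with twelve attendants each (84 divided evenly, which the paper may not match), no true effect of the intervention, and a hypothetical `hotel_sd` parameter for how much hotels differ from one another. It then applies a naive test that treats every attendant as an independent data point and counts how often that test reports a 'significant' difference.

```python
import random
import statistics

def naive_false_positive_rate(hotel_sd, n_sims=2000, seed=0):
    """Share of simulated studies with NO true effect that a naive
    individual-level z-test still flags as significant (|z| > 1.96)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Randomise hotels, not people: 4 hotels informed, 3 control.
        hotels = list(range(7))
        rng.shuffle(hotels)
        treated = set(hotels[:4])
        informed, control = [], []
        for h in range(7):
            # Hypothetical hotel-wide shift shared by everyone working there.
            shift = rng.gauss(0, hotel_sd)
            group = informed if h in treated else control
            # Assumed 12 attendants per hotel, individual noise with sd 1.
            group.extend(shift + rng.gauss(0, 1) for _ in range(12))
        # Naive standard error that ignores the clustering by hotel.
        se = (statistics.variance(informed) / len(informed)
              + statistics.variance(control) / len(control)) ** 0.5
        z = (statistics.mean(informed) - statistics.mean(control)) / se
        hits += abs(z) > 1.96
    return hits / n_sims

print("hotels identical:", naive_false_positive_rate(hotel_sd=0.0))
print("hotels differ:   ", naive_false_positive_rate(hotel_sd=1.0))
```

Under these assumptions, the rate with identical hotels should sit near the nominal 5%, while the rate with differing hotels should come out far higher. Differences between hotels can masquerade as an effect of the intervention when the analysis treats attendants as if they had been individually randomised.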

Besides randomisation, there are other problems with the houseplant study. The sample size is small, for example, with just ninety-one participants. But the major problem is that the calculations used to demonstrate the significance of this finding are just flat wrong. A year after the houseplant study was published, a short correction was issued:

The z-score should be changed from z = 3.14 to z = 1.73. The outcome is therefore only marginally significant and a more cautious interpretation of the mortality findings than originally given is necessary.

The z-score here is a test statistic: it measures how far the observed difference in mortality lies from what we would expect if there were no effect, in units of standard error. The original paper incorrectly gave it as 3.14, so over three standard errors away. The correction gives a new score of 1.73. This is described as meaning the findings are ‘only marginally significant’, but a z-score of 1.73 falls short of the 1.96 needed to reach the conventional threshold for significance (p < 0.05, two-tailed). A difference of this size is within what we might expect to see from random chance; the study does not demonstrate an effect on mortality.
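For readers who want to check the arithmetic, the sketch below converts each z-score into a p-value. It assumes a two-tailed test against a standard normal distribution, which is the usual convention but is not stated in the quoted correction; a one-tailed test would halve these values. The helper name `two_tailed_p` is my own.

```python
import math

def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a z-score under a standard normal null.

    Assumes a two-tailed test; a one-tailed p-value would be half this.
    """
    return math.erfc(abs(z) / math.sqrt(2))

for label, z in [("original", 3.14), ("corrected", 1.73)]:
    print(f"{label}: z = {z}, p = {two_tailed_p(z):.4f}")
```

The original figure corresponds to a p-value of roughly 0.002, and the corrected figure to roughly 0.08, which is above the conventional 0.05 cut-off.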

Despite this, the original paper has been cited over 900 times, and the correction possibly as few as three. Disappointing too is the fact that, while the houseplant study is mentioned in Langer’s Freakonomics interview, the subsequent correction and revelation that this was not a statistically significant effect was not brought up.

Another famous study of Langer’s is the so-called ‘Counterclockwise’ study. Two small groups of elderly men – in their 70s or 80s – were taken to a retreat, which was meticulously set up to resemble 1959. This experiment was done in 1979, so 1959 was twenty years earlier at the time.

They were surrounded by technology, newspapers, and music from the 1950s. One group was encouraged to reminisce about their lives in the 50s, but the other was encouraged not to simply reminisce but also to behave as if it really were 1959. They should speak about 1959 in the present tense, and they were asked to write autobiographies about their lives in the 50s, in the present tense.

This experiment supposedly found that the participants who behaved as if it were 1959 showed improved hearing, memory, and dexterity, and even took on a younger appearance. Surprisingly, this widely cited and influential work was never published in a peer-reviewed journal; the findings were instead reported in Langer’s 2009 book ‘Counterclockwise’. Nor, despite the intriguing results, has the study been replicated. In 2019, a protocol for a replication was published, but at the time of writing the results are still pending.

Another Langer study, this one from 2016, claimed that time perception influenced blood sugar levels in people with type 2 diabetes. Forty-seven participants were randomised into three groups: Fast, Normal, and Slow. Those in the Normal group were put into a room where they were asked to play simple video games. Blood glucose levels were measured before they started playing, and again after ninety minutes.

Patients in the Slow group were asked to do the same, but in this case the clock in the room was rigged to run at half-speed. When their blood was taken after ninety minutes, they believed they had only been playing for forty-five. The Fast group did the same thing, but with their clock rigged to run at double speed, so when their blood was taken they believed they had been playing for three hours.

The results appear to show that patients’ blood glucose levels dropped in response to perceived time, not actual time. Patients in the Slow group, who believed only forty-five minutes had passed, had a much smaller decrease in blood glucose than those in the Normal or Fast groups.

Unfortunately, this study shares several of the weaknesses we have seen in Langer’s other work. The sample is very small: forty-seven participants split over three groups. Such small numbers mean that just one or two outliers could skew the results one way or the other. The paper also indicates the participants were blinded, but doesn’t say if the researchers were. Presumably not, otherwise this would have been mentioned.

Hands holding a video game controller. Image by Anton Porsche from Pixabay.

Another possible confounder is stress, with one group feeling like they had been there for an eternity, and the other surprised their task is whizzing by. Stress modifies blood glucose, and the relative differences in stress across the three groups could plausibly account for the effect seen. There are efforts in the protocol to control for this, by asking the participants if they’re feeling stressed, which they say that they’re not. But they don’t look for biomarkers of stress, like serum cortisol levels, which may have told a different story.

If this is a true effect, there is surely some intermediate factor accounting for this, rather than the biologically implausible claim that glucose metabolism is directly driven by time perception. No independent replication has been published.

Another time perception study by Langer was published in Scientific Reports, a Nature Portfolio journal, in 2023. Thirty-three participants were given standardised bruises by applying suction to their skin. They were then split, as before, into Fast, Normal, and Slow groups. The study claims that wound healing aligned with perceived time rather than true clock time. However, the same methodological issues remain, such as small sample size and lack of blinding.

There are also reasonable questions about how much a bruise would be expected to heal over the course of twenty-eight minutes, the true clock duration of this study. The healing assessments in this case were done by a combination of self-reports and non-expert assessment using a crowd-sourcing platform. Like the blood glucose study, this effect has not been replicated – though in fairness it was published only a few months ago.

Perhaps of even greater concern is the work Langer has not published. In an interview for the New York Times in 2014, Langer spoke about applying her ‘Counterclockwise’ method to cancer patients. Her proposed study would have twenty-four women with Stage 4 breast cancer taken to a private resort in Mexico, outfitted to resemble 2003. The women would be encouraged to behave as if it were 2003, as if they did not have cancer, with the environment around them designed to reinforce that. After a week, measurements would be taken to see if their tumours had shrunk, and to check for other biological markers of cancer.

I don’t know if this study ever went ahead. According to the New York Times, the study was set to commence in spring of 2015 – but at the time of writing no results have been published.

Other unpublished studies include one in which Langer claims to have found that breast cancer survivors who describe themselves as ‘in remission’ showed poorer health than those who described themselves as ‘cured’. Another study, described as being ‘in progress’ in the New York Times interview, asks if mindfulness can stem the progression of prostate cancer.

While Langer’s work has been influential and is often cited, there are significant concerns about the validity of her findings. These include spurious effects stemming from statistical errors, sensational findings that either haven’t been replicated or fail to replicate, biologically implausible results without clear mechanisms, loose controls, and very small sample sizes. As ground-breaking as this work appears to be, we should approach any such findings with skepticism, at least until they are consistently validated and independently replicated using more robust methodology.
