A review of 40 studies that evaluated mental health apps found that they all reported positive user-engagement scores—an unusual finding given that health apps are known to have problems keeping users engaged. Underlying problems noted in the review, published March 27 in Psychiatric Services in Advance, were that each study used a different set of subjective and/or objective measures, and none used consistent benchmarks to define a “positive” user experience.
“As with a medication, we need to make sure mobile apps are tolerable before we recommend them to a patient,” said John Torous, M.D., director of the Digital Psychiatry Division at Harvard-affiliated Beth Israel Deaconess Medical Center and a co-author of the study. Digital “tolerability” refers to whether an app is easy to use and engaging enough to be used repeatedly. These findings indicate that app developers each have their own idea of what constitutes usability, he said.
As Torous and his colleagues wrote in the article, “This lack of consensus makes it difficult to compare results across studies, hinders understanding of what makes apps engaging for different users, and limits their real-world uptake.”
Of the 40 studies in the analysis, nine evaluated mobile apps for depression, four for bipolar disorder, seven for schizophrenia, seven for anxiety, and 13 for multiple psychiatric disorders. The studies were selected because they all reported user-engagement indicators (UEIs), a variety of measures describing the degree to which users find an app easy to use and engaging.
All of the studies reported that their app had a positive UEI rating. Of these, 15 studies used only subjective data (such as participant surveys or interviews), four used only objective data (such as verified number of login sessions), and 21 used a combination of measures.
“It is concerning that 15 of the 40 (38%) studies concluded that their app had positive UEIs without considering objective data,” Torous and colleagues wrote.
“Qualitative data are unquestionably valuable for creating a fuller, more nuanced picture of participants. ... However, there is also a need for objective measurements that can be reproduced to validate initial results and create a baseline for generalizing results of any single study.”
A problem with the studies that used objective data, however, was that most (20 of 25) did not set thresholds for a good score in advance; all analyses were retrospective.
Of the studies that included both subjective and objective measures, many set low thresholds for a positive UEI rating. For example, one study considered a user-satisfaction score of 60% to be sufficient, while another required app users to complete only one-third of their assigned tasks in a week.
In addition to low thresholds within individual studies, thresholds were inconsistent across studies. For example, frequency of usage was a common objective marker, but acceptable usage rates varied from once a day to just a few times a month.
Torous acknowledged that each of these 40 mental health apps was developed for a different purpose; therefore, some variation is expected. Still, he believes it is possible to develop some usability standards to make comparisons and evaluations easier and more reliable.
This study was funded by National Institutes of Health career development awards given to Torous and study co-author Mia Minen, M.D. ■
“User Engagement in Mental Health Apps: A Review of Measurement, Reporting, and Validity” can be accessed here.