A shortage of behavioral health providers is likely to persist for the foreseeable future. A 2016 U.S. Department of Health and Human Services projection assessed workforce gaps in all mental health professions, estimating that more than 10,000 psychiatrists, psychologists, social workers, and counselors will be needed for 2025 and beyond (1). Since that projection, mental health needs have only burgeoned, and the disparities in access to care have been exacerbated (2).
With looming workforce shortages and an existing lack of capacity to meet needs, it is useful to explore alternative approaches to providing mental health support. Such alternative pathways can allow professional clinicians to refocus on the patients who most need their assistance (3). In this context, evidence has accumulated on the effectiveness of mental health support provided by a range of paraprofessionals. In the United States, these paraprofessionals include community health workers, peer support specialists, and lay counselors (3, 5); training requirements range from a few hours to months, and some positions require certification. Responsibilities range from helping patients navigate the health care system to directly delivering elements of cognitive-behavioral therapy (CBT).
In 2020, in response to the COVID-19 pandemic, we implemented an empathetic relational program to address loneliness through 4 weeks of telephone calls to clients of our local Meals on Wheels America (MoW) service. The program emphasized caller empathy and curiosity and structurally prioritized participants’ wishes regarding when and how often to be called. At the end of 4 weeks, we assessed the results of our randomized controlled trial. As we had hypothesized, we observed statistically significant improvements in loneliness, relative to control participants, and (notably) also in symptoms of depression and anxiety and in overall mental health (6).
As clients of MoW, the participants of the original study had food insecurity and low income and, at some point, had been homebound, as required to qualify for MoW services. No inclusion criteria regarding mental health were set. Through our baseline assessments, we found that the sample studied was lonely, on average, but not depressed. We wondered how much of an improvement we might see if the population studied had depression. The answer to that question might help us understand the program’s potential to augment our health system’s capacity to address clinically relevant mental health needs. Therefore, the sample for this post hoc analysis consisted of the subset of participants from the original sample who showed clinically relevant symptoms of depression at baseline. In this report, we assess the degree of improvement in symptoms of depression and other mental health outcomes among this participant subsample.
Methods
For the parent randomized controlled trial (6) (July 6–September 24, 2020), we recruited 240 people from the clients of the local MoW. Of these participants, 190 (79%) were women, 135 lived alone (56%), and 194 (81%) were single, divorced, separated, or widowed. The participants ranged in age from 27 to 101, with an average age of 69. The study was approved by the institutional review board of the University of Texas at Austin.
The program was implemented by 16 callers, ages 17–23, who volunteered their time but were paid a stipend after the trial concluded. Callers received 2 hours of training, including watching a video of a 15-minute role-play to heighten awareness of how to listen and ask questions about topics that participants mentioned (e.g., a neighbor or a meal). In addition, callers were oriented toward a goal of learning as much as they could about the person they were calling.
Participants were recruited from a list of potentially interested participants developed by MoW, provided consent over the telephone, and were randomly assigned to a study arm, via REDCap software, by the biostatistician (N.A.). Psychological assessments were conducted over the telephone by research associates, who were not the intervention callers and who were blind to allocation arm. Depression was measured by the eight-item Patient Health Questionnaire (PHQ-8). Anxiety was measured by the seven-item Generalized Anxiety Disorder (GAD-7) assessment. Overall mental health was assessed via the 12-Item Short-Form Health Survey mental component summary (SF-12 MCS). Perceived loneliness or isolation was assessed with the three-item University of California, Los Angeles loneliness scale (UCLA) (7) and via the six-item De Jong Gierveld loneliness scale (De Jong) (8). Isolation was measured by the Lubben Social Network Scale (LSNS). All measures were administered at baseline (before calls began) and at postintervention (after 4 weeks of calls).
Each caller had a panel of 6–9 participants in the intervention arm, with each participant (N=120) assigned to a consistent caller in the order they were recruited to the study. The program began with each participant in the intervention arm receiving one call per weekday for the first week (5 calls) at the time the participant preferred. At the end of the first week, the callers asked these participants whether they wanted to continue with a daily call or reduce the frequency of calls; 70 of 120 (58%) stayed at five calls per week, and the rest reduced the number of calls to two or three per week. Participants randomly assigned to the control arm received only assessment calls (at baseline and 4 weeks) from research associates who were distinct from the trained intervention callers.
Data were analyzed from a subsample (N=58) of intervention- and control-arm participants whose baseline PHQ-8 scores were ≥10. In the intervention arm, 28 of 120 (23%) participants had baseline symptoms of depression. In the control arm, 30 of 120 (25%) had baseline symptoms of depression. These 58 participants did not differ by gender, age, or racial-ethnic identity between treatment arms. For this subsample, time between the baseline and postintervention assessments averaged 31.7 days (range 29–34 days). Mixed linear regressions, with random intercepts for all outcomes relevant to sense of well-being, were used to examine whether the group × time interaction was significant among this subset of participants. These outcomes consisted of scores on the PHQ-8, GAD-7, UCLA, De Jong, LSNS, and SF-12 MCS. Cohen’s d was calculated to estimate effect sizes.
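The subsample inclusion rule described above (baseline PHQ-8 score ≥10) can be illustrated with a minimal Python sketch. It assumes the standard PHQ-8 scoring, in which each of the eight items is rated 0 to 3 and the total ranges from 0 to 24. The function and constant names are hypothetical and are not taken from the study's analysis code.

```python
# Minimal sketch of the subsample inclusion rule (baseline PHQ-8 >= 10).
# Assumption: standard PHQ-8 scoring (eight items, each rated 0-3, total 0-24).
# All names here are illustrative, not the authors' code.

PHQ8_ITEM_COUNT = 8
PHQ8_CUTOFF = 10  # threshold used to define the subsample in this report


def phq8_total(item_scores):
    """Sum eight PHQ-8 item ratings after checking that each is 0, 1, 2, or 3."""
    if len(item_scores) != PHQ8_ITEM_COUNT:
        raise ValueError("PHQ-8 requires exactly eight item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each PHQ-8 item must be rated 0, 1, 2, or 3")
    return sum(item_scores)


def meets_inclusion_rule(item_scores):
    """Return True when the total score is at or above the cutoff of 10."""
    return phq8_total(item_scores) >= PHQ8_CUTOFF


# Hypothetical respondents, for illustration only.
print(meets_inclusion_rule([2, 2, 1, 1, 1, 1, 1, 1]))  # total 10
print(meets_inclusion_rule([1, 1, 1, 1, 1, 1, 1, 0]))  # total 7
```

Under this rule, a total of exactly 10 qualifies a participant for the subsample, and a total of 9 does not.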
Results
Retention among participants with baseline symptoms of depression was excellent, with 26 of 28 in the intervention arm and all 30 in the control arm providing data at the 4-week assessment. Gender and self-reported race-ethnicity distributions in the subset of participants included in the current analysis were similar to those of the whole sample from the parent trial. Participants in the current analysis were on average younger (mean±SD=63±11 years) than those excluded (mean±SD=71±12 years, p<0.001). Among the participants who provided postintervention assessment data, 45 of 56 (80%) were women; 35 lived alone (63%); and 48 (86%) were single, separated, widowed, or divorced.
Table 1 shows the results of our analysis, including means, standard errors, and 95% CIs, before and after the intervention for the subsample with baseline depressive symptoms, along with p values for the interaction of time × group from mixed linear regressions. We found that symptoms of depression, as assessed by the PHQ-8, improved for participants in the intervention arm (mean±SD=13.0±2.6 at baseline and 9.2±3.0 postintervention, mean difference=3.8, 95% CI=2.9–4.7) compared with participants in the control arm (mean±SD=13.6±2.9 at baseline and 12.3±4.6 postintervention, mean difference=1.3, 95% CI=0.1–2.5) (p=0.013). In addition, for participants in the intervention arm, loneliness also improved, as measured by the three-item UCLA (mean difference=1.5, 95% CI=1.1–1.9, p<0.001) and the six-item De Jong (mean difference=0.6, 95% CI=0.3–1.0, p=0.02) assessments, compared with control participants. Anxiety, as measured by the GAD-7, and general mental health, as measured by the SF-12 MCS, did not show statistically significant time × group interactions. In contrast to subjective improvements in perceived isolation or loneliness, objective isolation, as measured by the LSNS, did not change.
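As a rough check on what the time × group interaction captures, the contrast can be approximated from the reported PHQ-8 means as a difference in mean improvement between arms. The Python sketch below is illustrative arithmetic only, not the mixed linear regression itself. With two time points and complete data, the interaction estimate from a random-intercept model should match this difference; with the small amount of missing follow-up data in this subsample, the model estimate may differ slightly. Function and variable names are hypothetical.

```python
# Illustrative arithmetic only: approximates the time x group contrast for the
# PHQ-8 from the reported arm means. This is not the authors' mixed model.

phq8_means = {
    "intervention": {"baseline": 13.0, "post": 9.2},
    "control": {"baseline": 13.6, "post": 12.3},
}


def mean_improvement(arm_means):
    """Baseline minus postintervention mean (positive = fewer symptoms)."""
    return round(arm_means["baseline"] - arm_means["post"], 1)


def difference_in_improvement(means):
    """Intervention-arm improvement minus control-arm improvement."""
    gap = mean_improvement(means["intervention"]) - mean_improvement(means["control"])
    return round(gap, 1)


print(mean_improvement(phq8_means["intervention"]))  # reported as 3.8
print(mean_improvement(phq8_means["control"]))       # reported as 1.3
print(difference_in_improvement(phq8_means))
```

From the reported means, the intervention arm improved by about 2.5 PHQ-8 points more than the control arm.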
Postintervention effect sizes between the intervention and control arms were 0.77, 0.81, and 0.87 for the PHQ-8, UCLA, and De Jong scores, respectively. Within-subject correlations over time were 0.39, 0.58, and 0.73, respectively, for the same three instruments. With the use of PASS 2022 software (9), we estimated the observed power for mixed-model tests for the fixed slope difference, in a two-level hierarchical design with random intercepts, as 0.74, 0.91, and 0.99 for the PHQ-8, UCLA, and De Jong scores, respectively.
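For readers who want to see how a postintervention effect size of this kind can be derived, the Python sketch below computes Cohen's d with a pooled standard deviation from the rounded PHQ-8 summary values reported in the text (control: 12.3±4.6, N=30; intervention: 9.2±3.0, N=26). The pooled-SD form is an assumption; the exact variant of Cohen's d used in the analysis is not specified here, so the result is expected to approximate the reported value of 0.77 rather than reproduce it exactly.

```python
import math

# Sketch of Cohen's d with a pooled standard deviation.
# Assumption: the pooled-SD form; the report does not state which variant
# of Cohen's d was used. Inputs are the rounded values reported in the text.


def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled standard deviation of two independent groups."""
    weighted_variance = (n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2
    return math.sqrt(weighted_variance / (n_a + n_b - 2))


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (group a minus group b)."""
    return (mean_a - mean_b) / pooled_sd(sd_a, n_a, sd_b, n_b)


# Postintervention PHQ-8: control arm (a) versus intervention arm (b).
d_phq8 = cohens_d(12.3, 4.6, 30, 9.2, 3.0, 26)
print(round(d_phq8, 2))
```

With these rounded inputs the estimate comes out near 0.79, close to the reported 0.77. The small gap is plausibly due to rounding of the published means and SDs or to a different pooling choice.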
Discussion
The findings from this post hoc subgroup analysis showed that after 4 weeks, the intervention of layperson-delivered empathetic telephone calls meaningfully decreased symptoms of depression and loneliness among participants who had clinically relevant symptoms of depression at baseline. Anxiety, general mental health, and social isolation did not change for this subgroup.
We observed large effect sizes of 0.77 (PHQ-8) and 0.81 (UCLA) for the subsample of people with depressive symptoms at baseline, with corresponding observed power values that were adequate or better (0.74 for the PHQ-8 and 0.91 for the UCLA). By contrast, the effect sizes among the original sample, who were not selected for baseline symptoms of any mental health condition, were in the moderate range (0.31 for the PHQ-8 and 0.48 for the UCLA). Anxiety symptoms did not improve for this subsample, although they did for the original sample. On the other hand, similar to symptoms of depression, loneliness scores showed larger effects in the subsample than in the original sample, reinforcing the relationship between loneliness and depression (10). Finally, social isolation, measured as the number and quality of connections with family and friends (11), remained unchanged, as it had in the original sample. The introduction of a layperson caller into participants’ networks was insufficient to influence this overall score.
The effect size of 0.77 on the PHQ-8 score that we observed compared well with those reported for psychotherapy among adults with symptoms of clinical depression. For example, a systematic review of trials of psychotherapy for depression (12) reported a mean effect size of 0.71 for adults >55 years of age. Other researchers studying psychotherapy (i.e., elements of CBT) delivered by layperson providers have also reported equivalent effect sizes. For example, in a MoW population similar to the population in the current study, “lay” counselors with 50 hours of training in delivering behavioral activation therapy were compared with clinicians delivering problem-solving therapy; effect sizes of 0.61 and 1.00 relative to “attention control participants” (i.e., those who received general check-in calls from research assistants) were observed (13). The time course of the trial was similar, and effects were also seen at 12 weeks after the start of the 5-week program. A smaller trial (14) showed similar promising results: participants benefited from peers who had received 8 hours of training in delivering behavioral activation therapy. Such similar effect sizes from layperson providers with brief training requirements hold promise for health care systems that currently cannot meet existing needs.
The program we reported on had the benefit of volunteer callers, offering empirical support for the work of many organizations, from religious institutions to student organizations, that coordinate volunteers to relieve loneliness among community members. In health care systems, however, opportunities to employ a workforce of carefully selected and trained layperson providers could create a large pool of culturally aligned callers (5, 16). Stepped care models have been proposed that use layperson providers to treat people experiencing mild and moderate mental health challenges, with treatment of nonresponders and those with worse conditions escalated to receive care from physicians or other clinicians (3). Such services would not only address mental health workforce gaps but also might feel less threatening to potential clients and bring more people into care (17). The relational model used with layperson callers provided frequent contact, allowing the caller to detect participant changes in affect; future models could be developed to support transitions to clinician care as necessary.
Limitations of this investigation included its design as a post hoc analysis that assessed a small, specific population. Results need to be confirmed on a larger scale. The 4-week protocol could also be extended to better understand the incremental benefit of more time in the program. Finally, our study was unable to elucidate mechanisms of action or moderators of effect. A longer time course and other designs may shed light on whether, for example, improvement in loneliness enabled the improvement in depression.
Conclusions
Empathetic callers such as those in our program—as well as layperson providers who deliver CBT, a role that is currently being tested—have the potential to improve the mental health of patients and to reduce the burden on health care systems. To accelerate the potential of these layperson roles, the mental health field needs to continue to replicate and refine programs. With the urgent need to address mental health workforce shortages, however, mental health care would also benefit from parallel acceleration of the systemic changes needed to sustain programs that have been proven to be effective. These changes include development of payment models that allow for flexibility in staffing and program implementation, metrics that help set the bar for quality, and protocols for clinical coordination. The opportunity exists to expand health care systems to better meet the treatment needs of patients, while clearing the way for clinicians to focus on those who need them most.