Using longitudinally collected patient-reported outcome measures (PROMs) for patients receiving care from a large national behavioral health care system—where patient engagement is strongly encouraged—we aimed to assess the strength of improvement in self-perceived recovery (hereafter, recovery) during patients’ treatment episodes via scores on the 41-item Recovery Assessment Scale (RAS) collected at admission and discharge in analyses that accounted for clinical and patient characteristics. We hypothesized that RAS scores would improve during treatment, with more substantial improvement expected among patients with lower initial scores because of the greater potential for improvement in this group.
Methods
This study was based on longitudinal information obtained from electronic health records (EHRs) of patients receiving care at Discovery Behavioral Health (DBH) in 2021–2022. DBH is a large U.S. population–based behavioral health care system, serving >10,000 patients in 16 states across >150 facilities annually. DBH offers mental health services for all ages in a care continuum of residential treatment centers (RTCs), partial hospitalization (PHP), and intensive outpatient programs (IOPs).
To assess underlying disease severity, we restricted the cohort to those with a recorded RAS score and a Patient Health Questionnaire–9 (PHQ-9) score (to assess depression severity) at admission. To evaluate changes in recovery during a treatment episode, the final cohort was further restricted to patients with a recorded RAS score at discharge. (Clinical and other patient characteristics of the full data set are shown in the online supplement to this report.)
The RAS is a self-report quantitative instrument designed to measure various recovery-related topics. The 41-item RAS, as implemented at DBH, consists of 41 statements, which are each rated by patients on a 5-point Likert scale (ranging from 1, strongly disagree, to 5, strongly agree), with total scores ranging from 41 to 205 and lower scores indicating less recovery progress in the area assessed. The RAS was developed for patients with severe mental disorders and has shown good test-retest reliability (r=0.88) and internal consistency (Cronbach’s α=0.93) (1, 4, 10). The 41 statements are often grouped into five domains: personal confidence and hope, willingness to ask for help, goal and success orientation, reliance on others, and no domination by symptoms (11). (Definitions of each RAS item and domain are shown in the online supplement.)
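As an illustration only (the study's analyses were run in SAS, and this is not the authors' code), the scoring rule described above can be sketched in Python. The function names are hypothetical; the item count, rating range, and total-score range are taken from the description of the instrument.

```python
N_ITEMS = 41                      # number of RAS statements
MIN_RATING, MAX_RATING = 1, 5     # 5-point Likert scale (1 = strongly disagree, 5 = strongly agree)


def ras_total(ratings):
    """Sum 41 Likert ratings into a total RAS score (possible range 41 to 205)."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    for rating in ratings:
        if not (isinstance(rating, int) and MIN_RATING <= rating <= MAX_RATING):
            raise ValueError(f"rating out of range: {rating!r}")
    return sum(ratings)


def ras_mean_item(ratings):
    """Average response per item, i.e., the total score divided by 41."""
    return ras_total(ratings) / N_ITEMS
```

Because every item must be answered for a total to be computed in this sketch, a low total cannot arise from skipped items.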
We categorized patients into seven groups on the basis of their total RAS score at admission (hereafter, RAS score at admission groups): 41–82, 83–103, 104–123, 124–144, 145–164, 165–185, and 186–205 (which correspond to an average response to each of the 41 items of 1.0–2.0, >2.0–2.5, >2.5–3.0, >3.0–3.5, >3.5–4.0, >4.0–4.5, and >4.5, respectively). (The distribution of total RAS scores at admission in the final cohort is shown in the online supplement.) All patients for whom RAS scores were available provided responses to each of the 41 items. Thus, lower total scores were not driven by nonresponse to individual items. In secondary analyses, we created similar groupings for each RAS domain (based on the average score on each item included in the respective domain).
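The seven admission groups can be expressed as a simple lookup on the total score. The following is a minimal sketch with hypothetical names, using only the cut points listed above; it assumes integer total scores.

```python
from bisect import bisect_left

# Inclusive upper bound of each RAS score at admission group, as listed in the text.
GROUP_UPPER = [82, 103, 123, 144, 164, 185, 205]
GROUP_LABELS = ["41-82", "83-103", "104-123", "124-144",
                "145-164", "165-185", "186-205"]


def admission_group(total_score):
    """Return the label of the admission group that contains an integer total RAS score."""
    if not 41 <= total_score <= 205:
        raise ValueError(f"total RAS score out of range: {total_score}")
    # bisect_left returns the index of the first upper bound >= total_score.
    return GROUP_LABELS[bisect_left(GROUP_UPPER, total_score)]
```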
A broad range of characteristics were assessed, including demographic characteristics (i.e., age, self-reported race, ethnicity, and sex assigned at birth), primary diagnosis, admission PHQ-9 score (total scores range from 0 to 27 contributed by nine items; each item is scored on a 4-point scale, ranging from 0, not at all, to 3, nearly every day, with higher scores indicating greater depression severity), service line (i.e., general mental health, eating disorder, or substance use disorder), program type (RTC or PHP/IOP), number of previous treatment episodes within the study period, treatment duration, and reason for discharge (e.g., against treatment advice or completed treatment).
The distribution of prespecified covariates was compared across the RAS score at admission groups. Linear regression models were used to evaluate the average change in total RAS score from admission to discharge and the average total RAS score at discharge with 95% confidence intervals, for the full cohort and for each RAS score at admission group. In addition to crude analyses, we performed two levels of adjustment. In level 1, we controlled analyses for demographic variables and accounted for patients contributing multiple treatment episodes. In level 2, we additionally adjusted analyses for censoring bias by using inverse probability of censoring weights constructed within the broader cohort with an admission RAS score, with or without discharge RAS score, by using the same baseline covariates. This approach allowed us to address the concern that patients with a discharge RAS score might be different from those without a discharge RAS score (e.g., the latter may have been more likely to discontinue treatment against professional advice because of a lower perceived need for treatment [12, 13]).
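The idea behind inverse probability of censoring weighting can be illustrated with a deliberately simplified sketch. It is not the study's model: here the probability of having a discharge score is assumed to be estimated as the observed fraction within a single hypothetical stratum variable, rather than from a regression on baseline covariates, and clustering of episodes within patients is ignored. All names and example values are hypothetical.

```python
from collections import defaultdict


def ipcw_mean_change(episodes):
    """Weighted mean admission-to-discharge change.

    episodes: iterable of (stratum, admission_score, discharge_score_or_None).
    Each episode with a discharge score gets weight 1 / P(discharge score observed
    | stratum), so it also stands in for similar episodes lacking a discharge score.
    """
    episodes = list(episodes)
    n_total = defaultdict(int)
    n_observed = defaultdict(int)
    for stratum, _admission, discharge in episodes:
        n_total[stratum] += 1
        if discharge is not None:
            n_observed[stratum] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for stratum, admission, discharge in episodes:
        if discharge is None:
            continue  # censored episodes contribute only through the weights
        weight = n_total[stratum] / n_observed[stratum]  # inverse of observed fraction
        weighted_sum += weight * (discharge - admission)
        weight_total += weight

    if weight_total == 0:
        raise ValueError("no episode has a discharge score")
    return weighted_sum / weight_total


# Hypothetical example: stratum "A" loses half of its episodes to censoring.
example = [
    ("A", 100, 110), ("A", 100, 120), ("A", 100, None), ("A", 100, None),
    ("B", 150, 190), ("B", 150, 190),
]
```

In the hypothetical example, the observed stratum "A" episodes are up-weighted, which pulls the weighted mean change below the unweighted mean of the four observed changes.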
To identify heterogeneity in recovery across key patient characteristics, we conducted subgroup analyses separately for children and adolescents versus adults, individual race and ethnicity groups, patients who completed treatment versus those who did not, those with specific PHQ-9 scores at admission (categorized as 0–9, 10–18, and 19–27, corresponding to an average response to each of the nine items of 0–1, >1–2, and >2–3, respectively), those treated in RTC versus PHP or IOP settings within the respective service line (general mental health, eating disorder, or substance use disorder), and patients stratified by treatment duration (≤30 days, 31–60 days, and ≥61 days). Because depression was the most common primary diagnosis among patients in the general mental health service line, and alcohol use disorder was the most common primary diagnosis in the substance use disorder service line (see the online supplement), additional analyses were conducted on subgroups with or without these primary diagnoses in their respective service lines. To further evaluate whether RAS score changes from admission to discharge could be driven by differential treatment duration, we conducted a robustness test with analyses adjusted for treatment length. To account for underlying depression severity, we also adjusted the analyses for total PHQ-9 score at admission. We repeated the primary analyses separately for each RAS domain to study variation in findings by recovery domain. Finally, we conducted analyses to evaluate the potential for regression to the mean (RTM) by using an extension of the algorithm of Mee and Chua (14, 15) with systematic variation of the assumed population mean RAS score at admission (see the online supplement). RTM is a statistical phenomenon in which extreme values on a first measurement (e.g., low RAS scores at admission) tend to be closer to the average on subsequent measurements (e.g., RAS scores at discharge), which may create the impression of a significant change even in the absence of an actual effect.
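To make the RTM concept concrete, the sketch below uses the standard textbook expression for the expected change attributable to RTM alone under a stationary bivariate normal model with equal variances at both time points: (1 − ρ) × (μ − x), where x is the first measurement, μ the population mean, and ρ the correlation between the two measurements. This is not the Mee and Chua algorithm used in the study. The function names are hypothetical, and using a particular value of ρ (e.g., the reported test-retest reliability) or of μ is an assumption made by whoever calls the function.

```python
def expected_rtm_change(admission_score, population_mean, reliability):
    """Expected score change from RTM alone: (1 - reliability) * (mean - admission)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return (1.0 - reliability) * (population_mean - admission_score)


def rtm_sweep(admission_score, assumed_means, reliability):
    """Expected RTM change for each assumed population mean (simple sensitivity sweep)."""
    return {
        mean: expected_rtm_change(admission_score, mean, reliability)
        for mean in assumed_means
    }
```

Varying the assumed population mean, as in `rtm_sweep`, shows how strongly the expected RTM contribution depends on that assumption: the further the assumed mean lies above a low admission score, the larger the change that RTM alone could produce.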
All analyses were conducted in SAS, version 9.4. Changes in total RAS scores from admission to discharge were considered statistically significant (corresponding to a two-sided p<0.05) if their 95% confidence intervals did not intersect zero (indicating no change). Similarly, differences between RAS score at admission groups were considered statistically significant if their 95% confidence intervals did not overlap, suggesting a significant difference between groups. However, when interpreting the results, we focused on the magnitude and precision of the effect estimates, as indicated by the width of 95% confidence intervals, rather than solely classifying them as statistically significant or not (16). We further focused primarily on differences among RAS score at admission groups to assess the extent to which changes in RAS scores depended on the initial score at admission, but we also provide results from main and secondary analyses for the full cohort in the online supplement. This study was approved by the institutional review board of Brigham and Women’s Hospital, which waived the need for informed consent.
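The two confidence interval rules described above reduce to simple comparisons of interval endpoints. The following sketch uses hypothetical function names and, for illustration, the level 2–adjusted intervals reported in the Results for the lowest and highest admission groups.

```python
def excludes_zero(ci):
    """True if a (lower, upper) confidence interval does not contain zero."""
    lower, upper = ci
    return lower > 0 or upper < 0


def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) confidence intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Intervals reported in the Results (level 2-adjusted analysis).
change_lowest_group = (65.3, 91.6)    # admission score 41-82
change_highest_group = (-6.3, -2.4)   # admission score 186-205
discharge_lowest_group = (134, 159)
discharge_highest_group = (187, 191)
```

Under these rules, both changes count as statistically significant, and the discharge scores of the two groups count as significantly different because their intervals do not overlap. Non-overlap of two 95% intervals is generally a more conservative criterion than a direct test of the difference.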
Results
Among 20,770 patients (contributing 27,002 treatment episodes) receiving treatment at DBH in the 2021–2022 period, 15,059 patients (representing 18,147 episodes) had RAS and PHQ-9 scores recorded at admission, 9,441 of whom (representing 10,496 treatment episodes) also had an RAS score recorded at discharge and were therefore included in the final cohort (a flow diagram is shown in the online supplement).
When patients were grouped by total RAS score at admission, group sizes ranged from 51 (for the RAS score at admission of 41–82 group) to 3,604 (for the RAS score at admission of 145–164 group). Patients with lower RAS scores were more likely to be treated in an RTC setting (e.g., N=39 [76.5%] for RAS score at admission of 41–82 vs. N=464 of 816 [56.9%] for RAS score at admission of 186–205) and in the general mental health service line (N=33 [64.7%] vs. N=208 [25.5%], respectively); to be children or adolescents (N=33 [64.7%] vs. N=290 [35.5%], respectively), of Asian or other or unknown race (N=19 [37.3%] vs. N=184 [22.5%], respectively), non-Hispanic (N=32 [62.8%] vs. N=451 [55.3%], respectively), and assigned female at birth (N=33 [64.7%] vs. N=427 [52.3%], respectively); and to have higher PHQ-9 scores (mean±SD score=19.5±8.6 vs. 6.5±7.0, respectively) (see the online supplement).
When we assessed the average RAS score change from admission to discharge and discharge RAS score both overall and for each of the RAS score at admission groups, findings were consistent across the two adjustment levels (see the online supplement). On average, we observed a 26.0-point (95% CI=24.6–27.4; level 2–adjusted analysis) increase in total RAS score from admission to discharge in the full cohort. However, this change strongly varied across the RAS score at admission groups: patients belonging to a group with a lower RAS score at admission had on average very strong improvements in their score, whereas those who started with a high score, and therefore had limited room for improvement, had discharge scores close to their admission score. For instance, in the level 2–adjusted analysis, patients with an RAS score at admission of 41–82 had on average a 78.4-point (95% CI=65.3–91.6) increase in total RAS score at discharge, whereas those with an RAS score at admission of 186–205 showed a small downward trend in the RAS score (mean change=−4.4, 95% CI=−6.3 to −2.4). Even though patients with a lower RAS score at admission had a significant score increase during the treatment episode, their total discharge score was still markedly lower compared with the scores of those with a higher RAS score at admission (e.g., mean total discharge RAS score of 146 [95% CI=134–159] for those with an RAS score at admission of 41–82 vs. 189 [95% CI=187–191] for those with an RAS score at admission of 186–205) (Figure 1).
When we stratified the analyses by race, for all RAS score at admission groups, patients identifying as Black had on average lower total RAS scores at discharge compared with patients of other racial groups, but we observed no clear differences among other racial groups. For other subgroup analyses, and when we accounted for treatment duration and PHQ-9 score in analyses, changes in RAS scores were overall consistent with those of the main analysis. Although we observed some differences in these changes across subgroups, these were typically seen in groups of very small sizes. Results from the RAS score domain models were also similar to the main findings, and the average score change from admission to discharge for each RAS score at admission group was similar across domains. Only scores for the domain “no domination by symptoms” changed less from admission to discharge than scores for the other domains. The results of the RTM analyses indicated that RTM alone would fully account for the observed RAS score change only under strong assumptions about the true underlying population mean RAS score at admission (see the online supplement).
Discussion
Using a large and diverse cohort of patients receiving treatment within a population-based behavioral health care system, we found that patients with lower recovery scores at admission experienced substantial improvements in recovery during the treatment episode. These results were generally consistent across subgroups. These findings are encouraging because they suggest that patients with the most significant recovery challenges can make considerable progress during treatment.
Despite these marked improvements, RAS scores at discharge were still markedly lower for patients who had lower RAS scores at admission compared with patients with higher RAS scores at admission. Potential explanations for this difference might include greater symptom severity (beyond depression severity assessed with the PHQ-9), presence of comorbid conditions, lower baseline functioning, less resilience and external support, or other psychosocial factors; some patients with lower initial recovery scores may have had a length of treatment that was sufficient for reducing their symptoms but insufficient for these patients to fully realize their recovery potential. These potential recovery factors highlight the complexity and individual trajectories of the recovery process and underscore the need for future research to explore the individual factors explaining the variability in improvement across different patient populations. The smaller change observed in the RAS domain “no domination by symptoms” compared with the magnitude of the changes in the other domains may indicate the complexity of altering self-perception related to symptom impact (e.g., patients might still experience fear about their symptoms returning and affecting their daily life). We also observed lower average discharge scores among Black patients compared with patients from other racial groups, suggesting that some racial groups may face greater challenges in the recovery process. Understanding such disparities is crucial for optimizing personalized treatment approaches and ensuring that all patients can achieve the best possible outcomes.
The use of a large EHR data set with standardized and longitudinally collected PROMs offered the opportunity to systematically evaluate changes in patients’ well-being throughout their treatment. Nevertheless, this approach also introduced the possibility of selection bias. Although accounting for selective censoring did not change our findings, patients without a recorded RAS score at discharge may still represent a population with unique challenges that merit further exploration. Furthermore, although the RAS is a quantitative tool designed to capture various recovery aspects, it may not reflect the recovery processes of individuals across the entire spectrum of mental health phenotypes and severity levels. Recovery is a subjective experience; expectations and perceptions of what recovery means can differ among patients and might change for a patient over time (which might have contributed to the small downward trend in discharge RAS scores among the patients with very high admission scores). Therefore, patients might consider other aspects not measured by the RAS as instrumental for recovery. Future studies may therefore focus on using multiple or different recovery measures supplemented by a qualitative patient assessment.
Although our sensitivity analyses indicated that the observed changes in RAS scores were unlikely to be solely driven by RTM, some portion of the changes may still be attributable to RTM. Our results might not generalize to patients receiving routine outpatient mental health care or care from non-DBH treatment providers. Finally, we note that, unlike previously published RCTs that linked recovery changes to specific treatment modalities, the intention with this cohort study was to assess the extent of recovery score changes during a treatment episode among patients receiving clinical care at DBH and to examine how these changes vary across specific patient subgroups. Our findings did not elucidate specific treatment strategies or other factors beyond the characteristics considered that might drive these changes. However, they lay the groundwork for future research to identify causal mechanisms, treatment strategies (e.g., by triaging patients into different treatment modalities on the basis of PROMs [including RAS scores] at admission), and other potentially modifiable factors to further enhance personalized treatment approaches.