Thousands of studies in fields other than behavioral health show that without measuring performance and providing feedback, improvement is minimal (1). In the behavioral health field, greater use of outcome monitoring and feedback has been recommended for all practices (2–4).
A measurement feedback system is one tool for providing outcome monitoring and feedback in a clinical setting (1). It provides systematic and frequent measurement of treatment progress and processes within a continuous quality improvement framework, and it is designed to provide feedback that enhances clinical decision making, improves accountability, drives program planning, and informs treatment effectiveness (5–7). To improve performance effectively and continually, the feedback must be accurate and clinically useful (1,6).
Research with adults supports the use of systematic feedback to clinicians (8,9). For example, regular and systematic outcome monitoring and feedback improve clinicians' ability to detect worsening of adult clients' symptoms (10,11). Worthen and Lambert (12) have suggested that feedback influences clinical outcomes by providing clinicians with information they may have unintentionally overlooked or underemphasized and by identifying problems in domains that could jeopardize progress. Feedback can assess actual client change, enhance the therapeutic alliance, produce more accurate case conceptualizations, and foster richer discussions of potential changes in treatment plans (13).
However, there have been no clinical trials of the effectiveness of measurement feedback systems in improving outcomes for youths. Published studies have focused on establishing the accuracy of warning systems that predict youth treatment failure on the basis of outcome measures scored with computerized algorithms (14,15). One study found that incorporating data from multiple reporters, for example, youths and caregivers, is most sensitive in identifying youths at risk of treatment failure (14).
This study used a cluster randomized experimental design to test the hypothesis that weekly feedback to clinicians improves the effectiveness of mental health treatment of youths living in community settings. Feedback is provided by Contextualized Feedback Systems (CFS), which include a psychometrically sound and clinically useful battery of very brief measures that promotes overall practice improvement through frequent and comprehensive assessments (16). CFS is based on a theoretical synthesis of perspectives from social cognitive psychology, organizational theory, and management that explains the mediating and moderating structure of feedback interventions that can change clinician behavior (17–19). An earlier version of CFS, called Contextualized Feedback Intervention and Training, was used in this study.
Methods
Design and procedures
The study design and procedures were selected to provide the optimal balance between the requirements of scientific rigor and the need to test the intervention under real-world conditions. Sites affiliated with the Providence Service Corporation (PSC), a private, for-profit behavioral health organization, participated in the study. PSC provides services mainly to youths and their families in their homes. Because the organization is highly decentralized, its services are not uniform across sites, and no specific type of treatment is prescribed. Clinicians report using various therapeutic approaches, including cognitive-behavioral, integrative-eclectic, behavioral, family systems, and play therapy.
All PSC sites in the study agreed to adopt CFS as part of the organization's ongoing continuous quality improvement initiative, with the intention of learning whether its use was financially viable and organizationally feasible. Forty-nine sites expressed interest in participating in the randomized clinical trial of CFS and were randomly assigned by the research study's data manager to an experimental or control group; clinicians in the control group could receive feedback on a client only every 90 days. Subsequently, 21 sites (11 experimental and ten control) dropped out of the study; there were no statistical differences between experimental and control groups in reasons for attrition. Our initial plan specified a 2×2 factorial design with two factors: feedback (weekly versus every 90 days) and the provision of three Web-based training modules on common factors (therapeutic alliance, expectancies about counseling, and collaborative treatment planning). However, only one-third (N=31) of the clinicians accessed the training modules before seeing their first client. We therefore considered the module condition an implementation failure and analyzed data by feedback condition only.
CFS was introduced to sites as a continuous quality improvement initiative; thus clinicians were expected and encouraged to participate. However, some clinicians did not participate, and some participated with only some clients. Initial training on how to integrate feedback into practice was provided through on-site workshops, and individual support was available thereafter by phone or e-mail. Ongoing training was provided through regularly scheduled (at least monthly) group teleconferences. The research project provided clinicians at the 28 sites with a youth and caregiver brochure describing CFS to give to clients and trained clinicians in how to explain CFS to clients. Training was segregated by study group assignment so that its content aligned with either the weekly feedback condition or the 90-day feedback control condition.
At the close of a treatment session, the youth, caregiver, and clinician completed paper questionnaires. All three placed their completed forms in an envelope that was sealed by the clinician, who delivered it to a project-trained assistant at each site for data entry. Feedback reports were available as soon as data were entered into the system. The researchers did not collect or enter data; they received a limited data set for analyses that included dates of treatment but no other personal identifiers. Data collection extended from June 1, 2006, through December 31, 2008.
All youths aged 11 to 18 who entered home-based services after CFS implementation began at each site were eligible to participate. The study was approved by the Vanderbilt University Institutional Review Board with a waiver of informed consent.
Feedback intervention
Clinicians in the experimental group received weekly feedback plus cumulative feedback every 90 days after a youth was enrolled in CFS. Weekly reports were available a median of nine days (mean±SD=12.3±22.3 days) after the end of the session; nearly half (46%) were available within a week.
Clinicians in the control group received only the cumulative, 90-day feedback. Because youths remained in CFS about four months (mean±SD=3.8±3.1 months; median=3.3 months), many would have been discharged before their first 90-day report became available. We therefore considered the 90-day group a no-feedback control group.
Feedback was automatically generated by CFS in the form of computer screens that compared and summarized measures completed by the youth, caregiver, and clinician at previous sessions. Feedback included mean scores; alerts if the youth's symptom severity ranked in the most severe quartile; indicators of whether change from one measurement occasion to the next met criteria for reliable change; and trend graphs of change over multiple measurement points. Feedback viewing was tracked in the system whenever clinicians clicked the “radio button” at the bottom of the main feedback Web page indicating whether they agreed with the report. A detailed description of the current version of the feedback system, with screen shots, is available at cfsystemsonline.com.
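The exact formulas behind the alerts and reliable-change indicators are not specified here. The Python sketch below illustrates one common construction, a top-quartile severity cutoff and the widely used Jacobson-Truax reliable change index; the function names, the 75th-percentile cutoff, and the example values are our assumptions, not CFS's implementation.

```python
# Hypothetical sketch of two report computations like those described above.
# CFS's actual formulas are not given in the text; this uses a simple
# percentile cutoff and the common Jacobson-Truax reliable change index.
import math

def severity_alert(score: float, norm_scores: list) -> bool:
    """Flag a score falling in the most severe quartile of a normative
    sample (assumed cutoff: the 75th percentile of norm_scores)."""
    cutoff = sorted(norm_scores)[int(0.75 * (len(norm_scores) - 1))]
    return score >= cutoff

def reliable_change(pre: float, post: float, sd: float, reliability: float,
                    z_crit: float = 1.96) -> bool:
    """Jacobson-Truax criterion: change is 'reliable' when it exceeds what
    measurement error alone would plausibly produce."""
    se_diff = sd * math.sqrt(2.0 * (1.0 - reliability))  # SE of a difference
    return abs(post - pre) / se_diff > z_crit

# Example using the youth-report SFSS reliability reported in Measures
# (alpha=.92) and the youth baseline SD reported in Results (.55):
print(reliable_change(pre=2.8, post=2.2, sd=0.55, reliability=0.92))  # True
```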
Measures
Youths' symptom severity and functioning were assessed with the Symptoms and Functioning Severity Scale (SFSS), part of the Peabody Treatment Progress Battery (16,20). The SFSS was completed by the youth, the caregiver, and the clinician to provide data from multiple perspectives. Cronbach's alphas (21) for the youth, caregiver, and clinician forms are .92, .94, and .93, respectively. Correlations with other measures of youths' symptomatology (the Youth Self-Report, the Strengths and Difficulties Questionnaire, and the Youth Outcomes Questionnaire) range from .71 to .89, depending on respondent type, indicating good convergent validity (22–24).
The SFSS is designed to assess change over time through closely timed, repeated measurements. Version 1 of the SFSS consists of 32 items that rate how frequently within the last two weeks the youth experienced emotions or exhibited behaviors linked to mental health disorders typical among youths, including attention-deficit hyperactivity disorder, conduct or oppositional defiant disorder, depression, and anxiety. Frequency is rated 1, never; 2, hardly ever; 3, sometimes; 4, often; or 5, very often. A total severity score is computed for each youth as the simple average of item ratings, provided at least 85% of items are completed.
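As a concrete illustration of this scoring rule, the minimal Python sketch below averages completed item ratings and withholds a score when fewer than 85% of items are answered; the function name and the missing-data convention (None for skipped items) are ours, not CFS's.

```python
# Minimal sketch of the SFSS total-score rule described above: the simple
# average of completed item ratings (1-5), computed only when at least 85%
# of the 32 items are answered. Names and conventions are illustrative.
from typing import List, Optional

def sfss_total(ratings: List[Optional[int]],
               min_completion: float = 0.85) -> Optional[float]:
    answered = [r for r in ratings if r is not None]
    if len(answered) / len(ratings) < min_completion:
        return None  # too many missing items; no total score is produced
    return sum(answered) / len(answered)

# A 32-item form with two skipped items (30/32 = 94% complete) still scores:
form = [3] * 20 + [4] * 10 + [None, None]
print(sfss_total(form))  # 3.33...
```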
Statistical analyses
The main hypothesis tested was whether youths in the experimental group, whose clinicians could receive weekly feedback reports, improved faster than youths in the control group. This intention-to-treat analysis was repeated for data provided by each of the three respondent types: youths, caregivers, and clinicians.
To test the hypothesis, we used hierarchical, longitudinal, slopes-as-outcomes models with random coefficients. In our hierarchical linear models (HLMs), repeated measures were nested within participants, youths were nested within clinicians, and clinicians were nested within sites (25–28). We estimated three HLMs, one for each respondent type, to determine whether the feedback intervention affected individuals' outcome trajectories and whether there was any differential effect by respondent type. We used SAS Proc Mixed (version 9.12) to estimate the models with restricted maximum likelihood (REML) estimation. The HLMs, which included fixed effects and random intercepts at the youth, clinician, and site levels, allowed for an exchangeable correlation structure at each level (29,30). REML estimation is recommended for multilevel models when repeated measures are not equally spaced (25). HLMs offer important advantages over older models (31–33), such as better handling of missing values and of unequal time intervals between and within participants' responses. Repeated measurements also increase statistical power, describe the shape of change over time, and avoid the psychometric problems associated with simple pre-post change scores.
Results
Sample
Twenty-eight sites in ten states were included in the analyses, 13 in the experimental group and 15 in the control group. Information about services and clientele at sites that were not included in the evaluation was not available. However, an organizational survey provided data on 24 of the 28 evaluation sites and on 107 nonevaluation sites. The two groups of sites did not differ significantly in the number of clinicians or in clinicians' years of employment, highest degree, or degree specialty.
Table 1 shows background characteristics of youths in the experimental and control groups. Tables 2 and 3 show characteristics of caregivers and clinicians, respectively.
The 340 youths who completed the SFSS at least once constituted the analytical sample used in the HLMs of youth-reported outcomes. Multiple-group testing was controlled with a bootstrap-adjusted alpha based on 100,000 resamples with replacement. After adjustment of the p values, we found no statistically significant differences at baseline between the control and experimental groups, with one exception: the experimental group had more black and fewer white youths (Table 1) and caregivers (Table 2) than the control group (p<.05). There were no significant differences in any clinician characteristic (Table 3).
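The exact resampling procedure is not detailed here. One way such a bootstrap adjustment is commonly implemented is a single-step max-statistic correction, sketched below in Python for continuous baseline variables (categorical variables such as race would need a different test statistic); all names are illustrative.

```python
# Hedged sketch of a max-statistic bootstrap adjustment for testing many
# baseline differences at once; the paper's exact procedure is not specified.
import numpy as np
from scipy import stats

def maxt_adjusted_pvalues(x_ctrl, x_exp, n_boot=100_000, seed=0):
    """x_ctrl, x_exp: (n, k) arrays of k continuous baseline variables."""
    rng = np.random.default_rng(seed)
    t_obs = np.abs(stats.ttest_ind(x_ctrl, x_exp, axis=0).statistic)
    # Impose the null hypothesis by centering each group at its own means.
    c0, c1 = x_ctrl - x_ctrl.mean(axis=0), x_exp - x_exp.mean(axis=0)
    max_t = np.empty(n_boot)
    for b in range(n_boot):
        b0 = c0[rng.integers(0, len(c0), len(c0))]  # resample with replacement
        b1 = c1[rng.integers(0, len(c1), len(c1))]
        max_t[b] = np.abs(stats.ttest_ind(b0, b1, axis=0).statistic).max()
    # Adjusted p: how often the null max |t| exceeds each observed |t|.
    return (max_t[:, None] >= t_obs).mean(axis=0)
```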
Youths participated in the study for a mean±SD of 16.5±13.6 weeks (median=14.5 weeks). A total of 3,775 research records were generated, each representing a week in which the youth received treatment and any CFS data were collected. The mean number of research records per youth was 11±9.2, indicating that CFS data were not collected every week. At least one measure in the battery was scheduled for collection every week, although the SFSS itself was scheduled for completion every two weeks; it was not always possible to adhere to this schedule. The SFSS was completed 4.2±3.3, 3.0±2.8, and 4.2±3.8 times by youths, caregivers, and clinicians, respectively.
Baseline ratings on the SFSS differed by respondent type: caregivers' ratings were significantly higher (2.56±.76) than those of clinicians (2.34±.55) and youths (2.36±.55) (p<.002). Weak agreement (weighted kappas <.26) between caregivers and clinicians and between youths and caregivers supported our decision to treat data from each respondent type as an independent test of outcome.
Outcomes
We estimated the individual growth trajectories of the total SFSS score as reported by youths, clinicians, and caregivers. We adjusted each model for youths' race, the only variable found to be imbalanced at baseline between the experimental and control groups.
Table 4 shows the results of the intent-to-treat analysis. The intercept is the average SFSS for the two reference groups (control group and nonwhite youths) at baseline (time=0), and the other estimated parameters are deviations from the intercept. Regardless of type of respondent, the estimated feedback coefficient was not statistically significant, indicating that the control and experimental groups had the same level of functioning and symptomatology upon starting CFS; that is, there were no initial group differences.
Clinicians and caregivers reported that symptoms at baseline were more severe among white youths than among nonwhite youths (p=.001 and p=.02, respectively). Youths did not report a race difference (p=.54). Youths and clinicians reported significant improvement in youths' outcomes over time (effect size=.30 and .17, respectively; data not shown), but caregivers did not. All three groups of respondents reported that youths in the feedback group improved significantly faster than youths in the control group (p<.01). Feedback effect sizes were .18, .24, and .27 for youths, clinicians, and caregivers, respectively. All effect sizes were computed from the HLM-estimated coefficients evaluated at the average length of stay in CFS.
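The exact standardization behind these effect sizes is not stated; one plausible formalization, assuming the group difference in slopes is evaluated at the mean length of stay and scaled by the baseline SD for that respondent type, is

$$ d_{\text{feedback}} = \frac{\hat{\beta}_{\text{time}\times\text{feedback}} \cdot \bar{t}}{s_{\text{baseline}}}, $$

where $\hat{\beta}_{\text{time}\times\text{feedback}}$ is the HLM-estimated difference in weekly improvement between groups, $\bar{t}$ is the mean time in the study (16.5 weeks), and $s_{\text{baseline}}$ is the baseline SD of the SFSS (for example, .55 for youth reports).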
The intention-to-treat analysis described above did not consider that one-third of the clinicians in the feedback group did not view any feedback. A separate HLM analysis of the experimental and control groups was conducted to take into account whether a report was viewed (data available upon request). For all types of respondents, youths whose clinicians viewed at least one feedback report improved faster than youths whose clinicians did not view any report (p<.02). When we examined the proportion of reports viewed in a dose-response analysis, effect sizes increased by 50% for youths, to .27, and by 66% for clinicians, to .40 (p<.001). The effect size did not increase for caregivers.
Discussion
This is the first randomized controlled trial to examine the effects of feedback to clinicians on youths' clinical improvement. We found that regardless of type of respondent, youths whose clinician had access to weekly feedback improved faster than youths whose clinician did not.
The effect sizes for CFS were modest. However, to put them in context, they were about the same size as those found in comparisons of empirically supported treatments (ESTs) with treatment as usual in community settings (5). Moreover, it is encouraging to find an effect of CFS, consistent across three types of respondents, when it was used with treatments of unconfirmed effectiveness of the kind usually found in community settings. CFS may be of even greater benefit to clinicians who use established ESTs. In addition, because some clinicians resist using ESTs out of concern that they may harm the therapeutic relationship (34), CFS can help clinicians monitor this important relationship and adjust their approach accordingly. Combining feedback with established ESTs, a design we are now testing with a new software package that combines Functional Family Therapy with CFS (35), may optimize both approaches.
The dose-response analyses showed that the effect size was dramatically increased for two of the three types of respondent reports when the proportion of feedback reports viewed (dose) was considered. Increasing viewing of feedback may be a key to increasing its effects. However, because the dose analyses were correlational, it is possible that other variables accounted for the relationship between dose and changes in severity. For example, it could be that clinicians who viewed more reports were better clinicians.
Interviews with clinicians and supervisors, as well as data from this study, have been used to develop a third version of CFS that includes better clinician adherence monitoring capacity, easier implementation, and enhanced feedback reports. We anticipate that the latest version of CFS will improve implementation and produce even stronger effects than those reported here, but we need to learn more about how clinicians use feedback to plan and conduct treatment (36).
There were several limitations associated with conducting a large-scale, multisite field experiment of a measurement feedback system. Although sites were randomly assigned, the most effective clinicians within a site could have volunteered to use weekly feedback more often than clinicians in the control group, which received only cumulative feedback at 90 days. However, this possibility would require both that clinicians in the weekly feedback group were more highly motivated and that their self-perceptions of efficacy actually led them to be more effective, assumptions that are not supported by the data.
It is also possible that clinicians at sites where weekly feedback was provided selected clients they thought were more likely to improve than clients at the control sites. Again, there is little support for this assumption, given that the only initial difference between the groups was race, not any variable that predicted improvement. Moreover, for this possibility to be true, clinicians would have to be able to predict who would improve faster, a fact not in evidence.
A study in which data are collected by the clinician involves a trade-off between the practical realities of a real-world evaluation and stricter but entirely unaffordable protocols in which data are collected by researchers. In addition, researcher-collected data would invalidate the evaluation of a system that is designed for clinician data collection.
Significant attrition from the initial random assignment of sites could have biased the samples. However, except for race, which was not related to improvement, there were no significant differences between the experimental and control groups. It is also possible that the groups differed in some unmeasured way that was correlated with how long they stayed in the study; yet youths in the experimental and control groups attended an equivalent number of sessions and were enrolled for an equivalent length of time. Finally, because we do not know the types of treatments provided at each site, it is possible that differential treatments influenced the outcome. However, none of the sites reported consistently using specific treatments.
Conclusions
Generalizability is a problem in most mental health research because formal representative sampling of sites or services is very rare. Strictly speaking, the results of this study apply only to the sites studied and the services provided, in this case typical home-based care and the CFS intervention. However, a randomized controlled trial conducted across 28 sites in ten states is exceptional.
The results indicated that there was no significant heterogeneity among the sites and that attrition by sites did not appear to bias the data collected. This is the first study of youths in diverse, real-life community settings to show that mental health outcomes can be improved without necessarily introducing a formal evidence-based treatment. It supports the use of measurement feedback systems in community clinical practice as an important approach to improving outcomes.
Acknowledgments and disclosures
This research was supported by a grant from the National Institute of Mental Health (R01 MH068589) and by the Leon Lowenstein Foundation. The authors thank the Providence Service Corporation for their partnership in this project and Ann Garland, Ph.D., Kim Hoagwood, Ph.D., Sarah Horwitz, Ph.D., Robert King, Ph.D., Bill Reay, Ph.D., Tom Sexton, Ph.D., and Steven Shirk, Ph.D., for their comments on an earlier draft. The data analysis was supervised by Craig Kennedy, Ph.D., associate dean for research, Peabody College, and was reviewed for bias by an external consultant to the dean.
Dr. Bickman, Dr. Kelley, Dr. Breda, Dr. Reimer, and Vanderbilt University have a financial interest in CFS. Dr. de Andrade reports no competing interests.