Psychosocial evidence-based practices can lead to desirable outcomes for people with serious mental illness. These outcomes include improved community and interpersonal functioning, better quality of life, and lessening of psychiatric symptoms (1,2). Unfortunately, these practices are not routinely available, and many people with serious mental illness are not receiving evidence-based care (3,4). States vary widely in their commitment to implementing evidence-based practices and in their success in doing so (5). One hindrance to widespread dissemination of evidence-based practices is a lack of knowledge about the process of implementation (6).
The literature suggests that implementation of evidence-based practices has failed for numerous reasons. Most evidence-based practices are complex and may be difficult to implement without preexisting structure and support (7). Some studies have suggested that the principal reason for failure lies in an inadequate implementation plan with no clear model specification (8,9). A recent review of the mental health implementation literature found large conceptual and empirical gaps (10), highlighting the need for implementation research.
The National Implementing Evidence-Based Practices Project investigated the implementation of five psychosocial practices in routine mental health settings: supported employment (11), family psychoeducation (12), illness management and recovery (13), integrated dual disorders treatment (14), and assertive community treatment (15). The project examined practices implemented in 53 sites across eight states that used a common implementation model, which included material and human resources for each evidence-based practice. A mixed-methods study was conducted to understand the processes and outcomes of implementation over a two-year period at each site. Additional details about the project have been previously published (6).
The unit of analysis for the study reported here is the site, and the primary outcome is model fidelity (16). Fidelity scales have been validated for some evidence-based practices, such as assertive community treatment and supported employment, and have proven useful in differentiating among programs (17). Studies that used fidelity scales have found better outcomes for consumers when services adhere closely to a model with specified critical components and standards (18–25). This relationship has been established for assertive community treatment and supported employment but not for the other three practices included in this study.
This is the first prospective study with a sufficient number of sites to permit use of fidelity as the outcome of implementation in order to examine differences among evidence-based practices. Our primary aim in this article is to present the two-year fidelity results from the National Implementing Evidence-Based Practices Project. The study had two overarching purposes: to discern whether certain evidence-based practices were implemented more faithfully than others and to examine change over time in fidelity within each evidence-based practice in order to determine the critical time exposure for successful implementation.
Methods
Sites
Mental health authorities in eight states agreed to participate in the project, which involved recruiting sites and developing training and consultation capacity. Each state identified two evidence-based practices for dissemination. Various mechanisms were used across the states to recruit sites and to determine which sites would implement which practice. Some states chose among solicited proposals; others used less formal procedures. Sites were public-sector community mental health agencies, and the evidence-based practices were implemented within their programs of care for people with serious mental illness. The extent of practitioner involvement in and consumer access to the evidence-based practice was determined by each site and varied widely.
States provided a consultant-trainer for each evidence-based practice, and sites agreed in principle to provide time for training and supervision and to develop a relationship with the consultant-trainer. States gave no additional financial incentives to the sites, and they differed in the extent of their commitment and resources to disseminate and support evidence-based practices. Sites also agreed to participate in a range of evaluation activities, which were coordinated by an implementation monitor assigned to each site. Implementation monitors visited the sites monthly to collect systematic qualitative and quantitative data on the process and outcomes of implementation. Implementation monitors and consultant-trainers covered from two to eight sites within their states. Institutional review board approval was obtained for the overall project and within each state.
Baseline assessments and the start of implementation occurred between mid-2002 and mid-2003. Implementation monitoring continued for two years. Forty-nine sites provided two years of data.
Implementation model
The implementation model arose from a literature review, practical experience of services researchers, and focus groups with stakeholders. Additional details are available elsewhere (7). Central to the model were the implementation resource kits, also known as toolkits, and a consultant-trainer (26). The toolkits contained practice-specific and common resources, such as a user's guide and implementation tips for program leaders; introductory videos, PowerPoint presentations, and brochures; a practice demonstration video and workbook for practitioners; and fidelity scales with protocols.
The consultant-trainer provided training and clinical supervision to the program leaders and practitioners who were implementing the practice. The model prescribed a half-day kickoff session to introduce the practice to all stakeholders and three days of skills training modules for practitioners. Additional training and clinical supervision were provided as requested. The consultant-trainer also provided consultation to the leadership of the agency concerning concomitant organizational changes. Consultant-trainers were trained and supervised in both roles by evidence-based practice experts through monthly conference calls and semiannual in-person meetings.
The intensity and quality of the training and consultation varied both within and across states and practices, as did use of the toolkits. In addition to the kickoff and skills training, some sites received on-site training and consultation once per month for two years, whereas others received one day quarterly in the first year only. This variation was the result of individual differences among the consultant-trainers, the contexts in which they worked (states and sites), and the practices themselves.
Assessing fidelity
Fidelity was assessed by rating adherence to the principles and procedures specified in the evidence-based practice models. Fidelity scales had been validated previously for assertive community treatment (27) and supported employment (11). Investigators within this project, in conjunction with developers of the practices, created fidelity scales for integrated dual disorders treatment, illness management and recovery, and family psychoeducation. The implementation resource kits, including the fidelity scales, can be obtained on the Web site of the Center for Mental Health Services of the Substance Abuse and Mental Health Services Administration (mentalhealth.samhsa.gov/cmhs/communitysupport/toolkits).
The assessment of fidelity was similar across practices. It involved one-day site visits to gather information from various sources in order to make 5-point ratings on the critical components of the practice. A rating of 5 indicates full adherence to the model, and 1 indicates no adherence. The average of the item ratings yields a total fidelity score. For the study reported here, a total score of 4.0 or greater indicated high fidelity, scores between 3.0 and 4.0 indicated moderate fidelity, and scores less than 3.0 indicated low fidelity. Two trained raters—the consultant-trainer and the implementation monitor—assessed fidelity. Senior staff for each evidence-based practice provided initial training to the fidelity assessors and provided monthly telephone supervision.
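As a concrete illustration of this scoring scheme, the following minimal Python sketch averages the 1-to-5 item ratings into a total fidelity score and applies the cutoffs used in this study; the item ratings and function names are hypothetical and are not part of the project's toolkit.

```python
import numpy as np

def total_fidelity(item_ratings):
    """Total fidelity score: the mean of the 1-5 critical-component ratings."""
    return float(np.mean(item_ratings))

def fidelity_level(score):
    """Classify a total score with the study's cutoffs: >=4.0 high, 3.0-3.9 moderate, <3.0 low."""
    if score >= 4.0:
        return "high"
    if score >= 3.0:
        return "moderate"
    return "low"

# Hypothetical ratings for a 15-item fidelity scale
ratings = [5, 4, 4, 3, 5, 4, 4, 5, 3, 4, 4, 5, 4, 3, 4]
score = total_fidelity(ratings)
print(round(score, 2), fidelity_level(score))  # 4.07 high
```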
Fidelity was assessed at baseline (before implementation) and at six, 12, 18, and 24 months thereafter. Assessors followed a detailed protocol with instructions for preparing sites for the visit, critical elements in the fidelity assessment, and sample interview questions. The protocol also included a fidelity assessment checklist. The assessment schedule typically included interviews with the team leader and practitioners, observation of team meetings and the intervention (for example, accompanying an assertive community treatment case manager), interviews with clients, and review of client charts. After the site visit, each assessor made independent fidelity ratings. The two assessors then reconciled any discrepancies to arrive at the final fidelity ratings. The consultant-trainer provided a fidelity report to the site after each assessment, which summarized the fidelity ratings and provided advice concerning components of the practice that were deficient.
Analysis
The interrater reliability of the fidelity scales was evaluated with the intraclass correlation coefficient (ICC) (28), based on a one-way random-effects analysis of variance (ANOVA) model for agreement between the two fidelity assessors on the total scale scores. The ICC was computed across all assessment points for each fidelity scale.
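For readers who want to reproduce this reliability analysis, the sketch below computes the one-way random-effects ICC, ICC(1,1), directly from the between-target and within-target mean squares of a one-way ANOVA. It is a minimal illustration with hypothetical paired scores, not the project's actual analysis code.

```python
import numpy as np

def icc_oneway(ratings):
    """ICC(1,1): one-way random-effects agreement between raters.

    ratings: array of shape (n_targets, k_raters). Here each row would be one
    site assessment, and the two columns the total fidelity scores from the
    consultant-trainer and the implementation monitor.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)
    # Between-target and within-target mean squares from the one-way ANOVA
    ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((ratings - row_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical paired total scores from the two assessors at five site visits
paired = [[4.2, 4.1], [3.5, 3.6], [2.8, 3.0], [4.6, 4.5], [3.9, 3.7]]
print(round(icc_oneway(paired), 2))
```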
Forty-nine sites that completed the project were included in the outcome analyses. We first analyzed the fidelity outcomes by evidence-based practice at the end of the two-year implementation period. One-way ANOVA compared the average endpoint fidelity scores among the five evidence-based practices. Second, we examined change over time in fidelity for the five evidence-based practices. Mixed-effects regression models were used to test the significance of the time and practice main effects and their interaction.
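In code, the two analyses might look like the following Python sketch. The long-format data layout, variable names, and simulated values are hypothetical, and the actual models may have been specified differently.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway
import statsmodels.formula.api as smf

# Simulate a long-format data set: one row per site per assessment point (hypothetical layout)
rng = np.random.default_rng(0)
practices = ["SE", "FPE", "IMR", "IDDT", "ACT"]
rows = []
for site in range(49):
    practice = practices[site % 5]
    for months in (0, 6, 12, 18, 24):
        fidelity = np.clip(2.0 + 0.08 * months + rng.normal(0, 0.4), 1, 5)
        rows.append({"site": site, "practice": practice, "months": months, "fidelity": fidelity})
df = pd.DataFrame(rows)

# 1. Endpoint comparison: one-way ANOVA on 24-month fidelity across the five practices
endpoint = df[df["months"] == 24]
f_stat, p_value = f_oneway(*[g["fidelity"].values for _, g in endpoint.groupby("practice")])
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.3f}")

# 2. Change over time: mixed-effects regression with random intercepts for sites and
#    fixed effects for time, practice, and their interaction
model = smf.mixedlm("fidelity ~ months * practice", data=df, groups=df["site"])
print(model.fit().summary())
```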
Discussion
The 59% rate of high-fidelity implementation at two years is encouraging for the wider dissemination of these five practices. Only seven of the 49 sites (14%) failed to achieve an endpoint fidelity scale score of 3.0 or higher. Currently, there is no consensus on cutoff scores for high fidelity, whether for quality assurance, implementation research, or accreditation. We chose a cutoff of 4.0 or higher on the basis of face validity, but others have imposed different standards (17,29,30).
Because the study lacked comparison sites and because of confounding (for example, between states and consultant-trainers), the findings must be viewed as descriptive. They leave open questions concerning the effectiveness of the implementation model and differences among the five practices. Moreover, there is little relevant literature in the mental health field with which to compare these results. The rates of high-fidelity implementation compare favorably with those in previous studies: for example, 53% for assertive community treatment (17), 50% for family psychoeducation (31), and 61% for supported employment (32).
There are several possible reasons for the differences in endpoint fidelity. The practice models, and hence the fidelity scales, variously emphasize the structure of the practice versus the clinical expertise required to deliver it, which may account for baseline and endpoint differences. Supported employment and assertive community treatment require structural and clinical changes, whereas other practices rely more on clinical interventions. Structural changes can often be implemented quickly, whereas clinical skills require extensive training and supervision to implement fully.
Furthermore, the differences among the practices must be interpreted cautiously, because the fidelity scales have not been calibrated against each other. Item analysis and a larger sample of sites will be needed to fully evaluate the properties of the five fidelity scales. Despite this limitation, several factors support the credibility of the findings reported here: similar procedures were used to develop the fidelity scales, common assessment methods were used, and the scales have high face validity. Moreover, the fidelity scales provided the basis for implementation targets for the sites, and as the maxima in Table 2 indicate, high scores were attainable on all five scales.
The longitudinal fidelity results indicate that providers using the toolkit implementation model and similar resources should be able to achieve successful implementation within 12 months. The variation in the rate of implementation highlights another difference among the evidence-based practices. Unlike assertive community treatment and supported employment, which can attain high fidelity rapidly, illness management and recovery and family psychoeducation unfold more slowly over time in a prescribed fashion, and therefore it is impossible to achieve high fidelity in these practices until all of the stages have been implemented.
The toolkit model prescribed one year of training and consultation to achieve successful implementation, followed by one year of reduced support to help sites sustain the practice. Accordingly, the longitudinal results revealed few further gains in fidelity during year 2, but they also revealed no erosion of the gains made during year 1. The focus at the sites shifted from improving fidelity to sustaining the practice, even if high-fidelity implementation had not been achieved by the end of year 1. This finding suggests the need to revise the implementation model to include booster training or other interventions at sites that fail to reach high fidelity within one year, in order to facilitate further gains in fidelity and to support sustainability.
This study focused on model fidelity as the measure of evidence-based practice implementation, although there are other indicators of implementation success. The penetration of the practice provides another perspective on implementation, where penetration is defined as the proportion of eligible consumers who have access to the practice. Measures of consumer satisfaction with evidence-based practice services and practitioner attitudes toward evidence-based practices could also be used to evaluate implementation success. Another perspective on implementation concerns the quality of clinical skills and interactions, which are difficult to assess validly (33). System outcomes, such as lower hospital and jail use, should also accrue from successful implementation of evidence-based practices.
The ultimate aim of successful evidence-based practice implementation is to improve consumer outcomes, but outcomes are an imperfect measure of implementation, because the association between fidelity and outcomes is modest. Consequently, Goldman and colleagues (5) have cautioned that "measures of fidelity are a means to an end, not an end in themselves." Study of the relationship among multiple indicators of implementation is needed, in addition to study of the association between those indicators and consumer outcomes. An optimal strategy for improving consumer outcomes may be to combine approaches, that is, to seek fidelity to the model during early implementation and then to use outcome-based supervision to adapt and sustain the practice.
The limitations of this study must be considered when drawing inferences from the fidelity results. Sites were not selected randomly. All sites volunteered, but the rigor and nature of the selection process differed across states. No doubt those chosen were among the more motivated to implement a new practice, but they cannot be considered early adopters, because all of the practices except illness management and recovery have existed for at least ten years. Nevertheless, the generalizability of the findings to other sites may be limited by selection factors. In addition, the fidelity assessors were not blinded. The consultant-trainers had an investment in the success of their sites, but the high agreement between them and the implementation monitors, who were not similarly invested, suggests the absence of strong and systematic bias.
In addition, the fidelity results are descriptive rather than analytic, and we cannot yet answer questions as to why certain evidence-based practices were implemented with higher fidelity than others or why certain practices took longer, on average, to reach high fidelity. The differences may be due to the practices themselves, to contextual factors, or to the fidelity scales. In the course of this study, we accumulated a large amount of qualitative and quantitative data concerning the process of implementation at these sites. In follow-up papers, these data will be used to examine predictors of successful implementation and to explore the facilitators of, and barriers to, high-fidelity implementation both within and across evidence-based practices.