A fidelity measure that captures whether the key ingredients of a program model are in place is an important tool for successful implementation of an evidence-based practice (1). Maintaining program fidelity is especially important for permanent supported housing, because the intervention’s complexity can lead to ambiguity and confusion about concepts and practice (2). Although the Substance Abuse and Mental Health Services Administration has defined key elements of permanent supported housing (3), a recent research review calls for further determination of fidelity to the model and its relationship to outcomes (4).
The Housing First approach is an evidence-based practice developed by Pathways to Housing in New York City (5). Housing First programs aim to move people rapidly into housing by following the key principles of housing choice, delinking of housing from support services, no requirement for housing readiness, harm reduction, and recovery-oriented treatment (6). Housing First was rated highest in model clarity and specification in a review of 35 studies of supported housing (2). The approach has gained popularity in recent years and has been implemented worldwide. However, the model’s rapid spread has had unintended consequences, including conflicting use of the label and inconsistent definitions of key principles (7,8). There is a pressing need for a validated and useful fidelity tool to accompany implementation of this model.
The Housing First fidelity tool is a 38-item scale with five domains that has shown early promise with regard to utility, reliability, and validity (6). Stefancic and colleagues (6) described the process of scale development and field testing in two large research initiatives in the United States and Canada. The scale generally had good internal consistency, captured variability in implementation, demonstrated construct validity, and was useful for guiding program development and technical assistance. The scale’s validity was further supported by a study in California, which found an association between fidelity ratings and housing outcomes in the state’s full-service partnerships (9). The Canadian initiative, At Home/Chez Soi, is a multisite randomized controlled trial (RCT) of Housing First in five cities (10). A mixed-methods evaluation found generally high program fidelity ratings at one-year follow-up, even for culturally adapted programs (11).
This study drew upon additional data from At Home/Chez Soi to address two new questions about Housing First fidelity ratings: Do they correspond in expected ways to program operation descriptions from administrative data sources, and do they correlate with client outcomes?
Methods
The At Home/Chez Soi project, funded by Health Canada through the Mental Health Commission of Canada, implemented Housing First in five Canadian cities (Moncton, Montréal, Toronto, Winnipeg, and Vancouver) between 2009 and 2013 for people with a psychiatric disability who were homeless. Before participants were recruited, staff were hired and trained according to the Pathways Housing First model. At Home/Chez Soi compared the effectiveness of Housing First with that of treatment as usual on a variety of outcome measures at baseline and at six, 12, 18, and 24 months (10). Twelve programs across the five cities, with a total of 2,148 participants, were studied. The 1,158 participants randomly assigned to receive Housing First were the focus of this analysis.
Research Ethics Board approval was obtained from 11 institutions, and signed informed consent was required from each participant. [Further details about the study design, results, and participant characteristics, as well as a copy of the fidelity scale, are available in an online supplement to this report.]
Fidelity was assessed at one and two years for each of the 12 programs: five assertive community treatment programs and seven intensive case management (ICM) programs. A few items on the fidelity scale were modified slightly to evaluate the ICM programs (changes in wording to reflect program differences), but the rating scales and the calculation of total scores were the same for both program types. Four to six raters from various backgrounds (clinicians, researchers, administrators, and peers) made full-day site visits, which included technical assistance. Data sources included observations of staff meetings, interviews with staff and program directors, consumer focus groups, and reviews of charts and other program documentation.
Each of the 38 fidelity scale items is rated on a 4-point scale; as is typical with this scale, ratings were made by consensus after team discussion. Specific response categories vary by item, but they are always ordered such that 1 represents poor fidelity and 4 represents excellent fidelity. Possible total scores range from 38 to 152. Two of the 38 items measure elements of support intensity.
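For illustration, in Stata (the software used for this study’s analyses), the total score could be computed along the following lines; the item variable names are hypothetical rather than taken from the study data set.

  * Hypothetical sketch: total fidelity score from 38 item ratings,
  * each coded 1 (poor fidelity) to 4 (excellent fidelity).
  * Variable names fid_item1-fid_item38 are assumed for illustration.
  egen fidelity_total = rowtotal(fid_item1-fid_item38)
  assert inrange(fidelity_total, 38, 152)  // 38 x 1 to 38 x 4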
Data about program operations were obtained through summaries from routine administrative systems. In particular, information was collected about the amount of time staff spent in direct care (in-person contact with program participants) and indirect care (case conferences, charting, referrals, and other administrative work on behalf of program participants), time to first housing placement, and number of contacts per client.
For this analysis we focused on the primary outcomes of the full trial. Data were derived from standardized scales with established reliability and validity. The observer-rated Multnomah Community Ability Scale (MCAS) (12) measured community functioning, and the 20-item Lehman Quality of Life Interview (QoLI-20) (13) measured participant-reported quality of life. The Residential Timeline Follow-Back scale (14) was used to collect housing stability data.
We characterized the fidelity of each program over the course of the study by calculating the mean of the total fidelity scores over the two assessments. Stata 13 software was used for statistical analysis.
We used Spearman correlations to measure associations between fidelity ratings and the four administrative data indicators.
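A minimal Stata sketch of these two steps, assuming a file with one row per program per assessment and hypothetical variable and file names, might look as follows.

  * Hypothetical sketch: average the two fidelity assessments per
  * program, then correlate mean fidelity with each administrative
  * indicator at the program level (N=12). Names are illustrative.
  collapse (mean) fidelity_mean=fidelity_total, by(program)
  merge 1:1 program using admin_indicators, nogenerate
  spearman fidelity_mean direct_hours, stats(rho p)
  spearman fidelity_mean indirect_hours, stats(rho p)
  spearman fidelity_mean contacts_per_year, stats(rho p)
  spearman fidelity_mean days_to_housing, stats(rho p)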
To examine the association between program fidelity and the three study outcomes (MCAS, QoLI-20, and housing stability), we used mixed-effects modeling. Models included random intercepts as well as fixed effects for fidelity, sex, age group (<26, 26–35, 36–45, 46–54, and ≥55), need level (high versus moderate), Aboriginal status, and racial-ethnic status. Aboriginal and racial-ethnic status were included because targeted programs for individuals from specific racial-ethnic groups were available at some sites (10). We coded age into groups in order to capture nonlinear associations in a standard way across outcomes and models and for consistency with previously reported results.
We included the baseline measurement of each outcome as a covariate, and, as a result, the dependent variable in each model represented outcomes over the study period. For the MCAS and QoLI-20, we fit linear models. In 80% of cases, the proportion of time spent in stable housing was either 0% or 100%. We therefore dichotomized this variable, using a cut point of 50%, and fit logistic models.
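A minimal sketch of these models in Stata 13 syntax, again with hypothetical variable names, is given below; mixed fits the linear mixed models, and melogit fits the mixed-effects logistic model.

  * Hypothetical sketch: linear mixed model for the MCAS with a
  * random intercept for program; the baseline measure of the outcome
  * is included as a covariate, per the specification above.
  mixed mcas_fu c.fidelity_mean c.mcas_bl i.sex i.agegrp ///
      i.needlevel i.aboriginal i.ethnoracial || program:

  * Mixed-effects logistic model for housing stability, dichotomized
  * at 50% of follow-up time spent in stable housing.
  gen byte stably_housed = (pct_stable >= 50) if !missing(pct_stable)
  melogit stably_housed c.fidelity_mean i.sex i.agegrp ///
      i.needlevel i.aboriginal i.ethnoracial || program:, or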
We examined the linearity of the association between fidelity and each outcome by inspecting means and by fitting fractional polynomial models. Results indicated that the effects were essentially linear.
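As one way of implementing this check, Stata’s fp prefix searches over fractional polynomial powers of the fidelity term; the single-level regression below (hypothetical names, no adjustment for clustering) is a simplified illustration only.

  * Hypothetical sketch: fractional polynomial check of linearity of
  * the fidelity-outcome association (shown here for the QoLI-20).
  fp <fidelity_mean>: regress qoli_fu <fidelity_mean> qoli_bl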
Because the number of programs (N=12) was small, we also conducted a simple sensitivity analysis. We reran our mixed-effects models after dropping each of the programs in turn.
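This leave-one-program-out check could be sketched as a simple loop, as below; variable names remain hypothetical.

  * Hypothetical sketch: refit the model 12 times, dropping one
  * program each time, and inspect the fidelity coefficient.
  levelsof program, local(progs)
  foreach p of local progs {
      quietly mixed mcas_fu c.fidelity_mean c.mcas_bl i.sex ///
          i.agegrp i.needlevel i.aboriginal i.ethnoracial ///
          if program != `p' || program:
      display "Dropped program `p': fidelity b = " _b[fidelity_mean]
  }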
Results
Total fidelity ratings ranged from 118 to 151 (70%–99% of the maximum value, with the 38–152 scale rescaled so that the minimum possible score corresponds to 0%), with a mean±SD rating of 136.6±10.3 (87% of the maximum value) and a median rating of 136.
Participants had 88±71 provider contacts per year (median=71, interquartile range [IQR]=46–108) and received 56±59 hours of direct services (median=47, IQR=28–71) and 46±43 hours of indirect services (median=34, IQR=16–63). At the program level (N=12), overall fidelity was correlated, at least at the trend level, with mean direct service time (ρ=.55, p=.10), indirect service time (ρ=.58, p=.08), and number of contacts with providers (ρ=.60, p=.04) but not with time to placement in stable housing.
In mixed-effects models, a difference of one SD in fidelity rating (10.3 points) was associated with a difference of .93 (95% confidence interval [CI]=.38–1.48) points on the MCAS and 2.27 (CI=.84–3.70) points on the QoLI-20 and with higher odds of housing stability (odds ratio=1.11, CI=1.08–1.14, for a fidelity rating that was .5 SDs above versus .5 SDs below the mean) (all p<.01). Effects for the MCAS and QoLI-20 corresponded to a Cohen’s d of .11 and .10, respectively. Fidelity effects remained significant after removal of each program.
Discussion
These results are encouraging, given the relatively small number of programs, the lack of statistical power, and the restricted range of variation in fidelity and outcomes. In this study, the restricted range of fidelity scores is an indicator of sound implementation of the program model. The demonstration project employed training and quality assurance to achieve high levels of fidelity, and these efforts appear to have contributed to the consistently positive outcomes.
Evidence of the relationship between fidelity and outcomes for evidence-based practices is not usually available. Such a relationship is an underlying assumption of, and justification for, all fidelity scales, but the predictive validity of fidelity scales, other than for individual placement and support employment programs, has not been demonstrated by rigorous methods that control for other factors (1). One weakness of our study was the use of total scores rather than domain or item scores; analyzing domains or items would require a larger sample of programs.
We found a stronger effect for fidelity on community functioning and quality of life than on housing stability and no relationship between fidelity and time to housing. This may be because city-level factors external to the program, such as vacancy rates, had more influence on housing outcomes than on other outcomes, and we could not disentangle city and program differences in this analysis. Although effect sizes for community functioning and quality of life were small in absolute terms, they represented meaningful variations in the intervention effects; elsewhere we reported effect sizes of .2 to .3 for Housing First as a whole, which implies that a variation in fidelity scores of one SD would be associated with variations in intervention effects of 30%–40%.
The difference in fidelity ratings between the highest- and lowest-fidelity programs, moreover, was 33 points, which corresponds to an effect size of .33 for the MCAS and .40 for the QoLI-20. Effect sizes this large imply that, as has previously been reported (9,15), variation in program fidelity is associated with meaningful variation in outcomes, a finding that supports both the validity of the scale and the effectiveness of the intervention. The observed association between fidelity ratings and actual service contact is consistent with program theory (5), which links the availability and intensity of support to achievement of other outcomes, such as improved quality of life. These findings help to validate the scale through use of a concurrent data source.
Conclusions
This study provided evidence supporting use of the Housing First fidelity tool in research on and implementation of the Housing First model.
Acknowledgments
The authors thank Jayne Barker, Ph.D., and Cam Keller, M.A. (national project leads, 2008–2011 and 2011–2014, respectively), as well as the members of the national research team, the five site research teams, the site coordinators, the numerous service and housing providers, and the persons with lived experience who contributed to this project and the research.