Mental health courts (MHCs) were developed in the late 1990s to address growing numbers of adults with mental illnesses in the U.S. criminal justice system (1,2). These courts operate primarily as postbooking diversion programs whereby defendants voluntarily agree to judicial supervision of community-based mental health treatment, often in exchange for a reduced or dismissed index charge upon successful completion. MHCs may help reduce high rates of reoffending in this population (3). Although MHCs vary in their design (4), case processing (for example, proportion of referred cases accepted and time from referral to acceptance) (5), and selection of participants (6), they share several defining features. These include a separate docket (list of cases heard in court), judicial supervision of treatment plans, regular appearances of participants before the judge, and terms of participation for successful completion (for example, demonstrated treatment adherence) (7). Over the past 20 years, MHCs have spread rapidly, and there are now nearly 350 MHCs in the United States (8).
A key question is whether MHCs are effective in reducing reoffending among justice-involved adults with mental illnesses. Past studies have shown effects of MHC participation on arrests (9–12), charges (13), and jail days (14,15). Other studies have failed to find effects of MHC participation on recidivism (16–18). A prior meta-analytic investigation examined 15 quasi-experimental and single-group studies published through July 2009 and found a moderate effect favoring MHC participation on recidivism (Hedges’ g=–.55) (19). However, that study also revealed evidence of publication bias (that is, published papers presented significant findings in favor of the MHC) and a high degree of heterogeneity across effect sizes. Together, findings to date suggest considerable variability in the effectiveness of MHCs.
Beyond variations in the structure and operation of MHCs, methodologies used to evaluate them may explain mixed findings. Some studies have examined recidivism after participants’ enrollment in the MHC (12,15,16,18), whereas others have measured recidivism after MHC exit (13,14,17,20–22). In addition, length of follow-up has varied across studies, with few studies measuring recidivism longer than 12 months (13,15,16,18). Furthermore, the methodological quality of designs with nonequivalent comparison groups has varied significantly on key indicators, such as composition of the comparison group, use of matching strategies, and reporting of confidence intervals. For these reasons, investigation of study-level characteristics may elucidate between-study variability and explain inconsistent findings regarding MHC effectiveness.
Since 2009, there has been considerable growth in the research literature on MHCs, including two multisite investigations (15,18) and several investigations employing comparison groups to examine the effectiveness of MHC participation compared with treatment as usual (11,14,15,17,18,22). As a result, there is a need to reexamine the contemporary literature on the effect of MHCs on recidivism. We conducted a meta-analytic investigation of the effectiveness of MHCs in reducing reoffending among adults with mental illnesses. Our aims were to establish the effect of MHC participation on criminal recidivism compared with treatment as usual and then to identify moderators of these effects, such as study quality and length and timing of follow-up.
Methods
We followed the PRISMA guidelines (23,24) for reporting of inclusion criteria, assessment of publication bias, and synthesis of results.
Literature Search
Three primary inclusion criteria guided our literature search: first, the intervention was identified as an MHC for adults (as opposed to youths); second, recidivism was included as a dependent variable, operationalized as any continuous or dichotomous measure of arrest, criminal charge, conviction, or time in jail for a specified follow-up period; and third, the study included a comparison group. We conducted a systematic literature review in PsycINFO, Google Scholar, and National Criminal Justice Reference Service Abstracts using the key word “mental health court.” The initial search identified 2,769 records. [A flowchart illustrating the search process is presented in an online supplement to this article.] An additional ten records were identified through reference review. Abstracts were screened by two members of the study team (EL and DB) to determine whether the study identified the intervention as an MHC, represented an empirical investigation, reported on an MHC participant-level outcome, and was published between January 1, 1995, and December 31, 2015. This screening produced 75 unique records for full-text evaluation by two members of the study team (DB and BN) against the primary inclusion criteria. Among eligible studies, we excluded one record for which information to compute a between-groups effect size could not be obtained (25) and 11 records of duplicate samples. As a quality control measure for our initial search, we replicated our original search criteria in PubMed (80 records) and LexisNexis (77 records). We also replicated our PsycINFO search using identical search constraints and several additional search terms: “diversion program*” (327 records), “problem-solving court*” (64 records), and “alternative to incarceration” (50 records). Review of these results yielded no additional records meeting inclusion criteria. Records for which effect sizes could be extracted by sample (that is, a specific MHC and jurisdiction) were treated as separate studies. A total of 16 records representing 17 unique studies were included in the meta-analysis (11–18,20–22,26–30).
Data Extraction
Two of four trained coders (EL, DB, ES, and KD) independently extracted the following data for each study: year of publication, composition of comparison group, MHC location (city, county, and state), dates of data collection, publication type (dissertation, publication, or report), recidivism outcome (arrest, charge, conviction, or jail), length of follow-up (12 months or >12 months), timing of follow-up (after MHC exit, after MHC enrollment, or after MHC referral), and sample characteristics overall and by group (percentage male, mean age, and percentage white). Excellent levels of agreement were achieved across categories (90.0% agreement). Discrepancies were resolved through discussion with the first author.
Because nonrandomized and retrospective investigations carry a high risk of bias and few quality assessment instruments are well suited to them (31), we assessed study quality by using two measures: the SIGN Methodology Checklist 3 for Cohort Studies (32) and the Quality Assessment Tool (QAT) for Quantitative Studies (33). These were adapted to capture relevant methodological indicators and to generate quality ratings of low, moderate, or high. Each study was coded and scored independently on both measures by two authors (EL and CR). SIGN and QAT ratings showed strong evidence for convergent validity (r=.75, p=.001), corresponding to a large effect size (34). Interrater reliability was excellent for the SIGN framework (κ=.80; 87.5% agreement) and fair for the QAT framework (κ=.39; 62.5% agreement) (35). Average ratings across both frameworks produced an excellent level of interrater reliability (intraclass correlation coefficient=.91) (36).
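As a hedged illustration of how these agreement statistics can be computed, the sketch below derives Cohen's kappa, raw agreement, and an intraclass correlation from two raters' ordinal quality ratings. The rating data are invented, and scikit-learn and pingouin are stand-ins for whatever software was actually used in the original analysis.

```python
# Illustrative only: invented quality ratings for two coders across 16 studies.
import pandas as pd
from sklearn.metrics import cohen_kappa_score
import pingouin as pg

# Ordinal study-quality ratings: 0=low, 1=moderate, 2=high.
rater1 = [2, 1, 1, 0, 2, 1, 0, 1, 2, 2, 1, 0, 1, 1, 2, 0]
rater2 = [2, 1, 0, 0, 2, 1, 0, 1, 2, 1, 1, 0, 1, 2, 2, 0]

# Cohen's kappa corrects raw percentage agreement for chance agreement.
kappa = cohen_kappa_score(rater1, rater2)
raw = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
print(f"kappa={kappa:.2f}, raw agreement={raw:.1%}")

# The intraclass correlation treats ratings as continuous and indexes
# the reliability of scores averaged across raters.
long = pd.DataFrame({
    "study": list(range(16)) * 2,
    "rater": ["r1"] * 16 + ["r2"] * 16,
    "score": rater1 + rater2,
})
print(pg.intraclass_corr(data=long, targets="study", raters="rater",
                         ratings="score")[["Type", "ICC"]])
```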
Between-groups effects on recidivism (k=25) were extracted and coded with a consensus approach by two authors (EL and CR). Effect size direction was standardized such that negative effects represented lower recidivism for MHC participants relative to comparison group participants. Consistent with our operationalization of recidivism, effect sizes were first extracted for continuous measures (that is, arrests, charges, convictions, and jail days). If it was not possible to code continuous outcomes, effect sizes from dichotomized measures of recidivism were coded (that is, any arrest, charge, conviction, or jail time). All effect sizes were coded consistent with quality ratings and an intent-to-treat approach (37). For most effect sizes (k=19), sufficient information was provided to calculate a standardized mean difference (d). For studies that did not report a within-subjects correlation, we used an estimated correlation of r=.50, which we deemed conservative on the basis of published estimates in the literature (25). For all other effect sizes (k=6), odds ratios were coded and d estimated in Comprehensive Meta-Analysis (CMA) software, version 3 (38). For studies reporting rate ratios (N=2, k=4), we recorded odds ratios for dichotomous outcomes to allow inclusion of all effect sizes. When separate effect sizes were presented for MHC completers and noncompleters (N=2 studies), effect sizes were coded separately (k=3) and aggregated.
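For the dichotomous outcomes, the conversion from an odds ratio to d typically relies on the logit transformation described by the CMA developers; we sketch it here as background rather than as a verbatim account of the software's internals:

\[ d = \ln(\mathrm{OR}) \cdot \frac{\sqrt{3}}{\pi}, \qquad V_d = V_{\ln(\mathrm{OR})} \cdot \frac{3}{\pi^{2}} \]

For example, an odds ratio of 0.60 favoring MHC participants corresponds to d = ln(0.60) × (√3/π) ≈ –0.28.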
Data Analysis
Analyses were conducted by using a random-effects model (39) because of known variability in the design and operation of MHCs (4–6). The random-effects model accounts for variability in intervention- and study-level characteristics as well as sampling (40). Standardized mean difference (d) effect sizes were calculated for each study, weighted by inverse variance, and aggregated to produce weighted mean effect sizes. When multiple effect sizes were extracted for a single study, they were averaged within the study to minimize bias from correlated outcomes (41). Heterogeneity was assessed with Cochran’s Q statistic, indicating the presence of heterogeneity, and with I², approximating the amount of heterogeneity (42,43). I² values of 25%, 50%, and 75% represented low, moderate, and high heterogeneity, respectively (44). We tested four study-level moderators: study quality, recidivism outcome, length of follow-up, and timing of follow-up.
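In standard notation, the weighted mean effect size and heterogeneity statistics described above are computed as follows, where d_i and v_i are study i's effect size and sampling variance, k is the number of studies, and the between-study variance τ̂² is estimated by the software (for example, with the DerSimonian–Laird method):

\[ \bar{d}_{\mathrm{RE}} = \frac{\sum_{i} w_i^{*} d_i}{\sum_{i} w_i^{*}}, \quad w_i^{*} = \frac{1}{v_i + \hat{\tau}^{2}}; \qquad Q = \sum_{i=1}^{k} w_i \left( d_i - \bar{d}_{\mathrm{FE}} \right)^{2}, \quad w_i = \frac{1}{v_i} \]

\[ I^{2} = \max\!\left( 0,\ \frac{Q - (k-1)}{Q} \right) \times 100\% \]

where \( \bar{d}_{\mathrm{FE}} \) is the inverse-variance (fixed-effect) weighted mean.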
To assess publication bias, we examined publication type as a potential moderator. We then examined a funnel plot of standard errors from random effects (45), which provides a graphical representation of publication bias based on asymmetry across the vertical axis (46). Because funnel plot interpretation is subjective (47), we also conducted the “trim and fill” method, which quantifies and adjusts for funnel plot asymmetry and provides a corrected effect size (48), and computed a fail-safe N, which estimates the number of additional studies with a nonsignificant intervention effect needed to nullify the effect size (that is, to raise the p value above .05) (49). All analyses were conducted in CMA software, version 3 (38).
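Rosenthal's formulation of the fail-safe N makes this logic concrete: with z_i denoting each study's standard normal deviate and 1.645 the one-tailed critical value at α=.05,

\[ N_{\mathrm{fs}} = \left( \frac{\sum_{i=1}^{k} z_i}{1.645} \right)^{2} - k . \]

We cite the classic formula for orientation; the exact variant CMA computes may differ in minor details.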
Discussion
MHCs have grown more prevalent across the United States in the past decade (8). Although they are generally accepted as one strategy to reduce the overrepresentation of adults with mental illnesses in the criminal justice system, they are not without controversy (51–55). For instance, MHCs have been criticized as potentially obstructing defendants’ due process rights (51,55,56). They also have been called a stopgap for pervasive, structural problems, such as stigma related to mental illness or inadequate community mental health resources (52,54). As a result of these critiques, questions remain regarding their effectiveness. We conducted a meta-analytic investigation of studies examining the effectiveness of MHC participation on recidivism relative to treatment as usual. We also examined the extent to which study-level factors moderated these effects.
Overall, our findings indicate that MHC participation had a modest effect on recidivism relative to traditional criminal processing (d=–.20). Because we employed a strict intent-to-treat approach, this finding likely represents a conservative estimate (57). Specifically, previous research has demonstrated that graduation from an MHC, as opposed to participation more generally, is associated with better outcomes (14,58). However, in practice, not every participant who enrolls in an MHC will graduate. Rather than speaking to the effectiveness of successful participation in an MHC, our findings inform the overall effectiveness of MHCs as a judicial strategy to reduce the number of adults with mental illnesses who return to the criminal justice system.
Our findings suggest a need for research examining strategies (for example, more frequent status hearings and intensive case management) to encourage participant engagement in MHCs. Indeed, there has been limited investigation of features of MHC participation beyond graduation status that may contribute to reduced recidivism (59–61). Furthermore, addressing the criminogenic risks and needs (for example, financial resources, housing, and procriminal attitudes) of MHC participants may contribute to greater reductions in recidivism (62), although the extent to which these criminogenic risks and needs are addressed in MHC case management and supervision is unknown.
Individual studies have produced significant effects of MHC participation on conviction and arrest outcomes. However, results from moderator analyses showed small effects of MHC participation on either outcome, especially when measured after MHC enrollment. Rather, MHC participation appeared to be most effective at decreasing jail time after exit from the MHC. These findings suggest that MHCs may be most effective as a harm reduction intervention. Specifically, given the already high rates of reoffending in this population (3), it may not be realistic to expect complete desistance from criminal activity among MHC participants. Rather, MHC participation may be a means to mitigate the severity of future offending (that is, jail time associated with a new offense).
Length of follow-up did not moderate the effect of MHC participation, suggesting sustained reductions in recidivism over time. To date, only one study has examined long-term recidivism outcomes, finding that 53.9% of participants were rearrested within a five-year period (58). However, that study did not include a comparison group of offenders undergoing traditional criminal justice processing. We also found stronger effects when recidivism was measured after exit from the MHC than after enrollment, which may reflect the intensive community monitoring of MHC participants and the widespread practice of using jail as a sanction for noncompliance (4,63).
Our findings raise a broader question regarding the types of improvements MHC participants should be expected to make during—and after—MHC participation. Future MHC research should adapt practices from an implementation science framework to examine the extent to which MHCs achieve key service outcomes—such as service referrals and engagement—and the extent to which these outcomes contribute to participant outcomes, such as improved psychosocial functioning and decreased recidivism (64). These investigations are critical to understanding how MHCs operate, what contributes to their effectiveness, and the extent to which short-term gains in treatment and service utilization result in long-term improvements in community functioning.
Finally, although we found limited evidence of publication bias, we observed a moderating effect of study quality, with lower-quality studies yielding larger effect sizes. Of note, few randomized controlled trials (RCTs) have been conducted in MHCs (16). Although some concerns have been raised regarding the use of RCTs to evaluate MHCs for reasons of procedural fairness (27), RCTs have been used successfully to evaluate other diversion strategies, including drug courts (65). Our findings highlight the need for increased rigor in evaluations of MHCs, including improved measurement of recidivism and use of appropriate analytic strategies (66). For example, the dichotomization of recidivism measures (for example, any arrest: yes or no) has the potential to restrict response range and to bias results (67). When count variables are used (for example, number of arrests), their distributional properties must be assessed prior to analysis. Although a growing number of studies have employed Poisson-class regression (for example, negative binomial, Poisson, and zero-inflated models) to model count data, effect sizes are not consistently reported.
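To make this recommendation concrete, the sketch below simulates an overdispersed arrest count, checks the mean-variance relationship, and fits Poisson and negative binomial models with statsmodels. All data and variable names are hypothetical, and the code illustrates the general practice rather than any analysis from the reviewed studies.

```python
# Illustrative only: simulated arrest counts, not data from any study.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_obs = 400
mhc = rng.integers(0, 2, n_obs)        # 1 = MHC participant, 0 = comparison
mu = np.exp(0.4 - 0.5 * mhc)           # group-specific expected count
disp = 2.0                             # negative binomial size parameter
arrests = rng.negative_binomial(disp, disp / (disp + mu))

# Poisson regression assumes variance == mean; a variance well above
# the mean signals overdispersion and favors a negative binomial model.
print(f"mean={arrests.mean():.2f}, variance={arrests.var():.2f}")

X = sm.add_constant(mhc.astype(float))
poisson = sm.GLM(arrests, X, family=sm.families.Poisson()).fit()
negbin = sm.GLM(arrests, X, family=sm.families.NegativeBinomial()).fit()

# exp(coefficient) is the incidence rate ratio (IRR), the effect size
# that count-based recidivism studies should report alongside model fit.
for label, model in (("Poisson", poisson), ("NegBin", negbin)):
    print(f"{label}: IRR={np.exp(model.params[1]):.2f}, AIC={model.aic:.1f}")
```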
Our findings should be considered along with several limitations. First, our literature search focused on published studies and reports conducted by external researchers. We did not include data resulting from internal evaluations, which may have excluded potential data sources. Nevertheless, our findings showed little evidence of publication bias. In addition, when means and standard deviations were used to calculate standardized mean differences, rarely could we determine whether distributions of recidivism variables met normality assumptions. When studies reported proper effect sizes for Poisson-class models (that is, incidence rate ratios), these could not be included in the meta-analysis because of our use of the standardized mean difference. Instead, we coded odds ratios from comparisons of dichotomous outcomes, reducing effect sizes for two studies (12,14). Finally, we could not investigate participant-level sources of effect size variability because of inconsistent reporting across studies, and although we investigated study-level moderators, we were unable to use meta-regression strategies to quantify these effects. These are important directions for future research.