Providing high-quality care at lower costs is a national goal (1, 2). Quality measurement of clinical outcomes, aligned with financial incentives, is envisioned as the primary driver for achieving this goal (3–6). Quality measurement of mental health care, however, has lagged behind advances in other health care sectors, with disproportionately less attention paid to child mental health and outcomes (7–13). In 2013, only 29 state Medicaid behavioral health agencies provided online information related to measuring behavioral health care quality; use of quality measures varied widely by state, with very few targeting care for children (14). Among the 26 measures in the 2018 core set of children’s health care quality measures for Medicaid and the Children’s Health Insurance Program, only four are related to behavioral health (15), and the relationship between adherence to these measures and improved clinical outcomes has not been established (10, 13, 16, 17).
As early as 1991, several legislative mandates in California called for the development of a statewide system for publicly reporting the quality of mental health care and its outcomes over time. This requirement is embedded within a series of laws that seek to stabilize funding for community mental health programs by shifting administrative and financial responsibility to county mental health agencies and earmarking specific tax revenues for mental health care (18–20). For children, early efforts to measure performance included documenting high need for mental health care in select county programs (21, 22), assessing agreement between child functional measures (23), and describing foster home and state hospital utilization and expenditures among counties implementing system-of-care principles (24).
In 2012, the legislative mandate to transfer the administration of all Medicaid-funded mental health services to the state of California’s Department of Health Care Services (DHCS) was amended to include a statute to “develop a performance outcome system for early and periodic screening, diagnosis, and treatment mental health services that will improve outcomes at the individual and system levels and will inform fiscal decision making related to the purchase of services” (25). Yet, despite these policies, there remains a need to develop a robust data infrastructure for quality monitoring and a standardized approach for measuring child outcomes (19, 26–28).
In this context, DHCS contracted with a major university to address the question, “What is the best statewide approach to evaluate functional status for children and youth that are served by the California public specialty mental health service system?” (29). As the recipients of this contract, we sought to recommend a standardized child measure of functioning for statewide use. To do so, we used a five-phase approach consisting of an Internet environmental scan of measures used by state mental health agencies; a statewide provider survey; a scientific literature review; a modified Delphi panel; and final ratings of candidate measures on the basis of nine minimum criteria informed by stakeholder priorities, scientific evidence, and the performance outcome system statute. At the conclusion of the project, we prepared a report to the state outlining our recommendations for a statewide performance outcome measurement system (30). This article builds on that report by providing a fuller examination of the modified Delphi panel ratings, using qualitative data to explain and identify stakeholder priorities. The article also discusses the final recommendation and implementation plan from the DHCS mental health services division (DHCS-MHSD) and briefly summarizes the study’s methods and main findings.
Methods
Identification of Candidate Measures
To identify a pool of candidate measures, we conducted an environmental scan, a statewide provider survey, and a scientific literature review. The environmental scan, conducted from December 2015 through February 2016, examined mental health agency Web sites in the other 49 states (excluding California) to identify which states used standardized measures to screen for mental health service need or to track clinical outcomes for children served by publicly funded specialty mental health programs. In addition, a statewide provider survey was conducted by using Survey Monkey in December 2015 to identify which standardized measures of child functioning were used in community-based mental health programs within California and how they were used. The provider sample included behavioral health directors or their designees in 56 of the state’s 58 counties (97%). Exploratory findings from a purposive sample of 21 contracted providers are not reported.
Further, a comprehensive scientific literature scan was conducted by using SCOPUS, PubMed, and PsycINFO to identify peer-reviewed studies from the previous 5 years (2010–2015) that used standardized measures to track clinical outcomes for children ages 0 to 18 who were receiving community-based, outpatient mental health services. Eligibility criteria were peer-reviewed articles published between 2010 and 2015, English-language abstracts, and use of at least one standardized measure that compares change in the child’s symptoms or functioning across at least two time points. The scan excluded studies with target populations that did not meet medical necessity criteria for Medicaid reimbursement in California’s publicly funded specialty mental health outpatient programs (e.g., primary diagnosis of drug, alcohol, or tobacco use disorder or neurodevelopmental delay).
The final list of candidate measures was merged from these three data sources. Eligibility criteria included use by at least one state mental health agency, use by two or more California county mental health agencies, or use as a clinical outcome measure in at least three published studies from the literature review. Proprietary and publicly available measures were included. The list did not include measures designed to track individualized outcomes (e.g., therapy progress using more restricted age- or disorder-specific measures or treatment plan goals) because they are not suitable for assessing the effectiveness of care at an aggregate level (provider, program, or county).
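As a purely illustrative sketch of the pooling rule described above (the function, data structure, and counts are hypothetical and for exposition only, not artifacts of the project), the eligibility check could be applied to each measure identified across the three sources as follows:

```python
# Illustrative sketch of the candidate-pool rule; names and counts are hypothetical.

def is_candidate(n_state_agencies: int, n_ca_counties: int, n_outcome_studies: int) -> bool:
    """A measure qualifies if it is used by at least one state mental health agency,
    by two or more California county agencies, or as a clinical outcome measure in
    at least three published studies from the literature review."""
    return n_state_agencies >= 1 or n_ca_counties >= 2 or n_outcome_studies >= 3

# Hypothetical usage counts: (state agencies, CA counties, outcome studies).
usage = {
    "Measure A": (3, 0, 1),
    "Measure B": (0, 1, 1),   # fails all three thresholds
    "Measure C": (0, 0, 4),
}
candidates = [name for name, counts in usage.items() if is_candidate(*counts)]
print(candidates)  # ['Measure A', 'Measure C']
```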
Modified Delphi Panel
The modified Delphi method, also called the RAND/University of California at Los Angeles (UCLA) appropriateness method, is a well-established approach that combines scientific evidence and judgment to produce the best possible information (31). The original method entails assessment of existing scientific evidence by a group of nine medical experts, anonymous ranking of quality indicators based on scientific evidence and expert opinion, confidential feedback to panel members on their responses in relation to the rest of the group, and a discussion among the panel followed by a confidential final ranking of the quality indicators (32).
For this project, the method was adapted to expand the breadth of expertise by using a 14-member panel and to add ratings for scientific acceptability, feasibility, and usability by using criteria from the National Quality Forum (33). Panel members were purposively selected by using a partnered approach to include expertise in the delivery of publicly funded child mental health care from a variety of perspectives as well as to include participants from urban and rural counties. Each panelist received a manual containing a summary of features (description, logistics, psychometric properties, and strength of evidence) and scientific evidence tables for each of the candidate measures (30). The strength of the evidence for use as an outcome measure in community-based child mental health programs was rated by using the Oxford Centre for Evidence-Based Medicine (CEBM) levels of evidence. The CEBM protocol ranks the strength of evidence, based on study design and methodologic rigor, from level 1, individual randomized clinical trials with more than 80% follow-up, to level 5, expert opinion or inconclusive evidence (13, 34). For all 11 candidate measures, the strength of evidence for the outcome studies was critically reviewed and assigned a ranking by a board-certified child psychiatrist.
Using a 9-point Likert scale, panelists were also asked to rate the measures on four domains (1, lowest; 4–6, equivocal; and 9, highest) and overall utility (1, definitely would not recommend; 4–6, equivocal; and 9, would definitely recommend). The domains were marker of effective care (the extent to which improvement in the outcome, as assessed by this measure, is an indicator of effective care), scientific acceptability (the extent to which published scientific evidence supports the use of the measure for tracking clinical outcomes in community-based mental health programs, including three subdomains—reliability, validity, and strength of evidence), usability (the extent to which the intended audience can understand the measure’s scores and find them useful for decision making), and feasibility (the extent to which data obtained from the measure are readily available or can be captured without undue burden—i.e., no formal training required—and could be implemented by counties to track clinical outcomes). Overall utility was defined as the extent to which a panelist would recommend the measure for statewide use to track clinical outcomes among children and youths served in publicly funded and community-based specialty mental health programs. Following discussion, panelists confidentially re-rated each measure.
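As a minimal sketch of the rating arithmetic (the banding function mirrors the 9-point scale above and the ≥6 threshold applied in the Results, but the ratings shown are invented and this is not the panel’s actual analysis code), mean panel ratings for a domain could be summarized as follows:

```python
from statistics import mean

# Bands follow the 9-point scale described above: 1-3 low, 4-6 equivocal, 7-9 high;
# a mean of >=6 is treated as high-equivocal to high. Ratings below are invented.

def band(mean_rating: float) -> str:
    if mean_rating >= 7:
        return "high"
    if mean_rating >= 6:
        return "high-equivocal"
    if mean_rating >= 4:
        return "equivocal"
    return "low"

# Hypothetical final ratings from a 14-member panel for one domain of one measure.
overall_utility = [7, 8, 6, 7, 9, 6, 7, 8, 5, 7, 6, 8, 7, 7]
m = mean(overall_utility)
print(f"overall utility: mean={m:.1f} ({band(m)})")  # overall utility: mean=7.0 (high)
```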
To enrich findings from the panel ratings, the discussion was audio-taped and transcribed for qualitative analysis (35). Transcripts were coded by topic, with inductive codes for theme, affect (positive or negative), and the four specified domains (36). Each measure’s discussion was analyzed independently and condensed into a synthesis of topics (e.g., features and specific concerns). The full session was then analyzed holistically into a synthesis of common themes appearing repeatedly across multiple measures or flagged by panelists themselves as being of general concern. These common themes were also classified according to relevant domain based on conversational context. This study was approved by the UCLA Institutional Review Board.
Recommendation of Measure
A measure was recommended if it met nine minimum criteria based on DHCS-MHSD statutory requirements and the main findings from each project stage. Criteria included a broad age range (2–18 years); a broad range of symptoms (internalizing and externalizing); availability in California’s top three threshold languages (Spanish, Vietnamese, and Chinese); ease of use, as reported by the 56 county mental health agencies (mean score of ≥3 on a scale of 1, difficult, to 5, easy); brief administration time (<10 minutes); a consumer-centered version (parent or youth report); acceptable strength of evidence (CEBM rating of ≤2); a mean Delphi panel rating of high or high-equivocal overall utility (≥6); and capacity to align the measurement time frame with a unique episode of care (the child’s current treatment episode, which often varies by child).
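A minimal sketch of the nine-criterion screen follows; the field names and thresholds simply restate the criteria above, and the code is a hypothetical illustration rather than the project’s actual scoring procedure.

```python
from dataclasses import dataclass

@dataclass
class Measure:
    min_age: int                  # youngest age covered, in years
    max_age: int                  # oldest age covered, in years
    broad_symptom_range: bool     # internalizing and externalizing symptoms
    threshold_languages: bool     # available in Spanish, Vietnamese, and Chinese
    county_ease_of_use: float     # county-reported ease, 1 (difficult) to 5 (easy)
    admin_minutes: float          # time to administer
    consumer_version: bool        # parent or youth report version available
    cebm_level: int               # Oxford CEBM level (1 = strongest evidence)
    panel_overall_utility: float  # mean Delphi overall utility rating, 9-point scale
    episode_alignable: bool       # time frame can align with the current episode of care

def meets_all_nine(m: Measure) -> bool:
    """Return True only if a measure satisfies every one of the nine minimum criteria."""
    return all([
        m.min_age <= 2 and m.max_age >= 18,
        m.broad_symptom_range,
        m.threshold_languages,
        m.county_ease_of_use >= 3,
        m.admin_minutes < 10,
        m.consumer_version,
        m.cebm_level <= 2,
        m.panel_overall_utility >= 6,
        m.episode_alignable,
    ])
```

Tallying such a check across all candidate measures corresponds to the criterion counts reported later (Table 4).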
Results
Of the 49 state mental health agencies, 73% (N=36) reported use of at least one standardized screening measure, for an overall total of 15 unique measures (Table 1). Use of screening measures varied widely by age, with 11 states using a measure for children younger than 5, 13 states for children ages 5 to 18, and 18 states for young adults ages 19 to 21. Assessment spanned various domains, including development, symptoms, impairment, treatment goals, service intensity, and strengths. The most frequently reported screening measure was the Child and Adolescent Needs and Strengths (CANS) (N=18 states), followed by the Pediatric Symptom Checklist (PSC) (N=9), the Ages and Stages Questionnaire (N=7), and the Child and Adolescent Functional Assessment Scale (CAFAS) (N=7). Six states reported using their own custom-made measure. Of the 36 states that used at least one standardized screening measure, 10 (27%) reported use of at least one standardized measure to track clinical outcomes, for an overall total of six unique measures. Of these, only the CANS (N=6) and the CAFAS (N=3) were used by more than two states.
Among the 56 California county mental health agencies, the most frequently reported measures were the CANS (N=33), the Child Behavior Checklist (CBCL) (N=14), and the Eyberg Child Behavior Inventory (ECBI) (N=12) (Table 2). The reported purposes of the measures included screening, diagnosis, determining level of care, outcomes, treatment goals, and quality improvement. Most of the counties reported use of one measure for all these purposes, even if the purpose did not align with recommended use (e.g., using a service need intensity measure for diagnosis). In addition, 25 counties reported using tools that were not related to child functioning.
Of the 225 measures identified from the literature review, only 34 had been used in at least three published studies as a clinical outcome measure in a community-based mental health setting. Of these, seven measures remained after eliminating measures that were diagnosis specific (N=20), that were not applicable to the target population (N=5), or that did not measure change in clinical status over time (N=2) (see figure in online supplement). From other data sources, we identified four additional measures that were used by at least one state, were used by two or more California county mental health agencies, or were of interest to DHCS-MHSD. The final pool of 11 candidate measures included the Achenbach System of Empirically Based Assessment (ASEBA), which includes the CBCL; Clinical Global Impressions; the Strengths and Difficulties Questionnaire (SDQ); the CANS; the CAFAS; the ECBI; the PSC; the Treatment Outcome Package (TOP); the Children’s Global Assessment Scale; the Ohio Youth Problems, Functional, and Satisfaction Scales (Ohio Scales); and the Youth Outcomes Questionnaire. (Details of how the candidate measures met inclusion criteria are available in Supplemental Table 1 in the online supplement.)
Using the CEBM protocol, the strength of evidence for the PSC, ASEBA, SDQ, and CAFAS was rated as a 2, corresponding to an individual cohort study. The other measures were rated as a 4, corresponding to a poor-quality cohort study, with the exception of the TOP, which had no outcome studies. (The psychometric properties and strength of evidence for use of each candidate measure as a clinical outcome measure in community-based mental health programs are summarized in Table 2 in the online supplement.)
Following deliberation and rerating by the modified Delphi panel, only the ASEBA, SDQ, and PSC were rated on average in the high-equivocal to high (≥6) range for use as a marker of effective care, scientific acceptability, usability, feasibility, and overall utility (Table 3). The remaining measures were rated consistently in the equivocal-to-low range, on average, for all domains. (Explanations for panel ratings and stakeholder priorities that emerged from the panel deliberation are summarized in Table 3 in the online supplement.)
Panelists’ priorities for a statewide performance measurement system included assessment of a broad range of symptoms, use with a wide age range, strong scientific evidence, availability in multiple languages, easy interpretation of findings, low burden to administer, a parent report version, alignment with the current treatment episode, and timely feedback. The potential for the PSC to facilitate communication across primary care and specialty mental health care providers was viewed as a unique strength. Upon tallying the nine minimum criteria for recommendation for statewide use, only the PSC met all criteria (Table 4). Compared with the PSC, the ASEBA and SDQ, both of which met seven of the nine criteria, required longer time frames for evaluation (past 6 months for the ASEBA and past 6 months or current school year for the SDQ); as a result, they were well suited for detection of chronic symptoms but not for alignment with a child’s unique episode of care.
DHCS-MHSD mandated the use of the PSC and CANS for the statewide performance measurement system (37). In fiscal year 2017–2018, $14,952,000 was allocated to build a state-level data capture system and to reimburse counties for costs related to implementation of screening with the CANS (i.e., training and clinician time to complete), information technology upgrades, and time spent preparing and submitting data to DHCS-MHSD. Implementation was phased in beginning July 1, 2018, starting with 32 counties, followed by Los Angeles County beginning July 1, 2019, and by 26 additional counties beginning October 1, 2019.
Discussion
The lack of a common approach for standardized outcomes measurement makes it impossible to compare child clinical outcomes across states and across counties within California. Only one out of five state mental health agency Web sites reported use of any standardized measure to track clinical outcomes for children receiving publicly funded mental health services, and only two reported any information on statewide implementation. At the state and California county level, the reported measures varied widely by child age, domains assessed, and format. California counties reported using standardized measures for a wide range of purposes, but none specified using a standardized clinical outcome measure to assess the effectiveness of care. In addition, the outcome measures reported at the state and county levels did not closely align with the strength of scientific evidence for use in community-based child mental health programs or with the Delphi panel ratings. The CANS was reported more frequently than any other measure at both the state and the county level, but the strength of its scientific evidence was poor and its Delphi panel ratings were low-equivocal to low across all domains. In contrast, the PSC, the second-most frequently reported measure among all states but used infrequently in California, rose to the top based on acceptable scientific evidence and high Delphi panel ratings and was the only measure that met all nine minimum criteria.
DHCS-MHSD’s final selection struck a compromise by including the PSC because of its high rankings and the CANS because of its wide use in California. Implementation of the CANS was supported by funding to individual counties to cover the costs of clinician training and time. The CANS is envisioned to facilitate communication by being a part of the clinical assessment (38) and is named in the legislation as an example of an “evidence-based model for performance outcome systems” (25). This approach is consistent with evidence that state legislators place higher priority on information from behavioral health organizations than from university-based research (39).
Successful implementation of a system for measurement of performance outcomes requires several components. Funds for a performance outcome measurement system were not earmarked in the state legislative mandate, which instead stipulates that DHCS minimize costs “by building upon existing resources to the fullest extent possible” (18), consistent with national trends (17). Meeting this mandate will require the development and maintenance of a relatively complex statewide data infrastructure that must include multiple nested units of analysis (individual, provider, program, and agency), documentation of use of and fidelity to evidence-based practices (including recommended medication treatment), approaches for case-mix adjustment and identification of disparities, and the capacity to link to clinical outcomes for children with variable episodes of care by using data sources that are not contingent upon continued contact with mental health services (5, 8, 12, 27, 40). This public investment in a measurement-driven system for assessing quality of care will be substantial, and the system will require continual maintenance.
Other important considerations include specifying the purpose and corresponding unit of analysis (e.g., child, provider, program, county, state, or system) when selecting standardized measures to track clinical outcomes for children receiving publicly funded mental health services. The PSC was recommended because it satisfies the nine criteria identified as priorities for adopting a measure: it covers a broad age range; captures a wide breadth of symptoms; is available in California’s top three threshold languages; is easy to use; is brief and consumer centered; and has acceptable evidence strength, moderate to high overall utility, and a time period that can align with the child’s unique episode of care.
Recommendation of the PSC should be viewed as complementary to other measurement-driven quality improvement activities (41) and does not preclude a program’s use of other standardized measures that may be purposefully selected and individualized for routine outcome monitoring in clinical practice (42).
It is also important to choose a clinical outcome measure prior to developing a standardized set of methods and materials to electronically document and track the delivery of recommended care processes across county behavioral health agencies. Although the report to DHCS-MHSD provided some guidelines for implementation, development of the approach to collect and submit data was delegated to individual counties, potentially introducing greater heterogeneity in data quality when aggregated at the statewide level. Future research is needed to develop, maintain, and continuously refine statewide data infrastructure for monitoring the delivery of recommended care processes and their relationship to meaningful clinical outcomes, as well as to track cost shifting and potential savings across agencies serving children. As a starting point, it would be useful to develop a set of standardized materials and methods for data capture of the PSC and CANS by using a community-partnered approach in select counties. The data capture effort could then be pilot-tested and further refined prior to large-scale use. For the PSC, such an approach could also capitalize on advances in digital health tools to enable primary caregivers and youths to report on clinical outcomes in real time, ideally with results integrated into the electronic health record (43). This would reduce selection bias because clinical outcomes monitoring would not be contingent on contact with mental health services.
Conclusions
A shared and consistent national mandate is required to provide equitable and effective care for children, while reducing costs and placing higher priority on child mental health care. The findings of this study illustrate the need for policy action to promote selection of a common clinical outcome measure and measurement methodology for children receiving publicly funded mental health care. Although this process will likely include advances and setbacks, a statewide performance outcome system remains an important component of systemwide goals.
Acknowledgments
The authors gratefully acknowledge the strong partnership with the DHCS mental health services division, the excellent work of the members of the modified Delphi panel, the comments from the subject matter expert panel during each stage of this project, and data verification support from Xiao Chen, Ph.D.