Mental and substance use disorders are the leading cause of years lived with disability worldwide, accounting for 22.9% of all nonfatal disease burden (1). Rates of detection and treatment of mental illness are low, and care is often delayed, hampered by shortages of trained psychiatrists and other mental health providers, by inefficient use of existing human and financial resources, and by stigma leading to avoidance of mental health care (2–4).
Integrated care models, in which trained mental health specialists support primary care providers to deliver evidence-based mental health care, disease management, and client education, have emerged as one solution to these problems. Numerous systematic reviews and meta-analyses have demonstrated that integrated care improves access to mental health care, clinical outcomes, and the cost-effectiveness of care (4–8). For one model, known as collaborative care and derived from Wagner’s chronic care model, evidence from numerous randomized controlled trials is particularly robust (8–11). Some authors have suggested that integrated care models may be the most promising approach to achieving population impact by reducing the burden of mental illness globally (2,4,12,13).
However, implementation of integrated care models in real-world primary care settings is variable, may not conform to evidence-based practice, and has rarely been evaluated (14). The most rigorously studied models have been implemented only patchily, owing in part to organizational, financial, and attitudinal barriers (15–18). In addition, other models of integrated care have been adopted without being thoroughly tested (14,19,20). Together, these gaps have led to a lack of clarity regarding the key characteristics of effective integrated care.
In turn, poor or incomplete implementation of integrated care contributes to poor integration of general medical and mental health care, inappropriate variation in clinical care, delayed follow-up after treatment initiation, treatment dropout, and insufficient improvement in symptoms (15–18). High-profile efforts to scale up the collaborative care model have failed to demonstrate improved clinical outcomes, a finding consistent with known difficulties in transferring complex interventions across diverse contexts (21–25). Unfortunately, in these cases, important structures (that is, the conditions under which health care is provided) and care processes that may contribute to outcomes have been neither consistently measured nor well articulated. To meet population mental health needs, it is vital that we identify and close gaps in the implementation of integrated care in primary care.
Quality measurement can illuminate gaps in the translation of evidence into practice and identify potential targets for quality improvement (26–28). However, there is scant literature on quality frameworks and indicators by which to evaluate integrated care (20,29–33). Researchers have attempted to define dimensions of high-quality mental health care delivered in primary care settings, but consideration of integrated care practice and evidence has been limited (29). Other efforts that have focused on integrated care have been limited by the use of generic frameworks applicable to health care in general, by small numbers of measures focused on care processes for single diseases, by an exclusive focus on client experience and outcomes (forgoing measures of provider, system, or financial outcomes), and by an emphasis on the chronic care model without identifying its critical components and how they can be transferred across contexts (32,34,35). One recent study queried major U.S. databases of quality measures, seeking those that could be applicable to integrated care (36). Although such measures may more easily gain acceptance, particularly for performance measurement when funding may be at stake, they may not constitute the most important, comprehensive, or balanced set of measures for evaluating and improving integrated care implementation.
We are developing a quality framework, indicators, and a measurement strategy for integrated care in primary care settings. One definition of a quality indicator is “a population-based measure that enables users to quantify the quality of a specific aspect of care by comparing it to evidence-based criteria. Indicators require defining both those patients whose care meets the indicator criteria (the numerator) and those who are eligible for the indicator, or the population of focus (the denominator)” (37). Our framework is guided by Donabedian’s (generic) quality framework, which holds that the organization and structure of the health care delivery system shape health care processes, which in turn affect outcomes (26,28). Furthermore, the measures are informed by a seminal report from the Institute of Medicine (IoM) on quality of care, which identified six aims for health care: safety, effectiveness, patient centeredness, timeliness, efficiency, and equitability (38). The development of a coherent strategy for assessing implementation of integrated care will pave the way for quality improvement and translational research and will ultimately help eliminate the quality chasm (38).
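To make the numerator/denominator structure concrete, a hypothetical indicator such as “proportion of clients initiating treatment who received timely follow-up” might be computed as in the following minimal sketch (all field names and records are invented for illustration and are not drawn from our review):

```python
# Hypothetical client records; field names are invented for illustration.
clients = [
    {"id": 1, "initiated_treatment": True,  "timely_followup": True},
    {"id": 2, "initiated_treatment": True,  "timely_followup": False},
    {"id": 3, "initiated_treatment": False, "timely_followup": False},
    {"id": 4, "initiated_treatment": True,  "timely_followup": True},
]

# Denominator: clients eligible for the indicator (the population of focus).
denominator = [c for c in clients if c["initiated_treatment"]]

# Numerator: eligible clients whose care meets the indicator criteria.
numerator = [c for c in denominator if c["timely_followup"]]

rate = len(numerator) / len(denominator)
print(f"{rate:.0%}")  # prints 67% (2 of 3 eligible clients)
```

The essential point is that both sets must be explicitly defined before the indicator can be interpreted or compared across programs.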
As a first step toward developing a quality framework and indicators, we conducted a systematic literature review in which we sought to catalog and critically appraise existing quality measures that have been used to evaluate integrated care models implemented in primary care settings. Our specific research questions were as follows: What quality measures to evaluate integrated care delivery exist in the peer-reviewed and gray literature? How have they been implemented? What do they collectively suggest are key characteristics of integrated care program functioning?
Quality indicators that outline an evidence-based standard of care to which all integrated care programs should adhere, and that detail how attainment of the standard should be measured, are very limited. Thus we sought to examine broadly the ways in which integrated care implementation can be measured (27).
Methods
This study was conducted from May 2014 to July 2016. Institutional review board approval was not required. We initially planned a scoping review to comprehensively inventory relevant measures of integrated care implementation (39,40). We developed a study protocol to guide the review process (PROSPERO registration number CRD42016038387) (41). After data extraction and critical appraisal of the measures, and prior to data analysis, we modified our analytic strategy: in light of the extensive and heterogeneous scope of available data, we conducted a qualitative synthesis that forms the beginning of a logic model of integrated care program functioning.
Study Eligibility
Our systematic review included published and gray literature meeting all of the following criteria: described mental health care provided in a primary care setting; described an integrated mental health care model (for example, consultation-liaison or collaborative care models); and described any measures that were or could be used to assess the implementation or outcomes of integrated care. Primary care settings were defined as the first point of contact and the locus of responsibility for health care delivered to a population of clients over time (42). Integrated care models were defined with reference to the typology and parameters described by the Agency for Healthcare Research and Quality (AHRQ) (7,30). All study designs were included, along with reports that did not present original research.
Search Strategy and Screening
In collaboration with a librarian, we developed the search strategy, which another librarian independently reviewed (43). We identified literature published in English and indexed in the electronic databases MEDLINE, EMBASE, PsycINFO, CINAHL, and PubMed prior to July 3, 2014, using subject headings and key words encompassing integrated care AND primary care AND mental health care AND quality measurement. We retrieved gray literature through Google searches using the same terms and through Web sites of relevant organizations and academic conferences. Finally, we searched the reference lists of all included sources. Two research team members (AI and a staff member) independently screened abstracts and then full texts of selected articles for inclusion. [A sample search string, a list of Web-based gray literature sources, and a PRISMA diagram are included in Online Supplement 1.] Conflicts were resolved by team consensus. Interrater reliability was assessed with the kappa statistic (44).
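The kappa statistic used to assess interrater reliability corrects observed agreement for agreement expected by chance. A minimal Cohen’s kappa for two screeners’ include/exclude decisions can be sketched as follows (the decision lists are invented for illustration):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of agreement.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        counts_a[c] * counts_b[c] for c in set(rater_a) | set(rater_b)
    ) / n**2
    return (observed - expected) / (1 - expected)

# Example: abstract-screening decisions by two reviewers (invented data).
a = ["include", "exclude", "exclude", "include", "exclude", "exclude"]
b = ["include", "exclude", "include", "include", "exclude", "exclude"]
print(round(cohens_kappa(a, b), 2))  # prints 0.67
```

In practice, validated statistical packages would be used rather than a hand-rolled function; the sketch is intended only to show what the statistic measures.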
Data Collection and Critical Appraisal
We extracted all measures that evaluated the structure, process, or outcomes (28) of integrated care in primary care settings. Using a standard data collection form, we extracted study characteristics, population, setting, measures and their classification by Donabedian and IoM domain, and details of the measures used (for example, data source, measurement method, and scales). For each source, one research team member (AI or a staff member) extracted data, and a second reviewer (NS or AGR) validated the data collected. We organized the citations and data using DistillerSR software.
Given our goal of constructing a quality framework and indicators, we critically appraised each measure found in the literature against the characteristics of good indicators, using Stelfox and Straus’ scale (45,46), which is based on instruments from AHRQ and RAND and includes the following dimensions: targets important improvements; precisely defined and specified; reliable; valid; appropriate risk adjustment (for outcome measures only); reasonable costs for data collection effort; and results can be easily interpreted [see Online Supplement 1 for details]. Following Stelfox and Straus’ recommended method, we assigned each measure a score between 1 and 9 for each dimension, where 1 to 3 denotes disagreement on that dimension, 4 to 6 is neutral, and 7 to 9 denotes agreement; a low score on a specific dimension therefore denotes low quality. We indicated “unknown” where there was insufficient information. In addition, each measure was assigned an overall score between 1 and 9: a score from 1 to 3 suggested that the measure is unnecessary, 4 to 6 that it could be supplemental, and 7 to 9 that it is necessary. We conducted targeted searches of the primary literature when necessary (for example, to ascertain the reliability or validity of a measure).
Data Analysis
Qualitative systematic reviews vary along a spectrum whereby analysis may be primarily integrative (summarizing data into accepted, well-defined categories) or primarily interpretive (inductive development of concepts and theory) (47,48). Our qualitative synthesis was primarily integrative and data driven and was conducted in two stages. First, we conducted a content analysis to group the diverse measures of integrated care implementation that we found into unique measures. For example, we summarized three found measures (clients are offered choices of treatment modalities, clients receive copies of their records, and care considers health literacy) as a single unique measure of clients’ engagement in their own care, that is, active participation in care and treatment planning. This stage was led primarily by one author (AI), in regular consultation with the lead author (NS). We used descriptive statistics to summarize the types and quality of the indicators found.
Second, through a thematic analysis using the constant comparative method, we inductively grouped the unique measures into broad domains and specific dimensions of integrated care program performance (49). For example, the measure of clients’ engagement in their own care informed the development of several themes, including client centeredness and the chronic care model (subtheme: informed, activated client). The lead author (NS) conducted the thematic analysis in regular consultation with the research team. Thus the content analysis captured the frequency with which particular indicators appeared in the literature, whereas the thematic analysis explored the ability of different themes to describe integrated care programs and their functioning (48).
Results
We identified 3,761 literature sources, of which 197 met inclusion criteria. We achieved substantial agreement for abstract screening (kappa scores, 75%–98%) and for full-text screening (kappa scores, 90%–99%). Included literature was heavily weighted toward disease-specific studies, especially randomized controlled trials, published in the United States (Table 1). For sources that were literature reviews (N=24), we extracted data from the primary studies only; for one source, the data were insufficient. From the remaining 172 sources, we extracted 1,255 implementation and outcome measures, which we grouped into 148 unique measures. [A spreadsheet available in Online Supplement 2 presents the 148 unique measures and the validated scales used to measure them, when available. Full details about the 1,255 found measures and bibliographic references are available from the authors.]
The literature frequently reported on the evaluation of individual clinical outcomes, such as depression symptom severity, health status, and level of functioning; cost-effectiveness, such as the incremental cost of reducing depression symptoms or increasing quality-adjusted life years; and evidence-based care processes, such as the appropriateness and adequacy of pharmacotherapy for a specific condition. A very strong emphasis on measuring clinical effectiveness was therefore evident, along with some emphasis on efficiency (IoM domains); emphasis on process and outcome measures was roughly equal (Donabedian framework) (Tables 2 and 3).
Apart from patient-reported outcome measures of level of functioning and quality of life, which we categorized as effectiveness measures, client centeredness was represented through unidimensional scales of satisfaction with care, economic impact on clients (for example, direct and indirect costs of care, financial or housing status, and employability), and, rarely, clients’ engagement in their care or in program design or quality improvement.
We did not locate any measures of patient safety and found few measures of equitability or of accessibility or timeliness of care (for example, measures addressing vulnerable populations, stigma, and wait times). However, we identified several measures that did not fit with the IoM domains but rather reflected provider experience and the culture of health care delivery, such as health care provider buy-in and engagement in integrated care delivery, confidence in providing care within the integrated care model, and satisfaction with services. Most measures have been implemented, although some, predominantly from the gray literature, were recommended but not implemented.
With respect to critical appraisal, the found measures were highly variable in quality. Because each of the 148 unique measures could aggregate examples that were defined and measured slightly differently, leading to variable quality scores within a unique measure, we appraised the original 1,255 found measures. Furthermore, we focused on measures that were actually implemented, which were more thoroughly described. Of the 841 implemented measures, 30 (4%) were assigned critical appraisal scores of 1 to 3, 404 (48%) were assigned scores of 4 to 6, 385 (46%) were assigned scores of 7 to 9, and 22 (3%) were not assigned an overall quality score because of missing data in the original citation. Generally, the highest-quality measures were those that evaluated individual outcomes of effectiveness by using validated measurement scales (for example, of psychiatric symptoms, physical symptoms, level of functioning, and quality of life). For other measures, common limitations included imprecise specification, lack of evidence of reliability or validity, lack of risk adjustment (for outcome measures), and a high burden of measurement.
The thematic analysis of the quality measures yielded broad domains and specific dimensions of integrated care program functioning and impact that may be important to measure (Figure 1).
Discussion
In this study, we comprehensively reviewed and analyzed existing measures by which to evaluate the implementation of integrated care programs. We identified key elements of integrated care and specific examples of quality measures that can be used to inform program development and evaluation, quality improvement, and the design of future research studies. Our thematic analysis and visualization invite users to consider the comprehensiveness and complementarity of measures that they may use.
This study had several strengths. First, we considered the broad range of integrated care models that have been implemented in primary care settings in English-speaking countries globally. Second, we included indicators regardless of implementation status and whether they were applied in trials or real-world settings. Third, we employed a rigorous search strategy to exhaustively locate the aforementioned types of indicators. Thus our study provides a unique and far-reaching summary of existing and proposed quality measures of integrated care.
In contrast with Goldman and colleagues’ (36) search of the National Quality Forum and National Quality Measures Clearinghouse databases, our approach captured measures, from a wide variety of sources, that have been used or proposed specifically to measure integrated care implementation or outcomes, rather than measures that could be repurposed toward this goal. Similarly, our scope incorporated measures from client and family, provider, program, and population and system perspectives, expanding beyond those currently included in the AHRQ Atlas of Integrated Behavioral Health Care Quality Measures (34). However, the drawback of our comprehensive sweep was the inclusion of lower-quality measures (for example, measures that were ill defined or infeasible), which were identified during the critical appraisal process.
Several limitations of our study should also be considered. The concepts explored in our study (primary care, integrated care, and quality of care) are all complex; our choices in defining the boundaries and search terms for these concepts influenced the literature and measures we located. Our analysis was driven by the implementation and outcome measures found in the literature and aimed to summarize these measures into accepted categories. Although our analysis explored emerging themes and areas of overlap, our approach was not theory building and does not address causality. Other authors of systematic reviews have also identified a knowledge gap regarding the active ingredients of integrated care interventions, and different methods may be needed to delineate them (for example, realist reviews and innovative trial designs) (8). Researchers seeking to test hypotheses regarding the structures and processes necessary to achieve the intended outcomes of integrated care can refer to the visualization (Figure 1) and database [Online Supplement 2] to inform theory development and investigation.
The body of literature we reviewed yielded a multitude of measures capturing diverse perspectives on integrated care program implementation and outcomes, and at least some measures were of high quality. Clinical and administrative leaders could use measures from our database to understand how their integrated care program is functioning and to monitor the results of quality improvement and program development efforts. For a comprehensive program evaluation, we recommend using multiple measures that capture diverse perspectives (as depicted in Figure 1) and several different IoM and Donabedian domains of quality. For a focused quality improvement project, a small number of measures could suffice: one to two process measures, a balancing measure, and, importantly, an outcome measure. However, although certain types of indicators are frequently represented in the literature, frequency should not be interpreted as confirmation that they are the optimal or only targets for measurement. Indeed, some measures may be prevalent because they are easier to implement (looking under the proverbial lamppost) or because of funders’ emphasis on cost control. For example, many measures are disease specific, which may not translate well to real-world integrated care programs targeting multiple (and often comorbid) conditions. Furthermore, caution is warranted regarding the potential unintended consequences of focusing on the domains of effectiveness and efficiency at the expense of other domains. Aspects of quality that are underrepresented in the existing literature (for example, equity, accessibility and timeliness, and client centeredness of care) are vital aims of integrated care; some successes have been demonstrated, and further evaluation is merited (50–54).
Some components of integrated care functioning that are key to implementation, such as scaling up and sustainability in real-world settings, may be overlooked. Many of the indicators we found were implemented in randomized controlled trials of efficacy, yet the local contextual factors and specific unmeasured processes embedded within trials are often underrecognized and underreported (21–23,25). For example, the intricate processes of communication, collaboration, and coordination that make integrated care work, and that require role changes and different types of working relationships, may be missed. Our proposed new quality domain, the “culture of health care,” highlights and begins to address this gap. This concept is also reflected in other overarching quality frameworks, such as the quadruple aim, which extends the Institute for Healthcare Improvement’s triple aim of improving patient experience and population health and reducing health care costs by incorporating provider experience (55). Notably, such factors may not only be ignored but may in fact be subverted when performance measures are used for external accountability and funding, as other authors have proposed for integrated care (32,36,56).
These concerns speak to the importance of exploring the experiences and perspectives of clinicians and clients in the field. Qualitative studies may yield further insights into the aspects of care that matter to clients, as well as into indicators relevant to implementation success or failure that may not currently be measured. Consultation with experts and key stakeholders could also help identify areas for further indicator development and prioritize, from among the many available avenues for measurement, those postulated to be most influential on overall quality of care. As next steps, we plan to interview health care providers for their perspectives on implementing integrated care in real-world contexts and clients for their perspectives on the experience and outcome measures that matter to them. Finally, we will engage diverse experts and stakeholders in forming a consensus on a quality framework for integrated care to guide the future development of relevant, actionable, high-quality measures.
Conclusions
Meaningful and valid quality measurement has been identified as key to promoting uptake of robustly evidenced models that improve outcomes and to encouraging more cost-effective care (32). To realize the potential of quality measures to guide and effect improvements in integrated care implementation, we will need to broaden the scope of measures to incorporate key domains of quality that remain largely unaddressed (for example, equity, access, timeliness, and safety); microprocesses of care that influence the effectiveness, sustainability, and transferability of integrated care models; and client perspectives on important domains and dimensions of quality. We concur with other authors regarding the importance of measuring care processes that affect outcomes (highlighting the need to test these hypothesized relationships), improving information systems, developing health care provider capacity, and engaging clients in health care system design and improvement (33). Finally, measures will need to be developed and tested for feasibility, applicability, and ability to drive improvements in real-world settings.
Acknowledgments
The authors are grateful to Sydney Dy, M.D., M.Sc., and Elizabeth Lin, Ph.D., for feedback on drafts of the article. The authors also thank Anjana Aery for contributions to the literature searching and data collection.