Many recent public and private strategies aimed at improving the quality and efficiency of the U.S. health care system focus on measuring, reporting on, and providing incentives for improving quality. Beginning in 2014, for example, all clinicians participating in the Medicare meaningful use incentive program will be required to report on a set of clinical quality measures drawn from the U.S. National Quality Strategy (1). Similarly, the Medicare Shared Savings Program rewards accountable care organizations (ACOs) only if they meet measured quality standards (2). Although quality measurement is increasingly popular and important, the development, dissemination, reporting, and use of valid quality measures are challenging problems across all health care (3). Assessing quality is particularly difficult in behavioral health care, where despite recent efforts, quality measurement for even the more common conditions is less well developed than for comparable general medical conditions (4, 5).
In its quest for quality measures, the National Quality Forum (NQF) has identified more than 700 measures. Many of these are relevant and important in psychiatric practice (for example, measures of care coordination), but only 30 are directly linked to behavioral health care. Most of the behavioral health measures focus on the treatment process, not on outcomes. Given the increased attention to and importance of measuring the value of health care, the limited number of well-defined and widely accepted quality measures of behavioral health treatment puts the practice of psychiatry at a disadvantage in demonstrating value and moving forward with the implementation of meaningful provider performance ratings and pay for performance.
The absence of a comprehensive set of well-accepted measures capable of demonstrating the value of treatment makes it especially challenging to substantiate the value of behavioral health treatment and to demonstrate its impact on relevant populations. It also makes it more difficult to position behavioral health services appropriately, and to demand greater attention to them, in new health care delivery systems, such as ACOs. The absence of measures makes it more difficult to assess whether nonquantitative limits on treatment threaten parity of coverage between mental health and general medical services. In addition, in the absence of a set of measures, it is more difficult to make a case for extra payment for care delivered by more effective providers. In sum, funds for health care will be increasingly directed to areas where value is being measured and demonstrated. If value in mental health care is poorly measured compared with other medical areas and is measured in ways that do not allow the profession of psychiatry to demonstrate its benefits, financial resources will likely be diverted from psychiatry, despite high rates of behavioral health conditions and unmet needs for treatment. As we argue below, increased use of a particular type of measure, the measurement of functional outcomes, may help to address these problems.
This Open Forum builds a case for greater use of functional outcome measurement in mental health care. We begin by describing the types of measures that exist and laying out criteria for evaluating them. We then turn to the case for functional outcome measures and evaluate their potential benefits, while noting the risks they pose.
Types of Measures
To understand the problem of measurement in psychiatry, consider a rudimentary logic model of psychiatric practice. A logic model is an analytic strategy that proceeds from a set of inputs to a set of actions, which in turn lead to outputs, and these outputs lead to the ultimately desired outcomes (6). Logic models are built recursively; that is, the first step is to define the outcomes, the ultimate goal of treatment. For the purposes of this analysis, assume that the desired outcome of psychiatric treatment is recovery, which we operationalize as the service user engaging in a self-directed and fulfilling daily life (Figure 1). Working backward in the logic model, we can then define the set of treatment outputs that psychiatric care can produce and that contribute to achievement of this outcome. These outputs could be, for example, the absence of debilitating symptoms from a psychiatric condition. Outputs, in turn, are produced through actions. Here, the actions might include the use of appropriate evidence-based treatments, as well as effective care coordination. Finally, engaging in these actions requires a set of inputs, such as investment in a therapeutic alliance to allow the service user to trust and engage in the evidence-based treatment.
This logic model leads naturally into a cascade of measures and measurement milestones from inputs to outcomes. Corresponding to the rightmost column of the model are functional outcome measures, which assess the ability of service users to engage in a self-directed and fulfilling daily life, focusing on those behaviors or activities that are potentially impaired by an individual’s behavioral health disorder, such as quality of life, workplace productivity, and days at work. Most of these measures are currently patient reported (7), although research studies have collected a few measures from employers’ human resource information systems (8).
At the next level are clinical outcome measures, which assess the clinical outputs of treatment. Some of these measures use laboratory values, whereas others, including most behavioral health clinical outcome measures, are patient reported and commonly measure symptom reduction (for example, a change in [or level of] a score on the nine-item Patient Health Questionnaire [PHQ-9]) or observational data on behavior (for example, a lower incidence of disruptive behaviors). The measures used in conjunction with DSM-5’s focus on measurement-based care would fit at this level. Other clinical outcome measures might include the acquisition of new knowledge and use of replacement behaviors or self-management skills (for example, measurement tools for social skills training and for recovery and resilience).
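As a minimal sketch of how a symptom-based clinical outcome measure of this kind might be computed, the Python below totals the nine PHQ-9 items (each rated 0 to 3) and labels the change between two administrations. The function names are hypothetical, and the cutoffs (a follow-up total below 5 treated as remission, a reduction of at least 50% treated as response) are commonly cited conventions adopted here as assumptions, not definitions taken from this article.

```python
from typing import Sequence

def phq9_total(item_scores: Sequence[int]) -> int:
    """Sum the nine PHQ-9 items, each rated 0-3 (total range 0-27)."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 has exactly nine items")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_scores)

def classify_change(baseline: int, follow_up: int,
                    remission_cutoff: int = 5,
                    response_fraction: float = 0.5) -> str:
    """Label change between two PHQ-9 totals (cutoffs are assumptions)."""
    if follow_up < remission_cutoff:
        return "remission"
    if baseline > 0 and (baseline - follow_up) / baseline >= response_fraction:
        return "response"
    return "no response"
```

A practice could apply such a rule to each service user with two scores and report the share in each category, which is one way the "change in (or level of)" a score might be turned into a reportable measure.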
Process measures correspond to the action level of the model. These measure the extent to which a practice, milieu, or delivery-of-care system treats service users in a manner consistent with standards of high-quality care. For example, the rate at which discharged inpatients are seen as outpatients within a defined period after hospitalization is a process measure that focuses on the effectiveness of the delivery system. Process measures also capture coordination and integration of services across settings.
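The follow-up example above can be sketched in a few lines of Python. This is an illustrative calculation only: the record layout, the function name, and the 7-day default window are assumptions made for the example, not a specification of any endorsed measure.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# Each record: (discharge date, date of first outpatient visit or None).
Discharge = Tuple[date, Optional[date]]

def follow_up_rate(discharges: Iterable[Discharge], window_days: int = 7) -> float:
    """Share of discharges with an outpatient visit within window_days."""
    records = list(discharges)
    if not records:
        raise ValueError("no discharges to evaluate")
    met = sum(
        1 for discharged, visit in records
        if visit is not None and 0 <= (visit - discharged).days <= window_days
    )
    return met / len(records)

# Hypothetical data for illustration.
sample = [
    (date(2014, 3, 1), date(2014, 3, 5)),   # seen after 4 days: counts
    (date(2014, 3, 2), date(2014, 3, 20)),  # seen after 18 days: too late
    (date(2014, 3, 3), None),               # never seen as an outpatient
    (date(2014, 3, 4), date(2014, 3, 11)),  # seen on day 7: counts
]
print(follow_up_rate(sample))  # 2 of the 4 discharges fall within 7 days
```

Because both dates typically appear in administrative claims, a rate like this can be computed without collecting anything new from service users, which is part of the appeal of process measures.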
Finally, structure measures report whether a practice, milieu, or delivery-of-care system has in place the infrastructure and other resources that allow it to provide high-quality care and to deliver that care with fidelity. For example, the presence of therapists trained in cognitive-behavioral therapy in a behavioral health clinic setting is a structure measure.
Measures of each type can be used at various levels within the health system, such as at the practice or delivery system level, although measures at different levels vary in their focus and purpose. Measures also can be used for various purposes. They can be used for monitoring, either within a practice (where they may serve as a benchmark for quality improvement within the service user–provider treatment relationship, as suggested by DSM-5) or across practices. They can be used to ensure that quality exceeds a minimum standard (as is the case with accreditation standards). Measures can be used as a source of comparative information for payers and service users choosing among providers. Finally, measures can be used as a basis of payment.
The NQF, a nonprofit organization created by public- and private-sector leaders to improve the quality of health care in the United States, has examined and endorsed a broad range of behavioral health measures (Table 1). Most of the endorsed measures rely on assessing, evaluating, and screening service users for mental health issues. Another subset of measures examines the treatment of people with diagnosed conditions, assessing whether providers appropriately treat these service users according to specified process standards. The NQF has endorsed three depression measures based on the standardized PHQ-9 screening tool. Although the NQF has begun a program of assessing patient-reported outcome measures, these measures have not yet been endorsed and are not specific to behavioral health care (9).
Although structure and process measures account for the majority of measures currently in use, they provide a weak basis for demonstrating the value of psychiatry. Most of these structure and process measures focus on primary care and rarely capture care provided by psychiatrists. Even those that assess treatment evaluate only the most basic aspects of care processes, such as whether people are receiving any care and at what frequency, and do not enable providers to distinguish themselves by how well they improve the health or lives of their patients. Only three of the endorsed measures address clinical outcomes, and none address functional outcomes.
Criteria for Assessing Measures
The dearth of functional outcome measures in current use can partly be explained by the concerns and rationales used for developing measures to date. Choosing what to measure and at what level is challenging because measurement can be used for many purposes, some of which conflict. Translating a measure concept into a specific question for which data can be routinely collected poses an additional set of difficulties.
One goal of measurement flows from the principle of transparency: consumers (or payers) are entitled to know what they are receiving (or paying for) so that they can make better decisions. This line of thinking emphasizes criteria related to the inherent importance of the concept being measured. For example, patient satisfaction is obviously important, and a measure of patient satisfaction will rank high in terms of inherent importance of the target of measurement. From this perspective, structure and process measures score poorly because they have little inherent importance; clinical and functional outcome measures are preferred.
A second perspective on measurement recognizes that measurement, reporting, and paying providers on the basis of their performance on specified measures affects the behavior of providers (and their patients). Once this is recognized, the natural question arises: in light of how measurement affects behavior, how do we make choices about measurement so as to induce the behavior we seek? This is the domain of economics and related fields. The general approach in economics is to design the reporting policy in light of how service users and providers respond to the presence of information, considering both intended and unintended consequences.
Almost always, measurement is imperfect and partial. Although some actions or outcomes are measured (and perhaps rewarded), others are not. This introduces several problems. Practice settings cannot improve everything and are more likely to focus on improving elements for which performance is being measured and rewarded while paying less attention to other aspects of care. This problem of “teaching to the test” may result in a disregard of the real goals of treatment. This is particularly problematic when improving one aspect of care may diminish (or fail to affect) quality in another dimension. For example, some research suggests that practice settings that achieve high patient satisfaction with care do not necessarily produce high-quality outcomes along other dimensions (10).
A related problem is that providers may have (or believe they have) a limited ability to affect the measured outcomes for which they are held accountable. Changes in functioning, for example, are likely to be influenced substantially by other factors in the individual’s environment, such as the person’s living situation or work or school environment. Being held accountable for performance on a measure seen as largely outside of one’s control may be seen as unfair to the clinician or facility being measured, and holding providers accountable for these outcomes may be counterproductive. Note, however, that the inability of providers to greatly affect a measured outcome should not automatically eliminate a measure from consideration. In some cases, the measure may be useful to service users as they choose providers. For example, it may be very difficult for providers to improve their Spanish-language proficiency, but measures of such proficiency could still be useful to service users selecting among providers.
Even if, as is often the case, service users pay little attention to information about service quality, measuring and reporting on quality can still be useful as a means of improving quality (11). If providers can observe and compare their own performance, dissemination of these measures can have a substantial impact. For example, about 80% of the relatively large quality improvement effect generated through dissemination of information about the quality of cardiac surgeons in Pennsylvania occurred when surgeons compared their own performance to that of their peers (12).
Another unintended consequence of measurement may be to encourage providers to favor service users who will make them look good—that is, the choice of measures can affect which sets of service users may be more desirable for providers to attract. Measurement may introduce access problems for more “difficult” service users (for example, those less likely to have positive outcomes, such as individuals less adherent to treatment). Measures that focus on achieving improvements for a population—rather than on meeting a fixed bar—may be less susceptible to selection.
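To make the contrast between a fixed bar and an improvement measure concrete, the sketch below scores two hypothetical providers whose service users all improve by the same amount but start from different baselines. The 0 to 100 functioning scale, the bar of 70, and the data are invented for illustration; they do not come from any real instrument or study.

```python
from statistics import mean
from typing import List, Tuple

# (baseline score, follow-up score) on a hypothetical 0-100 functioning scale;
# higher scores mean better functioning.
Patient = Tuple[float, float]

def share_meeting_bar(patients: List[Patient], bar: float) -> float:
    """Fixed-bar measure: fraction of patients at or above the bar at follow-up."""
    return sum(1 for _, after in patients if after >= bar) / len(patients)

def mean_improvement(patients: List[Patient]) -> float:
    """Improvement measure: average change from baseline to follow-up."""
    return mean(after - before for before, after in patients)

# Two hypothetical panels in which every patient improves by 10 points.
less_impaired = [(60.0, 70.0), (65.0, 75.0), (70.0, 80.0)]
more_impaired = [(30.0, 40.0), (35.0, 45.0), (40.0, 50.0)]

for label, panel in (("less impaired", less_impaired),
                     ("more impaired", more_impaired)):
    print(label, share_meeting_bar(panel, bar=70.0), mean_improvement(panel))
```

Under the fixed bar, the provider with the less impaired panel looks far better even though both achieve identical gains, which is the incentive to select "easier" service users. An improvement measure removes that particular advantage in this toy case, although it would not by itself rule out selecting service users who are expected to improve more.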
Finally, narrowly defined structure and process (and even clinical outcome) measures can stymie innovation at the provider level. Providers who might be able to achieve better outcomes in a new and different way will have less incentive to do so (13). For example, a process measure that assesses whether providers follow a specific treatment protocol may hamper efforts to introduce an alternative treatment path that may be more effective. Measures that require providers to follow defined processes can lead to substantial improvements in quality by eliminating the worst care, bringing up the floor. At the same time, however, they can discourage innovations that might lead to better processes (if these innovations are appropriately studied and evaluated).
Beyond these issues are considerations in regard to the choice of specific measures. For example, a particular measure needs to be assessed in terms of its psychometric properties, including validity (that is, how well the measure captures the target of the measurement) and reliability (that is, whether repeated measurements produce the same ratings), and its statistical properties (for example, sensitivity and specificity). A second set of considerations concerns the cost and complexity of data collection. The cost of collection should be weighed against the extent to which measures drive better performance or better choices. Measures should be easy for clinicians to collect and use (14).
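For readers less familiar with the statistical properties mentioned above, the following sketch computes sensitivity (the share of people with a condition whom a measure flags) and specificity (the share of people without the condition whom it does not flag) from paired results. The function name and the sample data are hypothetical.

```python
from typing import Sequence, Tuple

def sensitivity_specificity(screen_positive: Sequence[bool],
                            has_condition: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for paired screening/reference results."""
    if len(screen_positive) != len(has_condition):
        raise ValueError("inputs must be the same length")
    pairs = list(zip(screen_positive, has_condition))
    true_pos = sum(1 for s, c in pairs if s and c)
    false_neg = sum(1 for s, c in pairs if not s and c)
    true_neg = sum(1 for s, c in pairs if not s and not c)
    false_pos = sum(1 for s, c in pairs if s and not c)
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one case with and one without the condition")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical results for six people: screening result vs. reference diagnosis.
screen = [True, True, False, False, True, False]
reference = [True, False, False, False, True, True]
sens, spec = sensitivity_specificity(screen, reference)
```

In this invented sample, two of the three people with the condition are flagged and two of the three without it are correctly not flagged, so both values are two-thirds.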
Trade-offs among these considerations may depend on the use of the measures. Structure and process measures are often preferred for the purpose of tying payment to performance, as in the Medicare Shared Savings Program (2). Narrowly defined structure and process measures can be designed to be well within provider control and largely free of selection incentives. Many can be assessed by using routinely collected administrative data, which will minimize intrusiveness and cost. However, narrow structure and process measures are susceptible to the problem of “teaching to the test.” There is no particular reason why strong performance on one of these narrow measures should be associated with strong performance in another arena. Structure and process measures also are less desirable from a transparency perspective.
Outcome measures, although preferred from a transparency perspective, raise much more serious problems of selection and provider control. By focusing on broad quality targets, however, they are less susceptible to the “teaching to the test” problem and can encourage process innovations. Outcome measures are often preferred in contexts where they are used to monitor performance but not as a basis of payment.
In the case of psychiatry, measures should ideally address all of these considerations. They should provide information that is meaningful to service users, and they should encourage providers to improve the quality of care. They should have good psychometric properties and avoid untoward unintended consequences. Considerable progress has been made in the development of behavioral health measures, but because it has occurred mainly in the context of payment reforms, the focus has been on process measures that are unlikely to lead to unintended consequences—but are also unlikely to demonstrate the benefits that psychiatry brings to the treatment process. The biggest gap in reaching this goal is the absence of outcome measures.
Functional Outcome Measures
As the discussion above suggests, most of the focus of measurement research in behavioral health care has been on structure and process measurement. Given the multifaceted nature of behavioral health problems, functional outcome measures have tremendous promise for assessing treatment effect in a way that more closely demonstrates the tangible value of treatment to service users and to the community. Recovery, for example, can be understood as a comprehensive functional outcome. The development and use of functional outcome measures to assess the value of health care have been a challenge to the broader health care field, yet in the past several decades, such measures have been developed and incorporated into the routine care of a range of conditions and specialties, such as chronic obstructive pulmonary disease (15), stroke (16), and knee orthopedics (17). The World Health Organization has developed a brief set of such measures, including measures of cognition, interaction with others, and life activities, that can be administered in clinical settings (18). In behavioral health care, studies have measured the effect of behavioral interventions in treating depression by using workplace productivity and absenteeism as outcomes (19, 20). Functional outcome measures have also been used for quality improvement in behavioral health care. For example, through the outcomes measurement system of Maryland’s public mental health system, practitioners routinely collect and view information from their patients on living situations, functioning, substance use, legal involvement, employment, and general health (21).
Given the heterogeneity in the nature and severity of the populations receiving treatment for behavioral health disorders, no single functional measure will likely be appropriate for all populations, and any measure will need to be adjusted for the severity of illness in the underlying population. For example, a measure that assesses absenteeism and presenteeism in the workplace may be appropriate for a population of adults with depression or anxiety, most of whom are likely to be working, but such a measure is likely to be less useful in assessing the functioning of individuals with schizophrenia, who are less likely to be employed. For the latter group, other measures of social role performance, such as days spent living outside an institution (community tenure), will be more useful.
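One simple way to adjust a functional measure for severity is to compare each provider's observed results with what would be expected given the severity mix of its service users. The sketch below does this with an observed-to-expected ratio based on benchmark values per severity stratum. The strata, the benchmark values (days in the community per month), and the provider data are all hypothetical, and this approach is offered as one illustrative option rather than a method described in the literature cited here.

```python
from typing import Dict, List, Tuple

# (severity stratum, observed outcome), e.g., days in the community per month.
PatientOutcome = Tuple[str, float]

def observed_to_expected(patients: List[PatientOutcome],
                         benchmark_by_stratum: Dict[str, float]) -> float:
    """Ratio of a provider's total observed outcome to the total expected
    if each patient had achieved the benchmark for their severity stratum."""
    if not patients:
        raise ValueError("no patients to evaluate")
    observed = sum(outcome for _, outcome in patients)
    expected = sum(benchmark_by_stratum[stratum] for stratum, _ in patients)
    if expected == 0:
        raise ValueError("expected total is zero; ratio is undefined")
    return observed / expected

# Hypothetical benchmarks and provider panels.
benchmark = {"mild": 28.0, "severe": 20.0}
provider_a = [("mild", 28.0), ("mild", 27.0), ("severe", 21.0)]
provider_b = [("severe", 22.0), ("severe", 21.0), ("severe", 23.0)]

ratio_a = observed_to_expected(provider_a, benchmark)
ratio_b = observed_to_expected(provider_b, benchmark)
```

In this invented example, provider B has the lower unadjusted average (22 versus about 25 days) but the higher adjusted ratio, because its service users are all in the more severe stratum. A ratio above 1 means outcomes were better than the benchmark for that severity mix.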
Implementing functional outcome measures can be difficult, because they usually require collecting information directly from the service user, family member, or another individual (such as a teacher) or tapping into other nontraditional sources of data (for example, human resources absenteeism reports and long-term and short-term disability rates obtained from disability insurers). In the best case, patient self-reported information can be obtained as part of the routine processes of care and can inform treatment planning, but the collection of such information does not always result in its use (22, 23). Obtaining information from individuals and nontraditional sources is more expensive and burdensome than using claims-based administrative records, particularly because collection of data from individuals who may have already discontinued treatment may be necessary to fully document functional outcomes across a provider’s population. Increasingly, however, efforts are under way to use technology to simplify the collection and use of such data (24); these efforts offer the potential of supporting the more widespread deployment of functional outcome measures in community settings that treat individuals with behavioral health disorders.
Conclusions
Expanding the scope of measurement in behavioral health care to include functional outcome measures is highly desirable but raises implementation challenges. Implementation of existing process and clinical outcome measures has proved difficult but has progressed over the years. These slow but steady gains suggest that over time, implementation challenges can be overcome. Research shows, for example, that patient-reported outcome measures can be incorporated into routine psychiatric practice and can contribute to improvement in the quality of practice (25). Psychiatrists can play an important role in moving this agenda forward, both by advocating for the development and implementation of these measures and by participating in efforts to collect them directly.
Clinicians may be frustrated at first by the lack of an obvious link between how they currently view the effect of their treatment approaches (for example, reduction of symptoms and reductions in hospitalizations) and the ultimate goal of restoring general functionality, which is measured by functional outcomes. Clinicians may need to change their practices and methods in response to this change in focus. Yet collection, monitoring, and analysis of functional outcome data, even from a single provider’s practice, and comparisons of data from different providers for quality improvement purposes are likely to improve the ability of practitioners to meet quality improvement goals. Given the challenges of implementation and the demands on practice, collection and analysis of functional data might be best implemented initially only for monitoring and quality improvement purposes, rather than for payment. Over time, as methods improve, linkage of these performance measures to payment, to public reporting of outcomes for service user choice, and to certification will become possible. Use of functional measures for these more “high-powered” purposes will require inclusion of extensive case-mix adjustment methodology to avoid giving clinicians incentives to turn away highly difficult patients.
Despite these challenges, the collection and use of functional outcome measures present new opportunities for behavioral health care. Expanding the focus of measurement from process measures to broad outcome measures creates opportunities for practice innovations that lead to quality improvement, increases incentives for coordination with other parts of the health and social service system, and complements attention to recovery. These opportunities have led other medical specialty societies to explore the use of broader functional outcome measures, and given the multiple ways that mental health problems affect the lives of service users, it is time for psychiatry to consider this broader focus as well.
Acknowledgments
The authors thank the Policy Work Group of the American Psychiatric Association Board of Trustees for support and assistance in the development of this paper.