Major depression is common, leading to marked suffering for patients and families and causing physical and mental disability, with a substantial economic burden (1). Although major depression is prevalent across different cultures and effective pharmacological and psychosocial interventions are available, low remission rates in clinical practice are discouraging (2). Poor outcomes are related to inadequate dose and duration of pharmacotherapy, poor treatment adherence, high dropout, and frequent as well as unnecessary medication changes (3). In addition, inconsistency of treatment strategies among clinicians is common. Even in current, guideline-driven practice, there are often wide variations in clinicians’ behaviors, resulting in practice bias rather than a tailored and individualized treatment algorithm (4).
The concept of measurement-based care was developed and tested in the Texas Medication Algorithm Project (TMAP) (5–7) and the German Algorithm Project, phases 1–3 (GAP1, GAP2, and GAP3) (8–11). The term “measurement-based care” was coined by Trivedi et al. (12). Recently, measurement-based care has been gaining attention in the treatment of depression because it allows psychiatrists to individualize treatment decisions for each patient based on changes in psychopathology and tolerance of antidepressants (12). The TMAP was the first controlled study to evaluate measurement-based care in the treatment of depression (5–7). Subsequently, several open or randomized controlled studies, such as the GAP1, GAP2, and GAP3 studies (8–11) and the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study (4, 12, 13), evaluated measurement-based care, finding that measurement-based care-informed sequential algorithms can be successfully integrated into clinical practice and improve patient outcomes. However, design weaknesses, including lack of randomization (4, 5) and of blind raters (14, 15), may have biased the findings.
Despite a strong theoretical rationale for measurement-based care and data supporting the ability to implement this approach in clinical practice settings, there has never been a randomized controlled trial with blind raters comparing measurement-based care with usual care in the treatment of depression. Since usual care may involve different medication choices that could influence outcomes in addition to the presence or absence of measurement-based care, an alternative strategy is to compare measurement-based care with “standard treatment” that limits medication choices in order to isolate the effect of measurement-based care. Given the frequency of major depression, and the individual as well as societal burden it imposes, evaluation of the effectiveness and cost-saving potential of measurement-based care in a randomized controlled trial is critical, in order to inform clinical care and guidelines. Moreover, measurement-based care strategies are needed that are easily implemented in clinical practice and are scalable.
The aim of this study was to determine the efficacy and safety of measurement-based care in patients with major depression. We hypothesized that time to response and time to remission would be significantly shorter in the measurement-based care group, without greater dropout rates and side effect burden, compared with the standard treatment group.
Method
Patients and Study Setting
This was a randomized controlled trial, with assessors blind to protocol and treatment group, conducted between December 2011 and November 2012 in Beijing Anding Hospital, a university-affiliated teaching hospital in China. This 800-bed hospital serves a population of approximately 19 million people and has 1,100 outpatient visits daily.
To maximize the generalizability of the findings, only patients seeking psychiatric treatment (as opposed to those enrolled by advertisements) were recruited. Patients had to be outpatients, 18–65 years of age, with a diagnosis of nonpsychotic major depression established by treating psychiatrists and confirmed by a checklist based on DSM-IV criteria at study entry (12), as well as a score ≥17 on the Chinese version of the 17-item Hamilton Depression Rating Scale (HAM-D) (16, 17); all participants had to have the ability to communicate and to provide written consent. Exclusion criteria were a lifetime history of drug or alcohol dependence; bipolar, psychotic, obsessive-compulsive, or eating disorders; history of a lack of response or intolerance to either of the two protocol antidepressants (paroxetine and mirtazapine); pregnancy or breastfeeding; suicide attempts in the current depressive episode; or any major medical condition contraindicating the use of the protocol antidepressants. Paroxetine and mirtazapine doses were converted into amitriptyline-equivalent milligrams (50 mg in amitriptyline equivalents equals 10 mg of paroxetine or 15 mg of mirtazapine) (18).
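The dose conversion stated above can be made concrete with a short sketch. The ratios (50 mg in amitriptyline equivalents per 10 mg of paroxetine or 15 mg of mirtazapine) are taken from the text; the function and constant names are hypothetical and serve only as illustration, not as a description of the study's own procedures.

```python
# Reference doses (mg) that the text equates with 50 mg of amitriptyline.
# All names here are hypothetical and used only for illustration.
REFERENCE_DOSE_MG = {"paroxetine": 10.0, "mirtazapine": 15.0}
AMITRIPTYLINE_EQ_PER_REFERENCE = 50.0


def to_amitriptyline_equivalents(drug: str, dose_mg: float) -> float:
    """Convert a daily dose (mg) into amitriptyline-equivalent mg."""
    if drug not in REFERENCE_DOSE_MG:
        raise ValueError(f"no conversion factor stated for {drug!r}")
    return dose_mg * AMITRIPTYLINE_EQ_PER_REFERENCE / REFERENCE_DOSE_MG[drug]


if __name__ == "__main__":
    for drug, dose in [("paroxetine", 20), ("mirtazapine", 45)]:
        print(drug, dose, to_amitriptyline_equivalents(drug, dose))
```

Under this conversion, 20 mg of paroxetine corresponds to 100 mg, and 45 mg of mirtazapine to 150 mg, in amitriptyline equivalents.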
The study protocol was approved by the Human Research and Ethics Committee of Beijing Anding Hospital in accordance with the Declaration of Helsinki and local clinical traditions. All patients provided written informed consent.
Interventions and Measurement-Based Care
Patients who met inclusion criteria and provided written informed consent entered a 1-week washout phase for previously taken psychotropic medications for major depression (the washout phase was stipulated as 1 week by the Human Research and Ethics Committee of Beijing Anding Hospital). After washout, patients were randomly assigned to standard treatment or to measurement-based care according to a table of random numbers, and then followed for 24 weeks.
Patients in both groups received either open-label paroxetine (20–60 mg/day) or open-label mirtazapine (15–45 mg/day), within the therapeutic dosage range recommended by the Chinese Medical Association’s guideline for the prevention and treatment of major depression (19). The study’s therapeutic dosage range for paroxetine was recommended by the Chinese Society of Psychiatry’s Depression Panel. Paroxetine, a selective serotonin reuptake inhibitor, was chosen because it has been one of the most commonly prescribed antidepressants in China during the past decade, and mirtazapine, an alpha-2 antagonist, was chosen because it has a different mechanism of action (20). The treating psychiatrists could decide which of the antidepressants and dosages to prescribe, as long as they were within the study’s recommended dosage ranges. During the study, one medication change between paroxetine and mirtazapine was allowed for intolerability or inefficacy. The only other psychotropic medications allowed in the study were short-acting benzodiazepines, sparingly, for agitation, anxiety, and insomnia. Other medications not affecting the CNS were permitted.
Patients in the standard treatment group were treated by their psychiatrists according to their clinical needs as judged at each outpatient visit.
Following the STAR*D project (www.star-d.org), patients in the measurement-based care group received treatment according to a schedule that included individualized starting dosages, dosage adjustment, and medication changes to minimize side effects, maximize safety, and optimize the therapeutic benefit for each patient. The treating psychiatrists made treatment decisions on the basis of ratings on self-report scales obtained at each treatment visit: the 16-item Quick Inventory of Depressive Symptomatology–Self-Report (QIDS-SR) (21, 22) and the Frequency, Intensity, and Burden of Side Effects Rating scale (23). Paroxetine was started at 20 mg/day and increased to 30 mg/day by week 4, to 40 mg/day by week 6, to 50 mg/day by week 8, and to 60 mg/day (final dosage) by week 10. Mirtazapine was started at 15 mg/day and increased to 30 mg/day by week 1 and to 45 mg/day (final dosage) by week 4. Dosage adjustments were dependent on how long the patient had received a particular dosage, symptom changes, and side effects. The measurement-based care schedule used in this study is presented in Table 1.
The treatments were delivered in the outpatient department of Beijing Anding Hospital. All treating psychiatrists were regular clinicians. The two treatment groups were cared for by separate treatment teams. Before the study, the clinicians responsible for the measurement-based care group underwent a 2-day training program on using measurement-based care according to the study schedule.
Following the STAR*D study (24), an independent clinical research coordinator monitored psychiatrists’ compliance with the measurement-based care treatment guidelines. A physician feedback form, designed for the study, was used to ensure that the treatment was delivered according to the measurement-based care guidelines. After each clinical visit, the clinical research coordinator completed the physician feedback form on the basis of the clinical visit’s documentation. If the treatment decision deviated from the guidelines, the psychiatrist was alerted shortly after the appointment to make the treatment consistent with the guidelines as soon as possible. With the assistance of the physician feedback form, the rate of nonadherence to the guidelines was <5% throughout the study period, and all violations were corrected promptly.
Outcome Measures
Basic sociodemographic and clinical characteristics were collected through a review of medical records, using a form designed for this study, and then confirmed in a clinical interview. The two primary outcome measures were the estimated time from randomization to response and to remission according to the HAM-D score. Response was defined as a decrease ≥50% from the baseline HAM-D score, and remission as a HAM-D score ≤7 (12). The pill count method was used to measure treatment adherence.
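The response and remission definitions above translate directly into a small classification rule. The following is a minimal sketch; the thresholds (a decrease of at least 50% from baseline for response, a score of 7 or lower for remission) come from the text, while the function name and return format are assumptions made for illustration.

```python
def classify_hamd_outcome(baseline: float, current: float) -> dict:
    """Apply the HAM-D response and remission criteria to one assessment.

    Response: decrease of at least 50% from the baseline score.
    Remission: current score of 7 or lower.
    """
    if baseline <= 0:
        raise ValueError("baseline HAM-D score must be positive")
    reduction = (baseline - current) / baseline
    return {"response": reduction >= 0.5, "remission": current <= 7}


if __name__ == "__main__":
    # Hypothetical patient with a baseline score of 22.
    for score in (12, 11, 7):
        print(score, classify_hamd_outcome(22, score))
```

Because study entry required a baseline score of at least 17, a patient can meet the response criterion without meeting the remission criterion (for example, a drop from 22 to 11).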
Secondary outcome measures included the severity of depressive symptoms according to the HAM-D and the severity of manic or hypomanic symptoms according to the Young Mania Rating Scale (YMRS) (25). An additional checklist with six common side effects (dry mouth, diarrhea or constipation, dizziness or drowsiness, loss of appetite or nausea, headache, and excessive sweating) was used to measure side effects at each treatment visit.
As tools to implement the measurement-based care, the QIDS-SR and the Frequency, Intensity, and Burden of Side Effects Rating scale were used to measure the severity of depressive symptoms within the past week and antidepressant side effects, respectively, in the measurement-based care group only. On the QIDS-SR, higher scores indicate more severe depressive symptoms (21, 22). The Frequency, Intensity, and Burden of Side Effects Rating scale (23) is a self-report instrument assessing three domains of medication side effects within the past week: frequency, intensity, and burden (the degree to which side effects over the past week interfered with day-to-day functions). Each domain is rated on a 7-point (0–6) scale (frequency, ranging from “no side effects” to “present all of the time”; intensity, ranging from “no side effects” to “intolerable”; and burden, ranging from “no impairment” to “unable to function”). A low score (0–2) indicates that current treatment may continue; an intermediate score (3 or 4) suggests that side effects require attention; a high score (5 or 6) means that the current treatment is unacceptable and a decrease in dosage or a medication switch is needed (13).
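The three score bands described above for the side effects rating scale can be sketched as a simple lookup. The cutoffs are those given in the text; applying them to a single domain rating is an assumption of this sketch (the text does not state which domain drives the decision), and the function name and wording of the returned guidance are hypothetical.

```python
def side_effect_guidance(score: int) -> str:
    """Map one 0-6 domain rating to the action band described in the text.

    Assumption: the bands are applied to a single domain rating.
    """
    if not 0 <= score <= 6:
        raise ValueError("rating must be between 0 and 6")
    if score <= 2:
        return "continue current treatment"
    if score <= 4:
        return "side effects require attention"
    return "decrease dosage or switch medication"


if __name__ == "__main__":
    for rating in range(7):
        print(rating, side_effect_guidance(rating))
```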
Assessment Methods
Two raters with >8 years of experience in clinical practice and research independently assessed patients with the above-described instruments at baseline and at 2, 4, 8, 12, and 24 weeks. The raters were blind to the study protocol and treatment assignment and were not involved in treatment. Before the study, the two raters were trained in the use of the instruments. In the prestudy reliability exercise, interrater reliability (intraclass correlation coefficients for continuous ratings and kappa values for categorical measures) was above 0.8. All patients were instructed by the research coordinators not to disclose their group membership to the raters at any time during the study. Patients were removed from the study if they had a suicide attempt, became pregnant, developed a severe medical condition, or suffered from newly emerging side effects that they found intolerable and that could not be managed. Patients who were removed from the study received antidepressant treatment as appropriate as part of clinical care.
Statistical Analysis
Data were analyzed using SPSS for Windows, version 20.0 (IBM Corp., Armonk, N.Y.). Full intent-to-treat analyses were performed on all-cause and specific-cause discontinuation; all other analyses were conducted in the modified intent-to-treat sample, that is, patients who underwent a baseline assessment and at least one follow-up assessment. Baseline sociodemographic and clinical characteristics, discontinuation, response and remission rates, and side effects were compared between the two groups using independent-sample t tests, Mann-Whitney U tests, and chi-square tests, as appropriate.
Kaplan-Meier survival analyses were used to calculate the estimated time from randomization to response and remission. The Cox proportional hazards regression model was used to compare the estimated time to response and remission between the two groups while controlling for covariates, such as marital status, age, and concomitant medications. The analyses included patients who met the criteria for response or remission and those who were lost to follow-up without a documented response or remission, as well as those who did not meet response or remission criteria at their last assessment. Additionally, Kaplan-Meier survival analysis and Cox proportional hazards regression analysis were performed to compare the estimated time to all-cause discontinuation between the two treatment groups. Differences in the changes in HAM-D and YMRS scores between the two groups from baseline to endpoint were subjected to analysis of covariance with baseline scores, marital status, and age as covariates. Continuous outcomes were analyzed as last-observation-carried-forward data. The significance threshold was set at 0.05 (two-tailed).
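To illustrate how a Kaplan-Meier estimate handles patients who respond and patients who are censored (lost to follow-up, or not yet responding at their last assessment), the following pure-Python sketch implements the standard product-limit estimator. It is not the study's analysis, which was run in SPSS; the function name and the toy data are invented for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the probability of NOT yet having the event.

    times  -- follow-up time per patient (e.g., weeks since randomization)
    events -- 1 if the event (e.g., response) was observed at that time,
              0 if the patient was censored at that time
    Returns a list of (time, estimate) pairs at each observed event time.
    """
    at_risk = len(times)
    leaving = Counter(times)  # everyone leaving the risk set at each time
    observed = Counter(t for t, e in zip(times, events) if e)
    estimate = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            estimate *= 1 - d / at_risk
            curve.append((t, estimate))
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # Invented toy data: six patients, three observed responses.
    weeks = [2, 4, 4, 8, 12, 24]
    responded = [1, 1, 0, 1, 0, 0]
    for t, s in kaplan_meier(weeks, responded):
        print(f"week {t}: {s:.3f} not yet responded")
```

Censored patients contribute to the risk set up to their last assessment and are then removed without counting as events, which is why they can be included in the analysis without a documented response or remission.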
Results
Of 164 screened patients, 120 (73.2%) met study criteria and were randomly assigned to standard treatment (N=59) or to measurement-based care (N=61) (see Figure S1 in the data supplement that accompanies the online edition of this article).
Sociodemographic and Clinical Characteristics
All participating patients had medical insurance. There were no significant differences between the two groups in demographic or clinical characteristics, except that patients in the measurement-based care group were younger on average and less likely to be married (Table 2).
Study Discontinuation
All-cause discontinuation did not differ significantly between the measurement-based care and standard treatment groups (27.9% and 37.3%, respectively; for details, see Figure S1 in the data supplement). Likewise, time to all-cause discontinuation was similar between groups (measurement-based care, 14.6 days; standard treatment, 15.0 days; hazard ratio=0.67, 95% CI=0.35–1.29).
Dosage, Medication Adherence, and Treatment Adjustment
Antidepressant dosages in amitriptyline equivalents and clinical visits are summarized in Table 3. Treatment adherence did not differ between groups (99.8% and 99.7%). The mean number of clinical visits was 8.0 (95% CI=7.5–8.4) for the standard treatment group and 8.4 (95% CI=8.0–8.9) for the measurement-based care group over the whole study period. There were no significant differences between groups at 0–2 weeks and 3–4 weeks; however, patients in measurement-based care had more clinical visits at 5–8 weeks and 9–12 weeks (both p values <0.001), but fewer visits at 13–24 weeks (p<0.001) than those in the standard treatment group. The total number of treatment adjustments was 23 for the standard treatment group and 44 for the measurement-based care group (χ2=13.4, df=1, p<0.001). In the standard treatment group, there were 22 dosage adjustments and one medication switch (from mirtazapine to paroxetine; the mean dosage was 75 mg in amitriptyline equivalents at the time of the switch). In the measurement-based care group, there were 40 dosage adjustments and four medication switches (two from mirtazapine to paroxetine and two from paroxetine to mirtazapine; the mean dosage was 137.5 mg in amitriptyline equivalents at the time of the switches). The mean antidepressant exposure time was 20.5 weeks (SD=5.3) in the standard treatment group and 20.6 weeks (SD=5.8) in the measurement-based care group. However, antidepressant dosages were significantly higher in the measurement-based care group than in the standard treatment group from week 2 (118 mg/day compared with 106.7 mg/day; p=0.02) to week 24 (122.1 mg/day compared with 106.7 mg/day; p=0.006).
Response and Remission
The response rate was 62.7% in the standard treatment group and 86.9% in the measurement-based care group (χ2=9.3, df=1, p=0.002), for an overall response rate of 75.0%. The remission rate was 28.8% in the standard treatment group and 73.8% in the measurement-based care group (χ2=24.2, df=1, p<0.001), for an overall remission rate of 51.7%.
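The group comparisons above can be approximately reproduced with a Pearson chi-square test on a 2×2 table. In the sketch below, the patient counts are not stated in the text; they are back-calculated from the reported percentages and group sizes (an assumption), and the test is computed without continuity correction (also an assumption about the original analysis).

```python
def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (df=1, no continuity correction) for the table
    [[a, b], [c, d]], rows = groups, columns = outcome yes / no."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    # Counts inferred from percentages: 62.7% of 59 and 86.9% of 61
    # responders; 28.8% of 59 and 73.8% of 61 remitters.
    response = pearson_chi2_2x2(37, 59 - 37, 53, 61 - 53)
    remission = pearson_chi2_2x2(17, 59 - 17, 45, 61 - 45)
    print(f"response:  chi2 = {response:.2f}")
    print(f"remission: chi2 = {remission:.2f}")
```

With these inferred counts, the statistics come out at roughly 9.3 for response and 24.3 for remission, close to the reported values.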
The average time to response was 8.1 weeks (95% CI=6.5–9.6) in the standard treatment group and 4.5 weeks (95% CI=3.3–5.8) in the measurement-based care group (t=3.4, df=118, p=0.001). The corresponding figures for remission were 14.8 weeks (95% CI=12.8–16.7) and 8.4 weeks (95% CI=6.7–10.2) for the two groups (t=4.8, df=118, p<0.001). For responding patients, the average time to response was 5.1 weeks (95% CI=4.1–6.0) for the standard treatment group and 3.1 weeks (95% CI=2.7–3.6) for the measurement-based care group (p<0.001). For remitting patients, the time to remission was 8.4 weeks (95% CI=5.7–11.1) for the standard treatment group and 6.0 weeks (95% CI=4.8–7.2) for the measurement-based care group, which fell short of statistical significance (p=0.054).
The estimated time intervals for response and remission in the two groups in the Kaplan-Meier analysis are illustrated in Figure 1 and Table 4. The difference between the survival curves was significant for both response (log rank=18.6, p<0.001) and remission (log rank=29.1, p<0.001).
In the Cox regression model, controlling for the potentially confounding effects of marital status, age, and concomitant medications, the estimated times to response (hazard ratio=2.2, 95% CI=1.4–3.5; p<0.001) and remission (hazard ratio=4.2, 95% CI=2.3–7.6; p<0.001) were significantly longer with standard treatment than with measurement-based care.
Symptom Ratings
There were no significant differences between the two groups in baseline HAM-D and YMRS scores (Tables 2 and 4). By the end of the study, the HAM-D score decreased significantly in both groups, but the change was significantly larger in the measurement-based care group (p<0.001). The overall low YMRS score, however, did not change significantly from baseline to endpoint in either group.
Adverse Events
Neither the proportion of patients with any type of adverse event nor the total number of adverse events differed significantly between the two groups (Table 5).
Concomitant Psychotropic Medications
Short-acting benzodiazepines were prescribed for 32.2% (19/59) of patients in the standard treatment group and 47.5% (29/61) in the measurement-based care group, a difference that fell short of statistical significance (χ2=2.9, df=1, p=0.09). Of these patients, all 19 in the standard treatment group took lorazepam, and in the measurement-based care group, 27 patients took lorazepam and two took oxazepam.
Discussion
Evidence is increasing that measurement-based care allows psychiatrists to individualize treatment decisions for major depression based on changes in psychopathology and side effects, which decreases inappropriate variance and improves the implementation of appropriate treatment strategies, thereby enhancing outcomes, reducing treatment resistance, and increasing the quality of care (10). To the best of our knowledge, this was the first randomized controlled trial with blind raters to systematically investigate the effect of measurement-based care, compared with standard treatment, on time to response and remission in patients with major depression, using identical medication options in the two groups in order to isolate the effect of measurement-based care. Similar to findings of the TMAP (5) and subsequent open or randomized studies (8–11), we found that measurement-based care-informed sequential algorithms can be successfully integrated into clinical practice and improve patient outcomes. Our study demonstrated significantly higher rates of response (86.9% compared with 62.7%) and remission (73.8% compared with 28.8%) at 6 months in the measurement-based care group than in the standard treatment group, translating into numbers needed to treat of 5 and 3 for response and remission, respectively. In fact, in the measurement-based care group, 85% of the responders achieved remission, whereas only 46% did so in the standard treatment group. Moreover, for patients receiving measurement-based care, both the estimated time to response and the estimated time to remission were significantly shorter than for those in the standard treatment group (5.6 weeks compared with 11.6 weeks, and 10.2 weeks compared with 19.2 weeks, respectively). These findings were both statistically significant and clinically meaningful, reducing time of suffering and expense before reaching response by 6 weeks and before reaching remission by 9 weeks. These positive results were obtained within the context of significantly more treatment adjustments in the measurement-based care group, guided by rating scale-based assessments, and without a higher frequency of dropouts, concomitant medications, or adverse effects.
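The numbers needed to treat reported above follow from the response and remission rates in the two groups. A minimal sketch of the arithmetic is given below; rounding the reciprocal of the absolute rate difference up to the next whole patient is a common convention and is assumed here, and the function name is hypothetical.

```python
import math


def number_needed_to_treat(rate_intervention: float, rate_control: float) -> int:
    """NNT = 1 / absolute difference in rates, rounded up (assumed convention)."""
    difference = rate_intervention - rate_control
    if difference <= 0:
        raise ValueError("intervention rate must exceed control rate")
    return math.ceil(1 / difference)


if __name__ == "__main__":
    # Rates reported in the text: measurement-based care vs. standard treatment.
    print("response: ", number_needed_to_treat(0.869, 0.627))
    print("remission:", number_needed_to_treat(0.738, 0.288))
```

The absolute differences are 24.2 percentage points for response and 45.0 for remission, giving reciprocals of about 4.1 and 2.2, which round up to 5 and 3.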
Response was measured because it is a common outcome measure in drug trials (26). However, response that falls short of remission is suboptimal, since it is associated with residual symptoms, high frequency of relapse, impaired social functioning, and high risk for suicide (12, 27). In contrast, the higher rates of and shorter time to remission in the measurement-based care group translate into a lower symptom burden, lower rates of expected relapse and suicide, and normal psychosocial function (28).
The antidepressant dosages in the measurement-based care group were higher than those in the standard treatment group from week 2 to week 24. The higher dosages in the early phase of treatment may have led to faster symptom reduction. Furthermore, patients in the measurement-based care group had more treatment adjustments than those in the standard treatment group (44 compared with 23), which were predominantly dosage adjustments. This rating scale-based and individualized treatment approach that set the clear target of remission was likely responsible for the superior efficacy outcomes in the measurement-based care group without compromising treatment continuation and tolerability, despite higher antidepressant dosages. Our finding of lower dosages in the standard treatment group supports the notion that suboptimal antidepressant dosages and lack of appropriate medication changes in the context of inadequate outcome contributed to the low remission rates (3, 4). Moreover, our results, although they still require replication, also suggest that the period between 1 and 3 months may be the most critical for fine-tuning the antidepressant treatment approach, which may yield faster and better outcomes and reduce the need for frequent visits beyond 3 months. Taken together, these findings support the notion that measurement-based care can maximize therapeutic effects and minimize, or at least not increase, side effects (13).
The remission rate in the measurement-based care group (73.8%) was higher than rates reported in most but not all earlier studies (29). The favorable remission rate in this study compared with earlier efficacy studies (e.g., 22% [30]) may be due to advantages related to measurement-based care. The remission rate in our measurement-based care group was also higher than the rates of 28%–33% in the STAR*D project (12) and 54% in the GAP2 study (14), which also used measurement-based care. Possible reasons for differences between the present study and the STAR*D study (12) include a longer treatment duration in our study (24 weeks compared with 14 weeks) and a higher proportion of married patients (70.5% in the measurement-based care group and 86.4% in the standard treatment group, compared with 41.7% in the STAR*D study), who are known to have higher remission rates (12). Additional possible reasons for lower remission rates in STAR*D than in our study include different sampling methods, the potential failure to distinguish unipolar and bipolar depression, and the sequential phases design, in which the next phases of the study had to be filled with patients who did not meet remission criteria (31). Among possible reasons for differences in remission rates between our study and the GAP2 study (14) are that the latter included patients with psychotic major depression (17.7%–18.3%) and inpatients, who are generally more severely ill, whereas our trial excluded these patient groups. Moreover, because of a lack of community-based psychiatric services in most areas of China (32), outpatients are usually not severely ill. For example, the mean baseline HAM-D score was 22.4 in our study. No remission data are available for the TMAP study (5). In our study, the actual (not estimated) mean time to response (4.5 weeks) and remission (8.4 weeks) for the measurement-based care group was similar to that in the Keller et al. study (5.7 weeks and 6.7 weeks, respectively) (30) and the GAP2 study (mean time to remission, 7.0 weeks) (14). No time-to-event data are available for the STAR*D and TMAP studies.
As mentioned above, measurement-based care consists of individualized treatment, which incorporates self-reported measures that can improve patients’ ability to monitor their own symptoms and side effects and help them understand the nature of their depression and the complexity of its treatment. All these factors are beneficial in improving the acceptability of the illness management (33) and may help in the making of shared decisions with a clear goal of remission. Therefore, we had hypothesized that treatment adherence in the measurement-based care group would be better than in the standard treatment group. However, treatment adherence rates were very high and nearly identical in both groups (99.8% and 99.7%). In STAR*D, self-reported medication adherence monitoring via a web-based system was employed, but adherence rates were not reported (12). Because of a lack of community-based rehabilitation or day-care facilities in most areas of China (32), psychiatric patients frequently live with their families. As a result, treatment monitoring and support from their families may improve medication adherence, which could account for the high adherence rate even in the standard treatment group.
The results of this study should be interpreted within the context of its limitations. First, the open study design may have influenced the outcomes to an unknown degree. However, we sought to conduct a study that would mirror usual practice patterns, except for the measurement-based care component, in order to ensure generalizability. Moreover, we utilized assessors blind to both group assignment and study protocol to provide unbiased ratings. Second, we restricted the study antidepressants to paroxetine and mirtazapine, and our results may not be applicable to other antidepressant drugs. However, efficacy differences among antidepressants are likely small (34), and by restricting the medications to the two commonly used antidepressants in both groups, we were able to isolate our testing to the effectiveness of measurement-based care compared with standard treatment, independent of potential differences in medication choice. Third, our clinical research coordinators’ routine check on psychiatrists’ adherence to the measurement-based care guideline, as was done in STAR*D, may have increased the efficacy of the measurement-based care. Fourth, only outpatients at one site in mainland China were involved; therefore, the findings need to be replicated in other treatment settings. In addition, information on family psychiatric history from case notes may be inaccurate because of potential recall bias. Because of the limited number of psychiatrists, the treatment rate for psychiatric disorders is very low in China (32, 35). In many cases, patients or their families can only recall a positive family history of psychiatric disorders, but not the actual diagnoses. Fifth, neither specific reasons for treatment discontinuation nor sexual side effects were assessed; however, rates of discontinuation numerically favored the measurement-based care group, and none of the side effects measured differed between the two groups. Sixth, in the measurement-based care group, treatment visits coincided with the research assessment visits at weeks 2, 4, 6, 8, 10, 12, and 24. Additional visits according to clinical need were allowed, especially in the latter 3 months of the study. Conversely, the clinical visits were uncontrolled in the standard treatment group. However, the number of clinical visits did not differ significantly for the two groups over the study period, arguing against a relevant effect on the outcome. Seventh, pill counts were conducted in both groups. Although this could have increased adherence in the standard treatment group beyond usual care standards (there were no pill count differences between the two groups), this is a conservative bias that would have worked against a separation of measurement-based care from standard treatment. Eighth, similar to the STAR*D project, no structured diagnostic schedule for major depression was used, although the clinical diagnosis of major depression was confirmed by a checklist based on DSM-IV criteria at study entry, and major depression symptoms had to be moderate to severe. Ninth, the study lasted only 24 weeks; however, this is a relatively long duration for a randomized trial, and achievement of remission has been associated with positive long-term outcomes (28). Finally, given the different pharmacological actions of mirtazapine, paroxetine, and amitriptyline, the conversion of antidepressant doses into amitriptyline-equivalent milligrams is only approximate. However, using the same conversion standard when comparing measurement-based care and standard treatment in the multivariate analysis should have mitigated this limitation.