Depression is the most common illness treated in psychiatric practice, but rates of response and remission outside clinical trials have not been systematically evaluated. Effectiveness studies (where interventions are compared to a limited or alternative treatment approach, such as usual care) of psychiatric treatment of depression are rare, and while results are generalizable, they may not truly reflect outcomes seen in routine care because they are often protocol driven.
The Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study excluded patients of psychiatrists being treated for depression who had failed an adequate antidepressant trial in their current episode of depression (1). Simon and colleagues' (2) prospective study of routine psychiatric treatment of depression included only patients who were initiating antidepressant therapy. The study noted that “we should emphasize that we selected psychiatrists' patients initiating antidepressant treatment, excluding those referred after unsuccessful primary care treatment. A cross-sectional sample of psychiatrists' patients (rather than a cohort of treatment initiators) would reflect the accumulation of more severely ill patients via referral.”
The large majority of patients being treated for depression in psychiatric practice have failed at least one antidepressant trial in their current depressive episode (2,3). Therefore, it may not be appropriate for psychiatrists to view these effectiveness studies as a benchmark.
The availability of quick, valid, and reliable patient-rated depression severity instruments, such as the Patient Health Questionnaire-9 (PHQ-9) and the Quick Inventory of Depressive Symptomatology (QIDS), has, for the first time, made systematic monitoring of depression severity more feasible in routine psychiatric practice. The PHQ-9 and the QIDS are as sensitive in detecting depression severity as clinician-rated scales such as the Hamilton Depression Rating Scale (HAM-D) and the Montgomery Asberg Depression Rating Scale (MADRS) (4–8).
The goals of this analysis were to longitudinally determine rates of response and remission at 12 and 24 weeks, using the PHQ-9 to measure depression severity, among a cross-section of patients with moderate to severe major depressive disorder, dysthymia, or depression not otherwise specified and to identify factors associated with achieving response and remission. We hypothesized that response and remission rates would be midway between those reported in efficacy trials and in the few naturalistic community studies.
Methods
The National Depression Management Leadership Initiative (NDMLI) is a collaborative effort of the American Academy of Family Physicians, the American College of Physicians, and the American Psychiatric Institute for Research and Education (APIRE). This report focuses on data obtained in psychiatric settings.
Participants
At the outset of the study 19 psychiatric practices were recruited nationally to participate in NDMLI, including six group multispecialty practices, six group mental health specialty practices, four departmental practices that were part of a larger system of care, one outpatient public clinic, and two solo private practitioners with minimal office assistance, representing a variety of organizational structures. Two practices dropped out because they were unable to fulfill data collection requirements of the project.
Of the 17 psychiatrists who participated in the study, 13 (76%) were men. Participants had a mean±SD age of 53±11 years (range 36–73 years) and had been in practice for an average of 27±12 years (range 7–48 years). The study participants were more likely to be male, to be slightly younger, and to have had fewer years of practice compared with U.S. psychiatrists overall, of whom 65% are male with an average age of 57±12 years and an average time in practice of 29±13 years, according to data from the American Medical Association Physician Masterfile.
The NDMLI intervention
Psychiatrists participating in NDMLI attended a series of three learning sessions spread over a one-year period. The sessions, modeled after the Institute for Healthcare Improvement Breakthrough Series (13), reviewed principles of chronic illness care. The primary focus was to routinize use of the PHQ-9 to facilitate monitoring of depression severity at each visit. The implementation of a registry for tracking and providing proactive follow-up of patients with depression and the systematic planning and documentation of self-management were also discussed (14,15). A more detailed description of the intervention can be found elsewhere (9).
Data collection
The longitudinal observational psychiatrist-reported patient-level data were collected between March 2005 and April 2006. Project psychiatrists were asked to include all patients at their selected outpatient practice sites who were 18 years or older and who had received a primary or secondary diagnosis of depressive disorders (excluding bipolar disorder, schizophrenia, and other psychotic disorders). Consistent with a population-based approach to treatment, new as well as existing patients with either single-episode or recurrent depression treated by project psychiatrists during the 12-month course of the project were asked to complete the PHQ-9 at each visit, irrespective of the severity or chronicity of their depressive symptoms or adherence to treatment. Thus, PHQ-9 data on a cross-section of patients seen in typical psychiatric settings were captured.
The patient's initial PHQ-9 score reflected the first administration of the PHQ-9 by the project psychiatrist at any point during the patient's continuum of care. Hence, the length of time patients were tracked and treated varied during the project. It is also important to note that no treatment algorithms or fixed visit schedules were required, nor was there any specific requirement introduced to utilize guided measurement-based care, as recommended by STAR*D. Following each encounter with a patient, the project psychiatrist completed the depression monitoring flow sheet, which recorded the patient's gender, age, primary and secondary diagnoses, the date the patient's self-management goals were formally documented, and the patient's overall PHQ-9 score for that encounter. In addition, the psychiatrist noted the utility of the PHQ-9 in clinical decision making and specified type of treatment changes prescribed during the visit on the basis of information obtained from the PHQ-9. The deidentified patient flow sheets were forwarded to APIRE by project practices on a monthly basis.
Analytic methods
All data analyses as well as basic frequencies reported here were performed using SAS. The “last observation carried forward” (LOCF) approach was used to explore rates of response and remission for intent-to-treat analyses. For example, patients with any follow-up PHQ-9 that was completed within 12 weeks of the initial study visit were included in 12-week follow-up analyses. Subsequently, the closest follow-up visit to the 12-week time point and its corresponding PHQ-9 score were used in the LOCF analyses (timing of the visits could range from less than one week to 12 weeks after the initial visit). Similarly, for the 24-week analysis, the timing of the LOCF visit ranged from less than one week to 24 weeks. The post hoc logistic regression analyses were performed using SUDAAN. The model included potential confounders and variables considered predictors of response and remission, which are described below.
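The LOCF selection rule described above can be illustrated with a minimal Python sketch. It is not the study's SAS code: the `Visit` tuple layout, the function name `locf_visit`, and the example visits are hypothetical, and the sketch assumes that "closest to the time point" means the latest follow-up that falls within the window.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical layout: (weeks since the initial PHQ-9, PHQ-9 total score)
Visit = Tuple[float, int]


def locf_visit(followups: Sequence[Visit], window_weeks: float) -> Optional[Visit]:
    """Return the follow-up visit carried forward to a given time point.

    Only follow-ups after the initial visit and no later than
    `window_weeks` are eligible; the latest eligible visit is the one
    closest to the time point. Returns None when no follow-up falls
    inside the window (the patient is excluded from that analysis).
    """
    eligible = [v for v in followups if 0 < v[0] <= window_weeks]
    if not eligible:
        return None
    return max(eligible, key=lambda v: v[0])


# Illustrative (made-up) follow-up history for one patient
visits = [(0.5, 18), (6.0, 14), (11.0, 9), (20.0, 4)]
print(locf_visit(visits, 12))  # (11.0, 9)
print(locf_visit(visits, 24))  # (20.0, 4)
```

Under this rule a patient whose only follow-up occurred in week 1 still contributes that score to the 12-week analysis, which is why the timing of carried-forward visits spans the whole window.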
Measures
Outcome variable.
Response was defined in two ways: a PHQ-9 score <10 or 50% improvement in the PHQ-9 score. Remission was defined as a PHQ-9 score <5. Clinically meaningful thresholds for levels of depression severity for the PHQ-9 include 0–4, none; 5–9, mild; 10–14, moderate; 15–19, moderately severe; and 20–27, severe (4). Unless otherwise specified, only patients with an initial PHQ-9 score ≥10 (clinically significant depression) were included. To reflect the routine clinical practice setting, patients received a clinical confirmation of a depressive disorder (major depressive disorder, dysthymia, or depression not otherwise specified) by the study psychiatrists.
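The outcome definitions above translate directly into a few comparisons. The following Python sketch is illustrative only; the function names are hypothetical, and it assumes that "50% improvement" means the follow-up score is at most half of the initial score.

```python
def phq9_severity(score: int) -> str:
    """Map a PHQ-9 total score (0-27) to the severity bands listed above."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 total score must be between 0 and 27")
    if score <= 4:
        return "none"
    if score <= 9:
        return "mild"
    if score <= 14:
        return "moderate"
    if score <= 19:
        return "moderately severe"
    return "severe"


def response_by_threshold(followup: int) -> bool:
    """Response, first definition: follow-up PHQ-9 score below 10."""
    return followup < 10


def response_by_improvement(initial: int, followup: int) -> bool:
    """Response, second definition: at least a 50% drop from the initial score.

    Assumption: the improvement is measured relative to the initial score.
    """
    return followup <= 0.5 * initial


def remission(followup: int) -> bool:
    """Remission: follow-up PHQ-9 score below 5."""
    return followup < 5


# Made-up example: initial score 16, follow-up score 9
print(phq9_severity(16), "->", phq9_severity(9))  # moderately severe -> mild
print(response_by_threshold(9))        # True
print(response_by_improvement(16, 9))  # False
print(remission(9))                    # False
```

The example shows how the two response definitions can disagree for the same patient, which is one reason response rates are reported as a range.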
Covariates.
Because new as well as existing patients with either single-episode or recurrent depression were included, length of time in treatment varied across patients tracked in the project. Consequently, the initial PHQ-9 score reflected the first administration of the PHQ-9 by the project psychiatrists at any point during patients' continuum of care. Weeks to first follow-up was the number of weeks between the initial PHQ-9 and the administration of a second PHQ-9. Number of visits was the total number of visits from initial administration of the PHQ-9 during the designated time frame. Number of weeks to first follow-up and number of visits up to the designated time point were included in logistic regression analyses as continuous variables.
Last, for analyses, self-management was considered present if a date for self-management goal documentation was entered; otherwise, absence of any recorded date indicated documentation of self-management goals did not occur (14,15). On the basis of clinical diagnosis, psychiatrists recorded the presence or absence of any co-occurring psychiatric disorders. Patients' age was included in logistic regression analyses as a continuous variable.
Institutional review board
On the basis of the decision of the APIRE Institutional Review Board (IRB) and the IRBs of participating practices, lead psychiatrists and their project coleaders were required to sign informed consent. The deidentified patient-level data reported by the psychiatrists on the flow sheets, however, were exempt; practices were not required to obtain patients' informed consent.
Results
Patient demographic and clinical characteristics
A total of 17 practices provided patient-level data during the 12-month course of the project. By the conclusion of the study, project psychiatrists provided data on 6,363 clinical contacts for 1,763 outpatients with a diagnosis of depressive disorder. Approximately 42% (N=740) of patients were diagnosed as having one or more co-occurring psychiatric conditions.
Initial PHQ-9 scores were ≥10 for 960 patients and <10 for the other 803 patients. The mean score for first administration for those with initial PHQ-9 scores ≥10 was 16.4±4.6, compared with 4.8±2.9 for those with an initial PHQ-9 score <10.
Patient tracking as a routine part of psychiatric care
Of the 960 patients with an initial PHQ-9 score ≥10, 792 (82.5%) had one or more follow-up visits. Patients with one or more visits were tracked during the project for an average of 5.8±3.2 months (range >0 to 12 months).
Table 1 shows the mean number of visits per month and mean number of weeks to first follow-up for patients with one or more follow-up visits on the basis of severity of depression symptoms.
Response and remission rates
Table 2 presents LOCF response and remission rates at 12 and 24 weeks among patients with initial PHQ-9 scores ≥10 and among a cohort of 372 patients with initial PHQ-9 scores ≥10 for whom both 12-week and 24-week visits had been reported. Higher rates of response and remission were observed at 24 weeks than at 12 weeks for patients in the cohort.
Predictors of response and remission
Results from post hoc multiple logistic regression analyses of predictors of response and remission among patients with initial PHQ-9 scores ≥10 are presented in Table 3. Each analysis accounted for nesting of patient data within each practice site. The number of weeks to first follow-up was significantly and independently associated with response at 12 weeks, even after statistical adjustment for the other covariates; for each one-week decrease in time to first follow-up, the odds of response and remission increased by about 9%. In addition to weeks to first follow-up, documentation of self-management also demonstrated significant and independent association with response for the 24-week analysis.
Predictors of remission at 24 weeks included severity of initial PHQ-9 score, weeks to first follow-up, and documentation of self-management. These factors were independently and significantly associated with remission, even after statistical adjustment for other covariates in the model. Only severity of initial PHQ-9 score demonstrated a statistically significant association with patients' achieving remission at the 12-week follow-up. Although not statistically significant, the direction and magnitude of associations of the covariates for weeks to first follow-up and documentation of self-management with remission at 12 weeks appeared comparable to results observed at 24 weeks.
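To make the "about 9%" change in odds more concrete, the short Python sketch below shows how a per-week odds ratio compounds and how a change in odds maps onto a change in probability. It is an illustration under stated assumptions, not a reanalysis: the per-week odds ratio of 1.09 is read from the text as the effect of each one-week reduction in time to first follow-up, the multiplicative compounding follows from the log-linear form of a logistic model with a continuous covariate, and the 30% baseline probability is a made-up value. The function names are hypothetical.

```python
def odds_multiplier(weeks_earlier: float, per_week_or: float = 1.09) -> float:
    """Odds multiplier for a first follow-up that occurs `weeks_earlier` sooner.

    Assumes a logistic model in which each one-week reduction multiplies
    the odds by `per_week_or` (about 1.09, per the text).
    """
    return per_week_or ** weeks_earlier


def apply_odds_multiplier(baseline_prob: float, multiplier: float) -> float:
    """Convert a baseline probability to odds, scale it, and convert back."""
    if not 0 < baseline_prob < 1:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    odds = baseline_prob / (1 - baseline_prob)
    new_odds = odds * multiplier
    return new_odds / (1 + new_odds)


# Hypothetical example: baseline probability 0.30, follow-up 4 weeks sooner
m = odds_multiplier(4)
print(round(m, 3))                               # 1.412
print(round(apply_odds_multiplier(0.30, m), 3))  # 0.377
```

Note that a 9% increase in odds is not a 9-percentage-point increase in the probability of response; the size of the probability change depends on the baseline.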
Discussion
This study of 17 psychiatric practices in routine care of a large cohort of patients diagnosed as having moderate to severe depressive disorder provides a benchmark for outcomes likely to be seen in psychiatric care outpatient settings in the United States. Between 31% and 41% of patients were responders by 12 weeks, and 13% were in remission. By 24 weeks, between 36% and 45% of patients were responders, and 18% were in remission. The results reflect the rather modest frequency of follow-up visits even among patients presenting with PHQ-9 scores ≥10. Notably, longer time to first follow-up visit predicted poorer outcomes. Finally, documentation by clinicians of self-management by the patient was a very positive predictor of outcome.
The outcomes from this study can be compared to those seen in depression efficacy and effectiveness studies. Efficacy studies generally demonstrate better outcomes than effectiveness studies, but results from both are usually better than real-world outcomes. Depression efficacy trials reported response and remission rates of approximately 63% and 47%, respectively (16). Efficacy studies typically include a placebo or inactive control group and have strict inclusion and exclusion criteria and usually exclude many depressed patients treated by psychiatrists.
Depression effectiveness studies such as STAR*D are more generalizable than the efficacy studies, but they differ from typical psychiatric care by requiring patients' informed consent and involving study coordinators, specific treatment algorithms, and frequent follow-up visits. Even with this rigorous approach, only 47% of STAR*D participants responded and 33% reached remission during their first trial of antidepressant treatment. Therefore, it is not surprising that the rates of response and remission in this study were lower than those seen in STAR*D, which included the full measurement-based care package (17,18), including the measurement of symptoms, side effects, and adherence coupled with clinical decision making at critical decision points.
A major strength of this study is the generalizability of the population and practice settings to routine care settings that are not typically a venue for research. Because there has been a major emphasis on the use of measurement-based care for depression through the findings from large practical clinical trials, the use of the PHQ-9 by NDMLI project psychiatrists to track outcomes and support treatment changes when response and remission goals were less than desired is a step in the right direction. It is important to note that although principles of chronic illness care were reviewed during NDMLI learning sessions (9), the only change consistently implemented by participating practices was the routine use of the PHQ-9.
A number of large practical effectiveness trials, such as the Texas Medication Algorithm Project (TMAP), STAR*D, and Improving Mood-Promoting Access to Collaborative Treatment (IMPACT), provide some indication of expected rates of response and remission in routine practice, but all previous studies of routine psychiatric treatment of depression were derived from a very small sample of patients from any given practice or the physicians' caseloads (18–20). Our study, on the other hand, conducted in routine practice, included a large number of patients with depression from psychiatrists' caseloads, making it more generalizable. If we had randomized patients to specific interventions, which is necessary in both efficacy and effectiveness studies, we would have lost the ability to assess outcomes in general psychiatric practice.
Wisniewski and others (21) found that STAR*D patients who did not meet typical inclusion and exclusion criteria for efficacy trials had lower rates of response and remission. NDMLI included patients who would not have been eligible for efficacy trials or even STAR*D, which may explain the low response and remission rates reported here.
This study has limitations. The low frequency of visits and the lack of a treatment algorithm and side effect monitoring in NDMLI may have delayed treatment response and remission, compared with clinical trials. However, because of its limitations, the study may represent the current state of psychiatric treatment of depression in the real world.
The outcomes reported here may be even better than usual because of the use of monitoring with the PHQ-9 and self-management strategies. Depression severity is not usually quantitatively monitored in psychiatric practice, and project psychiatrists reported that the results of the PHQ-9 frequently led the participating psychiatrists to change treatment (9). The treatment adjustments reported most frequently included changing medication dose, adding another medication, and starting or increasing therapy. All these adjustments could have increased rates of remission. Also, the participating psychiatric practices may have been more motivated than an average psychiatric practice to focus on depression treatment.
Another limitation of the project is the lack of data to differentiate patients on the basis of whether they were experiencing a new episode of depression or had chronic depression with a long history of treatment. If most patients were among the latter group, the low remission rates reported here would be consistent with the very low remission rates among patients on the third and fourth trials of medication in the STAR*D study (22).
To better understand how to improve response and remission rates, project psychiatrists suggested identifying predictors of response and remission. The post hoc regression analysis found that the severity of initial PHQ-9 score, weeks to first follow-up, and documented self-management were the three factors that predicted remission. Baseline severity of depression has been noted to predict resistance to depression treatment in other studies (23,24). The underrecognition of depression comorbidity in routine practice may have lessened the power to detect the impact of comorbidity in this study (25). The role of self-management documentation in predicting response and remission was not anticipated but perhaps not surprising in light of research showing that self-management improved outcomes in other chronic illnesses (26). Empowering depressed patients to become more active in their own care may be even more important than in other chronic illnesses, given that hopelessness is often a prominent symptom of depression.
A limitation of the regression analysis is the lack of information about current treatments patients were receiving. This is a potentially important factor to account for since treatment approaches are targeted to achieve response and remission. However, previous findings suggest that nearly 95% of patients with major depression (that is not in remission) treated by psychiatrists receive at least psychopharmacologic treatment and that the remainder receive psychotherapy (27). Therefore, it is safe to assume that close to 100% of patients in this study, all of whom were being treated by psychiatrists, received some form of treatment for depression.
Conclusions
Administering the PHQ-9 at every visit allowed participating psychiatrists to determine rates of response and remission in their practices. Because most sites had not routinely monitored depression severity before this project, use of the PHQ-9 provided new information and led to discussions about ways to improve outcomes, including the feasibility of implementing other elements of measurement-based care that were described in the STAR*D study.
This is the first large study to assess depression response and remission rates in typical psychiatric practices and can act as a source of comparison for future studies. Its findings need to be confirmed in future naturalistic studies of depression treatment that can assess outcomes routinely. Studies that randomly assign patients to psychiatric care with self-management or to usual psychiatric care are also needed to determine whether the improved odds of response and remission seen in the regression analyses are valid. Prospective research on the impact of time to psychiatric follow-up on depression response and remission is also indicated. Finally, research is needed to compare the impact of routine measurement of depression severity versus usual care or to examine whether practice patterns differ for acute or chronic episodes of depression when measurement is used.
Acknowledgments and disclosures
This study benefited from generous support by the American Psychiatric Foundation (APF) and an unrestricted educational grant to APF for this research by a consortium of industry supporters, including AstraZeneca International, Eli Lilly and Company, Lilly Foundation, Forest Laboratories, Pfizer, Sanofi Aventis, and Wyeth. The authors acknowledge significant contributions of the 17 psychiatric practices that participated in this project (see appendix at ps.psychiatryonline.org). Dr. Katzelnick is a principal shareholder of stock in Healthcare Technology Systems. Dr. Trivedi has received research support from the Agency for Healthcare Research and Quality (AHRQ), Corcept Therapeutics, Inc., Cyberonics, Inc., Merck, Naurex, Novartis, Pharmacia & Upjohn, Predix Pharmaceuticals (Epix), Solvay Pharmaceuticals, Inc., Targacept, and Valient. He has received consulting and speaker fees from Abbott Laboratories, Inc., Abdi Ibrahim, Akzo (Organon Pharmaceuticals Inc.), Alkermes, AstraZeneca, Bristol-Myers Squibb Company, Cephalon, Inc., Evotec, Fabre Kramer Pharmaceuticals, Inc., Forest Pharmaceuticals, GlaxoSmithKline, Janssen Pharmaceutica Products, Libby, LP, Johnson & Johnson PRD, Eli Lilly & Company, Meade Johnson, Medtronic, Neuronetics, Otsuka Pharmaceuticals, Parke-Davis Pharmaceuticals, Inc., Pfizer Inc., Sepracor, SHIRE Development, Sierra, Tal Medical/Puretech, Transcept, VantagePoint, and Wyeth-Ayerst Laboratories.