Selective serotonin reuptake inhibitors (SSRIs) and other second-generation antidepressant medications, including bupropion, mirtazapine, venlafaxine, and duloxetine, are the most commonly prescribed pharmacological treatments for depressive disorders in late life and are recommended as first-line treatments in clinical practice guidelines (1). Although previous systematic reviews of antidepressant drugs found evidence for efficacy in older adults with depression (2–8), reviews or meta-analyses before 2003 focused principally on the tricyclic antidepressants (TCAs) because until that year only one large placebo-controlled trial of a nontricyclic antidepressant marketed in the US had been published in outpatients 60 years of age or older with major depressive disorder (9). Because the first few trials of second-generation agents reported no advantage relative to placebo (10, 11) or small drug-placebo differences (9, 12), the efficacy of these agents in older depressed adults was questioned.
METHODS
The Cochrane Controlled Trials Register (2006, Issue 3) was searched using terms previously employed by the Cochrane group in a similar search (elder, geriatr, senil, older, old age, late-life, aged, 80-and-over, and depress) (6). These terms were supplemented with the terms antidepressants, fluoxetine, sertraline, paroxetine, citalopram, escitalopram, venlafaxine, duloxetine, mirtazapine, bupropion, nefazodone, and trazodone (second-generation antidepressants marketed in the United States). MEDLINE (1966 to August 2006) was also searched. Proceedings of geriatric psychiatric and psychiatric professional society meetings since 2000 and previous reviews were hand-searched. Pharmaceutical manufacturers were queried and information was requested as needed.
Trial selection
Trials were included if they met the following criteria: acute phase, parallel group, double-blinded, placebo-controlled with random assignment to an orally administered second-generation antidepressant (nontricyclic) marketed in the United States; patients had nonpsychotic, unipolar Major Depression not associated with a specific medical disorder, e.g., poststroke depression or dementia; patients were community dwelling and aged 60 years or older; numbers of patients randomized, outcomes, and dropouts were obtainable. Trials did not need to be published or peer reviewed and could be reported in manuscripts, technical trial reports, or posters. (Some sources presented incomplete information regarding remission rates, mean change scores with the standard deviations (SD), or adverse event discontinuation rates, and this information was obtained through other data presentations or from sponsors.) Two authors identified and agreed upon studies meeting these criteria. Study quality was assessed with Jadad scores (19).
Data extraction
Information extracted included study design, patient selection criteria, medication doses, location, trial durations, age, sex, baseline depression rating scores, numbers randomized, clinical outcomes on rating scales, and dropouts occurring during the double-blind trial period.
Clinical outcomes included response, remission, and change scores on the Hamilton Depression Rating Scale (HAMD) (20) or Montgomery Asberg Depression Rating Scale (MADRS) (21) and were assessed in the intent-to-treat (ITT) samples using the last observation carried forward in patients with at least one posttreatment rating. Response was defined as ≥50% improvement from baseline on the HAMD or the MADRS. Remission was defined by the individual study. In flexible dose trials that compared two formulations of the same medication, we combined the drug groups to make one contrast with the placebo group to minimize multiple comparisons with the same placebo group. For fixed-dose trials, comparing two doses of a medication, we compared each dosage group to placebo. In trials with an active comparator we compared each drug within the trial to the placebo group. Data were abstracted by one investigator (JCN) and checked by another investigator (LSS). Any discrepant data were rereviewed by the investigators to ensure that accurate data were obtained.
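The response definition above can be illustrated with a minimal Python sketch. The function names and the use of `None` for a missing rating are assumptions made for illustration, not part of the original analysis.

```python
from typing import Optional, Sequence


def locf_endpoint(post_scores: Sequence[Optional[float]]) -> Optional[float]:
    """Return the last observed post-baseline score (last observation carried forward).

    Returns None when the patient has no posttreatment rating at all.
    """
    observed = [score for score in post_scores if score is not None]
    return observed[-1] if observed else None


def is_responder(baseline: float,
                 post_scores: Sequence[Optional[float]]) -> Optional[bool]:
    """Classify response as >= 50% improvement from baseline at the LOCF endpoint.

    Returns None for patients without any posttreatment rating, who would
    fall outside the ITT sample as described in the text.
    """
    endpoint = locf_endpoint(post_scores)
    if endpoint is None:
        return None
    improvement = (baseline - endpoint) / baseline
    return improvement >= 0.5
```

For example, a hypothetical patient with a baseline HAMD of 24 whose last observed score is 12 has improved by exactly 50% and counts as a responder.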
Necessary information not included in the publication or presentation was requested from the investigator or sponsor of the study. If SDs of the change scores were not available, they were estimated from standard errors and sample sizes or imputed using the largest SD reported in other trials.
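The two fallbacks for missing SDs can be sketched as follows. This assumes the standard relation SD = SE × √n for the standard error of a mean; the helper names are hypothetical.

```python
import math
from typing import Sequence


def sd_from_se(standard_error: float, n: int) -> float:
    """Recover the SD of a change score from its standard error and sample size."""
    return standard_error * math.sqrt(n)


def impute_sd(reported_sds: Sequence[float]) -> float:
    """Conservative imputation: use the largest SD reported in the other trials."""
    return max(reported_sds)
```

Taking the largest reported SD widens the confidence interval of the affected contrast, so the imputation errs toward understating precision.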
Statistical analysis
The number of responders, remitters, dropouts, and the number randomized into each drug and placebo group for each trial were statistically combined using the Peto fixed-effects model. Effects were expressed as odds ratios (ORs) and absolute risk differences (RDs) with their 95% confidence intervals (CIs), test of significance (Wald z), number (N) of contrasts, and p values. The OR is the odds of an event (e.g., response) occurring in one group divided by the odds of the event in the other group (e.g., the placebo group). The overall OR for the meta-analysis is the mean of the ORs computed for each contrast, weighted for sample size and event rate. The RD for the meta-analysis is the mean of the differences between the risk of the event (e.g., response) in one group and that in the other group, weighted as above. Mean change scores on the HAMD for each drug and placebo comparison were combined using an inverse variance fixed-effects method. Effects were expressed as weighted mean differences (WMDs) with the 95% CIs, z score, N, and p values. The WMD is the difference between the mean change scores of the treatment and control groups weighted by the sample size. Effects were calculated for each drug-placebo contrast, for each drug class (SSRI; serotonin norepinephrine reuptake inhibitor, e.g., venlafaxine; dopamine norepinephrine reuptake inhibitor, e.g., bupropion) separately, and as meta-analytic summaries for all drugs combined. A funnel plot, in which the standard error (SE) of the log OR is plotted against the log OR, was used to evaluate potential retrieval bias.
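As an illustration of the pooling described above, the following Python sketch implements the standard Peto one-step estimator and a generic inverse-variance pooled mean difference. It is not the Review Manager code. The tuple layout, function names, and the 1.96 multiplier for the 95% CI are assumptions. Note that inverse-variance weights depend on the SE of each contrast, which reflects both sample size and SD.

```python
import math
from typing import Dict, Sequence, Tuple

# One drug-placebo contrast: (events_drug, n_drug, events_placebo, n_placebo)
Contrast = Tuple[int, int, int, int]


def peto_pooled_or(contrasts: Sequence[Contrast]) -> Dict[str, float]:
    """Pool 2x2 tables with the Peto one-step fixed-effects method."""
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for events_drug, n_drug, events_placebo, n_placebo in contrasts:
        n_total = n_drug + n_placebo
        events_total = events_drug + events_placebo
        # Expected events in the drug arm under the null hypothesis
        expected = n_drug * events_total / n_total
        # Hypergeometric variance of the observed count
        variance = (n_drug * n_placebo * events_total * (n_total - events_total)
                    / (n_total ** 2 * (n_total - 1)))
        sum_o_minus_e += events_drug - expected
        sum_v += variance
    log_or = sum_o_minus_e / sum_v
    se = 1.0 / math.sqrt(sum_v)
    return {
        "or": math.exp(log_or),
        "ci_low": math.exp(log_or - 1.96 * se),
        "ci_high": math.exp(log_or + 1.96 * se),
        "z": log_or / se,
    }


def inverse_variance_wmd(mean_diffs: Sequence[float],
                         ses: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effects pooled mean difference with weights 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * d for w, d in zip(weights, mean_diffs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se
```

A hypothetical contrast with 30 of 100 responders on drug and 20 of 100 on placebo gives a Peto OR of about 1.70. The numbers are illustrative and not taken from any included trial.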
χ² tests and the I² statistic derived from the χ² values were used to test heterogeneity among the contrasts. I² approximates the proportion of total variation in the effect size estimates that is due to heterogeneity rather than sampling error (22). An alpha error p < 0.20 and I² of at least 50% were taken as indicators of heterogeneity of outcomes. In the text, the χ² test and I² statistics for heterogeneity follow the 95% CI, z score, N, and p value for the ORs unless these values are provided in a figure. For outcomes that were heterogeneous we examined the effect of individual removal of extreme outcomes on the I² statistic.
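A minimal sketch of these heterogeneity statistics follows, using the usual definition I² = max(0, (Q − df)/Q) × 100%, where Q is the heterogeneity χ² and df is the number of contrasts minus one. The function names are hypothetical. Treating the two thresholds as jointly required is an assumption about how the criteria were applied.

```python
from typing import Sequence


def cochran_q(effects: Sequence[float], ses: Sequence[float]) -> float:
    """Heterogeneity chi-square: weighted squared deviations from the pooled effect."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q: float, df: int) -> float:
    """Percentage of total variation attributed to heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def is_heterogeneous(p_value: float, i2: float) -> bool:
    """Apply the thresholds from the text (assumed here to be jointly required)."""
    return p_value < 0.20 and i2 >= 50.0
```

Removing an extreme contrast and recomputing `cochran_q` and `i_squared` on the remaining contrasts mirrors the leave-one-out check described above.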
We compared the following subgroups as sensitivity analyses: potential differences between SSRIs and other drugs and between studies of 6–8 week versus 10–12 week duration. Differences between two or more subgroups were investigated by subtracting the sum of the heterogeneity χ² statistics of the subgroups from the overall χ² statistic and comparing the result with a χ² distribution with a df of one less than the number of subgroups (23). Review Manager version 4.2 (The Cochrane Collaboration, Oxford, England) was used for statistical calculations.
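The subgroup comparison can be sketched as below. The χ² tail probability is computed from closed-form series for integer df so that the example needs only the standard library. The function names and the numbers in the usage note are illustrative assumptions, not values from the analysis.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite Poisson-type sum
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction series
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2.0))
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2.0 * math.exp(-half) / math.sqrt(2.0 * math.pi) * total


def subgroup_difference(q_overall: float,
                        q_subgroups: Sequence[float]) -> Tuple[float, int, float]:
    """Between-subgroup chi-square, its df, and the p value."""
    q_between = q_overall - sum(q_subgroups)
    df = len(q_subgroups) - 1
    return q_between, df, chi2_sf(q_between, df)
```

With a hypothetical overall χ² of 20.0 and subgroup values of 8.0 and 8.159, the between-subgroup χ² is 3.841 on 1 df, which corresponds to p ≈ 0.05.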
Role of funding source
No external funding was received for the study design, trial search, data analysis, interpretation of the data, writing of the paper, or the decision to submit for publication.
DISCUSSION
Second-generation antidepressants are more effective than placebo during acute treatment of adults 60 years and older with Major Depression in terms of response and remission defined by depression rating scale scores, but the magnitude of this effect is small and variable. The numbers needed to treat calculated from the RDs by meta-analysis were 13 for response and 20 for remission, implying that for every 100 patients treated 8 would show a response and 5 a remission in excess of placebo treatment. The benefits of drug treatment, however, should be weighed against the rates of adverse events requiring discontinuation. For every two patients who responded to drug treatment, one discontinued prematurely because of adverse effects.
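The arithmetic linking RD, number needed to treat (NNT), and excess responders per 100 patients can be checked with a short sketch. The RD inputs of 0.08 and 0.05 are assumed values chosen to match the NNTs quoted above; the exact pooled RDs are not restated in this section.

```python
import math


def nnt_from_rd(risk_difference: float) -> int:
    """Number needed to treat: reciprocal of the absolute RD, rounded up."""
    # round() guards against floating-point noise just above an integer
    return math.ceil(round(1.0 / abs(risk_difference), 9))


def excess_per_100(nnt: int) -> int:
    """Approximate number of extra events per 100 treated patients."""
    return round(100 / nnt)
```

Under these assumptions an RD of 0.08 gives an NNT of 13 and an RD of 0.05 gives an NNT of 20, and those NNTs correspond to about 8 and 5 additional patients per 100 treated.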
Short trials of 6–8 weeks seem to underestimate outcomes achieved in longer trials. The mean response rates in the 6- and 8-week trials ranged from 35% to 46.5%, the OR for response was 1.22, and the RD was 5%. In the 10- to 12-week trials, response rates ranged from 45.7% to 68.9%, the OR was 1.73, with a RD of 14%. This observation, however, is based on only four trials (five contrasts) of 10–12 weeks duration and four of the five contrasts in the longer trials were with drugs not tested in the shorter trials.
These small drug-placebo differences might raise questions about the relative efficacy of the second-generation drugs compared with the older TCAs. A Cochrane meta-analysis of TCA and SSRI comparison trials in older patients found no advantage of TCAs compared with SSRIs in five trials with approximately 400 patients receiving each drug class, although these trials were not placebo-controlled (25). We note, however, that the number of comparisons is again small.
In the studies reviewed, response rates with placebo varied from 19% to 47%. Randomization to placebo is employed in depression trials to control for spontaneous improvement, treatment expectations, and nonspecific factors that might affect outcomes. Patients in clinical trials are seen for frequent visits and receive education, attention, reassurance, and clinical monitoring—interventions which have therapeutic effects. Clinicians should recognize the important contribution of these therapeutic interventions to overall outcome.
Because evidence for heterogeneity of outcomes was observed in the meta-analysis, the findings should be interpreted by considering not only the overall result, but also the individual study results. Results of one trial (11) contributed about half the proportion of variance due to heterogeneity. This 8-week trial with escitalopram and fluoxetine was different from the others in that it was performed at a very large number of sites (76 sites) and had the highest placebo response rate, 47%.
Heterogeneity of outcomes could be due to many factors, including differences in patient selection, diagnoses, study design, ascertainment of outcomes, and the drugs themselves; often the sources of heterogeneity cannot be determined. The 10 trials assessed eight antidepressants. Although the ORs for individual drugs varied (Fig. 2), no single medication or class was superior to another, as evidenced by the overlapping ORs. But any interpretation of equivalent efficacy would be based on sparse data.
Patient factors such as age, past history, depression severity, and cognitive function varied in the samples and may influence outcome. As examples, the citalopram trial (10) included the oldest patients and found no drug-placebo difference in response, but in a post-hoc analysis of that trial, greater drug-placebo differences were found among the most severely depressed patients. One trial with a large drug-placebo difference restricted selection to patients with recurrent depression (14), and 3 of the 4 trials that allowed patients with significant cognitive impairment (i.e., Mini-Mental State Exam scores below 24) had nonsignificant outcomes (10, 11, 15).
Further examination of these and other potential moderators of response requires data that often are not published and would benefit from analyses at the individual patient level. Because overall drug effects seem modest, determination of the factors moderating response is critical to decisions about when to use antidepressants and when to consider other treatments.
We used a fixed-effects model because the trials had similar designs, inclusion criteria, interventions, and outcomes. One way to approach the heterogeneity of outcomes is with a random-effects model, which allows that the trials may be estimating different, yet related, treatment effects (26). This more conservative approach produced results similar to the fixed-effects model (OR = 1.38 versus 1.40) but with a wider CI (the 95% CI widened from 1.24–1.57 to 1.12–1.69, and the p value changed from p < 0.001 to p = 0.002).
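Why a random-effects model widens the CI can be seen in a minimal sketch. The text does not name the estimator used, so the DerSimonian-Laird method shown here is an assumption. The function name and the inputs, log ORs with their SEs, are likewise illustrative.

```python
import math
from typing import Dict, Sequence


def dersimonian_laird(effects: Sequence[float],
                      ses: Sequence[float]) -> Dict[str, float]:
    """Random-effects pooling with a method-of-moments between-trial variance."""
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-trial variance, truncated at zero when Q <= df
    tau2 = max(0.0, (q - df) / c)
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / sum(w)),
        "random": pooled,
        "random_se": math.sqrt(1.0 / sum(w_star)),
        "tau2": tau2,
    }
```

When the contrasts disagree more than chance predicts, the estimated between-trial variance is added to every within-trial variance. The weights therefore shrink and the pooled SE grows, which is consistent with the wider interval reported above.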
Although our search excluded trials of medications not marketed in the US, we found no trials of such agents that met our inclusion criteria. Three small placebo-controlled trials in older depressed patients, one with mianserin (27) and two with nomifensine (28, 29), were published between 1982 and 1984, but they did not report ITT response or remission rates using a structured scale. Four other placebo-controlled trials with drugs approved for depression outside the US (moclobemide, fluvoxamine, and reboxetine) (30–33) were excluded because of inclusion of inpatients or medically ill populations. No placebo-controlled trials of milnacipran or tianeptine in elderly depressed patients were found. We considered all non-TCA antidepressants as “second-generation” agents, but no trials of MAOIs, trazodone, mirtazapine, or nefazodone were found that met the selection criteria.
This systematic review and meta-analysis contrasts with a recent review of late-life depression studies (8) that included a broad array of depressive disorders, diagnoses, and patients, and various drugs and psychotherapies. That review included studies of alternative drugs and supplements (e.g., acetyl-L-carnitine, alprazolam, tryptophan), patients with depressive symptoms secondary to specific medical conditions (e.g., stroke, obstructive lung disease, Alzheimer disease), and trials with younger, middle-aged patients (e.g., mean ages of 60). By comparison, we focused on unipolar, major depression trials, and 6 of the 10 trials that we included were not included in that review.
This analysis has limitations. The trials included were typical outpatient clinical trials that excluded patients with unstable medical illness and patients in residential settings, and thus our findings should not be generalized to the frail elderly. Most of the patients included in the trials were within the range of 60–80 years of age, and thus the findings may not pertain to those of advanced age. All the trials were sponsored by the manufacturer of one of the drugs studied, and although we performed a systematic review, it is possible that there are other trials that have been performed but not reported. Finally, the number of trials included in the analysis is small.
In summary, second-generation antidepressants seem effective for late-life depression but the magnitude of the effect is small. Duration of treatment may be important in that response rates and the specific effects of antidepressant medication (versus placebo) increase with time. The clinical decision to employ antidepressants will need to weigh these modest benefits during acute treatment with more robust effects on prevention of relapse or recurrence (34, 35) and potential safety issues such as increased risks of bleeding (36), hyponatremia (37), and decreased bone density and fractures (38). Finally, the heterogeneity of responses suggests substantial variability among individuals and that determination of moderators or predictors of response is especially important in order to identify patients for whom drug treatment is likely to be effective and those not likely to benefit. The effort to test these hypotheses and to examine moderating variables will benefit from analysis of individual patient data from these trials.