To the Editor: Dr. Carroll’s essential complaint seems to be that there was no reason to perform this trial but to “create positive publicity” for “niche” marketing. He elaborates with sarcasm and hyperbole that 1) statistical significance was achieved as a product of an excessively large group size; 2) the effects of sertraline were “trivial,” not clinically significant; 3) we used—he says—“newspeak” and p values “disembodied” from the underlying statistics in order to confuse readers; and 4) we acted unethically in concealing data and in reporting results. These assertions are ill-informed and without foundation, and we reject them.
This trial makes important contributions to clinical pharmacological research in late-life depression and provides relevant information about the likely effects of selective serotonin reuptake inhibitors (SSRIs) that clinicians can evaluate; there is nothing particularly controversial about it.
The trial was not “oversized” and not planned to reveal “trivial” differences. As we described, the determination of group size was based on the results of a previous large placebo-controlled SSRI trial in late-life depression (1), the expectation that outcomes in clinically heterogeneous elderly depressed populations with extensive medical comorbidity would be themselves heterogeneous and modest on average, and the ability to assess potential moderators such as melancholia or anxiety. With that exception, other placebo-controlled trials in late-life depression have been underpowered and undersized. The consequences of underpowered trials are that they tend to yield noninformative results and type II errors. Conversely, when results are statistically significant, it is because the effect sizes are implausibly large. In some instances, results from smaller trials have not been published simply because they are negative. Most experts would consider an adequately powered trial of typical clinical patients and outcomes generalizable enough to inform clinical practice as a distinct strength and not a “scientific failure.”
Contrary to his assertions, the statistics and outcomes in this report are clearly described and understandable. No “disembodied p values” were reported; every p value was explicitly connected to an outcome parameter and a statistical test. Any reader could assess the baseline characteristics of the population and the magnitudes of differences and calculate effect sizes of outcomes—just as Dr. Carroll did himself. Moreover, if the trial had been underpowered and undersized, he would not have been able to calculate an interpretable number-needed-to-treat statistic because the confidence interval (CI) would have been so broad as to be uninformative.
It is inappropriate and misleading for Dr. Carroll to compare this geriatric depression outpatient trial to the earliest imipramine trials performed around 1960 in younger adults (Klerman and Cole, 1965) in order to support his assertion that sertraline has a “trivial” effect. These trials, landmarks as they were half a century ago, were seriously deficient in nearly all areas. They used inexplicit diagnostic and inclusion criteria (e.g., mixing inpatients and outpatients, psychotic and neurotic depression, schizophrenia and mania) and methods for dosing and maintaining the blind or placebo control (e.g., many used atropine and thiopental as “placebos”). Outcomes assessments were idiosyncratic, and dropouts were not accounted for; most were so small, averaging about 60 to 70 patients, that they were not statistically significant individually.
Subsequent antidepressant trials, those from the 1980s and 1990s that used modern diagnostic criteria, rigorous methods, and specified outcomes, and modern evidence-based reviews based on these trials (2) demonstrated a relative benefit of antidepressant response over placebo of 1.6 (95% CI=1.5–1.7) in primarily young and middle-aged adults. By comparison, we found a relative benefit for sertraline of 1.4 (95% CI=1.1–1.7). This effect is hardly trivial. Similarly, although number-needed-to-treat statistics from these studies are larger than what Dr. Carroll calculated, they are not statistically significantly so. The relative benefit (or relative risk) is an effect size measure that accounts for placebo response, something that a number-needed-to-treat statistic cannot (Laupacis et al., 1988).
The relevant comparison to make, however, is to the few other placebo-controlled antidepressant trials in late-life depression. Here, the relative benefit is 1.4 (95% CI=1.2–1.6) (2), nearly identical to our finding. We discussed that the effects of sertraline were modest, nearly identical to a similarly sized trial of fluoxetine (1) and suggested that the two trials probably represent best estimates of the treatment effects of SSRIs in outpatients with late-life depression. We submit that this trial is informative of what likely treatment effects are in elderly outpatients over the short term and, unlike some research, will be more enduring and of practical clinical consequence.
Dr. Carroll goes on to fault us for not providing, or worse, “withholding” or “concealing” what he calls “remission” data, presumably based on cutoff scores on outcome instruments, in order to best “spin” the results. The use of such cutoff scores on continuous or ordinal data is clearly unsatisfactory, especially in elderly groups, where there are substantial somatic and residual depression-like symptoms among both depressed and nondepressed individuals (3, 4). In fact, we used standard definitions of a clinically meaningful response, a 50% reduction in baseline Hamilton depression scale scores and, separately, a Clinical Global Impression Scale (CGI) improvement score of 1 or 2 (i.e., markedly or moderately improved). Moreover, we reported that the CGI response of 1 or 2 had to be sustained throughout the remainder of the trial.
Nevertheless, at his request, we calculated “remission” rates, defined as an endpoint Hamilton depression rating scale score ≤10 and a CGI severity score of 1 or 2 (borderline ill or not ill at all). Remission rates on the Hamilton depression rating scale were 34.6% versus 26.6% (Cochran-Mantel-Haenszel χ2=5.61, df=1, p<0.02), and remission rates on the CGI severity scale were 32.8% versus 22.8% (Cochran-Mantel-Haenszel χ2=9.43, df=1, p=0.002), respectively, for sertraline versus placebo. The risk difference, or number-needed-to-treat statistic, and the relative benefit of 1.44 are virtually identical, and the absolute rates are similar to the categorical responses we reported for the Hamilton depression rating scale score (35% versus 26%) and the CGI scale score improvement (45% versus 35%).
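For readers who wish to check the arithmetic, the following is a minimal sketch of how point estimates of relative benefit, risk difference, and number needed to treat follow from the remission rates quoted above. The function name `effect_sizes` is hypothetical and introduced only for illustration. The sketch uses the unadjusted percentages alone; it does not reproduce the Cochran-Mantel-Haenszel stratified tests, and it gives no confidence intervals, because group sizes are not stated in this letter.

```python
def effect_sizes(treated_rate, placebo_rate):
    """Return (relative benefit, risk difference, number needed to treat).

    Both arguments are response proportions between 0 and 1.
    These are unadjusted point estimates only (an assumption of this sketch).
    """
    relative_benefit = treated_rate / placebo_rate
    risk_difference = treated_rate - placebo_rate
    # The number needed to treat is the reciprocal of the risk difference.
    nnt = 1.0 / risk_difference
    return relative_benefit, risk_difference, nnt


# Remission rates quoted in the letter (sertraline versus placebo).
hamilton = effect_sizes(0.346, 0.266)
cgi = effect_sizes(0.328, 0.228)

for name, (rb, rd, nnt) in (("Hamilton", hamilton), ("CGI severity", cgi)):
    print(f"{name}: relative benefit {rb:.2f}, "
          f"risk difference {rd:.3f}, NNT {nnt:.1f}")
```

With these inputs, the CGI severity rates give a relative benefit of about 1.44 and a number needed to treat of about 10; the Hamilton rates give about 1.30 and 12.5.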
Abstracts do not substitute for complete reports and do not contain all results. Dr. Carroll would have put quality-of-life scores in the abstract, arguing that most readers would read only the abstract, and says that we should highlight here that patients could not appreciate any effect. He does not similarly fault us for omitting from the abstract the patients’ self-assessed global impression of improvement, which strongly favored sertraline. Contrary to his assertion, the patients, in fact, endorsed their own improvements and with an effect size that was larger than the clinicians’ assessments.
In sum, we reject Dr. Carroll’s assertion that we put aside scientific and public health considerations to write an article under corporate influence to gain a marketing niche. Contrary to his assertion, we presented the whole Cheshire cat: face, ears, and tail. We regret that Dr. Carroll cannot offer his points more collegially or professionally.