Aggression and agitation are common in people with dementia or Alzheimer’s disease. Agitation is generally multidetermined, may wax and wane, is often associated with psychosocial stress, disruption, intercurrent medical conditions, and drug effects, and often can be managed by addressing the underlying conditions, without psychotropic medications (1, 2). Aggression is associated with similar factors, often fluctuates, and may be a response to perceived threat, but it requires intervention when it causes patients to be a threat to their own or others’ well-being. In clinical practice, a range of psychotropic drugs are prescribed for agitation or aggression, most often antipsychotics, and anxiolytics and antidepressants to a lesser extent (3). These medications have substantial limitations in terms of efficacy and safety (4).
Trials of antidepressants have been conducted for treating depression in people with dementia (5, 6) and, more recently, agitation (7). Citalopram in particular showed equivalence to antipsychotics for inpatients and outpatients, but with differential adverse events (8, 9). A recent multicenter randomized placebo-controlled trial (7) showed efficacy for citalopram in patients with Alzheimer’s disease with agitation, based on clinical global and agitation scale outcomes. As part of the statistical analysis plan of that clinical trial, here we report subgroup analyses that use a univariate method and a two-stage multivariate method (10) to assess heterogeneity of response and possible predictors of differential responses between citalopram and placebo treatments.
Method
The primary objective of the Citalopram for Agitation in Alzheimer Disease (CitAD) trial was to assess citalopram’s efficacy for agitation or aggression (see the primary report [7] and the detailed methods report [11]). Participants had probable Alzheimer’s disease, scores between 5 and 28 on the Mini-Mental State Examination (MMSE) (12), and clinically significant agitation, for which a physician determined that medication would be appropriate, based on scale ratings of either “very frequently” (once or more per day) or “occurring frequently with moderate or marked severity” (several times per week and difficult to redirect or control) on the agitation/aggression domain of the Neuropsychiatric Inventory (13). They could not have had a major depressive episode or psychosis requiring antidepressant or antipsychotic treatment. Cholinesterase inhibitors and memantine at stable dosages were allowed. Patients received a psychosocial intervention and were randomly assigned to receive either citalopram (N=94) or placebo (N=92) for 9 weeks. Dosages were titrated to as high as 30 mg/day over 3 weeks, based on response and tolerability. The co-primary outcome measures were the modified Alzheimer’s Disease Cooperative Study–Clinical Global Impression of Change (CGIC) (14), modified to assess agitation, and the agitation subscale of the Neurobehavioral Rating Scale (15). The 18-point agitation subscale of the Neurobehavioral Rating Scale consists of the agitation, hostility/uncooperativeness, and disinhibition items from the parent scale, each rated on a 0–6 scale as “not present,” “very mild,” “mild,” “moderate,” “moderately severe,” “severe,” or “extremely severe.” Other outcome measures were scores from the short form of the Cohen-Mansfield Agitation Inventory (16), the agitation/aggression subscale of the Neuropsychiatric Inventory, the 12-item Neuropsychiatric Inventory, the Alzheimer’s Disease Cooperative Study–Activities of Daily Living scale (17), and the MMSE, and the use of lorazepam as a rescue medication.
Patients in the citalopram group (78% received 30 mg/day and 15% received 20 mg/day) were significantly improved compared with those in the placebo group on both primary outcome measures. The estimated treatment difference at week 9 on the agitation subscale of the Neurobehavioral Rating Scale (citalopram minus placebo) was −0.93 points (95% CI=−1.80, −0.06; p=0.04). Forty percent of the citalopram group were rated as moderately or markedly improved on the CGIC, compared with 26% of the placebo group, which constitutes an absolute risk difference of 14% and an estimated treatment effect (i.e., the odds ratio of being at or better than a given CGIC category) of 2.13 (95% CI=1.23, 3.69; p=0.01). The citalopram group also showed significant improvement on the short-form Cohen-Mansfield Agitation Inventory, the 12-item Neuropsychiatric Inventory, and caregiver distress scores, but not on the Neuropsychiatric Inventory agitation subscale, the Activities of Daily Living scale, or in use of lorazepam (7). Worsening of cognition (as measured by the MMSE) and QT interval prolongation were seen in the citalopram group (7, 18).
The planned subgroup analyses reported here included the five prespecified baseline predictors: residency status (long-term care or outpatient), presence of psychosis (i.e., delusions or hallucinations as rated on the Neuropsychiatric Inventory), and severity of functional impairment (based on the Activities of Daily Living scale), cognitive impairment (based on the MMSE), and agitation (based on the Neurobehavioral Rating Scale agitation subscale). We also selected six additional potential predictors of outcome: age, gender, and use of memantine, lorazepam, trazodone, or cholinesterase inhibitors within 3 weeks of baseline. Each predictor was collapsed into two or three categories. The MMSE was divided into mild (scores ≥21), moderate (scores from 11 to 20), and severe (scores ≤10) cognitive impairment, based on the literature (19). Other continuous variables were categorized as tertiles.
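The categorization rules above can be sketched in a few lines of Python. The MMSE bands are taken from the text; the tertile cut-point rule is an assumption made for illustration, since the source does not state how cut points or ties were handled, and the function names are hypothetical.

```python
def mmse_category(score):
    """Map an MMSE total score to the severity bands given in the text."""
    if score >= 21:
        return "mild"
    if score >= 11:
        return "moderate"
    return "severe"


def tertile_labels(values):
    """Label each value 0 (lowest), 1 (middle), or 2 (highest) tertile.

    Assumption: cut points are the order statistics at n/3 and 2n/3, and
    tied values always fall in the same group. Expects a non-empty list.
    """
    ordered = sorted(values)
    n = len(ordered)
    low_cut = ordered[n // 3]
    high_cut = ordered[2 * n // 3]
    return [0 if v < low_cut else 1 if v < high_cut else 2 for v in values]


# Hypothetical ages, for illustration only.
ages = [70, 75, 78, 80, 85, 90]
print([mmse_category(s) for s in (28, 20, 10)])
print(tertile_labels(ages))
```

With real data, ties at a cut point can make the three groups unequal in size, which is one reason reported tertiles are only "roughly" balanced.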
Subgroup analyses were conducted using univariate and multivariate interaction methods. For the univariate interaction method, the treatment effect across categories of a predictor was compared using logistic regressions for CGIC response and for Neurobehavioral Rating Scale agitation subscale response outcomes (i.e., odds ratio of a 50% reduction from baseline, citalopram compared with placebo). The remaining continuous outcome measures (the Activities of Daily Living scale, the MMSE, and the Cohen-Mansfield Agitation Inventory) were modeled using linear mixed-effects regression models that included a patient-specific random intercept, a visit indicator, a treatment indicator, treatment-by-visit interactions, and baseline outcome. Likelihood ratio tests were used to assess the significance of the treatment-by-covariate interactions in all models.
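For readers less familiar with interaction testing, the likelihood ratio test for one categorical predictor can be written generically as below. This is a textbook formulation for the logistic case, not a transcription of the trial's models; the symbols (treatment indicator T, covariate dummies X for a predictor with k categories, interaction coefficients δ) are our notation.

```latex
\[
\operatorname{logit}\Pr(Y_i = 1)
  = \beta_0 + \beta_1 T_i
  + \sum_{j=1}^{k-1} \gamma_j X_{ij}
  + \sum_{j=1}^{k-1} \delta_j T_i X_{ij},
\qquad
H_0:\ \delta_1 = \dots = \delta_{k-1} = 0
\]
\[
\Lambda = -2\left[\ell(\hat\theta_0) - \ell(\hat\theta_1)\right]
\ \overset{\text{approx.}}{\sim}\ \chi^2_{k-1}
\quad \text{under } H_0
\]
```

Here ℓ(θ̂₀) is the maximized log-likelihood of the model without the interaction terms and ℓ(θ̂₁) that of the model with them, so a large Λ indicates that the treatment effect differs across categories of the predictor.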
For the multivariate interaction method, a typical regression that includes multiple interactions relies heavily on model assumptions; we therefore used an exploratory post hoc two-stage approach (10). Of the 11 baseline covariates, those for which there was more than a threefold change in odds ratio between any two covariate levels were included in two-stage working models.
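The selection rule can be illustrated with a short Python sketch. It assumes the rule is applied to the level-specific treatment-effect estimates for each covariate and compares the largest with the smallest; the function name is hypothetical, and the second set of values is invented for illustration.

```python
def exceeds_fold_change(level_estimates, threshold=3.0):
    """Return True if the largest level-specific estimate is more than
    `threshold` times the smallest (all estimates must be positive)."""
    if min(level_estimates) <= 0:
        raise ValueError("ratio estimates must be positive")
    return max(level_estimates) / min(level_estimates) > threshold


# Residency status: the two estimates reported in the Results (2.28 and 0.11).
print(exceeds_fold_change([2.28, 0.11]))

# Hypothetical three-level covariate with similar effects in every level.
print(exceeds_fold_change([1.8, 2.1, 2.4]))
```

Under this reading, residency status passes the screen easily, whereas a covariate whose level-specific estimates differ by less than a factor of three would be left out of the working models.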
In the first stage, an index score was calculated for each participant based on the parametric working models using the baseline predictors. Specifically, for each participant, we used his or her baseline covariate values in the models to obtain a predicted response probability for citalopram and a predicted response probability for placebo. The index score was then calculated as the difference between the predicted response probabilities from the citalopram and placebo models and represents the predicted treatment effect for that participant based on the working models (see the data supplement that accompanies the online edition of this article). Participants with the same index score can be thought of as a subset with that combination of covariate values. The value of the index score, however, is not necessarily an accurate estimate of the true treatment effect for that subset if the working models are incorrect.
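A minimal Python sketch of the first stage follows. It assumes, for illustration, that the working models are arm-specific logistic regressions; the source says only that they are parametric and defers details to the data supplement. All coefficients, covariate names, and function names are hypothetical and are not estimates from the trial.

```python
import math


def predicted_probability(model, covariates):
    """Predicted response probability from a logistic working model."""
    linear = model["intercept"] + sum(
        beta * covariates[name] for name, beta in model["coefs"].items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def index_score(covariates, citalopram_model, placebo_model):
    """Predicted treatment effect: P(response | citalopram) - P(response | placebo)."""
    return (predicted_probability(citalopram_model, covariates)
            - predicted_probability(placebo_model, covariates))


# Hypothetical coefficients, for illustration only.
citalopram_model = {"intercept": 0.2,
                    "coefs": {"long_term_care": -2.0, "lorazepam": -1.0}}
placebo_model = {"intercept": -0.8,
                 "coefs": {"long_term_care": 0.5, "lorazepam": 0.3}}

outpatient = {"long_term_care": 0, "lorazepam": 0}
resident = {"long_term_care": 1, "lorazepam": 1}

print(index_score(outpatient, citalopram_model, placebo_model))  # positive
print(index_score(resident, citalopram_model, placebo_model))    # negative
```

Two participants with identical covariate values receive the same index score, which is why the score can be read as labeling a subset of patients rather than an individual.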
The second stage of the analysis was conducted to allow for the possibility that the working models might be incorrect. In this stage, participants were sorted by index score and grouped into deciles; the treatment effect for each group was estimated nonparametrically as the difference in empirical response probability (citalopram minus placebo) for that group. Confidence intervals for the subgroup treatment effect estimates were calculated by bootstrapping with a correction for multiple comparisons (see the data supplement).
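The point estimates from the second stage can be sketched as follows. The data layout (tuples of index score, arm, and response) and the function name are assumptions made for illustration, and the bootstrap confidence intervals with multiplicity correction described in the data supplement are deliberately omitted.

```python
def grouped_treatment_effects(participants, n_groups=10):
    """Sort by index score, split into n_groups equal-sized groups, and return
    the observed response rate on citalopram minus that on placebo per group.

    participants: list of (index_score, arm, responded) tuples, where arm is
    "citalopram" or "placebo" and responded is True or False.
    A group missing one arm yields None.
    """
    ranked = sorted(participants, key=lambda p: p[0])
    n = len(ranked)
    effects = []
    for g in range(n_groups):
        group = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        rates = {}
        for arm in ("citalopram", "placebo"):
            outcomes = [responded for _, a, responded in group if a == arm]
            rates[arm] = sum(outcomes) / len(outcomes) if outcomes else None
        if rates["citalopram"] is None or rates["placebo"] is None:
            effects.append(None)
        else:
            effects.append(rates["citalopram"] - rates["placebo"])
    return effects


# Toy data (hypothetical), split into two groups instead of ten.
toy = [
    (-0.4, "citalopram", False), (-0.3, "placebo", True),
    (-0.2, "citalopram", False), (-0.1, "placebo", False),
    (0.1, "citalopram", True), (0.2, "placebo", False),
    (0.3, "citalopram", True), (0.4, "placebo", True),
]
print(grouped_treatment_effects(toy, n_groups=2))
```

Because each estimate uses only the observed responses within a group, it remains interpretable even if the first-stage working models are misspecified; the models matter only for how participants are ordered.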
Results
The covariates that were continuous measures, including age, Activities of Daily Living scale, and Neurobehavioral Rating Scale agitation subscale, were divided into tertiles and were roughly balanced between randomized arms. Some of the categorical covariates were not evenly distributed in the trial sample: 7% lived in long-term care, 48% had delusions or hallucinations, 54% were male, and 69%, 42%, 10%, and 8%, respectively, had prescriptions for cholinesterase inhibitors, memantine, trazodone, and lorazepam (Table 1).
Based on the univariate method, the effect of citalopram compared with placebo on CGIC response did not vary significantly between levels of any of the five planned and the six post hoc predictors except for residency status: patients living at home or with relatives experienced a significantly greater citalopram effect compared with placebo than patients in long-term care (odds ratio=2.28, 95% CI=1.14, 4.57, and odds ratio=0.11, 95% CI=0.01, 1.78, respectively; p=0.025) (Figure 1). Moreover, using the Neurobehavioral Rating Scale agitation subscale, the Activities of Daily Living scale, the Cohen-Mansfield Agitation Inventory, and the MMSE as response outcomes resulted in no significant interactions on any of the 11 candidate predictors of response except for age with the Cohen-Mansfield Agitation Inventory (data not shown).
For the multivariate method, we assessed heterogeneity of the treatment effect using two stages. For the CGIC, five of the 11 baseline covariates met the selection criteria (residency status, MMSE, Neurobehavioral Rating Scale agitation subscale, age, and treatment with lorazepam) and were used to compute the index score (see the online data supplement). The variation of the treatment effect across groups of different index scores is shown in Figure 2. The figure demonstrates as well that there is a subgroup of patients, those with index scores above the 62nd percentile, for whom there is 95% confidence that the treatment effect is at least as large as the estimated average effect for all patients.
This can be interpreted clinically by viewing the distribution of covariates for participants with different index score categories compared with the distribution of all patients (Figure 3 and Table 2). Figure 3 shows the nonparametrically estimated effect for patient groups combined by the deciles of the index score. A likelihood ratio test comparing the 10 deciles to a hypothesis that there is no heterogeneity suggested that the treatment effect truly varies by index score subgroup (p=0.002). This 10-mean model can be further reduced to a three-mean model of negative or placebo responders (decile 1), marginal or low responders (deciles 2–8), and high responders (deciles 9 and 10), which was still a significant improvement over the common mean model but was not different from the 10-mean model.
Patients with the largest predicted treatment effects favoring citalopram were more likely to be living outside long-term care facilities, to have milder cognitive impairment (MMSE score range, 21–28), to have a middle level of baseline agitation (Neurobehavioral Rating Scale agitation subscale score range, 6–8), to be within the middle age range of the trial population (76–82 years), and not to have been using lorazepam at baseline. By comparison, patients with the largest predicted treatment effect favoring placebo were more likely to be living in long-term care, to have moderate to severe cognitive impairment (MMSE scores ≤20), to have more severe baseline agitation (Neurobehavioral Rating Scale agitation subscale score range, 9–14), to be within the youngest (47–75 years) or oldest age range (83–92), and to be treated with lorazepam. When using the above two-stage method for the Neurobehavioral Rating Scale agitation subscale response and the secondary outcomes, there was no analogous evidence of heterogeneity across index scores.
Discussion
Given the clinical heterogeneity of agitation and aggression in patients with Alzheimer’s disease, it is likely that any effective therapy would benefit only a subset of patients with these behaviors. In our planned, protocol-specified analysis of this randomized controlled trial, we found no individual covariates that predicted positive outcomes with citalopram but did find that residence in long-term care was associated with a negative outcome with citalopram.
Thus, we hypothesized incorrectly that subgroups defined by single covariate predictors would respond differentially to citalopram. This failure could be a result of small sample size, imbalances in the actual outcomes associated with the covariates, and the overall small statistical effect sizes of the clinical outcomes of the trial. For example, only 186 participants were randomly assigned to treatment, 7% were living in nursing homes, 7% were being treated with lorazepam as a rescue medication at baseline, and the statistically significant co-primary outcomes were a relatively small risk difference (0.136) for the CGIC and less than a 1-point difference on the Neurobehavioral Rating Scale agitation subscale between citalopram and placebo. Thus, differences in the single covariate-defined subgroups would have been difficult to discern or potentially unreliable at identifying study participants who may have benefited from citalopram.
The multivariate analysis, on the other hand, confirmed the main outcome of the trial, namely, that there was an average treatment effect or difference in response probability with citalopram treatment, that is, citalopram improved CGIC response compared with placebo by an average 13.6% difference in response probability (citalopram minus placebo). Secondly, and importantly, this analysis identified subgroups for which the treatment effects were larger or smaller than the average, that is, for which the treatment effect for citalopram compared with placebo was not homogeneous across participants.
Indeed, two groups (about 20% of the sample) showed particularly large effects, with differences of approximately 60%–70% in response probabilities for citalopram compared with placebo; one group (about 10% of the sample) showed a large negative effect, with a difference in response probability of approximately 70% favoring placebo; the other groups (about 70%) showed essentially trivial effects. The clinical characteristics of the two subgroups for which citalopram was most effective included outpatient status, milder cognitive impairment, “moderate” to “moderately severe” agitation scores on the Neurobehavioral Rating Scale agitation subscale (as compared with “mild” to “moderate” for the lowest tertile and “moderately severe” to “severe” for the highest tertile), and being neither in the youngest nor the oldest age group.
The characteristics of the subgroup for which assignment to placebo was most effective were residence in long-term care, being in the oldest or youngest age groups (47–75 years or 83–92 years), having moderate to severe cognitive impairment (an MMSE score ≤20), having “moderately severe” to “severe” agitation on the Neurobehavioral Rating Scale agitation subscale, and receiving treatment with lorazepam.
The identification of a subgroup that had markedly better outcomes on placebo suggests that patients with more severe agitation and cognitive impairment may be harmed by citalopram. Conversely, the groups most responsive to citalopram had milder cognitive impairment and less severe agitation at baseline.
Clinical trials usually can address only one clinical hypothesis with reasonable statistical power. The results of the multivariate post hoc analysis reported here highlight the drawbacks of restricting post hoc analyses to univariate methods, a characteristic of many trials, including this one. Allowing the inclusion of patients with a broad range of agitation and cognitive impairment may have led to outcomes that are difficult to apply to clinical practice: On average, the patients who were enrolled in CitAD benefited from citalopram, but some benefited more than others in a manner that was not random and not predicted. Here, our two-stage multivariate analysis suggested clinical implications and a way forward (see below). Limitations of the trial, analysis, and statistical methods included the small sample size and our knowledge of the results of the prespecified univariate analyses, because this informed the subsequent multivariate analysis. Importantly, the predictive covariates for composing the subgroups were chosen primarily because they were relatively stable baseline characteristics and because there were limited candidate predictors from which to choose. Since covariates that are predictive for treatment differences (i.e., in our analysis those with a threefold change in odds ratio) may both interact with the treatment assignment and relate to the outcome in either treatment group, it is possible that our model is misspecified or that interactions are not well defined, factors limiting the ability to make inferences about treatment effects. Ultimately, however, the validity of the interpretation of the results derives from the empirical effect sizes in the subgroups and does not depend on the selection criteria.
Although a statistically significant treatment interaction was detected only for residency status using the univariate method, the direction of the (nonsignificant) interactions was the same for both methods for all the subgroups included in the multivariate model. However, the empirical basis of the proposed multivariate approach, as compared with other, more model-dependent approaches, makes the interpretation of the results more reliable.
Notwithstanding these limitations, the results support heterogeneity of clinical response to citalopram, specifically that outpatients with Alzheimer’s disease without severe agitation, who do not have major depression or psychosis for which antipsychotics may be required, may benefit from citalopram compared with placebo. Future trials of citalopram or similar drugs used for aggression or agitation might account for this heterogeneity by stratifying or including or excluding participants based on levels of cognitive impairment and severity of aggression or agitation. Although the small numbers within the decile group may not broadly inform the use of citalopram in patients in long-term care or those treated with lorazepam, it may nonetheless be prudent to avoid citalopram under these circumstances, and especially when there are other options. The finding that patients with more severe agitation or aggression responded better with placebo or poorly with citalopram raises further caution. Given these results, along with the established associations of citalopram with delayed cardiac repolarization (18, 20) and with cognitive impairment (7), and given safety concerns of antidepressants for depressed elderly patients (21) and the FDA’s recommendation to avoid citalopram dosages over 20 mg/day in patients over age 60, citalopram may have limited use for treating agitation in Alzheimer’s disease.
In sum, these analyses demonstrate that citalopram’s effect at 30 mg/day is heterogeneous, with maximal salutary effects for patients with relatively milder cognitive impairment and moderate agitation, and is without effect or potentially harmful for patients with moderate to severe cognitive impairment and more severe agitation.