In this issue, Canuso et al. (1) present the results of a not atypical psychiatric medication clinical trial: a longitudinal study in which approximately 400 subjects were randomly assigned to one of three treatments (one of two antipsychotic drugs or placebo) and assessed repeatedly over time. Figure 2 of their article contained a plot of observed cases over time, with separate analyses at each time point using only the observations available at each time point. A sudden divergence in the lines representing the two drug treatment groups at the end of 14 days of monotherapy led to concern about the effect of dropouts, which had also increased at this point, on the appearance of the lines and the results of the analyses comparing the two treatments. This editorial, by two of the Journal’s statistical reviewers, discusses how the analysis of clinical trials results should address dropout during the trial.
With no dropout, there are several reasonable alternatives for longitudinal analysis for numerical response variables (e.g., total score on the Positive and Negative Syndrome Scale), including various fixed-effects or random-effects (mixed) general linear models and approaches such as least squares, maximum likelihood, and generalized estimating equations. Gertrude Cox, a statistical theorist, famously remarked that the best thing to do about missing data is not to have any. Unfortunately, this is usually unrealistic. However, one can and should be able to design clinical trials so that the amount of missing data is small. If a study cannot be completed with only a small amount of missing data, the investigators should consider redesigning the study.
Missing data due to dropout occur for two major classes of reasons: noninformative reasons (essentially random, e.g., a study subject moves to another city with his or her family) and informative reasons (essentially nonrandom, and possibly related to treatment assignment or outcomes, e.g., the study subject suffers a relapse or develops intolerable side effects and is removed from the trial by the investigators) (2, 3).
If the dropout data are missing completely at random and are thus noninformative, they can be ignored, and the only issue is loss of efficiency and power with fewer subjects. Completely omitting the data from dropouts still provides unbiased estimates of the effects of interest and unbiased tests of the hypotheses of interest. When the data are not assured to be noninformatively missing, however, the missingness of the data may contain information about the efficacy or tolerability of the treatment, and we would be unwise to ignore it. Not using the data from the dropouts limits the generalizability of the results and may introduce bias. If the data are indeed informatively missing, all choices are problematic, because for any analysis, we have to make assumptions that may or may not be true. In addition, when the number of dropouts is not small, all methods are flawed, but some are more flawed than others (3).
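The difference between noninformative and informative dropout can be illustrated with a small simulation. The sketch below is only an illustration under assumed numbers: the score distribution (mean 70, SD 10), the 30% random dropout rate, and the rule that subjects scoring above 80 drop out with probability 0.8 are all hypothetical and are not taken from any trial.

```python
import random
from statistics import mean


def completer_means(n=20000, seed=0):
    """Compare completer-only means under two hypothetical dropout mechanisms."""
    rng = random.Random(seed)
    # Hypothetical end-of-study symptom scores (higher = more severe).
    scores = [rng.gauss(70, 10) for _ in range(n)]
    # Noninformative dropout: every subject has the same 30% chance of leaving.
    random_dropout = [s for s in scores if rng.random() >= 0.3]
    # Informative dropout: subjects doing poorly (score > 80) leave 80% of the time.
    informative_dropout = [
        s for s in scores if not (s > 80 and rng.random() < 0.8)
    ]
    return mean(scores), mean(random_dropout), mean(informative_dropout)


full, after_random, after_informative = completer_means()
print(f"all subjects:            {full:.2f}")
print(f"completers, random:      {after_random:.2f}")
print(f"completers, informative: {after_informative:.2f}")
```

Under these assumptions, the completer mean after random dropout stays close to the mean for all subjects, while the completer mean after informative dropout is noticeably lower, because the most symptomatic subjects have selectively left.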
Because one can never be fully certain whether data are noninformatively or informatively missing, it is considered good practice not to ignore dropouts. Last observation carried forward (LOCF) is a commonly used way of imputing data for dropouts, although it became common largely through historical accident and computational ease. LOCF uses the last value observed before dropout, regardless of when it occurred. The U.S. Food and Drug Administration’s (FDA’s) Division of Psychiatric Drug Products and its forerunners have traditionally viewed LOCF as the preferred method of analysis, considering it likely (but not certain) to be conservative and clearly better than using observed cases, where only the data observed are used.
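The mechanics of LOCF can be sketched in a few lines. This is a minimal illustration, not code from any trial analysis; the function name `locf`, the use of `None` to mark a missed visit, and the example scores are assumptions made for the sketch.

```python
def locf(scores):
    """Carry the last observed value forward over missing visits (None)."""
    filled = []
    last_observed = None
    for value in scores:
        if value is not None:
            last_observed = value
        # A visit missing before any observation stays None.
        filled.append(last_observed)
    return filled


# Hypothetical subject who drops out after the third of five visits.
visits = [95, 88, 84, None, None]
print(locf(visits))  # [95, 88, 84, 84, 84]
```

Note that the carried-forward value is the same whether the subject left after the first visit or the next-to-last one, which is the "regardless of when it occurred" property described above.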
When the FDA seemed to require LOCF as the primary method of analysis, this approach became the preferred one in psychiatric journals. Statisticians, researchers, and reviewers were comfortable with it—and besides, there were not many alternatives. LOCF analysis, however, answers a not particularly useful or interesting clinical question: “Regardless of how long the drug is taken, and of the proportion of the subjects in the study who discontinued, or when or why they discontinued, are the final scores on some clinical rating scale lower in subjects taking one drug compared with another?”
Newer methods for longitudinal analysis, such as mixed models, are increasingly practical now that fast computers and statistical packages capable of fitting them are widely available. These methods have the advantage of using all the data from all time points available for all subjects and of not making the same sorts of unrealistic assumptions as LOCF about the relationships among repeated measurements taken on the same subjects. Since repeated measures are taken from the same subjects, these observations are not independent of each other. The models allow for nonconstant correlations among the time points. In contrast to LOCF, which assumes that subjects’ responses after dropout would remain constant, these models can allow for estimation of subjects’ responses after dropout using data prior to the dropout. Thus, they use all the data and can reduce bias due to dropout. For example, we can fit models in which each subject is allowed his or her own regression line (or curve) over time, so that when a subject drops out, his or her curve is projected to the end of the study, rather than simply holding at whatever the last value was, as in LOCF. We can allow different time-course slopes in each treatment. Because of the flexibility allowed, careful consideration should be given to planning what model to fit, what hypotheses to test, and how to describe the interrelationship of values over time. In the presence of missing data, mixed models estimate parameters and test hypotheses about them but do not impute missing values, whereas LOCF imputes missing values by carrying values forward.
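The contrast between holding the last value and projecting a subject's own line can be made concrete with a deliberately simplified sketch. It fits an ordinary least-squares line to one hypothetical subject's observed visits and extends it to day 42. This is not a mixed model: a real mixed model pools information across subjects, estimates the correlation structure, and does not impute individual values. The function name `project_line` and the scores are assumptions for illustration only.

```python
def project_line(days, scores, end_day):
    """Fit a least-squares line to one subject's visits and extend it to end_day.

    Requires at least two visits on distinct days.
    """
    n = len(days)
    mean_day = sum(days) / n
    mean_score = sum(scores) / n
    sxx = sum((d - mean_day) ** 2 for d in days)
    sxy = sum((d - mean_day) * (s - mean_score) for d, s in zip(days, scores))
    slope = sxy / sxx
    intercept = mean_score - slope * mean_day
    return intercept + slope * end_day


# Hypothetical subject observed on days 0, 7, and 14, then lost to follow-up.
days = [0, 7, 14]
scores = [100, 90, 80]

locf_value = scores[-1]                    # LOCF holds the day-14 score
projected = project_line(days, scores, 42) # the line continues the trend
print(locf_value, round(projected, 1))
```

For this subject, LOCF reports 80 at day 42, while the projected line gives about 40. Neither is guaranteed to be right; the point is only that the two approaches encode different assumptions about what happens after dropout.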
These models have limitations: they rely on theory that may apply only to large samples, and they do not really address the issue of nonrandom, informatively missing data. They are still generally preferable to LOCF, particularly when the percentage of missing data is small (say, <20% dropouts), because they do not make the unrealistic assumption that subjects who drop out would continue responding precisely as they did at the time of dropout. Work has been done toward extending these types of models for use when the data are nonrandom and informatively missing. These approaches include pattern mixture models, in which the pattern of the dropouts is included in the model, and shared parameter models, in which the assumptions made are incorporated into the model (4, 5). Such models allow the use of sensitivity analysis to investigate different assumptions.
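One simple way to see what such a sensitivity analysis looks like is a two-pattern mixture with a shift parameter. In the sketch below, the overall mean is a weighted average of the completers' mean and an assumed mean for dropouts, and the unverifiable assumption (how much worse dropouts would have scored) is varied through `delta`. The function name, the 20% dropout proportion, and all scores are hypothetical; this is a toy version of the idea, not the models described in references 4 and 5.

```python
def pattern_mixture_mean(mean_completers, prop_dropout, delta):
    """Overall mean when dropouts are assumed to differ from completers by delta."""
    mean_dropouts = mean_completers + delta  # unverifiable assumption, varied below
    return (1 - prop_dropout) * mean_completers + prop_dropout * mean_dropouts


# Hypothetical arms: completer means of 60 (drug) and 68 (placebo), 20% dropout each.
# Only the drug arm's dropouts are assumed to be worse off, by delta points.
results = {}
for delta in (0, 5, 10):
    drug = pattern_mixture_mean(60, 0.2, delta)
    placebo = pattern_mixture_mean(68, 0.2, 0)
    results[delta] = drug - placebo
    print(f"delta={delta:>2}: drug minus placebo = {results[delta]:.1f}")
```

If the conclusion about the treatment difference holds across the plausible range of `delta`, it is less dependent on the assumption about dropouts; if it reverses, the dropout mechanism matters for the interpretation.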
In the clinical trial reported on by Canuso et al., completion rates for the primary designated endpoint at 14 days were relatively high, above 80% for all three treatments. The analysis in the manuscript as submitted used observed cases at each time point and LOCF at the 42-day endpoint, when dropouts had increased. At the request of the editor, the investigators checked for nonrandom reasons for dropouts, such as side effects or changes in clinical status, and reported that there did not appear to be any. A reviewer felt that because there were dropouts, the optimal method for analyzing the data would be to use a mixed model, in which all the data, not just the first and last (or observed) observations, would contribute to mean estimates. Canuso and colleagues argued that they had prespecified LOCF as their analytic method when the trial was registered. Preregistration of clinical trials, including the analytic method, is now required by most journals, including the Journal, to prevent sponsoring companies from seeking a more favorable analysis after the data have been acquired. The LOCF analysis therefore appears in the print version of this article and the mixed model analysis appears in an online supplement. Because the number of dropouts was relatively low, results are similar between the two analyses.
The FDA has been more flexible recently with respect to analytical methods for clinical trials in psychiatry. For longitudinal trials, it has accepted the use of mixed models, and mixed models are being used in the analysis of data for new drug applications. The mixed models the FDA generally allows are those that correspond closely to traditional repeated-measures models and those that conservatively allow correlations of the repeated measures to be estimated without constraint. Siddiqui et al. (6) performed simulations and examined 48 clinical trial data sets from 25 new drug applications submitted to the FDA and concluded that mixed models are superior to LOCF for controlling type I error rates and minimizing biases.
In summary, mixed models are preferable to LOCF when the number of subjects is sufficiently large and the proportion of missing data is small enough. When either of those conditions does not hold, or when the data are missing for a particular reason related to the study (that is, informatively missing), neither mixed models nor LOCF works particularly well, and one should rethink the analytical plan and obtain statistical consultation.