There is a disturbing trend in studies of new drugs for psychiatric disorders. A number of innovative and promising drugs that appear to show effectiveness in phase II trials fail in phase III. In some cases, the study sponsor may discontinue further development of the drug based on the results. It is possible that the phase II results were misleading and that the program should never have moved to phase III because of a lack of efficacy or adverse effects. This commentary addresses an alternative explanation: that effective drugs are failing because of systematic problems in the methods for carrying out large, multinational phase III trials. If this is the case, it is vital for our field to address this problem, because medications that can improve the lives of people with serious mental illness may never reach the clinic. In addition, the failure of these trials appears to have convinced some companies to limit investment in the CNS area and in psychiatry in particular, substantially decreasing the resources available for drug development.
In phase II drug trials, the agent is administered to patients with a particular disease or condition. These studies usually include hundreds of patients and are essential for establishing the agent’s effect size, dose range, and general safety and tolerability profile. Phase III trials are much larger and can involve 300 to 3,000 participants. These studies can confirm the phase II findings and provide additional safety information. Recently, these trials have been global and have included as many as 200 different research sites. An important observation from a number of recently completed phase III trials is that the active drug produced symptom reduction similar to that seen in the phase II trials; however, the placebo response was substantially greater, and the active drug did not separate from placebo. Another characteristic of these studies is that they often have aggressive timelines. Sites and contract research organizations are reimbursed based on the number of subjects who are randomized, and there is often pressure to meet deadlines.
Drug Development Fundamentals: Preparing for Success in Phase III
Drug development aims to achieve evidence-based milestones, progressing in a disciplined fashion from discovery through each phase of clinical development. The foundation of rational drug development is the informed design of a potential medicine based on an understanding of the underlying biology. Translational research establishes the relevance of a biological target for a particular condition prior to studies in humans. Following initial phase I safety trials, pharmacology is confirmed in proof-of-mechanism studies that ensure the intervention is active as designed. Phase IIa proof-of-concept trials then establish efficacy in the intended population and are typically followed by phase IIb dose-finding studies. It is imperative that response-exposure relationships for key efficacy variables be thoroughly explored in phase II in order to optimize dosing regimens in confirmatory trials. Identifying patients who are more or less likely to respond to a drug is complex but can also be addressed in phase II. For example, subjects who show a vigorous response to an active drug can be identified post hoc and, in a follow-up trial, rerandomized to active drug or placebo to confirm their responsiveness. The phase III trial could then be enriched for patients with similar characteristics. There is also interest in identifying patients, possibly by genotyping, who may be particularly likely to respond to placebo so that these patients can be avoided in future trials. Sponsors might pay a penalty for these methods in that a drug could be approved with a more restrictive label, but one that identifies a subpopulation more likely to benefit. Advancing into phase III trials, the most expensive and challenging to execute, without having rigorously managed the risks associated with each earlier phase of development can lead to costly failures.
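As a purely hypothetical illustration of the enrichment idea described above, the short Python sketch below (the biomarker, response sizes, and variance are invented, not taken from any actual program) shows how restricting a confirmatory trial to a subgroup identified as more responsive in phase II can raise the observable effect size, at the cost of a narrower label.

```python
# Purely hypothetical illustration of enrichment: if phase II identifies a more
# responsive subgroup (here via an invented biomarker), restricting phase III
# to that subgroup raises the observable effect size, at the cost of a narrower label.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

biomarker_pos = rng.random(n) < 0.4                      # 40% carry the invented marker
true_drug_benefit = np.where(biomarker_pos, 8.0, 1.0)    # points of improvement over placebo
noise_sd = 12.0

drug_change = true_drug_benefit + rng.normal(0, noise_sd, n)
placebo_change = rng.normal(0, noise_sd, n)

def cohens_d(a, b):
    """Standardized mean difference using a pooled standard deviation."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

print("all-comers effect size:", round(cohens_d(drug_change, placebo_change), 2))
print("enriched effect size  :", round(cohens_d(drug_change[biomarker_pos], placebo_change[biomarker_pos]), 2))
```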
Assuming the completion of a rigorous phase I development program, failure of a phase III study could be related to an inadequate phase II program. For example, an effect size estimate from a phase II study may not be replicated in phase III. This could be driven by a marked increase in variance during phase III from a number of factors, including less experienced sites with less experienced clinical raters; increases in population heterogeneity as studies are carried out in more languages and different cultures; and inclusion of sites contributing very few subjects. In addition, there is evidence from trials in mania and depression that the ability to detect drug-placebo differences actually decreases as the number of subjects and the number of sites increase (1, 2). Naturally, if the primary efficacy outcome measure or analysis in phase III is altered from that used in phase II, assumptions regarding extrapolation to phase III may not hold, with a resultant attenuation of effect size.
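To make the variance argument concrete, here is a minimal simulation sketch in Python; the change scores, standard deviations, and sample sizes are invented for illustration and are not taken from the trials cited above. It shows how a modestly larger placebo response combined with greater variance can leave a phase III trial with less power than a smaller, tighter phase II trial.

```python
# Purely illustrative numbers, not data from the cited trials: how a larger
# placebo response plus greater variance in phase III can erase a drug-placebo
# signal that was detectable in phase II, even with many more subjects.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulated_power(n_per_arm, drug_change, placebo_change, sd, n_sims=2000):
    """Estimate power of a two-sample t-test on symptom change scores."""
    hits = 0
    for _ in range(n_sims):
        drug = rng.normal(drug_change, sd, n_per_arm)
        placebo = rng.normal(placebo_change, sd, n_per_arm)
        _, p = stats.ttest_ind(drug, placebo)
        hits += p < 0.05
    return hits / n_sims

# Phase II-like scenario: modest placebo response, tighter variance.
print("phase II-like :", simulated_power(75, drug_change=-10, placebo_change=-5, sd=10))
# Phase III-like scenario: same drug response, larger placebo response, more noise.
print("phase III-like:", simulated_power(300, drug_change=-10, placebo_change=-8, sd=14))
```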
Phase II programs may not provide adequate information regarding dose-response relationships. By attempting to accelerate development and/or to reduce the overall cost, many programs overburden early proof-of-concept trials and minimize or even skip phase IIb. This has the potential for bringing forward less than optimal dosing regimens into phase III. Thus, failure may be due to suboptimal dosing. This has been evaluated in post hoc analyses of phase III trial data, which demonstrated exposure-response relationships that would have predicted success had the majority of patients reached appropriate exposure levels.
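As an illustration of the kind of exposure-response exploration that can be done in phase II, the sketch below fits a standard Emax model to hypothetical exposure and improvement values (all numbers are invented); if the doses chosen for phase III leave most patients well below the plateau of such a curve, the phase II effect size is unlikely to be reproduced.

```python
# Illustrative sketch with invented numbers: fit a simple Emax model to phase II
# exposure-response data and ask whether the phase III dose reaches the plateau.
import numpy as np
from scipy.optimize import curve_fit

def emax_model(conc, e0, emax, ec50):
    """Standard Emax exposure-response model."""
    return e0 + emax * conc / (ec50 + conc)

exposure = np.array([0.0, 25, 50, 100, 200, 400])           # hypothetical plasma levels (ng/mL)
improvement = np.array([4.8, 7.9, 9.6, 11.8, 13.1, 13.6])   # hypothetical symptom improvement

params, _ = curve_fit(emax_model, exposure, improvement, p0=[5.0, 10.0, 50.0])
e0_hat, emax_hat, ec50_hat = params
print(f"E0={e0_hat:.1f}, Emax={emax_hat:.1f}, EC50={ec50_hat:.0f} ng/mL")
# For an Emax model, 80% of the maximal effect is reached at 4 x EC50;
# a phase III dose that leaves most patients below this may underperform.
print(f"EC80 ~ {4 * ec50_hat:.0f} ng/mL")
```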
Issues in Trial Logistics
At the 2017 meeting of the International Society for CNS Clinical Trials and Methodology, study sponsors used their databases to identify factors in trial implementation that may have led to negative results. In one large trial that focused on negative symptoms in schizophrenia, a number of factors appeared to be associated with a lack of separation between drug and placebo. These included entering subjects on multiple antipsychotics, including subjects who had inadequate drug exposure based on plasma levels, and including individuals who had substantial changes in negative symptoms between screening and baseline. A trial from another company also found that dropping subjects who had substantial changes in symptoms prior to randomization decreased the placebo response. Among study sponsors, there was agreement that changes late in trials to increase the number of subjects tended to lead to an increase in the placebo response. These issues in implementation may occur because the financial incentives for investigators and contract research organizations, and perhaps even within the sponsoring companies, encourage rapid enrollment under ambitious timelines. This may reinforce a tendency to focus on quantity rather than quality and to relax enforcement of entry criteria later in studies.
These observations suggest a number of approaches for improving the ability of phase III trials to detect a signal. Consider first the different possible sources of the problem in the conduct of large phase III clinical trials: the patients who are enrolled, the investigators and contract research organizations that conduct these trials, and the pharmaceutical companies that provide overall management for these trials. Assuming that phase II preparation has been adequate, we focus here on four areas in which to look for remedies to improve the conduct and effectiveness of phase III trials: patient selection, severity rating instruments and their use, study design and controlling for the effects of being in a trial, and sponsor policies and procedures regarding trial conduct.
Patient selection.
Outside adjudication of patients selected by sites, to ensure that they meet inclusion and exclusion criteria, has been proposed as a way of independently addressing this potential problem (3). Some studies have tried blinding site investigators to threshold severity criteria to avoid score inflation. Registries have been set up to try to identify and exclude “fraudulent” patients (i.e., individuals who are not actually patients and who seek entry purely for financial gain).
Rating instruments.
Reexamining severity rating instruments and how they are used might be worthwhile. There is a tendency to stay with instruments that are familiar, but these are not necessarily the best for a particular study. Studies that use endpoints such as negative symptoms or cognition batteries should explore methods for improving measurement. More objective, biologically based measures should also be explored, and patient self-report may prove a better approach to severity rating in at least some patient populations. In addition, for instruments that do require application by raters, how this is done is critical, and concerns about practices by site investigators have led to technologies intended to improve the quality of severity ratings, including video-recorded ratings and centralized raters.
Study design.
There has long been concern that the decades-long practice of single-blind placebo run-ins as a way of identifying and excluding placebo responders is ineffective (4), and design changes have been developed to try to address this problem more effectively. One approach has been to have essentially a double-blind placebo run-in in what is often referred to as a delayed start design; however, this appears not to have been a successful design. A more nuanced version of the double-blind placebo run-in is the sequential parallel comparison design, and this has had some success in reducing placebo response (5). Another approach that might help in reducing placebo response would be to decrease the expectation of the investigator and the subject that an individual has been assigned to an active drug. This could be accomplished by reducing the number of arms in a study or by increasing the proportion of subjects receiving a placebo (6).
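For readers unfamiliar with the sequential parallel comparison design, the following simplified simulation sketches its logic using responder rates; all probabilities and the pooling weight are invented for illustration, and real SPCD analyses are considerably more elaborate. Stage 1 randomizes all subjects, stage 2 re-randomizes only the stage 1 placebo non-responders, and the two stage-wise drug-placebo differences are combined with a prespecified weight.

```python
# Simplified, hypothetical sketch of a sequential parallel comparison design (SPCD).
# All response probabilities and the pooling weight are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 300                                    # subjects randomized in stage 1

stage1_arm = rng.choice(["drug", "placebo"], size=n)
stage1_resp_prob = np.where(stage1_arm == "drug", 0.55, 0.40)
stage1_responded = rng.random(n) < stage1_resp_prob

# Stage 1 drug-placebo difference uses all randomized subjects.
diff1 = (stage1_responded[stage1_arm == "drug"].mean()
         - stage1_responded[stage1_arm == "placebo"].mean())

# Stage 2 re-randomizes only the stage 1 placebo non-responders, whose
# placebo response in stage 2 tends to be lower.
carry = (stage1_arm == "placebo") & ~stage1_responded
m = int(carry.sum())
stage2_arm = rng.choice(["drug", "placebo"], size=m)
stage2_resp_prob = np.where(stage2_arm == "drug", 0.45, 0.20)
stage2_responded = rng.random(m) < stage2_resp_prob
diff2 = (stage2_responded[stage2_arm == "drug"].mean()
         - stage2_responded[stage2_arm == "placebo"].mean())

# SPCD combines the two stage-wise differences with a prespecified weight.
w = 0.6
print(f"stage 1 diff={diff1:.2f}, stage 2 diff={diff2:.2f}, pooled={w * diff1 + (1 - w) * diff2:.2f}")
```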
Another phenomenon commonly observed in failed studies is a dramatic improvement in symptoms across all treatment arms once patients are randomized, suggesting that being in a study may itself have a powerful therapeutic effect. It may be useful to simplify studies as much as possible, partly as a way of reducing the interactions with study staff that may be one explanation for this improvement. Making sure that all study procedures, staff interactions, and assessments are included in any run-in period prior to actual randomization might also help to reduce this effect.
Sponsor procedures and policies.
Finally, one of the problems contributing to study failures might be company policy and commitment to study success. As a program moves to phase III, studies usually involve many more sites, and the quality of sites may diminish. Such factors themselves may contribute to failure (1). Policy should focus more on quality of recruitment and site conduct than on volume of patients enrolled, which may result in fewer but more productive sites. Such policy would mean ongoing monitoring of site and contract research organization conduct and dropping of sites that do not perform. It might also involve more careful monitoring of patients to make sure they are actually taking their assigned treatment (7). Providing incentives for quality performance, rather than the number of subjects and study visits, can be included in contracts. The study coordinator should have a personal commitment to seeing the study through and ensuring its successful conduct.
We recognize that these suggestions are currently untested. It will be important going forward to systematically collect data on the “efficacy” of these suggested approaches so that companies have some basis for deciding whether to add a particular approach to their programs. The efficacy of a particular approach could be examined within a typical registration trial by randomly assigning the approach to a subset of sites and testing its effect on signal detection as an exploratory hypothesis, without compromising the overall goal of the study. In the meantime, the current model for designing and implementing phase III programs is failing, and this has consequences for both industry and our patients.
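As one hypothetical sketch of how such an exploratory, site-level evaluation might look, the code below assigns a made-up methodological add-on (for example, centralized ratings) to half the sites of a simulated trial and probes its effect on signal detection through a treatment-by-add-on interaction; all effect sizes and the generative model are invented, and a real analysis would also account for site as a clustering factor and prespecify the comparison as exploratory.

```python
# Hypothetical sketch: a methodological add-on (e.g., centralized ratings) is
# applied to half the sites, and its effect on drug-placebo separation is
# probed as an exploratory treatment-by-add-on interaction.
# The generative model and all effect sizes are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_sites, per_site = 40, 30
rows = []
for site in range(n_sites):
    addon = site % 2                        # add-on applied to alternating sites for simplicity
    for _ in range(per_site):
        drug = int(rng.integers(0, 2))
        # Invented model: the add-on shrinks the placebo response by 2 points.
        change = -5.0 - 3.0 * drug + 2.0 * addon * (1 - drug) + rng.normal(0, 10)
        rows.append({"site": site, "addon": addon, "drug": drug, "change": change})
df = pd.DataFrame(rows)

# Exploratory question: is drug-placebo separation larger at add-on sites?
fit = smf.ols("change ~ drug * addon", data=df).fit()
print(fit.params[["drug", "drug:addon"]])
```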