Results
The search identified 54 417 citations, including 22 074 unique reports, and 2827 full-text articles were retrieved after the exclusion of 19 247 reports on the basis of their titles and abstracts. We screened these 2827 full-text articles and included 550 reports from 402 studies with 53 463 participants (appendix pp 62–154). The sample had the following characteristics: mean age was 37·40 years (SD 5·96), 29 949 (56·02%) participants were male and 23 514 (43·98%) female, and mean illness duration was 11·90 years (SD 5·19; appendix pp 282–84). We excluded studies with high risk of bias for randomisation and allocation, but methods for sequence generation and allocation concealment were often not described in detail and, therefore, were coded as unclear (appendix pp 155–76). The percentage of studies with high, unclear, and low risk of bias for the individual items was: 0%, 73·1%, and 26·9% for randomisation, 0%, 78·4%, and 21·6% for allocation concealment, 10·7%, 38·6%, and 50·7% for blinding of patients and personnel, 13·9%, 34·8%, and 51·3% for rater blinding, 23·9%, 35·6%, and 40·5% for missing outcomes, 27·4%, 20·9%, and 51·7% for selective reporting, and 5·7%, 11·9%, and 82·3% for other biases. The overall risk of bias was rated as high for 92 (23%) studies.
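The screening and sample figures above can be checked with simple arithmetic. The Python sketch below uses only numbers stated in this paragraph. The helper name `pct` is hypothetical and is introduced only for illustration.

```python
# Figures reported in the Results (study flow and sample composition).
UNIQUE_REPORTS = 22_074
EXCLUDED_ON_TITLE_ABSTRACT = 19_247
TOTAL_PARTICIPANTS = 53_463
MALE = 29_949
FEMALE = 23_514


def pct(part: int, whole: int) -> float:
    """Hypothetical helper: share of `whole` made up by `part`, in percent (2 dp)."""
    return round(100 * part / whole, 2)


full_texts = UNIQUE_REPORTS - EXCLUDED_ON_TITLE_ABSTRACT
print(full_texts)                                  # 2827 full-text articles
print(MALE + FEMALE == TOTAL_PARTICIPANTS)         # True: sexes sum to the total
print(pct(MALE, TOTAL_PARTICIPANTS), pct(FEMALE, TOTAL_PARTICIPANTS))
```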
218 (54%) studies with 40 815 (76%) participants presented usable results for change in overall symptoms (figure 1). 26 (81%) of 32 antipsychotics were associated with significant improvement in symptoms compared with placebo (figure 2A). The SMDs for drugs associated with significant improvement ranged from –0·89 (95% credible interval [CrI] –1·08 to –0·71) for clozapine to –0·26 (–0·39 to –0·12) for brexpiprazole. Clozapine, amisulpride, zotepine, olanzapine, and risperidone reduced overall symptoms significantly more than many other drugs (figure 3). Most differences between the remaining drugs were small or very uncertain.
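Efficacy is expressed as a standardised mean difference (SMD), and negative values favour the antipsychotic over placebo. The text does not state which SMD estimator was used, so the sketch below assumes Hedges' g (Cohen's d with the usual small-sample correction), a common choice in systematic reviews. The function name and the trial numbers are hypothetical.

```python
import math


def hedges_g(mean_drug, sd_drug, n_drug, mean_pbo, sd_pbo, n_pbo):
    """SMD of drug minus placebo (Hedges' g) and its approximate variance.

    Means are changes from baseline on a symptom scale, so a more negative
    change means more improvement and a negative g favours the drug.
    """
    df = n_drug + n_pbo - 2
    pooled_sd = math.sqrt(((n_drug - 1) * sd_drug**2 + (n_pbo - 1) * sd_pbo**2) / df)
    d = (mean_drug - mean_pbo) / pooled_sd                 # Cohen's d
    j = 1 - 3 / (4 * (n_drug + n_pbo) - 9)                 # small-sample correction
    g = j * d
    # Approximate sampling variance (uses g in place of d in the second term).
    var_g = (n_drug + n_pbo) / (n_drug * n_pbo) + g**2 / (2 * (n_drug + n_pbo))
    return g, var_g


# Hypothetical trial: drug improves by 20 points, placebo by 10 (SD 15, n = 100 each).
g, var_g = hedges_g(-20, 15, 100, -10, 15, 100)
print(round(g, 3))   # -0.664
```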
Secondary efficacy outcomes were reported less frequently, especially for older drugs including clozapine (appendix pp 200–21). 117 studies (29%) with 31 179 participants (58%) presented usable results for reduction of positive symptoms (21 antipsychotics). The SMDs for the 17 (81%) drugs that significantly reduced positive symptoms compared with placebo ranged from –0·69 (95% CrI –0·86 to –0·52) for amisulpride to –0·17 (–0·31 to –0·04) for brexpiprazole (figure 2B). Amisulpride, risperidone, olanzapine, paliperidone, and haloperidol were significantly more effective than many other drugs (appendix pp 200–21).
132 studies (33%) with 32 015 (60%) participants reported usable results for negative symptoms (21 antipsychotics). The SMDs for the 18 (86%) antipsychotics that significantly reduced negative symptoms compared with placebo ranged from –0·62 (95% CrI –0·84 to –0·39) for clozapine to –0·22 (–0·33 to –0·11) for iloperidone (figure 2C). Clozapine, amisulpride, olanzapine, and, to a lesser extent, zotepine and risperidone reduced negative symptoms significantly more than many other drugs. Differences between the remaining drugs were uncertain (appendix pp 200–21).
89 studies (22%) with 19 683 participants (37%) reported usable results for depressive symptoms (28 antipsychotics). The SMDs for the 14 (50%) drugs that significantly reduced depressive symptoms compared with placebo ranged from –0·90 (95% CrI –1·36 to –0·44) for sulpiride to –0·16 (–0·29 to –0·03) for brexpiprazole (figure 2D). Sulpiride, clozapine, amisulpride, and olanzapine were associated with significantly greater reduction of depressive symptoms than many other drugs (appendix pp 200–21), but CrIs were wide.
Only ten studies (3%) with 3341 participants (6%) reported usable quality-of-life data (eight antipsychotics). Because 50% of the network loops were inconsistent, we did a pairwise meta-analysis. Compared with placebo, five antipsychotics significantly improved quality of life, with SMDs ranging from –0·49 (95% CI –0·72 to –0·26) for aripiprazole to –0·18 (–0·34 to –0·02) for paliperidone (appendix p 222).
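For quality of life, the analysis fell back on pairwise meta-analysis, which reports frequentist 95% CIs rather than CrIs. The pairwise model is not described here, so the sketch below assumes a standard inverse-variance random-effects model with the DerSimonian-Laird estimator of between-study variance (τ²) and a normal-approximation 95% CI. The function name and the study values are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate, 95% CI, and tau^2 (DerSimonian-Laird).

    Needs at least two studies; effects are e.g. SMDs, variances their
    sampling variances.
    """
    k = len(effects)
    if k < 2:
        raise ValueError("at least two studies are required")
    w = [1 / v for v in variances]                            # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)                        # truncated at zero
    w_re = [1 / (v + tau2) for v in variances]                # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Hypothetical quality-of-life SMDs (drug vs placebo) and variances from three trials.
pooled, ci, tau2 = dersimonian_laird([-0.45, -0.20, -0.30], [0.02, 0.015, 0.03])
print(round(pooled, 2), [round(x, 2) for x in ci], round(tau2, 3))
```

Truncating τ² at zero means that, when the observed spread is no larger than chance would produce, the model reduces to a fixed-effect analysis.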
16 studies (4%) with 4370 participants (8%) presented usable results for social functioning (12 antipsychotics). Based on a small number of studies, thioridazine, olanzapine, paliperidone, quetiapine, lurasidone, and brexpiprazole were associated with significant improvement in social functioning compared with placebo, with SMDs ranging from –0·69 (95% CrI –1·24 to –0·14) for thioridazine to –0·25 (–0·38 to –0·12) for brexpiprazole (figure 2E).
192 studies with 35 115 participants reported study-defined response rates, using very different cutoffs (appendix p 278). 29 (94%) of 31 antipsychotics had significantly higher response rates compared with placebo, with risk ratios ranging from 2·16 (95% CrI 1·53–3·55) for thioridazine to 1·11 (1·01–1·19) for brexpiprazole (appendix p 278).
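Response and discontinuation are summarised as risk ratios (RRs). For response, an RR above 1 means more responders on the drug; for discontinuation, an RR below 1 favours the drug. The network estimates are Bayesian with CrIs. The sketch below shows only the single-trial frequentist building block, an RR with a 95% CI computed on the log scale, using hypothetical counts and a hypothetical function name.

```python
import math


def risk_ratio(events_drug, n_drug, events_pbo, n_pbo):
    """Risk ratio (drug vs placebo) with a 95% CI computed on the log scale.

    Assumes no zero cells (otherwise a continuity correction is needed).
    """
    rr = (events_drug / n_drug) / (events_pbo / n_pbo)
    se_log_rr = math.sqrt(1 / events_drug - 1 / n_drug + 1 / events_pbo - 1 / n_pbo)
    lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
    upper = math.exp(math.log(rr) + 1.96 * se_log_rr)
    return rr, lower, upper


# Hypothetical trial: 60/100 responders on drug vs 30/100 on placebo.
rr, lower, upper = risk_ratio(60, 100, 30, 100)
print(f"RR {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```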
226 (56%) studies reported all-cause discontinuation rates for 42 672 (80%) participants (32 antipsychotics). Risk ratios for the 20 drugs (63%) that significantly lowered discontinuation rates compared with placebo ranged from 0·52 (95% CrI 0·12–0·95) for clopenthixol to 0·90 (0·85–0·95) for haloperidol (figure 2F). When we examined discontinuation due to inefficacy, the results were comparable to those for the primary outcome, overall change in symptoms (appendix p 279).
116 studies (29%) with 28 317 (53%) participants presented usable results for weight gain. 12 (46%) of 26 antipsychotics caused significantly more weight gain than placebo, with mean differences ranging from 0·54 kg (95% CrI 0·15–0·95) for haloperidol to 3·21 kg (2·10–4·31) for zotepine (figure 4A). Zotepine, olanzapine, and sertindole produced significantly more weight gain than most other drugs (appendix pp 200–21). The hierarchy for the proportion of patients with at least 7% weight gain was similar, confirming these findings (appendix p 280).
136 studies (34%) with 24 911 (47%) participants reported use of antiparkinson medication. Risk ratios for the 21 (66%) of 32 antipsychotics that were associated with significantly increased use of antiparkinson medication compared with placebo ranged from 1·61 (95% CrI 1·17–2·10) for paliperidone to 6·14 (4·81–6·55) for pimozide (figure 4B). The following drugs were significantly better than haloperidol, listed starting with the best: clozapine, perazine, sertindole, placebo, olanzapine, quetiapine, asenapine, aripiprazole, thioridazine, amisulpride, iloperidone, brexpiprazole, paliperidone, ziprasidone, risperidone, lurasidone, zotepine, and chlorpromazine (appendix pp 200–21). 116 studies (29%) with 25 783 (48%) participants reported results for akathisia. The hierarchy was similar to that for use of antiparkinson medication, with significant risk ratios for 20 (67%) of 30 drugs ranging from 1·95 (95% CrI 1·30–2·74) for aripiprazole to 23·81 (7·41–142·86) for zuclopenthixol (figure 4C).
90 studies (22%) with 21 569 participants (40%) reported usable results for prolactin. Olanzapine, asenapine, lurasidone, sertindole, haloperidol, amisulpride, risperidone, and paliperidone were associated with significantly elevated prolactin levels (mean difference range 4·47–48·51 ng/mL). For many antipsychotics (eg, sulpiride) prolactin data were not available (figure 4D).
51 studies (13%) with 15 467 participants (29%) reported usable data for QTc prolongation. Seven (50%) of 14 antipsychotics caused significantly more QTc prolongation than placebo, with mean differences ranging from 3·43 ms (95% CrI 0·94–6·00) for quetiapine to 23·90 ms (20·56–27·33) for sertindole (figure 4E).
162 studies (40%) with 30 770 participants (58%) reported results for sedation (32 antipsychotics). Risk ratios for the 18 drugs (56%) that were significantly more sedating than placebo ranged from 1·33 (95% CrI 1·00–1·68) for paliperidone to 10·20 (4·72–29·41) for zuclopenthixol, and there was some evidence of sedation for most of the remaining antipsychotics (figure 4F).
134 studies (33%) with 26 904 participants (50%) reported anticholinergic side-effects (32 antipsychotics). This outcome can be affected by use of anticholinergic medication, which is often needed for the treatment of extrapyramidal side-effects. Evidence for significantly higher risk than placebo was present for risperidone, haloperidol, olanzapine, clozapine, iloperidone, chlorpromazine, zotepine, thioridazine, and quetiapine (risk ratio range 1·31–3·89; figure 4G).
Heterogeneity was low to moderate for most outcomes, moderate to high for use of antiparkinson medication, and high for prolactin elevation. SIDE testing showed that the percentage of comparisons with evidence of inconsistency was 2–26% for all outcomes except quality of life, for which 50% of comparisons showed evidence of inconsistency; therefore, this outcome was examined in a pairwise meta-analysis (appendix p 222). Additionally, prolactin results were significantly inconsistent according to the design-by-treatment interaction test. Because prolactin values vary widely between men and women and between the assays used in different laboratories, we also analysed prolactin as SMDs, with which heterogeneity and inconsistency were substantially lower (appendix pp 223, 224, 281).
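Why SMDs help for prolactin can be shown with a toy example. Assume, purely hypothetically, that two laboratories' assays differ by a constant scale factor of 2: the mean differences then disagree, but dividing each by that study's own SD puts both on the same scale. Real assay differences need not be proportional, and the simple pooled SD below assumes equal group sizes; the function names and values are illustrative only.

```python
import math


def mean_difference(mean_drug, mean_pbo):
    """Raw mean difference in the original units (here ng/mL)."""
    return mean_drug - mean_pbo


def simple_smd(mean_drug, sd_drug, mean_pbo, sd_pbo):
    """SMD using an unweighted pooled SD (valid only for equal group sizes)."""
    pooled_sd = math.sqrt((sd_drug**2 + sd_pbo**2) / 2)
    return (mean_drug - mean_pbo) / pooled_sd


# Hypothetical prolactin results; lab B's assay reads everything twice as high.
lab_a = dict(mean_drug=30.0, sd_drug=10.0, mean_pbo=10.0, sd_pbo=10.0)
lab_b = {key: 2 * value for key, value in lab_a.items()}

for name, study in (("A", lab_a), ("B", lab_b)):
    md = mean_difference(study["mean_drug"], study["mean_pbo"])
    print(name, md, round(simple_smd(**study), 2))
```

The mean differences (20 vs 40 ng/mL) would look heterogeneous if pooled, while the SMDs are identical.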
The most important differences in terms of study characteristics were that trials of older antipsychotics had lower placebo response than trials of newer ones and that the antipsychotics differed in their median baseline severity across studies (appendix pp 53–61). These potential threats to the transitivity assumption and other potential effect modifiers were addressed by metaregressions and by sensitivity analyses of the primary outcome, excluding antipsychotics studied in fewer than 100 participants. The degree of placebo response, which has increased over the past 60 years (27), had the greatest effect on heterogeneity. The effect sizes of the individual antipsychotics changed after accounting for response to placebo, but the overall hierarchy did not (appendix pp 177–89). This finding was corroborated by removing placebo groups or placebo-controlled studies in sensitivity analyses (appendix pp 190–99). Publication year, mean participants’ age, baseline severity, percentage of male patients, sample size, and sponsorship also did not affect the hierarchy of relative treatment effects compared with the unadjusted analysis (appendix pp 177–89). Sensitivity analyses removing studies with overall high risk of bias, completer analyses, imputed standard deviations, duration of more than six weeks, unfair dose comparisons, failed trials, and trials done before 1990 did not affect the results (appendix pp 190–99).
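In a simplified pairwise setting, a metaregression on placebo response can be pictured as a weighted regression of each trial's drug-vs-placebo SMD on that trial's placebo response. The authors' model was fitted within the network meta-analysis and will differ in detail. The sketch below is a fixed-effect, inverse-variance weighted least-squares fit with one covariate, with hypothetical data and a hypothetical function name.

```python
def weighted_metaregression(effects, variances, covariate):
    """Inverse-variance weighted least squares: effect = b0 + b1 * covariate.

    Fixed-effect version (no residual heterogeneity term), one covariate.
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, covariate, effects))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical trials: the larger the placebo-group improvement (SD units),
# the smaller (less negative) the drug-placebo SMD.
placebo_response = [0.0, 0.5, 1.0]
smds = [-0.8, -0.5, -0.2]
b0, b1 = weighted_metaregression(smds, [0.02, 0.02, 0.02], placebo_response)

# Effect predicted for a trial with a chosen reference placebo response.
reference = 0.5
print(round(b0, 2), round(b1, 2), round(b0 + b1 * reference, 2))
```

Comparing every drug at the same reference placebo response is what allows effect sizes to shift while the ranking stays the same.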
The certainty of the evidence was low overall (appendix p 242). Concerning the primary outcome, we judged the confidence in the evidence for 75% of the comparisons with placebo to be low or very low (figure 2A), and this was the case for 92% of the comparisons of two antipsychotic drugs (appendix p 242). Many older antipsychotics are among those with poor CINeMA ratings and often have no evidence for several secondary outcomes.
Comparison of the change in overall symptoms of all antipsychotics with haloperidol by use of a contour-enhanced funnel plot did not reveal any asymmetry, and the SMD did not change with the trim-and-fill method (appendix p 285). By contrast, comparison of all antipsychotics with placebo revealed that smaller trials exaggerated the effectiveness of the active interventions versus placebo: the SMD changed from 0·45 to 0·38, confirming an earlier analysis (27).
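The analysis used contour-enhanced funnel plots and the trim-and-fill method. A different but commonly used numerical check for small-study effects is Egger's regression test, sketched below as a swapped-in illustration rather than the authors' procedure. Each study's standardised effect (effect/SE) is regressed on its precision (1/SE); an intercept far from zero, judged against a t distribution with n − 2 degrees of freedom, suggests funnel-plot asymmetry. The function name and data are hypothetical.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression: z = b0 + b1 * precision; returns (b0, t statistic for b0)."""
    n = len(effects)
    if n < 3:
        raise ValueError("Egger's test needs at least three studies")
    z = [y / se for y, se in zip(effects, std_errors)]       # standardised effects
    p = [1 / se for se in std_errors]                        # precisions
    p_bar, z_bar = sum(p) / n, sum(z) / n
    sxx = sum((pi - p_bar) ** 2 for pi in p)
    slope = sum((pi - p_bar) * (zi - z_bar) for pi, zi in zip(p, z)) / sxx
    intercept = z_bar - slope * p_bar
    residuals = [zi - (intercept + slope * pi) for pi, zi in zip(p, z)]
    s2 = sum(r**2 for r in residuals) / (n - 2)              # residual variance
    se_intercept = math.sqrt(s2 * (1 / n + p_bar**2 / sxx))
    return intercept, intercept / se_intercept


# Hypothetical SMDs: the least precise trials show the largest effects.
effects = [-0.90, -0.65, -0.50, -0.42, -0.40, -0.38]
std_errors = [0.40, 0.30, 0.20, 0.12, 0.10, 0.08]
b0, t = egger_test(effects, std_errors)
print(round(b0, 2), round(t, 2))
```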
Discussion
To our knowledge, this analysis is the largest network meta-analysis in the field of schizophrenia, based on 402 studies including 53 463 participants randomly assigned to 32 different first-generation and second-generation antipsychotics or placebo. We extended our previous report (3) by two second-generation antipsychotics and 15 first-generation antipsychotics and by investigating ten additional important outcomes, including specific aspects of efficacy, quality of life, and many more side-effects, as well as several methodological issues, including placebo response and sample sizes.
Individual effect size estimates suggest that all antipsychotic drugs reduced overall symptoms more than placebo (although not significantly for six drugs), with mean effect sizes between –0·89 and –0·03 (median –0·42). However, overlapping CrIs between antipsychotics suggest that differences between most individual drugs were not significant. With few exceptions, only clozapine, amisulpride, zotepine, olanzapine, and risperidone were significantly more efficacious for the primary outcome than other antipsychotics. Readers should consult figure 3, which provides these comparisons. Amisulpride was among the most efficacious antipsychotics, but no placebo-controlled study was available, making this evidence entirely indirect. Nevertheless, amisulpride was significantly superior to placebo in older patients (≥60 years of age; SMD 0·86) and in patients with predominant negative symptoms (SMD 0·47) (28, 29).
Mainly newer antipsychotics provided data separately for positive and negative symptoms, and these results were similar to those for overall change in symptoms. However, all included studies focused on positive symptoms, because studies of patients with predominant negative symptoms were excluded from this analysis and evaluated separately (28). Whether differences in negative symptoms relate to primary or merely secondary negative symptoms cannot be clarified in populations with positive symptoms. The finding that many drugs improved depressive symptoms more than placebo might also reflect a reduction of the anxiety and distress associated with schizophrenia. Nevertheless, aripiprazole, brexpiprazole, cariprazine, lurasidone, and quetiapine are licensed in several countries for major depression, bipolar depression, or both. So is flupentixol, but we did not find an antidepressant effect for it on the basis of sparse data (62 participants) (30). Many antipsychotics did not have data for quality of life, an outcome important to patients because it combines efficacy and safety. Where quality of life was reported, most drugs showed better effects than placebo. Some but not all drugs also outperformed placebo in these short-term studies in terms of social functioning, an outcome associated with recovery and social reintegration.
Because all-cause discontinuation combines efficacy and tolerability, it has been used as a measure of effectiveness, for example in the CATIE trial (31). When reasons were reported separately, more patients in the included trials dropped out due to inefficacy (40%) than due to adverse events (20%), so all-cause discontinuation is primarily an efficacy measure here.
Antipsychotics are often taken for long periods, so side-effects have an important role in morbidity and adherence and might affect cognition (32). Antipsychotics very often scored worse than placebo on side-effect outcomes, but their profiles differed. In general, older antipsychotics were more often associated with extrapyramidal motor side-effects and prolactin elevation (with notable exceptions, such as amisulpride, paliperidone, and risperidone), whereas many newer antipsychotics produced more weight gain and sedation. We consider weight gain to be a good proxy for metabolic side-effects in this already dense review (33). Specific metabolic parameters such as glucose, insulin, homeostatic model assessment for insulin resistance, total cholesterol, LDL cholesterol, HDL cholesterol, and triglycerides will be addressed in future reviews. In contrast to our previous report, we present QTc prolongation in original units (ms), which facilitates clinical interpretation; lurasidone and the partial dopamine agonists were the most benign drugs in this respect.
With regard to efficacy and safety outcomes, many older antipsychotics performed well compared with newer antipsychotics, although this evidence is limited by few direct comparisons. This finding is important because second-generation antipsychotics might not be affordable in low-income and middle-income countries. However, older studies with negative results could have remained unpublished more frequently, whereas now all clinical trials should be registered. In an analysis of all antipsychotics compared with placebo, contour-enhanced funnel plots suggested the existence of unpublished studies.
Our analysis had limitations. We used strict inclusion criteria to obtain a homogeneous sample; nevertheless, the included studies were done over a 60-year period, during which study characteristics changed. Checking for consistency revealed few inconsistent loops and low-to-moderate heterogeneity in most outcomes (appendix pp 223, 224), but the overall power to detect inconsistency is low (16). Major exceptions were quality of life, for which a network meta-analysis was not calculated, and prolactin increase. Because prolactin results might depend on the laboratory assay used, we calculated SMDs in addition to mean differences (appendix p 281), which strongly reduced heterogeneity. Still, the finding that clozapine and zotepine significantly reduced prolactin compared with placebo (with wide CrIs) might be a statistical artefact driven by outliers, because only two small trials were available. The most important threat to the transitivity assumption of network meta-analysis was the increase in placebo response over the years (27, 34), because adjusting for placebo response in a metaregression reduced heterogeneity (τ) substantially, by 60–63% (appendix pp 179–83). In this metaregression model, the ranking was not substantially different from that of the primary analysis. Additionally, removing placebo groups, placebo-controlled studies, and failed studies in sensitivity analyses did not substantially change the results, nor did metaregressions of six other moderators and further sensitivity analyses, supporting the robustness of the findings. The results of the network meta-analysis were consistent overall with those of pairwise meta-analyses (figure 3) and single studies. For example, a study comparing brexpiprazole with placebo and quetiapine found that brexpiprazole was better than placebo but worse than quetiapine, similar to the hierarchy in our analysis (35). In a long-term study (sponsored by asenapine’s manufacturer), olanzapine was significantly better than asenapine (36). Thus, we do not believe that placebo response explains all efficacy differences between the compounds. Nevertheless, the statistical methods could not fully account for the heterogeneity (27), so some efficacy differences might appear larger than they actually are.
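The 60–63% figure refers to τ, the between-study standard deviation, not the variance τ². Because τ² is τ multiplied by itself, the same reduction is larger on the variance scale, as this short arithmetic sketch shows (the function name is hypothetical).

```python
def implied_tau2_reduction(tau_reduction_pct: float) -> float:
    """Percent reduction in tau^2 implied by a given percent reduction in tau."""
    remaining = 1 - tau_reduction_pct / 100     # fraction of tau that remains
    return 100 * (1 - remaining**2)


for reduction in (60, 63):
    print(reduction, round(implied_tau2_reduction(reduction), 1))
```

A 60–63% reduction in τ therefore corresponds to roughly an 84–86% reduction in τ².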
Our decision to exclude studies from mainland China reduces the generalisability of the results to this country. However, a literature and telephone interview study suggested that most Chinese trials continue to be of low quality (14). Chinese reports are usually very short, and communication with the authors is often difficult due to language barriers, so risk of bias is difficult to assess. Therefore, we decided a priori to exclude Chinese studies.
Clinical trials exclude suicidal patients, and severely ill patients are unlikely to be included in modern trials because they are often unable to provide informed consent. With a mean illness duration of 12 years, our sample consisted mainly of chronic patients, who are known to respond less well than first-episode patients (37). These factors reduce generalisability.
For feasibility reasons, our risk of bias assessment focused on the primary outcome; however, risk of bias is outcome-specific (appendix pp 273–77). Moreover, the evidence for many secondary outcomes (eg, social functioning) was based on much smaller sample sizes than the primary outcome (4370 vs 40 815 participants).
These limitations reduce the strength of the recommendations that can be derived, particularly (but not only) for older antipsychotics, because their effect sizes are based primarily on one or two studies with sample sizes smaller than 100. Small sample sizes leave room for small-trial effects, which might have inflated some results. For example, the large effect of clozapine on negative symptoms is based on 159 participants, because clozapine is mainly studied in treatment-resistant patients, who were excluded from the analysis. The contribution of direct evidence is small for older drugs, resulting in wide CrIs, higher uncertainty, and lower confidence in the evidence as evaluated by CINeMA. The generally smaller amounts of data available for older drugs, except for perphenazine, which had more good-quality evidence from a large trial (31), are highlighted in the figures and should be considered in the interpretation of all findings.
Because so many antipsychotic options are available, our results should help health-care providers find the most suitable drug for the individual patient, balancing side-effect profiles and the efficacy of different drugs. We confirm that antipsychotics differ more in their side-effects than in their efficacy. We believe that efficacy differences between compounds exist, but the fact that their measurement is based on subjective rating scales is problematic. The development of objective efficacy measures would make interpretation easier. Clinicians must remember that reported results are averages and that response and side-effects might vary considerably in individual patients.