In the United States, training, credentialing, and funding increasingly are expected to be evidence based (1–6). Evidence-based practice entails “the integration of the best available research with clinical expertise in the context of patient characteristics, culture, and preferences” (7). The definition requires an accurate understanding of the current research literature. Because many frontline clinicians lack the time and expertise needed to evaluate research (8), individuals looking to be guided by evidence have increasingly relied on systematic reviews to cull the best available evidence and distill it into useful practice recommendations (7, 9, 10).
In the past 15 years numerous professional and governmental bodies have undertaken systematic reviews to identify evidence-based treatments (11); however, these bodies have not always agreed on what merits that label (12–14). For example, among interventions for treating trauma in children and adolescents, eye movement desensitization and reprocessing was awarded the top rating (well supported) by the California Evidence-Based Clearinghouse for Child Welfare (15) but the lowest rating (no support) by the Hawaii Blue Menu of Evidence-Based Child and Adolescent Psychosocial Interventions (16). Such inconsistency can leave clinicians uncertain about which treatments to use and perhaps foster clinician skepticism about the merits of evidence-based systems.
The mental health treatment field might benefit from having a single, unimpeachably strong systematic review system. Of the review systems currently available, the Cochrane Collaboration, containing more than 5,000 systematic health care reviews (17), stands out as a potential candidate. Cochrane reviews are well respected internationally for their transparent, standardized methodology, and they have been found to be more methodologically rigorous than other systematic reviews (18–21). However, notwithstanding the importance of methodological rigor, to be optimally useful a review system must also address the pressing clinical practice issues, be current, accurately reflect the treatment literature, and provide practice guidance (8, 9, 22–24). Hence, when evaluating Cochrane’s appropriateness as a mental health review system, it is equally important to assess its suitability on these dimensions.
Cochrane reviews are initiated and conducted by volunteers who align themselves with one of 53 separate collaboration review groups (CRGs), each with its own scope and editors. Cochrane’s stated aim is to assist providers, consumers, policy makers, and others to make health care decisions (25, 26); however, historically, author interest and the CRG’s agenda have determined which areas of review are prioritized (27), raising questions about whether the reviews address the pressing needs of practicing clinicians (18, 24).
After registering a review title with a CRG, authors publish a peer-reviewed protocol. It is Cochrane’s policy to convert protocols into full systematic reviews within two years and to require authors to update reviews every two years or to include a commentary explaining why the review has not been updated (26). Research has questioned Cochrane’s ability to adhere to this time line (28, 29). This finding is noteworthy because recent data suggest that psychopharmacology interventions need updating at a faster rate than drug interventions in other medical specialty areas (30).
The Cochrane handbook states that randomized controlled trials are the preferred means for addressing the effectiveness of health care interventions and cautions authors that nonrandomized studies cannot “give anything close to a definitive answer about the likely effects of an intervention” (26). Concurrently, Cochrane advocates searching the “gray literature” to minimize publication bias (26). Given the competing interests to be both inclusive and exclusive, it is unclear how well Cochrane reviews reflect the psychopathology intervention literature, a notable portion of which does not utilize randomized controlled designs (31, 32).
Each Cochrane review culminates in a section called “Implications for Practice.” The Cochrane Collaboration has been criticized for a tendency to produce inconclusive reviews (33–35). Historically, the Cochrane handbook has stated that reviews should summarize the evidence to assist health care decision makers rather than make direct practice recommendations. More recently, however, the Cochrane Collaboration has begun implementation of the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) system. To evaluate the quality of the evidence, GRADE integrates aspects of internal validity with factors related to applicability, such as burden or risk, cost, and values of the recipient, thereby influencing the strength of the recommendations (36, 37).
Although Cochrane reviews are often recognized as the gold standard for general medical health care, their appropriateness for guiding evidence-based mental health care decisions merits examination. Cochrane’s coverage of current treatment literature and the ability of Cochrane reviews to provide guidance in areas needed by practicing clinicians are of particular interest.
Methods
The Cochrane Database of Systematic Reviews (2005–March 2012) includes all protocols and reviews that first appeared, were updated or amended, or were withdrawn between 2005 and March 2012. The sample of mental health reviews and protocols used for this study represents the database entries from the three CRGs that focus on mental health issues—depression, anxiety, and neurosis; schizophrenia; and developmental, psychosocial, and learning problems. Entries from CRGs that address dementia and substance use disorders were not included.
Each entry was coded for CRG, type (protocol or review), design (exclusively randomized controlled trials or inclusion of nonrandomized studies), and topic (for example, mood or anxiety). Whether the authors used or planned to use the GRADE system was recorded. Datedness was calculated by using the year that the entry was last considered up to date according to the history section of each database entry. The withdrawal of a review or protocol, as well as the reason for the withdrawal, was also noted.
For reviews only, the number of studies included and excluded was derived from tables at the conclusion of each review. Included studies were also classified by whether any portion of the study had been published. Finally, by using both the Implications for Practice section in the review’s conclusion and the abstract’s conclusion, each review’s conclusions were categorized as primarily positive, primarily negative, mixed, or inconclusive. The conclusions of 36 (10%) reviews that were not withdrawn were evaluated by a secondary coder, and interrater reliability was high (κ=.88, p<.001).
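The interrater check described above relies on Cohen's kappa, which corrects the coders' raw agreement for the agreement expected by chance. The following is a minimal sketch of that calculation, not the author's analysis code; the function name `cohen_kappa` and the four example labels are hypothetical and serve only to illustrate the formula.

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' categorical labels on the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical labels for illustration only; not the study's coding data.
primary = ["positive", "positive", "inconclusive", "inconclusive"]
secondary = ["positive", "inconclusive", "inconclusive", "inconclusive"]
print(round(cohen_kappa(primary, secondary), 2))
```

In this toy case the coders agree on three of four items, but half of that agreement is expected by chance, so kappa is well below the raw agreement rate.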
Results
The three CRGs contributed 552 entries: 188 (34%) from the depression, anxiety, and neurosis group; 205 (37%) from the schizophrenia CRG; and 159 (29%) from the developmental, psychosocial, and learning problems group. Nearly one-third (N=177, 32%) of entries were protocols. Each entry was coded as representing one of 23 topics. As shown in Table 1, 182 (33%) entries involved serious mental illness consisting primarily of psychotic disorders, such as schizophrenia. An additional 16 (3%) entries focused on the side effects of antipsychotic medication. Unipolar and bipolar mood disorder represented 97 (18%) entries, and 43 (8%) entries related to anxiety disorders. No other topic accounted for greater than 5% of the entries.
A total of 27 entries, 16 (60%) protocols and 11 (40%) reviews, representing 5% of the sample, were withdrawn from 2005 until March 2012. Withdrawn entries were more likely to be in the developmental, psychosocial, and learning problems group (N=13, 8%) and in the depression, anxiety, and neurosis group (N=11, 6%) than in the schizophrenia group (N=3, 2%) (χ²=9.24, df=2, p=.01, Φ=.129).
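The comparison of withdrawal rates across CRGs can be checked from the counts reported above. The sketch below is an illustration rather than the author's analysis: it assumes a Pearson chi-square test without correction, and it derives the "not withdrawn" cells by subtracting the withdrawn counts from the reported group totals (159, 188, and 205). The function name `chi_square` is hypothetical.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

# Rows: CRG; columns: [withdrawn, not withdrawn].
# "Not withdrawn" is derived by subtraction from the reported group totals.
withdrawals = [
    [13, 159 - 13],  # developmental, psychosocial, and learning problems
    [11, 188 - 11],  # depression, anxiety, and neurosis
    [3, 205 - 3],    # schizophrenia
]
stat, df = chi_square(withdrawals)
n = sum(map(sum, withdrawals))
p_value = math.exp(-stat / 2)  # exact chi-square tail probability, valid only for df=2
effect = math.sqrt(stat / n)   # phi (equals Cramer's V when the table has two columns)
print(round(stat, 2), df, round(p_value, 2), round(effect, 3))
```

Under these assumptions the statistic, p value, and effect size should come out close to the values reported in the text.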
The process of converting mental health protocols into reviews occurred on average in the two-year time frame suggested by the Cochrane Collaboration (N=345; mean±SD=2.15±1.82 years). However, not all protocols followed this path, and a subset of protocols did not progress beyond the protocol stage. Specifically, 46 (26%) mental health protocols had not yet converted into reviews despite having been first published in 2007 or earlier.
Updating of completed reviews did not appear to follow the prescribed two-year timetable. After omitting the 11 reviews that were subsequently withdrawn, a total of 325 reviews listed the year that they were last considered up to date. [All 39 reviews that lacked such information had been published since 2009, so the date of publication was substituted for year last updated.] On average, reviews were last considered up to date in 2006 (mean±SD=2006.42±3.02; range 1998–2012). Three-quarters (N=272, 75%) of the reviews were based on information no more recent than 2008, and approximately one of nine (N=41, 11%) was last considered up to date a decade or more ago. Datedness was the reason for withdrawal for 19 (70%) of the 27 withdrawn entries.
Excluding withdrawn reviews, Cochrane mental health reviews included a mean±SD of 12.11±17.75 studies (range 0–194). However, the reviews typically excluded considerably more studies (33.03±58.60) than they included, and 287 reviews (79%) were restricted to randomized controlled trials. Studies of which no part had been previously published were unlikely to appear in a Cochrane review, given the ratio of unpublished to included studies (.019±.056).
Excluding withdrawn reviews, the most common outcome (N=159, 44%) of Cochrane mental health reviews was insufficient evidence to form a conclusion. An additional 63 (17%) reviews concluded there was sufficient evidence, but the evidence was mixed. Thirty-seven (10%) reviews had a predominantly negative conclusion, and 105 (29%) reviews had a predominantly positive conclusion. Reviews that used only randomized controlled trials and those that included nonrandomized studies did not differ in their likelihood of being inconclusive (N=123, 43%, and N=36, 47%, respectively; χ²=.38, df=1, p=.54). In contrast, use of the GRADE system, which had been employed in 47 (13%) of all nonwithdrawn reviews, was associated with whether a review reached a conclusion. Reviews that employed the GRADE system were less likely to have an inconclusive outcome than those that did not (N=14, 30%, and N=145, 46%, respectively; χ²=4.24, df=1, p=.04, Φ=–.108).
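The two 2×2 comparisons above (GRADE use and study design versus an inconclusive outcome) can be approximately reconstructed from the reported counts. The sketch below is illustrative only and is not the author's code: it assumes an uncorrected Pearson chi-square, derives the "conclusive" cells by subtraction from the reported totals (47 reviews using GRADE, 287 restricted to randomized controlled trials, 364 nonwithdrawn reviews), and uses the hypothetical function name `two_by_two`.

```python
import math

def two_by_two(a, b, c, d):
    """Chi-square (no continuity correction), p value, and signed phi for [[a, b], [c, d]]."""
    n = a + b + c + d
    cross = a * d - b * c
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    phi = cross / math.sqrt(margins)          # signed effect size
    chi2 = n * phi ** 2                       # shortcut formula for a 2 x 2 table
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail probability for df=1
    return chi2, p_value, phi

# Cells: [inconclusive, conclusive]; conclusive counts are derived by subtraction
# from the reported totals (47 GRADE reviews, 287 RCT-only reviews, 364 in all).
grade = two_by_two(14, 47 - 14, 145, (364 - 47) - 145)
design = two_by_two(123, 287 - 123, 36, (364 - 287) - 36)
for label, (chi2, p_value, phi) in (("GRADE", grade), ("design", design)):
    print(label, round(chi2, 2), round(p_value, 2), round(phi, 3))
```

Because the cell counts are rebuilt from rounded percentages and totals, the results may differ from the published statistics in the second decimal place. The negative sign of phi for GRADE reflects the lower rate of inconclusive outcomes among reviews that used the system.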
Because the most common conclusion of a review was that there was insufficient evidence to form any determination on the merits of the intervention, a logistic regression was conducted to determine predictors of this outcome. Conclusions were collapsed into a dichotomous variable (insufficient evidence to form a conclusion versus all other outcomes) with the following variables entered as predictors: year review was last considered up to date, CRG, design (exclusively randomized controlled trials versus inclusion of nonrandomized studies), use of the GRADE system, and ratio of studies excluded to total studies.
Use of the GRADE system and ratio of excluded to total studies predicted likelihood that the review would determine that there was too little evidence to form a conclusion (Table 2). The overall model was significant (χ²=54.71, df=6, p<.001), with approximately 19% (Nagelkerke R²=.19) of the variance in conclusions accounted for by these two predictor variables.
Further examination found significant differences in the ratio of excluded studies to all studies between the reviews that found too little evidence to form a conclusion (.78±.23) and reviews that arrived at a conclusion, be it positive, negative, or mixed (.62±.23) (t=–6.89, df=356, p<.001, d=.78). Inconclusive reviews, on average, omitted 78% of all potential studies on the intervention.
Discussion
Although clinician adherence in daily practice to evidence-based guidelines remains limited (38, 39), there is increasing focus on the use of evidence-based mental health treatments. Despite this emphasis within the field, there has been inconsistency in how that label is applied. The mental health field might well benefit from having a well-established systematic review system rather than competing, and at times inconsistent, efficacy assessments (12–14).
In a number of ways the Cochrane Collaboration appears well suited to be that system. It has a transparent, strong methodology that has been found to be more rigorous than the methodology of other review systems (18–21). It is a long-standing, well-established system that health care providers regard as highly credible (40). However, questions have arisen regarding its topic coverage, datedness, representativeness of the literature, and ability to provide clear guidance, and these questions merit evaluation.
In terms of clinical breadth, a majority (58%) of Cochrane’s mental health entries were concentrated in three areas (psychosis and mood and anxiety disorders). These mental health domains are known for their high disease burden, cost, and prevalence (41–44). Moreover, recently the Cochrane system appears to have focused on new diagnostic areas; for example, nine of the 14 entries for personality disorders are protocols that first appeared in 2011 or 2012. Nevertheless, review coverage outside these three major areas is limited at present. This finding is consistent with a recent report that 58% of all Cochrane systematic reviews of substance misuse focused solely on alcohol or opioids (45). In fact, recently the Cochrane Collaboration has identified the development of a transparent system for prioritizing topics that is responsive to user input as a strategic recommendation (MacLehose et al., unpublished paper, 2012).
Datedness appears to be a greater area of concern. Although mental health protocols that progressed to reviews did so in a timely manner, 26% of protocols had not been converted to a review five or more years after publication. As a result, some review topics remain undeveloped because an author group has laid claim to the topic in a prior protocol. Topics can become available again when entries are withdrawn, but withdrawal is relatively rare, and withdrawal rates differ among CRGs.
Although the Cochrane Collaboration’s policy states that reviews be updated every two years, the average Cochrane mental health review was last considered up to date in 2006. Although there is a lack of consensus regarding how often systematic reviews need to be updated (29, 30), it is concerning that one in nine mental health reviews was last considered up to date a decade or more ago. The Cochrane Collaboration has acknowledged the need for a more flexible system and has begun to explore alternatives to a uniform two-year updating policy (MacLehose et al., unpublished paper, 2012).
Another concern is the large portion of reviews (44%) that determined they had insufficient evidence to form any conclusion, even a mixed one. Contrary to the Cochrane Collaboration’s assertion that reviews that include nonrandomized studies are challenged to form definitive conclusions regarding efficacy, the design of the entry was not associated with the likelihood that a review would be inconclusive. The formation of a conclusion was predicted by the ratio of excluded to total studies and use of the GRADE system.
Cochrane reviews excluded from consideration noticeably more studies than they included. Despite a stated desire by the Cochrane Collaboration to minimize publication bias, relatively few reviews included studies with completely unpublished data. In fact, the more studies that the reviews excluded, the more likely the reviews were to determine there was insufficient evidence to form a conclusion regarding the efficacy of an intervention. Inconclusiveness may accurately reflect the state of the literature. However, given data suggesting that reviews that included more varied designs have arrived at similar conclusions (23, 46), there is a call for mental health reviews to be based on a wider representation of the psychopathology intervention literature (34).
The other significant predictor of review conclusions was whether the GRADE system was used. Reviews that employed the GRADE system were less likely to determine that there was insufficient evidence to form a conclusion, perhaps reflecting the authors’ process of having evaluated the strength of the evidence in an organized manner. As yet, though, only a small proportion of Cochrane reviews have used the GRADE system, with likelihood of use being CRG dependent.
In terms of limitations, this study examined only entries from the three primary mental health CRGs. Although the sample included all entries from these CRGs during the study time frame, it did not include every mental health review or protocol in the Cochrane library. Specifically, neither the dementia nor the substance use disorders CRG was assessed; moreover, a number of mental health reviews are scattered among other CRGs, for example, reviews on postpartum depression in the pregnancy and childbirth CRG.
In a related vein, the reviews’ diagnostic topic was assessed to measure whether reviews addressed clinical needs. However, other aspects besides diagnosis are important in determining whether a review fits clinicians’ needs. Specifically, assessing characteristics of the interventions reviewed (for example, pharmacological versus psychotherapeutic or prevention versus tertiary) and of the populations studied (for example, pediatric or geriatric) would be needed to understand whether the reviews targeted the topics of importance to individuals seeking empirically based guidance.
Conclusions
Although the Cochrane system is recognized as a premier review system for general medical health interventions, it is important to recognize the limitations of its use for the mental health profession. The system’s datedness, its limited coverage of some mental health domains, and the fact that reviews regularly are inconclusive and often do not reflect the full range of the psychopathology treatment literature are worth noting. The Cochrane Collaboration has already begun to discuss changes to address issues of topic prioritization and review updates (MacLehose et al., unpublished paper, 2012). Continued exploration into the potential role of varied types of evidence, greater integration of the GRADE system into reviews, and consistency among CRGs in attending to protocol progress may assist in further addressing these issues.
Acknowledgments and disclosures
The author is indebted to Kevin Hennessy, Ph.D., for feedback and coding assistance.
The author reports no competing interests.