In this report the term “first-episode psychosis services” refers to services for patients who present for the first time with a diagnosable nonaffective psychosis, including schizophrenia, schizoaffective disorder, schizophreniform disorder, delusional disorder, brief psychotic disorder, and psychosis not otherwise specified. Over the past two decades comprehensive approaches to the early detection and treatment of psychosis have been developed (
1). Extensive research has evaluated both the efficacy (
2) and the effectiveness (
3) of these programs. There has also been large-scale implementation of these programs internationally (
4). Despite these efforts, no systematic review has been conducted of the essential evidence-based components of such services.
The effectiveness of first-episode psychosis services has been examined in several randomized controlled studies in which such services were compared with treatment as usual (
5–
9). The largest study has demonstrated the most robust results (
9). Other randomized studies have provided data supportive of the advantages of first-episode psychosis services, but they lacked power to provide conclusive evidence (
5–
8). Although the evidence of effectiveness of first-episode psychosis services is encouraging (
10), a meta-analysis concluded that further research is required to prove effectiveness (
2).
The Schizophrenia Patient Outcomes Research Team (PORT) is a highly influential U.S. group that develops and disseminates evidence-based clinical guidelines for schizophrenia. Only treatments supported by substantial scientific evidence achieve recommendation status. PORT’s most recent update concluded that the volume and quality of research on first-episode psychosis merited a review but that the evidence was not sufficient to support a treatment recommendation at the time, primarily because of the small number of studies and some inconsistencies among the findings (
11).
A major limitation in research on first-episode psychosis services has been a lack of agreement on what components constitute such services or which package of components is essential. Effectiveness studies of first-episode psychosis services have neither specified nor measured the components of care provided in the experimental and treatment-as-usual arms. Burns (
12) has advocated for the end of studies that use an unspecified treatment-as-usual control group. Furthermore, first-episode psychosis services have been identified as complex care systems, and the United Kingdom Medical Research Council has issued guidelines that outline key principles for evaluating complex interventions (
13). One of the four key principles is understanding and measuring key processes of care, linked to a theory of why the system is effective. A fidelity scale measures these key processes of care. Ideally, fidelity scales should be available to measure both arms of a treatment trial (
14). A review of the effectiveness literature on first-episode psychosis services identified the lack of a fidelity scale as a fundamental barrier to progress in this field (
3). Finally, measurement of fidelity to evidence-based practices has been identified as essential to the effective dissemination and implementation of evidence-based programs (
15).
The World Health Organization outlined core operating principles and practices and made ten broad recommendations concerning services for people experiencing a first episode of psychosis (
16). Examples of recommended practices include early detection, access to comprehensive services, availability of both psychotropic and psychosocial interventions, and public education. A study of essential components for first-episode psychosis services identified a list of 151 elements from ten categories of team, structure, and function (
17). Twenty-one expert clinicians reviewed this list of elements using a Delphi consensus method and reduced the number to 106 service components rated as essential. The authors suggested that these elements were a reasonable basis for defining a service model from which to derive a measure of fidelity. However, the elements were not derived from an evidence-based review and included several broad recommendations not specific to early-psychosis services, such as access to translation services.
Our project fits within the broad rubric of knowledge translation, defined as closing the gaps between knowledge and practice (
18). The purpose of our project was to identify a set of evidence-based components deemed essential for first-episode psychosis services, primarily as a step toward developing a fidelity scale for such services but also to generate a list that can be used for quality monitoring—a key implementation strategy for knowledge translation (
19). During an initial review of the literature, it became apparent that some studies provided only broad descriptions of the services provided in the experimental arm, for example, “assertive community treatment enhanced by better specific content via family involvement and social skills training” (
9), while providing even fewer details of the components of control treatments, which are usually described as treatment as usual.
To improve the description, reporting, and measurement of components of care both in research and clinical practice, we used a two-stage approach to identify essential components. A component has been defined as “an activity, material, or facility which can be observed or verified, is logically discrete from other components, and is specific to the innovative program” (
20). We see the identification of essential components of care as a step toward the development of a fidelity scale to measure these essential components (
21).
Methods
Literature review
In consultation with a librarian who is a search specialist, we developed a strategy for an evidence-based review to identify components of first-episode psychosis services. Databases reviewed were MEDLINE, PsycINFO, and EMBASE for the period January 1980 through April 2010. Search terms included early psychosis or early schizo*, early psychotic episode or first psychotic episode, AND fidelity or program development or evaluation or impact or intervention or early intervention or program effect*. [The search strategy and initial results are described in an online data supplement to this article.]
In the first review step, two investigators independently read the articles to identify the components or interventions that were specified in the study. These were typically described in broad terms, such as pharmacotherapy or social skills training. The majority of studies listed multicomponent interventions. If a program description included two or more components, each component was identified as a separate item. In the next step, the two sets of reviews were compared for components and terminology. Where there were differences, a consensus terminology was agreed upon.
In the third step, we followed a two-step methodology designed to integrate the range of levels of evidence from multiple studies into a single rating for each component. We used a system designed to meet standards identified in the domain on rigor of development in AGREE II (Appraisal of Guidelines for Research and Evaluation II) (
22). First, we assigned a quality rating to each study, using the criteria adapted from the Canadian Task Force on Preventive Health Care and modified by Portney and Watkins (
23). [The criteria are listed in the online data supplement to this article.] The criteria rate individual studies on a scale of I to IV, with I indicating a randomized controlled study and IV indicating a descriptive study. When only one component, such as supported employment (
24), was the focus of a randomized controlled study, we assigned a level of evidence of I to that study and to that component. When a study listed multiple components, such as case management and social skills training (
9), we assigned each component the level of evidence given to that study. This left us with a list of components; each component was assigned a range of levels of evidence.
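The first rating step can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' procedure as implemented: the study records, identifiers, and the function name `levels_by_component` are hypothetical, and the only rule taken from the text is that every component listed in a study inherits that study's level of evidence (I to IV).

```python
from collections import defaultdict

# Study quality levels described in the text: I (randomized controlled study)
# through IV (descriptive study). A lower rank means stronger evidence.
LEVEL_ORDER = {"I": 1, "II": 2, "III": 3, "IV": 4}

# Hypothetical study records for illustration; these are not data from the review.
EXAMPLE_STUDIES = [
    {"id": "study_a", "level": "I", "components": ["supported employment"]},
    {"id": "study_b", "level": "I",
     "components": ["case management", "social skills training"]},
    {"id": "study_c", "level": "IV",
     "components": ["case management", "supported employment"]},
    {"id": "study_d", "level": "III", "components": ["social skills training"]},
]


def levels_by_component(studies):
    """Give each component the level of every study that lists it."""
    levels = defaultdict(set)
    for study in studies:
        for component in study["components"]:
            levels[component].add(study["level"])
    # Sort each component's levels from strongest (I) to weakest (IV).
    return {name: sorted(found, key=LEVEL_ORDER.get)
            for name, found in levels.items()}


ranges = levels_by_component(EXAMPLE_STUDIES)
```

Under these assumed inputs, `ranges["case management"]` holds the levels I and IV, which is the kind of range that the second step then condenses into a single rating.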
Second, to integrate this range of findings into a single rating for each component, we applied a rating system previously used in the Canadian Psychiatric Association’s clinical practice guideline for the treatment of schizophrenia (
25). This system assigns four levels: A, strong evidence; B, moderate; C, weak; and D, no evidence of benefit or harm. [The levels are described in more detail in the online data supplement to this article.] The ratings provide a synthesis of the evidence across relevant studies. The four levels were used to rate the components identified in the systematic review. For example, supported employment was identified as a component of a number of multicomponent interventions for first-episode psychosis but also as the only variable in a randomized controlled study (
26). As a result it received a rating of A.
Delphi expert consensus process
The next step involved use of a multistakeholder consensus process, the Delphi, to rate the importance of the components. The Delphi is a systematic consensus-building process that obtains and quantifies the opinions of a group of experts (
27,
28). The approach allows for communication between experts via questionnaires presented electronically and in rounds. The Delphi avoids the potential bias associated with meetings, in which individuals may be inhibited or intimidated from expressing their views because outspoken members may dominate the group. An additional advantage of using the Delphi is that it brings together stakeholders who are geographically distant. A potential limitation of the Delphi is attrition of members as a result of the process of repeated rounds.
The Delphi technique has been previously used in mental health services research, including in the identification of key components of schizophrenia care (
29), the description of service models of community mental health practice (
30), the characterization of relapse in schizophrenia (
31), and the identification of a set of quality indicators for first-episode psychosis services (
32).
The Delphi process was approved by the local Conjoint Health Research Ethics Board. We selected a panel of experts in the field of early psychosis. The experts were identified through a literature search (English-language publications between 2005 and 2010) that included the search terms early intervention, first-episode psychosis, early psychosis, clinical research, and health service research. An expert was defined as the first author or lead author on at least one relevant publication in a peer-reviewed journal. The experts were individually invited to participate.
The Delphi questionnaire was designed by the investigators and included the evidence-based components that were identified in the review, together with the rationale, definition, and level of supporting evidence for each component. The components were grouped into six domains: population-level interventions, comprehensive assessments and care plan, individual-level interventions, group-level interventions, service system and models of intervention, and evaluation and quality improvement. Respondents rated each component on a 5-point scale of importance. The questionnaire was pilot-tested with two local clinical experts in early-psychosis treatment. Refinements in wording were made, and a test of online administration was conducted.
We used an online survey software program, Qualtrics (2011 edition), to enable electronic tracking and manipulation of data. This is a secure method of collecting data. Participants were sent an e-mail that contained unique links for accessing and completing the Delphi questionnaire. The use of unique identifiers in Qualtrics permitted the linking of results from one round to the next, allowed panelists to save their work and continue later, prevented completion of the questionnaire more than once per round, and allowed researchers to embed data into future rounds, providing the feedback that is necessary in the Delphi.
In round 1, expert panelists rated the importance of each component on a scale of 1 to 5 (1, unimportant; 5, essential). Panelists were also invited to provide comments. The level of consensus on the importance of each component was then calculated. The group’s median ratings were provided as feedback to each stakeholder, together with his or her own ratings and all comments. The degree of consensus achieved was assessed by calculating the semi-interquartile range of the scores assigned by the stakeholders for each component. The semi-interquartile range is defined as half of the difference between the 75th percentile score and the 25th percentile score. Each round built upon responses to the previous round. Consensus was defined a priori as a semi-interquartile range of ≤.5. A score of 5 was required for a component to be deemed “essential.” The process ended when the predetermined consensus level of ≤.5 was reached on items or when there was no change in consensus between rounds.
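The consensus rule can be illustrated with a short Python sketch. It rests on assumptions that the text does not settle: the quartile interpolation method (here the standard library's "inclusive" method) is not named in the article, and the "score of 5" is read as the panel's median rating. The function names and the example ratings are hypothetical.

```python
from statistics import median, quantiles

CONSENSUS_THRESHOLD = 0.5  # a priori cutoff for the semi-interquartile range
ESSENTIAL_SCORE = 5        # rating required for a component to be "essential"


def semi_interquartile_range(ratings):
    """Half the difference between the 75th and 25th percentile ratings."""
    # Assumption: the article does not state which quartile method was used.
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return (q3 - q1) / 2


def summarize_component(ratings):
    """Summarize one component's ratings for feedback in the next round."""
    siqr = semi_interquartile_range(ratings)
    med = median(ratings)
    consensus = siqr <= CONSENSUS_THRESHOLD
    return {
        "median": med,
        "siqr": siqr,
        "consensus": consensus,
        # Assumption: "a score of 5" is interpreted as a median rating of 5.
        "essential": consensus and med == ESSENTIAL_SCORE,
    }


# Hypothetical panel ratings (not data from the study).
agreed = summarize_component([4, 4, 5, 5, 5, 5, 5, 5, 5])
split = summarize_component([1, 2, 3, 4, 5])
```

With these made-up ratings, the first component has a semi-interquartile range of 0 and meets both criteria, whereas the second has a range of 1.0 and would go back to the panel for another round.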
Results
Systematic review
The search for peer-reviewed literature (January 1980 through week 1 of April 2010) yielded 13,239 citations. Adding the terms fidelity or program development or evaluation or impact or intervention or early intervention or program effect narrowed the results to 1,020 citations. We reviewed the abstracts for relevance and excluded 740, leaving 280 articles. The two lists of components from the two independent reviewers were compared. Elimination of duplication and alternative labeling reduced the final list to 75 unique components. Finally, the best supporting evidence available was ascribed to each component. [The online data supplement includes a list of the 280 articles and the 75 components with the ascribed evidence level.]
Delphi
A total of 105 authors were identified as potential Delphi stakeholders. We identified individuals who were both clinical researchers and principal authors, which eliminated 49 names. We sent letters of invitation to 56 researchers from July to September 2010. If no response was received within six weeks, one reminder letter was sent. Thirty-one individuals agreed to participate, of whom 27 completed round 1 and 23 completed round 2. Countries of residence of round 1 completers included the United States (eight persons, 29%), Canada (seven persons, 25%), Australia (four persons, 15%), the United Kingdom (two persons, 8%), Norway (two persons, 8%), and Singapore, Germany, Denmark, and Ireland (one person each, 4%). The Delphi experts achieved consensus in two rounds of questionnaire administration. The 32 evidence-based components identified by participants as essential are listed in
Table 1, together with the semi-interquartile range. The ranges that are closer to 0 denote a higher level of agreement.
The participants used the full range of ratings, with agreement improving between rounds 1 and 2. The ratings were skewed toward the high end of importance. This skew was even more pronounced in round 2, in which 32 (43%) of the 75 components rated by the 23 participants were rated as essential (a rating of 5), 27 (36%) were given a rating of 4, 12 (16%) were given a rating of 3, and four (5%) were given a rating of 2. No component was rated as 1.
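The round 2 distribution can be checked with simple arithmetic. The sketch below uses only the counts reported in the paragraph above; the variable and function names are illustrative.

```python
# Round 2 counts of components at each rating, as reported in the text.
ROUND2_COUNTS = {5: 32, 4: 27, 3: 12, 2: 4, 1: 0}


def rating_shares(counts):
    """Return the total number of components and each rating's rounded percentage."""
    total = sum(counts.values())
    shares = {rating: round(100 * n / total) for rating, n in counts.items()}
    return total, shares


total, shares = rating_shares(ROUND2_COUNTS)
```

The counts sum to the 75 components that were rated, and the rounded shares reproduce the reported 43%, 36%, 16%, and 5%.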
Discussion
Our review found an adequate corpus of research on first-episode psychosis services to inform a panel of experts in identifying evidence-based components (
21). The level of evidence supporting the components varied from A to D. The Delphi process succeeded in reducing 75 evidence-based components to 32 essential components, a more manageable number than the 106 elements rated as essential by a previous expert panel of clinicians (
17).
Pharmacological components generally had the highest level of supportive evidence, although some psychosocial components, such as family psychoeducation or multifamily group psychoeducation and supported employment, also had level A evidence. The lack of evidence for a number of organizational components reflects the lack of attention paid to these issues in the research literature. The experts rated as essential some items with a low level of evidence; for example, component 5, timely contact with a referred individual, which had an evidence level of D, was rated essential. We identified no empirical studies that addressed this component; therefore, rating this component as essential reflects the tension between the level of evidence and the clinical experience of the experts.
The gray literature documents proved useful, because several referenced original research that was not found in the database search and provided descriptions of programs or standards and practices for programs. The descriptions of program practices in the gray literature often provided more detail than the peer-reviewed literature. This suggests that the opinions of groups other than clinical research experts may be of value in identifying some important components, such as the organizational structure of services. A knowledge synthesis process for decision support can be used to bring together knowledge users such as policy experts and service providers in a process that requires both knowledge synthesis and engagement of decision makers in the development of the research question and the synthesis protocol (
33).
A limitation of the study derives from the parsimonious description of the organizational components of services in the research studies. Although the gray literature provided more detail than the research studies, this was not linked to the level of supportive evidence, which was based exclusively on the systematic review of peer-reviewed research. A second limitation lay in how the level of supportive evidence was assessed in two situations: when a specific component was investigated alone in a randomized controlled trial and when it was simply listed as a component of a package of interventions. When the component was investigated alone, we assigned a level of evidence on the basis of the literature review. Examples of individually investigated components include pharmacotherapy and supported employment. When a component, such as social skills training, was simply mentioned as part of a package, the judgment of experts became more influential. Another limitation of the research studies reviewed was the lack of evidence on the dose or duration of an intervention. For example, a systematic review of family psychoeducation reported positive outcomes both for brief interventions, which were defined as less than six months but a minimum of four sessions, and for longer interventions, which were defined as longer than six months (
11).
Conclusions
The literature review identified a satisfactory research evidence base from which to identify essential evidence-based components of first-episode psychosis services. The results of the Delphi process suggest that there is a consensus on the essential components, but the descriptions of individual components lack the precision needed to measure them. The components identified can form the basis for developing a fidelity scale that may prove useful in research, quality improvement, and accreditation.
Acknowledgments and disclosures
This research was supported by grants from the Carlos Ogilvie Foundation and the Hotchkiss Brain Institute. The authors acknowledge the support of Diane Lorenzetti, M.L.S., who provided valuable assistance with the literature search, and the international group of experts who participated in the Delphi process.
The authors report no competing interests.