Modern medicine emphasizes prevention as the optimal method of promoting health care and reducing public health costs. Over time, prevention has progressed from identifying populations at risk to personalizing risk estimates (
1,
2). This movement is due to several factors. In particular, improvements in technology have made it easier to generate advanced statistical prediction models. As a result, risk calculators have been developed that provide an effective way of teasing apart individuals with the highest probability of illness who require the most aggressive intervention from those who need minimal treatment (
3). A number of risk calculators are now freely available to the public and can estimate the risk, for example, of prostate, ovarian, breast, pancreatic, and colorectal cancers (
4–
6), type 2 diabetes (
7,
8), and cardiovascular disease (
9–
11). Surprisingly, risk calculators for mental health conditions are almost nonexistent, even though serious mental illness costs the United States $193.2 billion in lost earnings per year (
12).
Over the past 20 years, considerable progress has been made in formalizing and refining the prediction of psychosis. Advances include operationalizing and validating clinical criteria that identify individuals considered to be prodromal or at clinical high risk for psychosis, nearly 30% of whom will develop a psychotic illness over 2 years (
13). In addition, the publication of sophisticated multivariable models that predict psychosis with a wide set of risk factors (
14–
19) has also provided further steps toward enhancing prediction. However, progress has been less rapid in determining how to individualize prediction and determine the probability of psychosis risk on a case-by-case basis. At present, in clinical settings, a mental health professional can derive a general estimate of risk of psychosis for a given patient from the presence of traditional risk factors. However, unlike in other fields of medicine, there is no widely available tool for calculating a more precise estimate of risk for a help-seeking or referred individual.
In an article published concurrently with this one, Cannon and colleagues (
20), as part of the second phase of the North American Prodrome Longitudinal Study (NAPLS-2) (
21), report on a risk calculator for the individualized prediction of a psychotic disorder over a 2-year period. This prediction tool represents a potential breakthrough for early intervention in psychiatry. However, as with any predictive analytic model, its performance must be validated in samples of clinical high-risk patients collected independently of NAPLS-2 (
22). In the present study, we evaluated the performance of the NAPLS-2 risk calculator in an external, independent sample of individuals at clinical high risk for psychosis collected as part of the Early Detection, Intervention, and Prevention of Psychosis Program (EDIPPP).
The EDIPPP project was a large nationwide clinical trial designed to examine the effectiveness of Family-Aided Assertive Community Treatment (
23,
24) in preventing the onset of psychosis (
25). Over the span of 3 years (September 2007 through June 1, 2010), EDIPPP recruited a large sample (N=337) of adolescents and young adults who were at risk or were in the very early stages of a major psychotic disorder.
The EDIPPP sample offers the opportunity for a clear test of the applicability of the NAPLS-2 risk calculator to an independent sample of clinical high-risk subjects, as the two projects had key differences in goals, recruitment strategies, and ascertainment criteria. The EDIPPP project established a community education and outreach network at six urban and rural sites across the United States with the goals of raising awareness of the early warning signs of psychosis and demonstrating the effectiveness of community outreach, education, and early referral, combined with the Family-Aided Assertive Community Treatment intervention, to reduce the incidence of psychosis (
26). In EDIPPP, allocation to treatment was based on clinical risk (higher versus lower), which was determined by a cutoff of 7 on the total attenuated positive symptom severity score as specified by the Scale of Prodromal Symptoms from the Structured Interview for Prodromal Syndromes (SIPS) (
25–
28). EDIPPP participants thus had a wide range of attenuated positive symptom severity levels, from no symptoms in the prodromal range to positive symptoms that reached the threshold for psychosis. Because the original EDIPPP sample was categorized differently, this external validation sample was reconfigured to match the NAPLS-2 intake criteria, which include all subjects meeting the criteria for prodromal syndromes as defined by the SIPS (
29–
31).
In the present study, our aim was to assess the predictive ability of the NAPLS-2 psychosis risk calculator in an external, independent sample of patients at clinical high risk for psychosis. Thus, in this report, we refer to the NAPLS-2 sample as the development sample and the EDIPPP sample as the external validation sample. Given that the predictors in the NAPLS-2 calculator were based on theoretical considerations, we first evaluated the predictive ability of the components of the NAPLS-2 model. Six key predictor variables used in the NAPLS-2 psychosis risk calculator were used in the external validation sample to construct a new model predicting psychosis. Second, we assessed the performance of the NAPLS-2 model by evaluating the predictive accuracy of the risk calculator when applied to the external validation sample. Evaluating the performance of the risk calculator in different clinical high-risk samples and settings than those used to initially test the model can further support its empirical validity and clinical utility prior to its widespread use (
22).
Method
The data reported here were collected as part of EDIPPP, a large multisite clinical trial for preventing psychosis among young people, funded by the Robert Wood Johnson Foundation (2007–2011) (
25–
27). EDIPPP consisted of six participating sites: Portland Identification and Early Referral, Portland, Maine; the Recognition and Prevention Program, Zucker Hillside Hospital, Glen Oaks, N.Y.; the Michigan Prevents Prodromal Progression Program, Ann Arbor, Mich.; the Early Assessment and Support Team Program, Salem, Ore.; the Early Diagnosis and Preventive Treatment Clinic, Sacramento, Calif.; and Early Assessment and Resource Linkage for Youth, Albuquerque, N.M. Details of the study design, study implementation, assessments, psychosocial and pharmacological treatments, methods, and sample characteristics have been reported elsewhere (
25,
26). Although the Zucker Hillside Hospital site was part of both NAPLS-2 and EDIPPP, the present analyses included only four overlapping subjects, none of whom converted to psychosis and whose outcomes did not have an impact on the study findings.
Clinical high-risk subjects met criteria for one of the three prodromal syndromes based on the SIPS (
29–
31): attenuated positive symptom syndrome, with the presence of one or more moderate, moderately severe, or severe attenuated positive symptoms (scores of 3, 4, or 5 on the Scale of Prodromal Symptoms, on a scale of 0–6); genetic risk and deterioration syndrome, with genetic risk for psychosis coupled with deterioration in functioning; and brief intermittent psychotic syndrome, with intermittent psychotic symptoms that are recent, brief in duration, and not seriously disorganizing or dangerous.
A total of 210 clinical high-risk subjects were included in the validation sample; 92% had attenuated positive symptom syndrome, 6.3% had brief intermittent psychotic syndrome, and 1.7% had genetic risk and deterioration syndrome. In addition to these subjects, the EDIPPP sample included 32 early first-episode psychosis and 95 low-risk comparison subjects, who were excluded from the present study.
The EDIPPP study included participants 12–25 years old. Exclusion criteria for the study were a current or previous frank psychotic episode; treatment with antipsychotic medication for ≥30 days at a dosage appropriate for treating a psychotic episode; an IQ <70; permanent residence outside the catchment area; lack of fluency in English; current incarceration; and psychotic symptoms due to an acute toxic or medical cause.
Patients age 18 or older provided written informed consent; for patients under 18, the parents provided written informed consent and the patient provided written assent. The research protocol was approved by all sites’ institutional review boards.
Baseline Assessments
Details of the baseline clinical assessment have been reported previously (
25). Prodromal symptoms were assessed by the SIPS and the companion Scale of Prodromal Symptoms (
29–
31). Social and role functioning was assessed using the Global Functioning: Social and Global Functioning: Role scales (
32). In addition to several other clinical measures, the baseline assessment included the Measurement and Treatment Research to Improve Cognition in Schizophrenia Consensus Cognitive Battery (
33). The present analyses utilized data from two of the tests—the symbol coding subtest of the Brief Assessment of Cognition in Schizophrenia (BACS) (
34) and the Hopkins Verbal Learning Test–Revised (
35).
Clinical Outcome
Of the initial 210 clinical high-risk subjects, 176 (83.8%) had at least one follow-up assessment. Of those, 12 (6.8%) transitioned to psychosis over 2 years of follow-up (
25). Conversion to psychosis was defined according to the Presence of Psychosis Scale criteria on the SIPS: developing any psychotic-level-intensity positive symptom (score of 6) that is sustained for at least an hour per day, at an average of 4 days per week over 1 month, or demonstrating seriously disorganized or dangerous behavior. The mean follow-up period (time to conversion to psychosis or last follow-up) was 99.29 weeks (SD=21.51; median=106.00).
Statistical Analysis
All analyses were conducted using SPSS Statistics 20.0 (IBM, Armonk, N.Y.). Comparisons of demographic and clinical characteristics were performed with Student’s t tests for continuous variables and chi-square tests for categorical variables (two-tailed, p<0.05). Overall, 2.4% of the data (25 of 1,056 values) were missing. Among participants followed, missing values were imputed using mean values for scores on the BACS symbol coding test and the Hopkins Verbal Learning Test–Revised (missing one value each), and modal values for family history (missing 14 values) and decline in Global Functioning: Social scale (missing nine values) prior to use in prediction analyses.
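The single-value imputation described above can be sketched as follows. This is a minimal illustration in Python, not the SPSS procedure used in the study; the function name `impute` and the toy values are hypothetical.

```python
from statistics import mean, mode

def impute(values, strategy):
    """Replace None entries with the mean or the mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "mode":
        fill = mode(observed)
    else:
        raise ValueError("strategy must be 'mean' or 'mode'")
    return [fill if v is None else v for v in values]

# Hypothetical toy data, not EDIPPP values.
symbol_coding = [52, 61, None, 47]    # continuous score: mean imputation
family_history = [0, 0, None, 1, 0]   # binary indicator: modal imputation

symbol_coding_filled = impute(symbol_coding, "mean")
family_history_filled = impute(family_history, "mode")

# Share of missing data reported in the text: 25 of 1,056 values.
missing_fraction = 25 / 1056
```

Mean and modal imputation keep every followed participant in the model but shrink the variance of the imputed variables, which is tolerable here only because so few values were missing.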
The external validation analysis was carried out in several steps. First, a multivariable Cox proportional hazards regression model was used to estimate hazard ratios and 95% confidence intervals for risk of conversion to psychosis in the external validation sample. We evaluated the ability of six predictor variables used in the NAPLS-2 psychosis risk calculator to predict psychosis: baseline age; severity of SIPS items P1 and P2 (unusual thought content and suspiciousness) recoded (
18); raw score on the BACS symbol coding test; the sum of trials 1–3 on the Hopkins Verbal Learning Test–Revised (
35); a decline in social functioning in the year prior to the baseline assessment, measured using the Global Functioning: Social scale (
32,
36,
37); and having a first-degree relative with a psychotic disorder (critical alpha, 0.05). Predicted probabilities of risk (based on the cumulative hazard function) were computed for each subject in the external validation sample. Trauma and life events, which were not significant in the development sample, were not included.
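As an illustration of how a predicted probability follows from the cumulative hazard in a Cox model, the sketch below uses the standard relation risk = 1 − exp(−H0(t) · exp(linear predictor)). The coefficients, covariate values, and baseline cumulative hazard are hypothetical placeholders, not the fitted EDIPPP or NAPLS-2 estimates.

```python
import math

def predicted_risk(baseline_cumulative_hazard, coefficients, covariates):
    """Cox-model event probability by the time at which the baseline
    cumulative hazard H0(t) was evaluated: 1 - exp(-H0(t) * exp(lp))."""
    linear_predictor = sum(
        coefficients[name] * covariates[name] for name in coefficients
    )
    return 1.0 - math.exp(-baseline_cumulative_hazard * math.exp(linear_predictor))

# Hypothetical values for illustration only.
coefficients = {"age": -0.1, "p1_p2": 0.4, "family_history": 0.3}
covariates = {"age": -1.5, "p1_p2": 2.0, "family_history": 1.0}  # age assumed mean-centered
baseline_cumulative_hazard_2yr = 0.05

risk = predicted_risk(baseline_cumulative_hazard_2yr, coefficients, covariates)
```

In practice the covariates must be coded and centered exactly as they were when the baseline hazard was estimated; otherwise the predicted probabilities are not interpretable.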
Discrimination performance (ability of the model to correctly distinguish between outcomes) was assessed for all models by the area under the receiver operating characteristic curve (AUC, equivalent to Harrell's c-statistic) (
38,
39). The NAPLS-2 calculator was then used to generate risk estimates for each case in the external validation sample. The gamma statistic was used to examine the agreement between the risk levels (classes) assigned by the NAPLS-2 calculator when applied to EDIPPP cases and the risk classes generated by the EDIPPP validation model (
40). The gamma statistic ranges from −1 (perfect negative association) to +1 (perfect agreement), with a value of 0 indicating no association. Precision and bias of the NAPLS-2 calculator were assessed with the Brier score and mean prediction error, respectively. The Brier score is the mean squared difference between the observed outcome and the predicted risk score (
41). Mean prediction error was calculated as the difference between the risk estimates generated by the external validation model and the NAPLS-2 psychosis risk calculator, providing a metric for the tendency for over- or underestimation of risk (
42). For both measures, a lower score indicates higher precision and less bias; ≤15% was considered to be an acceptable level (
43). Spearman’s rho correlation analysis was also used to examine the correspondence between the risk estimates generated by both models. Finally, the diagnostic accuracy (sensitivity, specificity, positive predictive value, negative predictive value) of the NAPLS-2 calculator was examined across different levels of predicted risk.
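The discrimination, precision, and bias measures described above can be sketched in a few lines of Python. This is an illustrative simplification under stated assumptions: the AUC below treats conversion as a binary outcome and ignores censoring, the sign convention for mean prediction error (calculator minus validation model) is our assumption, and the data are hypothetical.

```python
def brier_score(outcomes, risks):
    """Mean squared difference between observed outcome (0/1) and predicted risk."""
    return sum((r - y) ** 2 for y, r in zip(outcomes, risks)) / len(outcomes)

def mean_prediction_error(validation_risks, calculator_risks):
    """Average difference between two sets of risk estimates
    (assumed convention: calculator minus validation model)."""
    pairs = list(zip(validation_risks, calculator_risks))
    return sum(c - v for v, c in pairs) / len(pairs)

def auc(outcomes, risks):
    """Probability that a converter is assigned a higher risk than a
    non-converter; ties count as one half."""
    converters = [r for y, r in zip(outcomes, risks) if y == 1]
    nonconverters = [r for y, r in zip(outcomes, risks) if y == 0]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in converters
        for n in nonconverters
    )
    return wins / (len(converters) * len(nonconverters))

# Hypothetical toy data, not EDIPPP values.
outcomes = [0, 0, 1, 0, 1]
calculator_risks = [0.05, 0.20, 0.30, 0.10, 0.15]
validation_risks = [0.04, 0.15, 0.35, 0.08, 0.20]

brier = brier_score(outcomes, calculator_risks)
bias = mean_prediction_error(validation_risks, calculator_risks)
discrimination = auc(outcomes, calculator_risks)
```

A positive mean prediction error under this convention would indicate that the calculator assigns higher risks than the validation model on average, and a negative value the reverse.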
Results
Table 1 summarizes baseline demographic and clinical data for the clinical high-risk sample. There was no difference between subjects with and without follow-up on any major demographic or clinical variable, including baseline age, gender, race, and education level. Clinical high-risk subjects in the validation sample had a mean age of 16.6 years (SD=3.3), and majorities were male (58.5%) and white (64.1%). The populations in the external validation (EDIPPP) sample and the development (NAPLS-2) sample were similar on most of the major demographic and clinical features. However, patients in the external validation sample were markedly younger on average than those in the development sample (mean ages of 16.6 years and 18.5 years, respectively), most likely because cases were referred from school-based sources.
Table 2 presents the regression model constructed with the validation sample that includes the six key variables selected from the NAPLS-2 psychosis risk calculator. The overall model was significant when all six independent variables were entered simultaneously (χ²=19.68, df=6, p=0.003). The base model of SIPS items P1 and P2 showed an acceptable discrimination performance, with an AUC of 0.67. Combining the additional five variables with SIPS items P1 and P2 increased the AUC by 0.12, resulting in an AUC of 0.79 (95% CI=0.644–0.937, p=0.001), indicating good discrimination performance (Figure 1). Scores on the neurocognitive tests and baseline age were associated with the largest increases in the AUC when added to the base model (i.e., SIPS items P1 and P2).
As also shown in Table 2, in terms of individual variables, the score on SIPS items P1 and P2 and baseline age bordered on statistical significance (p=0.05), while scores on the neurocognitive tests (the symbol coding test and the Hopkins Verbal Learning Test–Revised) only approached significance (p<0.10). A decline in social functioning and having a first-degree relative with psychosis were not significant predictors of psychosis in the validation sample.
The NAPLS-2 risk calculator was then used to provide probability estimates of conversion to psychosis for each individual in the external validation sample. Both the mean prediction error and the Brier score were at acceptable levels (≤15%) (Brier score=7.5%, SD=15.8; mean prediction error=9.5%, SD=12.14), suggesting that the NAPLS-2 calculator provided a reasonable estimate of psychosis risk relative both to observed outcomes and to the risk predictions generated by the validation model.
In addition, the risk estimates generated by the external validation model and the NAPLS-2 psychosis risk calculator were strongly correlated (rs=0.66, p<0.001), suggesting correspondence between the predicted risks of both models. There was also strong agreement between the predicted levels of risk generated by the NAPLS-2 calculator and the external validation model (gamma=0.7, p<0.001).
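For readers unfamiliar with the gamma statistic, the sketch below computes the Goodman-Kruskal gamma from concordant and discordant pairs of ordinal risk classes. The class assignments are hypothetical and are not taken from the EDIPPP data.

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """(concordant - discordant) / (concordant + discordant) over all pairs;
    pairs tied on either variable are ignored."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical ordinal risk classes (1 = lowest) assigned to the same six cases.
calculator_classes = [1, 1, 2, 3, 2, 3]
validation_classes = [1, 2, 2, 3, 1, 3]

gamma = goodman_kruskal_gamma(calculator_classes, validation_classes)
```

Because tied pairs are dropped from the denominator, gamma tends to be larger than rank correlations computed on the same data, which is one reason to report it alongside Spearman's rho.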
Table 3 summarizes the performance of the NAPLS-2 calculator when applied to the external validation sample across increasing levels of model-predicted risk. The sensitivity and specificity values for each threshold were comparable to those observed in the development sample. For example, a model-predicted risk of 10% provided a sensitivity of 91% and a specificity of 37% in the external validation sample, compared with a sensitivity of 94.1% and a specificity of 23.6% in the development sample. A model-predicted risk of 20% provided a better balance between sensitivity and specificity, at 58.3% and 72.6%, respectively, which is again similar to the development model (66.7% sensitivity and 72.1% specificity).
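The threshold-based accuracy measures summarized in Table 3 can be illustrated as follows. This is a sketch on hypothetical data; classifying a case as test-positive when its predicted risk is at or above the threshold (rather than strictly above) is our assumption.

```python
def diagnostic_accuracy(outcomes, risks, threshold):
    """Sensitivity, specificity, PPV, and NPV when cases with predicted risk
    at or above the threshold are classified as test-positive."""
    tp = sum(1 for y, r in zip(outcomes, risks) if y == 1 and r >= threshold)
    fn = sum(1 for y, r in zip(outcomes, risks) if y == 1 and r < threshold)
    fp = sum(1 for y, r in zip(outcomes, risks) if y == 0 and r >= threshold)
    tn = sum(1 for y, r in zip(outcomes, risks) if y == 0 and r < threshold)

    def ratio(numerator, denominator):
        # Undefined when no cases fall in the relevant row or column.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Hypothetical toy data, not EDIPPP values.
outcomes = [1, 1, 0, 0, 0, 0, 1, 0]
risks = [0.35, 0.12, 0.08, 0.22, 0.05, 0.15, 0.25, 0.30]

accuracy_at_20 = diagnostic_accuracy(outcomes, risks, threshold=0.20)
```

Repeating the call over a grid of thresholds (e.g., 0.10, 0.15, 0.20, 0.25) yields the kind of sensitivity-specificity trade-off that the table reports.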
Discussion
This study represents a critical external validation of the first major risk calculator developed in the field of mental health to estimate the probability that a given individual will develop a psychotic disorder within a 2-year period. Six risk factors from the NAPLS-2 calculator—baseline age, unusual thought content and suspiciousness, family history of a psychotic disorder, verbal learning, processing speed performance, and social decline—were able to distinguish individuals who developed psychosis from those who did not with a good degree of accuracy in the EDIPPP validation sample. In addition, there was good agreement between the risk prediction from the NAPLS-2 model and observed outcomes in the EDIPPP sample. Thus, this novel approach to constructing a psychosis prediction model using theoretical predictors has now been validated in two independent clinical high-risk samples: the development sample initially used to test the theoretical model (NAPLS-2) and an external, independent clinical high-risk sample (EDIPPP). To the best of our knowledge, this type of validation has not been performed on a psychosis prediction model. It provides a critical first step in the introduction of the NAPLS psychosis risk calculator for widespread use.
Given the availability of risk calculators for numerous medical conditions, it is somewhat surprising that it has taken so long for a tool for a psychiatric disorder to be developed. Lack of replications of well-performing models and complex biological findings with limited clinical applicability may have contributed to this delay (
44). Our findings highlight the importance of building a predictor model that includes a set of theoretically derived risk factors that have strong ties to vulnerability to the disease and can easily be applied in a clinical setting. The independent prediction model built with the EDIPPP sample using the risk factors included in the NAPLS-2 calculator showed good discrimination ability, comparable with that of the original development cohort; the overall model accuracy rates (AUC) were 79% and 71%, respectively. Moreover, for the range of predicted risks that are adequately represented, the sensitivity and specificity for the levels of predicted risk generated by applying the NAPLS-2 calculator to the EDIPPP sample corresponded to those of the development model reported in the Cannon et al. study (
20). The values for positive predictive value in the present study were lower, however, than those observed in the development sample because of the lower conversion rate (i.e., prevalence) in the validation sample.
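The dependence of positive predictive value on the conversion rate follows directly from Bayes' rule, as the sketch below shows. The sensitivity, specificity, and 6.8% conversion rate are the validation-sample figures reported above for a 20% risk threshold; the 15% comparison prevalence is a hypothetical value chosen only to illustrate the effect and is not the development-sample rate.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from Bayes' rule: true positives over all test-positives."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)

# Validation-sample sensitivity and specificity at a 20% model-predicted risk.
sensitivity, specificity = 0.583, 0.726

ppv_validation = positive_predictive_value(sensitivity, specificity, prevalence=0.068)
# Hypothetical higher conversion rate, for comparison only.
ppv_higher_prevalence = positive_predictive_value(sensitivity, specificity, prevalence=0.15)
```

With sensitivity and specificity held fixed, the lower conversion rate alone yields a markedly lower positive predictive value, which is the pattern described in the preceding paragraph.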
The discrimination accuracy of the base model (SIPS items P1 and P2) was improved by 0.12 in AUC (from 0.67 to 0.79) with the addition of the other five variables. This provides further evidence that a combination of variables can discriminate among patients in clinical high-risk samples better than any individual predictor. It also potentially protects against type II error (i.e., missing a true difference with a smaller number of factors). In contrast to the development model, social functioning decline and scores on neurocognitive tests were not significant predictors of psychosis in the external validation sample. This may be related to sample size differences between the validation and development cohorts rather than the inability of any single risk factor to predict psychosis. The younger age of this sample and a higher proportion of school-based ascertainment could also account for the lower predictive power of social functioning decline, since the younger participants would have had less time to deteriorate. Overall, these results suggest that the performance of the model is driven by the six risk factors working in concert to predict psychosis.
Interpreting Psychosis Risk
In continuing to establish the validity of the risk calculator, a number of additional issues and caveats must be considered. First, and perhaps foremost, the calculator should be used only by mental health professionals who are trained to a reliability standard in applying the prodromal syndrome criteria with the SIPS/Scale of Prodromal Symptoms and who can administer neuropsychological and other clinical measures with good reliability. Second, as with any medical risk calculator, psychosis risk estimates provide a probability that illness will develop in the future, not a certainty. Third, there is a risk of an inaccurate prediction. To mitigate this risk, in addition to reporting the exact estimate, the clinician should discuss with the patient the risk-benefit ratio of treatment in terms of the different levels of risk as shown in Table 3. This discussion of the risk-benefit ratio should be part of the first step in an informed decision-making process between clinician and patient (
45). Risk should not be overestimated, but at the same time, these estimates convey important information; for example, the difference between risks of 5% and 25% should have a meaningful impact on treatment decisions (
46). Fourth, and of particular relevance to intervention, treatment recommendations should take into account the possible adverse effects, as well as the magnitude of the estimate (i.e., high versus low risk) and the particular predictor(s) that are driving the estimate. Low-risk individuals, for example, can be offered a less invasive treatment. Individuals with substantial neurocognitive difficulties could be offered, for example, cognitive training (
47). On the other hand, high-risk individuals or those with more severe positive symptoms would potentially be offered more aggressive intervention, possibly involving medications. Finally, since the field of psychosis prevention is constantly evolving, an accurate assessment of risk requires constant updating to take into account other factors not included in the prediction model that may alter the balance of risks and benefits (
1).
Next Steps: Integration Into Clinical Practice
The NAPLS-2 psychosis risk calculator represents a major advance toward achieving the goal of personalized medicine in psychiatry. According to the guidelines reported by McGinn et al. (
22), four steps are involved in establishing a validated predictive tool and decision rules for use in clinical practice: selecting variables (level 4), validation at a single site or in a small prospective sample (level 3), validation at different sites (level 2), and then evaluation of impact on clinical practice (level 1). Our findings provide preliminary evidence for level 2 validation according to this schema. The critical last step in recommending the psychosis risk calculator (level 1 validation) in a wide variety of settings would require an analysis of the impact of the tool in clinical practice (
22), which would involve demonstrating that the prediction tool can both change clinician behavior and benefit patient outcomes.
Limitations
Our findings should be considered in the context of certain limitations. First, although we found strong evidence of the applicability of the NAPLS-2 psychosis risk calculator in an independent clinical high-risk sample, the performance of the risk calculator needs further replication. The model should also be evaluated in subcohorts (e.g., sociodemographically or clinically defined) and against longer-term outcomes. In addition, the calculator should be continually updated and fine-tuned as new biological markers emerge (
48). It should be noted that the optimal range for maximizing sensitivity and specificity lies between 15% (75.0% sensitivity, 58.5% specificity) and 25% (50.0% sensitivity, 81.7% specificity). However, additional prospective studies using the risk calculator to predict later illness are needed to validate this range as the appropriate target for intervention. Second, it is unclear how the NAPLS-2 risk calculator would perform on clinical high-risk cohorts obtained outside of North America, such as in populations recruited in the European and Australian high-risk projects. Finally, as with any risk calculator, the accuracy of the psychosis risk estimate is dependent on valid and accurate data.
Conclusions
The data reported in this study have shown that the performance of the NAPLS-2 psychosis risk calculator, available on the Internet and incorporating a set of theoretically derived risk factors, can be replicated in a separate, independently collected clinical high-risk population. Although further replication is needed, at present the risk calculator appears to have considerable potential for determining the probability that an individual will develop psychosis, and it may provide a foundation for the personalized treatment of clinical high-risk individuals.