The ability to accurately predict conversion to psychosis from clinical and other measurable features of an at-risk state is critically important to clinicians. To demonstrate clinical utility, these models should predict conversion with at least 80% sensitivity and specificity (1) and concurrently high positive and negative predictive values (PPV/NPV). Over the past decade, substantial progress has been made in this area with the development of “risk calculators,” which consider various demographic, clinical, and neurocognitive factors in addition to family history to predict future conversion (e.g., 2, 3). The most well-studied of these, the calculator based on the second North American Prodrome Longitudinal Study (NAPLS-2), achieved a model concordance index (analogous to the area under a receiver operating characteristic curve) of 0.71 (2).
These encouraging results helped motivate the NAPLS-3 study (4), which includes longitudinal measurements from 710 individuals at clinical high risk for psychosis and 96 age- and sex-matched healthy control participants (4). To our knowledge, the ability of the features specified in the NAPLS-2 calculator to predict conversion in the NAPLS-3 sample has not yet been evaluated. We thus examined the ability of these features, as well as cortisol (assessed at baseline), to predict conversion in clinical high-risk individuals using various linear (e.g., Cox proportional hazards regression, logistic regression, support vector machine) and nonlinear (e.g., random forest) machine learning algorithms. We hypothesized that these features would predict conversion with performance in line with models from other data sets, with some variability depending on the machine learning algorithm. We also hypothesized that nonlinear machine learning methods would perform qualitatively better than linear methods because of their ability to model complex nonlinear relationships.
Participants
The NAPLS-3 is an NIMH-funded study conducted at nine sites. All participants provided written informed consent, including parental consent for minors. The study was approved by all sites’ institutional review boards.
A detailed description of NAPLS-3 participants (including exclusion criteria) is provided in Addington et al. (4). Briefly, 710 clinical high-risk individuals and 96 healthy control individuals were recruited and followed for up to 2 years, with some longer exceptions (see Results). Participants were between 12 and 30 years old. Predictors included those used by the NAPLS-2 calculator (riskcalc.org/napls; see Table 1 for the full list). Because a recent study found that salivary cortisol improved prediction in the NAPLS-2 (5), we examined models both with and without cortisol as a predictor. Healthy control participants and clinical high-risk participants who lacked follow-up data were not included in the machine learning models.
Consistent with prior work (6), conversion to psychosis was defined as meeting the Presence of Psychotic Symptoms criteria: at least one of the five positive symptoms on the SIPS Scale of Psychosis-Risk Symptoms must reach a psychotic level of intensity (a rating of 6) for ≥1 hour per day, 4 days per week, during the past month, and/or these symptoms must seriously impact the individual’s functioning.
Analyses
First, as performed previously (2), a Cox proportional hazards regression analysis was conducted with these predictors (SAS, version 9.4) to examine consistency with prior NAPLS-2 findings.
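For orientation, a minimal sketch of this step in Python (using the lifelines package in place of SAS) is given below; the file name and the column names days_to_event and converted are hypothetical placeholders for the NAPLS-3 variables, not the study’s actual code.

```python
# Minimal sketch of the Cox proportional hazards step, assuming a data
# frame with one row per clinical high-risk participant. File and column
# names are hypothetical; lifelines stands in for SAS v.9.4.
import pandas as pd
from lifelines import CoxPHFitter

# Each row: the NAPLS-2 calculator predictors plus follow-up time and outcome.
df = pd.read_csv("napls3_chr_predictors.csv")

cph = CoxPHFitter()
cph.fit(
    df,
    duration_col="days_to_event",  # days from baseline to conversion or censoring
    event_col="converted",         # 1 = converted to psychosis, 0 = censored
)
cph.print_summary()  # hazard ratios, p-values, and Harrell's concordance index
```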
For machine learning, standard algorithms were employed using Weka software (University of Waikato, New Zealand): logistic regression, naive Bayes, a support vector machine (three kernels), KStar, a J48 decision tree, random forest, decision stump (with 100 iterations of AdaBoost), and a multilayer perceptron. Classifier accuracies were calculated by averaging performance across 100 random splits of 90% training data and 10% test data for each algorithm. Individuals with missing data were excluded from analysis. Because of class imbalance (see Results), training data for the minority (converter) class were upsampled prior to model fitting using the Synthetic Minority Oversampling Technique (SMOTE) (7); the minority class was oversampled by 400%, with k (the number of nearest neighbors) set to 5. We also determined feature importance rankings for the best-performing classifier based on each feature’s contribution to the area under the receiver operating characteristic curve. A sketch of this evaluation scheme is shown below.
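The Weka workflow itself is interactive, but the evaluation scheme can be illustrated in Python with scikit-learn and imbalanced-learn. This is a sketch under assumed file and column names, not the original pipeline; note that SMOTE is applied only to the training portion of each split, as in the study.

```python
# Illustrative sketch of the evaluation scheme: 100 random 90/10
# train/test splits, with 400% SMOTE oversampling (k=5) of the converter
# class applied to training data only. File/column names are hypothetical.
import numpy as np
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("napls3_chr_predictors.csv").dropna()  # exclude missing data
X = df.drop(columns=["converted", "days_to_event"]).to_numpy()
y = df["converted"].to_numpy()  # 1 = converter, 0 = non-converter

accuracies = []
for seed in range(100):  # 100 random 90%/10% assortments
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.10, stratify=y, random_state=seed
    )
    # 400% oversampling: grow the converter class to 5x its original size
    # (the originals plus four synthetic examples per original).
    n_minority = int(y_tr.sum())
    smote = SMOTE(
        sampling_strategy={1: n_minority * 5}, k_neighbors=5, random_state=seed
    )
    X_tr, y_tr = smote.fit_resample(X_tr, y_tr)
    clf = RandomForestClassifier(random_state=seed).fit(X_tr, y_tr)
    accuracies.append(accuracy_score(y_te, clf.predict(X_te)))

print(f"mean test accuracy across splits: {np.mean(accuracies):.3f}")
```

Feature rankings analogous to the AUC-based rankings reported here could be approximated with, for example, scikit-learn’s permutation_importance, although the exact procedure used in Weka may differ.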
Results
Demographic and clinical information for participants (including the healthy control group) is provided in Tables 1 and 2. As previously reported (8), relative to the healthy control group, clinical high-risk participants had lower Brief Assessment of Cognition in Schizophrenia (BACS) symbol coding and Hopkins Verbal Learning Test (HVLT) scores, more trauma, a greater decrease in social functioning over the past year (i.e., prior to baseline), more undesirable life events, higher SIPS delusions plus suspiciousness scores, and higher salivary cortisol. A higher percentage of clinical high-risk participants also had a first-degree relative with psychosis.
Of the 598 clinical high-risk participants with complete data, 62 converted to psychosis over the course of the follow-up period and 536 did not. The average time from baseline to conversion was 278 days (range, 4–1,361 days). Four clinical high-risk individuals converted more than 2 years after their baseline assessment.
The Cox regression model without cortisol was statistically significant (likelihood ratio χ2=26.04, p=0.001; Harrell’s concordance index=0.70 [SE=0.04]; mean specificity [across time]=0.67; mean sensitivity=0.62; mean PPV=0.15; mean NPV=0.95). Including cortisol did not substantially improve the model (likelihood ratio χ2=24.93, p=0.003; Harrell’s concordance index=0.70 [SE=0.03]; mean specificity=0.54; mean sensitivity=0.75; mean PPV=0.14; mean NPV=0.96).
Performance metrics for each machine learning algorithm are provided in Table 3. Briefly, all models performed significantly above chance. The algorithm with the best overall performance was random forest. Including cortisol as a predictor did not appreciably alter the performance metrics of most algorithms. For the random forest algorithm, the features in descending order of importance were: baseline SIPS P1 plus P2 score (delusions plus suspiciousness), HVLT raw score, number of undesirable life events, number of trauma types, BACS symbol coding raw score, decrease in global social functioning over the past year, age, having a first-degree relative with psychosis, and cortisol.
Discussion
As expected and previously reported (8), clinical high-risk participants in the NAPLS-3 were more likely than healthy control participants to have a first-degree relative with psychosis and had worse neurocognition, more trauma and deleterious life events, a greater decrease in social functioning prior to baseline, and higher levels of psychotic symptoms. Clinical high-risk participants also had higher cortisol, possibly indicative of greater chronic stress relative to the healthy control group. Cox regression performance was comparable to previous clinical high-risk studies (2, 9). Logistic regression performance (66%–68% accuracy, depending on inclusion of cortisol) was in line with prior studies (2, 3, 5, 9–13). All machine learning algorithms performed above chance, with accuracies of 65% or higher. As hypothesized, linear methods (Cox regression, logistic regression, support vector machine) performed worse than most nonlinear methods (e.g., random forest). Furthermore, the highest-performing algorithm (random forest, with or without cortisol) achieved ∼90% accuracy while maintaining >75% sensitivity and >85% specificity, PPV, and NPV. Baseline SIPS delusions plus suspiciousness score was the most important predictor.
Although it was expected that all algorithms would perform better than chance at predicting conversion to psychosis in clinical high-risk individuals, it was somewhat surprising that the best algorithm (random forest) performed at such a high level, given that previous studies suggest these features predict conversion with accuracies (or metrics related to accuracy, e.g., concordance indices) between ∼70% and 80% (2, 3, 5, 9–13). Notably, however, the majority of these studies used regression-based modeling to predict conversion (logistic regression performed worse than most other methods in this study), and none used the random forest algorithm. What aspects of the random forest may have enhanced performance to this degree? First, unlike most classifiers, a random forest is an “ensemble” classifier, in which the predictions (converter or non-converter) of several decision trees are tallied and the majority vote is used to make the overall prediction (14).
The individual trees are built from random combinations of features, such that each tree casts its vote independent of, and decorrelated from, all the others. The decision boundary induced by a random forest is therefore highly nonlinear compared with some other methods (e.g., logistic regression). Because not all features are used in each tree, the random forest is relatively immune to the “curse of dimensionality,” in which increasing the number of features causes overfitting unless the sample size is increased exponentially in parallel. Averaging the votes of the decision trees also helps reduce overall variance. As the generalizability of this performance enhancement is unclear, an interesting future direction would be to apply the random forest algorithm to predict conversion in clinical high-risk individuals using other data sets (e.g., the NAPLS-2). A conceptual sketch of the ensemble mechanism appears below.
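To make the voting mechanism concrete, the toy implementation below (a simplified illustration, not the Weka implementation) trains each tree on a bootstrap sample and a random subset of features and predicts by majority vote. Production implementations typically re-randomize the candidate features at every split rather than once per tree, and the class ToyRandomForest and its parameters are hypothetical. Labels are assumed binary (converter=1, non-converter=0).

```python
# Conceptual sketch of a random forest: decorrelated trees, majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class ToyRandomForest:
    def __init__(self, n_trees=100, n_features=3, seed=0):
        self.n_trees, self.n_features = n_trees, n_features
        self.rng = np.random.default_rng(seed)
        self.trees, self.feat_idx = [], []

    def fit(self, X, y):
        n, p = X.shape
        for _ in range(self.n_trees):
            rows = self.rng.integers(0, n, size=n)               # bootstrap sample
            cols = self.rng.choice(p, self.n_features, replace=False)  # feature subset
            self.trees.append(DecisionTreeClassifier().fit(X[rows][:, cols], y[rows]))
            self.feat_idx.append(cols)
        return self

    def predict(self, X):
        # Tally each tree's vote (converter=1 / non-converter=0) and
        # return the majority decision for each participant.
        votes = np.stack([t.predict(X[:, c])
                          for t, c in zip(self.trees, self.feat_idx)])
        return (votes.mean(axis=0) > 0.5).astype(int)
```

Because each tree sees only a few features, adding predictors does not force every tree into a higher-dimensional space, which is one intuition for the method’s resistance to the curse of dimensionality noted above.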
Limitations of the present analyses include the small sample size (particularly for converters) and the heterogeneity of sample outcomes (time to conversion ranged from 4 to 1,361 days). The imbalanced data set also necessitated a minority class oversampling procedure (SMOTE) to prevent models from defaulting to predicting the majority class (results without SMOTE showed poor sensitivity and PPV [data not shown]). The converter/non-converter distribution used for training models in this study may therefore not be representative of the general clinical high-risk population. Our results also require replication in an independent data set to determine whether overfitting occurred during machine learning as a result of SMOTE. Overall, however, the relatively high performance of random forest and other methods suggests that when features selected from previous, independent studies are combined with modern machine learning methods, clinical outcome prediction may approach the performance standards needed for a predictive biomarker that provides early identification of individuals likely to transition to psychosis. Provided these results can be replicated in other clinical high-risk data sets, researchers can thus begin searching for the primary causes of this transition while preparing for delivery of palliative care. In the context of study limitations, when asking “are we there yet?” with regard to the development of predictive biomarkers for psychiatric practice, the answer may be, “We’re on the way, but we need more data.”