The decision to admit a child or youth to an acute psychiatric unit has both economic and personal consequences. Inpatient psychiatric units are among the most expensive and restrictive treatment options, and hospitalized youths are more likely to experience stigma (
1). Amid recent discussion of systems-of-care approaches for youths with psychiatric conditions (
2), and questions about the benefits of hospitalization, mobile crisis services have been increasingly recognized as a critical component of the service array (
Mobile crisis services decrease hospitalization rates and per-youth mental health spending, increase youths’ time in the community, and maintain safety (
4,
5). For mobile crisis services to function effectively, however, clinicians must be able to determine efficiently and accurately who requires hospitalization. Moreover, community-based crisis services must have the capacity to target acute needs of children and youths in psychiatric crisis as a means of reducing inpatient admission rates. To better understand the characteristics of children and youths who are in psychiatric crisis and who require higher levels of care, this study aimed to identify factors associated with clinician decision making in regard to admitting a child or youth to a psychiatric hospital.
Previous research on the relationship between psychiatric hospitalization and patient factors has included lifetime nonacute factors and imminent acute factors. As lifetime nonacute factors, increasing age and male gender have both been associated with increased hospitalization among youths (
6,
7), whereas studies on the influence of race-ethnicity on hospitalization have reported mixed outcomes (
8). Having one or more clinical diagnoses of, for example, mood or psychotic disorders has also been related to increased hospitalization (
6,
9). However, criteria for inpatient admission typically focus on imminent risk behaviors. Youths with severe emotional disturbance, psychosis, substance use, low cognitive functioning, conduct problems, nonsuicidal self-injury, or suicidal thoughts or behaviors are all at increased risk for psychiatric hospitalization (
10–
12). Although demographic, diagnostic, and clinical characteristics are all associated with a youth’s risk for hospitalization, whether these characteristics provide clinically meaningful information to aid in decision making in crisis settings remains unknown.
Limitations of previous studies have prevented researchers from determining meaningful factors associated with increased risk for hospitalization among children and youths. First, many factors identified in previous studies were distal to the decision of inpatient admission. Historical factors may not have good predictive utility in aiding a clinician in deciding whether to hospitalize a youth. Second, retrospective billing data may not provide a clear temporal order of a youth’s behavior and subsequent hospitalization. Any associations identified in an analysis of such data could confound the reason for the acute hospitalization with the clinical diagnosis a youth received at the hospital. Third, earlier studies have primarily examined main or additive effects of predictors. This statistical approach follows the principle of parsimony and emphasizes parameter interpretability. However, factors may also interact with each other to create nonlinear, multiplicative increases in risk, which are usually not captured in simpler models. When traditional predictive models, such as linear regression, are applied to complex real-world data, these statistical limitations may weaken predictive performance and accuracy.
A larger set of factors occurring before a psychiatric hospitalization needs to be considered to provide clinically applicable risk estimates that can aid in the decision to hospitalize a child or youth. Machine learning is one approach to modeling the complexity of multiple clinical assessments; it has been successfully applied in other areas of medicine and is increasingly being used in mental health (
13–
15). The purpose of this retrospective study was to model clinicians’ decisions about whether to hospitalize a child or youth in the setting of mobile crisis services. By using a random-forest model, we aimed to identify factors critical to this complex decision-making process. We addressed temporal limitations in the literature by restricting this machine-learning approach to data available at the time of the hospitalization decision.
Methods
Data
Data were deidentified for research use and abstracted from a state public mental health system’s electronic health record (EHR). The data consisted of 5,434 assessments by mobile crisis response teams (MCRTs) in two metropolitan areas in Nevada (Las Vegas and Reno) between January 2015 and May 2019. Assessments missing all demographic and clinical variables (N=648, 11.9%) were removed from this study, resulting in a final sample of 4,786 assessments. The completed assessments in the final sample represented 4,338 unique children and youths. Models were trained and validated on assessment-level data, with each crisis response considered a unique decision. Inclusion criteria for this data set were MCRT response to a hotline call and collection of systematic data. We specified no exclusion criteria. This study was approved by the institutional review board of the University of Nevada, Las Vegas.
Procedure and Measures
Demographic and clinical factors.
Master’s level MCRT clinicians completed the crisis assessment tool (CAT) (
16). The CAT is a standardized, semistructured assessment and was completed with the children and youths and their caregivers to assess risk behaviors (e.g., suicidal risk, danger to others, or runaway), current psychiatric symptoms (e.g., psychosis, impulsivity or hyperactivity, or depressive symptoms), functioning problems (e.g., at home or in the community), child protective services involvement, and caregiver needs and strengths (e.g., a caregiver’s health and involvement with care). The clinicians rated these factors on a scale of increasing severity or need from 0, no evidence of behavior or symptom/need; 1, history of behavior or symptom/watchful waiting or prevention; 2, behavior or symptom present but not acute or severe/action needed to meet need; to 3, acute or severe behavior or symptom present/immediate or intensive action needed to meet need. Previous studies have suggested that the CAT can be used to measure children’s mental health needs and clinical outcomes (
17,
18). The clinicians also provided
DSM-5 diagnoses. Psychiatric caseworkers collected demographic information about the children and youths, including gender, age, and race-ethnicity. The data set consisted of 54 demographic and clinical variables.
Psychiatric hospitalization.
After the evaluation, the MCRT consulted with a licensed clinical supervisor to determine whether the child or youth should be hospitalized, receive high-intensity stabilization from the MCRT, be connected to outpatient treatment resources, or receive no further services. Children and youths admitted to an inpatient unit were classified as hospitalized (N=638, 13%); all others were classified as not hospitalized (N=4,148, 87%).
Analytic Plan
All analyses were conducted in R (
19). Models were fit by using the caret package (
20). Among all available model algorithms, we chose “random forest” and specified the “ranger” method, which provides a faster implementation of random forests (
21). Some EHR records were missing data on certain variables because the clinician had not asked for the information, the record form was not completed, or the data had not been entered. Before the analysis, missing values on clinical variables were imputed by using the K-nearest–neighbor algorithm to retain the maximum number of observations in the analyses (
20,
22). Random-forest models were then generated to explore which demographic or clinical factors best classified children and youths who were or were not hospitalized after the MCRT assessment.
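As a rough illustration, the imputation step could be set up with caret’s preProcess() along the following lines; this is a minimal sketch in which the data frame and column names are hypothetical and k=5 is caret’s default rather than a value reported in this study.

```r
# Hypothetical sketch of K-nearest-neighbor imputation with caret::preProcess().
library(caret)

set.seed(2019)

# 'cat_items' is assumed to be a data frame of numeric CAT ratings (0-3)
# containing missing values; preProcess() learns the imputation from it.
impute_model <- preProcess(cat_items, method = "knnImpute", k = 5)

# Apply the imputation; note that knnImpute also centers and scales the variables.
cat_items_imputed <- predict(impute_model, newdata = cat_items)
```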
We first trained random-forest algorithms to fine-tune the parameters and evaluate model performance. Models were trained with data from assessments completed between January 2015 and July 2018 (training sample, N=3,640) and were externally validated with data from assessments completed between August 2018 and May 2019 (testing sample, N=1,146).
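The temporal split could be implemented as sketched below; the data frame “crisis” and its columns (assess_date, hospitalized) are hypothetical names used for illustration, not variables from the study’s EHR extract.

```r
# Hypothetical sketch of the temporal train/test split used for external validation.
library(dplyr)

crisis <- crisis %>%
  mutate(
    assess_date  = as.Date(assess_date),
    # Outcome coded as a two-level factor; caret requires valid R level names.
    hospitalized = factor(hospitalized, levels = c("no", "yes"))
  )

train_set <- filter(crisis, assess_date <= as.Date("2018-07-31"))  # Jan 2015-Jul 2018
test_set  <- filter(crisis, assess_date >  as.Date("2018-07-31"))  # Aug 2018-May 2019
```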
Random-forest analyses consolidate multiple classification and regression trees (CART) into a single model to optimize accuracy of the prediction (
23). A single CART consists of an exhaustive search of all predictors and cutoff points to create a series of binary, recursive partitions that are based on an individual predictor’s ability to create homogeneous subgroups (i.e., nodes). However, a single tree-based CART algorithm is usually unstable, tends to generalize poorly, and overfits the training sample (
To overcome the instability of a single model, 500 trees were fit in the random-forest model, and model training included repeated internal cross-validation (10 folds repeated 10 times). In addition, the random-forest algorithm repeatedly resampled the training data and the predictor set, so that each tree was grown on a unique combination of observations and predictors (
25,
26). (Details on model preprocessing are available in the
online supplement to this article.) Class imbalance in the criterion variable can result in models with good specificity but poor sensitivity (
27). Therefore, we also conducted analyses using the upsampling technique, which randomly duplicated cases of hospitalization (i.e., minority class) to correct for imbalances in the data (
20). The comparison between the uncorrected and upsampled models was beyond the scope of this study (see the upsampling results in the
online supplement). The 20 most important variables were identified by using permutation importance, which was computed with out-of-bag data to measure the prediction strength of each variable (
28).
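A minimal sketch of such a model fit with caret and ranger appears below. The 500 trees, the repeated 10-fold cross-validation, the optional upsampling, and the permutation importance follow the description above; the formula, object names, and tuning-grid length are assumptions for illustration only.

```r
# Hypothetical sketch of the random-forest fit with caret + ranger.
library(caret)
library(ranger)

set.seed(2019)

ctrl <- trainControl(
  method  = "repeatedcv",            # internal cross-validation
  number  = 10,                      # 10 folds ...
  repeats = 10,                      # ... repeated 10 times
  classProbs = TRUE,
  summaryFunction = twoClassSummary
  # sampling = "up"                  # optional upsampling of the minority class
)

# 'train_set' is assumed to contain only the outcome and the predictor variables.
rf_fit <- train(
  hospitalized ~ .,
  data       = train_set,
  method     = "ranger",
  num.trees  = 500,                  # 500 trees, as described above
  importance = "permutation",        # permutation importance computed on out-of-bag data
  metric     = "ROC",
  trControl  = ctrl,
  tuneLength = 5                     # illustrative tuning-grid size
)
```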
Several global statistics were used to evaluate model performance, including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), area under the curve (AUC), Cohen’s kappa coefficient, and Brier score. (Descriptions of the range of each performance statistic are available in the online supplement.) Finally, we examined whether the best training model could be externally validated with an independent testing sample, because external validation provides a stronger test of generalization. Externally validated models were also evaluated by sensitivity, specificity, PPV, NPV, AUC, Cohen’s kappa coefficient, and Brier score.
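Most of these statistics can be obtained from the held-out testing sample as sketched below; object names follow the earlier sketches and are hypothetical.

```r
# Hypothetical sketch of evaluating the model on the independent testing sample.
library(caret)
library(pROC)

pred_class <- predict(rf_fit, newdata = test_set)                 # predicted classes
pred_prob  <- predict(rf_fit, newdata = test_set, type = "prob")  # class probabilities

# Sensitivity, specificity, PPV, NPV, and Cohen's kappa
confusionMatrix(pred_class, test_set$hospitalized, positive = "yes")

# Area under the ROC curve
auc(roc(response = test_set$hospitalized, predictor = pred_prob$yes))

# Brier score: mean squared difference between predicted probability and outcome
mean((pred_prob$yes - as.numeric(test_set$hospitalized == "yes"))^2)
```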
Results
Descriptive Statistics
Table 1 presents the demographic characteristics of the children and youths in psychiatric crisis in the training and testing samples. The children and youths ranged in age from 4.0 to 19.5 years (mean±SD=14.0±2.7 years), and 56% were female. Approximately 62% (N=2,985) identified as White, 23% (N=1,081) as African American, 4% (N=210) as Asian American, 1% (N=65) as Pacific Islander or Native Hawaiian, 1% (N=28) as American Indian or Alaska Native, and 9% (N=417) as unknown or did not disclose. About 38% (N=1,824) identified as Hispanic.
As indicated by effect sizes in
Table 1, relative to the testing sample, the training sample contained significantly fewer children and youths who identified as African American or American Indian or Alaska Native, and more children and youths who reported Hispanic or unknown racial-ethnic identity. Other demographic characteristics did not significantly differ between the two samples. Descriptive statistics for psychiatric diagnoses and clinical factors are available in Tables S1 and S2 of the
online supplement.
Training Models
In the training data, random-forest analyses identified a model consisting of 16 predictors, with a minimum node size of 36. As shown in
Table 2, the random-forest model in the training data provided excellent discrimination (AUC=0.91). This model had high specificity (0.98) and poor sensitivity (0.35), likely resulting from class imbalance in the criterion variable (
27). This model also had a PPV of 0.77 and an NPV of 0.91. A Cohen’s kappa coefficient of 0.45 indicated moderate agreement between predicted and observed hospitalization. A Brier score of 0.05 indicated excellent calibration in the training data. See Table S3 in the
online supplement for model performance conducted with the upsampling technique to correct for imbalance in the data.
Figure 1 shows the 20 variables the model identified as being the most important in clinicians’ decision making regarding whether to hospitalize a child or youth in crisis. Clinical variables indicating increased safety concerns regarding self or others were critical in crisis assessment and included increased suicide risk, poor judgment or decision making, danger to others, impulsivity, runaway behavior, other risky behavior, and nonsuicidal self-injury. Psychiatric symptoms, such as psychosis, depressive symptoms, sleep problems, and oppositional behavior, were also identified as important variables. Two functional impairment domains were critical: functioning at home and in peer relationships. Psychiatric diagnoses identified as important in clinicians’ decision making were depressive disorders and schizophrenia spectrum disorders; age was the sole demographic characteristic identified as important.
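A ranked importance plot of this kind could be produced from a caret model object along the following lines; this sketch assumes the hypothetical rf_fit object from the earlier sketches and is not necessarily how Figure 1 was generated.

```r
# Hypothetical sketch of extracting and plotting the 20 most important variables.
library(caret)

imp <- varImp(rf_fit, scale = FALSE)  # permutation importances from the ranger fit
plot(imp, top = 20)                   # dot plot of the 20 highest-ranked variables
```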
External Validation
The random-forest model was externally validated in a testing sample that was independent from the training sample (
Table 2). The model in the testing sample provided excellent discrimination (AUC=0.92), high specificity (0.99), and poor sensitivity (0.37). This model also had high PPV (0.82) and NPV (0.91). A Cohen’s kappa coefficient of 0.46 indicated moderate agreement between predicted and observed hospitalization. A Brier score of 0.07 indicated excellent calibration in the testing data. See Table S3 in the
online supplement for the upsampling model performance in the testing sample. In summary, the identified random-forest model was found to perform well in the independent testing sample.
Discussion
In this study, we used a machine-learning approach to assess factors associated with referral to psychiatric hospitalization of children and youths who used mobile crisis services. Traditional statistical approaches, such as regression analysis, use simpler models that provide parsimony but oversimplify what are often complex relationships in clinical data. A modern, machine learning–based approach could provide more clinically meaningful findings, because it allows for more complex, nonlinear, and interactive modeling of a complex decision-making process. Our random-forest analyses were well calibrated, had good model performance, and were validated in an independent testing sample, in which overall performance remained similar to that in the training sample. This consistency across the training and testing samples likely reflected the consistent decision-making processes implemented by the MCRT. When faced with the decision whether to hospitalize a child or youth, MCRT clinicians focused on a core set of variables, with acute suicidality identified as the most important variable.
Our findings support results from previous research underlining the potential value of using machine-learning approaches in mental health (
14,
15). As expected, acute suicidality was consistently the most important factor associated with psychiatric hospitalization of children and youths. Most of the supporting clinical factors were other high-risk behaviors, such as harming others or running away from home. The importance of clinical diagnoses (i.e., depressive disorders and schizophrenia spectrum disorders) likely reflected the presence of current acute depressive or psychotic symptoms. Random-forest models create interactive predictions, meaning that the importance of lower-order variables depends on their interactions with higher-order variables. MCRT clinicians made different determinations for youths with acute suicidality, depending on consideration of other factors, such as the presence of good judgment. The machine-learning model we developed did not prescribe a single course of action for clinicians conducting crisis assessments. Instead, this approach to modeling decision making indicated that clinicians should not only identify important clinical factors but also consider how these factors interact with each other to increase risk for danger to self or others. For instance, although suicidality was a critical factor in the current data set, MCRT clinicians might lower their concerns if the child or youth demonstrated good judgment, or they might amplify their concerns if the child or youth displayed acute problems with judgment or decision making. Among all variables, demographic factors and psychiatric diagnoses appeared less critical in deciding whether an inpatient admission was needed.
The current findings have clinical implications for psychiatric crisis services. The random-forest model had high specificity, indicating that this model might inform clinicians’ decisions about the need for hospitalization of children and youths in psychiatric crisis. If a youth scores high on multiple important variables, the clinician can be confident in deciding that a higher level of care is needed. On the other hand, the model had low sensitivity, indicating that it was poorer at ruling out the need for psychiatric hospitalization in the absence of identified high-risk factors. The set of variables identified as important may be too constrained to identify all children and youths who need hospitalization. Even if a youth is assessed as having low scores on these particular items, the youth still may require a higher level of care, but the clinician may fail to accurately identify this need. Other factors not captured in this study, such as acute manic symptoms or an inability to care for oneself, could still indicate danger to self or others. Accurately identifying these additional factors is critical to preventing false negatives. Moreover, to reduce hospitalization, community-based services need to address the identified important factors to ensure the effectiveness of crisis stabilization. Although the results of evidence-based assessments are helpful if they predict clinical outcomes (i.e., hospitalization vs. community-based stabilization), they offer additional value if they prescribe a specific treatment (
29). Thus, clinicians who provide stabilization services need to address the identified clinical factors in interventions.
The present findings suggest a need to test the generalizability and reproducibility of this random-forest model and to determine the variables associated with demand for inpatient services in other populations. Several state-funded programs have launched community-based mobile crisis services that also use the CAT or similar tools (
30,
31). Determining whether our model can be used in other mobile crisis programs is a critical next step, because external validation of machine-learning models is often lacking in real-world conditions (
32). Future research may also benefit from integrating the current findings into standardized crisis assessments. Structured clinical decisions improve on unstructured clinical decisions, are more clinician friendly, and usually result in outcomes similar to those of actuarial approaches (
33–
35). Efforts have been increasing to improve the quality and standardization of crisis care in various clinical settings (
36,
37). The recent launch of the 988 Suicide & Crisis Lifeline also aims to improve accessibility and effectiveness of psychiatric crisis care. Creating an algorithm to identify at-risk youths may reduce clinician burden in making high-risk decisions that carry economic costs (e.g., expenses for service and opportunity costs) and noneconomic costs (e.g., stigma, increased distress, and disillusionment with the mental health system) for youths in crisis. For example, building a clinically meaningful decision tree with identified important variables could provide a road map for clinicians who need to quickly make decisions regarding a youth’s safety.
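As one possible illustration of such a road map, a single shallow classification tree could be fit to a handful of the identified variables, as sketched below; the predictor names and tree settings are hypothetical assumptions and are not taken from this study.

```r
# Hypothetical sketch of a simple, clinician-readable decision tree.
library(rpart)
library(rpart.plot)

tree_fit <- rpart(
  hospitalized ~ suicide_risk + judgment + danger_to_others + nssi,  # hypothetical items
  data    = train_set,
  method  = "class",
  control = rpart.control(maxdepth = 3, cp = 0.01)  # shallow tree for readability
)

rpart.plot(tree_fit)  # visualize the decision rules as a road map
```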
Our study had several limitations. First, the model was developed with data from youths seeking MCRT services and therefore was not representative of all youths in psychiatric crisis. For example, data from those presenting directly to an inpatient unit or those who had made very severe suicide attempts were not captured. Systematic differences between these populations might represent an important limitation. Second, the quality of a predictive model depends on the quality of the training data. In our study, the criterion variable, that is, hospitalization, was based on clinical judgment, which is often imperfect and thus may have lowered the reliability of the model (
38). Third, other sources of data would likely have added predictive value but were not included in the present study. For instance, clinically relevant information from unstructured data, such as clinical narratives, was not included. Techniques such as text mining to extract structured features from unstructured data may be a meaningful future step. Fourth, the hospitalization decision (i.e., reference standard) and the decision-making algorithm (i.e., index test) were not independent of each other. In studies of decision making, the strongest test of predictive utility occurs when the reference standard and index test are independent. In contrast, our findings may represent only a statistical model of how the collective decision-making process within MCRTs unfolds.
Conclusions
Determining whether a child or youth needs psychiatric hospitalization is a high-risk clinical decision. An enhanced understanding of the characteristics of youths who use mobile crisis services and require a higher level of care may improve crisis assessment and intervention efforts. The model developed in this study represents one method for assessing factors associated with clinician decision making in regard to hospitalization of youths in crisis. The findings may help clinicians working in crisis settings to formulate a better decision-making process that is based on thoughtful consideration of critical factors and their interplay for youths in psychiatric crisis. This study may also lay the foundation for future work in crisis care, including replication of the results in other populations or settings, creation of decision-making aids to reduce variability in crisis assessment, and development of targeted stabilization interventions to reduce psychiatric hospitalization among youths in crisis.