Mobile phone texting is gaining popularity as a promising method for delivering health care outside brick-and-mortar clinics (1). Texting may be particularly well suited for psychosocial interventions, in which language is the primary vehicle for administering treatment. Texting interventions have been deployed successfully among clients with a range of mental health conditions, including serious mental illness (2). People with serious mental illness in the United States have access to text-capable mobile phones at a level only slightly lower than that of the general population; in addition, surveys suggest that most people with serious mental illness actively use text messaging and are interested in text-based interventions (3). Recent research has demonstrated that augmentation of routine services with texting is feasible, acceptable, and clinically useful among people with serious mental illness (4).
Although technology-mediated services are feasible, a lack of training and clinician expertise presents a barrier to their adoption for treatment of serious mental illness in community settings (5). Opportunities for training in technology-based services are limited, and providers adopting these technologies commonly express a need for increased support (6). When available, training and supervision in the use of novel interventions are resource intensive and may preclude adoption of texting interventions (7). New methods to monitor and support clinicians who offer services via texting are needed to encourage consistent intervention delivery, reduce human resource requirements, and allow more patients to receive care via this modality.
Natural language processing (NLP) is the computational analysis of human language and may augment delivery of text-based care. NLP has shown promise for identifying cognitive distortions expressed by individuals receiving support for mood disorders (8, 9). Cognitive distortions are systematic errors in thinking that affect how individuals make sense of the world (10). Among people with serious mental illness, distortions exacerbate the negative effects of psychiatric symptoms (11, 12). Identifying and restructuring these distortions is an important goal of cognitive-behavioral therapy (CBT). Access to and engagement in high-fidelity CBT interventions that ameliorate distortions are limited for individuals with serious mental illness (13). Training clinicians to provide high-quality CBT via text messaging can help alleviate this issue and successfully augment in-clinic services (2, 4). Automated methods for proactively flagging distortions in text-based interactions may assist clinicians who provide CBT in reducing distortion-related psychiatric symptoms.
We are not aware of previous work that has applied NLP methods to text-based interactions between clinicians and clients receiving treatment for serious mental illness. However, prior work that used machine learning methods to identify distortions in data from other populations informed our study. Shickel et al. (8) identified cognitive distortions in journal entries from college students enrolled in a self-guided online intervention and in crowdsourced samples from participants who were prompted to provide examples of distortions. Shreevastava and Foltz (9) explored identification of cognitive distortions in publicly available therapist question-and-answer transcripts. Whether NLP can be applied to text-based client-clinician interactions and successfully detect cognitive distortions is an open research question.
The purpose of this study was to test whether NLP techniques can automatically identify distortions in text-message exchanges between clients with serious mental illness and their therapists at a level similar to that of trained clinicians. We hypothesized that a fine-tuned NLP classifier would identify distortions with precision and recall, operationalized as an F1 score (range=0–1, with higher scores indicating better precision and recall), comparable to those of trained clinician annotators.
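For reference, the F1 score is the harmonic mean of precision and recall:

\[
\mathrm{F1} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
\]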
Methods
Data were collected from 39 clients enrolled in the active intervention arm of a randomized controlled trial between December 2017 and October 2019. Seven clinicians from mental health agencies in the Midwest and Pacific Northwest provided the texting intervention. Clients who contributed data to the current analysis had diagnoses of schizophrenia, schizoaffective disorder, major depressive disorder, or bipolar disorder. Most were male (N=22, 56%), with a mean±SD age of 45.4±11.1 years, a mean of 12.8±2.4 years of education, and a mean of 2.8±3.4 lifetime psychiatric hospitalizations. A full description of the trial, including intervention feasibility, acceptability, engagement, and clinical outcomes, was recently published in Psychiatric Services (4). The study was approved by the University of Washington's Institutional Review Board, and clients provided informed consent. As part of the study, clients participating in standard care engaged with trained clinicians in back-and-forth text-message conversations for 12 weeks. In total, 14,312 messages were sent, with 7,354 coming from clients (mean±SD=188.6±226.4 messages per client). Clients had variable levels of engagement, with the average number of client messages ranging from 0.3 to 12.5 messages per day. All data were stored on a secure server, and client and clinician identifiers were removed before annotation and analysis.
We created a data set of text messages annotated for five common cognitive distortion types: mental filtering, jumping to conclusions, catastrophizing, should statements, and overgeneralizing (14). To do this, we selected distortion types noted to play a role in psychosis (e.g., jumping to conclusions) and those used in previous automated classification studies (8, 9, 11). Next, we developed an annotation guide that provides definitions and examples of these distortions to facilitate labeling of text messages by two clinically trained annotators (15). One annotator holds a psychology master's degree (A.C.), and the other is a doctoral-level licensed clinical mental health counselor (J.S.T.). The annotators labeled client text messages as representing any of the five distortion types or as no distortion (see the online supplement to this report for definitions and examples). Multiple labels were assigned to messages when more than one distortion type was present. Annotators were trained to use the annotation guide and practiced coding transcripts that contained all messages from three randomly selected clients. Interrater discrepancies were discussed, code definitions were refined, and agreement was reached about how to apply codes. Annotators then coded messages from a sample of 23% (N=9) of clients (2,318 messages) and reached a moderate level of agreement (Cohen's κ=0.51), a level comparable with previous work that used human annotation of cognitive distortions (8). All remaining transcripts were then equally divided between annotators for coding.
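For readers implementing a similar annotation pipeline, chance-corrected agreement of this kind can be computed with scikit-learn; the sketch below uses hypothetical label vectors as stand-ins for the two annotators' judgments, not the study's data.

```python
# Minimal sketch: interrater agreement (Cohen's kappa) on a binary
# "any distortion" label. The label vectors are hypothetical stand-ins,
# not the study's annotations.
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]  # first annotator's labels
annotator_b = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]  # second annotator's labels

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # chance-corrected agreement, range -1 to 1
```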
We created three NLP distortion classification models for comparison with human rater performance. The first model used bidirectional encoder representations from transformers (BERT) (16). BERT allows for pretraining on large quantities of unlabeled text and fine-tuning model parameters to specific classification tasks. BERT fine-tuning is an effective transfer learning strategy that has achieved state-of-the-art performance in many classification tasks and domains (17). Because BERT has not been previously used for identifying cognitive distortions, we also included the two best-performing models from prior cognitive distortion work with which to compare our BERT model: logistic regression with term frequency–inverse document frequency features (LR) and support vector machines with input features generated by sentence-BERT without fine-tuning (SVM) (8, 9). Using the NLP models, we framed classification of the five cognitive distortions and an "any distortion" label, indicating whether one or more of the five cognitive distortions were present, as a multilabel binary classification task.
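For readers who wish to assemble a comparable setup, the sketch below shows one way to cast the two baselines as a multilabel task with scikit-learn and the sentence-transformers package. The example messages, labels, encoder checkpoint, and model settings are illustrative assumptions, not the study's data or tuned configurations.

```python
# Illustrative sketch of the two baselines (LR and SVM) as multilabel
# binary classification over six targets: five distortion types plus
# "any distortion". All data below are hypothetical examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sentence_transformers import SentenceTransformer

texts = [
    "The whole day was ruined by that one comment.",      # mental filtering
    "He didn't text back, so he must be angry with me.",  # jumping to conclusions
    "If I fail this, my whole life is over.",             # catastrophizing
    "I should be doing much better than this.",           # should statement
    "Nothing ever works out for me.",                     # overgeneralizing
    "Thanks, see you at the appointment.",                # no distortion
]
# One binary column per target: [filtering, jumping, catastrophizing,
# should, overgeneralizing, any distortion]
labels = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

# Baseline 1 (LR): logistic regression over TF-IDF features,
# with one binary classifier per label.
lr_model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
lr_model.fit(texts, labels)

# Baseline 2 (SVM): SVM over frozen sentence-BERT embeddings; the
# encoder is not fine-tuned, and the checkpoint name is an assumption.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(texts)
svm_model = OneVsRestClassifier(SVC(kernel="linear")).fit(embeddings, labels)

print(lr_model.predict(["I always mess everything up."]))  # one row of six 0/1 flags
```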
Models were evaluated with a nested cross-validation procedure (18). We randomly split the annotated data set into five folds (1, 2, . . . 5). These folds were used to split the data into train (D_train), validation (D_val), and test (D_test) sets. There was no overlap between the messages in the train and test sets; however, a given client could appear in both the training and test sets. For each split of the data, we tuned hyperparameters by training candidate configurations on D_train and evaluating them on D_val. After hyperparameters were tuned, final model performance was assessed by combining the D_train and D_val splits to train a final model, whose performance was evaluated on the previously unused D_test split. As a form of repeated holdout testing, we iterated over the folds assigned to D_train, D_val, and D_test, retuning the hyperparameters for each iteration. For example, fold assignments for iteration 1 were D_train={1, 2, 3}, D_val={4}, and D_test={5}; fold assignments for iteration 2 were D_train={2, 3, 4}, D_val={5}, and D_test={1}; and fold assignments for iteration 5 were D_train={5, 1, 2}, D_val={3}, and D_test={4}. We assessed the performance of the aggregated D_test predictions with F1 scores and receiver operating characteristic (ROC) area under the curve (AUC) values. Hyperparameters were optimized to maximize the average F1 (macro F1) score across the six binary targets. The F1 score is a standard performance metric in NLP evaluation, calculated as the harmonic mean of precision and recall. The selected model configurations are summarized in the online supplement.
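The fold rotation can be sketched in a few lines. The loop below reproduces the iteration examples given above; model fitting and prediction are left as comments because they are model specific, and the scoring call shown is one conventional way to compute macro F1 with scikit-learn.

```python
# Minimal sketch of the fold-rotation scheme described above: five folds,
# three for training, one for validation, one for testing, shifted by one
# each iteration.
folds = [1, 2, 3, 4, 5]

for i in range(5):
    rotated = folds[i:] + folds[:i]
    d_train, d_val, d_test = rotated[:3], rotated[3], rotated[4]
    print(f"iteration {i + 1}: D_train={d_train}, D_val=[{d_val}], D_test=[{d_test}]")
    # 1. Tune hyperparameters: train candidates on d_train, score on d_val.
    # 2. Retrain the selected configuration on d_train + d_val.
    # 3. Predict on d_test and pool predictions across all five iterations.

# Pooled test predictions are then scored with, e.g.,
# sklearn.metrics.f1_score(y_true, y_pred, average="macro"),
# which averages per-label F1 over the six binary targets.
```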
We used the Shapley additive explanations (SHAP) package to identify portions of client messages that most influenced classification decisions and to demonstrate the interpretability of the BERT model (19). The online supplement provides examples of SHAP plots for each cognitive distortion type, in which each word is evaluated for its impact on the model output.
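As an illustration of this workflow, the sketch below applies the SHAP text explainer to a Hugging Face text-classification pipeline. The public sentiment checkpoint is a stand-in, since the study's fine-tuned BERT classifier is not assumed to be available, and the example message is hypothetical.

```python
# Illustrative sketch of token-level attribution with the SHAP package.
# The checkpoint below is a public stand-in for the study's fine-tuned
# BERT distortion classifier.
import shap
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    top_k=None,  # return scores for every label
)

explainer = shap.Explainer(classifier)  # text masker over input tokens
shap_values = explainer(["Nothing I do ever works out."])  # hypothetical message

# Render a plot in which each word is colored by its contribution to the
# model output, analogous to the supplement's SHAP plots.
shap.plots.text(shap_values)
```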
Discussion
This study demonstrated that NLP models can identify cognitive distortions in text messages between people with serious mental illness and clinicians at a level comparable to that of clinically trained raters. Our BERT implementation outperformed previously published NLP methods for detecting cognitive distortions in text. BERT's performance in this task is attributable to the transfer learning paradigm, in which a rich representation of language is learned from unlabeled text (pretraining) and then adapted to the target task (fine-tuning). Our work extends previous research by confirming that distortions can be detected in text messages exchanged during text-based care for people with serious mental illness.
A key strength of this study is that the data came from real-world communications between clients and clinicians that focused on management of symptoms of serious mental illness. There were no prompts to elicit distortions in these interactions, which allowed our classification model to be trained on the natural language used by clients as they typically communicate. This study is limited by its size (number of clients, therapists, and messages). Although the study's BERT model demonstrated strong binary classification of the "any distortion" label and the most commonly occurring distortions, none of the NLP models was as successful in identifying distortion types with fewer instances in the data set. As in prior cognitive distortion work, classification performance was limited by the challenge of manually annotating distortions (8).
Additional work with an expanded data set is needed to assess generalizability to a diverse client population, including clients not represented in the training data and clients engaged in other text-based messaging interventions. Incorporating previously sent messages may also prove beneficial to model performance by ensuring that predictions are grounded in the context of entire conversations.