History of Key Concepts
Real-time monitoring has had a varied adoption across different medical specialties and conditions. In the treatment of some conditions, real-time monitoring has become so routine that to not use it would be a significant deviation from the usual practice and might not even come to mind as an instance of real-time monitoring. For example, in managing insulin therapy for patients with diabetes, an endocrinologist who did not incorporate the use of at-home glucose monitoring into usual practice and instead checked levels at only quarterly office visits would likely be considered to be providing substandard care. Because psychiatric disorders are characterized by their complex integration of both subjective and objective phenomena, the use of real-time monitoring technologies in psychiatry is more complicated, compared with their use in many other specialties.
Several concepts inform our current understanding of and approaches to the use of real-time monitoring in the field of psychiatry (
Table 1). One of the oldest is ecological momentary assessment (EMA), which “involves repeated sampling of subjects’ current behaviors and experiences in real time, in subjects' natural environments” (
1). The principles of EMA were first described by Stone and Shiffman in 1994 and addressed two key limitations of then common assessment methods: that “laboratory studies . . . may not faithfully capture real-world phenomena” and that “[r]etrospective self-report data . . . are subject to a number of biases” (
2). Stone and Shiffman were focused on the collection of data in a research setting; however, these limitations are also present in the typical model of outpatient psychiatric care, wherein patients are seen in the office, asked to provide retrospective reports of their symptomatology, and given a mental status exam. Because the patient reports are retrospective and subjective, they can be subject to several biases, such as recency, novelty, and mood-congruent memory effects (
3). Although the mental status exam provides more objective data, it is limited to those elements that can be assessed in the office setting. For example, a mental status exam may provide objective information about a patient’s thought process and affect regulation, but even if a patient demonstrates adequate functioning in these domains in a quiet office, how this translates to the patient’s day-to-day life is not always clear. Furthermore, the mental status exam provides an assessment at a single point in time. Depending on the frequency of exams, there may be gaps between assessments during which salient changes may occur. Although early efforts in EMA tended to focus on the delivery of prompts to patients via phones and personal digital assistants, it was recognized that these devices could also be used to capture other data streams, such as “audio, video, geographical positioning, and (through attachments) some physiological and biological data” (
3).
Digital exhaust, also described as data exhaust, is a more recently developed concept that has led to the development of new paradigms of data collection. These terms refer to data that are passively generated as byproducts of people’s interactions with digital technologies (e.g., website or mobile app log files) that store granular records of the various actions that users take when interacting with the digital resource in question (
4). It has become recognized that these data can be analyzed to generate useful information about user behaviors and preferences. Although the actual collection of these data may often require creating software specifically for the purpose of data collection, the capture of digital exhaust data is not typically the primary reason for using the digital technology. In this regard, digital exhaust data are passively and unobtrusively collected. This contrasts with active data collection, in which a user might be asked to perform a specific task to assess his or her cognition or even to simply report his or her mood.
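As a rough illustration of how digital exhaust might be turned into behavioral information, the following minimal Python sketch derives two per-day features from an app log. The log format, the action names, the night-time window, and the function name daily_usage_features are all hypothetical choices made for illustration; they are not drawn from any of the studies cited here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical app log: ISO-8601 timestamp followed by an action name.
# The entries are invented purely for illustration.
LOG_LINES = [
    "2021-03-01T08:15:00 open_app",
    "2021-03-01T23:40:00 send_message",
    "2021-03-02T02:05:00 open_app",
    "2021-03-02T02:30:00 scroll_feed",
    "2021-03-02T14:00:00 open_app",
]


def daily_usage_features(lines, night_start=0, night_end=5):
    """Count events per calendar day and how many fall in the night window.

    The night window [night_start, night_end) is given in whole hours and is
    an assumed, adjustable definition of late-night activity.
    """
    features = defaultdict(lambda: {"events": 0, "night_events": 0})
    for line in lines:
        stamp, _action = line.split(" ", 1)
        when = datetime.fromisoformat(stamp)
        day = when.date().isoformat()
        features[day]["events"] += 1
        if night_start <= when.hour < night_end:
            features[day]["night_events"] += 1
    return dict(features)


if __name__ == "__main__":
    for day, values in sorted(daily_usage_features(LOG_LINES).items()):
        print(day, values)
```

The user takes no action beyond ordinary use of the app, which is what makes this passive collection; an active approach would instead prompt the user to complete a task or rating.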
The most recently developed concept regarding the use of real-time monitoring in psychiatry is the digital phenotype. Whereas a traditional phenotype is defined as the product of the interaction between an organism’s genotype and its environment, a digital phenotype is the product of the interaction between an individual’s psychiatric disorder(s) and that individual’s use of digital technologies (
5). Similar to a regular phenotype, the digital phenotype is a rather broad and abstract concept, as there are many technologies and data streams that can be analyzed, as well as a multitude of methods that can be used to conduct these analyses. Consequently, typical implementations of digital phenotypes focus on specific disorders and digital technologies.
What unifies all these concepts is that they describe methods for obtaining views into people’s daily experiences and behaviors by capturing data in real time; thus, they can all be considered as different, specific frameworks for conceptualizing real-time monitoring. Exactly what data are collected and how they are collected will vary depending on the specific implementation of these frameworks.
Theoretical Underpinnings
Psychiatric disorders are heterogeneous in both their presentations and etiologies. One of the ways in which the field has attempted to get a better handle on these difficulties is with the concept of endophenotypes. This concept was first described by Gottesman and Gould (
6). The essence of this idea is that there are reliably measurable traits or behaviors associated with specific psychiatric disorders and that these traits or behaviors have a genetic basis. When a given endophenotype can be used to separate people with a disorder into different subgroups, those groups can be thought of as different subtypes of the disorder (
6,
7). It is worth noting that Gottesman and Gould’s original specification of endophenotypes included rather strict criteria and that the term is sometimes used more loosely to simply describe characteristics that can be reliably associated with a given disorder. Such characteristics would be more properly described as biomarkers (
8). It is also worth noting that several biomarkers and purported endophenotypes have been found to be transdiagnostic and thus would not meet the stricter definition of endophenotypes proposed by Gottesman and Gould (
9,
10).
When considered in the context of endophenotypes and biomarkers, digital phenotypes can be considered a form of biomarker. The development of biomarkers in psychiatry has a rather storied history, with many candidates but nothing that has been widely accepted into clinical use. Many of the most promising biomarkers in psychiatry involve the use of neuroimaging or the assay of some kind of physical analyte, such as blood or cerebrospinal fluid (
11). Digital phenotypes distinguish themselves from these “traditional” biomarkers in that they can be sampled at high frequency and without the need for highly specialized equipment or testing procedures. This creates great potential for the use of digital biomarkers in population screening and management as well as tracking the progression of a disorder.
One of the difficulties in using biomarkers to make clinically relevant predictions is that the presence of group-level differences in a biomarker does not necessarily mean that the biomarker will be useful in making a prediction about an individual. Arbabshirani et al. described this phenomenon with regard to neuroimaging-based biomarkers, but the same general principles apply to digital biomarkers (
12). One approach that addresses this issue at least partially is the “
n-of-1” paradigm. In an
n-of-1 study, rather than trying to identify group-level parameters, the primary unit of observation is the individual (
13). These types of studies are often used in intervention trials in which the aim is to optimize a given clinical outcome for an individual. There are many design options for
n-of-1 studies; in some designs, the subjects are randomized to intervention arms multiple times over the course of the study, with this randomization guided in real time by response to previous interventions. The goal of such studies is to create “just-in-time adaptive interventions” (JITAIs), which, because of their context sensitivity and high level of personalization, could theoretically be more effective than traditional interventions (
14). Although the primary goal of
n-of-1 studies is to draw inferences for individuals,
n-of-1 study results can be pooled to try to identify population-level parameters (
15). Given the complexity of the signals generated by many real-time monitoring technologies and the complexity of psychiatric disorders themselves,
n-of-1 trials may prove to be an important research methodology for fully leveraging the use of real-time monitoring in understanding psychiatric disorders; however, discerning the population outcomes that would increase our understanding of these disorders would likely require a large number of subjects and longitudinal monitoring. A recent review of the use of
n-of-1 trials in schizophrenia suggests that the methodology is underutilized (
16).
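The earlier point that a group-level difference in a biomarker need not support predictions about individuals can be made concrete with a small worked example. The sketch below is not taken from Arbabshirani et al.; it assumes an idealized case of two equal-sized groups whose biomarker values are normally distributed with equal variance. Under those assumptions, the best single cutoff sits midway between the group means, and its classification accuracy is the standard normal cumulative distribution evaluated at half the standardized mean difference (Cohen's d). The function name best_threshold_accuracy is illustrative.

```python
from statistics import NormalDist


def best_threshold_accuracy(cohens_d: float) -> float:
    """Accuracy of the midpoint cutoff for two equal-sized, equal-variance
    normal groups whose means differ by cohens_d standard deviations.

    These distributional assumptions are a simplification for illustration.
    """
    return NormalDist().cdf(abs(cohens_d) / 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8, 2.0):
        accuracy = best_threshold_accuracy(d)
        print(f"d = {d:.1f} -> individual-level accuracy = {accuracy:.3f}")
```

Under these assumptions, a "medium" group difference of d = 0.5 classifies individuals correctly only about 60% of the time, not far above the 50% expected by chance, even though such a difference can be highly statistically significant in a large sample.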
Current Efforts and Challenges
Arguably, the technology that has most enabled the development and use of real-time monitoring is the smartphone. Because of its ubiquity and its use in many daily tasks, it serves as an ideal platform for collecting data streams that can be used in models of psychiatric functioning. A nonexhaustive list of the types of data that have been investigated with regard to their utility for building such models includes typing kinematics (
17–
20), acoustic characteristics of speech (
21), number of phone calls (
22,
23), number of text messages (
22,
23), pattern of phone calls to contacts stored on the smartphone (
23), locations visited as measured via global positioning system signal (
23,
24), and patterns of app usage (
23,
25). Stand-alone wearable sensors have also been investigated, including off-the-shelf activity trackers (
26,
27) and custom devices to measure electrodermal activity and galvanic skin response (
28,
29); however, the majority of the research published to date has focused on smartphone-derived sensor data.
Recently, there have been efforts to systematically review findings to date (
30,
31); however, as discussed by Rohani et al. in their review of the concordance between sensor data and depressive symptoms, heterogeneity in measurement and analytic methods makes it difficult to draw generalizable conclusions (
32). This is exacerbated by the use of technology such as smartphones: the functionalities of the sensors and the types of data available are often dependent on the version of the phone and software, thus making it more difficult to create robust, generalizable models.
Although outpatient psychiatry is likely to benefit the most from the integration of real-time monitoring technologies, there is also opportunity for inpatient psychiatry. Given that patients hospitalized on a psychiatric inpatient unit are acutely ill and may have disorders characterized by paranoia, the question becomes whether real-time monitoring would be acceptable to psychiatric inpatients. Ben-Zeev et al. examined this question and found that of 20 inpatients with schizophrenia or schizoaffective disorder approached for enrollment in a study involving real-time monitoring via carrying a special study phone, 13 expressed interest. Notably, two of the 13 were found to have insufficient capacity to consent to participation (
33). Another feasibility and acceptability study in an inpatient adolescent population was conducted by Kleiman et al. In this study, 50 participants were asked to wear a wrist-worn monitor as often as possible, and the study found that they wore the device an average of 18 hours per day (
34). These are clearly very specific populations, so how well these findings may generalize is not clear; however, these studies provide encouraging early evidence that the use of real-time monitoring technologies with inpatient populations may be a viable approach.
One of the interesting aspects of the application of real-time monitoring to the inpatient setting is that it allows for the use of nonwearable sensors that are incorporated into the environment. One such possibility is the use of closed-circuit video cameras. Tracking people across video frames is a well-studied problem in the field of computer vision, with increasingly sophisticated algorithms being developed (
35–
37). Information derived from such algorithms could be used to quantify psychiatrically relevant measures such as overall activity levels, psychomotor agitation and retardation, stereotyped behaviors such as pacing, and the amount and frequency of interpersonal interactions. Other information that could be derived from video includes emotion recognition based on facial expressions (
38) and gait analysis (
39). There are ethical and legal questions to be considered in the deployment of any of these systems, such as whether patients and nonpatients (e.g., staff, visitors) being monitored by the same video system could provide consent in regard to different types of analysis and how any information derived by the algorithms should be handled. Also, the utility of the information would need to be demonstrated and weighed against the information derived from the current practice of behavioral observation reports provided by inpatient staff.
In terms of currently available systems for real-time monitoring, there are an estimated 10,000 to 15,000 mental health apps available for download from various app stores (
40,
41), but it is not clear how many of these include some type of real-time monitoring. This lack of precision raises the question of what, exactly, constitutes real-time monitoring. If we include apps that allow users to track things such as their mood, sleep, menstrual cycle, and other potentially psychiatrically relevant functions, the number of apps is probably quite large.
To be clinically useful, real-time monitoring data must be distilled into relevant information and made available to psychiatric practitioners. For most psychiatric practitioners today, the primary repository for clinical data is the electronic health record (EHR); however, EHR vendors and academic researchers are only beginning to explore the possibility of importing patient-generated data from other digital systems, such as smartphone apps, into the EHR. Among the challenges raised by this function is the development of data standards that can be used to ensure the integrity and fidelity of the data and to track their provenance and veracity. For example, consider importing sleep data captured by a patient’s smartphone app. Such data could be used by a clinician to help determine whether a patient is in a mood episode and may be more accurate than the patient’s own retrospective self-reports. A simple list of the number of hours of sleep the patient has had each night since the last visit would probably not be helpful, so some relevant summary measures would need to be created. It would also be important to know whether the hours of sleep that feed into these summary measures come from patient self-report, are passively measured via some proxy such as phone usage, or are derived from an algorithm based on multiple data streams. In the case of an algorithm, it would also be important to know whether the algorithm has changed over time and, if so, what version was used. All of these questions will need to be answered to establish the validity of the metric. Of course, the ultimate goal is for most of this to be hidden from the clinician end user, similar to how a clinician can, today, look at a lab value and trust that the assays and algorithms used to calculate it were accurate.
The Office of the National Coordinator for Health Information Technology recently published a white paper describing the various issues that would need to be addressed by a framework to incorporate such data into EHRs (
42).
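The sleep example above can be sketched as a small data structure in which each nightly value carries its own provenance, so that any summary shown to a clinician can still be traced back to how the underlying values were obtained. This is a minimal sketch under stated assumptions, not a proposed standard: the class name SleepNight, the source labels, the 5-hour "short night" cutoff, and the version strings are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class SleepNight:
    """One night of sleep data plus the provenance needed to interpret it."""
    date: str                       # ISO date of the night
    hours: float                    # estimated hours of sleep
    source: str                     # e.g. "self_report", "phone_usage_proxy", "algorithm"
    algorithm_version: Optional[str] = None  # set only when source == "algorithm"


def summarize(nights, short_sleep_hours=5.0):
    """Collapse nightly records into summary measures, keeping provenance.

    The short_sleep_hours cutoff is an assumed, adjustable threshold.
    """
    hours = [night.hours for night in nights]
    return {
        "nights": len(nights),
        "mean_hours": round(mean(hours), 2),
        "short_nights": sum(h < short_sleep_hours for h in hours),
        "sources": sorted({night.source for night in nights}),
        "algorithm_versions": sorted(
            {night.algorithm_version for night in nights if night.algorithm_version}
        ),
    }


# Invented records for illustration only.
NIGHTS = [
    SleepNight("2021-03-01", 7.5, "self_report"),
    SleepNight("2021-03-02", 4.0, "algorithm", "1.2"),
    SleepNight("2021-03-03", 3.5, "algorithm", "1.3"),
    SleepNight("2021-03-04", 8.0, "phone_usage_proxy"),
]

if __name__ == "__main__":
    print(summarize(NIGHTS))
```

In this invented example, the summary exposes that the values come from three different sources and from two versions of the algorithm, which is the kind of detail a data standard would need to preserve even if it is normally hidden from the clinician.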
Assessing App Quality
With the proliferation of mental health apps, there has been increasing recognition of the need for assessment of these apps beyond the user-submitted ratings found in app stores.
In January 2019, the Food and Drug Administration (FDA) launched the Software Precertification Pilot Program. The goal of this program is to “help inform the development of a future regulatory model that will provide more streamlined and efficient regulatory oversight of software-based medical devices developed by manufacturers who have demonstrated a robust culture of quality and organizational excellence” (
43). An important feature of this proposed model is that, rather than focusing on vetting individual apps, the FDA would instead focus on vetting the companies that make the apps. Nine companies were selected to participate in the program, including Apple, Fitbit, and Verily (
43). The next steps of the program include a comparison of the results of the proposed model with the traditional medical device clearance process (i.e., vetting individual apps). The FDA is currently soliciting public comments regarding the program (
44).
The American Psychiatric Association (APA) endorses a five-step process for practitioners to follow in determining what advice to provide patients who are considering using an app. This process is based on a hierarchical framework proposed by Torous et al. (
45). These steps include assessing the risks associated with using an app, including the protection afforded to the patient’s private data; the evidence supporting the app’s proposed benefits; the usability of the app; and the app’s interoperability (i.e., the ease with which data collected by the app can be shared with other technologies such as other apps or an EHR system) (
46). At the time of this writing, the APA is also planning to form a panel that will use this model to rate apps and publish its evaluations online (
47).
Several projects have already published databases providing behavioral health app reviews that are similar to what the APA is proposing (
Table 2). Each of these projects maintains its own framework for assessing apps. Carlo et al. recently published an article analyzing the concordance between these frameworks for the most downloaded apps for the iOS and Android operating systems (
48). They focused their analysis on the three projects that have rated the most apps: the Organisation for the Review of Care and Health Applications (ORCHA), PsyberGuide, and MindTools.io. Carlo et al. found that agreement between the projects was highest for the “credibility and evidence base” domain, with “fair agreement”; they found “slight agreement” for the domains of “user experience” and “data use and security.” ORCHA had the highest fraction of published reviews for the top 25 most popular apps; however, PsyberGuide was the most popular of the sites, with the highest number of visits.
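Labels such as “slight” and “fair” agreement match the verbal bands commonly attached to chance-corrected agreement statistics. The text above does not say which statistic was used, so the following sketch simply assumes Cohen's kappa for two raters, together with the widely used Landis and Koch bands, to show how such a label can arise. The ratings are invented and the function names are illustrative.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each rated the same items."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch descriptive bands."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Invented ratings of 12 apps by two hypothetical review sites.
SITE_A = ["high"] * 6 + ["low"] * 6
SITE_B = ["high"] * 4 + ["low"] * 2 + ["high"] * 2 + ["low"] * 4

if __name__ == "__main__":
    kappa = cohens_kappa(SITE_A, SITE_B)
    print(f"kappa = {kappa:.3f} ({landis_koch_label(kappa)})")
```

In this invented example, the two sites agree on 8 of 12 apps, but because half of that agreement would be expected by chance, kappa is only about 0.33, which falls in the “fair” band.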
Conclusions and Future Directions
There is still much work to be done on validating the scientific basis of current findings. Additionally, translational and implementation work will need to be conducted to ensure that the information obtained via real-time monitoring systems is both relevant and incorporated into clinical workflows and information systems in such a way that it can affect clinical decision making and improve clinical outcomes. Throughout all of this, respect for principles of privacy and fairness will need to be maintained. For practitioners interested in incorporating real-time monitoring practices into their current clinical practice, the APA’s app evaluation model provides a useful framework for selecting appropriate apps.
Real-time monitoring paradigms, such as digital phenotyping, have great potential to transform the ways in which we study and treat psychiatric disorders. Although we have seen similar promises from past breakthroughs (e.g., genetics, epigenetics, neuroimaging), digital phenotyping distinguishes itself in that it can be much more readily and inexpensively deployed and that its feasibility is being facilitated by changes in our society—namely, the increasing use of digital technologies in more and more aspects of people’s daily lives. With these forces at play, it appears inevitable that we will see rising adoption of real-time monitoring technologies, in which case the relevant issue will become how to ensure that this adoption proceeds safely and effectively.