A teacher of mine was fond of saying that “a man on a fast train can diagnose florid mania, as he speeds by and looks out the window at a patient” (Melvin G. McInnis, M.D., personal communication). Despite the bit of hyperbole, there is truth to the idea that classic mania, and thus bipolar I disorder, often can be straightforward to recognize when it is directly encountered, or even when inquired about after the fact (1). Other forms of bipolar disorder, such as bipolar II, are more subtle, though careful examination can yield high-reliability diagnoses here as well (2).
As reported in this issue of the Journal, Castro et al. (3) asked whether a man or woman on a fast computer could diagnose bipolar disorder. They took advantage of the power of the electronic health record (EHR) to identify more than 50,000 potential bipolar disorder cases. Manual review of a subset of these showed that 63% of the individuals could be classified as having bipolar disorder. The researchers then used text features and coded data from the EHR to generate automated algorithms that classified patients as likely to have bipolar disorder. Importantly, they next conducted a validation study on a selected subset of cases, in which they compared the EHR- and algorithm-derived diagnoses with those made on the basis of direct diagnostic interview by clinicians using the Structured Clinical Interview for DSM-IV. A quite respectable 79%–85% of the patients electronically classified as having bipolar disorder also had the diagnosis on direct interview, while none of those classified as control subjects did. The authors are ultimately interested in using their method to identify samples for genetics studies of bipolar disorder. Their result represents an important step forward in the application of the big data approach to pinpointing genetic susceptibility variants in bipolar disorder.
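The validation figure quoted above, the share of algorithm-classified cases that were confirmed on direct interview, is what is usually called a positive predictive value. A minimal sketch of that arithmetic follows; the function name and the counts are hypothetical, chosen only to illustrate the calculation, and are not figures reported by Castro et al.

```python
def positive_predictive_value(confirmed: int, classified: int) -> float:
    """Fraction of algorithm-classified cases confirmed by direct interview."""
    if classified <= 0:
        raise ValueError("classified must be a positive count")
    if not 0 <= confirmed <= classified:
        raise ValueError("confirmed must lie between 0 and classified")
    return confirmed / classified


# Hypothetical counts for illustration only (NOT taken from the study):
# 50 patients flagged by an algorithm, 41 of them confirmed on interview.
ppv = positive_predictive_value(confirmed=41, classified=50)
print(f"PPV = {ppv:.0%}")  # prints "PPV = 82%"
```

The same ratio applied to the control group would be zero in the scenario described, since none of the patients classified as control subjects received the diagnosis on interview.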
A little context: we have had evidence since the 1920s that bipolar disorder has a major genetic component, as it runs in families (4) and is more likely to be shared by identical than fraternal twins. This evidence was solidified in the 1970s and 1980s by further studies, such as the Iowa 500, which confirmed the familial aggregation of the illness (5). Eventually psychiatric researchers began collecting blood from patients with the idea that DNA from these samples could be examined to finger the genetic culprits that set bipolar disorder in motion. People such as Raymond DePaulo at Johns Hopkins University led groups that carefully assessed family members to determine their clinical picture, or phenotype, with the idea that imprecision in the determination of who did and did not have bipolar disorder would undermine the gene-hunting process, just as being one digit off for a telephone number would render it impossible to connect with the right person on the other end of the line. After a while it began to seem likely that bipolar disorder is the result of an accumulation of many small genetic effects and that to detect any one of them, a large sample would be needed. Initially “large” was thought to be hundreds of patients, but then it became thousands, and now it appears to be tens of thousands.
Where, exactly, does one find tens of thousands of patients? And how do you secure the clinician time to assess them all even if you find them? One approach is to have many individual research groups band together and pool their resources. This concept formed the basis of the National Institute of Mental Health Bipolar Disorder Genetics Initiative, with 12 sites (6), and then of the Psychiatric GWAS (Genome-Wide Association Study) Consortium bipolar disorder working group, with more than 170 investigators from more than 80 institutions working together (7). But another approach that is faster and less costly is to make use of the EHR. Researchers across varied medical fields have been working to put this approach into practice through projects such as the Electronic Medical Records and Genomics (eMERGE) Network (8). Castro et al. (3) point out that EHR-based methods can lead to a 10-fold reduction in costs compared with the standard way of going about ascertainment and assessment. In these large-scale efforts there is greater tolerance of imprecision in the phenotype. As when trying to control an agitated patient on an inpatient unit, less subtlety is required when you have large numbers, creating an overwhelming amount of power. A key to the success of this approach for genetic research is that previously discarded blood samples within the health system can be linked to the clinical record and retrieved, making DNA available for study. Castro et al. report having collected DNA for 4,500 bipolar disorder cases and 5,000 controls over a 3-year period, an impressive level of productivity.
The use of EHRs is still in its infancy, and genetics research constitutes only one of a vast number of potential applications for this powerful resource. The Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 provides payments to promote the “meaningful use” of EHRs. This has led to wider use among physicians, from 18% in 2001 to 72% in 2012 (9). Clinicians and investigators are taking advantage of this development to create tools such as registries of patients with particular illnesses for use in health services research and clinical trials. For example, the National Network of Depression Centers (10) has developed a registry to track outcomes for patients with depression and bipolar disorder across 21 sites.
But questions remain about the use of EHRs, especially questions involving privacy. A number of policies and procedures have been proposed to prevent threats such as the reidentification of de-identified patient data. A recent review described 45 different algorithms aimed at transforming data sets to facilitate privacy protection, while keeping loss of information to a minimum (11). The concerns about privacy have been particularly acute in the realm of DNA sequence information. For example, one group of investigators showed that de-identified genomic data could be reidentified by using publicly available resources (12). To reduce these risks, some have argued for new regulatory efforts, such as expansion of protections provided under the Genetic Information Nondiscrimination Act (13).
Castro et al. describe an approach to bringing phenotyping for psychiatric genetics into the 21st century, as has been done for genotyping over the last 5–10 years with the advent of high-throughput tools, such as microarrays and next-generation sequencing machines. Limitations do exist, though. One is the potential imprecision in the diagnosis. Whether this is a meaningful concern will become clear soon, when Castro and colleagues’ sample is genetically assessed and compared with other large existing bipolar disorder samples. The other major consideration is how much phenotypic detail, clinical and otherwise, can be obtained. Of note, Castro et al. report on the predictive power of their algorithms (as compared with direct interview) to determine eight clinical subtypes. Results include 72% accuracy for psychotic features and a robust 92% for attempted suicide. In the EHR-based approach, however, tailored assessments, such as those focused on neuropsychological testing, history of childhood trauma, or hypothalamic-pituitary-axis functioning, cannot be systematically obtained. Further, contacting people for follow-up may be problematic.
Technological developments, including the EHR, are helping move bipolar disorder genetics onto the fast track to discovery (14). Hopefully, this train is bound for translational glory: that is, for making a difference by pointing to the critical pathophysiologic pathways within which novel treatments can exert therapeutic effects.