In this issue of the Journal, Barack-Corren et al. (1) use machine learning methods to build a highly predictive model of suicidal behavior using longitudinal electronic health records (EHRs). They do so using a well-established probability-based machine learning algorithm, the naive Bayesian classifier, to mine through approximately 1.7 million patient records, spanning 15 years (1998–2012), from two large Boston hospitals. After training the naive Bayesian classifier model on a randomly selected half of the data, the predictive ability of the model was assessed on the second half, yielding accurate (35%–49% sensitivity at 90%–95% specificity) and, critically, early (3–4 years in advance on average) prediction of patients’ future suicidal behavior. In this, the authors benefitted from access to a large and high-quality EHR database and chose an appropriate, and powerful, analytical method in the naive Bayesian classifier. Furthermore, the research has clear clinical applications in the potential for early detection warnings via physician EHR notices. Beyond such specifics, the study has broader significance in its demonstration of how the atheoretical machine learning approaches popular in Silicon Valley can successfully mine clinical insights from an exponentially growing body of EHR data. It also hints toward a future in which machine learning of big medical data may become a ubiquitous component of clinical research and practice, a prospect that some are uncomfortable with.
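To make the evaluation design concrete, the following is a minimal sketch of a split-half design and of reading off sensitivity at a fixed specificity. It is not the authors' code: the function names `split_half` and `sensitivity_at_specificity`, the seed, and the convention that a patient is flagged when the risk score strictly exceeds the threshold are all assumptions made for illustration.

```python
import random


def split_half(records, seed=0):
    """Randomly split records into two halves (training, held-out).

    The seed is a hypothetical choice that only makes the split repeatable.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


def sensitivity_at_specificity(scores, labels, target_specificity):
    """Return (threshold, sensitivity, specificity) for the lowest threshold
    whose specificity is at least target_specificity.

    labels are 1 for cases and 0 for non-cases; a patient is flagged
    when the score is strictly greater than the threshold.
    """
    if not 0.0 < target_specificity <= 1.0:
        raise ValueError("target_specificity must be in (0, 1]")
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one case and one non-case")

    # Try each observed score as a threshold, from lowest to highest.
    for threshold in sorted(set(scores)):
        specificity = sum(1 for s in negatives if s <= threshold) / len(negatives)
        if specificity >= target_specificity:
            sensitivity = sum(1 for s in positives if s > threshold) / len(positives)
            return threshold, sensitivity, specificity
    # Not reachable: the highest score always gives a specificity of 1.0.
    raise AssertionError("no threshold found")
```

In use, the model would be fit on the first half returned by `split_half`, and the scores passed to `sensitivity_at_specificity` would come only from the held-out half, so that the reported sensitivity reflects records the model has not seen.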
While the pace at which machine learning applications diffuse into clinical research and practice remains to be seen, methodological development in the machine learning field continues to accelerate. This suggests one primary limitation of the current study. That is, while the naive Bayesian classifier is well-suited to the current application, it is an older and remarkably simple method by machine learning standards. Fundamentally, the naive Bayesian classifier is a direct application of Bayes’ theorem, simply calculating the product of the prior probability of the outcome of interest (e.g., suicidal behavior) and the probabilities for each predictor in the data conditional on the outcome of interest (2). This analytical simplicity contrasts sharply with more advanced machine learning techniques, including neural nets, deep learning, and ensemble methods, which achieve notable gains in predictive accuracy compared with the naive Bayesian classifier, but are black boxes in terms of estimation, as their models are extremely large and complex and, in the case of neural networks, characterized by “hidden layers” (3, 4). So, while there is ample room for improved prediction accuracy in Barack-Corren et al.’s approach, such gains would likely come at the expense of interpretability and inference. Thus, their selection of the naive Bayesian classifier has the further, unintended merit of providing an unusually lucid, accessible introduction to machine learning for many researchers and clinicians.
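The calculation described above, the prior probability of the outcome multiplied by the conditional probability of each predictor, can be sketched in a few lines. This is an illustrative toy, not the implementation used in the study: the binary (present/absent) coding of predictors, the add-one smoothing constant `alpha`, the function names, and the six-patient data set are all assumptions.

```python
import math


def train_naive_bayes(rows, labels, alpha=1.0):
    """Estimate P(y) and P(x_j = 1 | y) from rows of binary predictors.

    alpha is an add-one (Laplace) smoothing constant, assumed here so that
    no conditional probability is exactly 0 or 1.
    """
    n_features = len(rows[0])
    model = {}
    for y in sorted(set(labels)):
        class_rows = [r for r, label in zip(rows, labels) if label == y]
        prior = len(class_rows) / len(rows)
        likelihoods = [
            (sum(r[j] for r in class_rows) + alpha) / (len(class_rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[y] = (prior, likelihoods)
    return model


def posterior(model, row):
    """Return P(y | row): prior times per-predictor conditionals, normalized."""
    log_joint = {}
    for y, (prior, likelihoods) in model.items():
        total = math.log(prior)
        for x, p in zip(row, likelihoods):
            # Use P(x_j = 1 | y) if the predictor is present, else its complement.
            total += math.log(p if x else 1.0 - p)
        log_joint[y] = total
    # Work in log space and subtract the maximum for numerical stability.
    peak = max(log_joint.values())
    unnormalized = {y: math.exp(v - peak) for y, v in log_joint.items()}
    z = sum(unnormalized.values())
    return {y: v / z for y, v in unnormalized.items()}


# Hypothetical toy data: two binary predictors, outcome 1 = case, 0 = non-case.
toy_rows = [[1, 1], [1, 0], [0, 1], [0, 0], [0, 0], [1, 0]]
toy_labels = [1, 1, 0, 0, 0, 0]
toy_model = train_naive_bayes(toy_rows, toy_labels)
toy_posterior = posterior(toy_model, [1, 1])
```

Because the fitted model is just one prior and one conditional probability per predictor and outcome, each estimate in `toy_model` can be read and interpreted directly, which is the interpretability advantage discussed in the text.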
Another limitation, perhaps strategic on Barack-Corren et al.’s part, is the use of a limited set of standard ICD-9 codes and search terms as predictors, versus performing natural language processing of the full semistructured data of the EHR. This analytical decision is a significant limitation, as it drastically reduces the analysis feature space (i.e., the number of predictors considered), which generally results in poorer prediction given data of this size (5, 6). While the authors do not give a precise number of predictors used in their analysis, we can safely assume it is at least an order of magnitude smaller than what would be possible using natural language processing techniques. However, this again raises the issue of model interpretability, as natural language processing approaches may identify highly predictive features that offer no clear interpretation or clinical significance (6). Contrast that opacity with Barack-Corren et al.’s list of the top 100 predictors in their naive Bayesian classifier (see Table S2 in the article’s online data supplement), which summarizes a wealth of clinical insight, and we again see the predictive advantages of more sophisticated approaches counterbalanced by the interpretability of simpler models like Barack-Corren et al.’s naive Bayesian classifier. This tradeoff is not specific to the current topic; instead, it is a pervasive aspect of machine learning: a continuum of inference versus prediction that is traversed when moving from simpler approaches, like Barack-Corren et al.’s naive Bayesian classifier, to more advanced, opaque approaches, including neural nets and deep learning (7, 8).
Stepping back from the technical aspects of machine learning, this study provides an opportunity to reflect on the trend of the field toward increasingly data-driven approaches. Regardless of the promise of machine learning of EHRs, it would be unwise to endorse the approach without first considering the various professional, ethical, and legal issues accompanying the potential improvements in diagnosis and treatment. From the perspective of praxis, it is noteworthy that the approach, carried to its logical conclusion, is fundamentally atheoretical, which marks a stark departure from conventional clinical paradigms built primarily on evidence-based causal models (9). Furthermore, for some it may seem like a slippery slope toward ceding power in the clinic to algorithms and devaluing clinician experience and judgment. But I would note that the majority of a clinician’s function would not, and indeed could not, be encroached upon by data-driven analytics. Rather, increasing the role of machine learning applications to EHRs would provide additional inputs for the clinician to consider in making diagnostic and treatment decisions. In this way, the emergence of machine learning EHR prediction may be seen as analogous to the development of imaging, genetic, or any other new source of highly informative medical data. Additionally, there are ethical and legal issues surrounding the mining of EHRs, including protecting the patient population from adverse consequences stemming from the analysis of their data. This suggests potentially problematic dynamics if, for instance, EHR data and analytics are accessed by insurance companies, who may use the data to discriminate against patients in the marketplace. This risk is compounded by the possibility of black box machine learning methods inadvertently identifying stratifying criteria that we as a society find unacceptable.
While ethical arguments for the use of participant data often take the form of efforts to limit access to data, as in the well-justified attention paid to patient privacy and nondisclosure, a powerful argument for the opposite exists in regard to enhancing public benefit through the analysis of EHR data. That is, as the data are often collected using some combination of patient permission and government funding, it may be reasonable to consider public benefit as a goal, or even an obligation, in the collection and analysis of the data. Although this does not argue against private sector activity, it does support a concerted effort to consolidate data and analyses funded by federal research dollars into a public resource. And what a tremendous resource a centralized archive of EHR data staffed with a cadre of machine learning analysts could be. Currently, this possibility is prevented by data fragmentation, as most EHR data are proprietary (10, 11), but this could change with leadership from federal entities. And we have good precedent from the National Institutes of Health and Veterans Affairs regarding safeguarding, and maximizing benefit from, comparable archives (e.g., dbGaP [database of Genotypes and Phenotypes]).
In summary, as demonstrated by Barack-Corren et al., the application of machine learning methods to EHRs, and the potential of extending such analyses to other sources of big medical data (e.g., genomics and imaging), could generate enormous—yes, even paradigm-shifting—returns in improved diagnosis and treatment. What remains unclear is the pace at which these benefits will be realized, as well as who the primary beneficiaries will be.
Acknowledgments
The author thanks Jason D. Thomas and Anna R. Docherty for assistance and critique.