Electronic Medical Records: Fast Track to Big Data in Bipolar Disorder by Dr. James Potash, NNDC Executive Committee Member

James B. Potash, M.D., M.P.H.

A teacher of mine was fond of saying that “a man on a fast train can diagnose florid mania, as he speeds by and looks out the window at a patient” (Melvin G. McInnis, M.D., personal communication). Despite the bit of hyperbole, there is truth to the idea that classic mania, and thus bipolar I disorder, often can be straightforward to recognize when it is directly encountered, or even when inquired about after the fact (1). Other forms of bipolar disorder, such as bipolar II, are more subtle, though careful examination can yield high-reliability diagnoses here as well (2).

As reported in this issue of the Journal, Castro et al. (3) asked whether a man or woman on a fast computer could diagnose bipolar disorder. They took advantage of the power of the electronic health record (EHR) to identify more than 50,000 potential bipolar disorder cases. Manual review of a subset of these showed that 63% of the individuals could be classified as having bipolar disorder. The researchers then used text features and coded data from the EHR to generate automated algorithms that classified patients as likely to have bipolar disorder. It is important to note that they next conducted a validation study on a selected subset of cases, for which they compared the EHR- and algorithm-derived diagnoses to those made on the basis of direct diagnostic interview by clinicians using the Structured Clinical Interview for DSM-IV. A quite respectable 79%–85% of the patients electronically classified as having bipolar disorder also had the diagnosis on direct interview, while none of those classified as control subjects did. The authors are ultimately interested in using their method to identify samples for genetics studies of bipolar disorder. Their result represents an important step forward in the application of the big data approach to pinpointing genetic susceptibility variants in bipolar disorder.
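
The share of algorithm-classified cases that are confirmed on direct interview is a positive predictive value (PPV). As a minimal sketch of how such a figure and its uncertainty might be computed, the Python below uses purely hypothetical counts (40 confirmed out of 50 interviewed), not the counts reported by Castro et al., and the function names are illustrative. The Wilson score interval is one common choice for a proportion; the source does not say which interval, if any, the authors used.

```python
import math


def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: confirmed cases / all algorithm-classified cases."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("no classified cases to evaluate")
    return true_pos / total


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical validation subset: 50 algorithm-classified cases interviewed,
# 40 confirmed by structured interview. These numbers are assumptions.
confirmed, not_confirmed = 40, 10
low, high = wilson_interval(confirmed, confirmed + not_confirmed)
print(f"PPV = {ppv(confirmed, not_confirmed):.2f}, 95% CI {low:.2f} to {high:.2f}")
```

With a validation subset of this size the interval is wide, which is one reason a range of values may be more informative than a single point estimate.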

A little context: we have had evidence since the 1920s that bipolar disorder has a major genetic component, as it runs in families (4) and is more likely to be shared by identical than fraternal twins. This evidence was solidified in the 1970s and 1980s by further studies, such as the Iowa 500, which confirmed the familial aggregation of the illness (5). Eventually psychiatric researchers began collecting blood from patients with the idea that DNA from these samples could be examined to finger the genetic culprits that set bipolar disorder in motion. People such as Raymond DePaulo at Johns Hopkins University led groups that carefully assessed family members to determine their clinical picture, or phenotype, with the idea that imprecision in the determination of who did and did not have bipolar disorder would undermine the gene-hunting process, just as being one digit off for a telephone number would render it impossible to connect with the right person on the other end of the line. After a while it began to seem likely that bipolar disorder is the result of an accumulation of many small genetic effects and that to detect any one of them, a large sample would be needed. Initially “large” was thought to be hundreds of patients, but then it became thousands, and now it appears to be tens of thousands.

Where, exactly, does one find tens of thousands of patients? And how do you secure the clinician time to assess them all even if you find them? One approach is to have many individual research groups band together and pool their resources. This concept formed the basis of the National Institute of Mental Health Bipolar Disorder Genetics Initiative, with 12 sites (6), and then of the Psychiatric GWAS (Genome-Wide Association Study) Consortium bipolar disorder working group, with more than 170 investigators from more than 80 institutions working together (7). But another approach that is faster and less costly is to make use of the EHR. Researchers across varied medical fields have been working to put this approach into practice through projects such as the Electronic Medical Records and Genomics (eMERGE) Network (8). Castro et al. (3) point out that EHR-based methods can lead to a 10-fold reduction in costs compared with the standard way of going about ascertainment and assessment. In these large-scale efforts there is greater tolerance of imprecision in the phenotype. As when trying to control an agitated patient on an inpatient unit, less subtlety is required when you have large numbers, creating an overwhelming amount of power. A key to the success of this approach for genetic research is that previously discarded blood samples within the health system can be linked to the clinical record and retrieved, making DNA available for study. Castro et al. report having collected DNA for 4,500 bipolar disorder cases and 5,000 controls over a 3-year period, an impressive level of productivity.
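
The trade-off between phenotype imprecision and sample size can be made concrete with a rough calculation. The sketch below is not taken from Castro et al.; it assumes a simple two-proportion allelic test with equal-sized groups, a genome-wide significance threshold of 5e-8, hypothetical allele frequencies (0.30 in controls, 0.32 in true cases), and that misclassified "cases" carry the control allele frequency. The function name and all parameter values are illustrative.

```python
import math
from statistics import NormalDist


def alleles_per_group(p_control: float, p_case: float, ppv: float = 1.0,
                      alpha: float = 5e-8, power: float = 0.80) -> int:
    """Approximate alleles needed per group for a two-sided two-proportion test.

    Each person contributes two alleles, so individuals needed is about half this.
    """
    if not 0 < ppv <= 1:
        raise ValueError("ppv must be in (0, 1]")
    # Assumption: misclassified cases have the control allele frequency,
    # so the observed case frequency is a PPV-weighted mixture.
    p_observed = ppv * p_case + (1 - ppv) * p_control
    diff = p_observed - p_control
    if diff == 0:
        raise ValueError("no frequency difference to detect")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_observed * (1 - p_observed) + p_control * (1 - p_control)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / diff ** 2)


# Hypothetical comparison: perfectly diagnosed cases versus cases with PPV 0.80.
perfect = alleles_per_group(0.30, 0.32)
noisy = alleles_per_group(0.30, 0.32, ppv=0.80)
print(f"inflation in required sample: {noisy / perfect:.2f}x")
```

Under these assumptions the observed frequency difference shrinks in proportion to the PPV, so the required sample grows roughly as 1/PPV². A moderately imprecise phenotype therefore costs a modest multiple in sample size, which an EHR-based approach may be able to supply cheaply.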

The use of EHRs is still in its infancy, and genetics research constitutes only one of a vast number of potential applications for this powerful resource. The Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 provides payments to promote the “meaningful use” of EHRs. This has led to wider use among physicians, from 18% in 2001 to 72% in 2012 (9). Clinicians and investigators are taking advantage of this development to create tools such as registries of patients with particular illnesses for use in health services research and clinical trials. For example, the National Network of Depression Centers (10) has developed a registry to track outcomes for patients with depression and bipolar disorder across 21 sites.

But questions remain about the use of EHRs, especially questions involving privacy. A number of policies and procedures have been proposed to prevent threats such as the reidentification of de-identified patient data. A recent review described 45 different algorithms aimed at transforming data sets to facilitate privacy protection, while keeping loss of information to a minimum (11). The concerns about privacy have been particularly acute in the realm of DNA sequence information. For example, one group of investigators showed that de-identified genomic data could be reidentified by using publicly available resources (12). To reduce these risks, some have argued for new regulatory efforts, such as expansion of protections provided under the Genetic Information Nondiscrimination Act (13).

Castro et al. describe an approach to bringing phenotyping for psychiatric genetics into the 21st century, as has been done for genotyping over the last 5–10 years with the advent of high-throughput tools, such as microarrays and next-generation sequencing machines. Limitations do exist, though. There is the issue of potential imprecision in the diagnosis. Clarity about whether or not this is a meaningful concern will come soon, when Castro and colleagues’ sample is genetically assessed and compared with other large existing bipolar disorder samples. The other major consideration is how much phenotypic detail can be obtained, clinical and otherwise. Of note, Castro et al. report on the predictive power of their algorithms (as compared with direct interview) to determine eight clinical subtypes. Results include 72% accuracy for psychotic features and a robust 92% for attempted suicide. In the EHR-based approach, however, tailored assessments such as those focused on neuropsychological testing, history of childhood trauma, or hypothalamic-pituitary-axis functioning cannot be systematically obtained. Further, contacting people for follow-up may be problematic.

Technological developments, including the EHR, are helping move bipolar disorder genetics onto the fast track to discovery (14). Hopefully, this train is bound for translational glory, that is, for making a difference by pointing to the critical pathophysiologic pathways within which novel treatments can exert therapeutic effects.


From the Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City.

Address correspondence to Dr. Potash ([email protected]).

Supported by NIMH grants R01 MH-087979 and R01 MH-090595 and by the United States–Israel Binational Science Foundation.

The author reports no financial relationships with commercial interests. Accepted January 2015.

Am J Psychiatry 2015; 172:310–311; doi: 10.1176/appi.ajp.2015.15010043

REFERENCES

1. Andreasen NC, Grove WM, Shapiro RW, et al: Reliability of lifetime diagnosis: a multicenter collaborative perspective. Arch Gen Psychiatry 1981; 38:400–405
2. Simpson SG, McMahon FJ, McInnis MG, et al: Diagnostic reliability of bipolar II disorder. Arch Gen Psychiatry 2002; 59:736–740
3. Castro VM, Minnier J, Murphy SN, et al: Validation of electronic health record phenotyping of bipolar disorder cases and controls. Am J Psychiatry 2015; 172:363–372
4. Banse D: Zum problem der erbprognosebestimmung: die erkrankungsaussuchten der vettern und basen von manischdepressiven (The problem of empirical hereditary risk: the morbidity risk of cousins of manic depressive patients). Z Gesamte Neurol Psychiatr 1929; 119:576–612
5. Winokur G, Tsuang MT, Crowe RR: The Iowa 500: affective disorder in relatives of manic and depressed patients. Am J Psychiatry 1982; 139:209–212
6. Smith EN, Bloss CS, Badner JA, et al: Genome-wide association study of bipolar disorder in European American and African American individuals. Mol Psychiatry 2009; 14:755–763
7. Psychiatric GWAS Consortium Bipolar Disorder Working Group: Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat Genet 2011; 43:977–983
8. Gottesman O, Kuivaniemi H, Tromp G, et al: The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future. Genet Med 2013; 15:761–771
9. Hsiao CJ, Hing E: Use and Characteristics of Electronic Health Record Systems Among Office-Based Physician Practices: United States, 2001–2012: NCHS Data Brief, number 111. Hyattsville, Md, National Center for Health Statistics, 2012 (http://www.cdc.gov/nchs/data/databriefs/db111.htm)
10. Greden JF: The National Network of Depression Centers: progress through partnership. Depress Anxiety 2011; 28:615–621
11. Gkoulalas-Divanis A, Loukides G, Sun J: Publishing data from electronic health records while preserving privacy: a survey of algorithms. J Biomed Inform 2014; 50:4–19
12. Gymrek M, McGuire AL, Golan D, et al: Identifying personal genomes by surname inference. Science 2013; 339:321–324
13. Altman RB, Clayton EW, Kohane IS, et al: Data re-identification: societal safeguards. Science 2013; 339:1032–1033
14. Shinozaki G, Potash JB: New developments in the genetics