For some health care professionals, big data may still qualify as just another buzzword. But, in other fields, such as marketing, financial analysis and weather forecasting, big data is the subject of enormous study and appreciated as an important tool with amazing power.
What is Big Data?
In health care, we have many examples of big data. When a patient is hospitalized, we record detailed textual information in the electronic medical record and compile statistics to record test results, patient care activities and more. Also, an enormous amount of digital data is recorded by electronic monitors and computerized treatment devices. For example, patients in critical care have temperature, blood pressure, respirations and many other parameters monitored continuously. A single EKG may contain a huge amount of digital information, only a fraction of which is routinely monitored or recorded.
New Terminology & Concepts
Many health care risk managers would readily self-identify as Health IT novices. To master big data, they must first understand unfamiliar terminology and concepts. The following paragraphs summarize, in non-technical language, some of the terms that may be necessary to fully appreciate big data.
The process of gathering, analyzing, and interpreting big data generally is called data analytics. While the technical aspects of data analytics are formidable, we might gain a more practical understanding by reviewing how, traditionally, we have dealt with large amounts of information.
In the past, prior to development of computers, microchips, cyber networks and wireless communications, we were forced to rely on sampling to derive meaning from huge collections of data. Exit polling is a familiar example of sampling: a limited number of voters are questioned as they leave the polls, and their answers are extrapolated into predictions regarding actual voting results. We rely on a relatively small sample because questioning every voter is impractical, and past experience tells us that the sample, if random and large enough, can provide a reliable prediction of the outcome.
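For readers who want to see sampling at work, the short Python sketch below may help. It is only an illustration under stated assumptions: the electorate of 100,000 voters, the 52% share for candidate "A", the sample size of 2,000 and the function name estimate_share are all hypothetical, chosen to show how a random sample can approximate the full count.

```python
import random


def estimate_share(population, sample_size, seed=0):
    """Estimate the share of 'A' votes from a simple random sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample = rng.sample(population, sample_size)  # draw voters without replacement
    return sample.count("A") / sample_size


# Hypothetical electorate (an assumption for illustration, not real data):
# 100,000 voters, 52% of whom chose candidate A.
population = ["A"] * 52_000 + ["B"] * 48_000

true_share = population.count("A") / len(population)
estimated_share = estimate_share(population, sample_size=2_000)

print(f"true share of A votes:      {true_share:.3f}")
print(f"estimate from 2,000 voters: {estimated_share:.3f}")
```

The estimate should land close to the true share even though only 2% of the voters were questioned, which is why a random, large-enough sample was the practical tool for so long.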
Today, we can forgo sampling because we now have the ability to gather and analyze more and, in some cases, all of the available information. Sampling requires a “theory” about how things work (e.g., that a fraction of voters leaving the polls can reliably predict the ultimate outcome), which can be tested. Big data has no such reliance on theories or testing; the data speaks for itself.
Do risk managers understand the nature and use of algorithms? Does one have to study computer science, math or engineering in order to understand algorithms and their use? The “algorithm” is merely a method (a self-contained sequence of actions to be performed) used to calculate a result or to process and interpret data. One uses algorithms to program a computer by telling the machine what to do, step-by-step, and how to do it, in order to accomplish the goal. The machine then executes the program in a mechanical way.
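A very small algorithm, written in Python, shows what "step-by-step" means in practice. This is a sketch for illustration only: the function name flag_high_readings, the sample temperatures and the 100.4 degree limit are assumptions made for the example and are not clinical guidance.

```python
def flag_high_readings(readings, threshold):
    """Return the positions of the readings that exceed the threshold."""
    flagged = []
    for position, value in enumerate(readings):  # step 1: take each reading in order
        if value > threshold:                    # step 2: compare it with the limit
            flagged.append(position)             # step 3: remember where it occurred
    return flagged                               # step 4: report the result


# Hypothetical temperature readings in degrees Fahrenheit.
# The 100.4 limit is an illustrative assumption, not clinical guidance.
temperatures = [98.6, 99.1, 101.2, 98.9, 100.8]
print(flag_high_readings(temperatures, threshold=100.4))  # prints [2, 4]
```

The machine does nothing clever here. It simply repeats the same comparison for every reading, which is exactly what makes an algorithm suitable for very large volumes of data.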
Predictive analytics is the practice of extracting information from existing data sets in order to determine patterns and to predict future outcomes and trends. It does not say what will happen, but describes the probability of future events with an acceptable level of reliability.
Electronic storage capacity, necessary for gathering and analyzing databases, is expressed in terms of bytes of digital information. The technical description of electronic data storage is formidable, but we can gain an appreciation for the scaling of electronic storage capacity by noting that each category (kilobytes, megabytes, gigabytes, terabytes, petabytes, exabytes) is 1,000 times larger than the one preceding it.
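The 1,000-fold scaling can be made concrete with a few lines of Python. This is a minimal sketch: the helper name describe_size is hypothetical, and it uses the decimal convention described above (some systems instead count in steps of 1,024).

```python
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes",
         "terabytes", "petabytes", "exabytes"]


def describe_size(num_bytes):
    """Express a byte count in the largest unit that keeps the value at 1 or more."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop when the value fits in this unit, or when no larger unit is listed.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:,.1f} {unit}"
        value /= 1000  # each step up the list is 1,000 times larger


print(describe_size(1_000))              # prints 1.0 kilobytes
print(describe_size(5_000_000_000_000))  # prints 5.0 terabytes
print(describe_size(10**18))             # prints 1.0 exabytes
```

Six steps of 1,000 separate a single byte from an exabyte, so one exabyte is a billion billion bytes.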
Digitization and Datafication
Examples of numerical data, converted into electronic form, are ubiquitous. Even non-techie risk managers have tackled Excel spreadsheets and other electronic formats containing numerical data. The term “digitization” refers to the process of converting information (including text, images, and sound) into a digital format, using binary code recognized by machines. Google launched a project in 2004 with the ambitious goal of digitizing the text of all published books into a searchable database, accessible to everyone through the Internet for free. But Google accomplished an additional step, “datafication” of all these books, by using optical character-recognition software that could discern letters, words, sentences and paragraphs, thereby releasing the true value of stored information.
Correlation versus Causation
Big data represents a paradigm shift in terms of how we understand information. Traditional sampling was based on formulating and testing a theory (if this, . . . then that) in a search for causation. But, big data does not depend upon theories or causation – rather, big data identifies correlations. If you have enough information to demonstrate a correlation, you may not need to prove cause-and-effect in order to develop precautions. For example, if analysis of population data were to find a strong correlation between having blue eyes and a 10-fold increase in the risk of going blind (not a true example), would it matter that we still did not know the cause of going blind before trying to warn blue-eyed people?
How Risk Managers Can Harness Big Data
Risk managers access EHR patient information on a daily basis. They also contribute to additional databases including quality metrics, claims & litigation, adverse event reporting, root cause analyses and more. Are risk managers analyzing available databases to extract useful information to minimize errors and to reduce the risk of patient harm?
In order to improve our understanding, it may be helpful to consider the experiences of Google with predicting flu outbreaks. In 2009, a new flu strain (H1N1) was spreading and the Centers for Disease Control and Prevention requested that physicians report new flu cases. But, there was a lag of a week or two between the CDC’s receipt of physician reports and the tabulation/publication of the flu data. Researchers from Google claimed that they could “nowcast” the flu based on searches stored in Google’s database. Google designed a system to examine millions of search terms, comparing them to CDC data about the spread of seasonal flu between 2003 and 2008, and predicting the locations of flu outbreaks in near-real time.
Google Flu Trends sought correlations between the frequency of certain search queries and the spread of the flu, over time and space. Google identified 45 search terms that, when used in a mathematical model, demonstrated a strong correlation between their predictions and official records of actual flu cases. Google Flu Trends illustrates an early application of big data – harnessing digital information in novel ways to furnish valuable insights, not otherwise available.
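To give a feel for what "a strong correlation" means, the Python sketch below computes a standard Pearson correlation coefficient between two series. It is not Google's model, which the source does not describe in detail. The weekly figures are made up purely for illustration and are not real Google or CDC data, and the function name pearson_correlation is our own.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two series move together around their averages.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each series varies on its own.
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up weekly figures for illustration only; NOT real Google or CDC data.
weekly_searches = [120, 150, 310, 480, 450, 260, 140]   # flu-related search counts
weekly_flu_cases = [10, 14, 33, 52, 47, 25, 12]         # reported flu cases

r = pearson_correlation(weekly_searches, weekly_flu_cases)
print(f"correlation between searches and cases: {r:.2f}")
```

A value near 1 means the two series rise and fall together, a value near 0 means no linear relationship, and a value near -1 means one rises as the other falls. Nothing in the calculation explains why the series move together, which is the distinction between correlation and causation drawn earlier.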
The experience with predicting the flu based on Google search queries should provide food for thought for health care risk managers. What valuable information resides in our electronic databases that could improve risk management practices and make the environment safer for patients? Truthfully, we do not know. But, we are becoming more curious about investigating such sources of strategic intelligence.
How might we explore big data possibilities? Risk managers cannot simply email Tech Services and ask for a big data analysis. We need to devise a method to identify which databases may contain valuable information and plan an exploration of the data. Lacking technical expertise, we should not tackle such an exploration on our own. Under circumstances limited by scarce resources and the demands of day-to-day crises, how do we know what to do or how to do it? One thought is to avoid repeating past failures in dealing with electronic data transactions.
Could health care risk managers be sitting on a treasure trove of information stored in electronic databases without extracting valuable strategic intelligence? Can we sense patterns and trends of activities about which detailed data may already be in our possession? Might big data provide the route to exploring these potential sources of valuable intelligence and insights, useful to improve patient safety and risk management performance?
As one illustration of how risk managers might employ big data methods, think of an example of patient harm or a costly lawsuit arising from an error that has occurred over the past five years. If that event revolved around a missed, late or incorrect diagnosis, can we identify existing sources of data that might shed light on the event? Can we pose questions about how the event occurred, the answers to which could guide our efforts to improve safety or methods?
Using misdiagnosis for illustrative purposes, what existing databases might be explored to learn more about the harmful event? If a patient suffered harm, certainly that patient’s electronic medical record is one database for consideration. But, what about records for other patients admitted with the same symptoms or during the same time frame? Were quality metrics collected that pertain to that patient or to other patients similarly situated, that might represent another database? Was there a report of an adverse event and records from a root cause analysis or external reporting to regulatory agencies? Again, the possibilities are myriad.
Whom should a risk manager enlist for help on such a project? Obviously IT expertise may be necessary, but what about other personnel and resources? Is there a logical physician champion who could bring organizational clout as well as medical professional expertise? What clinicians and caregivers are available to provide detailed background and insights about the patient, the event and the facility’s response? These clinicians may be able to articulate more unanswered questions about the patient and the event. Are quality and utilization managers available to share insights? What insights might be gained from analysis of laboratory, imaging or pharmaceutical databases?
Ultimately, the risk manager must provide leadership to articulate the goal of the big data investigation, obtain management buy-in for budgeted time and resources and participate in the design, planning and execution of the project. Remember, with big data, there is no need to construct a theory for testing or an intuitive answer to the question. All the data is available for study. The data must be allowed to speak for itself.
Dan Groszkruger, JD, MPH, CPHRM, DFASHRM, leads rskmgmt.inc, a consulting firm serving the patient safety and health care risk management fields. He is a health care attorney, former hospital executive, in-house counsel, compliance officer and risk manager. Groszkruger is a regular presenter, author and faculty member for ASHRM and SCAHRM. He has served on the ASHRM Board of Directors, CPHRM oversight committee, ASHRM Journal of Healthcare Risk Management and ASHRM Education Development Task Force.
 Mayer-Schönberger, Viktor, and Cukier, Kenneth. Big Data – A Revolution That Will Transform How We Live, Work, and Think. Mariner Books/Houghton Mifflin Harcourt, New York 2013. Chapter 4, pp. 59-60.
 Mayer-Schönberger, supra. Chapter 2, pp. 26-30.
 However, the extent to which exit polling mis-forecasted the results of the November 2016 presidential election is a reminder of the inherent limitations of sampling that is, for example, non-random or poorly designed.
 Murphy DR, et al. “Application of Electronic Algorithms to Improve Diagnostic Evaluation for Bladder Cancer.” Applied Clinical Informatics 279, Schattauer 2017. (downloaded from www.aci-journal.org on 3.30.17.)
 “How hospitals are using predictive analytics.” Hospitals & Health Networks, Vol 91, no. 2, February 15, 2017
 Siegel E, “Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.” John Wiley & Sons, Jan 2016
 Mayer-Schönberger, supra. Chapter 5, pp. 83-86.
 Ibid. pp. 73-83.
 Ibid. Chapter 4, pp. 50-72.
 Lazer D, Kennedy R, “What We Can Learn from the Epic Failure of Google Flu Trends.” Wired, Oct 1, 2015.
 Ginsberg J, et al. “Detecting Influenza Epidemics Using Search Engine Query Data.” Nature 457 (2009) pp. 1012-14. (available at: http://www.nature.com/nature/journal/v457/n7232/full/nature07634.html)
 Mayer-Schönberger, supra. Chapter 1, pp. 1-3.
 Despite a “frenzy” of media attention following publication of the 2009 Nature article, Google Flu Trends suffered a widely publicized failure to accurately predict the 2012-2013 flu season, and subsequently was discontinued. See: Lazer D, supra.
 Permissible without obtaining patient consent, subject to other privacy and security rules, under HIPAA’s treatment, payment, and operations (TPO) exceptions.