Big Data Analysis of Electronic Health Record (EHR) Data


Roy L. Simpson


This chapter examines big data, data science’s most significant breakthrough, from a nursing perspective. It describes the value of big data to nursing practice and research, the role of maintaining a perspective on compassion in today’s society, and how nurses can participate in big data preparation through nursing documentation.

Just hearing the term “big data” triggers confusion for many people, especially those in healthcare. Many think that electronic health records (EHRs), which contain thousands of data points collected from across the continuum of care, are big data; in fact, they are only a small part of the available health data. This chapter provides a strategy for using EHRs and describes a foundation for big data.

Additional confusion stems from the fact that big data sits at the intersection of social science, statistics, information science, and computer science, disciplines that, for the most part, lie outside nursing. Finally, the nursing profession will require more specifically trained informaticians and researchers to efficiently and effectively mine the massive amounts of data that nurses collect on every shift across many patient care settings.


In the early 1980s, nurses at the National Institutes of Health were grappling with how to retrieve the data produced by automating nursing documentation. From the first electronic medical records at the NIH Clinical Center, they calculated that a nurse collects more than 106,000 data points in a single shift. These systems did not meet four of the V’s described later in this chapter. They were programmed in MUMPS, were based on hierarchical data structures and storage, and had no artificial intelligence. Although they held large amounts of data, they did not match today’s relational databases, in which data are cleansed and aligned to a data dictionary. This research analysis nonetheless pioneered the build-out of today’s big data and its future.

A seminal paper entitled “Application-controlled demand paging for out-of-core visualization” (Cox & Ellsworth, 1997) is thought to be the first mention of the big data concept. However, it was not until 11 years later that big data entered the healthcare mainstream with “Big-data computing: Creating revolutionary breakthroughs in commerce, science, and society” (Bryant, Katz, & Lazowska, 2008). Both of these important papers warrant a read to understand the underpinnings of today’s push for big data in healthcare.

Over the past decade, numerous papers have been written on defining the elements of big data. In 2013, Paul Henchey described big data for healthcare providers as having three components: Volume, Velocity, and Variety. At that time, healthcare providers were described as producing big data from laboratory results, Medicare claims data, and consumer searches of medical literature. Velocity was predicted to come from predictive analytics for clinical decision support, gaps-in-care alerts, and prepayment fraud alerts. Variety was predicted to result from the multitude of formats: ambulatory care EHR data, XML and unstructured text documents, genomic maps, and medical device streams.

In 2015, the HIMSS CNO-CNIO Roundtable prepared a white paper on big data that relied on the 2014 definition by Gaffney and Huckabee, which also included Veracity (Gaffney & Huckabee, 2014). The nursing group defined the need for Veracity to assure the integrity, accuracy, and trustworthiness of data. They further noted that the volume of data would grow to a massive amount because of research on genomics. Future models for patients include other ‘omics (genomics, epigenomics, lipidomics, proteomics, glycomics, foodomics, transcriptomics, metabolomics, pharmacogenomics, and toxicogenomics), adding culture and more to the future of EHRs. In addition to the ‘omics, symptom management science is adding to the volume of data for nursing to analyze and mine. Together, symptom management science and pharmacogenomics are contributing to precision health at the National Institutes of Health (NINR Symptom Science, 2019). Savage (2014) reported that the combined normal and cancer genomes of a single patient amount to 1 terabyte (10^12 bytes) of data, and that 100 genomes and ‘omics in multiple patients would result in 1 exabyte (10^18 bytes) of data. The cost of storing and analyzing these data was estimated at $100 million per year.
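To give a sense of the scale separating the terabyte and exabyte figures quoted above, the following minimal Python sketch works through the unit arithmetic. It assumes decimal (SI) units, as the powers of ten in the text imply; the constant and function names are illustrative and do not come from the cited sources.

```python
# Decimal (SI) byte units, matching the powers of ten quoted in the text.
TERABYTE = 10**12
EXABYTE = 10**18


def datasets_per_exabyte(dataset_bytes: int) -> int:
    """Return how many datasets of the given size fit in one exabyte."""
    return EXABYTE // dataset_bytes


# Number of one-terabyte datasets that add up to one exabyte.
one_tb_datasets = datasets_per_exabyte(TERABYTE)
print(f"{one_tb_datasets:,} one-terabyte datasets fit in one exabyte")
```

The two units differ by a factor of one million, which is why storage and analysis costs climb so quickly once genomic and other ‘omics data are collected for many patients.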

The HIMSS white paper further described four principles that would be required to use big data for nursing. They included privacy and security of health information, data standards including common formats, and interoperability to provide the ability to exchange data in comparable and meaningful ways. The focus of big data in nursing was defined as clinical, pharmaceutical, activity, and cost data, along with patient behavior and sentiment data. The authors noted that big data analytics would affect nursing’s role in precision health because of the volume of data resulting from genomics across the continuum of care. The group predicted that, with standardized data capture, nurses could use big data to improve quality and outcomes and to reduce the cost of care.

The concept of Value was added by nurses who focused on the conversion of data to information to knowledge to wisdom, which gives big data analytics its Value in the form of quality data, better outcomes, and reduced costs (Westra et al., 2017). Their excellent review of exemplars of big data analytics describes 17 studies by nurse researchers using EHR data in multiple environments.

Data computing leader IBM’s Big Data and Analytics Hub also sets out four key characteristics to further describe big data from a computing standpoint (IBM Big Data & Analytics Hub, n.d.):

1.   Volume: the scale of data. More than a quintillion bytes of data are created every day. A quintillion equals a 1 followed by 18 zeroes.

2.   Velocity: the analysis of streaming data. A modern car has more than 100 sensors, each of which collects, analyzes, and compares readings on a near-constant basis.

3.   Variety: different forms of data. More than 420 million wearable, wireless health monitors are in use today, and each collects different types of data in different formats.

4.   Veracity: the uncertainty of data. Of every three business leaders, one does not trust the information he or she uses to make decisions.

While each of these characteristics is important, Veracity has an especially high impact on patient care, which needs to be based on evidence. What was a standard best practice a few years ago, when most nurses received their education, has likely since been advanced through evidence-based research. Nurses must stay up-to-date on the latest research and incorporate these findings into the way they care for patients. Nurses must take seriously their professional obligation to practice evidence-based nursing.

More recently, Simpson defined big data in a 2019 keynote speech at ANIA as: “The slight twists and turns of new and old data create a smorgasbord of new information deriving from a kaleidoscope of actions within mathematics and statistics, delivering new knowledge for applications into precision care for patients” (Delaney, Weaver, Warren, Clancy, & Simpson, 2017; Simpson, 2019). This definition includes a subtle but critical nuance. Looking at data from a different perspective will produce a different result, every time. These differences are not about context as much as they are about the small changes in queries that align the data to a different conclusion. For example, if you ask how many nurses were working on a service floor, the answer could be a total of 20 nurses. If you refined the query to ask how many nurses were providing direct patient care, the answer would likely be lower, perhaps only 10.
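The effect of a small change in a query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Nurse` record, its fields, and the roster are hypothetical and are built to mirror the 20 and 10 in the example above; they do not represent any real staffing system or EHR schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nurse:
    """Hypothetical staffing record; the field names are assumptions."""
    name: str
    on_floor: bool
    direct_care: bool


# Hypothetical roster: 20 nurses on the floor, 10 of them in direct patient care.
roster = [
    Nurse(f"nurse_{i:02d}", on_floor=True, direct_care=(i < 10))
    for i in range(20)
]


def count_on_floor(nurses):
    """Broad query: every nurse working on the service floor."""
    return sum(n.on_floor for n in nurses)


def count_direct_care(nurses):
    """Refined query: only floor nurses providing direct patient care."""
    return sum(n.on_floor and n.direct_care for n in nurses)


print(count_on_floor(roster))     # broad query
print(count_direct_care(roster))  # refined query
```

The data are identical in both calls; only the filter differs, and the reported count changes with it.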


As big data entered the clinical vernacular, one thing became clear: Big data is traditional data collection on steroids. Think about every electronic device you own or use. These devices continuously collect data about you—even when you are sleeping:

•   Your phone knows whom you call most frequently and how recently you spoke to each of them.

•   Your tablet knows what you made for dinner and which ingredients in your pantry need to be replenished.

•   Your GPS knows where you went yesterday, how long it took to get there, whether a better route was available, and your average rate of speed.

•   Financial apps calculate your net worth with up-to-the-minute accuracy.

•   Even your bed knows how well you slept during the night and what could be done to improve your sleep.

These expanding collections of data have already overtaken humans’ ability to comprehend or use them. In all, 90% of the data that exist today were created in the past two years. Consider these mind-boggling data stats (Marr, 2018):

•   3.7 billion humans use the Internet every day.

•   More than half of all searches happen on a mobile phone.

•   Every day, Google processes 3.5 billion searches; that’s 40,000 queries a second.

•   Every minute of every day, 456,000 tweets go out.

•   1.5 billion people spend time on Facebook every day.

•   16 million text messages are sent each minute.

•   Every minute, 103,447,520 spam emails are sent.
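Daily and per-minute figures like these are easy to misread, so it can help to convert them to a common unit. The sketch below is a minimal Python example that converts the 3.5 billion daily searches and the 456,000 tweets per minute quoted above; the variable names are illustrative, and the results are simple averages that ignore time-of-day variation.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

# Figures quoted in the list above.
searches_per_day = 3_500_000_000
tweets_per_minute = 456_000

# Average search rates implied by the daily total.
searches_per_second = searches_per_day / SECONDS_PER_DAY
searches_per_minute = searches_per_day / MINUTES_PER_DAY

# Daily tweet volume implied by the per-minute rate.
tweets_per_day = tweets_per_minute * MINUTES_PER_DAY

print(f"Searches per second: {searches_per_second:,.0f}")
print(f"Searches per minute: {searches_per_minute:,.0f}")
print(f"Tweets per day: {tweets_per_day:,}")
```

On these averages, 3.5 billion searches a day works out to roughly 40,000 per second, or about 2.4 million per minute.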


Until recently, most Americans had their prescriptions filled at the drug store closest to their home or office. Now, a service company called GoodRx has launched a national experiment that uses transparent pricing and coupons to change this “closest to me” consumer behavior. Through the service, you can compare the prices of 70,000 FDA-approved drugs by drugstore, right down to the ZIP code. ZIP codes have historically defined drug pricing. You can download coupons to save when you have your prescription filled or refilled (, 2019). In this scenario, the consumer leverages big data by engaging in transparent pricing models for the pharmaceutical industry that use financial incentives to change consumer behavior (Marsh, 2019).


Income has always been a key indicator of health. Individuals in low-income areas are more likely to suffer from environmental and infectious diseases and from nutritional deficiencies. An analysis of more than 50 million U.S. prescriptions filled in 39 of the largest Metropolitan Statistical Areas (MSAs) supports this long-held belief (, 2019).

The analysis showed that lower-income Americans experienced depression, obesity, and diabetes more often than those with higher incomes. Lower-income individuals also self-reported a lower overall level of health than their counterparts from higher-income areas. Lower-income individuals filled fewer than 105 prescriptions per 1000 people in 2018 (Marsh, 2019).

At the other end of the economic spectrum, individuals with higher incomes were more likely to fill prescriptions for “lifestyle conditions,” such as eyelash growth, erectile dysfunction, hair loss, rosacea, facial wrinkles, and skin discoloration. In 2018, higher-income people filled approximately 200 prescriptions per 1000 people.

In addition to the difference in fill rates, with the lower-income rate about half that of the higher-income group, another discrepancy stood out. Although mental health conditions such as attention deficit hyperactivity disorder (ADHD), alcohol addiction, anxiety, bipolar disorder, depression, eating disorders, fatigue, panic disorder, and obsessive-compulsive disorder were more prevalent in lower-income populations, the prescription fill rate did not reflect this. The research team pointed to more limited access to treatment and fewer resources to use in conjunction with prescription fills as two reasons for this disconnect.


Jul 29, 2021 | Posted by in NURSING | Comments Off on Big Data Analysis of Electronic Health Record (EHR) Data
