Data and Data Processing



Irene Joos / Cristina Robles Bahm / Ramona Nelson


In 2012, the White House released a document titled FACT SHEET: Big Data Across the Federal Government that lists big data projects that the Federal Government has undertaken (The White House, 2012). This document described initiatives in a wide range of government agencies, including the Department of Veterans Administration, the Department of Health and Human Services, the Food and Drug Administration, and the National Institutes of Health.

There are also a number of other government agencies, such as the Department of Defense, Homeland Security, and the Office of Basic Energy Sciences, with big data projects that directly or indirectly impact the healthcare community. These projects demonstrate that the Federal Government is using data, and especially the big data revolution, to advance scientific discovery and innovation in a number of areas, including the delivery of quality healthcare and personalized healthcare. Recently the Health Resources and Services Administration (HRSA) opened a Data Web site “dedicated to making high value health data more accessible to entrepreneurs, researchers, and policy makers in the hopes of better health outcomes for all” (US Department of Health and Human Services, 2018, para 1). This site provides data sets for areas such as maternity health, HIV, transplants, and primary care. Users can interact with the site through query tools, interactive maps, dashboards, and other features (Marcus, 2018).

There are also several health IT legislative acts directly impacting data, data processing, and data management in healthcare. These generally deal with data security, privacy, transmission, access, data exchanges, and interoperability (Office of National Coordinator for Health Information Technology, 2019). A list of the key legislative initiatives includes the following:

•   21st Century Cures Act (Cures Act)

•   The Medicare Access and CHIP Reauthorization Act of 2015 (MACRA)

•   The Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009

•   Section 618 of the Food and Drug Administration Safety and Innovation Act (FDASIA) of 2012

•   The Health Insurance Portability and Accountability Act (HIPAA) of 1996

•   Affordable Care Act of 2010

Additional information describing how each of these laws directly impacts data, data processing, and data management in healthcare can be accessed at https:/

In modern healthcare, the process of moving from data collection to implementing and evaluating the care provided to individuals, families, and communities is highly dependent on automated database systems and the ability of nurses to effectively use sites such as the HRSA’s Data Web site. The goal of healthcare and the movement to big data and data analytics is to drive quality care at lower costs through reducing overutilization of services, improving coding and billing practices, empowering patients, measuring trends, predicting outcomes, and examining how improved workflow and productivity can influence quality outcomes (Barlow, 2013). This chapter introduces the nurse to basic concepts, theories, models, and issues necessary to understand the effective use of automated database systems and to engage in dialog regarding data analytics, benchmarks, dashboards, and outcomes.


The Data, Information, Knowledge and Wisdom Model (Nelson D-W) depicting the megastructures and concepts underlying the practice of nursing informatics was included for the first time in the 2008 American Nurses Association (ANA) Scope and Standards of Practice for Nursing Informatics (American Nurses Association, 2008). In this document the model was used to frame the scope of practice for nursing informatics. This change meant that the functionality of a computer and the types of applications processed by a computer no longer defined the scope of practice for nursing informatics. Rather, the goals of nursing and nurse–computer interactions in achieving these goals defined the scope of practice. In other words, technology does not define the practice; rather, the practitioners’ use of technology to meet the goals of nursing care defines the practice.

The first version of the model was published in 1989 and included only a brief definition of the concepts (Nelson & Joos, 1989). Since that initial publication there have been three additional versions of the model published. Each revision of the model attempted to better illustrate the overlapping nature of the four concepts of data, information, knowledge, and wisdom and the complex interaction between and within each of these four concepts as well as the environment (Nelson, 2018; Ronquillo, Currie, & Rodney, 2016).

The Nelson data to wisdom continuum moves from data to information to knowledge to wisdom with constant interaction within and across these concepts as well as the environment (Joos, Nelson, & Smith, 2014; Nelson, 2018). As shown in Fig. 6.1, data are raw, uninterpreted facts without meaning. For example, the following series of numbers are data, with no meaning: 98, 116, 58, 68, 18. Reordered and labeled as vital signs they have meaning and now represent information: temperature 98.0, pulse 58, respirations 18, and blood pressure 116/68. These data provide information on a person’s basic condition. Using the nurse’s knowledge this information is then interpreted. The nurse fits these data into a pattern of prior knowledge about vital signs. For example, if the nurse records these vital signs as part of a physical for a high school athlete, they are in the normal range; however, if these same numbers were part of an assessment on an elderly patient with congestive heart failure, the low pulse and blood pressure could suggest a problem. Context and pattern knowledge allow the nurse to understand the meaning and importance of the data and to make decisions about nursing actions with regard to the information. While data by themselves are meaningless, information and knowledge by definition are meaningful. When the nurse uses knowledge to make appropriate decisions and acts on those decisions the nurse exhibits wisdom. In other words, the nurse demonstrates wisdom when the nurse synthesizes and appropriately uses a variety of knowledge types within nursing actions to meet human needs.


• FIGURE 6.1. The Nelson Data to Wisdom Continuum. Revised Data Information Knowledge Wisdom (DIKW) Model-2013 Version (Copyright © 2013 Ramona Nelson, Ramona Nelson Consulting. All rights reserved. Reprinted with permission.)

To place data in context and allow the production of information, one must process the data. This means one must label or code and organize the data so that one can identify patterns and relationships between the data, thereby producing information. When the user understands and interprets the patterns and relationships in the data, knowledge results. Finally, the user applies the knowledge as the basis for making clinical judgments and decisions and choosing nursing actions for implementation. The “data to information to knowledge to wisdom” progression is predicated on the existence of accurate, pertinent, and properly collected and organized data. This means the data must be generated, stored, curated, retrieved, interpreted, and used.
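The labeling and interpretation steps described in this section can be sketched in a few lines of Python, using the vital-sign numbers from the example above. This is only an illustration under stated assumptions: the function names, the assumed input order, and the pulse thresholds are hypothetical placeholders introduced here, not part of the Nelson model and not clinical guidance.

```python
# Raw data: numbers with no labels and therefore no meaning.
raw_values = [98, 116, 58, 68, 18]


def label_vital_signs(values):
    """Turn raw numbers into information by labeling and organizing them.

    Assumes (hypothetically) that the values arrive in this order:
    temperature, systolic pressure, pulse, diastolic pressure, respirations.
    """
    temperature, systolic, pulse, diastolic, respirations = values
    return {
        "temperature": float(temperature),
        "pulse": pulse,
        "respirations": respirations,
        "blood_pressure": f"{systolic}/{diastolic}",
    }


# Placeholder thresholds standing in for the nurse's prior knowledge.
# The numbers are invented for illustration and are NOT clinical guidance.
PULSE_REVIEW_BELOW = {
    "high school athlete": 40,
    "elderly patient with congestive heart failure": 60,
}


def interpret(vitals, context):
    """Interpret the same information differently depending on context."""
    if vitals["pulse"] < PULSE_REVIEW_BELOW[context]:
        return "needs review"
    return "within expected range"


vitals = label_vital_signs(raw_values)
print(vitals)
print(interpret(vitals, "high school athlete"))
print(interpret(vitals, "elderly patient with congestive heart failure"))
```

The same labeled values lead to different conclusions once context is supplied, which is the step from information to knowledge. Choosing and carrying out the appropriate nursing action, the wisdom step, is deliberately left outside the code.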


Data Definition—Context

Data is “a fact represented as an item or event out of context” (Mullins, 2013, p. 686). Data alone do not provide insights. As noted in the previous section, without context it is difficult to make judgments on data alone. For this reason, data are presented here in terms of the processes for storing, curating, retrieving, and interpreting them, with the end goal of gaining wisdom.

Data States

When discussing digital data, it is important to discuss the three states of data: data at rest, data in motion, and data in use (Rouse & Fitzgibbons, 2019). Data states can change quickly and often, so it is important to understand these states to ensure that sensitive information is secure. This is especially true in businesses such as healthcare, banking, or businesses with strong compliance requirements. Data at rest generally refer to data on storage devices such as a removable USB thumb drive, a hard drive, a file server, a cloud server, or offsite backup servers. These are archived data that rarely change. A patient’s past medical records are considered data at rest. In today’s cybercrime world, it is important to protect these data from unauthorized access and use. These data are subject to security protocols that protect their confidential nature.

Data in use refer to data that the information system is currently updating, accessing, reading, or processing. This is their most vulnerable state, as the data become open to access or change by others. Some of these data may be sensitive, such as social security numbers, birth dates, health insurance numbers, results of diagnostic tests, and so forth. One can attempt to secure data in use through passwords and user IDs, but these are only as secure as the person’s ability to keep that information private and the nature of the encryption technology used.

Data in motion are data moving between applications, between locations within a computer system (for example, from RAM to a hard drive, or when files are moved or copied from one folder to another), over the network, or over the Internet. Data in motion are an increasing concern in healthcare because streaming data are now available from sensors, monitoring devices, mobile devices, and so forth. Monitoring activities of patients in their homes places these data at risk as the data move from the source to the destination database. Increasingly, healthcare providers require access to data at the point of care through mobile devices. It is important to encrypt these data before and while moving them to these devices. While data in motion entail security risks, they also provide opportunities that we never imagined. For example, monitoring patients in real time in their homes can lead to improved patient care and compliance.

Data Sources—Including Patient-Generated Data and Population Health Data

Data have always been an important part of healthcare. Before digitization, handwritten nurses’ and doctors’ notes, charts, and drawings provided insight when making decisions about health and health trends. Dr. John Snow plotted data to create a map of the cholera epidemic in 1854 that showed that most of the sicknesses were concentrated around a specific pump (Johnson, 2007). The advent of computer technology powerful enough to store and analyze data has changed the way that we gather, curate, analyze, and present data in order to make the best decisions about patient health.

People and systems generate data in modern healthcare in a number of ways. From medical imagery to devices such as the Fitbit that use the Internet of Things (IoT) to patient portals and population health data, modern health care professionals have the ability to access patient data from many sources. Table 6.1 lists some examples of data sources (Fry & Mukherjee, 2018; Raghupathi & Raghupathi, 2014).

TABLE 6.1. Examples of Patient Data Sources


Data Input Operations

Since data come from a variety of sources and devices, one of the most important aspects of data processing is to carefully define the healthcare processes that relate to the input of data. For example, manually entered data, especially in an emergency situation, are at much higher risk of random data entry errors. This is the reason there are usually clear step-by-step procedures within healthcare for data entry.

Due to the variety of data sources and the nature of these sources, these data are increasingly unstructured. Data input operations, both technical and non-technical, are important because they ensure that the data going into the system are explicitly defined. Reaching a consensus on definitions and then communicating that consensus to interested parties is of crucial importance to data systems. Clear definitions become even more of a challenge when dealing with unstructured data. This challenge is the primary driving force for the development of standard languages and codes in healthcare. According to Kemp, “It is this capturing of ever-greater volume, velocity and variety of data that, if harnessed effectively, provides the organization with its Big Data opportunity” (Kemp, 2014, p. 23).

Big Data

The term Big Data has gained increasing recognition over the last decade. For several decades, nurses have collected and stored data, but the ability to analyze or “do” anything with data has not come to fruition until recently. But how much is “big” exactly? Table 6.2 summarizes different sizes and examples.

TABLE 6.2. Different Sizes of Data


It is estimated that patients generate about 80 MB of data per year and that healthcare data is the source of 30% of the world’s data production (Huesch & Mosher, 2017).

The industry often defines Big Data in terms of the 4 Vs coined by IBM. They are (1) Volume, (2) Variety, (3) Velocity, and (4) Veracity (IBM, 2018). More recently two more Vs were added: Value and Variability (Andreu-Perez, Poon, Merrifield, Wong, & Yang, 2015; Rouse, 2018).

Volume: The volume of big data refers to the amount of data created on a given day. It is estimated that 2.5 quintillion bytes of data are being created each day.

Variety: A second aspect of big data is the variety of data being produced and combined in order to gain insights. In terms of healthcare, this variety of data could be handwritten doctor’s notes that have been digitized, lab results, medical imaging, social media posts, etc.

Velocity: The third aspect of big data as defined by IBM is the velocity of data. In short, the velocity aspect of big data describes the trend toward gathering data from sensors or other real-time data sources, such as Fitbits, that stream information directly into a data repository.

Veracity: One of the potential pitfalls of relying on big data is that the veracity of the data is often not verified. As will be discussed in the next section, massive amounts of data are often being collected, but these data are not being cleaned or curated over time.

Value: The fifth aspect of big data is clinically relevant data that bring value to both the patient and healthcare systems. The value of big data is that it can lead to value-based, patient-centric care and reduced costs.

Variability: Variability addresses the extent and speed at which the structure of the data is changing as well as the frequency of the change. In healthcare, seasonal variations in flu strains and outbreaks of epidemics demonstrate the variability of illnesses.


Database Management Systems

A database by definition is an organized collection of data. A database management system (DBMS) is software that contains the database as well as a collection or set of programs for accessing and processing the data in the database, thereby identifying relationships between the data. It is important to realize that different DBMSs can manage the same database. A common example of this in healthcare is the many different library-based DBMSs used to access the data in the MEDLINE database. Another obvious example is the variety of electronic health record (EHR) systems from different vendors that healthcare institutions use to manage patient data.

Advantages of Database Management Systems: The main advantage of a DBMS is that it imposes a structure onto the data that allows interaction between the end user and the data. In general, a DBMS allows the storage, curation, and retrieval necessary to turn data without context into data that can be used to generate information and knowledge useful in making wise patient care decisions.

The two main components of a DBMS are a “front end,” which provides an application in which a user can view, manipulate, and interpret data, and a “back end,” which is where the data are stored. Figure 6.2 shows this relationship. One thing to notice is that data flow in both directions between the front end and the back end.


• FIGURE 6.2. The Front End and the Back End of a DBMS.

This DBMS structure includes the ability to store data in a central repository as well as the ability to manage the data in a central location thereby reducing data redundancy, increasing data consistency, and improving access to data (Mullins, 2013).

Data redundancy occurs when one stores the same data in the database more than once or stores it in more than one interrelated database. In healthcare there are many examples of data redundancy. Patients may be working with several physicians, all of whom may store their patient records in their own database that is not accessible by other healthcare providers or healthcare institutions, thereby requiring the patient to either provide that information again or obtain their records from the other doctor or facility. The patient’s active medication list may be in the electronic medical record that the primary provider maintains, in the records of a pharmacy that fills the medication prescriptions, and in the electronic record at a healthcare institution. A well-designed automated database links these records and updates them in one place, and then allows users access to it from this single location regardless of the location of the end user.

Data inconsistency results as each user working with different databases updates or changes the data. For example, when a doctor admits a patient to a hospital, different caregivers will ask the patient to identify medications he or she is taking at home. Sometimes the patient will list only prescription medications; other times the patient will include over-the-counter drugs the patient takes on a routine basis. Sometimes the patient will forget to include a medication. If caregivers record these different lists in different sections of the medical record, inconsistency occurs. In a well-designed integrated automated database, each caregiver is working with the same list each time data are reviewed. An additional problem occurs if one uses different terms for the same data. For example, sometimes one might use a generic name while other times one might use the brand name for that drug. This is why standards such as standard languages (e.g., SNOMED) are key to the design of EHRs. An automated database design that uses recognized standards as well as consistent input and access to data is imperative to creating databases necessary for the efficient and effective delivery of quality healthcare.
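The generic-name versus brand-name problem described above can be illustrated with a small Python sketch. It assumes a hypothetical lookup table, `STANDARD_NAME`, that maps the names caregivers might record to a single standard term; a real EHR would rely on a recognized terminology rather than a hand-built dictionary like this one.

```python
# Hypothetical lookup from recorded names to one standard term.
# A real system would use a recognized terminology, not this tiny dict.
STANDARD_NAME = {
    "tylenol": "acetaminophen",
    "acetaminophen": "acetaminophen",
    "advil": "ibuprofen",
    "motrin": "ibuprofen",
    "ibuprofen": "ibuprofen",
}


def normalize(medications):
    """Map each recorded name to its standard term and drop duplicates."""
    normalized = set()
    for name in medications:
        key = name.strip().lower()
        # Unknown names are kept as typed so nothing is silently lost.
        normalized.add(STANDARD_NAME.get(key, key))
    return sorted(normalized)


# Two caregivers record the same patient's home medications differently.
admission_list = ["Tylenol", "Advil"]
nursing_list = ["acetaminophen", "ibuprofen", "Motrin"]

print(normalize(admission_list))
print(normalize(nursing_list))
print(normalize(admission_list) == normalize(nursing_list))
```

Once both lists are expressed in the same terms, they turn out to describe the same two drugs. Without a shared vocabulary, the two records would look inconsistent even though the patient gave the same answer.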

Client-Server Configuration: Because a DBMS is a software product that allows you to structure and organize your data, developers have devised several ways of organizing these systems. Three things to consider when evaluating these systems are as follows: what does the front end look like, what does the back end look like, and where are the data stored? Most modern DBMSs utilize the client-server model. In this scenario the client contains the front end and talks to the server, which houses the data in the back end. The client and the server are often on different computers, with the database residing on the server.

•   Cloud vs. In-house: One of the biggest developments in the recent past is cloud computing. In a cloud-hosted DBMS the back end is accessed through the Internet, while in an in-house hosted system the server that houses the database is on site.

•   Distributed vs. Centralized: One of the decisions that needs to be made is whether the database is going to be distributed or centralized. A centralized system is one where there is a single, central computer that hosts a database and the DBMS. Many hospitals today are examples of this type of system. The hospital is the “hub” and hosts the system where many users on the network access this database. A distributed system is one where there are multiple database files located at different sites. The main difference between these two options is one of control. In a centralized system, there is a central control mechanism. Conversely, in a distributed system there is no centralized control structure. With the changing direction of healthcare to keeping the patient out of the hospital by monitoring them at home, the digitization of all patient records, patient portals, and so forth, there is a shift to a more distributed system or decentralized system (Wiler, Harish, & Zane, 2017).

Structure of a DBMS: In general, a DBMS consists of data that designers structure into tables and join by relationships. Each table consists of attributes and data points associated with those attributes. Table 6.3 shows a sample of a table. The table is named tblPatientInformation and shows the information for four patients. For this table, the attributes would be PatientID, PatientFirstname, PatientLastName, PatientAge, and PatientInsurance. Developers will assign these attributes a data type like integer, real, character, string, Boolean, etc. Data types are important as they exert some controls for preventing data entry errors.

TABLE 6.3. Sample Database Table for Patient Information
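As a minimal sketch of how attributes and data types can be declared, the following Python example uses the standard-library `sqlite3` module to create a table with the attribute names given in the text. The choice of SQLite, the specific column types, the age range in the `CHECK` constraint, and the sample rows are all assumptions made for illustration; they are not taken from Table 6.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Each attribute gets a data type. The CHECK constraint is an assumed rule
# showing how a type and range can reject some data entry errors.
conn.execute(
    """
    CREATE TABLE tblPatientInformation (
        PatientID        INTEGER PRIMARY KEY,
        PatientFirstname TEXT NOT NULL,
        PatientLastName  TEXT NOT NULL,
        PatientAge       INTEGER NOT NULL
            CHECK (typeof(PatientAge) = 'integer'
                   AND PatientAge BETWEEN 0 AND 130),
        PatientInsurance TEXT
    )
    """
)


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO tblPatientInformation VALUES (?, ?, ?, ?, ?)", row
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical rows, invented for this sketch.
accepted = try_insert((1, "Ana", "Example", 42, "Hypothetical Health Plan"))
rejected_text_age = try_insert((2, "Ben", "Sample", "sixty", None))
rejected_negative_age = try_insert((3, "Cy", "Demo", -5, None))

row_count = conn.execute(
    "SELECT COUNT(*) FROM tblPatientInformation"
).fetchone()[0]
print(accepted, rejected_text_age, rejected_negative_age, row_count)
```

Only the first row is stored. The age typed as a word and the negative age are both refused, which is the kind of control over data entry that the paragraph above attributes to data types.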


Relational Database Models: The Relational Database Model is still the most popular form of DBMS, but Non-Relational Databases (e.g., NoSQL) are on the rise. In the Relational Database Model, tables are related to each other through a system of keys. Each table has a primary key, which allows the system to request one record at a time. Tables can be combined in such a way as to allow the system to generate reports based on all of the tables. The main features of this type of system are tables, attributes, and keys, where attributes are the columns in the tables and keys are what allow us to find one record in the table. The functions these systems provide include creating, updating or changing, deleting, and querying data, generally by means of Structured Query Language (SQL) statements. Examples of widely used relational DBMSs (RDBMSs) include Oracle, MySQL, Microsoft SQL Server, and DB2.
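The role of keys and SQL statements can be sketched with Python's standard-library `sqlite3` module. In this illustration, `tblMedication`, its columns, and all of the sample rows are hypothetical; only the table name `tblPatientInformation` and the attributes `PatientID` and `PatientLastName` come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript(
    """
    CREATE TABLE tblPatientInformation (
        PatientID       INTEGER PRIMARY KEY,
        PatientLastName TEXT NOT NULL
    );
    CREATE TABLE tblMedication (          -- hypothetical second table
        MedicationID INTEGER PRIMARY KEY,
        PatientID    INTEGER NOT NULL
            REFERENCES tblPatientInformation(PatientID),
        DrugName     TEXT NOT NULL
    );
    """
)

# Create: add records (all values invented for this sketch).
conn.execute("INSERT INTO tblPatientInformation VALUES (1, 'Example')")
conn.execute("INSERT INTO tblPatientInformation VALUES (2, 'Sample')")
conn.executemany(
    "INSERT INTO tblMedication (PatientID, DrugName) VALUES (?, ?)",
    [(1, "acetaminophen"), (1, "ibuprofen"), (2, "ibuprofen")],
)

# Update and delete existing records.
conn.execute(
    "UPDATE tblPatientInformation SET PatientLastName = 'Sample-Smith' "
    "WHERE PatientID = 2"
)
conn.execute(
    "DELETE FROM tblMedication WHERE PatientID = 1 AND DrugName = 'ibuprofen'"
)

# Query: the primary key retrieves exactly one record.
one_patient = conn.execute(
    "SELECT PatientLastName FROM tblPatientInformation WHERE PatientID = ?",
    (2,),
).fetchone()

# Query: a join on PatientID combines both tables into one report.
report = conn.execute(
    """
    SELECT p.PatientLastName, m.DrugName
    FROM tblPatientInformation AS p
    JOIN tblMedication AS m ON m.PatientID = p.PatientID
    ORDER BY p.PatientID, m.DrugName
    """
).fetchall()

print(one_patient)
print(report)
```

Because each medication row carries the patient's `PatientID`, the patient's name is stored once and reused in every report, which also limits the redundancy discussed earlier in the chapter.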

NoSQL Database Models: NoSQL is an agile system that easily processes unstructured data and semi-structured data. It is cloud-friendly and a new way of thinking about databases. NoSQL doesn’t adhere to traditional RDBMS structure, has a rich query language, and is easily scalable (MongoDB, 2019).

NoSQL includes a range of different database technologies that address the growing need for processing different data types such as unstructured and semi-structured data. In a NoSQL Database Model there aren’t traditional primary keys in the system, but rather key-value stores (Mullins, 2013). This can be incredibly useful in Big Data systems where searching through all keys in the database application may take too long.
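The key-value idea can be illustrated, very loosely, with a plain Python dictionary standing in for the store. This is not a real NoSQL product and says nothing about how any particular one is implemented; the key format, the `put` and `get` helpers, and the sample documents are all hypothetical.

```python
import json

# A plain dict stands in for a key-value store in this sketch.
store = {}


def put(key, document):
    """Store a document (any JSON-serializable structure) under a key."""
    store[key] = json.dumps(document)


def get(key):
    """Fetch one document directly by its key, or None if the key is absent."""
    raw = store.get(key)
    return None if raw is None else json.loads(raw)


# Documents need not share a fixed set of attributes (semi-structured data).
put("patient:1", {"last_name": "Example", "age": 42})
put(
    "patient:2",
    {
        "last_name": "Sample",
        "wearable": {"steps": [4200, 5100]},
        "notes": "free-text note from a home visit",
    },
)

print(get("patient:1"))
print(get("patient:2")["wearable"]["steps"])
print(get("patient:9"))
```

Unlike a relational table, the two documents do not have to share columns, so new kinds of data such as wearable readings can be added without redesigning a schema. The trade-off is that the store itself does not enforce the structure or data types of what is saved.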

TABLE 6.4. Comparison between Relational Data and NoSQL Models
