Collecting and Managing Data

http://evolve.elsevier.com/Grove/practice/

Data collection is one of the most exciting parts of research. After all the planning, writing, and negotiating, you should be eager and well prepared for this active part of research. The passion that comes from wanting to know the answer to your research question brings a sense of excitement and eagerness to start collecting your data. However, before you leap into data collection, you need to spend some time carefully planning this adventure and pilot testing each step. Planning data collection begins with identifying all the data to be collected. The data to be collected are determined by the research questions, objectives, or hypotheses of the proposed study. As you develop the data collection plan, be sure that you gather all the data needed to answer the research questions, achieve the study objectives, or test the hypotheses. Chapter 16 includes detailed information about measurement, so the focus in this chapter is on the logistical and pragmatic aspects of quantitative data collection. Data collection strategies for qualitative studies are described in Chapter 12.

To start planning the data collection process, you need to determine the best mode by which the data can be collected. Factors that influence the plan to collect and enter data into a database for analysis include cost, time, the availability of assistance, and the need for consistency. After developing the data collection plan, you develop the data collection forms and a codebook for data entry. Conducting a pilot test with a small group of subjects is the next recommended step. The pilot test may result in modifications of the plan, and then the actual data collection can begin. During data collection, various problems may arise. Potential situations are described in this chapter along with problem-solving strategies. The chapter concludes with a discussion of data entry and management.

Data Collection Modes

Data can be collected by interview (face-to-face or telephone); observations; focus groups; self-administered questionnaires (online or hard copy); or extraction from existing documents such as patient medical records, motor vehicle department accident records, or state birth records (Figure 20-1). Many factors need to be considered when a researcher is deciding on the mode for collecting data. Harwood and Hutchinson (2009) describe four factors that need to be part of your decision-making process: (1) purpose and complexity of the study, (2) availability of financial and physical resources, (3) characteristics of study participants and how best to gain access to them from the population, and (4) your skills and preferences as a researcher.

Im et al. (2007) conducted a survey in the United States of gender and ethnic differences in the experience of cancer pain. These researchers administered their questionnaire over the Internet and through a paper-and-pencil format based on subject preference. The following excerpt describes the data collection procedure for their study:

“To administer the Internet questionnaire, a Web site conforming to the Health Insurance Portability and Accountability Act standards, the System Administration, Networking, and Security Institute Federal Bureaus of Investigation recommendations, and the Institutional Review Board [IRB] policy of the institution where the researchers were affiliated was developed and published on an independent, dedicated Web site server. When potential participants visited the project Web site, informed consent was obtained by asking them to click a button labeled I agree to participate. After this, questions on specific diagnoses, cancer therapies, and medications were asked, and the appropriateness of answers was checked automatically through a server-side program; participants were connected automatically to the Internet survey web page if the answers were appropriate.

“Upon request, pen-and-pencil questionnaires were provided by mail to the community consultants, who distributed the questionnaires in person only to those who were identified as cancer patients. These questionnaires accompanied hard copies of the same informed consent form included in the Internet format of the questionnaire, and the pen-and-pencil questionnaire included a sentence ‘Filling out this questionnaire means that you are aged over 18 years old and giving your consent to participate in this survey.’ After the self-administered questionnaires were completed, community consultants retrieved all except five (these were mailed directly to the research team by the participants) in person at the community settings and mailed them to the research team. Supplementing pen-and-pencil questionnaires was essential to recruit the target number of ethnic minority cancer patients across the nation who did not have access to the Internet but were interested in participating in the study. Among the 276 participants who were recruited through community settings, 246 … used the pen-and-pencil questionnaires. … There were no statistically significant differences in psychometric properties between the Internet format and the pen-and-pencil format of the questionnaire. … It took an average of 30-40 minutes for the participants to complete either the Internet format or the pen-and-pencil format of the questionnaire.” (Im et al., 2007, pp. 299-300)

Im et al. (2007) maximized their sample size and obtained a more representative sample by giving participants an option to complete their questionnaire on the Internet or using paper-and-pencil format. The researchers took steps to ensure that the data collected by the two formats were comparable by testing for significant differences and finding none. The time to complete the Internet and paper-and-pencil questionnaires did not vary. Im et al. (2007) also ensured that an ethical study was conducted and subjects’ rights were protected.

An additional advantage of Internet data collection is that responses can be time/date stamped. For example, if subjects are instructed to complete the questionnaire before bedtime, the time can be verified. If subjects are instructed to complete a daily diary, the date of each entry would be documented, and subjects would be discouraged from entering all diary days on the last day just before returning the diary to the researcher (Fukuoka, Kamitani, Dracup, & Jong, 2011).
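The following Python sketch illustrates how time/date stamps might be used to check diary entries. It is an illustration only, not the procedure used in any study cited here. The function names, the data layout, and the 6 p.m. cutoff for "before bedtime" are assumptions made for the example.

```python
from datetime import date, datetime


def flag_batch_entries(entries):
    """Return the diary days whose entry was stamped on a later calendar date.

    `entries` is a list of (diary_day, submitted_at) pairs: the day the entry
    is supposed to describe and the server-recorded time it was submitted.
    """
    return [day for day, submitted_at in entries if submitted_at.date() > day]


def completed_in_evening(submitted_at, earliest_hour=18):
    """True if the stamp falls at or after `earliest_hour` (assumed cutoff)."""
    return submitted_at.hour >= earliest_hour


# Hypothetical diary: day 1 entered that evening, days 2 and 3 entered
# together on the morning the diary was returned.
entries = [
    (date(2011, 3, 1), datetime(2011, 3, 1, 21, 15)),
    (date(2011, 3, 2), datetime(2011, 3, 7, 8, 2)),
    (date(2011, 3, 3), datetime(2011, 3, 7, 8, 4)),
]
late_days = flag_batch_entries(entries)
```

In this example, `late_days` contains the second and third diary days, which you could then follow up with the participant or treat as suspect in the analysis.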

Computer-Based Data Collection

With the advent of laptop and tablet computers, data collectors can code data directly into an electronic file at the data collection site. If a computer is used for data collection, a program must be written for entering, cleaning, and storing data. A computer enables users to collect large amounts of data with few errors, and the data can be readily analyzed with a variety of statistical software packages. In addition to researchers using technology at the point of data collection to record data, technology has made it possible to interface physiological monitoring systems with computers for data collection. An advantage of using computers for the acquisition and storage of physiological data is the increased accuracy and precision that can be achieved by reducing errors associated with manually recording or transcribing physiological data from a monitor. Another advantage is that more data points can be recorded electronically than could be recorded manually. Computers linked to physiological monitoring systems can store multiple data points for multiple indicators, such as blood pressures, oxygen saturation levels, and sleep stages. Because data can be electronically recorded, data collection is less labor intensive, and the data are ready to analyze more quickly. The initial cost of equipment may be high, but it is reasonable when the cost of hiring and training human data collectors is considered.
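As a rough illustration of what a program for entering, cleaning, and storing data might do, the Python sketch below checks each entered value against a plausible range and writes the cleaned record in a comma-separated format that statistical packages can import. The field names and ranges are hypothetical; in an actual study they would come from your codebook and measurement protocol.

```python
import csv
import io

# Hypothetical plausible ranges; a real study would define these in its codebook.
VALID_RANGES = {
    "heart_rate": (20, 250),         # beats per minute
    "systolic_bp": (50, 300),        # mm Hg
    "oxygen_saturation": (50, 100),  # percent
}


def clean_record(raw):
    """Convert text entries to numbers and list fields that fail the checks."""
    cleaned = {"subject_id": raw["subject_id"]}
    problems = []
    for field, (low, high) in VALID_RANGES.items():
        try:
            value = float(raw[field])
        except (KeyError, TypeError, ValueError):
            value = None  # missing or non-numeric entry
        if value is None or not (low <= value <= high):
            cleaned[field] = None  # stored as missing until verified
            problems.append(field)
        else:
            cleaned[field] = value
    return cleaned, problems


def store(records):
    """Return the cleaned records as CSV text with one row per subject."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["subject_id", *VALID_RANGES])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# A keying error (1180 instead of 118) is caught at the point of entry.
record, problems = clean_record(
    {"subject_id": "S001", "heart_rate": "72",
     "systolic_bp": "1180", "oxygen_saturation": "97"}
)
csv_text = store([record])
```

Flagging a value rather than silently correcting it keeps the decision with the data collector, who can recheck the source while still at the data collection site.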

There are some concerns with the use of computerized data acquisition systems, but physiological data are usually best gathered and stored directly into a computer database to ensure accurate, complete data collection. Physiological data typically require large computer storage space. The computer-equipment interface may require more space in an already crowded clinical setting; when possible, existing equipment should be used to collect data. Purchasing the equipment, setting it up, and installing the software can be time-consuming and expensive at the start of your project. Thus, initial studies usually require substantial funding. Another concern is that the nurse researcher may focus on the machine and technology and neglect observing and interacting with the subject.

The most serious disadvantage of computerized data collection is the possibility of measurement error that can occur with equipment malfunctions and software errors. Regular maintenance and calibrations, or reliability checks of the equipment and software, reduce this problem. The benefits of collecting repeated measures over time may outweigh the risk of missing data because of poor compliance. For example, collecting continuous rectal temperature data from a subject is easier and less burdensome than asking the subject to measure an oral temperature every 1 to 2 hours.

Savian, Paratz, and Davies (2006) conducted a single-blind randomized, crossover study with 14 mechanically ventilated intensive care unit patients.

“[The purpose of the study was to determine the effectiveness of] manual hyperinflation (MHI) and ventilator hyperinflation (VHI) on respiratory mechanics (static compliance [C_st]), oxygenation (arterial oxygen tension [PaO₂]/fraction of inspired oxygen [FIO₂] ratio), and secretion removal (wet weight of sputum and peak expiratory flow rate [PEFR]) at different levels of PEEP [positive end-expiratory pressure] … a secondary aim was to investigate the hemodynamics heart rate [HR], mean arterial pressure [MAP] and metabolic response (carbon dioxide output [VCO₂]) during MHI and VHI.” (Savian et al., 2006, p. 335)

The computerized systems used to collect and record data in the study by Savian et al. (2006) are detailed in the following excerpt:

“PEFR and CO₂ [carbon dioxide] production were measured using a flow and CO₂ sensor connected to the patient’s airways and to the CO₂SMO [carbon dioxide] respiratory mechanics monitor (CO₂SMO Plus Model 8000, Novametrix Medical Systems Inc., Wallingford, CT). All information from the CO₂SMO monitor was simultaneously recorded in the Analysis Plus computer program.

“Static lung compliance was recorded by the static measures function device on the Bennett 7200 ventilator where a plateau pressure was obtained by including an inspiratory pause of 2 seconds into the mandatory breath. …

“PaO₂/FIO₂ ratio was calculated from arterial blood samples taken immediately before and immediately after MHI and VHI. Four milliliters of arterial blood were drawn into a syringe containing heparin and analyzed by a blood gas machine (Bayer Australian Limited 865, Pymble, NSW, CAN 000128 714). This procedure was standardized across subjects.

“HR and MAP were read directly from the monitoring system (Merlin pressure module M1006A Hewlett Packard, Palo Alto, CA) and recorded every minute before, during, and for 5 minutes after MHI and VHI.” (Savian et al., 2006, p. 336)

The use of computerized data collection by Savian et al. (2006) enabled them to collect repeated measures on several physiological variables in an accurate and precise way. The data were collected by sensors and stored in the computer to reduce error and facilitate data analysis.

Phones and Other Electronic Devices

Software applications for mobile phones evolved from those developed for personal digital assistants (PDAs), which allow the researcher to record observations as they occur and download the data directly into a computer. Healthcare providers load applications that facilitate accurate assessment, diagnosis, and pharmacological and nonpharmacological management of patients. PDAs are also used to store deidentified data from office computers in a form that is easily transportable. PDA software is currently available that may help nurse practitioners collect data for research. Multiple nurse practitioners involved in a research project could forward data electronically from PDAs to a central research site for analysis. Encrypted electronic devices are needed to protect the confidentiality of data during transmission. These electronic devices can be misplaced or stolen, threatening confidentiality. Researchers need to protect the data with a security code to ensure that no one but themselves can access data in these formats.

Mobile phones and computers are becoming more similar with the increased sophistication of applications for mobile phones. Some of these applications can be used to collect various data. Other electronic devices include pill containers that record when pills are accessed and watches with timers to remind participants to take certain health-related actions. However, the use of these devices for research may require considerable preparation. You may need to hire programmers with the needed expertise, and you may need to purchase, rent, or borrow the needed number of devices or monitors.

Factors Influencing Data Collection

When planning data collection, cost, time, the availability of assistance, and the need for consistency are critical factors to consider. The researcher balances these factors with the need to maintain the reliability and validity of the study in the development of the data collection plan.

Cost Factors

Cost is a major consideration when planning a study. Measurement tools, such as continuous electrocardiogram monitors (Holter monitor), wrist activity monitors (accelerometers), spirometers, pulse oximeters, or glucometers, used in physiological studies may need to be rented, purchased, or borrowed from the manufacturer or other company. You may need to pay a fee to use instruments or questionnaires. Some instruments and questionnaires are available only if a copy is purchased for each participant. Data collection forms may need to be formatted or developed for electronic use. In some cases, printing costs for materials such as teaching materials or questionnaires that will be used during the study must be considered. Providing the required copy of the signed consent form doubles the expense of consent forms. Small payments to participants in the form of cash or gift cards should be considered as compensation for a subject’s time and effort in providing the data. Sometimes childcare may need to be provided for parents and other caregivers who would not otherwise be able to participate in your study. In some studies, postage is an additional expense. There may be costs involved in coding the data for entry into the computer and for conducting data analyses. Consultation with a statistician early in the development of a research project and during data analysis must also be budgeted. You may need to hire someone who can remain blinded for data entry or analysis or someone who can type the final report, develop graphics or presentations, or type and edit manuscripts for publication.

In addition to the above-described direct costs of a research project, there are costs associated with the researcher’s time and travel to and from the study site. You also must estimate the expense of presenting the research findings at conferences and include those expenses in the budget. To prevent unexpected expenses from delaying the study, examine all costs in an organized manner. A budget is best developed early in the planning process and revised as plans are modified. Seeking funding for at least part of the study costs can facilitate the conduct of a study.

Time Factors

Researchers often underestimate the time required for participants to complete data collection forms and for the research team to recruit and enroll subjects for a study. The first aspect of time—the participant’s time commitment—must be determined early in the process because the time needed for participant involvement must be included in the informed consent process and document. While conducting your pilot study, make note of the time required to collect data from a subject. You may need to revise your timeline and consent form to reflect the expected time commitment accurately.

The second aspect of time—the time needed to complete data collection—is especially challenging to predict because events during the data collection period sometimes are not under the researcher’s control. For example, a sudden heavy staff workload may make data collection temporarily difficult or impossible, or the number of potential subjects might be reduced for a period. In some situations, researchers must obtain permission from each subject’s physician before they are permitted to collect data on that subject. Activities required for this stipulation, such as contacting physicians, explaining the study, and obtaining permission, require extensive time. In some cases, potential subjects are lost before the researcher can obtain the mandatory permission, extending the time required to obtain the necessary number of subjects.

How long will it take to identify potential subjects, explain the study, and obtain consent? How much time will be needed for activities such as completing questionnaires or obtaining physiological measures? Novice researchers have difficulty making reasonable estimates of time and costs related to a study. Validating the time and cost estimates with an experienced researcher can be very informative. Experienced researchers know the challenges of data collection and have learned that data collection may take two to three times longer than predicted. If the cost and time factors are prohibitive, you may need to simplify your study so that fewer variables are measured, fewer instruments are used, or fewer subjects are needed. Make the design less complex, and use fewer data collectors. A blinded intervention study involves more research staff and is generally not feasible for a novice researcher. These are serious modifications, however, with implications for the validity of the findings, so you and your team should thoroughly examine the consequences before making such revisions. If preliminary time or cost estimates go beyond expectations, you can revise the time schedules and budget with new projections for completing the study.
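A short worked estimate can make these questions concrete. The Python sketch below computes a naive enrollment time from assumed weekly numbers and then widens it by the two-to-three-times margin that experienced researchers report. All input values are hypothetical and are shown only to illustrate the arithmetic.

```python
import math


def weeks_to_enroll(target_n, eligible_per_week, consent_rate):
    """Weeks needed to reach the target sample if conditions stay constant."""
    enrolled_per_week = eligible_per_week * consent_rate
    return math.ceil(target_n / enrolled_per_week)


# Hypothetical inputs: 60 subjects needed, 10 eligible patients per week,
# and half of those approached agree to participate.
naive_weeks = weeks_to_enroll(target_n=60, eligible_per_week=10, consent_rate=0.5)

# Planning range using the two-to-three-times rule of thumb from the text.
planning_range = (naive_weeks * 2, naive_weeks * 3)
```

With these inputs the naive estimate is 12 weeks, so a more realistic timeline to discuss with an experienced researcher would be 24 to 36 weeks.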

Consistency

Consistency in data collection across subjects is critical. What time of year will data be collected? For example, if you collect data during holiday seasons, data about sleeping, eating, or exercising may vary. Pediatric patients with asthma may experience more symptoms during the winter months than during summer. Planning data collection for a study of symptom management with this population would need to take this possibility into consideration.

The specific days and hours of data collection may influence the consistency of the data collected and must be carefully considered. For example, the energy level and state of mind of subjects from whom data are gathered in the morning may differ from those of subjects from whom data are gathered in the evening. With hospitalized study participants, visitors are more likely to be present at certain times of day and may interfere with data collection or influence participant responses. Patient care routines vary with the time of day. In some studies, the care recently received or the care currently being provided may alter the data you gather. The subjects you approach on Saturday to participate in the study may differ from the subjects you approach on weekday mornings. Subjects seeking care on Saturday may have a full-time job, whereas subjects seeking care on weekday mornings may be either unemployed or too ill to work.

Who will collect the data? If you decide to use data collectors, they must be trained in responsible conduct of research and issues of informed consent, ethics, and confidentiality and anonymity (see Chapter 9). They must be informed about the research project, be familiar with the instruments to be used, and have equivalent training in the data collection process. In addition to training, data collectors need written guidelines or protocols that indicate which instruments to use, the order in which to introduce the instruments, how to administer the instruments, and a time frame for the data collection process (Harwood & Hutchinson, 2009; Kang, Davis, Habermann, Rice, & Broome, 2005).

If more than one person is collecting the data, consistency among data collectors (interrater reliability) must be ensured through testing (see Chapter 16). The training needs to continue until interrater reliability estimates are at least 85% to 90% agreement between the expert and the trainee or trainees. Waltz, Strickland, and Lenz (2010) suggest that a minimum of 10% of the data needs to be compared across raters before interrater reliability can be adequately reported. The trained data collector’s interrater reliability with the expert trainer should be assessed intermittently throughout data collection to ensure consistency from the first to the last participant in the study. Data collectors also must be encouraged to identify and record any problems or variations in the environment that affect the data collection process. The description of the training of the data collectors is usually reported in the methods section of an article so that others can assess the data collection process (Harwood & Hutchinson, 2009).
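The Python sketch below shows one simple way the agreement check described above could be calculated. It computes percent agreement between an expert and a trainee who coded the same observations, compares it with the 85% lower bound, and checks that at least 10% of the data were coded by both raters. The function names and the example codes are hypothetical, and percent agreement is only one of several possible indices (see Chapter 16).

```python
def percent_agreement(expert, trainee):
    """Percentage of observations on which the two raters gave the same code."""
    if not expert or len(expert) != len(trainee):
        raise ValueError("Both raters must code the same nonempty set of observations.")
    matches = sum(e == t for e, t in zip(expert, trainee))
    return 100.0 * matches / len(expert)


def training_complete(expert, trainee, threshold=85.0):
    """True when agreement reaches the threshold (85% is the lower bound cited)."""
    return percent_agreement(expert, trainee) >= threshold


def enough_overlap(n_double_coded, n_total, minimum_fraction=0.10):
    """True when at least the minimum share of the data was coded by both raters."""
    return n_double_coded / n_total >= minimum_fraction


# Hypothetical codes assigned to the same 10 observations.
expert_codes = [1, 2, 2, 3, 1, 1, 2, 3, 3, 1]
trainee_codes = [1, 2, 2, 3, 1, 2, 2, 3, 3, 1]
agreement = percent_agreement(expert_codes, trainee_codes)
```

Here the raters agree on 9 of 10 observations, so `agreement` is 90.0 and training could stop. Repeating the same calculation at intervals during data collection gives the intermittent checks described above.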

Availability of Assistance

Who is going to help you with the study? If you are a student, will your mentor or supervising faculty member participate? Does your mentor or supervising faculty member have research assistants who could assist in your study? Will nurses, physicians, and other health professionals assist with recruitment? Do they have time to do this? Are they willing to help?

Will the researcher collect all the data, or will data collectors be employed for this purpose? Can data collectors be nurses working in the area? Data collection may be delayed when nurses providing patient care are also expected to be data collectors. Even when a nurse agrees to help you with subject recruitment or data collection, patient care takes priority over data collection, which increases the risk of missing data or missing the opportunity to enroll eligible subjects.

If clinicians are going to recruit subjects or collect data, the clinicians need to complete training for protection of human subjects during research. An IRB requires documentation of this training for each person involved in recruitment and data collection. If you are going to be doing all the data collection yourself, will you be available every day of the week? What hours will you be available? If others will be involved in collecting data, allow time for training on data collection procedures. You need to be available by telephone or other means for questions and emergencies when others are collecting data for your study. Keeping these factors in mind, you are now ready to plan the data collection process for your study.

Data Collection and Coding Plan

The factors of cost, time, availability of assistance, and need for consistency shape the data collection plan that you develop. A data collection plan details how you will implement your study. The plan for collecting data is specific to the study being conducted and requires that you consider some common elements of research. You need to map out procedures you will use to collect data, anticipate the time and cost of data collection, develop data collection forms that ease data entry, and prepare a codebook that will help you to code the variables to be entered in a database. This extensive planning increases the accuracy of the data collected and the validity of the study findings. The validity and strength of the findings from several carefully planned studies increase the quality of the research evidence that is then available for implementation in clinical practice (Melnyk & Fineout-Overholt, 2010).

Identifying data include variables such as patient record number, home address, and date of birth (see Chapter 9). Avoid collecting these data unless they are essential to answer the research question. For example, collect a patient’s age instead of date of birth. Review the Health Insurance Portability and Accountability Act regulations on participants’ protected health information (www.hhs.gov/ocr/hipaa).
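When data are extracted from an existing source that already contains identifiers, the same principle can be applied as the data are recorded. The Python sketch below is an illustration under stated assumptions: the field names, the study identification number, and the example values are hypothetical. It replaces date of birth with age in completed years and leaves the identifying fields out of the study record.

```python
from datetime import date

# Hypothetical names for the identifying fields mentioned in the text.
IDENTIFYING_FIELDS = {"record_number", "home_address", "date_of_birth"}


def age_in_years(date_of_birth, on_date):
    """Completed years of age on the date of data collection."""
    years = on_date.year - date_of_birth.year
    # Subtract a year if the birthday has not yet occurred in that year.
    if (on_date.month, on_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def to_study_record(raw, study_id, collected_on):
    """Keep only nonidentifying fields and add a study ID and age."""
    record = {key: value for key, value in raw.items()
              if key not in IDENTIFYING_FIELDS}
    record["study_id"] = study_id
    record["age_years"] = age_in_years(raw["date_of_birth"], collected_on)
    return record


raw = {
    "record_number": "MRN-0001",       # fictitious
    "home_address": "123 Example St",  # fictitious
    "date_of_birth": date(1950, 6, 15),
    "pain_score": 4,
}
record = to_study_record(raw, study_id="P017", collected_on=date(2012, 6, 14))
```

The resulting record holds only the study identification number, the age (61 in this example, because the birthday falls one day after data collection), and the study variable.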

The methodology of a study may include contacting subjects later for additional data collection. In this case, you will need to obtain the subject’s address and telephone number and protect the information appropriately. Names and phone numbers of family members or friends may also be useful if subjects are likely to move or may be difficult to contact. This information can be obtained only with subjects’ permission as part of their informed consent. Consider the importance of each piece of data and the subject’s time required to collect it. If the data can be obtained from patient records or any other written sources, you do not need to ask the subject to provide this information. To collect data from a patient’s records, make sure to include permission to do this in the consent form, and ensure that the IRB has authorized your team to do this.