Collecting and Managing Data

Data collection is one of the most exciting parts of research. After all the planning, writing, and negotiating, you should be eager and well prepared for this active part of research, driven by the desire to answer your research question. Before you leap into data collection, however, you need to plan this adventure carefully and pilot-test each step. Planning begins with identifying all the data to be collected, which are determined by the research questions, objectives, or hypotheses of the proposed study. As you develop the data collection plan, be sure that you gather all the data needed to answer the research questions, achieve the study objectives, or test the hypotheses. Chapter 16 includes detailed information about measurement, so the focus in this chapter is on the logistical and pragmatic aspects of quantitative data collection. Data collection strategies for qualitative studies are described in Chapter 12.

To start planning the data collection process, you need to determine the best mode for collecting the data. Factors that influence the plan to collect data and enter them into a database for analysis include cost, time, the availability of assistance, and the need for consistency. Once the data collection plan is developed, the next steps are to develop data collection forms and a codebook for data entry and then to conduct a pilot test with a small group of subjects. The pilot test may result in modifications of the plan, after which the actual data collection can begin. During data collection, various problems may arise. Potential situations are described in this chapter along with problem-solving strategies. The chapter concludes with a discussion of data entry and management.

Data Collection Modes

Data can be collected by interview (face-to-face or telephone); observations; focus groups; self-administered questionnaires (online or hard copy); or extraction from existing documents such as patient medical records, motor vehicle department accident records, or state birth records (Figure 20-1). Many factors need to be considered when a researcher is deciding on the mode for collecting data. Harwood and Hutchinson (2009) describe four factors that need to be part of your decision-making process: (1) purpose and complexity of the study, (2) availability of financial and physical resources, (3) characteristics of study participants and how best to gain access to them from the population, and (4) your skills and preferences as a researcher.

Researcher-Administered or Participant-Administered Instruments

If you need a subject’s accurate blood pressure or height and weight, a self-report measure may be neither valid nor reliable for the purpose of your study. However, if the purpose of your study can be accomplished with a self-report survey method, you must decide whether the format will be researcher-administered or self-administered. It may be best for the researcher to administer self-report paper-and-pencil instruments if the potential subjects have minimal language or literacy ability, whereas it may be best to consider electronic data collection or medical record extraction if the subjects are likely to have hearing impairments, transportation problems, or physical difficulties.

If the researcher is administering the survey, will it be in person or by telephone? If self-administered, will the participant complete a pencil-and-paper copy or an online electronic copy? Internet survey centers specialize in this mode of data collection and have expert help or tutorials for assessing the best mode for your study purpose. For example, if you decide on a telephone survey, how many times will you try to reach a potential subject before you give up? What days of the week or hours of the day will you call, and how might that bias your sample or their responses? How will you accurately determine the response rate (Harwood, 2009)? If you decide on a mailed paper-and-pencil survey, what will you do with undelivered or incomplete returns? Will you search for correct mailing addresses and try again? Will you send a reminder if the survey is not received within a particular time frame, and, if so, what time frame will you give a respondent, and how many reminders will you send (Harwood, 2009)?

Electronic Data Collection

When you are using an existing instrument, you may need permission to convert the questions into an online format, a special type of form that allows the data to be scanned into a database, or into an application for a phone or other electronic device. Each of these modes of data collection may require special hardware and software. Universities, schools of nursing, and funded researchers are purchasing these sometimes expensive products because the costs of acquiring the hardware and software are considerably less than the costs of entering data manually.

Scannable Forms

Other software allows the preparation of special data collection forms that rely on optical character recognition (OCR), which requires exact placement on the page for each potential response. To maintain the precise location of each response on print copies of these instruments, careful attention must be given to printing or copying these forms. The complete form is scanned, and the answers (data) are automatically recorded in a database. Additional features include data accuracy verification, selective data extraction and analysis, auditing and tracking, and flexible export interfaces. Figure 20-2 shows the scannable version of the Parents and Newborn Screening Survey developed by Patricia Newcomb, PhD, RN, CPNP, and Barbara True, MSN, CNS. Subjects completing the survey fill in the circle that corresponds to the appropriate option for each question.

Online Data Collection

Computer software packages developed by a variety of companies (e.g., Zoomerang and SurveyMonkey) enable researchers to provide an online copy of instruments and other data collection forms. These types of software programs have unique features that allow the researcher to develop point-and-click automated forms that can be distributed electronically. The following questions need to be considered with use of these programs. For an online survey, is it a secure site for the purposes of confidentiality and anonymity? How will you ensure that only eligible participants complete the survey? Will potential subjects receive a personalized email from you with a link to a website? How will you obtain the email addresses? Can you offer help if the subjects have any questions about your study?

Online services can be easy to use for both the researcher and study participants but may be costly and require specific assurances about confidentiality of data and anonymity of subjects. The National Institutes of Health (NIH) supports a secure Internet environment for building online data surveys and data management packages (Harris et al., 2009). This service, developed by experts at Vanderbilt University, is called REDCap (Research Electronic Data Capture) and may be available at your university research site.

Im et al. (2007) conducted a survey in the United States of gender and ethnic differences in the experience of cancer pain. These researchers administered their questionnaire over the Internet and through a paper-and-pencil format based on subject preference. The following excerpt describes the data collection procedure for their study:

“To administer the Internet questionnaire, a Web site conforming to the Health Insurance Portability and Accountability Act standards, the System Administration, Networking, and Security Institute Federal Bureaus of Investigation recommendations, and the Institutional Review Board [IRB] policy of the institution where the researchers were affiliated was developed and published on an independent, dedicated Web site server. When potential participants visited the project Web site, informed consent was obtained by asking them to click a button labeled I agree to participate. After this, questions on specific diagnoses, cancer therapies, and medications were asked, and the appropriateness of answers was checked automatically through a server-side program; participants were connected automatically to the Internet survey web page if the answers were appropriate.

“Upon request, pen-and-pencil questionnaires were provided by mail to the community consultants, who distributed the questionnaires in person only to those who were identified as cancer patients. These questionnaires accompanied hard copies of the same informed consent form included in the Internet format of the questionnaire, and the pen-and-pencil questionnaire included a sentence ‘Filling out this questionnaire means that you are aged over 18 years old and giving your consent to participate in this survey.’ After the self-administered questionnaires were completed, community consultants retrieved all except five (these were mailed directly to the research team by the participants) in person at the community settings and mailed them to the research team. Supplementing pen-and-pencil questionnaires was essential to recruit the target number of ethnic minority cancer patients across the nation who did not have access to the Internet but were interested in participating in the study. Among the 276 participants who were recruited through community settings, 246 … used the pen-and-pencil questionnaires. … There were no statistically significant differences in psychometric properties between the Internet format and the pen-and-pencil format of the questionnaire. … It took an average of 30-40 minutes for the participants to complete either the Internet format or the pen-and-pencil format of the questionnaire.” (Im et al., 2007, pp. 299-300)

Im et al. (2007) maximized their sample size and obtained a more representative sample by giving participants an option to complete their questionnaire on the Internet or using paper-and-pencil format. The researchers took steps to ensure that the data collected by the two formats were comparable by testing for significant differences and finding none. The time to complete the Internet and paper-and-pencil questionnaires did not vary. Im et al. (2007) also ensured that an ethical study was conducted and subjects’ rights were protected.

An additional advantage of Internet data collection is that responses can be time/date stamped. For example, if subjects are instructed to complete the questionnaire before bedtime, the time can be verified. If subjects are instructed to complete a daily diary, the date of each entry would be documented, and subjects would be discouraged from entering all diary days on the last day just before returning the diary to the researcher (Fukuoka, Kamitani, Dracup, & Jong, 2011).
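
As a minimal sketch of how time/date stamps might be used, the following Python function flags diary entries that were submitted well after the diary day they describe. The record layout (`diary_day`, `submitted_at`), the function name, and the one-day tolerance are assumptions made for illustration; they are not features of any particular survey platform.

```python
from datetime import date, datetime

def flag_backfilled_entries(entries, max_lag_days=1):
    """Return the diary days whose entry was submitted more than max_lag_days late."""
    flagged = []
    for entry in entries:
        # Lag in whole calendar days between the diary day and the submission stamp.
        lag = (entry["submitted_at"].date() - entry["diary_day"]).days
        if lag > max_lag_days:
            flagged.append(entry["diary_day"])
    return flagged

# Hypothetical diary data: days 2 and 3 were both entered on March 7.
entries = [
    {"diary_day": date(2011, 3, 1), "submitted_at": datetime(2011, 3, 1, 21, 30)},
    {"diary_day": date(2011, 3, 2), "submitted_at": datetime(2011, 3, 7, 9, 5)},
    {"diary_day": date(2011, 3, 3), "submitted_at": datetime(2011, 3, 7, 9, 6)},
]
late_days = flag_backfilled_entries(entries)
```

In this example, `late_days` contains March 2 and March 3, the two days that appear to have been filled in retrospectively.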

Computer-Based Data Collection

With the advent of laptop and tablet computers, data collectors can code data directly into an electronic file at the data collection site. If a computer is used for data collection, a program must be written for entering, cleaning, and storing data. A computer enables users to collect large amounts of data with few errors that can be readily analyzed with a variety of statistical software packages. In addition to researchers using technology at the point of data collection to record data, technology has made it possible to interface physiological monitoring systems with computers for data collection. An advantage of using computers for the acquisition and storage of physiological data is the increased accuracy and precision that can be achieved by reducing errors associated with manually recording or transcribing physiological data from a monitor. Another advantage is that more data points can be recorded electronically than could be recorded manually. Computers linked to physiological monitoring systems can store multiple data for multiple indicators, such as blood pressures, oxygen saturation levels, and sleep stages. Because data can be electronically recorded, data collection is less labor intensive, and the data are ready to analyze more quickly. The initial cost of equipment may be high, but it is reasonable when the cost of hiring and training human data collectors is considered.

There are some concerns with the use of computerized data acquisition systems, but physiological data are usually best gathered and stored directly into a computer database to ensure accurate, complete data collection. Physiological data typically require large computer storage space. The computer-equipment interface may require more space in an already crowded clinical setting; when possible, existing equipment should be used to collect data. Purchasing the equipment, setting it up, and installing the software can be time-consuming and expensive at the start of your project. Thus, initial studies usually require substantial funding. Another concern is that the nurse researcher may focus on the machine and technology and neglect observing and interacting with the subject.

The most serious disadvantage of computerized data collection is the possibility of measurement error that can occur with equipment malfunctions and software errors. Regular maintenance and calibrations, or reliability checks of the equipment and software, reduce this problem. The benefits of collecting repeated measures over time may outweigh the risk of missing data because of poor compliance. For example, collecting continuous rectal temperature data from a subject is easier and less burdensome than asking the subject to measure an oral temperature every 1 to 2 hours.
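
One simple software-side reliability check is to flag recorded values that are missing or fall outside a plausible range, which can point to a sensor or software fault. The sketch below assumes a hypothetical record layout, and the ranges shown are illustrative placeholders rather than clinical reference values; you would set them with your clinical team.

```python
# Plausible ranges are illustrative assumptions, not clinical reference values.
PLAUSIBLE_RANGES = {
    "systolic_bp_mmHg": (50, 260),
    "oxygen_saturation_pct": (50, 100),
    "temperature_c": (30.0, 43.0),
}

def find_implausible(readings):
    """Return (index, variable, value) for each missing or out-of-range value."""
    problems = []
    for i, reading in enumerate(readings):
        for variable, (low, high) in PLAUSIBLE_RANGES.items():
            value = reading.get(variable)
            if value is None or not (low <= value <= high):
                problems.append((i, variable, value))
    return problems

# Hypothetical monitor output: one zero reading and one dropped value.
readings = [
    {"systolic_bp_mmHg": 118, "oxygen_saturation_pct": 97, "temperature_c": 37.1},
    {"systolic_bp_mmHg": 0, "oxygen_saturation_pct": 96, "temperature_c": 37.0},
    {"systolic_bp_mmHg": 121, "oxygen_saturation_pct": None, "temperature_c": 36.9},
]
problems = find_implausible(readings)
```

A check like this does not replace equipment calibration; it only helps you notice suspect values early enough to investigate them.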

Savian, Paratz, and Davies (2006) conducted a single-blind randomized, crossover study with 14 mechanically ventilated intensive care unit patients.

The computerized systems used to collect and record data in the study by Savian et al. (2006) are detailed in the following excerpt:

The use of computerized data collection by Savian et al. (2006) enabled them to collect repeated measures on several physiological variables in an accurate and precise way. The data were collected by sensors and stored in the computer to reduce error and facilitate data analysis.

Phones and Other Electronic Devices

Software applications for mobile phones have evolved from personal digital assistants (PDAs) that allow the researcher to collect and download data directly into the computer from observations as they occur. Healthcare providers load applications that facilitate accurate assessment, diagnosis, and pharmacological and nonpharmacological management of patients. PDAs are also used to store deidentified data from office computers in a form that is easily transportable. PDA software is currently available that may help nurse practitioners collect data for research. Multiple nurse practitioners involved in a research project could forward data electronically from PDAs to a central research site for analysis. Encrypted electronic devices are needed to protect the confidentiality of data during transmission. These electronic devices can be misplaced or stolen, threatening confidentiality. Researchers need to protect the data with a security code to ensure that no one but themselves can access data in these formats.

Mobile phones and computers are becoming more similar with the increased sophistication of applications for mobile phones. Some of these applications can be used to collect various data. Other electronic devices include pill containers that record when pills are accessed and watches with timers to remind participants to take certain health-related actions. However, the use of these devices for research may require considerable preparation. You may need to hire programmers with the needed expertise, and you may need to purchase, rent, or borrow the needed number of devices or monitors.

Factors Influencing Data Collection

When planning data collection, cost, time, the availability of assistance, and the need for consistency are critical factors to consider. The researcher balances these factors with the need to maintain the reliability and validity of the study in the development of the data collection plan.

Cost Factors

Cost is a major consideration when planning a study. Measurement tools used in physiological studies, such as continuous electrocardiogram monitors (Holter monitors), wrist activity monitors (accelerometers), spirometers, pulse oximeters, or glucometers, may need to be rented, purchased, or borrowed from the manufacturer or another company. You may need to pay a fee to use instruments or questionnaires. Some instruments and questionnaires are available only if a copy is purchased for each participant. Data collection forms may need to be formatted or developed for electronic use. In some cases, printing costs for materials such as teaching materials or questionnaires that will be used during the study must be considered. Providing the required copy of the signed consent form doubles the expense of consent forms. Small payments to participants in the form of cash or gift cards should be considered as compensation for a subject’s time and effort in providing the data. Sometimes childcare may need to be provided for parents and other caregivers who would not otherwise be able to participate in your study. In some studies, postage is an additional expense. There may be costs involved in coding the data for entry into the computer and for conducting data analyses. Consultation with a statistician early in the development of a research project and during data analysis must also be budgeted. You may need to hire someone who can remain blinded for data entry or analysis or someone who can type the final report, develop graphics or presentations, or type and edit manuscripts for publication.

In addition to the above-described direct costs of a research project, there are costs associated with the researcher’s time and travel to and from the study site. You also must estimate the expense of presenting the research findings at conferences and include those expenses in the budget. To prevent unexpected expenses from delaying the study, examine all costs in an organized manner. A budget is best developed early in the planning process and revised as plans are modified. Seeking funding for at least part of the study costs can facilitate the conduct of a study.

Time Factors

Researchers often underestimate the time required for participants to complete data collection forms and for the research team to recruit and enroll subjects for a study. The first aspect of time—the participant’s time commitment—must be determined early in the process because the time needed for participant involvement must be included in the informed consent process and document. While conducting your pilot study, make note of the time required to collect data from a subject. You may need to revise your timeline and consent form to reflect the expected time commitment accurately.

The second aspect of time—the time needed to complete data collection—is especially challenging to predict because events during the data collection period sometimes are not under the researcher’s control. For example, a sudden heavy staff workload may make data collection temporarily difficult or impossible, or the number of potential subjects might be reduced for a period. In some situations, researchers must obtain permission from each subject’s physician before they are permitted to collect data on that subject. Activities required for this stipulation, such as contacting physicians, explaining the study, and obtaining permission, require extensive time. In some cases, potential subjects are lost before the researcher can obtain the mandatory permission, extending the time required to obtain the necessary number of subjects.

How long will it take to identify potential subjects, explain the study, and obtain consent? How much time will be needed for activities such as completing questionnaires or obtaining physiological measures? Novice researchers have difficulty making reasonable estimates of time and costs related to a study. Validating the time and cost estimates with an experienced researcher can be very informative. Experienced researchers know the challenges of data collection and have learned that data collection may take two to three times longer than predicted. If the cost and time factors are prohibitive, you may need to simplify your study so that fewer variables are measured, fewer instruments are used, or fewer subjects are needed. Make the design less complex, and use fewer data collectors. A blinded intervention study involves more research staff and is generally not feasible for a novice researcher. These are serious modifications, however, with implications for the validity of the findings, so you and your team should thoroughly examine the consequences before making such revisions. If preliminary time or cost estimates go beyond expectations, you can revise the time schedules and budget with new projections for completing the study.
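
The rule of thumb that data collection may take two to three times longer than predicted can be turned into a quick planning calculation. The sketch below is only an arithmetic illustration; the function names, the target sample of 60, and the enrollment rate of 5 subjects per week are hypothetical values you would replace with your own estimates.

```python
import math

def weeks_to_enroll(target_sample, subjects_per_week):
    """Predicted weeks of data collection at a steady enrollment rate."""
    return math.ceil(target_sample / subjects_per_week)

def padded_range(predicted_weeks, low_factor=2, high_factor=3):
    """Apply the two-to-three-times rule of thumb to a predicted duration."""
    return predicted_weeks * low_factor, predicted_weeks * high_factor

# Hypothetical study: 60 subjects, about 5 enrolled per week.
predicted = weeks_to_enroll(target_sample=60, subjects_per_week=5)
low, high = padded_range(predicted)
```

Here the predicted 12 weeks becomes a planning range of 24 to 36 weeks, which you could compare against your funding period and timeline.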


Consistency Factors

Consistency in data collection across subjects is critical. What time of year will data be collected? For example, if you collect data during holiday seasons, data about sleeping, eating, or exercising may vary. Pediatric patients with asthma may experience more symptoms during the winter months than during summer. Planning data collection for a study of symptom management with this population would need to take this possibility into consideration.

The specific days and hours of data collection may influence the consistency of the data collected and must be carefully considered. For example, the energy level and state of mind of subjects from whom data are gathered in the morning may differ from those of subjects from whom data are gathered in the evening. With hospitalized study participants, visitors are more likely to be present at certain times of day and may interfere with data collection or influence participant responses. Patient care routines vary with the time of day. In some studies, the care recently received or the care currently being provided may alter the data you gather. The subjects you approach on Saturday to participate in the study may differ from the subjects you approach on weekday mornings. Subjects seeking care on Saturday may have a full-time job, whereas subjects seeking care on weekday mornings may be either unemployed or too ill to work.

Who will collect the data? If you decide to use data collectors, they must be trained in responsible conduct of research and issues of informed consent, ethics, and confidentiality and anonymity (see Chapter 9). They must be informed about the research project, be familiar with the instruments to be used, and have equivalent training in the data collection process. In addition to training, data collectors need written guidelines or protocols that indicate which instruments to use, the order in which to introduce the instruments, how to administer the instruments, and a time frame for the data collection process (Harwood, 2009; Kang, Davis, Habermann, Rice, & Broome, 2005).

If more than one person is collecting the data, consistency among data collectors (interrater reliability) must be ensured through testing (see Chapter 16). The training needs to continue until interrater reliability estimates are at least 85% to 90% agreement between the expert and the trainee or trainees. Waltz, Strickland, and Lenz (2010) suggest that a minimum of 10% of the data needs to be compared across raters before interrater reliability can be adequately reported. The trained data collector’s interrater reliability with the expert trainer should be assessed intermittently throughout data collection to ensure consistency from the first to the last participant in the study. Data collectors also must be encouraged to identify and record any problems or variations in the environment that affect the data collection process. The description of the training of the data collectors is usually reported in the methods section of an article so that others can assess the data collection process (Harwood & Hutchinson, 2009).
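
Percent agreement between the expert and a trainee can be computed with a few lines of code. The following is a minimal sketch, assuming both raters scored the same observations in the same order; the ratings are invented for illustration, and the 85% threshold is taken from the lower bound mentioned above. Note that simple percent agreement does not correct for agreement expected by chance.

```python
def percent_agreement(expert_ratings, trainee_ratings):
    """Percentage of observations on which the two raters gave the same rating."""
    if len(expert_ratings) != len(trainee_ratings):
        raise ValueError("Both raters must rate the same observations.")
    if not expert_ratings:
        raise ValueError("At least one observation is required.")
    matches = sum(e == t for e, t in zip(expert_ratings, trainee_ratings))
    return 100.0 * matches / len(expert_ratings)

# Hypothetical ratings of 20 observations; the raters disagree on two of them.
expert = [1, 2, 2, 3, 1, 1, 2, 3, 3, 1, 2, 2, 1, 3, 2, 1, 1, 2, 3, 2]
trainee = [1, 2, 2, 3, 1, 2, 2, 3, 3, 1, 2, 2, 1, 3, 2, 1, 1, 2, 1, 2]

agreement = percent_agreement(expert, trainee)
training_complete = agreement >= 85.0  # assumed threshold, from the 85% to 90% range
```

With 18 matching ratings out of 20, agreement is 90%, so this hypothetical trainee would meet the assumed threshold.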

Availability of Assistance

Who is going to help you with the study? If you are a student, will your mentor or supervising faculty member participate? Does your mentor or supervising faculty member have research assistants who could assist in your study? Will nurses, physicians, and other health professionals assist with recruitment? Do they have time to do this? Are they willing to help?

Will the researcher collect all the data, or will data collectors be employed for this purpose? Can data collectors be nurses working in the area? Data collection may be delayed when nurses providing patient care are also expected to be data collectors. Even when a nurse agrees to help you with subject recruitment or data collection, patient care takes priority over data collection and increases the risk for missing data or missing the opportunity to enroll eligible subjects.

If clinicians are going to recruit subjects or collect data, the clinicians need to complete training for protection of human subjects during research. An IRB requires documentation of this training for each person involved in recruitment and data collection. If you are going to be doing all the data collection yourself, will you be available every day of the week? What hours will you be available? If others will be involved in collecting data, allow time for training on data collection procedures. You need to be available by telephone or other means for questions and emergencies when others are collecting data for your study. Keeping these factors in mind, you are now ready to plan the data collection process for your study.

Data Collection and Coding Plan

The factors of cost, time, availability of assistance, and need for consistency shape the data collection plan that you develop. A data collection plan details how you will implement your study. The plan for collecting data is specific to the study being conducted and requires that you consider some common elements of research. You need to map out procedures you will use to collect data, anticipate the time and cost of data collection, develop data collection forms that ease data entry, and prepare a codebook that will help you to code the variables to be entered in a database. This extensive planning increases the accuracy of the data collected and the validity of the study findings. The validity and strength of the findings from several carefully planned studies increase the quality of the research evidence that is then available for implementing into clinical practice (Melnyk & Fineout-Overholt, 2010).

Identifying data include variables such as patient record number, home address, and date of birth (see Chapter 9). Avoid collecting these data unless they are essential to answer the research question. For example, collect a patient’s age instead of date of birth. Review the regulations of the Health Insurance Portability and Accountability Act concerning participants’ private health information.
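
The advice to record age rather than date of birth can be sketched in code for cases in which data are extracted from an existing record. In this illustration, the field names, the sample record, and the list of identifying fields are assumptions; your IRB and the applicable regulations determine what actually counts as identifying in your study.

```python
from datetime import date

# Hypothetical list of fields treated as identifying in this example.
IDENTIFYING_FIELDS = {"record_number", "home_address", "date_of_birth"}

def age_in_years(date_of_birth, on_date):
    """Whole years completed between date_of_birth and on_date."""
    had_birthday = (on_date.month, on_date.day) >= (date_of_birth.month, date_of_birth.day)
    return on_date.year - date_of_birth.year - (0 if had_birthday else 1)

def to_study_record(raw, collected_on):
    """Drop identifying fields and keep only the derived age."""
    record = {k: v for k, v in raw.items() if k not in IDENTIFYING_FIELDS}
    record["age_years"] = age_in_years(raw["date_of_birth"], collected_on)
    return record

# Hypothetical extracted record.
raw = {
    "record_number": "HYPOTHETICAL-001",
    "home_address": "(withheld)",
    "date_of_birth": date(1950, 6, 15),
    "diagnosis_code": 2,
}
study_record = to_study_record(raw, collected_on=date(2011, 6, 14))
```

The resulting study record keeps the diagnosis code and an age of 60 years but no longer contains the record number, address, or date of birth.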

The methodology of a study may include contacting subjects later for additional data collection. In this case, you will need to obtain the subject’s address and telephone number and protect the information appropriately. Names and phone numbers of family members or friends may also be useful if subjects are likely to move or may be difficult to contact. This information can be obtained only with subjects’ permission as part of their informed consent. Consider the importance of each piece of data and the subject’s time required to collect it. If the data can be obtained from patient records or any other written sources, you do not need to ask the subject to provide this information. To collect data from a patient’s records, make sure to include permission to do this in the consent form, and ensure that the IRB has authorized your team to do this.

Data Collection Forms

Before data collection begins, you may need to develop or modify forms on which to record data. These forms can be used to record demographic data, information from the patient record, observations, or values from physiological measures. The demographic variables commonly collected in nursing studies include age, gender, race, education, income, employment status, diagnosis, and marital status. You may want to collect additional demographic data if researchers have identified participant characteristics that affect the study variables. You also might need to collect other data that may be extraneous or confounding variables, such as the subject’s physician, stage of illness, length of illness or hospitalization, complications, date of data collection, time of day and day of week of data collection, and any untoward events that occur during the data collection period. If there are only women in your sample, the subject’s age and reproductive status, parity, and number of children in the home may be confounding variables. In a study of patients with ventilator-associated pneumonia, the researcher needs to record the length of time between when the patient was intubated and when ventilator-associated pneumonia was diagnosed. The researcher for this study also needs to record whether the patient had a preexisting pulmonary disease.

Data collection forms must be designed so that the data are easily recorded, coded, and entered into the computer. You need to decide whether data will be collected in raw form or coded at the time of collection. Coding in quantitative studies is the process of transforming data into numerical symbols that can be entered easily into the computer. For example, variables such as race, gender, ethnicity, and diagnoses can be categorized and given numerical labels. For gender, the male category could be identified by a “1” and the female category by a “2.” You may also want to include an “other” category (coded “3”) for participants who are transgendered or transsexual. To be able to compare your sample with samples in federally funded studies, you may need to separate the questions about ethnicity and race. In 2003, the Office of Management and Budget of the U.S. government directed researchers and others collecting data for federal purposes or at federal expense to separate the questions of race and ethnicity (Office of Minority Health, 2010). At the same time, the Office of Management and Budget specified the categories for each. The following questions are correct according to these federal guidelines. How would a subject who is biracial or multiracial complete the form? You may want to word the question to ask the participant’s primary race or allow multiple responses.
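
The gender coding described above can be sketched as a small codebook that translates recorded responses into numerical codes. The codes 1, 2, and 3 follow the example in the text; the dictionary layout and the function name are assumptions for illustration, and a real codebook would list every variable in the study along with its codes for missing data.

```python
# Codebook for one variable; codes follow the example in the text.
CODEBOOK = {
    "gender": {"male": 1, "female": 2, "other": 3},
}

def code_value(variable, response):
    """Translate a recorded response into its numeric code."""
    categories = CODEBOOK[variable]
    key = response.strip().lower()
    if key not in categories:
        # Refuse to guess: an unlisted response needs a codebook decision.
        raise ValueError(f"No code defined for {variable}={response!r}")
    return categories[key]

responses = ["Male", "female", " Other "]
coded = [code_value("gender", r) for r in responses]
```

Raising an error for an unlisted response, rather than silently assigning a code, forces a documented decision about how that response should be handled.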

Feb 17, 2017 | Posted in NURSING
