11 SAMPLING METHODS
MARY D. BONDMASS

■ INTRODUCTION

One of the first and most frequent questions asked in any discussion of sampling is, “How many participants do I need in my sample?” The first and most frequent answer to this question usually is, “It depends.” Many contemporary authors agree that sampling decisions contribute critically to a study’s internal validity (truthfulness or accuracy) and external validity (generalizability or applicability; Fain, 2017; Gray et al., 2017; Marshall, 2020; Polit & Beck, 2020; Torre & Picho, 2016). Asking or answering the question of sample size as the first discussion point about sampling is analogous to the adage of “putting the cart before the horse,” that is, reversing the order of addressing an issue or problem. Although it is essential, sampling is about more than just how many participants you need; decisions and processes such as a study’s design and a study’s data source(s) are related and need to be considered sequentially in planning research (Amatya & Bhaumik, 2018; Beavers & Stamey, 2018; Show-Li & Shieh, 2018; Tam et al., 2020; Vanderlaan et al., 2019). A study’s design is its general structure, and its data source is the actual sample. More specifically, the individual elements or basic units that make up that sample are not necessarily people, although in healthcare research, this is often the case. Ethical issues in sampling also need to be considered and must be part of any discussion and/or decisions about the type or number of elements included in a sample (Hunter et al., 2018; Sasso et al., 2018). These factors are all considered in an a priori approach to answering “How many participants do I need in my study?”

This chapter is primarily intended for those advanced practice registered nurses (APRNs) and novice researchers who need basic information to critically appraise the sample section of a research article, and for those who may be contemplating conducting research or a quality improvement project, either of whom may need clarification on the earlier “it depends” answer. However, before addressing the sample size question, for either of the two intended audiences of this chapter, it is essential to understand the logic and fundamental principles and concepts related to sampling. Therefore, a description of basic terminology and definitions of terms used in the subsequent discussions of the underlying theory and logic of sampling is presented. This basic terminology will hopefully provide you with some context for answering the sample size question. A summary of the central limit theorem, as well as of the concepts of power and power analysis, will also aid in addressing sample size determination. A section on the critical appraisal of sampling in published research has been added to address the crucial need for APRNs, or anyone involved in advanced nursing practice, to thrive in an evidence-based practice (EBP) environment. Suggested activities are also included at the end of the chapter for you to self-assess your understanding of the content.

■ TERMINOLOGY

Exhibit 11.1 defines terms that are used throughout this chapter. Many of you have probably heard these terms used in research discussions but might not have fully understood them. Whether you are conducting research or critically appraising the research of others, an understanding of research terminology is foundational for EBP. Refer back to this section and the exhibit for clarification whenever a term is used but its context is not fully understood.
Population Versus Sample

Other terms critical to the understanding of sampling and sampling methods include target population, accessible population, sampling frame, and sample. The theoretical or target population (often simply referred to as the population or population of interest) is an aggregate of people, groups, objects, or things that meet a designated set of criteria. The population of a particular study is the group of interest that a researcher wishes to make generalizations about. The individual units of a population are referred to as elements. Since it is generally impossible to reach an entire population, an accessible population is delineated. An accessible population is a subset of the population that is reasonably accessible to a researcher. A sampling frame is the listing of people, groups, or objects/things, or a procedure developed for drawing a sample from the accessible population. The sampling frame becomes the methodological “how to” for the actual drawing of your sample. Lastly, the sample consists of those people, groups, or objects that a researcher selects from the accessible population using the sampling frame. Theoretically, a study’s actual or true sample usually ends up being a subsample due to nonrespondents and attrition; however, for a more practical discussion, the subsample will simply be referred to here as the study sample. In theory, if error and bias are eliminated or minimized, the study sample should be representative of the population, and thereby generalizations can be made about the population from the sample (Fain, 2017; Gray et al., 2017; Marshall, 2020; Polit & Beck, 2020).

As an example, let’s say you plan to conduct a study involving an intervention to decrease salt in the diets of hypertensive African American women; the desired result of your study is blood pressure control. In your study design, you already have in mind that your target population will be African American women with hypertension, but this could include millions of women; there is no way to intervene and collect data on this entire population. You will need to limit your participant search to those hypertensive African American women whom you can reasonably access, while ensuring they still belong to, and are representative of, your original target population. Your accessible population may depend heavily on the logistics of where you plan to carry out your study. For this example, let’s say you work at an urban medical center in Chicago where many hypertensive African American women are treated. You may, therefore, define your accessible population as hypertensive African American women attending a particular clinic(s) in Chicago. Lastly, your sample consists of the consenting participants selected (randomly or otherwise) from your accessible population to be included in your study. The study sample, in theory, then serves as a surrogate or proxy for your originally targeted population of hypertensive African American women. Figure 11.1 graphically depicts the relationship between the theoretical/target population, the accessible population, and the study sample. A sample is drawn from the accessible population, which is in turn derived from somewhere within the theoretical or target population. The sample always has a number associated with it, often represented as a lowercase n or an uppercase N.
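To make the sampling frame concrete, here is a minimal Python sketch of drawing a simple random sample from a hypothetical frame; the record IDs, frame size, and sample size are invented for illustration and are not taken from the chapter’s example study.

```python
import random

# Hypothetical sampling frame: de-identified record IDs for the accessible
# population (e.g., hypertensive patients at the Chicago clinics).
sampling_frame = [f"record-{i:04d}" for i in range(1, 1201)]  # 1,200 records

random.seed(42)                  # fixed seed so the draw is reproducible
n = 150                          # planned sample size (illustrative)
sample = random.sample(sampling_frame, n)  # simple random sample, no replacement

print(f"Frame size = {len(sampling_frame)}, sample n = {len(sample)}")
print("First five selected:", sample[:5])
```

In practice, the frame would come from clinic records rather than a generated list, but the selection step itself would look much the same.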
The lowercase n usually refers to groups within a sample and the uppercase N to the total sample; however, within a well-written article, the reader can generally surmise which value the author(s) is referring to without further explanation. The uppercase N has also been used to represent the number of cases in the sampling frame.

Sampling Theory and Logic

The definition of sampling has not changed over the decades; many recent references can be found that similarly define sampling as the process of selecting a subset of observations from an entire population, such that the characteristics of the subset (i.e., the sample) will be representative enough to draw conclusions or make inferences about the population (Fain, 2017; Gray et al., 2017; Marshall, 2020; Polit & Beck, 2020).

Central Limit Theorem

The central limit theorem (CLT), credited initially to Pierre-Simon Laplace in the early 1800s, provides the theoretical foundation for sampling and probability theory. The CLT states that, given certain conditions, the mean of a sufficiently large number of independent random variables will be approximately normally distributed (Burgess, 2019; Marshall, 2020). Put simply, the CLT tells us that if we take the means of multiple samples and plot the frequencies of those means, we will get a normal distribution (i.e., a bell curve). The logic of sampling, then, is rather simple: it is efficient and accurate. The efficiency of sampling relates to gaining information about a large group from a small group; that is, your population of interest (large group) can be studied via a sample (small group), thereby obtaining the information that is sought at an acceptable cost. Accuracy is assumed because of the CLT, but only when sampling errors are minimized (Burgess, 2019; Marshall, 2020).

Random error, also called sampling error, is incidental and has to do with expected fluctuations among samples from a given population and/or unpredictable fluctuations in the readings of a measurement apparatus or in the researcher’s interpretation of those readings. Because all measurements are prone to random error, using precise instrumentation and paying careful attention to making precise measurements with those instruments are ways to decrease random error (Fain, 2017; Marshall, 2020; Terry & ProQuest, 2018). Conversely, nonrandom error (e.g., conscious or unconscious bias) should be avoided or minimized to decrease the likelihood of erroneous conclusions. A small sampling distribution could be constructed to demonstrate sampling error; in theory, however, an infinite number of sampling distributions could be created, so we never really see a true sampling distribution. Sampling error can nevertheless be estimated even though we never actually see the sampling distribution: the standard deviation of the sampling distribution of the mean is called the standard error of the mean (SEM), it is estimated from the sample’s standard deviation divided by the square root of the sample size, and it quantifies the sampling error (Fain, 2017; Marshall, 2020; Terry & ProQuest, 2018). Rather than try to reteach the whole of sampling theory here, when reading a research article, keep the following two statistical ‘pearls’ in mind related to error (a short simulation illustrating them follows this list):

■ The greater the sample’s standard deviation, the greater the standard error (and the sampling error).
■ The standard error is also related to the sample size; the greater the sample size, the smaller the standard error (because the greater the sample size, the closer the sample is to the actual population itself).
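The following short simulation, a sketch using an invented, right-skewed “population” of systolic blood pressures, illustrates the second pearl directly (and the first through the s/√n formula): the spread of the sample means closely tracks s/√n, so it shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented right-skewed population of systolic pressures (mean ~144, SD ~18).
population = rng.gamma(shape=64.0, scale=2.25, size=50_000)

for n in (15, 150):
    # Draw many samples of size n and record each sample mean.
    means = [rng.choice(population, size=n, replace=False).mean()
             for _ in range(1_000)]
    observed_se = np.std(means, ddof=1)          # spread of the sample means
    theoretical_se = population.std(ddof=1) / np.sqrt(n)   # SEM = s / sqrt(n)
    print(f"n={n:3d}  observed SE={observed_se:5.2f}  "
          f"s/sqrt(n)={theoretical_se:5.2f}")
```

With n = 150 the standard error is roughly one third of what it is with n = 15 (since √(150/15) ≈ 3.2), which is exactly the 15-versus-150 comparison discussed next; a histogram of the stored means would also show the approximately normal shape the CLT predicts.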
Another way of looking at this last bullet point is that when you expect your values to vary widely, increase your sample size. This point should be useful regardless of whether you are conducting the research or critically appraising a study. Again using the example of the study of hypertensive African American women: if you were to include only 15 women in the study sample, the standard error would theoretically be much larger than if your sample included 150 hypertensive African American women. The likelihood of a sample being representative of the target population increases with the sample size (150 versus 15 participants), and the standard error therefore decreases.

■ METHODS: SAMPLING PROCEDURES

Before beginning a specific discussion of the types of samples, it is important to know the procedures involved; these vary depending on whether your research or project is quantitative or qualitative. Quantitative researchers are interested in statistical conclusion validity and generalizability, and they start with a sampling plan. Conversely, qualitative researchers often do not begin with a sampling plan, but rather make sampling decisions at the same time as they collect data (Fain, 2017; Marshall, 2020; Polit & Beck, 2020; Terry & ProQuest, 2018). This chapter focuses primarily on sampling procedures for a quantitative study or project, but will also reference qualitative issues at times.

Fain (2017) and Marshall (2020) both opine that initial decisions about the sampling plan are often related to how much error can be tolerated and the cost involved. An interventional drug study or a study related to biomarkers, for example, would have little room for error. The cost of data collection (personnel and/or testing procedures) also needs careful consideration during the sample planning stage, as do both the sampling process itself and the estimates or inferences to be made about the entire group from the sample data. Polit and Beck (2020) also note that a key issue for EBP is information about the population of interest, which should be determined early on as part of the sampling plan. Sampling units (elements or groups forming the basis of sample selection) and sampling lists (inventories of the units in a population) are also important as you plan your sample. Eligibility or inclusion criteria also need to be considered in your sampling plan. These criteria specify defining population characteristics and, whenever possible, should be driven by theoretical considerations with implications for the interpretation of the results and the external validity of the findings. Conversely, exclusion criteria will also need to be decided a priori and often mirror the inclusion criteria (Fain, 2017; Marshall, 2020; Polit & Beck, 2020; Terry & ProQuest, 2018); a small sketch of how such criteria might be applied follows this paragraph. The goal of sampling in the quantitative world is to select cases that will represent an entire population, thereby allowing one to make population inferences from a study’s results. It is important to remember at all times that the sample of a study is simply a subset of the population of interest (see Figure 11.1).
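As a small illustration of a priori eligibility screening, the sketch below encodes hypothetical inclusion and exclusion criteria in Python; the field names and cutoffs (age, blood pressure threshold, pregnancy) are invented assumptions, not criteria from the chapter’s example study or any actual protocol.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    record_id: str
    age: int
    has_hypertension_dx: bool
    systolic_bp: float
    pregnant: bool          # example exclusion criterion (assumed)

def eligible(c: Candidate) -> bool:
    # Inclusion criteria define the population; exclusions refine/mirror them.
    inclusion = c.has_hypertension_dx and c.age >= 18 and c.systolic_bp >= 130
    return inclusion and not c.pregnant

candidates = [
    Candidate("r-001", 54, True, 148.0, False),
    Candidate("r-002", 61, True, 118.0, False),  # fails the BP criterion
    Candidate("r-003", 29, True, 142.0, True),   # excluded: pregnancy
]
accessible = [c for c in candidates if eligible(c)]
print([c.record_id for c in accessible])         # -> ['r-001']
```

Deciding these rules a priori, before seeing who is easiest to recruit, is what keeps the eligibility criteria tied to the target population rather than to convenience.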
Other issues involved in the sampling plan that may affect the researcher, the participants, and the critical appraiser include recruitment, incentives, benefits (to participants and society), convenience, endorsements, and assurances to the participants (Fain, 2017; Polit & Beck, 2020). Sampling designs are discussed in the following sections, and the critical appraisal or evaluation of sampling methods in published research is addressed at the end of this chapter.

■ PROBABILITY AND NONPROBABILITY SAMPLING

There are two classifications of sampling designs: probability and nonprobability sampling. Each classification can include several different sample types. While probability sampling could theoretically be used in both quantitative and qualitative research, nonprobability sampling is more frequently used in qualitative research (Fain, 2017; Polit & Beck, 2020).

Probability Sampling

Probability samples are those selected utilizing some component of probability theory, typically involving randomization features such as random selection or random assignment. While addressing the general concept of randomization, I am always reminded of a quote that I heard in my very first statistics course: “If you don’t believe in random sampling, the next time you have a blood test, tell the phlebotomist to take it all” (unknown author). I have seen this quote attributed to several people, including former presidential candidate Thomas E. Dewey, a U.S. Census deputy, and Confucius. While I doubt it is the latter, or any of the former for that matter, I will leave the credit to an unknown author and ask that you just take a moment to reflect on the underlying meaning of the quote. In probability samples, all the elements of the population theoretically have an equal chance of being included in a particular sample. While some may argue the blood test example could be interpreted as convenience sampling, I contend it is a random sample and representative of the whole population of blood components in the donor’s blood vessels.

When research demands precise, statistical descriptions of large populations, probability sampling is used. Generally, all large-scale survey research and clinical trials use probability sampling methods. The fundamental premise of probability sampling is to provide useful descriptions of the total population; therefore, a sample from that population must demonstrate the same variation as the population itself, yet this concept is not as straightforward as it may appear. If you recall the previous discussion on error, you can probably imagine the multiple ways that either random or nonrandom error can affect the ability of a probability sample to perform as expected, that is, to be a representative sample of the population. The major advantage of probability sampling is fairness; the major disadvantage is the possibility of flaws in the randomness model. Some common probability samples include simple random, stratified, cluster, and systematic samples. As a self-assessment of your understanding of probability samples, design a sampling method for each of the common probability sample types listed here using the hypothetical study of hypertensive African American women. See Exhibit 11.1 for the description of each of the probability sample types to assist you in this learning activity.
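As a partial worked version of that learning activity, here is a sketch of all four probability designs applied to a hypothetical frame of 1,200 records spread across three clinics; the frame, strata, and sizes are invented for illustration.

```python
import random

random.seed(7)
frame = [{"id": i, "clinic": random.choice("ABC")} for i in range(1, 1201)]
n = 120

# Simple random: every element has an equal chance of selection.
simple = random.sample(frame, n)

# Systematic: every k-th element after a random start within the first interval.
k = len(frame) // n                      # sampling interval (here 10)
start = random.randrange(k)
systematic = frame[start::k][:n]

# Stratified: draw proportionally within each clinic (the strata).
stratified = []
for clinic in "ABC":
    stratum = [e for e in frame if e["clinic"] == clinic]
    share = round(n * len(stratum) / len(frame))   # proportional allocation
    stratified.extend(random.sample(stratum, share))

# Cluster: randomly select whole clinics, then include everyone in them.
chosen = set(random.sample("ABC", 1))
cluster = [e for e in frame if e["clinic"] in chosen]

print(len(simple), len(systematic), len(stratified), len(cluster))
```

Note that the cluster sample’s size is whatever the chosen clinic happens to contain, and rounding can make the stratified total differ from n by one or two; both are normal features of these designs, not bugs.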
Nonprobability Sampling

When nonprobability sampling methods are used, not every element in the population has an equal chance of being in a study sample; therefore, nonprobability samples are less likely than probability samples to be representative of the population. Moreover, nonprobability samples may be more predisposed to error. There are advantages, however, to nonprobability sampling, primarily the ease of implementation. Additionally, with control strategies, nonprobability methods can produce credible samples. Also, keep in mind that the purpose and design of a particular study using nonprobability methods might not be to demonstrate population representativeness. Many qualitative studies are not interested in representativeness of a population, but rather in an in-depth description of the lived experience of the individual elements of a sample. The four common types of nonprobability sampling are purposive (judgmental), snowball (network), quota (strata based), and convenience (accidental) samples; the last of the four is the most commonly used. See Exhibit 11.1, Glossary of Terms, for a description of each of the nonprobability sample types.

■ RELATED CONCEPTS

Sample Size

At the beginning of this chapter, the question “How many participants do I need in my sample?” was posed. The answer was given as “it depends,” and this last portion of the chapter discusses why. The number of participants in a study is part of the overall design process as well as the sampling plan; it should be determined before beginning any research, recognizing that there are advantages to both large and small sample sizes. Aside from methodological design issues and conclusion validity, a priori determination of sample size may have both economic and ethical considerations. Research is a costly enterprise; the more participants recruited for and ultimately retained in a sample, the more your research will generally cost in dollars and in institutional and human resources. Moreover, the researcher has an ethical responsibility to the participants of human, and even animal, research not to expose them to more procedures, which may involve pain or possible inconvenience, than needed. A larger sample is not necessarily better and may be unethical when statistical significance can be demonstrated with a predetermined, and possibly smaller, sample size rather than an arbitrarily larger number that just seems large enough (Hunter et al., 2018; Sasso et al., 2018).

Despite the importance of sample size determination, there is no universally agreed-upon method for it within the healthcare professions; however, APRNs who lead or participate in research teams, as well as nurses who critically appraise research for practice, require a basic understanding of the factors involved in sample size determination. Study and population characteristics, measurement issues, effect size, and practical issues are all factors affecting sample size determination (Amatya & Bhaumik, 2018; Beavers & Stamey, 2018; Copsey et al., 2018; Show-Li & Shieh, 2018; Tam et al., 2020; Vanderlaan et al., 2019). Generally, more complex studies, with multiple variables and relationships being explored, require larger samples. While there are others, one relatively simple method of determining sample size is presented below, along with a discussion of statistical power and power analysis.
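Before the formal treatment, one widely cited back-of-the-envelope approximation is worth knowing; it is not presented in this chapter’s sources, so treat it only as a rough check. Lehr’s rule of thumb estimates the per-group sample size for a two-tailed test at α = 0.05 and power 0.80 as n ≈ 16/d², where d is the standardized effect size:

```python
from math import ceil

def lehr_n_per_group(d: float) -> int:
    """Lehr's rule of thumb: n per group ~ 16 / d**2
    (two-tailed alpha = 0.05, power = 0.80)."""
    return ceil(16 / d**2)

for d in (0.20, 0.35, 0.50):
    print(f"effect size d = {d:.2f}: ~{lehr_n_per_group(d)} per group")
# d = 0.20 -> 400, d = 0.35 -> 131, d = 0.50 -> 64, close to the exact
# G*Power figures (394, 130, 64) worked out later in this section.
```

Its accuracy against the exact power analysis below is within a few participants per group for the effect sizes considered here.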
Statistical Power and Power Analysis

Statistical power is the probability that a statistical test will find a significant difference if such a difference indeed exists; it is the probability of rejecting the null hypothesis when it is false (Aberson & ProQuest, 2019; Nuzzo, 2016). Simply put, it is the probability of not committing a Type II error, or a false-negative result (β). As power increases, the likelihood of making a false-negative decision about your study’s results decreases. Power is statistically represented as 1 − β, and 0.80 is generally accepted as adequate statistical power for a study (Aberson & ProQuest, 2019; Nuzzo, 2016). Others have opined that statistical power is analogous to the sensitivity of a diagnostic test, and one may mentally substitute the word sensitivity for the word power to assist in understanding the concept (Miciak et al., 2016; Tavernier et al., 2016). Power is affected by four major factors (Aberson & ProQuest, 2019; Anderson et al., 2017; Nuzzo, 2016): the significance criterion (α), the magnitude of the effect (effect size), the sample size, and the study design. Two clinical ‘pearls’ related to power and power analysis:

■ As the effect size increases, you may be able to decrease your sample size, because if the effect of the intervention is large, it should be detectable even in a smaller sample.
■ Conversely, if you have a small effect size, you will need to increase your sample size to be able to detect that effect.

Power analysis can be used to calculate the minimum sample size required to reasonably detect an effect of a given size. Power analysis can also be used to calculate the minimum effect size that is likely to be detected in a study using a given sample size. More and more publishers now encourage researchers to calculate and publish the effect size for each variable in their study, so others can be more exact in their power analysis calculations when utilizing a similar intervention or treatment. However, if hand calculating your power and/or effect size is not something you choose to do (not sure why you would want to), there are open (free) power calculators that can decrease the work and stress levels of the average healthcare practitioner tasked with determining sample size for a study or quality improvement project. The best free power analysis tool I have found, available for both Mac and Windows, is called G*Power 3.1.9.6 (Faul et al., 2009). This application is available from the Department of Experimental Psychology, Heinrich Heine University in Düsseldorf, Germany (www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower.html). Much of the website is in German, but instructions are also available in English.

Once again using the hypothetical study of African American women, let’s assume you plan to have two groups of randomly assigned participants who either receive or do not receive (i.e., a control group) your intervention. For this example, your outcome or dependent variable is systolic blood pressure only; a literature search indicates that interventions similar to yours have demonstrated small (0.20) to medium (0.50) effect sizes. In designing your study, you want to determine how many participants will reasonably be needed (power of 0.80) to demonstrate a statistical difference between the intervention and control groups, given a specific effect size (Aberson & ProQuest, 2019; Nuzzo, 2016).
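Before looking at those a priori results, a brief Monte Carlo sketch can make 0.80 power concrete: it repeatedly simulates a two-group trial with a known true effect and counts how often a two-tailed independent t-test reaches significance. The group size and effect size below anticipate the first bullet that follows; the simulation itself is illustrative and is not how G*Power computes power.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulated_power(n_per_group: int, d: float, alpha: float = 0.05,
                    reps: int = 5_000) -> float:
    """Fraction of simulated trials in which a two-tailed independent
    t-test rejects the null when the true standardized effect is d."""
    hits = 0
    for _ in range(reps):
        control = rng.normal(0.0, 1.0, n_per_group)   # standardized units
        treated = rng.normal(d, 1.0, n_per_group)     # true mean shift = d
        if stats.ttest_ind(treated, control).pvalue < alpha:
            hits += 1
    return hits / reps

print(simulated_power(64, 0.50))   # ~0.80 with 64 participants per group
```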
The following are examples of the results of an a priori power analysis to make this determination. Since the effect size is not definitive, you choose to do a power analysis using multiple effect sizes to cover the range from small to medium. Pay particular attention to the sample size required (to achieve 0.80 power) as the effect size changes. What does this tell you about the desired effect size?

■ Using a two-tailed independent t-test, with an α error probability of 0.05 and a medium effect size of 0.50, 64 participants per group (N = 128) would be needed to achieve 0.80 power to detect statistical differences between the groups if such differences exist.
■ Using a two-tailed independent t-test, with an α error probability of 0.05 and an effect size of 0.35, 130 participants per group (N = 260) would be needed to achieve 0.80 power to detect statistical differences between the groups if such differences exist.
■ Using a two-tailed independent t-test, with an α error probability of 0.05 and a small effect size of 0.20, 394 participants per group (N = 788) would be needed to achieve 0.80 power to detect statistical differences between the groups if such differences exist.

See Exhibit 11.2 for the five types of statistical power analysis offered by G*Power 3.1.9.6.
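If you prefer code to a point-and-click tool, the same three scenarios can be reproduced with the statsmodels Python library (an assumption of this sketch; the chapter itself uses G*Power). Any small discrepancies from the figures above come from rounding the computed n up to whole participants.

```python
from math import ceil
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.50, 0.35, 0.20):
    # Solve for the per-group n giving 0.80 power in a two-tailed t-test.
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80,
                             alternative="two-sided")
    print(f"d = {d:.2f}: {ceil(n)} per group (N = {2 * ceil(n)})")
# Expected output: 64 (N = 128), 130 (N = 260), 394 (N = 788),
# matching the G*Power results in the bullets above.
```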