

CHAPTER 15
Outcome Evaluation: Methods


The selection of a design is the first step in devising the plan for a study aimed to evaluate the outcomes of health interventions. A design specifies the overall schema of the study, pointing to the treatment groups to be included, the way in which participants are to be assigned to these groups, the timing of outcome assessment relative to treatment delivery, and whether or not a process evaluation is to be embedded within the outcome evaluation study. More information is required to guide the conduct of the study. It is essential to delineate the type of comparison treatment, as well as the methods for sampling participants, allocating participants to the health intervention under evaluation and the comparison treatment, collecting outcome (and process, if planned) data, and analyzing the data to determine the effectiveness of the intervention. If not carefully selected and applied, these methods can introduce biases that threaten the validity of inferences regarding the intervention’s effects. The selection of methods and procedures is based on their relevance to the study question and design, on a critical appraisal of their strengths and limitations, and most importantly, on their acceptability to stakeholder groups (including the target client population) and on their feasibility.


In this chapter, the range of comparison treatments (alluded to in Chapter 14) is presented, and their advantages and disadvantages are discussed. Methods and procedures for sampling, treatment allocation, and outcome data collection and analysis are described, and their importance in maintaining validity is reviewed. Where available, evidence supporting the usefulness of particular methods or procedures is presented.


15.1 COMPARISON TREATMENT


15.1.1 Importance


As mentioned in Chapters 10 and 14, a comparison treatment is included in outcome evaluation studies to make valid inferences in attributing changes in posttest outcomes to the health intervention under evaluation. A valid attribution is supported by empirical evidence showing: (1) no significant differences in the baseline characteristics of participants assigned to the intervention group and to the comparison group; (2) significant improvement in the outcomes in the intervention group from pretest to posttest, maintained at follow‐up; (3) no significant changes in the outcomes in the comparison treatment group from pretest to posttest; and (4) significant differences between the two groups in the outcomes measured following treatment completion.


It is possible that participants in the comparison group report changes in the posttest outcomes. The changes may reflect worsening or improvement (to varying degrees) in the outcomes over time. The changes may be related to life events (history), naturally expected changes in their condition (maturation or spontaneous recovery), learning that occurs with repeated completion of the outcome measures (testing), as well as participants’ perceptions of and reactions (favorable or unfavorable) to the allocated comparison treatment (as explained in Chapter 11). Alternatively, the changes could reflect the effects of the comparison treatment. The pattern of change in the outcomes reported for participants in the comparison group, operationalized in the direction (i.e. improvement, no change, worsening) and magnitude (size or extent) of change, affects the estimates of the intervention effects. Participants exposed to no treatment at all may experience worsening (of any magnitude) of outcomes. The worsening (quantified in posttest outcome scores of a direction opposite to hypothesized) yields large between‐group differences in the posttest outcomes, which result in overestimated intervention effects. Participants receiving a comparison treatment that incorporates some components (content, activities) of the experimental intervention may experience minimal change, in the hypothesized direction, in the posttest outcomes; this change translates into small between‐group differences in the posttest outcomes. The small between‐group differences potentially lead to underestimated intervention effects. Participants exposed to an alternative active treatment may experience moderate‐to‐large improvement in the outcomes. Improvement comparable to that in the intervention group results in no significant between‐group differences in the posttest outcomes. The interpretation of nonsignificant between‐group differences differs with the purpose of the study: in efficacy and effectiveness studies, they weaken the confidence in attributing the outcomes to the intervention, whereas in comparative effectiveness studies, they indicate that the intervention is as beneficial as the comparison treatment.
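To make the influence of the comparison group’s pattern of change concrete, below is a minimal sketch with invented pre‐to‐posttest change scores; the three scenarios and all numbers are hypothetical illustrations, not data from any study cited in this chapter.

```python
# Hypothetical pre-to-posttest change scores (points on an outcome measure).
# All numbers are invented for illustration.
intervention_change = 10  # improvement observed in the intervention group

comparison_scenarios = {
    "no-treatment (worsening)": -3,         # outcomes deteriorate over time
    "overlapping comparison treatment": 7,  # minimal-to-moderate improvement
    "alternative active treatment": 9,      # near-comparable improvement
}

for label, comparison_change in comparison_scenarios.items():
    estimated_effect = intervention_change - comparison_change
    print(f"{label}: estimated effect = {estimated_effect}")
# Worsening inflates the estimate (13); overlap attenuates it (3);
# a comparable active treatment nearly erases it (1).
```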


The influence of the type of comparison treatment on the estimates of the intervention effects is illustrated in the findings of a recent systematic review. Frost et al. (2018) reviewed the results of studies that evaluated motivational interviewing. They reported significant effects when motivational interviewing was compared to no‐treatment, and no beneficial effects when it was compared to alternative active treatments, that is, treatments with active components that differ from those comprising the experimental intervention.


The selection of a comparison treatment, therefore, is informed by the overall study purpose and an understanding of the different types of comparison treatment. In general, studies aimed to examine the efficacy of health interventions include a no‐treatment condition, placebo treatment, or treatment‐as‐usual. Studies concerned with demonstrating the effectiveness of interventions consider comparison treatments that are relevant to practice, such as usual care or treatment‐as‐usual. Studies focusing on comparative effectiveness select alternative active treatments.


The different types of comparison treatments that are commonly used in the evaluation of health interventions are described next; their strengths and limitations are highlighted to assist in deciding which one to select. There are two general points to consider regarding comparison treatments:



  1. It is widely acknowledged that the comparison treatment should not incorporate the components reflecting the active ingredients of the health intervention under evaluation. With overlap in components (content, activities, and/or treatment recommendations), participants exposed to the comparison treatment may experience improvement in the outcomes similar to the improvement reported by participants receiving the health intervention. Comparable levels of improvement are quantified in posttest outcome scores that do not differ much between the experimental intervention and the comparison treatment groups. Small between‐group differences reduce the size of the intervention effects and increase the probability of type II error in efficacy and effectiveness studies. The intervention is claimed to be ineffective even when it is successful in initiating the mechanism of action and in inducing the hypothesized improvement in the ultimate outcomes.
  2. It is important to develop a manual to guide the delivery of the selected comparison treatment and to monitor the fidelity with which the comparison treatment is provided, as part of the process evaluation embedded in the outcome evaluation study (Chapter 13). Adherence to the manual and assessment of fidelity help in preventing overlap and in determining the extent of variations in providing the comparison treatment; these variations could lead, intentionally or unintentionally, to the integration of some intervention components into the comparison treatment, which, as explained previously, reduces the size of the health intervention’s effects. Therefore, fidelity data are useful in the interpretation of findings, especially when no significant between‐group differences in the posttest outcomes are observed.

15.1.2 No‐Treatment Control Condition


In a no‐treatment control condition, participants do not receive any treatment for the health problem as part of the evaluation study. Participants who are already taking treatment are requested to withhold it or to maintain its dose constant for the duration of the study.


The advantage of a no‐treatment control condition is that it creates a situation that generates the evidence for demonstrating the criterion of covariation for inferring causality (Chapter 10). The evidence shows no change in the outcomes at posttest in participants assigned to the no‐treatment control condition and improvement in the outcomes in participants assigned to the intervention.


There are disadvantages to the no‐treatment control condition:



  1. It presents an ethical dilemma because treatment is withheld for participants who may have a pressing need for treatment, potentially jeopardizing their health.
  2. It generates unfavorable reactions among participants assigned to the no‐treatment control condition. These participants may demand an explanation for withholding treatment to which they are entitled. They may lose motivation and withdraw from the study to seek treatment elsewhere (de Maat et al., 2007). Those who complete the study may experience worsening of the health problem over the study duration.
  3. Participants’ reactions introduce biases. Differential attrition is highly likely when a larger number of participants in the no‐treatment control group than in the intervention group withdraw. Differential attrition is a major threat to internal (i.e. confounding) and statistical conclusion (i.e. low power) validity (Chapter 10). Participants who seek treatment outside the trial (concurrent treatment) and complete the evaluation study may report improvement in the outcomes, which may reduce the size of the between‐group differences in the posttest outcomes and increase the chance of type II error.

15.1.3 Placebo Treatment


Placebo refers to an inert, innocuous treatment that has no inherent power or capacity to induce changes in the health problem. To be included in intervention evaluation studies, placebo treatments have to be credible (Foster, 2012) so that they appear meaningful to participants. Accordingly, placebo treatments should be comparable to the health intervention under evaluation in all respects except the active ingredients that characterize the experimental intervention. Thus, placebo treatments are designed to be structurally equivalent to the experimental intervention.


Structural equivalence means that the placebo treatment has the same nonspecific components as those planned for the delivery of the experimental intervention. Therefore, placebo treatments are delivered by trained interventionists, in the same mode (e.g. individual or group, face‐to‐face sessions), dose (e.g. number of sessions), and setting (e.g. facility and room). The sessions or modules are provided in a similar format as planned for the experimental intervention. Participants are (1) informed of the treatment rationale at the beginning of treatment; the rationale maintains the credibility of the placebo treatment and avoids disappointment associated with the receipt of a less desirable treatment; (2) provided with information about general health or topics that are not directly related to the health problem and its management, to minimize overlap with the content covered in the experimental intervention; and (3) encouraged to engage in a discussion of health‐related topics and to carry out homework such as problem solving. Interventionists providing the placebo treatment are also encouraged to develop a working alliance with participants, as is anticipated with the delivery of the experimental intervention.


The advantage of using a placebo treatment in an outcome evaluation study is the “control” for the effects of the nonspecific components of treatment delivery on the outcomes. A well‐designed placebo treatment that is structurally equivalent to the experimental intervention incorporates the same nonspecific components as those comprising the intervention. It is believed that participants exposed to the same nonspecific components respond to these components in the same way, whether assigned to the placebo treatment or the intervention. The same responses are reflected in the same pattern (direction and magnitude) of changes in the outcomes observed at posttest. When participants exposed to the placebo treatment and to the intervention exhibit the same responses to the nonspecific components, then differences between the two groups in the posttest outcomes are attributable solely and uniquely to the intervention’s active ingredients, thereby enhancing the validity of the causal inferences (van Die et al., 2009).


The disadvantages of the placebo treatment are associated with the placebo response (also called placebo effects) it induces. Although placebo treatments are theoretically expected to be inert, they have been found to produce favorable outcomes such as improvement in the experience of the health problem, or unfavorable outcomes such as the development of side effects. Several mechanisms have been proposed to explain the placebo response: (1) the natural fluctuation in the experience, or the natural resolution, of the health problem over the evaluation study period; (2) participants’ motivation to apply the homework or placebo treatment recommendations; the motivation results from perceptions of good rapport and/or working alliance with the interventionist; (3) expectancy of improvement associated with the belief that the placebo treatment is credible and useful; (4) classical conditioning, where improvement is anticipated with the mere fact of receiving a treatment; and (5) neurobiological mechanisms reflected in the release of endogenous opioids (Autret et al., 2012; Finniss et al., 2010; van Die et al., 2009).


Developing a placebo treatment that is structurally equivalent to health interventions presents challenges. Placebo treatments addressing general health topics that are not directly or obviously related to the health problem targeted by the intervention may not be perceived favorably. Participants become aware of the experimental intervention and the placebo treatment during recruitment and the consent process; research ethics guidelines may require informing participants of the risks involved in assignment to the placebo treatment (Wandile, 2018). Therefore, participants may view the placebo treatment as less credible and less desirable; they may be unwilling to be randomized and may withdraw from the study, leading to differential attrition. Participants who enroll in the study and are exposed to the placebo treatment may not experience improvement in the health problem. Some withdraw from the study and others may attempt to please the researchers by reporting socially desirable responses, resulting in response bias (Younge et al., 2015).


In general, placebo responses affect the size of differences in the posttest outcomes between the placebo treatment and the intervention groups, yielding biased (under‐ or overestimated) intervention effects, as indicated in the results of meta‐analyses. A few meta‐analytic studies reported no or weak placebo effects (Finniss et al., 2010; Kaptchuk et al., 2010), whereas others found a high prevalence of the placebo response and a high magnitude of the placebo effects. Large placebo effects were reported for subjective outcomes (e.g. symptoms), placebo treatments that were not structurally equivalent to the experimental intervention, and placebo treatments involving frequent encounters with empathetic and supportive interventionists (Autret et al., 2012; Baskin et al., 2000; Finniss et al., 2010; Kaptchuk et al., 2010; van Die et al., 2009).


15.1.4 Treatment‐as‐Usual


In treatment‐as‐usual, also called usual care or usual treatment, participants continue to receive the treatment that is prescribed by their healthcare providers for the management of the health problem addressed by the experimental intervention. In some evaluation studies, only participants allocated to the comparison group are asked to continue with treatment‐as‐usual; those in the experimental intervention group are not offered usual treatment or are requested to stop its application. In this situation, between‐group differences in the posttest outcomes are assumed to reflect the unique effects of the experimental intervention. In other evaluation studies, participants in both the comparison treatment group and the experimental intervention group continue with treatment‐as‐usual. In this situation, the experimental intervention is provided along with usual care; between‐group differences in the posttest outcomes indicate the contribution of the experimental intervention above and beyond treatment‐as‐usual.


Treatment‐as‐usual is commonly resorted to in situations where it is unethical and unacceptable to stakeholder groups (participants, health professionals, researchers, decision‐makers) to withhold usual care. However, the use of treatment‐as‐usual as a comparison treatment generates methodological challenges that should be carefully addressed during the conduct of the study in order to enhance validity. The challenges stem from the variability in the definition and delivery of treatment‐as‐usual across participants recruited from the same or different practice settings, over the study duration. Whether informed by best practice guidelines adopted in some but not all participating practice settings, or by individual health professionals’ judgment, usual care is often individualized. Accordingly, participants assigned to the experimental intervention or the comparison treatment receive different types of usual treatments. Usual treatments are given in different modes and doses that are responsive to participants’ initial experience of the health problem, characteristics, preferences, and life circumstances, and adapted to progress in their experience of the health problem and overall health condition over time. It is possible that usual treatments contain the specific components reflecting the active ingredients of the experimental intervention and/or its nonspecific components. The variability in the types and delivery of treatment‐as‐usual, and the possible overlap of its components with those of the experimental intervention, increase the within‐group variance in the responses (levels of improvement or scores in the posttest outcomes) to the allocated treatment. High within‐group variance decreases the power to detect significant experimental intervention effects (Younge et al., 2015).
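The link between within‐group variance and power can be made concrete with the standardized mean difference (Cohen’s d), the conventional effect‐size index for two‐group comparisons. The notation below (group means, group standard deviations, and the pooled standard deviation) follows statistical convention rather than any formula given in this chapter:

$$d = \frac{\bar{X}_I - \bar{X}_C}{s_p}, \qquad s_p = \sqrt{\frac{(n_I - 1)\,s_I^2 + (n_C - 1)\,s_C^2}{n_I + n_C - 2}}$$

Holding the between‐group mean difference constant, greater within‐group variability inflates the pooled standard deviation, shrinks d, and thereby reduces the power to detect the intervention’s effects.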


It is, therefore, important to address the methodological challenges associated with the use of treatment‐as‐usual in an outcome evaluation study. This can be achieved by: (1) selecting into the study practice settings that provide standardized usual care, that is, usual care adhering to clearly defined best practice guidelines; (2) obtaining agreement from health professionals and decision‐makers at the selected practice settings to maintain treatment‐as‐usual (as much as ethically and clinically appropriate) consistent across participants and constant over the study duration; and (3) monitoring the fidelity with which treatment‐as‐usual is delivered. Alternatively, provision of “devised care” has been suggested to address the challenges with treatment‐as‐usual. Devised care involves the development of treatment‐as‐usual from available best practice guidelines; it is delivered consistently by interventionists hired for the evaluation study (Barkauskas et al., 2005). An additional strategy to address the methodological challenges is to collect data on what treatment‐as‐usual is given to which participant. Analysis of these data is helpful in clarifying the distinction between the experimental intervention and the comparison treatment in an evaluation study, and in interpreting the findings.


15.1.5 Active Treatment


Alternative active treatments include theory‐informed or evidence‐based interventions designed to manage the health problem addressed by the health intervention under evaluation. Different types of active treatments are available and can be used in an outcome evaluation study.


Active treatments can comprise selected components operationalizing some active ingredients of a complex health intervention. This type of active treatment is often used in dismantling studies. These studies focus on determining the contribution of each component, or a combination of components, relative to that of the complex intervention as a whole (e.g. Epstein et al., 2012). In other words, the goals are to identify which components are most and least effective, and to refine the design of the complex intervention so that it includes only the combination of components found most beneficial. This revision of the complex intervention is believed to optimize its effectiveness and efficiency.


Despite its utility, this type of active treatment may generate challenges in the interpretation of the outcome evaluation study’s findings. With an active treatment that contains selected components of the same intervention, there is an overlap in the components comprising the experimental intervention and those comprising the comparison treatment. If the components comprising the comparison treatment are beneficial, then participants in both groups experience comparable levels of improvement in the outcomes. As explained previously, comparable levels of improvement reduce the magnitude of the between‐group differences in the posttest outcomes, and decrease the likelihood of detecting significant intervention effects (i.e. type II error of inference). It is also possible that the components included in the comparison treatment interact with each other. The components may weaken each other’s effects, resulting in worsening or no change in the outcomes; this yields large between‐group differences in the posttest outcomes. Alternatively, the components may strengthen or enhance each other’s effects, resulting in large improvement in the posttest outcomes; this leads to small between‐group differences, or findings that support the superiority of the comparison treatment relative to the experimental intervention.


Active treatments may consist of the experimental intervention given at a low dose or in a different mode or format. This type of active treatment is perceived favorably by participants as they are given a credible, acceptable, and potentially effective treatment that addresses their pressing need to manage the presenting health problem. This favorable perception has the potential to enhance participants’ enrollment, willingness to be randomized, as well as engagement, enactment, and completion of treatment. Participants’ perceptions and behaviors yield improvement in the outcomes that is comparable to that reported for the experimental intervention, thereby obscuring the intervention’s effects.


Alternative treatments may consist of interventions or therapies with active ingredients that differ from those characterizing the experimental intervention. These treatments induce different mechanisms of action responsible for the effective management of the health problem. Alternative treatments are commonly used in comparative effectiveness research. The experimental intervention’s effects are compared to those of the alternative treatment. The goal is to determine whether the experimental intervention is as effective as the comparison treatment (i.e. non‐inferiority trial) or more beneficial than it (i.e. superiority trial).


The inclusion of an active treatment in an outcome evaluation study is advantageous. It has the potential of enhancing recruitment, enrollment, and retention of participants. Comparing the effectiveness of the experimental health intervention to alternative active treatments provides evidence to guide treatment decision‐making in practice.


15.2 SAMPLING


15.2.1 Importance


Sampling involves the application of methods for accruing an adequate number of participants who are representative of the target client population, excluding client subgroups with characteristics known to confound the intervention’s effects. The methods are applied to recruit clients, screen for eligibility, determine the required sample size, and prevent attrition or retain participants in the study. The choice and application of these methods is critical for maintaining or enhancing the validity of inferences regarding the effects of the health intervention under evaluation.


Recruitment and screening contribute to the composition of the sample in an evaluation study. The sample should represent the target client population in order to enhance the generalizability of the study findings (issue of external validity). Therefore, participants should experience the health problem addressed by the experimental intervention and have personal and health or clinical characteristics that are comparable to those defining the target client population. Recruitment is expanded to reach the various subgroups (defined in terms of personal and health characteristics) comprising the target client population and to avoid the intentional or unintentional exclusion of a specific subgroup (Fayter et al., 2007). Exclusion of client subgroups may yield sampling selection bias, which limits the applicability of the evaluation study findings to the subgroups of the target population represented in the sample. Simultaneously, the sample should exclude participants with personal and health characteristics that are known, based on the intervention theory, empirical evidence, and clinical observation, to influence their capacity and ability to engage and enact treatment, to confound the intervention effects, or to increase the risk of untoward consequences. Therefore, screening for eligibility is done to generate a representative sample (and hence, maintain external validity) and to minimize the potential for confounding bias (and hence, maintain internal validity).


Determination of the required sample size indicates the number of eligible participants to enroll in the outcome evaluation study. The sample size is a determinant of the study’s statistical power to detect significant intervention effects (issue of statistical conclusion validity). The sample size should be adequate, that is, not too large and not too small. With very large samples, there is a general tendency to find “statistically” significant between‐group differences in the posttest outcomes even when the differences are trivially small, potentially leading to type I error (claiming that the intervention is effective when it is not). With small samples, there is a tendency to observe “statistically” nonsignificant differences even when meaningful differences exist, potentially leading to type II error (claiming that the intervention is ineffective when it is effective) (Lipsey, 1990). Therefore, the sample size required for an evaluation study is best determined on the basis of power analysis (Cohen, 1988).


Retention is critical to maintain the composition and the size of the study sample, and to minimize the potential for confounding. If a large number of participants assigned to the intervention and the comparison treatment groups withdraw from the study, then the sample size is decreased, which jeopardizes the power to detect significant intervention effects and increases the chance of type II error. If a larger number of participants assigned to one group drop out, then the groups’ sizes, and the within‐group variances in the outcomes, are unbalanced. This imbalance affects the estimates of the intervention’s effects when it is not accounted for in the statistical analysis (issue of statistical conclusion validity). Differential attrition can also be encountered when participants who withdraw from one group differ in their personal and health characteristics from those who drop out of another group. Differential attrition introduces confounding (issue of internal validity). Thus, the findings are based on the subgroups of the target client population that completed the study; they may not be applicable to, or replicated across, different subgroups of the target client population (Sidani, 2015).


There are general points to consider in planning for screening, recruitment, and determination of the sample size for an outcome evaluation study. These are:



  1. Screening assesses for the prespecified eligibility (inclusion and exclusion) criteria, using relevant validated measures. It is done in the early stages of an evaluation study, with clients’ oral/verbal agreement and/or written consent, based on the level of intrusiveness/invasiveness of the screening tests, and the requirements of the research ethics boards at participating settings. Early screening reduces the burden of extensive assessment for clients who do not meet the general eligibility criteria (such as language or experience of the health problem).
  2. Multiple recruitment strategies are planned to reach various subgroups of the target client population. It is useful to consult key informants or representatives of the target client population regarding the most appropriate and acceptable recruitment strategies. Recruitment is done at the start of the evaluation study and at regularly scheduled intervals over the study period to coincide with the planned waves for delivering the intervention. Monitoring the effectiveness of the recruitment strategies helps in optimizing recruitment within available resources.
  3. Determination of the sample size has to account for the accessibility and size of the sampling pool (i.e. number of potentially eligible clients at the participating practice settings, geographic or catchment area, or community) and the anticipated (based on previous research involving the same target client population) enrollment and attrition rates. The goal is to accrue a final sample (i.e. participants who complete the study) of the size required to detect significant intervention effects.
  4. Multiple retention strategies are incorporated in a study. The strategies need to be relevant and attractive to the target client population, and not perceived as coercive. The strategies can be provided at different time points throughout participants’ involvement in the evaluation study.
  5. The relevance, feasibility, and effectiveness of the methods for recruitment, screening and retention are examined in pilot studies (Chapter 12) and optimized prior to use in the large‐scale outcome evaluation study. But, be cognizant of Murphy’s law and prepare for it by having multiple methods and alternative ways or procedures for applying the methods (Streiner & Sidani, 2010).

15.2.2 Screening


Screening aims to determine if clients referred to the outcome evaluation study meet the eligibility criteria. It is informed by the prespecified inclusion and exclusion criteria and conducted with relevant measures that can be administered by health professionals or service providers involved in client referral, and by the research personnel, at the time of enrollment. The eligibility criteria are specified on the basis of (1) the intervention theory: the theory clarifies the nature of the health problem, describes its indicators, and identifies its determinants as well as aspects of the problem addressed by the intervention (forming inclusion criteria). The theory also highlights client characteristics that may influence engagement, enactment, and response to the intervention (forming exclusion criteria); (2) available empirical evidence and clinical observations: these point to subgroups of clients who do not benefit from the intervention or are at risk of developing discomfort or side effects associated with the health intervention. Evidence and observation may highlight possible interaction between the intervention and the treatment‐as‐usual prescribed to clients (forming exclusion criteria).


It is worth reiterating that the prespecification of restrictive or stringent criteria limits the number of potentially eligible clients (sampling pool). A limited sampling pool, in combination with the number of clients who do not consent (for any reason), decreases the enrollment rate and thus the accrued sample size. The end result is reduced statistical power to detect significant intervention effects and limited applicability of the findings to the range of clients seen in practice (see Chapter 14).


Well‐specified eligibility criteria are foundational for the selection of practice settings for recruitment and measures for screening. Recruitment settings (e.g. hospital, clinic, community centers, online, or social media) that have large pools of potentially eligible clients are selected. Information on the size of the sampling pool is gathered from clinical or administrative managers, relevant community leaders, available public health records, or other databases; the information points to the number, within the respective settings, of clients experiencing the health problem addressed by the intervention under evaluation, and the personal profile of clients. A review of this information indicates not only the sampling pool available at each setting, but also whether the available pool is restricted to a particular subgroup of the target population (e.g. defined by ethnicity, economic affluence). The information gives direction for determining the number and diversity of settings to be selected in order to recruit a representative sample of adequate size.


The eligibility criteria are clearly delineated to inform the selection of respective measures for screening. Each criterion is defined at the conceptual (what it is) level and the operational (what its indicators are) level. The operational definition guides the selection of a measure: the measure should be content valid and capture all indicators of the criterion. Whether containing one question (e.g. How old are you?) or multiple items (e.g. Mini‐Mental State Exam [MMSE] assessing cognitive status), the measure should have validated cutoff scores as well as excellent sensitivity and specificity to correctly identify eligible clients. Failure of screening measures could lead to the inclusion of participants who present with potentially confounding characteristics or who may not benefit from the intervention. Inclusion of these participants results in high variability in their response to the allocated treatment and biased estimates of the intervention effects.
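As an illustration of how cutoff‐based screening decisions and the accuracy of a screening measure can be operationalized, here is a minimal sketch; the cutoff value and the validation data are invented, and sensitivity and specificity are computed with their standard definitions against a reference (“gold”) standard.

```python
def eligible(score: float, cutoff: float) -> bool:
    """Classify a client as screen-eligible when the score meets the cutoff."""
    return score >= cutoff

def sensitivity_specificity(screen, reference):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP),
    where the reference standard gives the true eligibility status."""
    pairs = list(zip(screen, reference))
    tp = sum(1 for s, r in pairs if s and r)
    fn = sum(1 for s, r in pairs if not s and r)
    tn = sum(1 for s, r in pairs if not s and not r)
    fp = sum(1 for s, r in pairs if s and not r)
    return tp / (tp + fn), tn / (tn + fp)

# Invented validation data: screening decisions vs. reference-standard status.
screen = [eligible(s, cutoff=24) for s in [28, 26, 20, 25, 18, 23]]
reference = [True, True, False, False, False, True]
print(sensitivity_specificity(screen, reference))  # approximately (0.67, 0.67)
```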


It is useful to develop a protocol that delineates the screening procedures. The protocol provides an overview of the eligibility criteria, the rationale for the specified inclusion and exclusion criteria, the conceptual and operational definitions of each criterion, the measure to be used in the assessment of each criterion, and relevant cutoff scores. The protocol describes:



  1. The equipment or material (e.g. sphygmomanometer to assess blood pressure) and supplies (e.g. paper on which the command “close your eyes” is written in large font for administering an item of the MMSE) needed to administer the screening measures.
  2. The instructions and script for obtaining oral agreement to administer the measures assessing general nonintrusive eligibility criteria such as age or language proficiency.
  3. Instructions for securing written consent to administer the measures assessing specific, possibly intrusive eligibility criteria such as cognitive status or ethnicity.
  4. The logistics for administering the screening measures, including: the personnel responsible and the appropriate time for conducting the screening. For instance, the general eligibility criteria, including medical diagnosis, can be determined by health professionals at the recruitment settings who are involved in referring clients to the evaluation study. Alternatively, the general eligibility criteria can be assessed via telephone conferencing by research personnel involved in recruitment.
  5. Clients meeting the general eligibility criteria are then referred to trained research personnel responsible for obtaining written consent and administering measures of additional, potentially intrusive, or invasive eligibility criteria under conditions that ensure privacy and confidentiality.
  6. The step‐by‐step procedures for administering the measures appropriately. The sequence for administering the screening measures is specified. Details are provided for computing the total scores for multi‐item measures, and for interpreting the participants’ responses or total scores correctly, relative to the cutoff score, and hence for accurately determining whether or not a participant is eligible to enroll in the study.
  7. The script to be followed in informing clients of their eligibility.

After training the research personnel in the screening protocol, it is important to monitor their performance of the screening procedures and provide remedial strategies as needed. It is equally important to periodically review the results of screening to assist in identifying the number of eligible participants. A low number suggests the need to extend recruitment to additional settings. Review of the screening results helps in delineating the exclusion criteria most frequently reported; these criteria may be revised to increase the sampling pool and the accrual of the required sample size. McDonald et al. (2006) reported that fewer clients than expected met the eligibility criteria, a phenomenon referred to as “Lasagna’s Law.” Refer to Streiner and Sidani (2010) for real‐life examples of this phenomenon.


15.2.3 Recruitment


Recruitment involves the application of strategies to disseminate information on the outcome evaluation study to the accessible client population (i.e. available at participating settings). The goal is to invite potentially eligible clients to enroll in the study. There are two broad categories of recruitment strategies: active and passive (Cooley et al., 2003). Within each category, there are different specific recruitment strategies that are appropriate to use with, and effective in enrolling different subgroups of the client population in different contexts. Because of these differences, it is useful to select and use multiple strategies in an evaluation study.


15.2.3.1 Active Recruitment Strategies


Active, also called proactive, strategies consist of direct contacts between the recruiters and potentially eligible clients. Recruiters include research staff, health professionals, or community leaders involved in referring clients to the study. In the direct contact, the recruiters introduce the study’s purpose; briefly describe the health intervention and the comparison treatment that address the health problem of interest; provide an overview of the research activities in which clients will engage; and mention the potential benefits and risks of participation in the study, and the incentives offered. The direct contact can take the form of:



  1. face‐to‐face meeting, scheduled during regular clients’ visits, with individual clients receiving usual services at the participating practice settings, which is often used by health professionals or service providers involved in recruitment;
  2. face‐to‐face meeting with individual clients attending a clinic or hospitalized, which is often used by research personnel responsible for recruitment; in this situation, the meeting should be held privately; and
  3. presentation, given by research staff, to individuals or groups of clients attending a health (e.g. health fair) or social (e.g. community gathering) event.

The advantages of active recruitment strategies are related to the interactions that take place between recruiters and clients. The interactions offer opportunities to provide detailed information about the study; clarify any misperception of the treatments under evaluation and the planned research activities; discuss the benefits and risks of participation; and address any other concern that clients may have. Clarification of information promotes clients’ understanding of the study and development of realistic expectations of their participation in the study, which are necessary to support their enrollment decision. Through these interactions, clients become familiar with the research staff and begin to develop trust and good rapport. Trust and rapport are important in promoting enrollment in a study, in particular for clients of different ethnic or cultural backgrounds (Timraz et al., 2017). Integrating recruitment closely with client care (Mattingly et al., 2015) and using direct contact (Bower et al., 2014) are reported as effective recruitment strategies.


The active recruitment strategies have limitations. They are time consuming and resource intensive. The recruiters need to be trained. They are required to arrange for the planned contact at the settings’ and clients’ convenience, travel to the settings, and be present throughout the health or social event serving as the context for recruitment. Despite extensive efforts, a rather small percentage of clients can be reached, confined to those available at the event. Further, the available clients may represent a select subgroup of the target client population; for example, those who attend a health fair are likely to be health‐conscious persons. Thus, there is the potential for limited representativeness of the accrued sample. Active strategies can be complemented by passive strategies to reach a wider range of clients.


15.2.3.2 Passive Recruitment Strategies


Passive, also called reactive, strategies involve the use of different media to disseminate information on the study to clients. The information identifies the overall purpose of the study and the general eligibility criteria; highlights the nature of the treatments (experimental intervention and comparison treatment) under evaluation; and instructs interested clients to contact the research staff to learn more about the study. The content is presented in simple, easy‐to‐understand terms, using short sentences or phrases written in an attractive, interactive, and uncluttered format. The interactive format is illustrated with the statement of a question such as: Who can take part in the study? The question is followed by answers given in point format, such as: people who (1) are 18 years of age or older; and (2) have difficulty falling asleep, that is, take more than 20 minutes to fall asleep. The amount of information to cover depends on the medium to be used. For example, flyers can afford short key messages or points, whereas brochures can expand on the main message. The funding agency and affiliation of the research team are added to the recruitment materials, which contributes to the perceived credibility of the study.


The information is available in printed material (brochures, flyers, advertisement) or in verbal script (announcement, short video). It can be disseminated through a wide range of media, including:



  1. distributing brochures or pamphlets in areas within the participating settings that are frequently visited by potentially eligible clients (e.g. waiting area in outpatient clinics or community health centers);
  2. posting flyers in strategic locations in the participating settings (e.g. bulletin board) or other locations frequented by the target population (e.g. ethnic food stores, sites of worship);
  3. placing advertisements in newspapers, newsletters (distributed door‐to‐door in local communities) or magazines with wide distribution or in those targeting the client population of interest (such as newsletters for specific immigrant or ethnic groups printed in their respective language, or magazines focusing on topics of relevance to older persons), or in social media such as freely and commonly accessed websites for the general public (e.g. Kijiji) or those maintained by relevant associations or organizations (e.g. Sleep Society or Alzheimer’s Society);
  4. making announcements on television or radio; these are aired in stations and at time slots carefully selected to reach the target population (e.g. classical music radio station to reach older people) at the most opportune time (e.g. lunch time);
  5. sending an electronic message to clients belonging to an association (through the association’s listserv) such as an association for the caregivers of persons with Alzheimer’s disease;
  6. uploading written material or a short recruitment video on the website created for the study, or using other online recruitment strategies (e.g. Juraschek et al., 2018); and
  7. snowballing, where health professionals, service providers, community leaders, and participants “spread the word” about the study (Williams et al., 2017).

The advantages of passive recruitment strategies relate to the wide dissemination of the information about the study, which increases the likelihood of reaching a large number of diverse subgroups of the target client population. This wide reach has the potential to accrue the required sample size within the study time line and to enhance the representativeness of the sample. Compared to active strategies, the use of passive strategies reduces the research staff time incurred with recruitment. However, passive strategies may increase the need for, and cost of, other resources associated with printing materials, traveling to distribute and replenish these materials, and placing advertisements in newspapers (e.g. a business‐card‐sized advertisement costs at least $400) or announcements in other media (e.g. at least $800). Additional expenses are associated with research staff time spent in responding to a large number of inquiries and explaining the study to clients, who may end up being ineligible.


15.2.3.3 Recruitment Process


Developing a recruitment plan is useful in directing the selection and application of the recruitment strategies at different time intervals throughout the evaluation study. The selection of strategies is guided by evidence on their effectiveness in combination with “knowledge” of the target client population and/or input from key informants or representatives of the population. Effectiveness of a recruitment strategy is indicated by the number of clients informed of the study and showing interest in learning more about it (Sidani, 2015). It is assessed by the number of clients who contact the research staff. Results of systematic reviews were inconsistent in identifying the most promising strategies. For instance, Leach (2003) reported face‐to‐face, referral by health professionals, and use of media as most effective. McDonald et al. (2006) found advertisement in newspapers, mail shots sent to clients or to their healthcare providers, and having dedicated research staff spearheading recruitment as most effective. Ibrahim and Sidani (2013) concluded that active strategies are more successful than passive ones in recruiting clients of diverse ethno‐cultural background. Caldwell et al. (2010) found that strategies to increase potential participants’ awareness of the health problem are useful in enhancing recruitment, whereas Treweek et al. (2018) reported the use of open‐label trials and telephone reminders as most useful.


The inconsistency in results suggests that different recruitment strategies may be effective for different client populations, and that multiple strategies may be used in an evaluation study to reach a large number of clients. Accordingly, it is important to “know” the target population not only in terms of the general characteristics but also the location (Williams et al., 2017) of clients; the health, social, and recreational services they frequently use; the media they commonly access; and possible variation in their ability or motivation to participate in the health intervention. The importance of “knowing” the population is illustrated with these findings: a slightly larger number of smokers entered a study evaluating a web‐based smoking cessation program around the new year (resolution) period than in the summer or fall period (Graham et al., 2013). Such information, gathered in formal or informal consultation with representatives of the target client population, assists in selecting the most appropriate strategy and timing for recruitment. For example, if the target client population, such as persons with insomnia, is widely dispersed, then passive strategies would reach a large proportion of the population. Also, awareness that older persons with insomnia read the hard copy (more so than the electronic copy) of daily newspapers and avoid going out in the winter (because of fear of slipping on icy sidewalks and breaking their hips) assists in selecting the newspapers for advertisements and in planning to intensify recruitment efforts in the fall, spring, and summer. In contrast, it may be more appropriate to use social media to recruit young persons with insomnia. This example highlights the importance of selecting multiple strategies to recruit various subgroups of the target client population.


Spacing the implementation of the selected recruitment strategies, whereby they are applied at different time intervals throughout the study duration, serves two purposes. First and foremost, this scheduling provides the opportunity to evaluate the effectiveness of each recruitment strategy. This requires documentation of the specific recruitment strategy used (e.g. advertisement in a particular newspaper or presentation at an event), the date it was carried out, and the number of clients contacting the research staff to inquire about the study within a prespecified time interval (e.g. one week) following the implementation of the strategy. The number of inquiries indicates the effectiveness of the strategy. The recruitment data are discussed at regularly scheduled research team meetings. The discussion involves the identification of: possible challenges with the use of a particular recruitment strategy (e.g. recruiters share their perceptions of what may or may not have worked; Williams et al., 2017); the need to modify any aspect of the strategy’s implementation (e.g. timing of an advertisement to maximize its reach); and ways to increase the efficiency of the recruitment plan within available resources (e.g. discontinuing ineffective strategies).
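The documentation just described lends itself to a simple tally. Below is a minimal sketch, assuming inquiries are logged with the strategy the client mentions and the date of contact; the strategy names, dates, and one‐week window are hypothetical illustrations.

```python
from collections import Counter
from datetime import date

# Launch date of each recruitment strategy (names and dates are invented).
launches = {
    "newspaper ad": date(2021, 3, 1),
    "clinic flyer": date(2021, 3, 8),
}

# Log of inquiries: (strategy the client mentioned, date of the inquiry).
inquiries = [
    ("newspaper ad", date(2021, 3, 3)),
    ("newspaper ad", date(2021, 3, 5)),
    ("clinic flyer", date(2021, 3, 10)),
    ("newspaper ad", date(2021, 3, 20)),  # falls outside the one-week window
]

def inquiries_within_window(launches, inquiries, window_days=7):
    """Count inquiries attributed to each strategy within window_days of its
    launch, as a simple indicator of the strategy's effectiveness."""
    counts = Counter()
    for strategy, day in inquiries:
        launch = launches.get(strategy)
        if launch is not None and 0 <= (day - launch).days <= window_days:
            counts[strategy] += 1
    return counts

print(inquiries_within_window(launches, inquiries))
# Counter({'newspaper ad': 2, 'clinic flyer': 1})
```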


The second purpose for spacing the implementation of recruitment strategies has to do with the practicality of responding to inquiries promptly, within the constraints of available human resources. For instance, it is important to contact interested clients within 24–48 hours of their inquiry (i.e. leaving a voice mail message or sending an electronic message) as a means of showing respect and appreciation, and of developing a good rapport with them. This prompt response may demand the availability of an adequate number of research staff who are responsible for contacting clients at a convenient time; explaining the study; addressing their concerns; and inviting them to participate; all of which take time and must not be rushed. Interactions that demonstrate respect and patience are essential for developing a good rapport, which is a recommended strategy to promote enrollment.


The importance of developing a good rapport underscores the necessity to prepare a recruitment protocol that details the steps in responding to clients’ inquiries. The protocol specifies the timing of contact, ways to ascertain convenience of the time, the description of the study and of techniques to inquire about clients’ concerns. It is also necessary to train all personnel responsible for recruitment in the skills and procedures delineated in the protocol, to monitor their performance and to give feedback as needed, to promote the development of a good rapport with clients.


In the extant literature, enrollment is discussed alongside recruitment because the ultimate goal of recruitment is to enroll clients in the study. Multiple recruitment strategies have been increasingly used in outcome evaluation studies. However, the enrollment rates (i.e. the percentage of recruited clients who consent to participate in the study) have been, and still are, consistently low. Low enrollment rates result in a smaller (than required) sample size, and in a sample that is not representative of all subgroups of the target population, thereby jeopardizing statistical conclusion and external validity, respectively (Bower et al., 2014; Horwood et al., 2016; Thoma et al., 2010). Cumulative evidence shows that less than half of funded RCTs achieve the required sample size (Butler et al., 2015; Califf et al., 2012; Hughes‐Morley et al., 2015; Treweek et al., 2013). To understand what contributes to low enrollment, despite extensive and effective recruitment, assessment of reasons for declining entry into an intervention evaluation study has been integrated (and should be integrated in any study) as part of the recruitment or the consent process. The assessment involves inquiring about the clients’ willingness to enroll in the study and, if not willing, the reasons for declining. The questions eliciting reasons for nonenrollment can be generic or general (e.g. What were the reasons, or what led you to not want to take part in the study?). Additional, more specific questions are used to ask about reasons for nonenrollment. The questions inquire about factors known to affect enrollment. The factors are usually related to: personal life circumstances (e.g. Is it for personal reasons like lack of time or transportation issues?); the health intervention and the comparison treatment (e.g. Does your decision have to do with the type of treatments or with the way the treatments are given?); and the research methods (e.g. Is there any particular aspect of the study, such as invasiveness of the tests, that led you to not take part in the study?).


Participants’ responses are content analyzed to identify frequent barriers to enrollment. Awareness of the barriers informs necessary modifications in the following:



  1. the recruitment message: for example, adding points that may address some reasons for nonenrollment such as “you will receive an incentive” or “transportation costs will be covered”;
  2. the study methods: for example, the design of the study can be revised from a traditional RCT to a comprehensive cohort design, which is increasingly being reported as a means to enhance enrollment; and
  3. the mode of intervention delivery: for instance, later intervention sessions can be offered by telephone instead of face‐to‐face individual format, to address transportation barriers.
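To illustrate the tallying step of the content analysis described before the list above, here is a minimal sketch; the enrollment figures, counts, and reason categories are invented for illustration.

```python
from collections import Counter

# Invented enrollment figures: clients approached vs. clients who consented.
approached = 250
consented = 95
print(f"Enrollment rate: {consented / approached:.0%}")  # 38%

# Coded reasons for declining, produced by content analysis of clients'
# responses (category labels are invented).
coded_reasons = [
    "lack of time", "transportation", "dislike of randomization",
    "lack of time", "type of treatments", "transportation", "lack of time",
]
print(Counter(coded_reasons).most_common(3))
# [('lack of time', 3), ('transportation', 2), ('dislike of randomization', 1)]
```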

Results of several evaluation studies suggest that, in addition to personal client characteristics (such as older age and general health status), perceptions of randomization and of the treatments under evaluation (including possible side effects) are frequently mentioned reasons for nonenrollment (Costenbader et al., 2007; Moorcraft et al., 2016; Murphy et al., 2012; Thoma et al., 2010). Modifications in the study design and methods are made to address client‐reported barriers to enrollment, such as using an open‐label design (Treweek et al., 2018) where blinding of participants cannot be applied or is not well received, and allowing flexibility in the research methods. The effectiveness of these strategies in enhancing recruitment and enrollment was examined. Results of systematic reviews identified promising strategies for improving enrollment. The strategies were: applying active recruitment strategies (i.e. direct contact with clients); providing incentives to recruiters (i.e. health professionals or service providers) and to participants; using an open‐label design; planning the study methods in a way that reduces burden on participants and that maintains flexibility; and building and maintaining good rapport between research personnel and participants (Bower et al., 2014).


15.2.4 Determination of Sample Size


Determination of sample size involves calculating the number of participants to include in the evaluation study in order to detect significant intervention effects. The sample size should be adequate to reach valid conclusions regarding the intervention’s effects, while minimizing the chance of type I error (i.e. a false conclusion that the intervention is effective) and type II error (i.e. a false conclusion that the intervention is ineffective). Thus, the calculations are done to optimize the sample size so that it is not so small that existing effects go undetected, and not so large that any effect is detected, even one that is not theoretically and/or clinically meaningful (Noordzij et al., 2010).


Sample size calculations are based on power analysis. Power analysis consists of applying formulae that take into consideration three components and that vary with the design planned for the outcome evaluation study. The components are:


Alpha (α) level: The alpha level represents the rate of type I error. It is commonly set at 0.05, which implies a desire for less than 5% chance of drawing a false conclusion that the intervention is effective. Conventionally, a more liberal alpha level (e.g. 0.10) can be set for pilot studies aimed to explore the effects of a new intervention or the effect of an evidence‐based intervention in a new client population in a new context. A more conservative alpha level (e.g. 0.01) can be set for full scale studies aimed to confirm the efficacy, effectiveness and safety of the intervention.


Beta (β) level and Power: The beta (β) level reflects the rate of type II error. It is usually set at 0.20, which implies a desire for less than 20% chance of drawing a false conclusion that the intervention is ineffective. Power reflects the ability to detect intervention effects that are present in the client population, based on the sample’s estimates of these effects. Power is the complement of β and is computed as 1 − β. It is conventionally set at 0.80 or 80%, which represents the probability of avoiding type II error (Noordzij et al., 2010).


Magnitude of the intervention effect: The magnitude of the intervention effect is the anticipated size of the difference in the primary outcome. The effect size represents the magnitude of the difference between the intervention and the comparison treatment groups, in the primary outcome measured at posttest in a between‐subject design. The effect size can also quantify the difference between the pretest and the posttest outcome scores, within the intervention group, in a within‐subject design. The size of the difference is quantified in either of two ways. The first is the minimal, clinically relevant, difference that is anticipated to be detected. The difference is estimated from previous research (Noordzij et al., 2010).
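Pulling the three components together, below is a minimal sketch of a power‐analysis calculation for a two‐group comparison, using the standard normal‐approximation formula n = 2 × ((z₁₋α/₂ + z₁₋β) / d)² per group. The effect size of 0.5 and the 20% attrition rate are hypothetical, scipy is assumed to be available, and an exact t‐based calculation would give a slightly larger n.

```python
import math
from scipy.stats import norm

def per_group_n(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-sided, two-group comparison of means,
    using the normal approximation n = 2 * ((z_{1-alpha/2} + z_{1-beta}) / d)^2."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = norm.ppf(power)           # z corresponding to power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

def inflate_for_attrition(n: int, attrition_rate: float) -> int:
    """Enroll enough participants that the final sample still meets the target n."""
    return math.ceil(n / (1 - attrition_rate))

n = per_group_n(effect_size=0.5)      # medium standardized difference (assumed)
print(n)                              # 63 per group
print(inflate_for_attrition(n, 0.20)) # 79 per group enrolled if 20% drop out
```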
