Reliability and validity
Geri LoBiondo-Wood and Judith Haber
After reading this chapter, you should be able to do the following:
• Discuss how measurement error can affect the outcomes of a research study.
• Discuss the purposes of reliability and validity.
• Discuss the concepts of stability, equivalence, and homogeneity as they relate to reliability.
• Compare and contrast the estimates of reliability.
• Compare and contrast content, criterion-related, and construct validity.
• Identify the criteria for critiquing the reliability and validity of measurement tools.
• Use the critiquing criteria to evaluate the reliability and validity of measurement tools.
• Discuss how evidence related to reliability and validity contributes to the strength and quality of evidence provided by the findings of a research study and applicability to practice.
Go to Evolve at http://evolve.elsevier.com/LoBiondo/ for review questions, critiquing exercises, and additional research articles for practice in reviewing and critiquing.
Nurse investigators use instruments that have been developed by researchers in nursing and other disciplines. When reading studies, you must assess the reliability and validity of the instruments to determine the soundness of these selections in relation to the concepts (concepts are often called constructs in instrument development studies) or variables under study. The appropriateness of instruments and the extent to which reliability and validity are demonstrated have a profound influence on the strength of the findings and the extent to which bias is present. Invalid measures produce invalid estimates of the relationships between variables, thus introducing bias, which affects the study’s internal and external validity. As such, the assessment of reliability and validity is an extremely important critical appraisal skill for assessing the strength and quality of evidence provided by the design and findings of a study and its applicability to practice.
Regardless of whether a new or already developed instrument is used in a study, evidence of reliability and validity is of crucial importance. This chapter examines the major types of reliability and validity and demonstrates the applicability of these concepts to the evaluation of instruments in nursing research and evidence-based practice.
Reliability, validity, and measurement error
Reliability is the ability of an instrument to measure the attributes of a variable or construct consistently. Validity is the extent to which an instrument measures the attributes of a concept accurately. Each of these properties will be discussed later in the chapter. To understand reliability and validity, you need to understand potential errors related to instruments. Researchers may be concerned about whether the scores that were obtained for a sample of subjects were consistent, true measures of the behaviors and thus an accurate reflection of the differences between individuals. The extent of variability in test scores that is attributable to error rather than a true measure of the behaviors is the error variance. Error in measurement can occur in multiple ways.
An observed test score that is derived from a set of items actually consists of the true score plus error (Figure 15-1). The error may be chance (random) error, or it may be systematic (constant) error. Validity is concerned with systematic error, whereas reliability is concerned with random error. Chance or random errors are errors that are difficult to control (e.g., a respondent’s anxiety level at the time of testing). Random errors are unsystematic in nature; they result from a transient state in the subject, the context of the study, or the administration of an instrument. For example, perceptions or behaviors that occur at a specific point in time (e.g., anxiety) are known as a state or transient characteristic and are often beyond the awareness and control of the examiner. Another example of random error occurs in a study that measures blood pressure. Different blood pressure readings could result from misplacement of the cuff, failure to wait a specified period before taking the blood pressure, or inconsistent positioning of the arm relative to the heart during measurement.
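In classical test theory, this relationship is commonly written as X = T + E, where X is the observed score, T is the true score, and E is the error component. Reliability is then defined as the proportion of observed-score variance that is attributable to true-score variance, Var(T)/Var(X); the smaller the error variance, the closer this ratio comes to 1.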
Systematic or constant error is measurement error that is attributable to relatively stable characteristics of the study sample that may bias their behavior and/or cause incorrect instrument calibration. Such error has a systematic biasing influence on the subjects’ responses and thereby influences the validity of the instruments. For instance, level of education, socioeconomic status, social desirability, response set, or other characteristics may influence the validity of the instrument by altering measurement of the “true” responses in a systematic way. For example, suppose a subject is completing a survey examining attitudes about caring for elderly patients. If the subject wants to please the investigator, items may consistently be answered in a socially desirable way rather than according to how the individual actually feels, thus making the estimate of validity inaccurate. Systematic error also occurs when an instrument is improperly calibrated. Consider a scale that consistently gives a person’s weight as 2 pounds less than the actual body weight. The scale could be quite reliable (i.e., capable of reproducing the precise measurement), but the result is consistently invalid.
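To make the miscalibrated-scale example concrete, the following minimal Python sketch (with hypothetical weights and a hypothetical 2-pound calibration error, not data from any actual study) shows that repeated readings can agree almost perfectly with one another, indicating high reliability, while every reading is systematically wrong, indicating poor validity:

import random
from statistics import correlation, mean  # statistics.correlation requires Python 3.10+

random.seed(1)

# Hypothetical "true" body weights in pounds for 50 people.
true_weights = [random.uniform(120.0, 200.0) for _ in range(50)]

def miscalibrated_scale(weight):
    # Reads 2 pounds low every time (systematic error), plus a tiny random error.
    return weight - 2.0 + random.gauss(0.0, 0.1)

first_readings = [miscalibrated_scale(w) for w in true_weights]
second_readings = [miscalibrated_scale(w) for w in true_weights]

# Reliability: the two sets of readings correlate almost perfectly (~1.0).
print(round(correlation(first_readings, second_readings), 3))

# Validity: on average, every reading is about 2 pounds below the true weight.
print(round(mean(f - t for f, t in zip(first_readings, true_weights)), 1))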
The concept of error is important when appraising instruments in a study. The information regarding the instruments’ reliability and validity is found in the instrument or measures section of a study, which can be separately titled or appear as a subsection of the methods section of a research report, unless the study is a psychometric or instrument development study (see Chapter 10).
Validity
Validity is the extent to which an instrument measures the attributes of a concept accurately. When an instrument is valid, it truly reflects the concept it is supposed to measure. A valid instrument that is supposed to measure anxiety does so; it does not measure some other concept, such as stress. A measure can be reliable but not valid. Let us say that a researcher wanted to measure anxiety in patients by measuring their body temperatures. The researcher could obtain highly accurate, consistent, and precise temperature recordings, but such a measure may not be a valid indicator of anxiety. Thus the high reliability of an instrument is not necessarily congruent with evidence of validity. A valid instrument, however, is reliable. An instrument cannot validly measure the attribute of interest if it is erratic, inconsistent, or inaccurate. There are three types of validity that vary according to the kind of information provided and the purpose of the instrument (i.e., content, criterion-related, and construct validity). As you appraise research articles, you will want to evaluate whether sufficient evidence of validity is present and whether the type of validity is appropriate to the study’s design and the instruments used.
As you read the instruments or measures sections of studies, you will notice that validity data are reported much less frequently than reliability data. DeVon and colleagues (2007) note that adequate validity is frequently claimed, but the method is rarely specified. This lack of reporting, largely a consequence of publication space constraints, underscores the importance of critiquing the quality of the instruments and the conclusions drawn from them (see Chapters 14 and 17).
Content validity
Content validity concerns the degree to which an instrument represents the universe of content, or the domain, of a given variable/construct. The universe of content provides the basis for developing the items that will adequately represent that content. When an investigator is developing an instrument and issues of content validity arise, the concern is whether the measurement instrument and the items it contains are representative of the content domain that the researcher intends to measure. The researcher begins by defining the concept and identifying the attributes or dimensions of the concept, and then develops the items that reflect the concept and its domain.
When the researcher has completed this task, the items are submitted to a panel of judges considered to be experts about the concept. For example, researchers typically request that the judges indicate their agreement with the scope of the items and the extent to which the items reflect the concept under consideration. Box 15-1 provides an example of content validity.
Another method used to establish content validity is the content validity index (CVI). The CVI moves beyond the level of agreement of a panel of expert judges and calculates an index of interrater agreement or relevance. This calculation gives a researcher more confidence, or evidence, that the instrument truly reflects the concept or construct. When reading the instrument section of a research article, note that the authors will state whether a CVI was used to assess the content validity of an instrument. When reading a psychometric study that reports the development of an instrument, you will find a much more detailed account of exactly how the researchers calculated the CVI and the item cut-offs they considered acceptable. In the scientific literature there has been discussion of accepting a CVI of .78 to 1.0, depending on the number of experts (DeVon et al., 2007; Lynn, 1986). An example from a study that used the CVI is presented in Box 15-1. A subtype of content validity is face validity, a rudimentary type of validity that basically verifies that the instrument gives the appearance of measuring the concept. It is an intuitive type of validity in which colleagues or subjects are asked to read the instrument and evaluate the content in terms of whether it appears to reflect the concept the researcher intends to measure.
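As an illustration of the usual calculation, here is a minimal Python sketch using hypothetical expert ratings on the common 4-point relevance scale (1 = not relevant to 4 = highly relevant); the item-level CVI (I-CVI) is the proportion of experts rating an item 3 or 4, and the scale-level CVI is often reported as the average of the item CVIs:

# Hypothetical ratings: one row per item, one column per expert,
# on a 4-point relevance scale (1 = not relevant ... 4 = highly relevant).
ratings = [
    [4, 4, 3, 4, 4],  # item 1
    [3, 4, 4, 4, 3],  # item 2
    [2, 3, 4, 3, 4],  # item 3
]

def item_cvi(item_ratings):
    # I-CVI: proportion of experts who rated the item 3 or 4 (relevant).
    relevant = sum(1 for r in item_ratings if r >= 3)
    return relevant / len(item_ratings)

i_cvis = [item_cvi(row) for row in ratings]
scale_cvi = sum(i_cvis) / len(i_cvis)  # S-CVI/Ave: mean of the item-level CVIs

print([round(v, 2) for v in i_cvis])  # [1.0, 1.0, 0.8]
print(round(scale_cvi, 2))            # 0.93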
Criterion-related validity
Criterion-related validity indicates the degree to which the subject’s performance on the instrument and the subject’s actual behavior are related. The criterion is usually a second measure that assesses the same concept under study. For example, in a study by Sherman and colleagues (2012) investigating the effects of psychoeducation and telephone counseling on the adjustment of women with early-stage breast cancer, criterion-related validity was supported by correlating amount of distress experienced (ADE) scores measured by the Breast Cancer Treatment Response Inventory (BCTRI) with total scores from the Symptom Distress Scale (r = .86; p < .001). Two forms of criterion-related validity are concurrent and predictive.
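In practice, this kind of evidence is usually a Pearson correlation between paired scores. Here is a minimal Python sketch of the calculation, using hypothetical scores (not the Sherman data):

from statistics import correlation  # requires Python 3.10+

# Hypothetical paired scores from the same subjects: a new instrument
# and an established criterion measure of the same concept.
instrument_scores = [12, 18, 25, 31, 40, 44, 52, 60]
criterion_scores = [10, 20, 27, 29, 38, 47, 50, 63]

r = correlation(instrument_scores, criterion_scores)  # Pearson r
print(round(r, 2))  # a high positive r supports criterion-related validity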
Concurrent validity refers to the degree of correlation between two measures of the same concept administered at the same time. Predictive validity refers to the degree of correlation between a measure of the concept and some future measure of the same concept. Because of the passage of time, correlation coefficients are likely to be lower for predictive validity studies. Examples of concurrent and predictive validity as they appear in research articles are illustrated in Box 15-2.
Construct validity
Construct validity is based on the extent to which a test measures a theoretical construct, attribute, or trait. It attempts to validate the theory underlying the measurement by testing the hypothesized relationships. Such testing confirms or fails to confirm the relationships predicted between and/or among concepts and, as such, provides more or less support for the construct validity of the instruments measuring those concepts. The establishment of construct validity is complex, often involving several studies and approaches. The hypothesis-testing, factor analytical, convergent and divergent, and contrasted-groups approaches are discussed below. Box 15-3 provides examples of the different types of construct validity as they are reported in published research articles.
Hypothesis-testing approach
When the hypothesis-testing approach is used, the investigator uses the theory or concept underlying the measurement instrument to validate the instrument. The investigator does this by developing hypotheses about how individuals with varying scores on the instrument should behave, collecting data to test those hypotheses, and then inferring from the findings whether the rationale underlying the instrument’s construction adequately explains them, thereby providing evidence of construct validity.
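For example, suppose the theory underlying a new anxiety instrument predicts that preoperative patients should score higher than community-dwelling adults. A minimal sketch of how such a hypothesis could be tested, using hypothetical scores and SciPy’s independent-samples t-test (SciPy assumed to be available):

from scipy import stats  # assumes SciPy is installed

# Hypothetical total scores on the new instrument for two groups that
# theory predicts should differ in anxiety.
preop_scores = [62, 58, 71, 65, 69, 60, 74, 67]
community_scores = [41, 49, 38, 52, 45, 40, 47, 43]

t_stat, p_value = stats.ttest_ind(preop_scores, community_scores)

# A significant difference in the predicted direction supports the
# construct validity of the instrument; failure to find it does not.
print(round(t_stat, 2), round(p_value, 4))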