Developing and using classroom tests

Prudence Twigg, PhD(c), APRN-BC

Using teacher-made tests is one more strategy that nurse educators can use to assess outcomes of learning in didactic and clinical courses. Although developing classroom tests seems like a relatively straightforward task, it is a very involved process. The purpose of this chapter is to offer a step-by-step approach to planning, developing, administering, analyzing, and revising classroom tests.

Planning the test

Developing a test that is valid (representative) and reliable (consistent) requires much thought and planning. The following section describes how to develop a test. This section covers the purpose of the test, criterion-referenced versus norm-referenced tests, development of a table of specifications, and how to improve the reliability and validity of a test.

Purpose of the test

The first question that must be addressed is: “What purpose will the test serve?” If the test is to be given before instruction, it may be used to determine readiness (the grasp of prerequisite skills needed to be successful) or placement (the level of mastery of instructional objectives). During instruction, the test may be used as a formative evaluation of learning or as a diagnostic tool to identify learning problems. With the wide availability of test banks, such as those that accompany textbooks, and the ease of creating tests in a test-authoring component of a learning management system, faculty can also use tests as a way for students to practice and assess their own learning. As measures of learning outcomes, tests provide summative evaluation of learning on which progression and grading decisions may be based. See Table 26-1 for a summary of test measures based on timing of administration.

Table 26-1

Test Measures Based on Timing of Administration

Timing	Type of Test	Measure
Before	Readiness	Prerequisite skills
	Placement	Previous learning
During	Formative	Learning progress
	Diagnostic	Learning problems
After	Summative	Terminal performance

Tests may serve a variety of additional functions. For example, testing may provide the structure (e.g., deadlines) that some students need to direct their learning activities or faculty may use testing as one means of evaluating teaching effectiveness by measuring the outcomes of student learning. Faculty may also use posttest reviews as a learning opportunity for students to discuss rationales for answers (Morrison & Free, 2001) and gain insight into their own strengths and weaknesses (Flannelly, 2001).

Types of tests

Criterion-referenced tests

Criterion-referenced tests are those that are constructed and interpreted according to a specific set of learning outcomes (McDonald, 2008). This type of test is useful for measuring mastery of subject matter. An absolute standard of performance is set for grading purposes.

Norm-referenced tests are those that are constructed and interpreted to provide a relative ranking of students (McDonald, 2008). Norm-referenced tests are based on measurement of content according to a table of specifications (test map, test plan, test blueprint). This type of test is useful for measuring differential performance among students. A relative standard of performance is used for grading purposes.

Both criterion- and norm-referenced tests may be used in nursing education. Criterion-referenced tests are frequently used to ensure safety in areas such as drug dosage calculation, in which the absolute standard of performance may be set as high as 100%, regardless of the performance of other students.

Table of specifications

The purpose of developing a table of specifications (test map, test grid, test blueprint) is to ensure that the test serves its intended purpose by representatively sampling the intended learning outcomes and instructional content. The first step in developing a table of specifications is to define the specific learning outcomes to be measured. Specific learning outcomes, which are derived from more general instructional outcomes (e.g., course and unit objectives), specify tasks that students should be able to perform on completion of instruction (Miller, Linn, & Gronlund, 2009).

Bloom’s taxonomy (1956) has been used as a guide for developing and leveling general instructional and specific learning outcomes (see Chapter 11 for details). Although the cognitive components of the affective and psychomotor domains can be evaluated with classroom tests, tests have most often been used to determine achievement of outcomes in the six levels of Bloom’s cognitive domain (Table 26-2). Anderson and Krathwohl (2001) have revised Bloom’s taxonomy, defining knowledge dimensions and cognitive processes. The knowledge dimensions are factual, conceptual, procedural, and metacognitive. The cognitive processes are remembering, understanding, applying, analyzing, evaluating, and creating (Table 26-3). Any of the six cognitive processes may be applied to the various dimensions of knowledge.

Table 26-2

Bloom’s Taxonomy with Action Verbs for the Cognitive Levels

Cognitive Level	Action Verbs
Knowledge	Define, identify, list
Comprehension	Describe, explain, summarize
Application	Apply, demonstrate, use
Analysis	Compare, contrast, differentiate
Synthesis	Construct, develop, formulate
Evaluation	Critique, evaluate, judge

TABLE 26-3

Anderson and Krathwohl’s Taxonomy with Action Verbs for the Cognitive Processes

Cognitive Processes	Action Verbs
Remember	Retrieve, recognize, recall
Understand	Interpret, classify, summarize, infer, compare, explain, exemplify
Apply	Execute, implement
Analyze	Differentiate, organize, attribute
Evaluate	Check, critique
Create	Generate, plan, produce, reorganize

Additional attention is being given to those cognitive processing skills used by nurses, such as critical thinking, clinical judgment, and clinical decision making (Wendt & Harmes, 2009a). Test items should address these processes as well, and a mixture of cognitive processes should be evaluated at each stage of instruction, placing an increasing emphasis (or weight) on higher-level skills. This is vital because higher-level skills are more likely to result in retention and transfer of knowledge. In addition, this will assist in preparing students for the licensing and certification examinations that test primarily at the levels of application and analysis (National Council of State Boards of Nursing, 2010).

The second step in developing a table of specifications involves determining the instructional content to be evaluated and the weight to be assigned to each area. This can be accomplished by developing a content outline and using the amount of time spent teaching the material as an indicator for weighting (Table 26-4).

TABLE 26-4

Content Outline and Relative Teaching Time

	Content	Teaching Time (%)	No. of Items/Section*
I.	Antipsychotic agents	25	10
II.	Antianxiety agents	25	10
III.	Antidepressant agents	25	10
IV.	Antimanic agents	12.5	5
V.	Antiparkinson agents	12.5	5
Totals		100	40

^*Percentage of teaching time × Total no. of items = No. of items/section.

Finally, a two-way grid is developed, with content areas being listed down the left side and learning outcomes being listed across the top of the grid (Table 26-5). Each cell is assigned a number of questions according to the weighting of content and cognitive processes of learning outcomes.

TABLE 26-5

Two-Way Table of Specifications

Outcomes^* (Content^†)	Remember (20%)	Understand (40%)	Apply (40%)	Totals
Antipsychotic agents (25%)	2	4	4	10
Antianxiety agents (25%)	2	4	4	10
Antidepressant agents (25%)	2	4	4	10
Antimanic agents (12.5%)	1	2	2	5
Antiparkinson agents (12.5%)	1	2	2	5
Totals	8	16	16	40

^*Arbitrarily determined by level of instruction.

^†Percent determined by teaching time.

Some faculty prefer to use a three-way table of specifications. With a three-way grid, the five steps of the nursing process are listed on the left side, outcomes are listed across the top, and the number of items or specific content areas is listed within each cell. Weighting of the steps of the nursing process again depends on the level of instruction. For example, early in the instructional process, assessment and diagnosis might carry the most weight, whereas all stages may be tested equally by the end of instruction. Tables 26-6 and 26-7 are examples of three-way tables of specifications.

TABLE 26-6

Three-Way Table of Specifications: Number of Items per Cell

Outcomes^*(Content^†)	Remember (20%)	Understand (40%)	Apply (40%)	Totals
Assessment (40%)	6	6	4	16
Diagnosis (10%)	1	3	—	4
Planning (10%)	1	2	1	4
Intervention (20%)	—	2	6	8
Evaluation (20%)	—	3	5	8
Totals	8	16	16	40

^*Arbitrarily determined by level of instruction.

^†Percent determined by teaching time.

TABLE 26-7

Three-Way Table of Specifications: Content to Be Tested per Cell *

Outcomes^*(Content^†)	Remember (20%)	Understand (40%)	Apply (40%)	Totals
Assessment (40%)	P, A, A, D, D, M	P, P, A, D, M, PA	P, A, D, PA	16
Diagnosis (10%)	P	A, D, M	—	4
Planning (10%)	PA	A, D	P	4
Intervention (20%)	—	P, A	P, A, A, D, M, PA	8
Evaluation (20%)	—	P, PA, D	P, A, D, D, M	8
Totals	8	16	16	40

A, Antianxiety agents; D, antidepressant agents; M, antimanic agents; P, antipsychotic agents; PA, antiparkinson agents.

^*No. of each type of content item determined by teaching time.

^†Percent determined by teaching time.

Alternatively, or additionally, a table of specifications can be created by using the test plan of the current licensure examination (NCLEX-RN). The NCLEX-RN tests content in four categories of client needs as follows (National Council of State Boards of Nursing, 2010):

1. Safe, effective care environment

2. Health promotion and maintenance

3. Psychosocial integrity

4. Physiological integrity

In addition to the category of client needs, the NCLEX-RN integrates concepts and processes of nursing practice (nursing process, caring, communication and documentation, self-care, teaching and learning) throughout the questions. One possible disadvantage of using the NCLEX-RN test plan in a table of specifications for development of a classroom test is that the content of most nursing courses is not organized according to these categories.

Other considerations in the planning stage

Selecting item types

Items may be selection-type, providing a set of responses from which to choose, or supply-type, or a constructed response type requiring the student to provide an answer. Common selection-type items include true–false, matching, ordered-response, and multiple-choice questions. Supply-type items include fill-in-the blank (usually requiring an absolute answer derived from a mathematical calculation), short-answer, multiple-response, hotspot, and essay questions (Wendt & Kenny, 2009).

The primary reason for choosing one type of item over another can be determined by answering the question: “Which type of item most directly measures the intended learning outcome?” Both selection-type and supply-type questions can be developed for all levels of the cognitive domain (Su, Osisek, Montgomery, & Pellar, 2009) and to test critical thinking, problem solving, and clinical decision-making skills. Other factors may also influence the item-type selection. For example, a large class size may prohibit the use of supply-type items because of the time required for grading.

In addition to multiple-choice items with one correct answer, the NCLEX-RN uses alternative format questions, which include fill-in-the-blank questions, multiple-response questions, drag-and-drop or ordered-response questions, picture or graphic questions, as well as questions that use audio files; using video clips in test questions is under consideration (Wendt & Harmes, 2009a). Current information on the examination format can be obtained at the website for the National Council of State Boards of Nursing (www.ncsbn.org).

Matching items

Definition

A matching item consists of two parallel lists of words that require the student to match according to certain associations between the two lists. One column consists of problems (premises); the other contains answers (responses) (Oermann & Gaberson, 2009).

Advantages

1. A matching item is compact so a great deal of information can be tested on a single page.

2. The items can be scored quickly and objectively.

3. Students can respond to a large number of these items because reading time is short.

4. Students are required to integrate knowledge to discover the relationship.

Disadvantages

1. These items can be difficult to write without including unintended clues.

2. Questions of this nature tend to be restricted to lower-level cognitive processes.

3. It is difficult to generate enough plausible responses.

Guidelines for writing matching items

1. The items being matched should be homogeneous. Otherwise the student can quickly find the correct responses.

2. The entire set of matching items should be on a single page to eliminate page turning.

3. Use a larger or smaller set of responses and permit them to be used more than once or not at all.

4. Arrange items in a systematic order to make the selection process quicker.

5. State in the directions the basis of relationship for matching.

6. Place the stimulus column on the left with each item numbered and the response column on the right with each item lettered.

Example

Column A contains statements by the nurse. Column B is the type of communication. On the line to the left of the items in column A, write the letter of the item in column B that best matches. Responses in column B may be used once, more than once, or not at all.

Column A: Statements	Column B: Communication Type
_____ 1. “I’m here if you need me.”	A Checking perceptions
_____ 2. “What do you think about it?”	B Defending
_____ 3. “I’m sure he knows that.”	C Exploring
_____ 4. “Tell me more about that.”	D Offering self
	E Probing

Interpretive items

Definition

An interpretive item requires a response based on introductory material such as a paragraph, table, chart, map, or picture (Miller, Linn, & Gronlund, 2009). The student must make a judgment about the material presented. These items are often used on nursing examinations to evaluate a student’s ability to interpret laboratory data, electrocardiogram or fetal monitor strips, or other pictorial items. The questions of the NCLEX-RN examination use charts and graphic or pictorial items that may require interpretation.