Effective Online Testing With Multiple-Choice Questions
Tests and quizzes have been the mainstay of summative assessment in nursing education for decades. Using quizzes for formative assessment is, perhaps, more in line with a constructivist paradigm and exploits the testing effect. This chapter is about testing with multiple-choice questions (MCQs), how to write MCQs at various levels of Bloom’s taxonomy, and the value of frequent formative assessment from a cognitive science perspective. Although assessing higher cognitive functions and abilities is key to promoting deep learning and transforming nursing education, assessing facts and understandings is of value as well to identify students’ zone of proximal development (ZPD).
The goal of higher education is transfer of knowledge from the classroom to new domains that include dealing with life’s challenges and functioning effectively in the workplace (Carpenter, 2012; Mayer, 1998; Merriam & Leahy, 2005). This requires that students attend to the lesson, remember what they have learned, and are able to recall the information when needed. Our goal as educators should be to teach with long-term retention and transfer in mind, and the type of teaching and assessment strategies that best accomplish this may surprise you.
MCQs—FORM AND FORMAT
Anatomy of a Test Item
An MCQ contains a stem and options, which include the correct answer and two or more distractors. The stem, which can range in length from a few words to a case containing relevant and extraneous information, comprises the question or what requires an answer (Haladyna, 2004).
Writing distractors is challenging and undoubtedly the most difficult part of creating an MCQ. Haladyna (2004) recommends that distractors should not stand out as different from the correct answer in terms of length, grammatical form, style, or tone. They must all be plausible in that they represent errors in learning or common mistakes made. Distractors should not be arbitrary or humorous, and they must be clearly incorrect answers. From a psychometric perspective, distractors are discussed in terms of functioning and nonfunctioning. A functioning distractor is one that is plausible and chosen by more than 5% of the examinees (Wakefield, 1958, as cited in Rodriguez, 2005). The value of having well-written distractors is that they serve to increase the difficulty of the item and discriminate among students (Kilgour & Tayyaba, 2015).
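The 5% rule lends itself to a simple item-analysis tally. The sketch below flags functioning distractors from option-response counts; the item data and function name are hypothetical, but the threshold follows the definition above.

```python
# Minimal sketch: flag "functioning" distractors, i.e., options chosen by
# more than 5% of examinees (Wakefield, as cited in Rodriguez, 2005).
# The response counts below are hypothetical.

def functioning_distractors(counts, correct, threshold=0.05):
    """Return the distractors selected by more than `threshold` of examinees.

    counts  -- dict mapping each option label to how many examinees chose it
    correct -- label of the keyed (correct) option, excluded from the check
    """
    total = sum(counts.values())
    return [opt for opt, n in counts.items()
            if opt != correct and n / total > threshold]

# Hypothetical item: 100 examinees, option "B" keyed as correct.
counts = {"A": 22, "B": 60, "C": 15, "D": 3}
print(functioning_distractors(counts, correct="B"))  # ['A', 'C']
```

Here "D" is nonfunctioning (chosen by only 3% of examinees) and would be a candidate for revision or removal, which is the practical rationale for the three-option recommendation discussed next.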
Controversy exists as to the best number of distractors for an MCQ. However, given that functioning distractors are difficult to write, the typical four-option approach, which consists of the correct answer plus three distractors, often leaves the item writer struggling to come up with a plausible third distractor. Thus, the recommendation based on decades of research is for only two distractors (Haladyna, 2004; Haladyna & Downing, 1993; Kilgour & Tayyaba, 2015; Rodriguez, 2005).
Controversy also exists as to whether every question on the test must have the same number of options. Some feel that the number can vary (Nitko & Brookhart, 2011, as cited in Oermann & Gaberson, 2014). Their perspective focuses on writing plausible distractors for each question; if only two can be written, then the item should contain three options. Conversely, if three plausible distractors can be written, the item will include four options. I have learned from experience in giving exams online that if most questions include four options and suddenly a question appears with three, students become concerned that an option has been inadvertently omitted from the test. Although this issue can be readily cleared up in a classroom, it becomes a problem online. If the test is timed, students have only one opportunity to take it, and if they have a question at 2 a.m. when they are taking the test, most likely you will not be available by phone or e-mail to answer. I think the best practice is to follow the research recommendations and use three options for all questions. Alternatively, if questions in a test contain a varied number of options, this can be mentioned in the instructions at the beginning of the test or at the end of the stem on that question. Although some may argue that a three-option question does not reflect certification exam format or contribute to increased reliability and validity, it does allow for a greater number of questions to be included in the test as students have less to read, thus potentially negating the concerns.
MCQs can take various forms, but the most widely accepted and easiest for students to comprehend is considered the conventional format. In this format, a complete question or statement is made in the stem, ending in a period or question mark. No blank spaces exist within the question stem that students must fill in by selecting one of the options, nor does the stem require one of the options to complete the thought. The latter format, called the completion format, requires that students remember the stem while they read through the options. If the stem is lengthy and their memory fails, they must return to the stem, reread it, and reread the options (Haladyna, 2004). This slows down the test-taking process, which faculty must take into consideration when determining the number of questions to include on a test. Keep in mind that the more questions an exam contains, the more reliable and potentially valid the test will prove to be (Schneid, Armour, Park, Yudkowsky, & Bordage, 2014). Although learning management system (LMS) software will allow various question formats, writing MCQs in the conventional format will create a more efficient test.
Another type of MCQ that often appears on certification exams is the context-dependent set, which consists of a case and several questions that relate to the case. This type of question has the ability to measure higher cognitive reasoning, which is so important in nursing education (Oermann & Gaberson, 2014).
Cases for context-dependent sets should include a setting (the context) that is relevant to the students’ future role. Multiple questions can be based on the case and can run the gamut of the domains and levels of the three taxonomies depending upon what is to be assessed. The more irrelevant information included in the case, the greater the cognitive activity required to identify salient data and the more time students will need to read it.
Although the context-dependent item is rather easy to accommodate on a paper-and-pencil test, challenges exist when quizzes are placed online in LMS software. The software must be set so that students have the ability to scroll through the entire test, that is, they can go back and forth to review questions. If the software is set so that students are able to see only one question at a time, the case will appear with the first question only. The student will be required to remember the nuances of the case while answering the other questions, which may result in the test assessing something it was not intended to assess—short-term memory. In addition, the test cannot be set to scramble the questions. Although this option is preferred, especially when classroom testing is done, in order to make cheating more difficult, scrambling the questions will separate the case from the questions that pertain to it. Scrambling the options is the only possibility.
Test blueprints have been used in nursing education as a means to associate the number of questions on a test with the content taught and the domain and level of objectives all listed in a table format. From my perspective, the term blueprint is a bit of a misnomer, as only one aspect of it is completed prior to writing the questions. That aspect is knowing how many questions you plan to include on the test. However, even that decision may be delayed if questions are difficult to write or you include context-dependent questions that will take longer for students to read.
Instead of making a decision about how many questions from specific levels should be written, the focus should be on assessing the objectives. The approach that I use is to print out the blueprint table (see Exhibit 8.1) and, as I write the questions, make hash marks in each cell to indicate the number of questions that have been written for each objective at the specified level. This will avoid writing too many questions at the lower levels of Bloom’s taxonomy, which are the easiest to write.
Many blueprint formats are available that include different types of information. Oermann and Gaberson (2014) and Zimmaro (2010) associated questions with specific course content, the domain and level of verb, and the number of points for each question in order to see how the exam was weighted by content. Bristol and Brett (2015) have developed fairly complex blueprints that specify the text used, NCLEX (National Council Licensure Examination) categories, QSEN (Quality and Safety Education for Nurses) competencies, the domain and level of verb, the portion of the nursing process assessed, and the question type (MCQ, fill in the blank, etc.).
The blueprint format that I have found most helpful is similar to that developed by Tarrant and Ware (2012) with the information flipped on different axes. The example of the blueprint in Exhibit 8.1 includes the levels of taxonomy (far left column), the objectives by number (second row), and the number of questions written for each objective and cognitive level (cells). A blank blueprint can also be found at www.springerpub.com/kennedy.
Choosing a specific format for the blueprint depends upon the type of information wanted and needed. The blueprint I find useful allows me to see how many questions I have developed for each objective and at what cognitive level, thus avoiding the heavy tilt toward questions from the lower cognitive levels. The same type of blueprint could be created for objectives from Bloom’s affective and psychomotor domains. If your course involves objectives from multiple domains, the descriptors from those domains can be added to the template in Exhibit 8.1 so a snapshot of the entire test is on one page. I would encourage you to copy/paste your objectives for the course at the top of the blueprint to maintain focus on what is to be assessed and at what level as you write the questions.
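For those who prefer a digital tally over printed hash marks, the blueprint in Exhibit 8.1 can be sketched as a simple count of questions per (cognitive level, objective) cell. The level names and objective numbers below are illustrative, not a prescription of the exhibit's exact layout.

```python
# Minimal sketch of a blueprint tally: rows are Bloom's cognitive levels,
# columns are objective numbers, cells count questions written.
from collections import Counter

LEVELS = ["knowledge", "comprehension", "application",
          "analysis", "synthesis", "evaluation"]

blueprint = Counter()  # keys are (level, objective_number) pairs

def tally(level, objective):
    """Record one written question (a 'hash mark') in the given cell."""
    assert level in LEVELS, f"unknown level: {level}"
    blueprint[(level, objective)] += 1

# Record a few hypothetical questions as they are written.
tally("knowledge", 1)
tally("application", 1)
tally("application", 1)
tally("synthesis", 2)

# Check for a tilt toward the lower cognitive levels.
lower = sum(n for (lvl, _), n in blueprint.items()
            if lvl in ("knowledge", "comprehension"))
print(f"{lower} of {sum(blueprint.values())} questions at lower levels")
```

Running the check after each writing session makes the lower-level tilt visible early, before the test is assembled.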
Sample Test Blueprint
Why Question Level Matters
Because students in online nursing programs have already completed a basic nursing program and many of them have experience in clinical practice, as educators in RN to bachelor of science in nursing (BSN) and graduate nursing programs, we are in a unique position to identify misunderstandings, scaffold learning to correct them, and promote not only higher level learning, but also the development of more complex cognitive structures. Formative testing with well-organized tests and mindfully developed MCQs that measure all levels of Bloom’s cognitive taxonomy is key to achieving this. As Wiggins and McTighe (2005) summarized:
Many students, even the best and most advanced, can seem to understand their work (as revealed by tests and in-class discussion) only to later reveal significant misunderstanding of what they “learned” when follow-up questions to probe understanding are asked or application of learning is required. (pp. 51–52)
Unfortunately, research on test item banks (Masters et al., 2001) and faculty written questions (Hoseini, Shakour, Dehaghani, & Abdolmaleki, 2016; Jozefowicz et al., 2002; Vanderbilt, Feldman, & Wood, 2013; Wankhede & Kiwelekar, 2016) revealed that MCQs are often written at the lower cognitive levels of knowledge and comprehension. Unless we check for student understanding by asking higher level cognitive questions, we will never know what is understood and what is not, what is being transferred correctly and what is not. For example, when students in an introductory biology course were tested throughout the semester with higher level questions, understanding was deeper and retention longer (Jensen, McDaniel, Woodard, & Kummer, 2014).
Keep in mind that misconceptions and misunderstandings are not necessarily a failure of the educator, as students learn based on what they already know or think they know. Faculty cannot expect to understand where each student’s learning begins. Consequently, development in cognitive structures can go awry and, unless we ask specific questions to assess knowledge and build additional questions to assess higher level learning, we may never know where the misunderstanding occurred. Now, realistically, this cannot be accomplished for everything we want students to understand, but doing so for the big ideas (Wiggins & McTighe, 2005) is a good place to start (see Chapter 3).
Task Analysis for Writing MCQs
MCQs must be purposely written to ensure they assess what you intend for them to assess. Thus, the desired outcome of the test you are developing should be clear in your mind before starting to write questions. If the test is formative and its purpose is for students to activate prior knowledge and review foundational content in order to prepare for the discussion, writing questions at a specific level to assess an objective is not necessary. Another purpose for a formative quiz is to make students aware of what content is important and what understandings are necessary, which will help to guide their study. Again, these questions may not assess an objective, but should be written at various levels of Bloom as is necessary for learning.
Task Analysis of an Objective
If MCQs are being written for summative assessment, which will assess one or more of the objectives, then the level of the verb used in the objective to be assessed indicates the highest level of question you can write. For example, if you have an objective that contains an application-level verb, it is unfair to students to write questions at the analysis, synthesis, or evaluation levels. Keep in mind that one purpose of objectives is to communicate to students what content should be learned and at which level. Thus, your test questions cannot betray that. The same is true for the other taxonomic domains. Thus, taking time to think about a task analysis pertaining to each objective is wise. This exercise will help maintain focus on the desired learning outcomes and not on the minute details of content, which is consistent with Backward Design (Wiggins & McTighe, 2005).
To illustrate how a task analysis of an objective guides the writing of test questions, we will return to the objective used in Chapter 4 for the advanced health assessment course. Both the objective and the task analysis of that objective are shown in Box 8.1. The verb correlate is from the synthesis level of Bloom’s cognitive domain and that is what will be assessed—not only students’ ability to come up with the right diagnosis, but also to demonstrate how the history guided what was assessed on the exam and how students put the pieces together to arrive at a diagnosis.