Chapter 5


Conducting defensible and fair assessments




Introduction


Generally, in pre-registration health care education, students spend up to 50% of the course time in clinical practice. Learners’ lived experience in authentic practice contexts and their competence in dealing with the care situations they face should therefore form the basis of assessment (Gopee 2008, Phillips et al 2000). Competency-based assessment in the professions thus needs to be based on realistic, complex workplace problems to generate the range of evidence of competence required to make valid and reliable assessments (Masters & McCurry 1990). This can be done by the careful selection and combination of methods of assessment that will best assess the particular component of competence (e.g. using observation to assess psychomotor skills and questioning to assess cognitive skills).


It is not possible or desirable to assess everything a person might need to know or be able to do. Assessment of clinical practice is inevitably based on a sample of the student’s performance in assessment tasks perceived to be relevant. An inference of competence is then made from the student’s performance on the set of arranged tasks. Competence is a construct that is not directly observable; rather, it is inferred from performance. Most typical assessments involve making inferences: a test of knowledge, for example, usually samples only a fraction of the required knowledge, and on the basis of the resulting score an inference is made as to whether or not the student knows enough to be assessed as satisfactory (Rowntree 1987). Grades and degree classifications are made on that basis; hence assessment of clinical practice, in common with other types of assessment, involves inference – and inferences are subject to error (Gonczi et al 1993). This is perhaps one major weakness of most, if not all, of our assessment systems.


As I see it, two major expectations are made of practice educators. First, they are required to make professional judgements in interpreting what the minimum acceptable levels of competence are with respect to professional standards. These judgements are frequently made within the role relationship of mentor-cum-assessor to student – a relationship that may very well influence assessment judgements. The reader is directed to the discussion of the issues and dilemmas of the mentor–assessor interface in Chapter 2. Secondly, assessment evidence obtained through the methods used has to be ‘subjectively scored’. Reliability, and perhaps even validity, may be compromised. Students may then be assessed unfairly, or incorrect assessment decisions may be made – and what could be worse than passing a student who has not achieved the goal of professional health care education, which is to be ‘fit for purpose’ and ‘fit for practice’?


Our assessments need to be defensible to the public and fair, in that they distinguish correctly between good and poor performers (Schuwirth et al 2002). Assessments are relied upon to make some quite specific, but also far-reaching, judgements about students’ future competence as registered practitioners. In deciding whether the assessments are sufficiently robust to enable sound judgements to be made, clear criteria should be used for deciding if the assessments are both defensible and fair. A question to be asked is this: ‘Do our assessments enable us to make such judgements soundly?’ Deciding whether or not an assessment lives up to this task is not straightforward.


What follows is an examination of the factors to be considered in order to make sound judgements in assessments, together with suggested measures for attaining objective, rather than subjective, assessments.



The key concepts of conducting defensible and fair assessments


Justice is a basic part of the functioning of a civilized society: we believe in justice not only for the accuser and the accused in criminal trials, but also for the parties in any dispute. In any situation where a decision about fair play has to be made, a person with a sense of justice expects the decision taken to be just and fair, and the other person to have received fair treatment. As an assessor you have to make assessment decisions continually, and these decisions about student performance must be just and fair. How can you ensure that this is so? Furthermore, assessment for certification, so-called ‘high-stakes’ assessment, as in professional health care education, should offer sufficient reliability and validity for public scrutiny. High-stakes assessments at national level, as is the case with pre-registration health care education, must offer comparability (Downing 2004, Gipps 1994a). In its document Making a Difference, the Department of Health (1999) stated that the health service of the country needs to know that nurses and midwives, and by inference all health care professionals, are trained to broadly the same standards and have the same skills. How can we achieve these ‘orders’ through our assessment activities? Or are these orders too tall?


What do the words ‘fair’ and ‘defensible’ mean to you? According to the Collins Pocket Dictionary and Thesaurus (1993), to be fair is to act ‘according to rules’. To ‘defend’ your position, you generally have to justify yourself with sound reasons. The next question then may be: what are the rules of defensible and fair assessments? Stoker & Hull (1994) state that four attributes need to be fulfilled to make assessment defensible and fair – these attributes may thus form the rules of defensible and fair assessments. Quinn & Hughes (2007) refer to these as the four ‘cardinal criteria’ of every effective test. The four cardinal criteria or attributes are:




Validity


Gonczi et al (1993) maintain that the most important issue in competency-based assessment is that of validity. The traditional definition of validity is the extent to which a test measures what it was designed to measure. If it does not measure what it purports to measure, then its use is misleading (Gipps 1994a). There are two key issues here that are important to the assessment of clinical practice: how measurements are made and what measurements are made. The use of the strategy of triangulation will help ensure that a more complete picture of the student’s competence is obtained, thereby enhancing validity. The reader is referred to Chapter 4 for a discussion of the use of appropriate methods of assessment and the strategy of triangulation to achieve validity of assessment. Valid assessment in clinical practice depends on using methods of assessment that are appropriate to the attribute of the competence being assessed (e.g. valid assessments of psychomotor skills are unlikely to be provided by the use of questioning). In health care education, what we purport to measure must be ‘the ability to actually care for patients’ (Gerrish et al 1997:70). Assessment for accountability purposes, as in the health care professions, should aim for high validity, as health care professionals must be fit for the purpose of caring for patients and clients. We therefore have to be clear about what we want to measure. Rowntree (1987) tells us to ‘articulate as clearly as possible the criteria by which we assess, the aims and objectives we espouse, what qualities we look for in students’.


When assessing in clinical practice, validity is inferred rather than measured directly, because validity is difficult to measure (Gonczi et al 1993). The way to infer validity is to collect evidence of the different types of validity that matter to us in clinical practice. The types of validity to be discussed here are:


• content validity

• face validity

• predictive validity

• construct validity

• concurrent validity




Content validity


This concerns the coverage of appropriate and necessary content (Gipps 1994a). Newble (2004:38) states that content validity is the ‘most fundamental requirement in ensuring the quality of a competency test’. The following questions can be asked in association with this concept:



• Has the assessment adequately sampled the content of the course? In the case of pre-registration health care students, has the assessment adequately sampled the NMC and HCPC standards of proficiency? Newble (2004) and Crossley et al (2002) recommend using a ‘blueprint’, and Fraser et al (1997) an ‘assessment matrix’, for adequate sampling. This is a way of specifying the sample of items to be included in the assessment. The simplest form is a two-dimensional matrix, with one axis representing the generic competencies such as history taking, communication skills, management skills and care-planning skills. The other axis represents the problems, conditions, clinical tasks or types of patients/clients on which the competencies will be demonstrated. A sample of practice representing a valid selection from each dimension can then be identified.


• Has the assessment sampled across the range of context for a particular competency? Ability to perform in one situation is a very poor predictor of performance in another, even similar, situation; wide sampling across a competency is therefore required to achieve an adequate level of content validity and reliability (Newble 2004).


• Is the item being assessed within the content of the course?


• Does the item being assessed need to be assessed at this stage of the student’s training?


To achieve content validity, it is of course necessary to have knowledge of the content and structure of the course that the students being assessed are undertaking.



Face validity


This is the extent to which an assessment appears to be testing what we want students to be able to do. Gerrish et al (1997) point out that many pre-registration nursing programmes require students to produce written evidence of their achievement in practice, but that this does not necessarily indicate their ability to actually care for patients (i.e. this form of testing for clinical competence lacks face validity). Another example of a test lacking face validity comes from Wolf (1995), who cites the multiple-choice tests of the type often used to license professionals in the USA: in the case of a physician, for instance, the multiple-choice questions taken after qualification do not test the physician’s competence to practise (McGaghie 1991, in Wolf 1995). Masters & McCurry (1990) believe that face validity is likely to be enhanced by making set tasks resemble those encountered in day-to-day practice in a profession. During students’ clinical practice, careful planning of learning experiences will help them participate in the day-to-day activities of their profession, thereby facilitating the achievement of face validity.



Predictive validity


This relates to whether the assessment accurately predicts some future performance (Gipps 1994a). Rowntree (1987:189) gives this warning about predictive validity:



In the health care professions, it is important to try to predict students’ competence in the future so that, as a minimum, they are competent to practise at the point of qualification. Whether they remain competent in the future is of course beyond our control. What can be done to achieve predictive validity? The use of continuous assessment may help here. The constant and regular supervision, guidance and feedback given to students will reinforce learning and hence their development and achievements. Keeping an evidence log, as shown in Figure 7.5, will assist in keeping track of students’ progress and the types of experiences they are having. Ensuring that students have a range of clinical experiences, with repetition if necessary, will help them acquire the skills to practise with confidence and perhaps predictability. Stoker (1994:v) says that:



If there is consistency of performance, there is a higher chance of predictive validity in assessment, as ‘the best indicator of future performance is past performance’ (Wolf 1995:44). Correspondingly, the best predictive measure is one that incorporates and assesses those future behaviours that are of interest.



Construct validity


Constructs are the qualities, abilities and traits that explain aspects of human behaviour; they cannot be observed directly (Rowntree 1987). Honesty, maturity, kindness and intelligence are some examples of constructs. Construct validity is the extent to which an assessment reveals the construct being assessed. We are required to make value judgements about certain aspects of the student’s behaviour or personality. Rowntree (1987:84) asks whether we are deluding ourselves when we make those judgements: ‘Is what we see (or not see) in students a figment of our imagination – a fabrication of the mind of the beholder to some extent?’ To what extent do our personal constructs influence construct validity? Abstract concepts such as attitudes and values are notoriously difficult to measure (Nolan & Behi 1995, Ashworth & Morrison 1991). Again, the use of continuous assessment may be helpful here. Working with and assessing the student over a period of time, in conjunction with other practitioners and across a range of care-giving situations, will give the practice educator more opportunities to assess the student’s attitudes and values, thus making assessment in this area of learning more accurate. It will also allow students more opportunities to demonstrate the attitudes and values they hold.


For example, imagine you are trying to assess a student’s attitudes to the elderly and you work with him/her on one occasion with one elderly person. Your student appears to have difficulty demonstrating sensitivity to the needs of the patient, even though questioning reveals that he/she is theoretically well aware of those needs. Do you then assume that the student’s attitudes towards the elderly are not good? No, of course you don’t. Your student’s performance on that occasion might be due to a number of factors; for example, he/she may find it particularly difficult to relate to that individual patient, or may simply be very tired or distracted by some pressing problems.



Concurrent validity


This is concerned with whether an assessment of one aspect of performance correlates with, or gives substantially the same results as, another assessment in a related area of performance (Gipps 1994a, Davis 1986). In other words, does the assessment predict performance in related areas? How far can we generalize from the ability to perform one task to an ability to perform other tasks in the same domain? For example, if you are working with a student who is able to assess and give immediate nursing care to asthmatic patients in relation to their dyspnoea – that of nursing them at rest in a comfortably supported and upright position – can you conclude from the student’s care of these patients that the student is able to assess and give the necessary immediate nursing care to patients with dyspnoea from other causes? I would suggest that you could, because the principle of care is the same for all patients with dyspnoea. Your assessment has concurrent validity. Consider another example: there is assessment evidence that the student is able to give intramuscular injections safely into the gluteal muscle. Can you assume that the student is able to give intramuscular injections safely into other sites of the body? I would suggest that you could not make this assumption because, even though the principles of giving an intramuscular injection remain the same, there are different dangers associated with the use of these alternative sites.


To achieve concurrent validity, the assessor needs to be aware of the range of context of practice (see Ch. 6) required to achieve competence in an area of care, so that the necessary experiences can be arranged. Specifying the range in the assessment plan is therefore necessary to achieve concurrent validity. In other words, the number of tasks should be increased to ensure comprehensive coverage of the domain and so improve generalizability (Linn 1993).


There are two final points to make about validity. First, the more measures there are in agreement, and the more closely they agree, the more likely it is that they do actually measure what they claim to (Crossley et al 2002). Secondly, a general principle underlying validity in competency-based assessment is that the narrower the base of evidence for the inference of competence, the less generalizable it will be to the performance of other tasks (Gonczi et al 1993). Generalizability is a particular problem for performance assessment, as direct assessments of complex performance do not generalize well from one task to another (Gipps 1994a). This is because competent performance is heavily task-dependent.
