© Springer International Publishing Switzerland 2017
Juan A. Sanchez, Paul Barach, Julie K. Johnson and Jeffrey P. Jacobs (eds.), Surgical Patient Care, DOI 10.1007/978-3-319-44010-1_33
Concepts and Models of Safety, Resilience, and Reliability
(1) School of Humanities, Languages and Social Science, Safety Science Innovation Lab, Griffith University, Macrossan Building (N16), 170 Kessels Road, Brisbane, QLD, 4111, Australia
Keywords
Normal accident theory · Complexity · Drift · Resilience · High reliability · Safety

"This place would be a lot safer if I could just get rid of the nurses who make mistakes."
—Nurse Manager
Introduction
Approaches to safety have often treated the "human" factor in an organisation or operation as a major contributor to unwanted outcomes. Most responses to this "problem" involve trying to exert more control over people [1]. This can happen through the generation of policies, guidelines, and prescriptions, and of course the enforcement of procedures. While these may make intuitive sense to some, research suggests that such a view may not be valid: an extensive focus on failures creates the erroneous impression of humans as a liability and ignores the many instances of humans contributing to success and resilience [2]. Not only are people crucial to the creation of safety in the messy details of everyday work, but an enormous number of other factors (many of which are beyond the control of the human at the sharp end) also lie behind the creation of success and the occasional failure.
Normal Accident Theory
With the rapid advancement of technology, many organisations today are complex systems, and these systems interact with an equally (if not more) complex environment [3, 4]. Complexity has been argued to render these organisations accident prone in two ways. First, minor failures between multiple components within a system can interact in incomprehensible or difficult-to-follow ways to produce a larger failure. Second, the complexity of these systems makes it difficult for any one individual to fully comprehend every single process involved in keeping the system functional [4, 5]. Therefore, when an accident occurs, operators within the system may find it difficult to remedy the situation. Most retrospective responses to such issues rely on adding more components or layers of defences, such as an extra alarm or another backup power generator. However, this only adds to the system’s complexity and might lead to even more unintended interactions and consequences. Given that failures involving complex component interactions are unusual and often unforeseen, they are not considered when we attempt to determine the probability of an accident occurring. Therefore, it is likely that the actual probability is much higher than we think.
Of course, not all organisations or surgical operations will experience such accidents, because some systems are loosely coupled [3]. In such systems, the continued functioning of a component is rarely dependent on the functioning of other components [3, 6]. For instance, the performance of a medical faculty in a university rarely depends on the performance of the business faculty. This is not the case for tightly coupled systems such as the operating room, where the function of the surgeon depends greatly on the function of another component such as the anaesthesiologist, and an issue with one of them is therefore likely to lead to an issue with the other. In turn, other personnel (e.g. nursing and recovery room staff) who rely on them will experience disruption to their work as well. These disruptions and issues may interact with one another in an unforeseeable manner, causing an accident. In sum, organisations that operate systems that are both complex and tightly coupled will likely experience an accident, and numerous near misses, at some point in time [3, 7]. These accidents are an expected by-product of a complex and tightly coupled system, and therefore seen as "normal"; hence the term normal accident theory.
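To make the point about underestimated accident probability more concrete, the following toy simulation offers a minimal sketch in Python. All component counts and probabilities are illustrative assumptions rather than data from any real system; the sketch simply compares a formal risk estimate that considers only a few anticipated failure interactions with the accident rate that emerges when the far larger set of unforeseen interactions is also in play.

import itertools
import random

random.seed(1)
N_COMPONENTS = 12
P_MINOR_FAILURE = 0.02  # assumed per-shift chance that any single component suffers a minor failure

# Suppose the formal risk analysis lists only 3 dangerous pairwise interactions,
# while in reality many more pairs of minor failures can combine into an accident.
ALL_PAIRS = list(itertools.combinations(range(N_COMPONENTS), 2))
KNOWN_DANGEROUS = set(ALL_PAIRS[:3])
ACTUAL_DANGEROUS = set(ALL_PAIRS[:20])  # the unforeseen interactions vastly outnumber the known ones

def accident_probability(dangerous_pairs, trials=200_000):
    """Monte Carlo estimate of the chance that at least one dangerous pair of minor failures co-occurs."""
    accidents = 0
    for _ in range(trials):
        failed = {c for c in range(N_COMPONENTS) if random.random() < P_MINOR_FAILURE}
        if any(a in failed and b in failed for a, b in dangerous_pairs):
            accidents += 1
    return accidents / trials

print("Estimated accident probability (known interactions only):", accident_probability(KNOWN_DANGEROUS))
print("Actual accident probability (all interactions):", accident_probability(ACTUAL_DANGEROUS))

In this sketch the "actual" probability comes out several times higher than the estimated one, simply because minor failures can combine along pathways that were never listed in the formal analysis.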
Complexity Science
Some might still argue that accidents are a result of human error [8, 9]. This section discusses complexity and explains why blaming accidents on human error alone may be a simplistic approach that misses the bigger picture. We will look at the underlying assumptions, and argue why these assumptions may not be realistic, especially in a medical or surgical setting.
The perception of accidents as the simple product of human error usually contains at least four underlying assumptions. First, it assumes that the system involved solely operates in a linear manner [10]. In other words, A only causes B, B only causes C, and so on. Second, it assumes that since the system operates in a linear manner, it therefore follows that with sufficient knowledge, an operator within the system can or should be able to predict the outcome of their actions. Therefore, when an adverse event occurs, such as a wrong-sided surgery, the surgeon is often blamed for not having anticipated the outcome. Third, it assumes that the linear manner in which the system operates means that it is possible for one to reverse the linear process to discover the cause of an accident. In other words, since C is only caused by B and B is only caused by A, this means that A is the source (or root cause) of the problem. Fourth, it assumes that it is possible for investigators to collect all the information necessary to form a true story of what exactly happened to give rise to the adverse event.
However, these assumptions may not be realistic, especially in the domain of healthcare and in highly complex surgical microsystems [11]. There are many examples which indicate that not all systems operate purely in a linear manner. For instance, the performance of a nurse in a hospital is potentially influenced by a plethora of factors like the nurse’s case load, whether there is a staff shortage, the type of observation charts used, the noise level and lighting within the wards, and whether the nurse is interrupted [12–16]. Likewise, the performance of a surgeon can be affected by factors such as disruptions, fatigue, and stress levels [17–19].
Since the healthcare system operates in a complex manner, it stands to reason that the second assumption, that outcomes are predictable, is likely to be false. A complex system like healthcare experiences a huge number of interactions, some of which are non-linear, among all of its components [20–22]. These interactions can take a range of forms, such as interactions between staff across multiple disciplines, or small physiological changes within a patient interacting to cause major disruptions to the patient's health. Systems of such complexity mean that it is impossible for any one individual to fully comprehend all the tasks necessary to keep them functional [4, 5]. Given the complexity and interactivity involved, outcome prediction is near impossible.
Following from the above, the third assumption is likely to be false as well. Since the healthcare system is immensely complex and highly interactive, finding the factors contributing to an accident is not as simple as reconstructing a linear process [10]. Moreover, not all accidents have a single identifiable cause, as was discovered during the investigation into the accidental shooting down of two US Black Hawk helicopters by two US fighter jets. The shootdown is thought to have happened because many local units had each developed their own procedures and routines to manage local demands. The development of local procedures and routines is a normal occurrence, as the original plans do not always suit the local situation. However, the differences in procedures and routines among the various units made it difficult for them to act smoothly and successfully in a tightly coupled situation, leading to the shootdown [23, 24]. Lastly, this assumption also depends on the accident investigator being given full access and the ability to gather all the information necessary to reconstruct an accurate picture of the accident. As will be argued below, that is highly unlikely to happen.
The fourth assumption, that an investigator can gather all the information necessary to reconstruct an accurate picture of the adverse event, is likely to be invalid for the following reasons. First, systems that are highly complex and interactive tend to evolve continuously, which hampers any attempt at retrospective analysis, especially for an outsider unfamiliar with the nuanced changes in complex systems [25]. Second, a huge amount of information might be lost or difficult to obtain in the course of accident investigations, since one's behaviour can be influenced by a multitude of factors, such as unwritten routines or subtle verbal and behavioural influences from supervisors or other staff members [26].
Third, research has shown that memory is unreliable and highly context dependent [27–30]. The way in which a question is phrased can alter answers and memories. Furthermore, people are susceptible to incorporating misinformation from various sources into their memory of an accident. These effects can hinder, or at least distort, attempts at information gathering and increase the chance of hindsight bias [31].
Lastly, the process of reconstructing a representation of an accident is itself at risk of succumbing to hindsight bias [31]. Given that the outcome is already known, it is easy for accident investigators to single out the behaviour or decision that led to the accident and to wonder why the people involved failed to notice the same things. In doing so, the challenges these people faced are trivialised, and the bigger picture, that such accidents are mostly the product of complex and interactive systems, is missed.
In summary, attributing adverse events to human error hinges on these four assumptions being valid. However, the assumptions are unrealistic in complex and interactive systems like healthcare. Rather than looking at accidents through a linear lens, we should perhaps follow in the footsteps of high-reliability organisations (see section "Principles of High Reliability") and adopt a systems approach instead, which is well suited to complex settings such as the surgical setting. Essentially, this approach takes the view that an individual failure is a symptom of a larger problem within the system, which enables organisations to learn from their mistakes and improve the system [32–34].
It should be noted that such an approach does not mean that humans are entirely blameless, as there are scenarios in which pursuing individual responsibility might be necessary [35]. However, most errors are arguably committed by proficient and well-meaning operators who, like all humans, possess a finite capacity and who face numerous challenges when carrying out their duties [31, 36]. Thus, the focus should not be on punishing them, but on improving the system in order to alleviate some of their difficulties and reduce future adverse events [32, 36].
Safety Drift and Procedural Violations
Safety Drift
Healthcare systems are vastly complex and set in an environment that is equally (if not more) complex [3, 4]. Besides consisting of a multitude of individual components (e.g. doctors and nurses, technological artefacts, regulatory pressures), systems of such complexity also possess subsystems (e.g. the anaesthesiology team, the general surgery team) that are working to achieve their own goals [31]. These goals are not always compatible, however, resulting in conflicts that need managing. Those involved have to make decisions based on the situation, and some of these decisions might require trading off safety to achieve a particular production goal or to live up to other duties [37, 38]. Typically, this trade-off does not yield any immediate negative consequences [39]. Therefore, those involved are misled into assuming that the trade-off is acceptable, and it becomes part of the normal process. When another conflict emerges and another trade-off is made with no adverse results, this second trade-off might once again be assumed to be acceptable and become part of the normal process. This process (known as normalisation of deviance) repeats itself, slowly nudging the system towards greater risk until an adverse event takes place.
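A small worked example may help to show why this ratchet is so seductive. In the Python sketch below (a toy illustration only; the baseline risk and the per-trade-off increment are assumed numbers, not clinical data), each individual trade-off leaves the per-procedure risk so low that nothing bad is likely to happen on any given day, yet the cumulative chance of an adverse event over a run of procedures grows steadily.

# Toy model of normalisation of deviance: each accepted trade-off nudges risk up slightly.
baseline_risk = 0.0005     # assumed per-procedure probability of an adverse event
risk_increment = 0.00005   # assumed extra risk added each time a trade-off becomes "the new normal"

risk = baseline_risk
p_no_event = 1.0           # probability that no adverse event has occurred yet
for procedure in range(1, 201):
    p_no_event *= (1 - risk)   # today still looks fine, which reinforces the trade-off
    risk += risk_increment     # the trade-off is retained, so the risk ratchets upward
    if procedure % 50 == 0:
        print(f"after {procedure:3d} procedures: per-procedure risk = {risk:.4f}, "
              f"cumulative chance of at least one adverse event = {1 - p_no_event:.2f}")

Each step still looks safe in isolation (the per-procedure risk stays around one per cent or less), which is precisely why the drift goes unnoticed from the inside.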
Despite the risks involved, those within the system are unlikely to be aware of this drift into failure, as the signs are typically only noticed by those outside the system (e.g. accident investigators) after an accident has occurred [24]. To those within the system, decisions that seem poor in hindsight were actually rational given the circumstances of the time [31]. And while it seems like a bad phenomenon, the drift away from safety is not necessarily a negative indicator of an organisation's performance [24]. Rather, it is simply a by-product of a complex system adapting to challenges from both within itself and its environment. The challenge is to ensure that the clinicians involved understand the role and importance of these trade-offs (i.e. clinical sensemaking) [40].
Features of Drift
So what are the elements that contribute to a system drifting towards failure? At present, it is theorised that at least five factors are involved, namely (a) scarcity and competition, (b) decrementalism, (c) sensitivity to initial conditions, (d) unruly technology, and (e) contribution of protective structure [24].
Scarcity and competition refer to an organisation experiencing a lack of resources while facing intense competition [24]. Rasmussen suggested that a typical organisation has to work within three boundaries: economic, safety, and workload [41]. Working beyond the economic boundary means that the organisation cannot maintain itself financially, while crossing the safety boundary means that the organisation's operation is highly dangerous (e.g. patients' well-being may be endangered). Lastly, exceeding the workload boundary means that the people and/or the technologies within the organisation are no longer capable of carrying out their work. As mentioned earlier, organisations generally drift towards the safety boundary to satisfy production pressure, since the loss of safety margin is rarely felt while the meeting (or missing) of production targets is tangible [37].
Decrementalism means that an organisation moves to the edge of the safety boundary through a series of small steps (rather than instantaneously) as it attempts to meet production pressures, as explained earlier [24]. This should not be confused with normalisation of deviance, which refers to trade-offs made in response to abnormal situations (e.g. high demands) coming to be seen as the new norm.
Sensitivity to initial conditions (otherwise known as the butterfly effect) essentially argues that seemingly small factors in a system's starting conditions can lead to large failures, as these factors interact in novel ways to give rise to unintended consequences, pushing an organisation towards the edge of the safety boundary [24]; the sketch following this paragraph illustrates the underlying idea. Unruly technology refers to the gap between how designers of a technology think it will work and how the technology actually works once exposed to the environment [24, 42]. For instance, the introduction of poorly designed health information technology in some hospitals has been argued to cause issues such as (a) making it difficult for physicians to gain a proper understanding of a patient's condition and (b) producing reports that lack informational value, owing to the technology's insistence on using standard phrases [43].
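Sensitivity to initial conditions is easiest to see in a deliberately abstract example rather than anything specific to healthcare. The Python sketch below uses the logistic map, a standard textbook model of chaotic behaviour, with arbitrary illustrative parameters; two trajectories whose starting points differ only in the ninth decimal place end up bearing no resemblance to each other.

# Logistic map x_{n+1} = r * x_n * (1 - x_n) in its chaotic regime (r = 3.9).
r = 3.9
x_a = 0.400000000   # trajectory A
x_b = 0.400000001   # trajectory B: starting point differs only in the ninth decimal place
for step in range(1, 51):
    x_a = r * x_a * (1 - x_a)
    x_b = r * x_b * (1 - x_b)
    if step % 10 == 0:
        print(f"step {step:2d}: A = {x_a:.6f}, B = {x_b:.6f}, gap = {abs(x_a - x_b):.6f}")

By the fiftieth iteration the two trajectories bear essentially no resemblance to each other, even though no observer comparing their starting points could have told them apart.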
The last factor is the contribution of the protective structure, which suggests that a structure deliberately created to keep the operation safe can end up contributing to a drift towards failure [24]. One example is a safety or governance department that, through its generation of many different layers of defence and guidelines, actually adds to complexity, thereby rendering real sources of risk less visible to sharp-end users.
Possible Means to Reduce Potential for Drift
Despite the potential for drift to result in unwanted consequences, there does not appear to be a definitive solution for reducing an organisation's potential for drift. Nonetheless, this section explores some of the ideas that have been put forward, in the hope that readers will find them useful.
As suggested earlier, signs of drift are not always obvious to those within the organisation [24]. Therefore, one plausible approach to reducing an organisation's potential for drift is to study how decision makers make sense of their information environment (e.g. why they take in certain bits of information and ignore others) as well as how they make and rationalise their decisions [44]. However, this may not be a fruitful endeavour, since an organisation's drift into failure is usually only recognised after an accident has occurred, and any knowledge gleaned might be specific to that accident and have little applicability in other contexts.
Arguably, a decision maker must pay attention to multiple sources of information and invite doubt in order to make the best possible decisions [45]. But this may be an idealistic notion, as decision makers can be bombarded with an enormous amount of information that would require considerable time and cognitive resources to process [24]. Furthermore, tell-tale signs of drift may be weak or unbelievable, and hence go unnoticed [37].
Another potential approach would be to move the organisation away from the safety boundary, reducing the likelihood that the boundary will be crossed and an accident produced [41]. Examples include reducing production pressure or investing in proven safety methods. However, as with the above, expecting an organisation to reduce production pressure might be wishful thinking. Even if an organisation chooses to invest in proven safety methods, it is highly likely that production pressure will rise to match any gains, as staff would be expected to produce greater output with the same resources (i.e. be more efficient) [37].
In sum, while there have been several suggestions on ways to diminish an organisation's potential for drift, each of these suggestions comes with its own caveats. Nevertheless, this does not mean that it is impossible to reduce an organisation's drift potential, since there may be other solutions that have yet to be explored. For example, Rochlin and his colleagues observed that the various subsystems on board a naval aircraft carrier were able to balance multiple constraints and pressures to consistently produce smooth performance [5]. Perhaps an in-depth study of how these subsystems co-operate and negotiate with one another might yield some useful information.
Procedural Violations
As argued earlier, drift is not an indicator of an organisation failing, but a sign of it adapting [24]. It can appear in many forms, such as procedural violations (also known as workarounds). Workarounds tend to be frowned upon because they deviate from rules and regulations, which some consider sacred [46]. Such a viewpoint may have its merits, for deviations from rules and regulations have at times produced unwanted outcomes. For instance, it has been argued that non-compliance with rules and regulations contributed to an incident in which the wrong patient was given an invasive procedure.
However, it might be a mistake to assume that all forms of procedural violation are bad. For example, one set of medical guidelines in the USA specified the use of levofloxacin for community-acquired pneumonia [47]. Others have suggested that a physician should not always follow these guidelines, as levofloxacin is an expensive antibiotic that not all patients can afford, and going without antibiotics could lead to a patient's condition worsening [48]. To avoid this outcome, physicians need to deviate from the rules and prescribe a different, more affordable antibiotic. Furthermore, each patient has their own unique co-morbidities and medical history, making it near impossible to create a set of guidelines that addresses every case. Under such circumstances, physicians should be allowed to act as they see fit instead of being penalised for not complying with procedures. In other words, procedural violation may not always be a bad thing, as it captures the local wisdom of the providers.
Stretching the Limits of Adaptive Capacity
As argued above, healthcare organisations have to adapt to multiple constraints both within themselves and in their environment [24, 31]. One way of doing so is to stretch their adaptive capacity. Adaptive capacity refers to a system's ability to adjust its actions in response to high production pressure, such as a hospital temporarily using stretchers or chairs in the hallways when there are insufficient beds to accommodate a sudden spike in demand [49, 50]. When a system adapts itself to handle a particular type of disruption, it inevitably becomes less adept at handling other types of disruption [51]. When those other disruptions actually happen, the system's adaptive capacity will be tested, and failure is a real possibility. Since failure is an unwelcome result, it is important for a system to know where it stands in terms of its adaptive capacity, the types of problem that can arise in an adaptive system, and the means of stretching this finite resource if necessary [52]. For a system to figure out where it stands in terms of adaptive capacity, it should possess at least the following three characteristics: (a) the capacity to reflect on how well it has adapted, (b) the awareness to know what it is adapting to, and (c) awareness of changes within its environment [51].
There are three potential ways in which an adaptive system can break down [51]. The first is decompensation, which essentially refers to a system's adaptive capacity being unable to keep up with a disruption that has occurred. In the initial phases of decompensation, the system automatically attempts to compensate when a disruption takes place and is somewhat successful in doing so, thereby masking the problem as it continues to fester. Eventually, the system's adaptive capacity is drained, causing a sudden collapse and failure.
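The dynamics of decompensation can be caricatured in a short sketch. The Python toy model below is an assumption-laden illustration (the reserve size, disturbance range, and time scale are all invented for the example): a finite reserve of adaptive capacity quietly absorbs each disturbance, so observed performance looks normal right up until the reserve runs out, at which point performance collapses abruptly.

import random
random.seed(7)

reserve = 10.0        # assumed finite adaptive capacity (e.g. spare staff hours, spare beds)
performance = 1.0     # 1.0 represents normal observed performance

for hour in range(1, 25):
    disturbance = random.uniform(0.3, 0.9)   # ongoing demand the system must absorb
    if reserve >= disturbance:
        reserve -= disturbance               # compensation quietly consumes the reserve,
        performance = 1.0                    # so observed performance still looks normal (masking)
    else:
        performance = max(0.0, performance - 0.4)   # reserve effectively exhausted: sudden collapse
    print(f"hour {hour:2d}: reserve left = {reserve:5.2f}, observed performance = {performance:.2f}")

The point of the sketch is that the visible signal (performance) stays flat while the hidden buffer is draining, which is exactly why decompensation tends to be recognised only after the collapse has begun.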