Chapter Outline
History of Evidence-Based Medicine
Evidence-Based Medicine Process
At this point in history, when so much information is available with the click of a mouse or with a sweep of a finger, it is important for medical providers to continue to strengthen their capacity for incorporating evidence into their clinical decision making. Although providers have easier access to current information than ever before, the sheer volume of health-related data can quickly become overwhelming for providers who must efficiently care for patients. For this reason, it is important for busy clinicians to have a grasp of the process and principles of evidence-based practice from asking the question, to finding the evidence, to evaluating the quality of the evidence, and finally to incorporating the evidence into clinical decision making.
History of Evidence-Based Medicine
The challenges of incorporating the best-quality evidence into medical decision making predate the modern medical era. The well-known “scurvy” experiment dates back to the British Navy of the 1740s. A naval surgeon, James Lind, conducted an experiment in search of a cause and a treatment for sick sailors. Although he had a small sample size, he did use important experimental principles, including the establishment of control groups, a clear endpoint, and the inclusion of similar cases, in an attempt to control for potential confounding variables. In his experiment, Lind clearly demonstrated the importance of citrus in the diet, but it took 7 years for his findings to be published and 40 years before the British Navy included citrus on every voyage. This delay in the implementation of best evidence into clinical practice has been a recurring theme historically.
Another example of early experimental evidence ultimately informing medical practice is the examination of maternal mortality rates by Semmelweis in the mid-1800s. Through a comparison of deliveries performed by physicians and those by nurse midwives, Semmelweis noted that mortality rates from postpartum infection were much higher for pregnant women attended by physicians. He ultimately attributed the increase to the fact that doctors routinely performed postmortem examinations early in the morning before attending their obstetric patients. The introduction of good handwashing practices significantly lowered the mortality rates for mothers whose babies were delivered by the physicians. However, mortality rates increased again when the new practice of consistent handwashing slackened. The historical challenges of implementing practices based on best evidence and sustaining those practices mirror challenges encountered today.
Historically, collection of high-quality evidence was limited by bias and lack of blinding as individual physicians made observations about interventions and outcomes in their own patients. The earliest reported randomized controlled trials (RCTs) occurred only in the mid to late 1940s and included a streptomycin trial and a whooping cough vaccine trial. The whooping cough trial included elements of placebo control and informed consent, strengthening the rigor of the trial and the validity of the evidence.
It has been a challenge, however, to summarize and communicate research-based evidence to make it usable by practicing clinicians. In 1967, David Sackett, MD, started the first Department of Clinical Epidemiology at McMaster University in Ontario, Canada. Before Sackett’s work, epidemiology and biostatistics and their implications in public health were not readily digestible for practicing clinicians. Sackett was among the first to develop practical tools for physicians to apply research evidence to the care of individual patients. Dr. Sackett continued at McMaster University until 1994 when he became the foundation director of the Center for Evidence-based Medicine at Oxford University. After his retirement from Oxford, he returned to Canada and continued to teach clinical epidemiology to students until his death in May 2015.
Another important figure in the history of evidence-based medicine (EBM) is Dr. Gordon Guyatt. Dr. Guyatt was the director of the internal medicine residency program at McMaster University for many years and coined the term “evidence-based medicine.” Throughout the 1990s, an ongoing series of articles titled “Users’ Guides to the Medical Literature” was published in the Journal of the American Medical Association (JAMA). These later led to the development of a textbook summarizing the principles of evidence-based clinical practice. In his work, Dr. Guyatt presented a methodical, easy-to-remember approach to the practice of EBM that many clinicians use today.
Guyatt and others credit three additional researcher-clinicians from an earlier generation who influenced their work in EBM. Dr. Tom Chalmers recognized the value of rigorous study design and randomized trials as early as 1955 in his paper on bed rest and diet for hepatitis. That paper heavily influenced Guyatt’s understanding of what he later called clinical epidemiology. Alvan Feinstein from Yale was both a clinician and a researcher who was a key player in the development of an approach to studying the ways medicine is practiced on a daily basis. The third individual was Archie Cochrane. His work as a clinician, an epidemiologist, and a medical school faculty member inspired the later development of the Cochrane Collaboration, which has become a recognized leader in the development of EBM and EBM resources.
Evidence-Based Medicine Process
Effective evidence-based medicine incorporates five primary tasks. These are (1) asking a clinical question, (2) searching for evidence that addresses the question, (3) assessing the quality of the evidence, (4) incorporating the evidence into a clinical decision, and (5) evaluating the process.
Task 1: Asking a Clinical Question
Typically, clinical questions are categorized as background questions or foreground questions. Background questions are very general questions most often asked by new learners or by practitioners encountering an unfamiliar diagnosis or clinical presentation. Background questions commonly begin with who, what, when, where, how, or why. Examples of background questions may include, “Where is the incidence of Lyme disease highest?” or “What are the risk factors for osteoporosis?” The answers to these questions provide background information on a particular topic.
Foreground questions are very specific questions designed to provide guidance for the clinical care of a particular patient or group of patients. A foreground question about Lyme disease, for example, may compare two antibiotic dosing regimens for speed of recovery. A useful acronym for developing foreground questions is “PICO.” PICO stands for:
P: Population or patient—How would you describe a patient or population like yours?
I: Intervention—Which intervention are you considering?
C: Comparison—What alternative approaches are you considering for your patient?
O: Outcome—What are you hoping to measure, achieve, or affect?
PICO questions can be developed to address a variety of clinical question types, including diagnosis, etiology or harm, prognosis, and treatment. For example, consider a commonly diagnosed disorder such as diabetes mellitus. A physician assistant may have many questions about diabetes. See Table 11.1 for sample PICO questions of each clinical type regarding diabetes. Properly structuring the question at the outset is the key step to obtaining a meaningful evidence-based answer.
Type of Question | PICO Question |
---|---|
Diagnosis | In patients with type 2 diabetes, is a 24-hour urine collection for creatinine clearance more sensitive than a serum creatinine for detecting early-onset kidney disease? |
Etiology or harm | In middle-aged adults, is family history of diabetes a greater risk factor than obesity for the development of type II diabetes? |
Treatment | In patients with new-onset type II diabetes, are saxagliptin and metformin more effective than glipizide and metformin at decreasing the risk of renal failure? |
Prognosis | In patients with type I diabetes, is a hemoglobin A1c goal of 6.0% more effective than hemoglobin A1c of 7.0% at increasing survival? |
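The PICO structure can also be represented as a small data record, which some readers may find helpful when drafting questions. The following is only a minimal sketch in Python: the class name `PicoQuestion` and its methods are hypothetical names chosen for illustration, and the sentence template assumes a comparison phrased as "is X more effective than Y." The example values come from the prognosis row of Table 11.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PicoQuestion:
    """One foreground question broken into its four PICO parts (illustrative only)."""
    population: str    # P: the patient or population of interest
    intervention: str  # I: the intervention being considered
    comparison: str    # C: the alternative approach
    outcome: str       # O: what is to be measured, achieved, or affected

    def as_sentence(self) -> str:
        # Assemble the parts using an assumed "In P, is I more effective than C at O?" pattern.
        return (
            f"In {self.population}, is {self.intervention} more effective "
            f"than {self.comparison} at {self.outcome}?"
        )

    def search_terms(self) -> list[str]:
        # The four parts can serve as starting key words for a literature search.
        return [self.population, self.intervention, self.comparison, self.outcome]


prognosis_question = PicoQuestion(
    population="patients with type I diabetes",
    intervention="a hemoglobin A1c goal of 6.0%",
    comparison="hemoglobin A1c of 7.0%",
    outcome="increasing survival",
)
print(prognosis_question.as_sentence())
```

Writing the four parts out separately makes it easy to notice when one of them, most often the comparison, has been left unstated.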
Task 2: Searching for Evidence
The search for evidence begins with identifying the type of evidence of interest. Evidence can be broadly divided into two categories, filtered and unfiltered. Filtered evidence is that which has already been gathered and synthesized by experts into a format that is readily usable by clinicians. Clinical guidelines developed by professional bodies are an example of filtered evidence. Other examples include critically appraised topics (CATs), evidence-based summaries, structured abstracts, and systematic reviews. For a list of evidence-based filtered resources, see Table 11.2.
Filtered (Secondary) Evidence | Unfiltered (Primary) Evidence |
---|---|
Clinical guidelines: National Guidelines Clearinghouse | PubMed |
CATs: BestBETs | EBSCO |
Evidence-based summaries: UpToDate, Clinical Evidence, Bandolier | Ovid |
Structured abstracts: EBM Online, ACP Journal Club | |
Systematic reviews: Cochrane Library | |
Databases: Trip Database, Essential Evidence Plus | |
Unfiltered or primary evidence includes original research articles published in peer-reviewed journals. A variety of databases are available through which a search for primary literature may be conducted. Table 11.2 contains a list of examples of databases of primary literature. Individual practitioners will need to determine which databases are available through their employing institutions. The rest of this chapter focuses on accessing and assessing primary literature.
A systematic approach to searching medical databases is critical to uncovering the evidence. Table 11.3 provides a format for tracking progress through a systematic literature search. The search begins with identifying an available database and then choosing search terms. Start by entering the key words of the PICO question. For example, in the prognosis question, “For patients with stage IV colon cancer, is chemotherapy plus radiation more effective than chemotherapy alone at prolonging survival?” a search of the PubMed database may begin with the search terms “stage IV colon cancer” and “survival.” Subsequent searches will include these first two terms and add “chemotherapy” and “radiation.” For the opening search, record the number of articles identified in the search table as demonstrated in Table 11.3. If after the second search the number of articles identified remains unwieldy, limiters may be added to the search. Limiters may include acceptable dates of publication, desired publication language, human participants, or the study design. Table 11.3 contains an example of the recording of a step-by-step search for the colon cancer prognosis question.
Database | Search Terms | Limiters | Articles |
---|---|---|---|
PubMed | “stage IV colon cancer” and “survival” | None | 577 |
PubMed | “stage IV colon cancer” and “survival” | 2010–2015; English language | 234 |
PubMed | “stage IV colon cancer” and “survival” and “chemotherapy” | 2010–2015; English language | 92 |
PubMed | “stage IV colon cancer” and “survival” and “chemotherapy” and “radiation” | 2010–2015; English language | 6 |
Continue to narrow the search, step by step, until a manageable number of relevant articles is obtained. At that point, the titles and abstracts can be reviewed, allowing the practitioner to eliminate articles that are clearly irrelevant to the clinical question. Full-text articles are then collected for review and appraisal. If the article is not available to you in full text, consult a medical librarian for interlibrary loan options. In this way, it will not be necessary to limit a search to “full-text” articles only and potentially miss some important evidence. Additional primary evidence may also be uncovered through a hand search of reference lists at the end of some of your key articles. More detailed tutorials on searching databases are available on the website for individual databases or through consultation with a medical librarian.
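The step-by-step record keeping described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a search tool: `SearchStep` and `SearchLog` are hypothetical names, the article counts are typed in by hand from Table 11.3 rather than retrieved from any database, and the search phrases are joined with an uppercase `AND`, the form PubMed expects for Boolean operators.

```python
from dataclasses import dataclass, field


@dataclass
class SearchStep:
    """One row of a search-tracking table (illustrative only)."""
    database: str
    terms: list[str]
    limiters: list[str]
    articles: int  # count reported by the database, entered by hand

    def query(self) -> str:
        # Quote each phrase and join the phrases with the Boolean operator AND.
        return " AND ".join(f'"{term}"' for term in self.terms)

    def row(self) -> str:
        limiters = "; ".join(self.limiters) if self.limiters else "None"
        return f"{self.database} | {self.query()} | {limiters} | {self.articles}"


@dataclass
class SearchLog:
    steps: list[SearchStep] = field(default_factory=list)

    def record(self, database, terms, limiters, articles) -> SearchStep:
        # Copy the lists so later changes to the caller's lists do not alter the log.
        step = SearchStep(database, list(terms), list(limiters), articles)
        self.steps.append(step)
        return step


log = SearchLog()
base_terms = ["stage IV colon cancer", "survival"]
limits = ["2010-2015", "English language"]

# Counts below are copied from Table 11.3.
log.record("PubMed", base_terms, [], 577)
log.record("PubMed", base_terms, limits, 234)
log.record("PubMed", base_terms + ["chemotherapy"], limits, 92)
log.record("PubMed", base_terms + ["chemotherapy", "radiation"], limits, 6)

for step in log.steps:
    print(step.row())
```

Keeping each step as its own record preserves the full path of the search, so a colleague or librarian can repeat it or back up one step if the final result set proves too narrow.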
Evidence Essentials
Research Study Design
After the primary literature has been searched and sources of evidence identified, it is important to assess each article for usefulness and validity. Ultimately, the evidence-based practitioner aims to uncover the most valid evidence available to inform clinical decision making. An important feature of research studies that affects their validity is the study design. Generally, research study designs that address the types of clinical questions in Table 11.1 can be divided into two categories, experimental and observational. Experimental studies are those in which the investigator assigns (preferably randomly) study participants into their respective groups. The classic example of an experimental design is the RCT. RCTs are frequently used to assess the efficacy of new treatments or interventions. In an RCT, study participants are randomly assigned to either the new treatment or one or more comparison groups and then followed over time for the development of the outcome of interest. The rate of occurrence of the outcome is compared in the two groups to determine which treatment is more effective.
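The comparison of outcome rates between the groups of an RCT is simple arithmetic. The following Python sketch uses entirely hypothetical counts (they are not taken from any study), and the function name `event_rate` is an illustrative choice.

```python
def event_rate(events: int, participants: int) -> float:
    """Proportion of participants in a group who developed the outcome."""
    if participants <= 0:
        raise ValueError("participants must be a positive number")
    if not 0 <= events <= participants:
        raise ValueError("events must be between 0 and participants")
    return events / participants


# Hypothetical trial: 100 participants randomly assigned to each group.
new_treatment_rate = event_rate(events=12, participants=100)
comparison_rate = event_rate(events=20, participants=100)

# A positive difference means the outcome was less frequent with the new treatment.
rate_difference = comparison_rate - new_treatment_rate

print(f"New treatment: {new_treatment_rate:.1%}")
print(f"Comparison:    {comparison_rate:.1%}")
print(f"Difference:    {rate_difference:.1%}")
```

Whether a difference of this size reflects a real treatment effect or chance variation is a separate question that requires statistical testing.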
Observational study designs are those in which the investigator observes existing groups of patients. The three most common observational designs are cohort, case-control, and cross-sectional. In a cohort study, a group of people with a common characteristic (cohort) is assembled, and the participants are divided into two or more groups based on their level of exposure to the independent variable of interest. These groups are then followed over time to see who develops the outcome of interest. Cohort studies can be used to address any of the question types in Table 11.1. The independent variable could represent a therapeutic option, in which case one of the study groups would have undergone the therapy of interest, and the other would have experienced an alternative treatment or perhaps none at all. Alternatively, in an etiology question, the groups within the cohort are categorized as exposed or unexposed to some risk factor. Similar to the RCT, the participants in a cohort study are then followed forward in time to determine the rate of development of the outcome of interest. The outcome may be cure or improvement of symptoms in a treatment study, development of disease in an etiology or harm study, or survival or mortality in a prognosis study.
The case-control study design is very different from the cohort in that the groups of participants are defined by disease state (the outcome) rather than by exposure. Case-control studies are particularly useful in the examination of rare diseases for possible risk factors. For this type of question, a group of people with a disease (cases) is identified and then matched to a group of control patients. Ideally, the control participants will be like the cases in every respect except that they do not have the disease of interest. Then the cases and control participants are queried for their level of exposure to a possible risk factor. For example, to explore the possible association of maternal exposure to secondhand smoke with the development of congenital anomalies, investigators would assemble a group of women who have given birth to babies with congenital anomalies and a group of women who have delivered healthy babies and query both groups of mothers about their exposure to secondhand smoke during their pregnancies.
Cross-sectional studies are the third common type of observational design. A cross-sectional study is sometimes referred to as a “snapshot” or “slice in time” because the exposure and outcome variables are measured at the same point in time for the study participants. Cross-sectional studies can be used to assess disease prevalence but not incidence. The cross-sectional study can be conducted more quickly and cheaply than other study types, but it is often difficult to ascertain the temporal relation between the exposure and outcome because they are measured at the same time. Causality can never be established by a cross-sectional study.
Two additional study designs that are important for evidence-based practitioners to understand are the systematic review article and meta-analysis. These both represent filtered evidence in that the authors have searched out the original research and synthesized the information to address a clinical question. In a systematic review article, the investigators perform a systematic search of all of the primary literature on a topic, locate these articles, critically review the articles, and develop a response to their clinical question based on the evidence. In a meta-analysis, this process is taken one step further. The investigators not only seek out primary research, but they also seek to gather the original data from the investigators and determine whether it is legitimate to pool those data, repeat the statistical analysis, and come to a new conclusion based on the larger sample size. The strengths and limitations to these approaches are addressed later in this chapter. However, perhaps their greatest strengths are the increased sample size and broader perspective on a clinical question.
Evidence Pyramid
Proponents of EBM have developed an evidence pyramid to help users understand the relative rigor of the various study designs. Of the epidemiologic study designs discussed, the systematic review article and meta-analysis provide the greatest rigor in terms of evidence because of their increased sample size and more representative populations. Among the individual study designs, the RCT is the most rigorous, followed by the cohort study, the case-control study, and the cross-sectional study (Fig. 11.1).
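The ordering described by the pyramid can be used to triage a set of search results so that the most rigorous designs are read first. The Python sketch below is only an illustration: the list `EVIDENCE_PYRAMID` encodes the ordering given in this section, the function names are hypothetical, and the article titles are placeholders rather than real publications.

```python
# Study designs in descending order of rigor, as described in this section.
EVIDENCE_PYRAMID = [
    "systematic review or meta-analysis",
    "randomized controlled trial",
    "cohort study",
    "case-control study",
    "cross-sectional study",
]


def rigor_rank(design: str) -> int:
    """Position in the pyramid; 0 is the most rigorous design."""
    return EVIDENCE_PYRAMID.index(design)


def sort_by_rigor(articles):
    """Order (title, design) pairs from most to least rigorous design."""
    return sorted(articles, key=lambda article: rigor_rank(article[1]))


# Placeholder search results, not real articles.
articles = [
    ("Hypothetical article A", "case-control study"),
    ("Hypothetical article B", "randomized controlled trial"),
    ("Hypothetical article C", "cross-sectional study"),
    ("Hypothetical article D", "systematic review or meta-analysis"),
]

ranked = sort_by_rigor(articles)
for title, design in ranked:
    print(f"{title}: {design}")
```

Design is only a starting point for appraisal; a poorly conducted RCT is not automatically better evidence than a well-conducted cohort study.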
Important Concepts in Outcome Measurement
Although evidence-based practitioners need not be trained statisticians to be effective in critical appraisal of the literature, a working knowledge of basic statistical principles empowers them to evaluate the evidence with greater confidence. A few important statistical concepts concern types of data, types of variables, and levels of measurement. Generally, data can be characterized as qualitative or quantitative. Whereas qualitative data are often represented by words, quantitative data involve numerical expressions. Variables are defined as either independent or dependent. Independent variables are set by the researcher. These often include an intervention in an RCT or an exposure in an observational study. The dependent variable represents the outcome of interest. In the sample PICO questions in Table 11.1, the dependent variables included early-onset kidney disease, type II diabetes, renal failure, and survival.
In addition to being characterized as independent or dependent, variables represent different levels of measurement. A nominal level of measurement involves only one property, and that is classification. When variables are measured at the nominal level, their values are classified into categories. Examples include eye color, vital status (alive or dead), and so on. At the ordinal level of measurement, the additional property of order is present. Variables measured at the ordinal level are classified into categories that have an inherent order. For example, cancer is recorded as stages I to IV. Interval and ratio levels of measurement share the property of equal intervals between values; the ratio level additionally has a true zero. Variables measured at an interval or ratio level are often further divided into continuous or discrete. Continuous variables represent “amounts,” and discrete variables represent “counts.” Continuous variables are measured with units; discrete variables have no units. Examples of continuous data include height, weight, and systolic blood pressure. Examples of discrete data include number of pregnancies, number of hospitalizations, and number of surgeries. The level of measurement for the variables included in a research study dictates the type of statistical analysis that is indicated. Consumers of the medical literature are better positioned to confidently appraise research articles when they have a basic understanding of the connection between levels of measurement and statistical analysis. For example, in a study of a new weight loss drug, the outcome of interest may be average weight loss in the group that took the new drug compared with the group that took the standard of care treatment. The dependent variable, weight loss, is a continuous variable, and the outcome would be expressed as the mean number of pounds. The independent variable is the type of treatment, a nominal variable that splits the participants into two groups. The appropriate statistical test is an independent t-test, which evaluates the difference between means in two groups. A more detailed explanation of the appropriate statistical test for various levels of measurement may be found elsewhere.
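To make the weight loss example concrete, the following Python sketch computes the test statistic for an independent t-test by hand using only the standard library. Several assumptions apply: the weight loss values are invented for illustration, the function name `independent_t` is hypothetical, and the calculation uses the classic pooled-variance (Student) form, which assumes the two groups have similar variances. Converting the statistic to a P value requires a t distribution table or a statistical package and is not shown.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")

    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # statistics.variance returns the sample variance (n - 1 in the denominator).
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))

    return (mean_a - mean_b) / standard_error, df


# Hypothetical weight loss in pounds for five participants per group.
new_drug = [12, 15, 9, 14, 10]
standard_care = [8, 6, 10, 7, 9]

t_statistic, degrees_of_freedom = independent_t(new_drug, standard_care)
print(f"Mean weight loss, new drug: {statistics.mean(new_drug)} lb")
print(f"Mean weight loss, standard care: {statistics.mean(standard_care)} lb")
print(f"t = {t_statistic:.2f} with {degrees_of_freedom} degrees of freedom")
```

The larger the t statistic relative to the variability within the groups, the less plausible it is that the difference between the two means arose by chance.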
Evidence: Translating the Greek
Most readers of the medical literature are generally familiar with the concepts of P value and confidence interval (CI). However, type I (α) and type II (β) errors are an important underpinning to the interpretation of study results and are often less well understood. Clinical trials and observational studies are usually founded on some type of research hypothesis. The research hypothesis may be that New Drug A is more effective than Standard of Care B at preventing a particular outcome. In conducting the study, however, a more specific, testable hypothesis is required. As a result, the researcher tests a null hypothesis (H0). In this case, the null hypothesis is that there is no difference between New Drug A and Standard of Care B. Before beginning the study, the investigators must decide how great a chance they are willing to take of making a type I error. A type I error occurs when the researchers reject a true null hypothesis, concluding that a difference exists when it does not. By comparison, a type II error occurs when the researchers fail to reject a false null hypothesis, missing a difference that does exist. The degree of risk they are willing to take of making a type I error is generally referred to as the “level of significance” or “α.” By convention, α is set at 0.05. This means that the investigators are willing to accept a 5% probability of rejecting the null hypothesis when the observed difference arose by chance alone. Fig. 11.2 depicts the possible outcomes of a research study.
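The four possible outcomes of a study can be laid out as a simple decision table. The Python sketch below is an illustration only; the function name `classify_conclusion` is hypothetical. It follows the standard definitions: a type I error is the rejection of a null hypothesis that is actually true, and a type II error is the failure to reject a null hypothesis that is actually false.

```python
def classify_conclusion(null_is_true: bool, null_rejected: bool) -> str:
    """Label a study's conclusion given the (usually unknown) truth about the null hypothesis."""
    if null_is_true and null_rejected:
        # A difference was declared even though none exists.
        return "type I error"
    if not null_is_true and not null_rejected:
        # A real difference was missed.
        return "type II error"
    return "correct conclusion"


# Walk through all four combinations of truth and study decision.
for null_is_true in (True, False):
    for null_rejected in (True, False):
        truth = "no real difference" if null_is_true else "real difference"
        decision = "reject H0" if null_rejected else "fail to reject H0"
        print(f"{truth:20} | {decision:18} | {classify_conclusion(null_is_true, null_rejected)}")
```

In practice, the truth about the null hypothesis is never known with certainty, which is why investigators set an acceptable probability for each type of error before the study begins.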