Hypothesis testing: selection and use of statistical tests

20


Hypothesis testing


selection and use of statistical tests




Introduction


Hypotheses are statements about the association between variables as pertaining to a specific person or population. For example, ‘penicillin is an effective treatment for pneumonia’ or ‘obesity is a risk factor for heart disease’. Hypotheses addressing the state of populations are tested using sample data. Inferences are conclusions based on data using samples and are therefore always open to the possibility of error. In this chapter we will examine the use of inferential statistics for establishing the probable truth of hypotheses, as tested through sample data. Inferential statistics are based on applied probability theory and entail the use of statistical tests.


There are numerous statistical tests available that are used in a similar fashion to analyse clinical data. That is, all statistical tests involve setting up the relevant hypotheses, H0 and HA, and then, on the basis of the appropriate inferential statistics, computing the probability of the sample statistics obtained occurring by chance alone. We are not going to attempt to examine all statistical tests in this introductory book. These are described in various statistics textbooks or in data analysis manuals. Rather, in this chapter we will examine the criteria used for selecting tests appropriate for the analysis of the data obtained in specific investigations. To illustrate the use of statistical tests we will examine the use of the chi-square test (χ2). This is a statistical test commonly employed to analyse categorical data. Finally, we will briefly examine the uses of the Statistical Package for Social Sciences™ (SPSS) for data analysis in general.


The aims of this chapter are to:




The logic of hypothesis testing


Hypothesis testing is the process of deciding using statistics whether the findings of an investigation reflect chance or real effects at a given level of probability or certainty. If the results seem to not represent chance effects, then we say that the results are statistically significant. That is, when we say that our results are statistically significant we mean that the patterns or differences seen in the sample data are likely to be generalizable to the wider population from our study sample.


The mathematical procedures for hypothesis testing are based on the application of probability theory and sampling, as discussed previously. Because of the probabilistic nature of the process, decision errors in hypothesis testing cannot be entirely eliminated. However, the procedures outlined in this chapter enable us to specify the probability level at which we can claim that the data obtained in an investigation support experimental hypotheses. This procedure is fundamental for determining the statistical significance of the data as well as being relevant to the logic of clinical decision making.



Steps in hypothesis testing


The following steps are conventionally followed in hypothesis testing:



1. State the alternative hypothesis (HA), which is the based on the research hypothesis. The HA asserts that the results are ‘real’ or ‘significant’, i.e. that the independent variable influenced the dependent variable, or that there is a real difference among groups. The important point here is that HA is a statement concerning the population. A real or significant effect means that the results in the sample data can be generalized to the population.


2. State the null hypothesis (H0), which is the logical opposite of the HA. The H0 claims that any differences in the data were just due to chance: that the independent variable had no effect on the dependent variable, or that any difference among groups is due to random effects. In other words, if the H0 is retained, differences or patterns seen in the sample data should not be generalized to the population.


3. Set the decision level, α (alpha). There are two mutually exclusive hypotheses (HA and H0) competing to explain the results of an investigation. Hypothesis testing, or statistical decision making, involves establishing the probability of H0 being true. If this probability is very small, we are in a position to reject the H0. You might ask ‘How small should the probability (α) be for rejecting H0?’ By convention, we use the probability of α = 0.05. If the H0 being true is less than 0.05, we can reject H0. We can choose an α of 0.05, but not more, That is, by convention among researchers, results are not characterized as significant if p > 0.05.


4. Calculate the probability of H0 being true. That is, we assume that H0 is true and calculate the probability of the outcome of the investigation being due to chance alone, i.e. due to random effects. We must use an appropriate sampling distribution for this calculation.


5. Make a decision concerning H0. The following decision rule is used. If the probability of H0 being true is less than α, then we reject H0 at the level of significance set by α. However, if the probability of H0 is greater than α, then we must retain H0. In other words, if:



It follows that if we reject H0 we are in a position to accept HA, its logical alternative. If p ≤ 0.05 then we reject H0, and decide that HA is probably true.



Illustrations of hypothesis testing


One of the simplest forms of gambling is betting on the fall of a coin. Let us play a little game. We, the authors, will toss a coin. If it comes out heads (H) you will give us ≤1; if it comes out tails (T) we will give you ≤1. To make things interesting, let us have 10 tosses. The results are:



image


Oh dear, you seem to have lost. Never mind, we were just lucky, so send along your cheque for ≤10. Are you a little hesitant? Are you saying that we ‘fixed’ the game? There is a systematic procedure for demonstrating the probable truth of your allegations:



1. We can state two competing hypotheses concerning the outcome of the game:



2. It can be shown that the probability of tossing 10 consecutive heads with a fair coin is actually p = 0.001, as discussed previously (see Ch. 19). That is, the probability of obtaining such a sample from a population where P = Q = 0.5 is extremely low.


3. Now we can decide between H0 and HA. It was shown that the probability of H0 being true was p = 0.001 (1 in a 1000). Therefore, in the balance of probabilities, we can reject it as being true and accept HA, which is the logical alternative. In other words, it is likely that the game was fixed and no ≤10 cheque needed to be posted.


The probability of calculating the truth of H0 depended on the number of tosses (n = the sample size). For instance, the probability of obtaining heads every times with five coin tosses is shown in Table 19.4. As the sample size (n) becomes larger, the probability for which it is possible to reject H0 becomes smaller. With only a few tosses we really cannot be sure if the game is fixed or not: without sufficient information it becomes hard to reject H0 at a reasonable level of probability.


A question emerges: ‘What is a reasonable level of probability for rejecting H0?’ As we shall see, there are conventions for specifying these probabilities. One way to proceed, however, is to set the appropriate probability for rejecting H0 on the basis of the implications of erroneous decisions.


Obviously, any decision made on a probabilistic basis might be in error. Two types of decision errors are identified in statistics as type I and type II errors. A type I error involves mistakenly rejecting H0, while a type II error involves mistakenly retaining the H0. Researchers can make mistakes about the truth or falsity of hypotheses using sample research data. Statistical method does not provide a guarantee against making a mistake, but it is the most rigorous way of making these decisions.


In the above example, a type I error would involve deciding that the outcome was not due to chance when in fact it was. The practical outcome of this would be to falsely accuse the authors of fixing the game. A type II error would represent the decision that the outcome was due to chance, when in fact it was due to a ‘fix’. The practical outcome of this would be to send your hard-earned ≤10 to a couple of crooks. Clearly, in a situation like this, a type II error would be more odious than a type I error, and you would set a fairly high probability for rejecting H0. However, if you were gambling with a villain, who had a loaded revolver handy, you would tend to set a very low probability for rejecting H0. We will examine these ideas more formally in subsequent parts of this chapter.


Let us look at another example. A rehabilitation therapist has devised an exercise program which is expected to reduce the time taken for people to leave hospital following orthopaedic surgery. Previous records show that the recovery time for patients had been µ = 30 days, with σ = 8 days. A sample of 64 patients were treated with the exercise program, and their mean recovery time was found to be image = 24 days. Do these results show that patients who had the treatment recovered significantly faster than previous patients? We can apply the steps for hypothesis testing to make our decision.



1. State HA: ‘The exercise program reduces the time taken for patients to recover from orthopaedic surgery’. That is, the researcher claims that the independent variable (the treatment) has a ‘real’ or ‘generalizable’ effect on the dependent variable (time to recover).


2. State H0: ‘The exercise program does not reduce the time taken for patients to recover from orthopaedic surgery’. That is, the statement claims that the independent variable has no effect on the dependent variable. The statement implies that the treated sample with image = 24, and n = 64 is in fact a random sample from the population µ = 30, σ = 8. Any difference between image and µ can be attributed to sampling error.


3. The decision level, α, is set before the results are analysed. The probability of α depends on how certain the investigator wants to be that the results show real differences. If he set α = 0.01, then the probability of falsely rejecting a true H0 is less than or equal to 0.01 (1/100). If he set α = 0.05, then the probability of falsely rejecting a true H0 is less than or equal to 0.05 or (1/20). That is, the smaller the α, the more confident the researcher is that the results support the alternative hypothesis. We also call α the level of significance. The smaller the α, the more significant the findings for a study, if we can reject H0. In this case, say that the researcher sets α = 0.01. (Note: by convention, α should not be greater than 0.05.)


4. Calculate the probability of H0 being true. As stated above, H0 implies that the sample with image = 24 is a random sample from the population with µ = 30, σ = 8. How probable is it that this statement is true? To calculate this probability, we must generate an appropriate sampling distribution. As we have seen in Chapter 17, the sampling distribution of the mean will enable us to calculate the probability of obtaining a sample mean of image = 24 or more extreme from a population with known parameters. As shown in Figure 20.1, we can calculate the probability of drawing a sample mean of image = 24 or less. Using the table of normal curves (Appendix A), as outlined previously, we find that the probability of randomly selecting a sample mean of image = 24 (or less) is extremely small. In terms of our table, which only shows the exact probability of up to z = 4.00, we can see that the present probability is less than 0.00003. Therefore, the probability that H0 is true is less than 0.00003.



5. Make a decision. We have set α = 0.01. The calculated probability was less than 0.0001. Clearly, the calculated probability is far less than α, indicating that the difference is unlikely to be due to chance. Therefore, the investigator can reject the statement that H0 is true and accept HA, that patients in general treated with the exercise program recover earlier than the population of untreated patients.



The relationship between descriptive and inferential statistics


As we have seen in the previous chapters, statistics may be classified as descriptive or inferential. Descriptive statistics describe the characteristics of data and are concerned with issues such as ‘What is the average length of hospitalization of a group of patients?’ Inferential statistics are used to address issues such as whether the differences in average lengths of hospitalization of patients in two groups are significantly different statistically. Thus, descriptive statistics describe aspects of the data such as the frequencies of scores, and the average or the range of values for samples, whereas inferential statistics enables researchers to decide (infer) whether differences between groups or relationships between variables represent persistent and reproducible trends in the populations.


In Section 5 we saw that the selection of appropriate descriptive statistics depends on the type of data being described. For example, in a variable such as incomes of patients, the best statistics to represent the typical income would be the mean and/or the median. If you had a millionaire in the group of patients, the mean statistic might give a distorted impression of the central tendency. In this situation the median statistic would be the most appropriate one to use. The mode is most commonly used when the data being described are categorical data. For example, if in a questionnaire respondents were asked to indicate their sex and 65% said they were male and 35% said they were female, then ‘male’ is the modal response. It is quite unusual to use the mode only with data that are not nominal. As a rule, the scale of measurement used to obtain the data and its distribution determine which descriptive statistics are selected.


In the same way, the appropriate inferential statistics are determined by the characteristics of the data being analysed. For example, where the mean is the appropriate descriptive statistic, the inferential statistics will determine if the differences between the means are statistically significant. In the case of ordinal data, the appropriate inferential statistics will make it possible to decide if either the medians or the rank orders are significantly different. With nominal data, the appropriate inferential statistic will decide if proportions of cases falling into specific categories are significantly different.


Thus, when the data have been adequately described, the appropriate inferential statistic will follow logically. However, when selecting an appropriate statistical test, the design of the investigation must also be taken into account.

Stay updated, free articles. Join our Telegram channel

Apr 12, 2017 | Posted by in MEDICAL ASSISSTANT | Comments Off on Hypothesis testing: selection and use of statistical tests

Full access? Get Clinical Tree

Get Clinical Tree app for offline access