Chapter 34. Descriptive Statistics
Ian Atkinson
▪ Introduction
▪ The nature of numerical data
▪ Presentation of numerical data – tables
▪ Presentation of numerical data – charts
▪ Measures of central tendency and dispersion
▪ Conclusion
Introduction
Quantitative research often involves the collection of very large amounts of numerical data. To make sense of numerical data we need to apply methods to summarise numbers into a format which is easy to assimilate. This is known as descriptive statistics. Once data are described, a further range of inferential statistics remains to be applied before a full understanding of the data can be achieved (Chapter 35).
The nature of numerical data
Information obtained as part of quantitative research is stored and analysed in numerical format. Sometimes information is directly recorded as numbers, for example age, blood pressure and body weight. Information which is not numerical can be converted to numbers by using systems of data coding. For example, a person’s gender could be recorded as ‘1’ for a man and ‘2’ for a woman. Here the actual numbers used are not important so long as we use a different number for each category. Where the categories have a natural order, however, the codes should preserve it. Suppose a surgical patient’s experience of pain is assessed using the terms ‘no pain’, ‘mild pain’, ‘severe pain’ and ‘very severe pain’. These categories have an obvious logical order in terms of pain severity, and coding should involve ascribing the lowest number to the lowest level of pain and the highest number to the most severe pain. Consequently the codes would be as follows: 1 = no pain; 2 = mild pain; 3 = severe pain; and 4 = very severe pain.
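The coding schemes just described can be sketched in a few lines of Python. This is only an illustration: the names `GENDER_CODES`, `PAIN_CODES` and `encode`, and the four sample responses, are hypothetical and do not come from any study data.

```python
# Hypothetical code books (names are illustrative, not from the chapter).
# Nominal variable: any distinct numbers would serve equally well.
GENDER_CODES = {"man": 1, "woman": 2}

# Ordinal variable: codes rise with pain severity, so order is preserved.
PAIN_CODES = {
    "no pain": 1,
    "mild pain": 2,
    "severe pain": 3,
    "very severe pain": 4,
}


def encode(responses, code_book):
    """Replace each recorded category with its numerical code."""
    return [code_book[response] for response in responses]


# Made-up responses, used only to show the conversion.
pain_responses = ["mild pain", "no pain", "very severe pain", "severe pain"]
pain_coded = encode(pain_responses, PAIN_CODES)
print(pain_coded)
```

Because the pain codes follow the order of severity, sorting the coded values also sorts the responses from least to most severe. The same would not be meaningful for the gender codes.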
Already it can be seen that numbers are used in rather different ways. Those used to record age or weight can quite properly be added, subtracted, multiplied and divided. On the other hand, in the case of gender, it would be completely meaningless to add the codes together. The above measurement of pain allows us to place values in an orderly fashion, but if we attempted to subtract the code for severe pain from the code for mild pain then our answer would have no meaning. The different ways in which numbers have been used in these examples can be referred to as different levels of measurement.
Levels of measurement
There are four different levels of measurement. In order of increasing precision, these are the nominal, ordinal, interval and ratio levels. Each level of measurement can be defined in terms of the properties shown in the first column of Table 34.1. The different levels of measurement appear in the first row. The nominal level is the lowest level of measurement and is more a system for classification, its single property being that we can distinguish different categories. For example, information about gender falls into two different categories, i.e. men and women. These differences cannot be measured numerically nor can the categories be ranked.
Property | Nominal | Ordinal | Interval | Ratio |
---|---|---|---|---|
Different categories | Yes | Yes | Yes | Yes |
Categories can be ranked | – | Yes | Yes | Yes |
Equal distances between categories | – | – | Yes | Yes |
Fixed zero | – | – | – | Yes |
At the ordinal level of measurement we distinguish different categories and also place them in ascending order. An example of this type of measurement can be seen in grades of medical staff, i.e. junior house officer, senior house officer, registrar and consultant. We know there are differences between the grades and they can be meaningfully ranked in terms of seniority. However, the differences between the grades cannot be quantified.
The interval level of measurement has one additional property to the ordinal level, i.e. there are equal differences between the categories. This means that we are able to subtract one category from another to give a result which has a meaning. A very commonly used example of this is the measurement of temperature in degrees Celsius. Here there is an equal distance of one degree between every point on the scale. What this scale lacks is a fixed zero point, which means that, for example, 10°C is not twice as hot as 5°C. In this case zero has simply been set at the point where water freezes and not where there is a complete lack of heat. Examples of interval levels of measurement can be seen in visual analogue and interval scales for the measurement of attitudes (Oppenheim 1992).
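The point about the missing fixed zero can be checked with a little arithmetic. The sketch below converts the two temperatures to the Kelvin scale, which is not discussed in this chapter and is brought in here only as an illustration because its zero does represent a complete absence of heat. The function name `celsius_to_kelvin` is an assumption made for this example.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvin (zero kelvin is absolute zero)."""
    return celsius + 273.15


# Differences are meaningful on an interval scale: both scales agree.
diff_celsius = 10 - 5
diff_kelvin = celsius_to_kelvin(10) - celsius_to_kelvin(5)

# Ratios are not: the apparent "double" disappears once zero is fixed.
ratio_celsius = 10 / 5
ratio_kelvin = celsius_to_kelvin(10) / celsius_to_kelvin(5)

print(f"Difference: {diff_celsius} (Celsius), {diff_kelvin:.2f} (kelvin)")
print(f"Ratio: {ratio_celsius:.2f} (Celsius), {ratio_kelvin:.3f} (kelvin)")
```

The difference of five degrees is the same on both scales, whereas the ratio of 2 obtained from the Celsius values shrinks to roughly 1.02 when the temperatures are measured from a true zero.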
The highest or most precise measurement is at a ratio level. This has all the properties associated with the interval level only this time zero is fixed. Examples of characteristics which can be measured at this level include weight, length and capacity. These scores can be added, subtracted, multiplied and divided.
Another important characteristic of a variable is whether it is ‘discrete’ or ‘continuous’ in nature. A discrete variable can only be expressed as a whole number, for example the number of nurses or the number of beds on a ward. On the other hand, interval and ratio levels of measurement can be expressed as a fraction and in such cases data are referred to as continuous. For example, patients are weighed in kilograms expressed with one or two decimal places following the whole number, e.g. 78.63 kilos. Where measurements are continuous, the last decimal place has to be rounded off.
Numerical data
Once data from a quantitative study are obtained, descriptive methods are applied as a first stage of interpreting their meaning and obtaining answers to the research questions. Imagine we have conducted a survey of 80 people recently discharged from hospital. These people are our study ‘sample’ and have been selected from the study ‘population’ of all people discharged from the hospital. The methods of sample selection are very important but these cannot be fully discussed in this chapter and readers are advised to consult alternative texts, for example Bland (2000).
From our sample of 80 we have collected information that, among other things, includes their gender, length of inpatient stay (in days), and route of admission (i.e. waiting list, emergency, transfer, outpatient department (OPD) referral and GP referral). These three recordings from each patient are referred to as variables. Examination of these variables shows that gender is recorded at a nominal level, length of stay is recorded at a ratio level and route of admission is also at a nominal level. Length of stay is recorded in actual number of days but gender is coded as 1 = man and 2 = woman. The route of admission is also coded into a number format where: 1 = waiting list, 2 = emergency, 3 = transfer, 4 = OPD referral, and 5 = GP referral.
Table 34.2 shows the data that were obtained on these variables from 80 discharged patients. Four columns hold each patient’s data, and this set of four columns is repeated four times across the table. The columns headed ‘Case No.’ contain unique case identity numbers, which should be linked to every person’s information for ease of data management. The columns labelled ‘Sex’ contain information on gender. The columns headed ‘LoS’ (length of stay) contain the number of days the person was in hospital. The columns labelled ‘Route’ contain a code for the route by which each patient was admitted. Now we can see, for example, that case number 22 was a man who was in hospital for 20 days after having been admitted as an emergency. The information held in Table 34.2 may be very useful but in this format visual examination tells us very little.
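Reading a coded record back into words is simply the coding process in reverse. As a minimal sketch, the Python below decodes case number 22 using the codes given in this section. The names `SEX_LABELS`, `ROUTE_LABELS` and `describe_case` are hypothetical and chosen only for this illustration.

```python
# Code books as described in the text for the 'Sex' and 'Route' variables.
SEX_LABELS = {1: "man", 2: "woman"}
ROUTE_LABELS = {
    1: "waiting list",
    2: "emergency",
    3: "transfer",
    4: "OPD referral",
    5: "GP referral",
}


def describe_case(case_no, sex, los, route):
    """Turn one coded record into a readable description."""
    return (
        f"Case {case_no}: {SEX_LABELS[sex]}, {los} days in hospital, "
        f"route of admission: {ROUTE_LABELS[route]}"
    )


# Case 22 as recorded in Table 34.2: Sex = 1, LoS = 20, Route = 2.
case_22 = describe_case(22, sex=1, los=20, route=2)
print(case_22)
```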
Case No. | Sex | LoS | Route | Case No. | Sex | LoS | Route | Case No. | Sex | LoS | Route | Case No. | Sex | LoS | Route |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 3 | 1 | 21 | 2 | 6 | 2 | 41 | 2 | 3 | 1 | 61 | 1 | 1 | 1 |
2 | 1 | 10 | 1 | 22 | 1 | 20 | 2 | 42 | 2 | 4 | 5 | 62 | 2 | 4 | 4 |
3 | 2 | 5 | 1 | 23 | 2 | 11 | 1 | 43 | 1 | 4 | 4 | 63 | 2 | 4 | 2 |
4 | 2 | 6 | 2 | 24 | 1 | 6 | 1 | 44 | 1 | 6 | 1 | 64 | 1 | 19 | 2 |
5 | 2 | 5 | 1 | 25 | 1 | 5 | 4 | 45 | 1 | 1 | 1 | 65 | 2 | 2 | 1 |
6 | 1 | 4 | 1 | 26 | 2 | 3 | 1 | 46 | 2 | 4 | 1 | 66 | 2 | 5 | 1 |
7 | 1 | 5 | 2 | 27 | 2 | 28 | 1 | 47 | 2 | 4 | 1 | 67 | 2 | 3 | 1 |
8 | 1 | 13 | 2 | 28 | 1 | 3 | 2 | 48 | 2 | 4 | 1 | 68 | 2 | 5 | 2 |
9 | 2 | 6 | 1 | 29 | 1 | 3 | 5 | 49 | 1 | 1 | 5 | 69 | 1 | 4 | 2 |
10 | 2 | 3 | 1 | 30 | 1 | 5 | 3 | 50 | 1 | 5 | 1 | 70 | 1 | 4 | 5 |
11 | 2 | 6 | 1 | 31 | 2 | 19 | 1 | 51 | 2 | 3 | 2 | 71 | 2 | 31 | 3 |
12 | 1 | 5 | 3 | 32 | 2 | 12 | 2 | 52 | 1 | 5 | 1 | 72 | 2 | 7 | 1 |
13 | 2 | 5 | 2 | 33 | 2 | 6 | 3 | 53 | 2 | 6 | 1 | 73 | 2 | 20 | 5 |
14 | 1 | 4 | 1 | 34 | 2 | 4 | 5 | 54 | 1 | 3 | 5 | 74 | 2 | 5 | 2 |
15 | 2 | 16 | 4 | 35 | 2 | 14 | 2 | 55 | 2 | 10 | 5 | 75 | 1 | 2 | 2 |
16 | 2 | 11 | 1 | 36 | 1 | 4 | 2 | 56 | 2 | 3 | 1 | 76 | 1 | 4 | 1 |
17 | 1 | 3 | 2 | 37 | 2 | 6 | 1 | 57 | 1 | 3 | 5 | 77 | 2 | 7 | 1 |
18 | 1 | 9 | 2 | 38 | 1 | 4 | 2 | 58 | 1 | 2 | 2 | 78 | 2 | 4 | 3 |
19 | 2 | 4 | 4 | 39 | 2 | 4 | 5 | 59 | 2 | 12 | 2 | 79 | 1 | 4 | 1 |
20 | 1 | 3 | 1 | 40 | 2 | 13 | 3 | 60 | 1 | 4 | 1 | 80 | 2 | 5 | 2 |
To gain further insight, the values must be ordered and grouped according to how frequently they occur. In this chapter, the modes of data presentation are only described rather than detailing the different procedures which must be gone through to construct them. These procedures are most often performed by specialist computer software, for example the ‘Statistical Package for the Social Sciences’ (SPSS). Those readers who wish to achieve a fuller understanding of the methods involved are referred to Watson et al (2006).
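To give a feel for what ordering and grouping involves, the following sketch counts how often each route of admission occurs among the 80 cases. It uses Python rather than SPSS purely for illustration. The ‘Route’ codes are transcribed from Table 34.2 in case order, and the names `ROUTE_LABELS`, `routes` and `frequency_table` are assumptions made for this example.

```python
from collections import Counter

ROUTE_LABELS = {
    1: "waiting list",
    2: "emergency",
    3: "transfer",
    4: "OPD referral",
    5: "GP referral",
}

# 'Route' codes for cases 1 to 80, transcribed from Table 34.2.
routes = [
    1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 3, 2, 1, 4, 1, 2, 2, 4, 1,  # cases 1-20
    2, 2, 1, 1, 4, 1, 1, 2, 5, 3, 1, 2, 3, 5, 2, 2, 1, 2, 5, 3,  # cases 21-40
    1, 5, 4, 1, 1, 1, 1, 1, 5, 1, 2, 1, 1, 5, 5, 1, 5, 2, 2, 1,  # cases 41-60
    1, 4, 2, 2, 1, 1, 1, 2, 2, 5, 3, 1, 5, 2, 2, 1, 1, 3, 1, 2,  # cases 61-80
]

n = len(routes)
counts = Counter(routes)  # number of cases recorded under each code

# One row per route: label, frequency and percentage of all cases.
frequency_table = [
    (ROUTE_LABELS[code], counts[code], 100 * counts[code] / n)
    for code in sorted(ROUTE_LABELS)
]

for label, count, percent in frequency_table:
    print(f"{label:<14}{count:>4}{percent:>8.1f}%")
```

With the values transcribed above, admissions from the waiting list account for 36 of the 80 cases (45%), a pattern that is very hard to see by scanning the raw table.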
Presentation of numerical data – tables
The most common way of presenting data in research papers is in the form of tables. The method allows data from a number of different variables to be presented in an easily understood format. Tables can also illustrate associations between the variables presented. Many formats can be used but certain features should always be present.