18
Correlation
Introduction
A fundamental aim of scientific and clinical research is to establish the nature of the relationships between two or more sets of observations or variables. Finding such relationships or associations can be an important step towards identifying causal relationships and predicting clinical outcomes. The topic of correlation is concerned with expressing quantitatively the size and direction of the relationship between variables. Correlations are essential statistics in the health sciences, used to determine quantitatively the validity and reliability of clinical measures (see Ch. 14) or to express how health problems are associated with crucial biological, behavioural or environmental factors (see Ch. 8). Having worked through this chapter, you will be able to explain how correlation coefficients are used and interpreted in health sciences research and practice.
The specific aims of this chapter are to:
1. Define the terms correlation and correlation coefficient.
2. Explain the selection and calculation of correlation coefficients.
3. Outline some of the uses of correlation coefficients.
4. Define and calculate the coefficient of determination.
5. Discuss the relationship between correlation and causality.
Correlation
Consider the following two statements:
1. There is a positive relationship between cigarette smoking and lung damage.
2. There is a negative relationship between being overweight and life expectancy.
You probably have a fair idea of what the above two statements mean. The first statement implies that there is evidence that if you score high on one variable (cigarette smoking) you are likely to score high on the other variable (lung damage). The second statement describes the finding that scoring high on the variable ‘overweight’ tends to be associated with lowered ‘life expectancy’. The information missing from each of the statements is the numerical value for the size of the association between the variables.
Assume that we are interested in whether student test scores for anatomy examinations are correlated with test scores for physiology. To keep the example simple, we will assume that there were only five (n = 5) students who sat for both examinations (see Table 18.1).
To provide a visual representation of the relationship between the two variables, we can plot the above data on a scattergram (also referred to as a scatterplot). A scattergram is a graph of the paired scores for each participant on the two variables. By convention, we call one of the variables x and the other one y. It is evident from Figure 18.1 that there is a positive relationship between the two variables. That is, students who have high scores for anatomy (variable x) tend to have high scores for physiology (variable y). Also, for this set of data, we can fit a straight line in close approximation to the points on the scattergram. This line is referred to as the line of ‘best fit’; the topic is discussed further in statistics under ‘linear regression’. In general, a variety of relationships is possible between two variables; the scattergrams in Figure 18.2 illustrate some of these. A code sketch of a scattergram with a fitted line follows the discussion of Figure 18.2 below.
Figure 18.2 Scattergrams showing relationships between two variables: (A) positive linear correlation; (B) negative linear correlation; (C) non-linear correlation.
Figure 18.2A and B represent a linear correlation between the variables x and y. That is, a straight line is the most appropriate representation of the relationship between x and y. Figure 18.2C represents a non-linear correlation, where a curve best represents the relationship between x and y.
Figure 18.2A represents a positive correlation, indicating that high scores on x are related to high scores on y. For example, the relationship between cigarette smoking and lung damage is a positive correlation. Figure 18.2B represents a negative correlation, where high scores on x are associated with low scores on y. For example, the correlation between the variables ‘being overweight’ and ‘life expectancy’ is negative, meaning that the more you are overweight, the lower your life expectancy.
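As a concrete illustration, the following is a minimal sketch of how a scattergram with a line of best fit could be produced. The five pairs of anatomy and physiology scores are hypothetical, since the values in Table 18.1 are not reproduced here.

```python
# Sketch: scattergram of paired scores with a fitted straight line.
# The score values below are illustrative only (not Table 18.1).
import numpy as np
import matplotlib.pyplot as plt

anatomy = np.array([55, 60, 70, 80, 85])      # hypothetical x scores
physiology = np.array([50, 62, 68, 78, 88])   # hypothetical y scores

# Fit a degree-1 polynomial (straight line) to the paired scores.
slope, intercept = np.polyfit(anatomy, physiology, 1)

plt.scatter(anatomy, physiology)                 # the paired scores
plt.plot(anatomy, slope * anatomy + intercept)   # line of 'best fit'
plt.xlabel('Anatomy score (x)')
plt.ylabel('Physiology score (y)')
plt.title('Scattergram of paired examination scores')
plt.show()
```

A positive linear relationship, as in Figure 18.2A, would appear as points rising from left to right, closely following the fitted line.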
Correlation coefficients
Selection of correlation coefficients
There are several types of correlation coefficients used in statistical analysis. Table 18.2 shows some of these correlation coefficients and the conditions under which they are used. As the table indicates, the scale of measurement used determines the selection of the appropriate correlation coefficient.
Table 18.2
Coefficient | Conditions where appropriate
φ (phi) | Both x and y measured on a nominal scale
ρ (rho) | Both x and y measured on, or transformed to, ordinal scales
r | Both x and y measured on an interval or ratio scale
All of the correlation coefficients shown in Table 18.2 are appropriate for quantifying linear relationships between variables. There are other correlation coefficients, such as η (eta), which are used for quantifying non-linear relationships. However, a discussion of the use and calculation of all the correlation coefficients is beyond the scope of this text. Rather, we will examine only the commonly used Pearson’s r and Spearman’s ρ (rho).
1. Correlation coefficients are calculated from pairs of measurements on variables x and y for the same group of individuals.
2. A positive correlation is denoted by + and a negative correlation by −.
3. The values of the correlation coefficient range from +1 to −1, where +1 means a perfect positive correlation, 0 means no correlation, and −1 a perfect negative correlation.
4. The square of the correlation coefficient represents the coefficient of determination.
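To show how these points translate into practice, the following minimal sketch computes Pearson’s r, Spearman’s ρ and the coefficient of determination (r²) for the same hypothetical paired scores used earlier; the functions are from scipy.stats, and the data are illustrative only.

```python
# Sketch: computing Pearson's r, Spearman's rho and r squared
# for hypothetical paired scores (not the actual Table 18.1 data).
from scipy import stats

anatomy = [55, 60, 70, 80, 85]      # hypothetical x scores
physiology = [50, 62, 68, 78, 88]   # hypothetical y scores

r, r_p = stats.pearsonr(anatomy, physiology)       # interval/ratio data
rho, rho_p = stats.spearmanr(anatomy, physiology)  # ordinal (ranked) data

print(f"Pearson's r = {r:.2f}")                    # ranges from -1 to +1
print(f"Spearman's rho = {rho:.2f}")
print(f"Coefficient of determination r^2 = {r ** 2:.2f}")
```

Both coefficients fall between −1 and +1, and squaring Pearson’s r gives the coefficient of determination discussed in the next section.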