The process of data analysis
Qualitative data analysis (QDA) is a complex, non-linear process but also systematic, orderly and structured. Not all qualitative forms of inquiry take the same approach to analysis, as can be seen in the chapters on specific approaches. Indeed, grounded theory (GT) and phenomenology in particular have very distinct ways of analysing data (and also a different approach to data collection). Data reduction or collapsing, description and/or interpretation, however, are common to many types of QDA although the approach to these procedures is flexible and creative. There is no rigid prescription as long as the eventual research account has its roots directly in the data generated by the participants. In this chapter we shall only attempt an overview of generic data analysis in qualitative research. For complete beginners an examination of relevant chapters in clearly written introductory texts might be useful, such as that of Hansen (2006).
Data analysis is an iterative activity. Iteration means that researchers move back and forth from collection to analysis and back again, refining the questions they ask of the data. Knowledge of this process enables researchers to plan and allocate their time appropriately. Health researchers often lack time at the end of their study to carry out the appropriate data analysis, because they do not foresee the complexity of the data and the length of time needed for analysing them. The iterative character of qualitative research also makes it more time-consuming.
Qualitative researchers usually collect and analyse the data simultaneously, unlike those involved in quantitative inquiry, who complete collection before starting analysis. Indeed, in GT, data collection and analysis interact (see Chapter 11), and in several other approaches researchers often carry out data collection and analysis in parallel and interactively (for instance in ethnography). Even when recording and transcribing initial data, researchers reflect upon them and so start the process of analysis at an early stage.
The process of analysis goes through certain stages common to many approaches:
- Transcribing interviews and sorting fieldnotes
- Organising, ordering and storing the data
- Listening to and reading or viewing the material collected repeatedly
All this means immersion in and engagement with the data.
Other stages depend on the approach taken by the qualitative researcher:
- Coding and categorising (this is particularly appropriate in interpretive methods)
- Building themes
- Describing a cultural group (in ethnography)
- Describing a phenomenon (this is appropriate in phenomenology)
These steps also involve storing ideas, interpretations and theoretical thoughts, which is carried out through memoing and writing fieldnotes (see Chapters 10 and 11 on fieldnotes and memos respectively).
Silverman (2006) discusses the status of interview data in particular, which must be taken into account before the process of data analysis can start. These data are rarely raw but have been processed through the mind of the interviewer and can only be seen in context. In observations too, fieldnotes do not always show how the environment might shape the interaction, in particular elements such as the presence or absence of certain people, the work climate and other factors.
Transcribing and sorting
Transcription of interviews is one of the initial steps in preparing the data for analysis. The fullest and richest data can be gained from transcribing verbatim. We advise that if possible, novice researchers transcribe their own tapes because this way they immerse themselves in the data and become sensitive to the issues of importance. Transcription takes a long time: one hour of interviewing takes between four and six hours to transcribe. For those who are not used to audio-typing, it can be much longer. Transcription is very frustrating and can take time that researchers often lack. A typist using a transcription machine could do it more quickly, but this would be expensive. On the other hand, it would give more time to the researcher to listen and analyse. The decision about this depends on the researcher. Any outsider who transcribes must, of course, be advised on the confidentiality relating to the data.
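The ratio given above (four to six hours of transcribing for each hour of interviewing) can be turned into a rough planning estimate. The following is only a minimal sketch: the function name and the example study of 12 interviews of about 1.5 hours each are hypothetical, and the ratios would be higher for those not used to audio-typing.

```python
def transcription_hours(interview_hours, low_ratio=4.0, high_ratio=6.0):
    """Return a (low, high) estimate of typing hours for the recorded hours.

    The default ratios follow the four-to-six-hours-per-hour guideline;
    adjust them upwards for inexperienced typists (an assumption, not a rule).
    """
    if interview_hours < 0:
        raise ValueError("interview_hours must be non-negative")
    return interview_hours * low_ratio, interview_hours * high_ratio


# Hypothetical study: 12 interviews of about 1.5 hours each.
recorded = 12 * 1.5
low, high = transcription_hours(recorded)
print(f"{recorded:.0f} recorded hours: plan for about {low:.0f} to {high:.0f} hours of transcription")
```

An estimate of this kind is useful mainly at the planning stage, when deciding whether to transcribe personally or to pay a typist.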
Initial interviews and fieldnotes should be fully transcribed so that the researcher becomes aware of the important issues in the data. Novice researchers should transcribe all interviews verbatim, while more experienced individuals can be more selective in their transcriptions and transcribe that which is linked to their developing theoretical ideas. It is always better that the interviews or fieldnotes are fully transcribed by the researchers themselves if they have the time. There is a danger that researchers who fail to record the interviews will overlook significant issues, which they would uncover on reflection when listening to the tape or considering the transcript. Pages are numbered, and the front sheet should contain date, location and time of interview as well as the code number or pseudonym for the informant and important biographical data (but no identifier). Many researchers number each line of the interview transcript so that they can retrieve the data quickly when revisiting the transcript. Transcripts are most useful when the text is set in a column that takes up half the sheet, while the other half is left for coding and comments.
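For researchers who prepare transcripts on a computer, the front sheet and line numbering described above can be produced automatically. This is a minimal sketch under stated assumptions: the function names, the field layout, the participant code P07 and the two-line sample exchange are all invented for illustration.

```python
def number_transcript(text, start=1):
    """Prefix every line of a transcript with a right-aligned line number."""
    lines = text.splitlines()
    if not lines:
        return ""
    # Width of the largest line number, so that the numbers line up.
    width = len(str(start + len(lines) - 1))
    return "\n".join(f"{n:>{width}}  {line}" for n, line in enumerate(lines, start))


def front_sheet(participant_code, date, time, location):
    """Build a front sheet that uses a code or pseudonym, never a real name."""
    return "\n".join([
        f"Participant: {participant_code}",
        f"Date: {date}",
        f"Time: {time}",
        f"Location: {location}",
    ])


# Hypothetical sample data, for illustration only.
sample = "I: How did you feel before the appointment?\nP: I was really worried."
print(front_sheet("P07", "2009-03-12", "14:30", "outpatient clinic"))
print(number_transcript(sample))
```

The numbered lines make it possible to cite a passage as, for example, transcript P07, line 2, when writing memos or retrieving quotations later.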
A minimum of three copies (usually more) should be made of the transcripts. In addition, a clean copy without comments should be locked away in a safe place in case other copies are lost or destroyed.
Occasionally researchers use formal transcription systems (some invent their own systems); the best known of these is Gail Jefferson’s, which uses symbols for non-verbal actions such as coughing, pausing and emphasising. These systems are more often applied to ‘naturally occurring data’ such as those from conversation or discourse analysis. However, for some approaches the type of transcribing Jefferson developed would be anathema; Langridge (2007) reminds researchers that phenomenology in particular does not need a micro-level of transcription, and we would suggest that it is inappropriate for ethnography and GT too. Silverman (2006) gives a list of simplified transcription symbols which could be helpful in conversation analysis and some forms of discourse analysis.
Of course, researchers transcribe in detail, and as accurately as possible, often more than they analyse as they choose sections from the data which answer their research questions. There is however the danger that they select according to their own assumptions about the importance of data rather than focusing on the participants’ words, hence careful reading and listening is advised.
Taking notes and writing analytic memos
Some researchers use the tape-recorder and also take notes during the interview so that participants’ facial expressions, gestures and interviewers’ reactions and comments can be recorded. Making notes might, however, disturb the participant. We would therefore suggest note-taking during the interview only when taping is not feasible or if interviewees do not wish to be tape-recorded. Notes can also be taken immediately after the interview.
When participants deny permission for recording or when it seems inappropriate – for instance in very sensitive situations – interviewers generally take notes throughout the interview, and these notes reflect the words of the participants as accurately as possible. As interviewers can only write down a fraction of the sentences, they select the most important words or phrases and summarise the rest, and this might distort meaning. Patton (2002) advises on conventions in the use of quotation marks while writing notes. Researchers use them only for full, direct quotations from informants. Patton suggests that researchers adopt a mechanism for differentiating between their own thoughts and informants’ words. When reading transcripts and writing memos, researchers should also collect a series of pithy quotes, which are representative of the thoughts of the participants and the phenomenon or phenomena under study.
Another method of recording is to take notes after the interview is finished. This should be done as soon as possible after the interview to capture the flavour, behaviour and words of the informants and the concomitant thoughts of the researcher. It should not be done in the presence of the participants.
The process of listening to the tapes will sensitise researchers to the data and uncover ambiguities or problems within them. At this time, any theoretical or other ideas that emerge should be written down in the field diary. The process of writing fieldnotes and memos is in itself an analytic process and not just data recording. It helps the researcher to reflect on the data and engage with them.
During the process of analysis researchers write analytical memos or notes containing ideas and thoughts about the data as well as the reasons for grouping them in a particular way. Sometimes researchers draw diagrams to demonstrate this, and these diagrams can be taken directly into the report when they discuss the methods and the decision trail. Researchers might develop concepts in the memos, ask analytic questions of the data, or elaborate ideas from the literature that link directly with the data. There are different ways of keeping memos: in field journals or diaries, or on a computer. This all helps ‘tacking’, that is, going back and forth between the data and theoretical ideas, between codes and themes. This is called ‘iteration’.
Some researchers do not code or categorise because they wish to perceive the essence of the phenomenon as a whole, a Gestalt. Breaking the data into codes may lose this holistic view of the phenomenon and fragment the ideas contained in the data. Memoing goes on throughout the research process but is of particular importance in assisting analysis. (Specific types of analysis are discussed in the chapters on the various approaches.)
Ordering and organising the data
Qualitative researchers generate large amounts of data consisting of narratives from interviews, fieldnotes and documents, as well as a variety of memos about the phenomenon under study (Bryman, 2008). Many use the literature linked to the research as data.
Through organisation and management, the researcher brings structure and order to the unwieldy mass of data. This will help eventual retrieval and final analysis. All transcripts, fieldnotes and other data should have details of time, location and specific comments attached. The use of pseudonyms or numbers for participants prevents identification during the long process of analysis when the data might fall into the hands of individuals other than the researcher. Everything has to be recorded, cross-checked and labelled. Then the material has to be stored in the appropriate files for later retrieval.
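Where records are kept electronically, the substitution of numbers or codes for participants’ names can be sketched as follows. This is an illustration only: the function name, the code format (P01, P02 and so on) and the sample names are hypothetical, and the suggestion that the key linking names to codes should be stored apart from the data is our assumption about sensible practice.

```python
def assign_codes(names, prefix="P"):
    """Map each distinct participant name to a code, in first-seen order."""
    key = {}
    for name in names:
        if name not in key:
            key[name] = f"{prefix}{len(key) + 1:02d}"
    return key


# Hypothetical participants; one of them was interviewed twice.
interview_log = ["Ann Smith", "Bob Jones", "Ann Smith"]
key = assign_codes(interview_log)

# Only the codes are used to label transcripts and fieldnotes.
# The key itself would be kept separately and securely (an assumption).
labels = [key[name] for name in interview_log]
print(labels)
```

Because the same person always receives the same code, repeat interviews can be linked during analysis without the name appearing in any working file.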
From the very beginning of the study, nurses and other health professionals will recognise significant ideas and themes in the material they have generated. As they listen to tapes, read transcriptions and other documents, or look at visual data, common themes and patterns will begin to emerge and become crystallised.
Borkan (1999) discusses the initial process of analysis and describes two strategies from which researchers can choose depending on their approach, namely horizontal and vertical ‘passes’ of the data. The horizontal pass involves
- reading the data and looking at themes, emotions and surprises, taking in the overall picture;
- reflective and in-depth reading of the data to find supporting evidence for these themes;
- re-reading for elements that might have been overlooked;
- searching for possible alternative meanings;
- attempting to link discrepancies together.
Vertical passes involve
- concentrating on one section of the data and analysing it before moving on;
- reflecting on and reviewing the data in the section;
- looking for insights and feeding them back into the data collection process.
The horizontal is more holistic than the vertical pass. However, researchers not only analyse according to the methods they adopt, but they also have different personal styles, which demand different ways of looking at the data.
Analytical styles
Different approaches to research have different types of data analysis. Even within one approach, researchers adopt a variety of analyses. Phenomenologists, ethnographers or grounded theorists, for instance, use a variety of analytic strategies. They all involve the steps of listening to, viewing and gaining a holistic view of the data as well as dividing them into units or segments of meaning. Dahlberg et al. (2008) ask that each part of the transcribed text, analysed for meaning, should be understood in relation to the whole of the text and the whole understood in terms of its parts.
Moustakas (1994), a hermeneutic phenomenologist, gives a general overview of analysis styles and comes up with overlapping steps in which researchers carry out the following:
- They reflect on each transcript and search for significant statements.
- They record all relevant statements.
- They delete repetitive and overlapping statements, leaving only invariant constituents of the phenomenon, and organise them.
- They link and relate these into themes.
- Including verbatim quotes from the data, they integrate the themes into a description of the texture of the experience as told by the participants.
- They reflect on this and their own experiences.
- They develop a description of the meanings of the experience.
At all times, researchers search for links and relationships between sections of data, categories or themes.
There are more detailed discussions of analytic procedures in the chapters on specific approaches.
Coding and categorising
Coding means marking sections of data and giving them labels or names. It is an early stage in analysis and proceeds towards the development of categories, themes or major constructs (the nomenclature depends on the language of the specific approach). It breaks the data into manageable sections.
Line-by-line coding identifies information which both participant and researcher consider important. In their initial coding, many researchers single out words or phrases that are used by participants – these are called in vivo codes. This type of coding prevents researchers from imposing their own framework and ideas on the data, because the coding starts with the words of the participants.
Example of in vivo coding
A transcript might contain the sentence ‘I was really worried when I went to the doctor with my problem; it could have been serious’. The in vivo code might be: worried when going to the doctor. At a later stage, of course, this would have to be refined by the words of the researcher but still seen from the perspective of the participant. It might become: Fear of diagnosis. As the coding process goes on, the codes might become more abstract.
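For those who keep their codes on a computer, the example above could be recorded along the following lines. This is a minimal sketch and not a prescribed method: the class and field names, the transcript identifier P01 and the line number are hypothetical, and dedicated qualitative analysis software would organise this differently.

```python
from dataclasses import dataclass


@dataclass
class CodedSegment:
    transcript_id: str      # code or pseudonym, never a real name
    line: int               # line number in the transcript
    text: str               # the participant's own words
    in_vivo_code: str       # initial label taken from those words
    refined_code: str = ""  # more abstract label added at a later stage


def group_by_code(segments):
    """Group segments under their refined code, or the in vivo code if none."""
    groups = {}
    for segment in segments:
        label = segment.refined_code or segment.in_vivo_code
        groups.setdefault(label, []).append(segment)
    return groups


segment = CodedSegment(
    transcript_id="P01",
    line=14,
    text="I was really worried when I went to the doctor with my problem; "
         "it could have been serious",
    in_vivo_code="worried when going to the doctor",
)

# Later refinement in the researcher's words, as in the example above.
segment.refined_code = "Fear of diagnosis"

for label, members in group_by_code([segment]).items():
    print(label, [(m.transcript_id, m.line) for m in members])
```

Keeping the participant's words, the in vivo code and the refined code side by side preserves the trail from the data to the more abstract label.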