Transcription Key

Transcripts of audio/video data are used in Ethnomethodology/Conversation Analysis (EM/CA) to create a persistent record of the action so that anyone may judge the analysis and draw their own conclusions. Gail Jefferson’s (2004) transcription system is widely used in EM/CA research. A discussion of transcription issues and a comprehensive key to the Jeffersonian transcription system are available from Jonathan Potter’s transcription page.

Key to Transcription Symbols

Symbol: Word, ord (underlined)
Example: No way, No way (underlined)
Meaning: Speaker emphasizes the underlined portion of the word or entire word.

Symbol: ?
Example: Huh?
Meaning: Questioning intonation.

Symbol: :
Example: No wa:y
Meaning: Elongated sound (usually a vowel).

Symbol: ↑ or ↓
Example: ↑sil↓ly
Meaning: Rising or falling pitch.

Symbol: -
Example: elb-
Meaning: Word is cut off.

Symbol: .
Example: no way . Did he
Meaning: Micropause (less than 0.5 seconds long) (this is usually transcribed as (.))

Symbol: (n.n)
Example: he did. (1.5) I think.
Meaning: A timed pause/gap (in seconds).

Symbol: =
Example:
1. EVA: No way=
2. HAL: =way
Meaning: Latching: one word (or turn) occurs directly after the other with no gap but also no overlap.

Symbol: []
Example:
1. KAY: own-[  ]
2. DES:     [I-] I went
Meaning: Speakers overlap.

Symbol: +
Example:
1. KAY: own-[  ]+
2. DES:     [I-]
3. KAY:         +why
Meaning: A turn continues after overlap from the other speaker (not a new turn).

Symbol: ( )
Example: (Laughs)
Meaning: Described (vs transcribed) sound (usually verbal)

Symbol: @ @
Example:
@Adjusts webcam@
@Points@
Meaning: Physical action that is seen

Symbol: @[] @
Example: @Holds ear[]@ Hello?
Meaning: Physical action performed simultaneously with prior or following speech

Symbol: / /
Example: /Admonishingly/ Was your volume down?
Meaning: Transcriber comment on emotional tone of a turn (often describes intonation)

Symbol: ° °
Example: °hello°
Meaning: Quiet/whispering

Symbol: { }
Example: {elbow}
Meaning: There are tiny hearable electronic noise artifacts during word.

Symbol: *
Example:
elb*ow
elbow*
Elbo*
*lbow
e*bow
Meaning: There are one or more very obvious hearable electronic noise artifacts during or after a word; OR a very obvious hearable electronic noise artifact cuts off one phoneme of a word.

Symbol: {___(n.n)}
Example:
The {___(0.5)}cake
She {_______________(2.5)}
Meaning: One or more words are ‘hearably missing’ with electronic noise indicating the absence. Timing is indicated by three underscores representing 0.5 seconds (additively to indicate more time) plus a numeric time.

Symbol: {_*_(n.n)}
Example:
Her {_*_(0.5)} hurt
he {_*__*__*_(1.5)}
{*e* aft* your p*arty}
Meaning: One or more words are distorted. If syllables can be made out, they are included. If syllables can not be made out, timing is indicated by three underscores representing 0.5 seconds plus a numeric time.

Symbol:
{FREEZES ___(n.n)}
{UNFREEZES}
Example:
{FREEZES ____________(2.0)}
{UNFREEZES}
Meaning: The video has frozen. Numeric time comes after underscores. Three underscores represent 0.5 seconds. More time is indicated by more underscores in 0.5 second increments.

Transcription example

This is an example of the PV project transcription. It is a modified version of an excerpt from a real conversation (edited to include all of the symbols).

Figure: Transcript Key 1 Example

Transcription decisions

The table above is a key to a simplified version of the Jefferson system plus some new transcription symbols to indicate various forms of human action, combined human-computer action, and perturbations to human action by personal videoconferencing technology. Along with these new symbols, five choices were made for this project’s transcription of visual, verbal, and technological action.

First, this dissertation adapts the principle used by many EM/CA researchers of visually indicating the duration of non-talk events (e.g. Goodwin, 1981; Neville, 2004). While there are a multitude of such events, for this project the duration of severe network trouble was the most important non-talk event affecting interaction. To indicate the duration of network trouble in project transcriptions, the numerical time measurement is shown along with underscores that textually represent the time. Three underscores represent 0.5 seconds, and longer durations are indicated by adding more underscores in 0.5 second increments.
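The underscore convention can be sketched in a few lines of Python. This is only an illustration, not part of the project's tooling: the function name trouble_marker is hypothetical, and rounding the underscores to the nearest 0.5 second increment (with a minimum of one increment) is an assumption, since the key only shows durations that are exact multiples of 0.5 seconds.

```python
import math


def trouble_marker(seconds: float) -> str:
    """Render a network-trouble duration as underscores plus a numeric time.

    Hypothetical helper: three underscores stand for 0.5 seconds, and the
    measured time follows in parentheses, e.g. {___(0.5)}.
    """
    if seconds <= 0:
        raise ValueError("duration must be positive")
    # Assumption: round half up to the nearest 0.5 s increment,
    # and always draw at least one increment.
    increments = max(1, math.floor(seconds / 0.5 + 0.5))
    underscores = "___" * increments
    # The numeric time is the measured value, shown to one decimal place.
    return f"{{{underscores}({seconds:.1f})}}"


if __name__ == "__main__":
    for duration in (0.5, 1.5, 2.5):
        print(trouble_marker(duration))
```

Under these assumptions a 0.5 second gap is drawn with three underscores and a 2.5 second gap with fifteen, so the visual length of the marker grows in step with the measured time.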

Second, rather than separating verbal and visual action (e.g. Greiffenhagen & Watson, 2009), this dissertation transcribes visual, verbal, and technological events on the same line. While this makes it slightly harder to see when one participant’s visual action overlaps their own verbal action, it makes it easier to see when one participant’s action of any kind overlaps with that of the co-participant.

Third, certain sounds that would normally be phonetically transcribed, such as laughter and in/out-breaths, were not phonetically transcribed because this analysis did not require precision for those actions.

Fourth, and similarly, participant actions that had sounds, such as yawning, burping, and sighing, as well as certain vocal intonations (e.g. bored, sing-song), were described rather than transcribed.

Fifth, as a rule, I did not use visual stills alongside the typographic transcripts. I only provided visual stills when they provided essential evidence for an argument. This was partly to reduce the complexity of transcripts and partly because stills usually did not do the visual features justice, but also because, as it turned out, one of the interesting findings of the project was that the participants did not treat the visual elements of videoconferencing as especially meaningful. Ideally we will eventually be able to easily incorporate video into project reports, perhaps with overlays or other transcription mechanisms that make the essential features clear to readers.

References
