21 Why was Zara annoyed at the start of the meeting?
A The projector stopped working
B The meeting room was changed at the last minute
C The agenda was not shared in advance
22 Keon thinks their current labels are a problem because they are
A too subjective
B too rare to be useful
C too similar to each other
23 What does Zara say improved after they tested the scheme on sample data?
A They found more categories
B They agreed more often
C They worked faster
24 Keon believes examples in the guideline should be
A short and realistic
B focused on extreme cases
C taken from published studies
25 Zara suggests limiting the scheme mainly to
A match the tool’s features
B reduce disagreement
C satisfy the supervisor
26 Keon worries about team participation because
A the task is too technical
B the deadline is too flexible
C some members are unpaid
Questions 27 and 28
Choose TWO letters, A–E.
Which TWO ideas do they mention but decide NOT to use?
A adding a new slice for training
B colour coding inside the tool
C adding a sarcasm label
D changing the meeting length
E replacing the annotation platform
Questions 29 and 30
Choose TWO letters, A–E.
Which TWO decisions do they make in the end?
A double-code a fixed sample every Friday
B create a short calibration quiz for new coders
C write a do-not-use list for confusing labels
D remove one label completely
E stop using the current tool
Keys
21 C 22 A 23 B 24 A 25 B 26 C 27 B 28 C 29 A 30 C
Transcripts
Part 3: You will hear two students discussing how to agree on a coding scheme for a research project.
Zara: I was annoyed at the start, not because anyone was late, but because there was no agenda shared. I opened the calendar invite and it just said "meeting", nothing else. People arrived expecting totally different things.
Keon: Same. I brought the sample comments, but I was the only one who did. And then we spent the first ten minutes arguing about what we were supposed to decide.
Zara: Exactly. Still, once we got going, I think we made progress. Your point about the labels being subjective really landed.
Keon: They are. The labels sound sensible, but two coders can read the same message and disagree because the label depends on interpretation. Like when someone is being polite but actually refusing. Is that negative, or is it neutral?
Zara: And when we tested the scheme on the sample set, we agreed more often, which surprised me. I thought testing would expose chaos.
Keon: It did expose problems, but it also showed patterns. We saw that disagreement clustered around a few labels, not everywhere.
Zara: That is why I said the tool is not the issue. Some people keep blaming the annotation platform. It is fine. The real issue is our guidance.
Keon: Guidance needs examples. Not long essays, just short and realistic examples, like the kind of sentence we actually have in the dataset. If we use dramatic edge cases, we will train people to see extremes.
Zara: I agree. And we need a do-not-use list. We have labels that sound different but end up being used for the same thing, and then nobody knows what to pick.
Keon: Also, definitions should include what a label is not. Like, this label is for complaints, but not for jokes, or not for requests.
Zara: That is useful, but we also have to keep it manageable. I suggested limiting the scheme mainly to reduce disagreement. We could have twenty labels, but we would not use them consistently. Eight or ten well-defined labels might be better.
Keon: I am on board with fewer labels, but we need to protect participation. If some members are unpaid, they may drop out, and then we lose consistency. It is not just fairness. It affects the data.
Zara: True. I did not think about that until you said it. If we end up with only the people who can afford to do extra hours, our sample of coders shrinks, and drift gets worse.
Keon: Which brings us to drift. The supervisor mentioned coder drift as if it were inevitable, but I think we can reduce it.
Zara: We can, by double-coding. We decided on a fixed sample every Friday, right?
Keon: Yes, but it has to be fixed, not random, so we can compare week to week. Otherwise variation hides drift.
Zara: Good point. We should keep the same slice of data as a reference, and then add a small new slice for training.
Keon: Also, we need to communicate results. If we double-code and never discuss it, nothing changes.
Zara: Someone suggested colour coding inside the tool, but that feels cosmetic and it will not fix a vague definition.
Keon: We also considered adding a sarcasm label, but that would increase subjectivity.
Zara: Better to tighten what we have and state clearly which labels not to use.
Keon: Then new coders can learn faster.
Zara: So, decisions. One, we write the do-not-use list for confusing labels. Two, we double-code a fixed sample every Friday. Then we meet for fifteen minutes to compare.
Keon: Agreed. And we should put those decisions in the shared document today, not next week, or everyone will invent their own version again.
Zara: I will draft the do-not-use list tonight.
Keon: I will edit the examples so they are short.
Zara: Then we can send the agenda for the next meeting in advance.