Rubbish reliability results - what next?

T

Hi there

Someone kindly rated 15% of my data (videos) using a global rating scale. I rated 100% of the data using the same rating scale. When I came to calculate inter-rater reliability using Cohen's kappa, it came out low.

Obviously I don't want to rate all the videos again. Is it OK for me to retrain the second coder, get her to code her 15% again, and then re-calculate reliability?

Would this be the standard procedure?
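
(For anyone wanting to check the sums: below is a minimal sketch of that kappa calculation in Python, assuming the ratings are categorical/ordinal scores and using scikit-learn's cohen_kappa_score. The rating values are invented purely for illustration.)

# Minimal sketch: Cohen's kappa on the double-coded 15% subset.
# The ratings below are made up for illustration only.
from sklearn.metrics import cohen_kappa_score

# Paired ratings for the videos that BOTH raters coded (the 15% overlap).
rater_1 = [3, 2, 4, 4, 1, 3, 2, 5, 3, 4]   # first rater's scores
rater_2 = [2, 2, 3, 4, 1, 2, 2, 4, 3, 3]   # second coder's scores

# Unweighted kappa treats every disagreement as equally serious.
kappa = cohen_kappa_score(rater_1, rater_2)

# For an ordinal scale, a weighted kappa counts a near-miss (4 vs 5)
# as less serious than a large disagreement (1 vs 5).
weighted_kappa = cohen_kappa_score(rater_1, rater_2, weights="quadratic")

print(f"Unweighted kappa: {kappa:.2f}")
print(f"Quadratic-weighted kappa: {weighted_kappa:.2f}")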

S

Yes, you can ask your second coder to recode. What did you base your calculations on - was it the 15% only? I had two second coders who looked at all my data, and then I removed outliers and got higher kappa values. Maybe someone else can give better suggestions.

T

Hi Satchi

Yes - it was based on the 15% only. I think it might be that the training I gave wasn't thorough enough (we only jointly rated one video and, because it was fine, we stopped there, when perhaps I should have made us do a few more together). Plus she only actually did the ratings several months after I gave her the training. So I guess she may have forgotten the training a little (although the rating scale was quite self-explanatory).

I'm a bit grumbly about it because I've nearly finished writing the paper and one of the key findings relates to that measure, so it throws the whole thing into jeopardy if we can't get good reliability. But I'll see what my supervisor says (she was the one who rated the 15%...)

Thanks for the helpful response.

T

Quote From Tudor_Queen:
Hi there

Someone kindly rated 15% of my data (videos) using a global rating scale. I rated 100% of the data using the same rating scale. When I came to calculate inter-rater reliability using Cohen's kappa, it came out low.

Obviously I don't want to rate all the videos again. Is it OK for me to retrain the second coder, get her to code her 15% again, and then re-calculate reliability?

Would this be the standard procedure?


For various reasons, Cohen's kappa may not be the best approach to use.

Have a look at Bland-Altman plots!
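
(For the curious: here is a rough sketch of what a Bland-Altman plot involves, assuming the two raters' scores can be treated as numeric; the scores are invented and numpy/matplotlib are assumed to be available. The point of the plot is to show whether one rater is systematically higher or lower than the other, not just how often they agree exactly.)

# Rough sketch of a Bland-Altman plot for two raters' scores.
# The scores below are invented; in practice they would be the
# paired ratings for the double-coded videos.
import numpy as np
import matplotlib.pyplot as plt

rater_1 = np.array([3, 2, 4, 4, 1, 3, 2, 5, 3, 4], dtype=float)
rater_2 = np.array([2, 2, 3, 4, 1, 2, 2, 4, 3, 3], dtype=float)

means = (rater_1 + rater_2) / 2     # x-axis: mean of the two ratings
diffs = rater_1 - rater_2           # y-axis: difference between raters
bias = diffs.mean()                 # average disagreement (systematic bias)
loa = 1.96 * diffs.std(ddof=1)      # 95% limits of agreement

plt.scatter(means, diffs)
plt.axhline(bias, color="grey", linestyle="--", label="mean difference")
plt.axhline(bias + loa, color="red", linestyle=":", label="95% limits of agreement")
plt.axhline(bias - loa, color="red", linestyle=":")
plt.xlabel("Mean of the two raters' scores")
plt.ylabel("Difference (rater 1 - rater 2)")
plt.legend()
plt.show()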

T

I think Cohen's Kappa is appropriate for my particular rating scale, but thanks - I've had a look and will store this for other possible future ventures (I love methodology!).

T

The funny thing is (to a research-methods-obsessed person) that most research methods books and papers do not tell you what to do if you get rubbish inter-rater reliability on some coding or ratings. They might tell you what to do if you've designed the scale yourself and are piloting it. But if it's one you've taken from the literature (one that has already demonstrated good validity and reliability elsewhere), what is the best, least-biased thing to do?

I am convinced (and full of bias - but also I think I know the data better!) that I have rated the videos alright... I got myself to a degree of INTRA-rater reliability before I trained the second coder. But the second rater obviously has something different in her head when she rates the videos... (just like when she marks undergraduate students' work - very very conservative and not wanting to give the top ratings even when it meets the criteria). Grr, I am so annoyed!

She's away, by the way, which is why I am venting here and dithering about what to do. Hopefully she will agree to be re-trained and to re-code her 15% until reliability is obtained.

T

Are you sure the 'problem' isn't you?

It sounds like you just want the second person to be trained until they agree with you!

T

EXACTLY!!!! Which is why I am saying - hey what is the standard procedure here?
