More exactly, k is defined as:
        ObservedAgreement - ExpectedAgreement
    k = -------------------------------------
                1 - ExpectedAgreement
The coefficient is equal to 0 when there is no more agreement than expected
by chance, and equal to 1 when there is perfect agreement.
In the examples above, k would be 94% and 50%, respectively.
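For illustration, here is a minimal Python sketch of this computation for
two annotators (the function cohen_kappa and the toy label lists are
hypothetical, not taken from the articles below):

    from collections import Counter

    def cohen_kappa(labels_a, labels_b):
        # Cohen's kappa for two annotators labelling the same items.
        n = len(labels_a)
        # Observed agreement: proportion of items with identical labels.
        observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        # Expected agreement: probability that both annotators choose
        # the same category by chance, estimated from each annotator's
        # marginal label distribution.
        count_a = Counter(labels_a)
        count_b = Counter(labels_b)
        expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
        return (observed - expected) / (1 - expected)

    # Two hypothetical annotators tagging four tokens:
    a = ["noun", "noun", "verb", "noun"]
    b = ["noun", "verb", "verb", "noun"]
    print(cohen_kappa(a, b))

On this toy data, observed agreement is 0.75 and expected agreement is 0.5,
so k = (0.75 - 0.5) / (1 - 0.5) = 0.5.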
* Original article:
Cohen, J. (1960). A coefficient of agreement for nominal scales.
Educational and Psychological Measurement, 20, 37-46.
* Extension to more than 2 annotators:
Davies, M., Fleiss, J. L. (1982). Measuring agreement for multinomial data.
Biometrics, 38, 1047-1051.
* Extension for partial agreement:
Cohen, J. (1968). Weighted kappa: nominal scale agreement with provision
for scaled disagreement or partial credit. Psychological Bulletin, 70(4),
213-220.
* Recent articles using k in CL:
Bruce, R., Wiebe, J. (1998). Word sense distinguishability and inter-coder
agreement. Proceedings of the 3rd Conference on Empirical Methods in
Natural Language Processing (EMNLP-98). Association for Computational
Linguistics SIGDAT, Granada, Spain, June 1998.
Carletta, J. (1996). Assessing agreement on classification tasks: the kappa
statistic. Computational Linguistics, 22(2), 249-254.
Véronis, J. (1998a). A study of polysemy judgements and inter-annotator
agreement. Programme and advanced papers of the Senseval workshop, 2-4
September 1998. Herstmonceux Castle, England.
[http://www.up.univ-mrs.fr/~veronis/pdf/1998senseval.pdf]