Formal measures of agreement are available to describe the extent to which assessments by multiple authors were the same (Orwin 1994). We describe in Section 7.2.6.1 how a kappa statistic may be calculated for measuring agreement between two authors making simple inclusion/exclusion decisions. Values of kappa between 0.40 and 0.59 have been considered to reflect fair agreement, between 0.60 and 0.74 to reflect good agreement and 0.75 or more to reflect excellent agreement (Orwin 1994).
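For two authors making simple inclusion/exclusion decisions, the kappa statistic compares the observed proportion of agreement with the agreement expected by chance from each author's marginal proportions. The sketch below shows the standard calculation; the screening decisions are hypothetical, and the `cohens_kappa` function name is illustrative.

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical judgements."""
    assert len(rater1) == len(rater2) and rater1
    n = len(rater1)
    # Observed agreement: proportion of identical decisions.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance-expected agreement: for each category, the product of the
    # two raters' marginal proportions, summed over categories.
    categories = set(rater1) | set(rater2)
    p_e = sum((rater1.count(c) / n) * (rater2.count(c) / n)
              for c in categories)
    if p_e == 1:  # both raters used one category throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical screening decisions by two authors for ten studies.
author_a = ["in", "in", "out", "out", "in", "out", "in", "in", "out", "out"]
author_b = ["in", "out", "out", "out", "in", "out", "in", "in", "out", "in"]
print(round(cohens_kappa(author_a, author_b), 2))  # 0.6
```

Here the authors agree on 8 of 10 studies (observed agreement 0.80), but since each included half the studies, chance agreement is 0.50, giving kappa = (0.80 − 0.50)/(1 − 0.50) = 0.60, at the lower bound of the 'good agreement' range above.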
It is not recommended that kappa statistics be calculated as standard in Cochrane reviews, although they can reveal problems, especially in the early stages of piloting. Comparing a value of kappa with arbitrary cut-points is unlikely to convey the real impact of any disagreements on the review. For example, disagreement about the eligibility of a large, well-conducted study will have more substantial implications for the review than disagreement about a small study at risk of bias. The reasons for any disagreement should be explored: they may reveal the need to revisit eligibility criteria or coding schemes for data collection, and any resulting changes should be reported.