Many tools have been proposed for assessing the quality of studies for use in the context of a systematic review and elsewhere. Most tools are either scales, in which various components of quality are scored and combined to give a summary score, or checklists, in which specific questions are asked (Jüni 2001).
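The structural difference between the two tool types can be made concrete with a short sketch. In the illustration below, the item names, point values and questions are invented for this example and come from no published instrument:

```python
# Minimal sketch of the structural difference between the two tool types.
# The item names, point values and questions are invented for illustration;
# they come from no published instrument.

# A scale awards points to each component of quality and combines them
# into a single summary score.
scale_points = {"randomization": 2, "blinding": 1, "withdrawals": 1}

def summary_score(points: dict) -> int:
    """Combine component scores into one number, as a quality scale does."""
    return sum(points.values())

# A checklist asks specific questions and reports each answer on its own,
# with no combined score.
checklist = {
    "Was the allocation sequence adequately generated?": "yes",
    "Were participants and personnel blinded?": "unclear",
}

print(summary_score(scale_points))  # -> 4: a single number standing in for 'quality'
for question, answer in checklist.items():
    print(f"{question} {answer}")   # each answer stands alone
```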
In 1995, Moher and colleagues identified 25 scales and 9 checklists that had been used to assess the validity or ‘quality’ of randomized trials (Moher 1995, Moher 1996). These scales and checklists included between 3 and 57 items and took from 10 to 45 minutes to complete per study. Almost all of the items in the instruments were based on suggested or ‘generally accepted’ criteria mentioned in clinical trial textbooks. Many instruments also contained items that were not directly related to internal validity, such as whether a power calculation was done (an item that relates more to the precision of the results) or whether the inclusion and exclusion criteria were clearly described (an item that relates more to applicability than to validity). Scales were more likely than checklists to include criteria that do not directly relate to internal validity.
The Collaboration’s recommended tool for assessing risk of bias is neither a scale nor a checklist. It is a domain-based evaluation, in which critical assessments are made separately for each of several domains, described in Section 8.5. It was developed between 2005 and 2007 by a working group of methodologists, editors and review authors. Because it is impossible to know the extent of bias (or even the true risk of bias) in a given study, the possibility of validating any proposed tool is limited. The most realistic assessment of the validity of a study may involve subjectivity: for example, an assessment of whether lack of blinding of patients might plausibly have affected recurrence of a serious condition such as cancer.
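The contrast with a scale can likewise be sketched as a data structure: each domain carries its own judgement and supporting text, and no summary score is ever computed. The domain names below echo those described in Section 8.5; the record type, judgement labels and example entries are assumptions made for illustration only:

```python
# Minimal sketch of a domain-based evaluation, assuming a simple record type.
# Each domain receives its own judgement plus supporting text, and no summary
# score is computed. The domain names echo Section 8.5; the class, judgement
# labels and example entries are illustrative assumptions, not the Handbook's.

from dataclasses import dataclass

@dataclass
class DomainAssessment:
    domain: str      # e.g. "Allocation concealment"
    judgement: str   # e.g. "low risk", "high risk" or "unclear"
    support: str     # the evidence and reasoning behind the judgement

assessments = [
    DomainAssessment("Sequence generation", "low risk",
                     "Computer-generated random numbers."),
    DomainAssessment("Blinding", "unclear",
                     "Blinding of patients not described for a subjective outcome."),
]

# Each domain is reported separately; deliberately, there is no overall total.
for a in assessments:
    print(f"{a.domain}: {a.judgement} ({a.support})")
```

Keeping the judgements separate reflects the point above: because the true extent of bias in a study cannot be known, collapsing the domains into a single validated number is not possible, and each judgement must remain open to subjective, study-specific reasoning.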