In a 1979 article, “The ‘file drawer problem’ and tolerance for null results”, Rosenthal described a gloomy scenario in which “the journals are filled with the 5% of the studies that show Type I errors, while the file drawers back at the lab are filled with the 95% of the studies that show non-significant (e.g. P > 0.05) results” (Rosenthal 1979). The file drawer problem has long been suspected in the social sciences: a review of psychology journals found that of 294 studies published in the 1950s, 97.3% rejected the null hypothesis at the 5% level (P < 0.05) (Sterling 1959). The survey was later updated and extended to three medical journals (New England Journal of Medicine, American Journal of Epidemiology, American Journal of Public Health) (Sterling 1995). Little had changed in the psychology journals (95.6% reported significant results), and a high proportion of statistically significant results (85.4%) was also found in the general medical and public health journals. Similar results have been reported in many different areas, such as emergency medicine (Moscati 1994), alternative and complementary medicine (Vickers 1998, Pittler 2000) and acute stroke trials (Liebeskind 2006).
It is possible that studies suggesting a beneficial intervention effect or a larger effect size are published, while a similar amount of data pointing in the other direction remains unpublished. In this situation, a systematic review of the published studies could identify a spurious beneficial intervention effect, or miss an important adverse effect of an intervention. In cardiovascular medicine, investigators who, in 1980, found an increased death rate among patients with acute myocardial infarction treated with a class I anti-arrhythmic agent dismissed it as a chance finding and did not publish their trial at the time (Cowley 1993). Earlier publication of their findings would have contributed to a more timely detection of the increased mortality now known to be associated with the use of class I anti-arrhythmic agents (Teo 1993, CLASP Collaborative Group 1994).
Studies that empirically examine the existence of publication bias fall into two categories: those providing indirect and those providing direct evidence. Surveys of published results, such as those described above, can provide only indirect evidence of publication bias, because the proportion of all tested hypotheses for which the null hypothesis is truly false is unknown. There is also substantial direct evidence of publication bias. Roberta Scherer and colleagues recently updated a systematic review summarizing 79 studies that describe subsequent full publication of research initially presented in abstract or short report form (Scherer 2007). Data from the 45 studies that reported time to publication are summarized in Figure 10.2.a. Only about half of the abstracts presented at conferences were later published in full (63% for randomized trials), and subsequent publication was associated with positive results (Scherer 2007).
Additional direct evidence comes from several cohort studies: of research proposals submitted to ethics committees and institutional review boards (Easterbrook 1991, Dickersin 1992, Stern 1997, Decullier 2005, Decullier 2007), of trials submitted to licensing authorities (Bardy 1998), of trials identified in trial registries (Simes 1987) and of trials funded by specific funding agencies (Dickersin 1993). For each cohort, the principal investigators were contacted several years later to determine the publication status of each completed study. In all of these studies, publication was more likely if the intervention effects were large and statistically significant.
Hopewell et al. recently completed a methodology review of such studies, limited to those that considered clinical trials separately (Hopewell 2008). The percentages of full publication as journal articles in the five studies included in the review ranged from 36% to 94% (Table 10.2.a). Positive results were consistently more likely to have been published than negative results: the odds of publication were approximately four times greater if the results were statistically significant (OR = 3.90, 95% CI 2.68 to 5.68), as shown in Figure 10.2.b. Other factors, such as study size, funding source, and the academic rank and sex of the primary investigator, were either not consistently associated with the probability of publication or could not be assessed separately for clinical trials (Hopewell 2008).
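To make the scale of such an odds ratio concrete, the short Python sketch below computes an odds ratio and an approximate 95% confidence interval from a hypothetical 2×2 table of full publication by statistical significance. The counts are illustrative only and are not taken from Hopewell 2008 or any of the other studies cited above.

```python
import math

# Hypothetical counts (illustrative only, not from any cited study):
# rows = statistically significant vs non-significant results,
# columns = published in full vs not published.
sig_published, sig_unpublished = 120, 40        # significant results
nonsig_published, nonsig_unpublished = 60, 80   # non-significant results

# Odds of full publication in each group and their ratio.
odds_sig = sig_published / sig_unpublished
odds_nonsig = nonsig_published / nonsig_unpublished
odds_ratio = odds_sig / odds_nonsig

# Approximate 95% confidence interval on the log-odds-ratio scale
# (standard error = square root of the sum of reciprocal cell counts).
se_log_or = math.sqrt(1 / sig_published + 1 / sig_unpublished +
                      1 / nonsig_published + 1 / nonsig_unpublished)
log_or = math.log(odds_ratio)
ci_low = math.exp(log_or - 1.96 * se_log_or)
ci_high = math.exp(log_or + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
# With these hypothetical counts the odds of full publication are about
# four times greater for statistically significant results, comparable in
# magnitude to the pooled estimate reported by Hopewell 2008.
```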