To examine the reproducibility of COVID-19 research, we create a dataset of COVID-19-related pre-prints posted to arXiv, bioRxiv, medRxiv, and SocArXiv between 28 January 2020 and 30 June 2021. We extract the text from these pre-prints and parse it for keyword markers that signal the availability of the data and code underpinning each pre-print. In our sample, we could not find markers of either open data or open code for 75 per cent of the pre-prints on arXiv, 67 per cent of those on bioRxiv, 79 per cent of those on medRxiv, and 85 per cent of those on SocArXiv. We conclude that there may be value in having authors categorize the degree of openness of their pre-print as part of the pre-print submission process and, more broadly, that open science training needs to be better integrated into a wide range of fields.
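The abstract does not list the keyword markers used, so the following is only a minimal Python sketch of the general approach. It assumes the pre-print text has already been extracted to plain strings. The patterns in `DATA_MARKERS` and `CODE_MARKERS` and the helper names `classify` and `share_without_markers` are hypothetical illustrations, not the study's actual keyword list or code. The sketch flags each text for open-data and open-code markers and computes the percentage of texts with neither.

```python
import re

# Hypothetical marker patterns: illustrative assumptions only,
# not the keyword list used in the study.
DATA_MARKERS = [
    r"data (?:are|is) available",
    r"data availability",
    r"\bzenodo\b",
    r"\bosf\.io\b",
    r"\bdryad\b",
]
CODE_MARKERS = [
    r"code (?:are|is) available",
    r"code availability",
    r"github\.com",
    r"gitlab\.com",
]


def has_marker(text: str, patterns: list[str]) -> bool:
    """Return True if any pattern matches the text (case-insensitive)."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in patterns)


def classify(text: str) -> dict[str, bool]:
    """Flag whether extracted pre-print text contains open-data or open-code markers."""
    return {
        "open_data": has_marker(text, DATA_MARKERS),
        "open_code": has_marker(text, CODE_MARKERS),
    }


def share_without_markers(texts: list[str]) -> float:
    """Percentage of texts with neither an open-data nor an open-code marker."""
    if not texts:
        return 0.0
    none_found = sum(1 for t in texts if not any(classify(t).values()))
    return 100 * none_found / len(texts)


if __name__ == "__main__":
    sample = [
        "Code is available at https://github.com/example/repo.",
        "We fit a SEIR model to case counts.",
        "Data availability: all data are deposited on Zenodo.",
        "Results are discussed below.",
    ]
    print(share_without_markers(sample))  # 50.0
```

Because a keyword search only detects explicit statements, a pre-print flagged as having no markers may still share its data or code in a form the patterns miss, so figures of this kind are best read as the share of pre-prints where openness could not be detected.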