Preprints, versions of scientific manuscripts that precede peer review, are growing in popularity. They offer an opportunity to democratize and accelerate research, as they carry no publication costs and bypass a lengthy peer review process. Preprints are often later published in peer-reviewed venues, but these publications and the original preprints are frequently not linked in any way. To address this, we developed a tool, PreprintMatch, to find matches between preprints and their corresponding published papers, if they exist. This tool outperforms existing techniques for matching preprints and papers in both matching performance and speed. We applied PreprintMatch to search for matches between preprints from bioRxiv and medRxiv and papers indexed in PubMed. The preliminary nature of preprints offers a unique perspective into scientific projects at a relatively early stage, and with better matching between preprints and papers, we explored questions related to research inequity. We found that preprints from low-income countries are published as peer-reviewed papers at a lower rate than those from high-income countries (39.6\% and 61.1\%, respectively), and our data are consistent with previous work that cites a lack of resources, lack of stability, and policy choices to explain this discrepancy. Preprints from low-income countries were also found to be published more quickly (178 vs 203 days) and with less title, abstract, and author similarity to the published version than preprints from high-income countries. Preprints from low-income countries also gain more authors between the preprint and the published version than those from high-income countries (0.42 vs 0.32 authors, respectively), a practice that is significantly more frequent in China than in similar countries. Finally, we find that some publishers publish work with authors from lower-income countries more frequently than others. PreprintMatch is available at \url{https://github.com/PeterEckmann1/preprint-match}.