Systematic reviews, which entail the extraction of data from large numbers of scientific documents, are an ideal avenue for the application of machine learning. They are vital to many fields of science and philanthropy, but are very time-consuming and require experts. Yet each of the three main stages of a systematic review lends itself to automation: searching for documents can be done via APIs and web scrapers, selection of relevant documents via binary classification, and extraction of data via sequence-labelling classification. Despite the promise of automation for this field, little research examines the various ways to automate each of these tasks. We construct a pipeline that automates each of these aspects, and experiment with many human-time vs. system-quality trade-offs. We test the ability of classifiers to work well on small amounts of data and to generalise to data from countries not represented in the training data. We test different types of data extraction, varying in annotation difficulty, and five different neural architectures for the extraction. We find that the whole pipeline system achieves surprisingly high accuracy and generalisability with only two weeks of human-expert annotation, which is only 15% of the time it takes to do the whole review manually, and it can be repeated and extended to new data with no additional effort.
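To make the three-stage design concrete, the following is a minimal sketch in Python, assuming scikit-learn models as stand-ins for the pipeline's classifiers; the corpus, labels, and tag set are hypothetical placeholders, not the paper's data.

```python
# A minimal sketch of the three-stage pipeline: search, selection, extraction.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Stage 1: search -- in practice done via publisher APIs and web scrapers;
# here a fixed toy corpus stands in for the retrieved documents.
documents = [
    "Randomised trial of a cash-transfer programme in rural Kenya.",
    "Editorial: reflections on conference attendance.",
]
relevance_labels = [1, 0]  # 1 = relevant to the review, 0 = not relevant

# Stage 2: selection -- binary classification over retrieved documents.
vectoriser = TfidfVectorizer()
X = vectoriser.fit_transform(documents)
selector = LogisticRegression().fit(X, relevance_labels)

new_docs = ["Impact evaluation of an unconditional cash transfer in Malawi."]
keep = selector.predict(vectoriser.transform(new_docs))

# Stage 3: extraction -- sequence labelling assigns one tag per token
# (a BIO scheme shown here); a trained neural tagger would replace this stub.
def extract(tokens, tags):
    """Collect token spans carrying a non-'O' tag, i.e. extracted data."""
    return [tok for tok, tag in zip(tokens, tags) if tag != "O"]

tokens = ["Enrolment", "rose", "by", "12", "%", "."]
tags = ["O", "O", "O", "B-EFFECT", "I-EFFECT", "O"]
print(keep, extract(tokens, tags))
```

In the full system, the logistic-regression selector and the stubbed tagger would be replaced by the trained classifiers and neural sequence-labelling architectures evaluated in the paper; the stage boundaries are the point of the sketch.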