Systematic reviews, which entail the extraction of data from large numbers of scientific documents, are an ideal avenue for the application of machine learning. They are vital to many fields of science and philanthropy, but are very time-consuming and require experts. Yet each of the three main stages of a systematic review lends itself to automation: searching for documents can be done via APIs and web scrapers, selection of relevant documents via binary classification, and extraction of data via sequence-labelling classification. Despite the promise of automation for this field, little research examines the various ways to automate each of these tasks. We construct a pipeline that automates each of these aspects, and experiment with many human-time vs. system-quality trade-offs. We test the ability of classifiers to work well on small amounts of data and to generalise to data from countries not represented in the training data. We test different types of data extraction, varying in annotation difficulty, and five different neural architectures for the extraction. We find that the whole pipeline system achieves surprisingly high accuracy and generalisability with only two weeks of human-expert annotation, which is only 15% of the time it takes to do the whole review manually, and it can be repeated and extended to new data with no additional effort.
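To make the three-stage design concrete, the following is a minimal sketch in Python, assuming scikit-learn models as stand-ins for the pipeline's classifiers; the corpus, labels, and tag set are hypothetical placeholders, not the paper's data.

```python
# A minimal sketch of the three-stage pipeline: search, selection, extraction.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Stage 1: search -- in practice done via publisher APIs and web scrapers;
# here a fixed toy corpus stands in for the retrieved documents.
documents = [
    "Randomised trial of a cash-transfer programme in rural Kenya.",
    "Editorial: reflections on conference attendance.",
]
relevance_labels = [1, 0]  # 1 = relevant to the review, 0 = not relevant

# Stage 2: selection -- binary classification over retrieved documents.
vectoriser = TfidfVectorizer()
X = vectoriser.fit_transform(documents)
selector = LogisticRegression().fit(X, relevance_labels)

new_docs = ["Impact evaluation of an unconditional cash transfer in Malawi."]
keep = selector.predict(vectoriser.transform(new_docs))

# Stage 3: extraction -- sequence labelling assigns one tag per token
# (a BIO scheme shown here); a trained neural tagger would replace this stub.
def extract(tokens, tags):
    """Collect token spans carrying a non-'O' tag, i.e. extracted data."""
    return [tok for tok, tag in zip(tokens, tags) if tag != "O"]

tokens = ["Enrolment", "rose", "by", "12", "%", "."]
tags = ["O", "O", "O", "B-EFFECT", "I-EFFECT", "O"]
print(keep, extract(tokens, tags))
```

In the full system, the logistic-regression selector and the stubbed tagger would be replaced by the trained classifiers and neural sequence-labelling architectures evaluated in the paper; the stage boundaries are the point of the sketch.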