Medical systematic reviews typically require assessing all the documents retrieved by a search. The reason is two-fold: the task aims for ``total recall''; and documents retrieved using Boolean search form an unordered set, and thus it is unclear how an assessor could examine only a subset. Screening prioritisation is the process of ranking the (unordered) set of retrieved documents, allowing assessors to begin the downstream processes of systematic review creation earlier, potentially leading to earlier completion of the review, or even avoiding the screening of documents ranked least relevant. Screening prioritisation requires highly effective ranking methods. Pre-trained language models are state-of-the-art on many IR tasks but have yet to be applied to systematic review screening prioritisation. In this paper, we apply several pre-trained language models to the systematic review document ranking task, both directly and after fine-tuning. An empirical analysis compares the effectiveness of neural methods with that of traditional methods for this task. We also investigate different types of document representations for neural methods and their impact on ranking performance. Our results show that BERT-based rankers outperform the current state-of-the-art screening prioritisation methods. However, BERT rankers and existing methods can actually be complementary, and thus, further improvements may be achieved if they are used in conjunction.
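The complementarity noted at the end of the abstract is commonly exploited via score fusion. As a minimal sketch (not the paper's method), CombSUM with min-max normalisation could combine a BERT ranker's scores with a traditional ranker's scores; all document IDs and scores below are illustrative:

```python
def minmax(scores):
    # Min-max normalise a dict of doc -> score into [0, 1].
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def combsum(*runs):
    # CombSUM: sum each document's normalised scores across all runs,
    # then rank documents by the fused score, highest first.
    fused = {}
    for run in runs:
        for doc, s in minmax(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical scores from a BERT-based ranker and a traditional
# (e.g. BM25-style) ranker over the same retrieved set.
bert_run = {"d1": 0.92, "d2": 0.40, "d3": 0.75}
trad_run = {"d1": 3.1, "d2": 7.8, "d3": 2.0}
ranking = combsum(bert_run, trad_run)
```

Normalising per run first matters because the two rankers emit scores on incompatible scales; without it, the run with the larger raw scores would dominate the fused ranking.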