Background: A significant barrier to conducting systematic reviews and meta-analyses is efficiently finding relevant, scientifically sound articles. Typically, fewer than 1% of articles meet this requirement, which makes the task highly imbalanced. Although feature-engineered and early neural network models have been studied for this task, there is room to improve on their results. Methods: We framed the problem of filtering articles as a classification task, and trained and tested several ensemble architectures of SciBERT, a variant of BERT pre-trained on scientific articles, on a manually annotated dataset of about 50K articles from MEDLINE. Since scientifically sound articles are identified through a multi-step process, we proposed a novel cascade ensemble analogous to that selection process. We compared the performance of the cascade ensemble with a single integrated model and other types of ensembles, as well as with results from previous studies. Results: The cascade ensemble architecture achieved an F-measure of 0.7505, a 49.1% error rate reduction relative to a previously proposed CNN model that was evaluated on a selected subset of the 50K articles. On the full dataset, the cascade ensemble achieved an F-measure of 0.7639, an error rate reduction of 19.7% relative to the best performance reported in a previous study that used the full dataset. Conclusion: Pre-trained contextual encoder neural networks (e.g., SciBERT) outperform both the previously studied models and manually created search filters in filtering for relevant, scientifically sound articles. The superior performance of the cascade ensemble is a significant result that generalizes beyond this task and dataset, and is analogous to query optimization in information retrieval (IR) and databases.
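The cascade idea described in the Methods can be illustrated with a minimal sketch. The abstract does not specify the stages, so the stage names, the keyword-based scorers, and the 0.5 threshold below are all hypothetical placeholders; in the study, each stage would presumably be a fine-tuned SciBERT classifier rather than a keyword match. The sketch only shows the control flow: an article is accepted when every stage accepts it, and later stages are evaluated only on articles that survived the earlier ones.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# A scorer maps article text to a score in [0, 1].
Scorer = Callable[[str], float]


@dataclass
class Stage:
    name: str          # hypothetical stage label, not taken from the paper
    scorer: Scorer     # stand-in for a fine-tuned classifier
    threshold: float = 0.5  # assumed decision threshold


def keyword_scorer(keywords: Sequence[str]) -> Scorer:
    """Toy stand-in for a trained model: 1.0 if any keyword occurs, else 0.0."""
    def score(text: str) -> float:
        lowered = text.lower()
        return 1.0 if any(k in lowered for k in keywords) else 0.0
    return score


def cascade_predict(
    stages: Sequence[Stage], texts: Sequence[str]
) -> Tuple[List[bool], Dict[str, int]]:
    """Run the stages in order; rejected articles are never scored again."""
    accepted = [True] * len(texts)
    calls = {stage.name: 0 for stage in stages}
    for stage in stages:
        for i, text in enumerate(texts):
            if not accepted[i]:
                continue  # rejected by an earlier stage
            calls[stage.name] += 1
            if stage.scorer(text) < stage.threshold:
                accepted[i] = False
    return accepted, calls


# Hypothetical three-stage cascade and toy inputs.
STAGES = [
    Stage("original_study", keyword_scorer(["trial", "cohort"])),
    Stage("human_subjects", keyword_scorer(["patients", "participants"])),
    Stage("rigorous_design", keyword_scorer(["randomized"])),
]
ARTICLES = [
    "A randomized trial of drug X in 200 patients.",
    "A cohort of mice exposed to compound Y.",
    "An editorial on health policy.",
    "An observational cohort of 50 patients.",
]

labels, calls = cascade_predict(STAGES, ARTICLES)
print(labels)
print(calls)
```

In this toy run, the per-stage call counts shrink from one stage to the next because each stage sees only the survivors of the previous one. That shrinking workload is one plausible reading of the abstract's comparison to query optimization, where selective filters are applied before later ones.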