Document Classification for COVID-19 Literature

The global pandemic has made it more important than ever to quickly and accurately retrieve relevant scientific literature for effective consumption by researchers in a wide range of fields. We provide an analysis of several multi-label document classification models on the LitCovid dataset, a growing collection of 23,000 research papers regarding the novel 2019 coronavirus. We find that pre-trained language models fine-tuned on this dataset outperform all other baselines and that BioBERT surpasses the others by a small margin, with micro-F1 and accuracy scores of around 86% and 75% respectively on the test set. We evaluate the data efficiency and generalizability of these models as essential features of any system prepared to deal with an urgent situation like the current health crisis. Finally, we explore 50 errors made by the best performing models on LitCovid documents and find that they often (1) correlate certain labels too closely together and (2) fail to focus on discriminative sections of the articles, both of which are important issues to address in future work. Both data and code are available on GitHub.
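The two reported metrics can differ considerably on multi-label data, which may help explain the gap between the roughly 86% micro-F1 and 75% accuracy figures. The following is a minimal sketch, assuming that "accuracy" means exact-match accuracy over whole label sets (the abstract does not state this). The function names, label names, and toy predictions are illustrative and are not taken from the paper or its code.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and false negatives
    over every (document, label) decision, then compute a single F1."""
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)   # labels predicted and correct
        fp += len(pred - gold)   # labels predicted but wrong
        fn += len(gold - pred)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def exact_match_accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label set equals the gold set.
    Assumption: this is the notion of accuracy being reported."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        return 0.0
    return sum(set(g) == set(p) for g, p in pairs) / len(pairs)


# Toy data (hypothetical): three documents with illustrative topic labels.
gold = [{"Treatment", "Mechanism"}, {"Prevention"}, {"Diagnosis"}]
pred = [{"Treatment"}, {"Prevention"}, {"Diagnosis", "Treatment"}]

print("micro-F1:", micro_f1(gold, pred))
print("exact-match accuracy:", exact_match_accuracy(gold, pred))
```

In this toy case two of the three documents are only partly right: micro-F1 gives credit for each correct label, while exact-match accuracy counts both documents as fully wrong, so it comes out lower.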

