Researchers worldwide are seeking to repurpose existing drugs or discover new drugs to counter the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). A promising source of candidates for such studies is the set of molecules that the scientific literature reports as drug-like in the context of coronavirus research. We report here on a project that leverages both human and artificial intelligence to detect references to drug-like molecules in free text. We engage non-expert humans to create a corpus of labeled text, use this labeled corpus to train a named entity recognition model, and employ the trained model to extract 10,912 drug-like molecules from the COVID-19 Open Research Dataset Challenge (CORD-19) corpus of 198,875 papers. Our performance analyses show that the automated extraction model can perform on par with non-expert humans.