Two key obstacles in biomedical relation extraction (RE) are the scarcity of annotations and the prevalence of instances that have no explicitly pre-defined label because annotation coverage is low. Existing approaches treat biomedical RE as multi-class classification, which often generalizes poorly in low-resource settings and offers no way to make selective predictions: on an unknown case, the model guesses among seen relations instead of abstaining, which limits the applicability of these approaches. We present NBR, which reformulates biomedical RE as natural language inference (NLI) through indirect supervision. By converting relations to natural language hypotheses, NBR can exploit semantic cues to alleviate annotation scarcity. By incorporating a ranking-based loss that implicitly calibrates abstinent instances, NBR learns a clearer decision boundary and is trained to abstain on uncertain instances. Extensive experiments on three widely used biomedical RE benchmarks, namely ChemProt, DDI, and GAD, verify the effectiveness of NBR in both full-set and low-resource regimes. Our analysis demonstrates that indirect supervision benefits biomedical RE even when a domain gap exists, and that combining NLI knowledge with biomedical knowledge yields the best performance gains.
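The two ideas in the abstract, verbalizing relations as NLI hypotheses and ranking them with the option to abstain, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the templates, the relation names, the hinge form of the loss, and the `margin` and `threshold` parameters are hypothetical, and the entailment scores would in practice come from an NLI model instead of being hard-coded.

```python
from typing import Dict, Optional

# Hypothetical templates: the abstract does not give the actual wording.
TEMPLATES: Dict[str, str] = {
    "inhibitor": "{head} inhibits {tail}.",
    "activator": "{head} activates {tail}.",
}


def verbalize(head: str, tail: str) -> Dict[str, str]:
    """Turn every candidate relation into a natural language hypothesis."""
    return {rel: t.format(head=head, tail=tail) for rel, t in TEMPLATES.items()}


def ranking_loss(
    scores: Dict[str, float],
    gold: Optional[str],
    margin: float = 0.5,      # assumed hyperparameter
    threshold: float = 0.0,   # assumed abstention threshold
) -> float:
    """Hinge-style ranking loss over entailment scores (an assumed form).

    Labeled instance: the gold hypothesis should outscore every other
    hypothesis by at least `margin`.
    Abstain instance (gold is None): every hypothesis should score at
    least `margin` below `threshold`.
    """
    if gold is None:
        return sum(max(0.0, s - threshold + margin) for s in scores.values())
    gold_score = scores[gold]
    return sum(
        max(0.0, margin - gold_score + s)
        for rel, s in scores.items()
        if rel != gold
    )


def predict(scores: Dict[str, float], threshold: float = 0.0) -> Optional[str]:
    """Return the top-ranked relation, or None to abstain."""
    best = max(scores, key=lambda rel: scores[rel])
    return best if scores[best] > threshold else None
```

With this setup, `predict` returns `None` whenever no hypothesis clears the threshold, which is the selective-prediction behavior that a plain multi-class classifier lacks.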