Drug-target interaction (DTI) in the human body plays a crucial role in biomedical science and applications. With millions of biomedical papers published every year, automatically discovering DTI knowledge from the literature, usually expressed as triplets of a drug, a target, and their interaction, has become an urgent need in industry. Existing methods for discovering biological knowledge are mainly extractive approaches that often require detailed annotations (e.g., all mentions of biological entities and the relations between every pair of entity mentions). However, obtaining sufficient annotations is difficult and costly because it requires expert biomedical knowledge. To overcome these difficulties, we explore the first end-to-end solution for this task using a generative approach. We regard the DTI triplets as a sequence and use a Transformer-based model to generate them directly, without detailed annotations of entities and relations. Further, we propose a semi-supervised method that leverages this end-to-end model to filter unlabeled literature and label it. Experimental results show that our method significantly outperforms extractive baselines on DTI discovery. We also create a dataset, KD-DTI, to advance this task and will release it to the community.
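To make the idea of treating DTI triplets as a sequence concrete, here is a minimal sketch of how a set of (drug, target, interaction) triplets might be flattened into a single target string for a sequence-to-sequence model and then parsed back from generated text. The marker tokens `<drug>`, `<target>`, and `<interaction>`, the function names `linearize` and `parse`, and the example triplets are all assumptions made for illustration; the abstract does not specify the actual output format or model.

```python
from typing import List, Tuple

# One DTI fact: (drug, target, interaction).
Triplet = Tuple[str, str, str]

# Hypothetical marker tokens; the abstract does not specify the real format.
DRUG, TARGET, INTERACTION = "<drug>", "<target>", "<interaction>"


def linearize(triplets: List[Triplet]) -> str:
    """Flatten triplets into one string a generative model could be trained to emit."""
    parts = []
    for drug, target, interaction in triplets:
        parts.append(f"{DRUG} {drug} {TARGET} {target} {INTERACTION} {interaction}")
    return " ".join(parts)


def parse(sequence: str) -> List[Triplet]:
    """Recover triplets from a generated string, skipping malformed chunks."""
    triplets = []
    # Each triplet starts with the drug marker; text before the first marker is ignored.
    for chunk in sequence.split(DRUG)[1:]:
        if TARGET not in chunk:
            continue  # no target marker: incomplete generation
        drug, rest = chunk.split(TARGET, 1)
        if INTERACTION not in rest:
            continue  # no interaction marker after the target: incomplete generation
        target, interaction = rest.split(INTERACTION, 1)
        triplets.append((drug.strip(), target.strip(), interaction.strip()))
    return triplets


if __name__ == "__main__":
    # Illustrative triplets only; not taken from KD-DTI.
    example = [("aspirin", "COX-1", "inhibitor"), ("imatinib", "BCR-ABL", "inhibitor")]
    sequence = linearize(example)
    print(sequence)
    print(parse(sequence))
```

Under this assumed format, training needs only document-level triplets as supervision, with no entity-mention or relation annotations. The same `parse` step could also turn model output on unlabeled literature into candidate labels for the semi-supervised stage.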