The development of deep neural networks has improved representation learning in various domains, including text, graph structures, and relational triples. This progress has opened the door to new forms of relation extraction beyond traditional text-oriented relation extraction. However, the effectiveness of considering information from multiple heterogeneous domains simultaneously remains underexplored, and a model that can take advantage of integrating heterogeneous information is expected to contribute significantly to many real-world problems. This thesis takes the extraction of Drug-Drug Interactions (DDIs) from the literature as a case study and realizes relation extraction that utilizes heterogeneous domain information. First, a deep neural relation extraction model is prepared and its attention mechanism is analyzed. Next, a method that combines drug molecular structure information and drug description information with the input sentence information is proposed, and the effectiveness of utilizing drug molecular structures and drug descriptions for the relation extraction task is shown. Then, to further exploit heterogeneous information, drug-related items such as protein entries, medical terms, and pathways are collected from multiple existing databases, and a new data set in the form of a knowledge graph (KG) is constructed. A link prediction task is conducted on the constructed data set to obtain embedding representations of drugs that contain the heterogeneous domain information. Finally, a method that integrates the input sentence information and the heterogeneous KG information is proposed. The proposed model is trained and evaluated on a widely used data set, and the results show that utilizing heterogeneous domain information significantly improves the performance of relation extraction from the literature.