Biomedical knowledge graphs (KGs) are heterogeneous networks in which biological entities are nodes and the relations between them are edges. These entities and relations are extracted from millions of research papers and unified in a single resource. The goal of biomedical multi-hop question answering over knowledge graphs (KGQA) is to help biologists and scientists gain valuable insights by asking questions in natural language. Relevant answers are found by first understanding the question and then querying the KG for the right set of nodes and relationships to arrive at an answer. To model the question, language models such as RoBERTa and BioBERT are used to capture the context of the natural-language question. One of the challenges in KGQA is missing links in the KG. Knowledge graph embeddings (KGEs) help to overcome this problem by encoding nodes and edges as dense, efficient vector representations from which links absent from the graph can still be scored. In this paper, we use a publicly available KG called Hetionet, an integrative network of biomedical knowledge assembled from 29 different databases of genes, compounds, diseases, and more. We have enriched this KG by creating a natural-language multi-hop biomedical question-answering dataset for testing the system, and this dataset will be made available to the research community. The major contribution of this research is an integrated system that combines language models with KG embeddings to give highly relevant answers to free-form questions asked by biologists through an intuitive interface. The biomedical multi-hop question-answering system was tested on this dataset, and the results are highly encouraging.
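The abstract does not state which KGE model is used or exactly how the question encoding is combined with it. As a hedged illustration only, the sketch below assumes a ComplEx-style scoring function, similar in spirit to embedding-based KGQA approaches such as EmbedKGQA. In that setup the question embedding plays the role of a (possibly multi-hop) relation vector. The question vector is assumed to come from a projection of the RoBERTa/BioBERT output that is not shown here. The entity names and vectors are hypothetical toy values, not learned Hetionet embeddings.

```python
def complex_score(head, rel, tail):
    """ComplEx score Re(<h, r, conj(t)>) for equal-length complex vectors."""
    return sum(h * r * t.conjugate() for h, r, t in zip(head, rel, tail)).real


def rank_answers(head_emb, question_emb, entity_embs):
    """Rank every KG entity as a candidate answer, best first.

    Assumption: question_emb is a projected language-model encoding of the
    question, used in place of a relation vector. Because every entity is
    scored, answers need not be directly linked to the head entity in the
    KG, which is how embeddings help with missing links.
    """
    scores = {
        name: complex_score(head_emb, question_emb, emb)
        for name, emb in entity_embs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy embeddings (a real system would learn these from KG triples).
head = [1 + 0j] * 4                  # topic entity mentioned in the question
question = [1j] * 4                  # stand-in for the projected question vector
entities = {
    "disease_a": [1j] * 4,
    "gene_b": [1 + 0j] * 4,
    "compound_c": [-1j] * 4,
}
ranking = rank_answers(head, question, entities)
```

Scoring all entities rather than only the neighbours of the head entity is the design choice that lets such a system return answers whose connecting edges are missing from the KG.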