Knowledge graphs have proven effective in modeling structured information and conceptual knowledge, especially in the medical domain. However, the lack of high-quality annotated corpora remains a crucial obstacle to advancing research and applications in this area. To accelerate research on domain-specific knowledge graphs in the medical domain, we introduce DiaKG, a high-quality Chinese dataset for building a diabetes knowledge graph, which contains 22,050 entities and 6,890 relations in total. We implement representative recent methods for Named Entity Recognition (NER) and Relation Extraction (RE) as benchmarks to evaluate the proposed dataset thoroughly. Empirical results show that DiaKG is challenging for most existing methods, and we conduct further analysis to discuss directions for future improvement. We hope the release of this dataset can assist the construction of diabetes knowledge graphs and facilitate AI-based applications.