Modern supervised neural network models require large amounts of manually labeled data, which makes constructing domain-specific knowledge graphs time-consuming and labor-intensive. Although named entity recognition and relation extraction based on distant supervision have been studied extensively, constructing a domain-specific knowledge graph from large collections of text without manual annotation remains an urgent open problem. In response, we propose an integrated framework for adapting and re-learning knowledge graphs from a coarse domain (biomedical) to a more finely defined domain (oncology). The framework applies distant supervision to cross-domain knowledge graph adaptation, so no manual data annotation is required to train the model. We also introduce a novel iterative training strategy to facilitate the discovery of domain-specific named entities and triples. Experimental results indicate that the proposed framework can perform domain adaptation and knowledge graph construction efficiently.
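To make the idea of distant supervision concrete, the following is a minimal sketch of the standard labeling heuristic: a sentence that mentions both the head and the tail entity of a known triple is taken as a (noisy) training example for that triple's relation. It is not the framework described above. The `Triple` type, the `distant_label` function, the seed triples, and the toy corpus are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A knowledge graph fact: (head entity, relation, tail entity)."""
    head: str
    relation: str
    tail: str


def distant_label(sentences, triples):
    """Pair each sentence with every triple whose head and tail both occur in it.

    Matching is plain case-insensitive substring search, which is an
    assumption made for brevity; a real system would use an entity linker.
    """
    labeled = []
    for sentence in sentences:
        text = sentence.lower()
        for triple in triples:
            if triple.head.lower() in text and triple.tail.lower() in text:
                labeled.append((sentence, triple))
    return labeled


# Hypothetical seed graph and corpus, invented for illustration only.
seed_triples = [
    Triple("tamoxifen", "treats", "breast cancer"),
    Triple("cisplatin", "treats", "testicular cancer"),
]
corpus = [
    "Tamoxifen is widely prescribed for breast cancer.",
    "Cisplatin was given to patients with testicular cancer.",
    "Tamoxifen can cause hot flashes.",
]

examples = distant_label(corpus, seed_triples)
for sentence, triple in examples:
    print(triple.relation, "|", sentence)
```

Labels produced this way need no human annotator, but they are noisy: a sentence can mention both entities without expressing the relation, which is one reason distantly supervised pipelines are often paired with some form of filtering or repeated retraining.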