Taxonomies are of great value to many knowledge-rich applications. Because manual taxonomy curation requires enormous human effort, automatic taxonomy construction is in great demand. However, most existing automatic taxonomy construction methods can only build hypernymy taxonomies, wherein each edge is limited to expressing the "is-a" relation. This restriction limits their applicability to more diverse real-world tasks, where a parent-child pair may carry other relations. In this paper, we aim to construct a task-guided taxonomy from a domain-specific corpus, and we allow users to input a "seed" taxonomy that serves as the task guidance. We propose an expansion-based taxonomy construction framework, namely HiExpan, which automatically generates a key term list from the corpus and iteratively grows the seed taxonomy. Specifically, HiExpan views all children under each taxonomy node as forming a coherent set and builds the taxonomy by recursively expanding all these sets. Furthermore, HiExpan incorporates a weakly-supervised relation extraction module to extract the initial children of a newly expanded node, and it adjusts the taxonomy tree by optimizing its global structure. Our experiments on three real-world datasets from different domains demonstrate the effectiveness of HiExpan for building task-guided taxonomies.
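The abstract describes the recursive expansion only at a high level. Below is a minimal sketch of the idea, and it is not HiExpan's actual implementation. It assumes hand-made toy term embeddings (`EMB`) in place of corpus-derived features. Width expansion grows each sibling set by ranking candidate terms on mean cosine similarity to the current members. Depth expansion finds the initial children of a newly added node by matching the mean child-minus-parent embedding offset of its expanded siblings. This analogy-based stand-in is a swapped-in technique for illustration, not the paper's weakly-supervised relation extraction module. The global structure optimization step is omitted. All names (`EMB`, `Node`, `expand_set`, `initial_children`, `expand_tree`) and the thresholds `k_width`, `k_depth`, and `min_score` are hypothetical.

```python
import math
from dataclasses import dataclass, field

# HYPOTHETICAL toy embeddings; dims = [country, state, US, China, Canada].
# A real system would derive term features from the corpus instead.
EMB: dict[str, list[float]] = {
    "united states": [1, 0, 1, 0, 0], "china": [1, 0, 0, 1, 0],
    "canada": [1, 0, 0, 0, 1],
    "california": [0, 1, 1, 0, 0], "texas": [0, 1, 1, 0, 0],
    "ohio": [0, 1, 1, 0, 0], "shandong": [0, 1, 0, 1, 0],
    "hubei": [0, 1, 0, 1, 0], "ontario": [0, 1, 0, 0, 1],
    "quebec": [0, 1, 0, 0, 1],
}


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(scored, k, min_score):
    """Up to k terms scoring >= min_score, best first (stable on ties)."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [term for term, score in ranked if score >= min_score][:k]


def expand_set(seeds, candidates, k, min_score):
    """Width expansion: rank candidates by mean cosine similarity to the seeds."""
    scored = [(c, sum(cosine(EMB[c], EMB[s]) for s in seeds) / len(seeds))
              for c in candidates]
    return top_k(scored, k, min_score)


def initial_children(parent, pairs, candidates, k, min_score):
    """Depth expansion (analogy stand-in): compare EMB[c] - EMB[parent] with
    the mean child-minus-parent offset over reference (parent, child) pairs."""
    dim = len(EMB[parent])
    offset = [sum(EMB[c][i] - EMB[p][i] for p, c in pairs) / len(pairs)
              for i in range(dim)]
    scored = [(c, cosine([EMB[c][i] - EMB[parent][i] for i in range(dim)], offset))
              for c in candidates]
    return top_k(scored, k, min_score)


def expand_tree(node, used, depth=0, max_depth=2, k_width=1, k_depth=2, min_score=0.3):
    """Recursively expand every sibling set; `used` keeps each term in one place."""
    if depth >= max_depth or not node.children:
        return
    candidates = [t for t in EMB if t not in used]
    for name in expand_set([ch.name for ch in node.children], candidates, k_width, min_score):
        node.children.append(Node(name))
        used.add(name)
    for child in node.children:
        # A childless node gets initial children only if they fit under max_depth.
        if not child.children and depth + 2 <= max_depth:
            pairs = [(sib.name, g.name) for sib in node.children for g in sib.children]
            if pairs:
                candidates = [t for t in EMB if t not in used]
                for name in initial_children(child.name, pairs, candidates, k_depth, min_score):
                    child.children.append(Node(name))
                    used.add(name)
        expand_tree(child, used, depth + 1, max_depth, k_width, k_depth, min_score)


# Seed taxonomy supplied by the user as task guidance.
root = Node("location", [
    Node("united states", [Node("california"), Node("texas")]),
    Node("china", [Node("shandong")]),
])
used = {"location", "united states", "china", "california", "texas", "shandong"}
expand_tree(root, used)
for country in root.children:
    print(country.name, [s.name for s in country.children])
```

On these toy vectors, width expansion adds "canada" under the root, "ohio" under "united states", and "hubei" under "china". Depth expansion then gives the new "canada" node the initial children "ontario" and "quebec". In this sketch, only those newly expanded nodes pass through the relation step, and every other level grows purely by set expansion.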