A taxonomy is formulated as a directed acyclic graph or tree of concepts and supports many downstream tasks. Many newly emerging concepts need to be added to an existing taxonomy. The traditional taxonomy expansion task aims only to find the best position for each new concept in the existing taxonomy. However, existing methods have two drawbacks when applied to real-world scenarios. First, they are inefficient: they waste much time when most of the incoming concepts are in fact noise. Second, they are less effective than they could be, because they collect training samples only from the existing taxonomy, which limits the model's ability to mine further hypernym-hyponym relationships among real concepts. This paper proposes a pluggable framework called Generative Adversarial Network for Taxonomy Entering Evaluation (GANTEE) to alleviate these drawbacks. The framework is built around a generative adversarial network in which the discriminative models address the first drawback and the generative model addresses the second. GANTEE uses two discriminators, which provide long-term and short-term rewards, respectively. To further improve efficiency, pre-trained language models are used to retrieve concept representations quickly. Experiments on three large-scale real-world datasets in two languages show that GANTEE improves both the effectiveness and the efficiency of existing taxonomy expansion methods.
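To make the "pluggable" idea concrete, the following is a minimal sketch of a gate that screens incoming concepts before they reach any taxonomy expansion method. It is not the GANTEE implementation: the names `gate_concepts`, `expand_with_gate`, `toy_score`, and `toy_expand`, the 0.5 threshold, and the lookup tables are all assumptions made for illustration. In the framework described above, the score would come from a trained discriminator operating on pre-trained language model representations.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class GateResult:
    accepted: List[str]  # concepts judged worth placing in the taxonomy
    rejected: List[str]  # concepts judged to be noise


def gate_concepts(
    candidates: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> GateResult:
    """Split candidates into accepted and rejected using a scoring function."""
    accepted: List[str] = []
    rejected: List[str] = []
    for concept in candidates:
        if score(concept) >= threshold:
            accepted.append(concept)
        else:
            rejected.append(concept)
    return GateResult(accepted, rejected)


def expand_with_gate(
    candidates: Iterable[str],
    score: Callable[[str], float],
    expand: Callable[[str], str],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Run the (possibly expensive) expansion method only on accepted concepts."""
    gate = gate_concepts(candidates, score, threshold)
    return [(concept, expand(concept)) for concept in gate.accepted]


# Toy stand-ins (assumptions): a real system would use a trained discriminator
# as the scorer and an existing taxonomy expansion model as the expander.
_TOY_SCORES = {"golden retriever": 0.9, "espresso": 0.8, "asdf123": 0.1}
_TOY_PARENTS = {"golden retriever": "dog", "espresso": "coffee"}


def toy_score(concept: str) -> float:
    return _TOY_SCORES.get(concept, 0.0)


def toy_expand(concept: str) -> str:
    return _TOY_PARENTS.get(concept, "root")


if __name__ == "__main__":
    incoming = ["golden retriever", "asdf123", "espresso"]
    print(expand_with_gate(incoming, toy_score, toy_expand))
```

Because the gate only wraps the expansion call, any existing expansion method can be passed in as `expand` unchanged, and the noisy candidate never incurs the cost of a placement search.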