Entity synonym discovery is crucial for entity-leveraging applications. However, existing studies suffer from several critical issues: (1) the input mentions may be out-of-vocabulary (OOV) and may come from a semantic space different from that of the entities; (2) the connection between mentions and entities may be hidden and cannot be established by surface matching; and (3) some entities rarely appear due to the long-tail effect. To tackle these challenges, we leverage knowledge graphs and propose a novel entity synonym discovery framework, named \emph{KGSynNet}. Specifically, we pre-train subword embeddings for mentions and entities on a large-scale domain-specific corpus, while learning the knowledge embeddings of entities via a joint TransC-TransE model. More importantly, to obtain a comprehensive representation of entities, we employ a specifically designed \emph{fusion gate} that adaptively absorbs the entities' knowledge information into their semantic features. We conduct extensive experiments to demonstrate the effectiveness of \emph{KGSynNet} in leveraging the knowledge graph. The experimental results show that \emph{KGSynNet} improves on the state-of-the-art methods by 14.7\% in terms of hits@3 in the offline evaluation and outperforms the BERT model by 8.3\% in positive feedback rate in an online A/B test on the entity linking module of a question answering system.
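The abstract does not give the fusion gate's equations, so the following is only a minimal sketch under stated assumptions, not the KGSynNet implementation. It assumes a generic element-wise gate $g = \sigma(w_s \odot s + w_k \odot k + b)$ whose fused output is $g \odot s + (1 - g) \odot k$, where $s$ is an entity's semantic (subword) embedding and $k$ its knowledge-graph embedding; a trained model would typically learn full weight matrices rather than the per-dimension weights used here. It also shows how a hits@k score such as the reported hits@3 can be computed by ranking candidate entities by cosine similarity. All function and parameter names (`fusion_gate`, `w_sem`, `w_kg`, `hits_at_k`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function used as the gate activation."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fusion_gate(semantic: Sequence[float], knowledge: Sequence[float],
                w_sem: Sequence[float], w_kg: Sequence[float],
                bias: Sequence[float]) -> List[float]:
    """Hypothetical element-wise fusion gate (assumed form, not the paper's).

    g_i     = sigmoid(w_sem_i * s_i + w_kg_i * k_i + b_i)
    fused_i = g_i * s_i + (1 - g_i) * k_i
    """
    fused = []
    for s, k, ws, wk, b in zip(semantic, knowledge, w_sem, w_kg, bias):
        g = sigmoid(ws * s + wk * k + b)
        # g near 1 keeps the semantic feature; g near 0 takes the KG feature.
        fused.append(g * s + (1.0 - g) * k)
    return fused


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def hits_at_k(mentions: Sequence[Sequence[float]], gold: Sequence[str],
              entities: Dict[str, Sequence[float]], k: int = 3) -> float:
    """Fraction of mentions whose gold entity ranks in the top k candidates."""
    hits = 0
    for vec, gold_id in zip(mentions, gold):
        # Rank every candidate entity by similarity to this mention.
        ranked = sorted(entities, key=lambda eid: cosine(vec, entities[eid]),
                        reverse=True)
        if gold_id in ranked[:k]:
            hits += 1
    return hits / len(mentions)
```

With all weights and biases at zero the gate equals 0.5 in every dimension, so the fused vector is simply the average of the two embeddings; a large positive bias pushes the output toward the semantic embedding, which illustrates the "adaptive absorption" idea in its simplest form.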