Can machines know what a twin prime is? From the composition of the phrase, a machine may guess that a twin prime is a certain kind of prime, but without additional knowledge it is still difficult to deduce exactly what "twin" stands for. Here, twin prime is jargon: a specialized term used by experts in a particular field. Explaining jargon is challenging because understanding it usually requires domain knowledge. Recently, there has been increasing interest in automatically extracting and generating definitions of words. However, existing approaches, whether extraction-based or generation-based, perform poorly on jargon. In this paper, we propose to combine extraction and generation for jargon definition modeling: we first extract self- and correlative definitional information about the target jargon from the Web, and then generate the final definition by incorporating the extracted information. Our framework is remarkably simple but effective: experiments demonstrate that our method generates high-quality definitions for jargon and significantly outperforms state-of-the-art models, e.g., improving the BLEU score from 8.76 to 22.66 and the human-annotated score from 2.34 to 4.04.